Patient Health Record Linkage Methods

 
Toan C. Ong, PhD
 
Agenda
 
 
Definition and objectives of record linkage
Linkage process
Linkage variables
Linkage methods
 
 
Definition
 
 
 
Record linkage: a process to determine if two or
more records belong to the same individual
 
Objectives
 
Data de-duplication versus data enrichment using record-linkage
CU Record Linkage (CURL) Data Flow

[Figure: CURL data flow diagram. Color code: hashed PHI versus clear-text PHI.]
CURL Key Master: linkage configurations; generate key; hash key
Site A and Site B: source data; clean and profile data; hash data; review hashed data; hashed data; data transfer
CURL Honest Broker: load, block and profile hashed data; link data; linked data
NWI = Network identifier
Label in figure: (U of Colorado)

www.curl.center

Linkage variables
Linkability measures
 
Missing data ratio
Unique data ratio
Group size
Information amount
Shannon entropy
Maximum theoretical entropy
Percent of Maximum theoretical entropy
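
The slide names these measures without defining them, so the following is only a minimal sketch under stated assumptions: missing values are `None` or empty strings, group size is the average number of records per distinct value, and maximum theoretical entropy is log2 of the number of non-missing records (the case where every record has its own value). "Information amount" is left out because the slide gives no definition for it. The function name `linkability_measures` is hypothetical.

```python
import math
from collections import Counter


def linkability_measures(values):
    """Profile one candidate linkage variable (assumed definitions, see lead-in)."""
    total = len(values)
    present = [v for v in values if v not in (None, "")]
    n_present = len(present)
    counts = Counter(present)

    # Shannon entropy in bits over the observed value frequencies.
    entropy = 0.0
    for c in counts.values():
        p = c / n_present
        entropy -= p * math.log2(p)

    # Assumption: entropy is maximal when every non-missing record is unique.
    max_entropy = math.log2(n_present) if n_present > 1 else 0.0

    return {
        "missing_ratio": (total - n_present) / total if total else 0.0,
        "unique_ratio": len(counts) / n_present if n_present else 0.0,
        "avg_group_size": n_present / len(counts) if counts else 0.0,
        "entropy_bits": entropy,
        "max_entropy_bits": max_entropy,
        "pct_of_max_entropy": 100 * entropy / max_entropy if max_entropy else 0.0,
    }


if __name__ == "__main__":
    # Toy column: two groups of two records plus one missing value.
    print(linkability_measures(["x", "x", "y", "y", None]))
```

A variable with a low missing ratio and a high percent of maximum entropy separates individuals well, which is why such measures can guide the choice of linkage variables.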
 
Linkability measures
 
Results of key quality and linkability measures using the 300K corrupted synthetic data set

Data normalization
 
First and last names: non-alphabetic characters, including spaces, were removed.
Middle name: shortened to the middle initial.
Gender: all values were converted to one of three possible values: F, M, or NULL.
Date of birth: converted to YYYYMMDD format.
SSN4: the last 4 digits were extracted from the full SSN.
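
The normalization rules on this slide can be sketched roughly as follows. The record layout (a dict with keys `first_name`, `middle_name`, `last_name`, `gender`, `dob`, `ssn`), the upper-casing step, and the first-letter gender mapping are assumptions made for illustration; the slide does not specify them. `None` stands in for NULL.

```python
import re
from datetime import date


def letters_only(text):
    """Drop every character that is not an ASCII letter (spaces included)."""
    return re.sub(r"[^A-Za-z]", "", text or "")


def normalize_record(rec):
    """Apply the slide's normalization rules to one hypothetical record dict."""
    middle = letters_only(rec.get("middle_name"))
    gender = (rec.get("gender") or "").strip().upper()[:1]
    ssn_digits = re.sub(r"\D", "", rec.get("ssn") or "")
    dob = rec.get("dob")
    return {
        # Upper-casing is an added assumption so that hashes compare equal.
        "first_name": letters_only(rec.get("first_name")).upper(),
        "last_name": letters_only(rec.get("last_name")).upper(),
        "middle_initial": middle[:1].upper() or None,
        # Anything other than F or M becomes NULL (None).
        "gender": gender if gender in ("F", "M") else None,
        "dob": dob.strftime("%Y%m%d") if isinstance(dob, date) else None,
        "ssn4": ssn_digits[-4:] if len(ssn_digits) >= 4 else None,
    }


if __name__ == "__main__":
    print(normalize_record({
        "first_name": "Mary-Ann ", "middle_name": "Louise",
        "last_name": "O'Neil Smith", "gender": "female",
        "dob": date(1980, 2, 5), "ssn": "123-45-6789",
    }))
```

Normalizing before hashing matters because a one-way hash of "Mary-Ann" and "MARYANN" would otherwise never agree.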
 
Data hashing
 
Data encryption/hashing
Advanced encryption standard (AES) methods
One-way hashing methods
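Bloom filters, locality-sensitive hashing
Dice coefficient = 2 |A ∩ B| / (|A| + |B|), where A and B are the two sets being compared (for example, the bit positions set in two Bloom filters)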
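
As an illustration of the hashing options listed above, here is a minimal sketch of a keyed one-way hash and of a bigram Bloom filter compared with the Dice coefficient. The parameters (`size=1000`, `num_hashes=10`), the HMAC-SHA-256 construction, and all function names are assumptions chosen for the example; the slide does not say how CURL configures its hashing.

```python
import hashlib
import hmac


def keyed_hash(value, key):
    """One-way keyed hash: equal inputs give equal digests, nothing else leaks."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def bigrams(text):
    """Character bigrams of the padded string, e.g. '_M', 'MI', ..., 'L_'."""
    padded = f"_{text}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_bits(value, key, size=1000, num_hashes=10):
    """Bit positions set in a Bloom filter built from the value's bigrams."""
    bits = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            digest = hmac.new(key, f"{k}:{gram}".encode(), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice(a, b):
    """Dice coefficient 2|A n B| / (|A| + |B|) of two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    key = b"shared-secret-from-key-master"  # hypothetical hash key
    michael = bloom_bits("MICHAEL", key)
    print(dice(michael, bloom_bits("MICHEAL", key)))  # typo: still similar
    print(dice(michael, bloom_bits("SARAH", key)))    # unrelated: near zero
```

The design trade-off is that a plain keyed hash only supports exact comparison, while Bloom filter encodings tolerate typos at the cost of revealing some similarity structure.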
Deterministic linkage methods
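
The transcript of this slide gives three deterministic examples: an exact match, a fuzzy match in which "Micheal" and "Michael" both map to M240, and a key match on Mic2010. The sketch below reproduces them. It assumes the M240 code comes from American Soundex (the slide does not name the algorithm) and that the key is the first three letters of the first name plus the birth year; the function names are hypothetical.

```python
# Soundex digit for each consonant; vowels, H, W and Y have no digit.
_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name):
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated digit is kept
    return (code + "000")[:4]


def exact_match(a, b):
    return a == b


def fuzzy_match(a, b):
    """Treat two names as equal when their Soundex codes agree."""
    return soundex(a) == soundex(b)


def match_key(first_name, birth_year):
    """Assumed key: first three letters of the first name plus birth year."""
    return f"{first_name[:3]}{birth_year}"


if __name__ == "__main__":
    print(exact_match("Micheal", "Michael"))   # False
    print(fuzzy_match("Micheal", "Michael"))   # True
    print(match_key("Michael", 2010))          # Mic2010
```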
 
Probabilistic linkage methods
 
 
 
w_i : Weight of linkage variable i – Value Range: 0-1
d : Edit distance – Value Range: 0-100
 
Expectation Maximization (EM) for optimal weights
Edit distance
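
The scoring formula on this slide did not survive extraction, so the following is only a sketch of one plausible reading: each linkage variable i contributes a 0-100 similarity d_i derived from Levenshtein edit distance, and the match score is the weighted average of those similarities using weights w_i in the range 0-1. The weighted-average form, the normalization of edit distance, and the example weights are all assumptions; in the slide's method the weights would come from Expectation Maximization, which is not implemented here.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed 0-100 score: 100 means identical, 0 means nothing in common."""
    if not a and not b:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / max(len(a), len(b)))


def match_score(rec_a, rec_b, weights):
    """Weighted average of per-variable similarities (assumed formula)."""
    total_weight = sum(weights.values())
    weighted = sum(w * similarity(rec_a[f], rec_b[f]) for f, w in weights.items())
    return weighted / total_weight


if __name__ == "__main__":
    weights = {"first_name": 0.6, "last_name": 0.8, "dob": 0.9}  # hypothetical
    a = {"first_name": "MICHAEL", "last_name": "SMITH", "dob": "20100304"}
    b = {"first_name": "MICHEAL", "last_name": "SMITH", "dob": "20100304"}
    print(round(match_score(a, b, weights), 1))
```

Dividing by the sum of the weights keeps the score on the same 0-100 scale as the per-variable similarity, which makes a single threshold meaningful across different sets of linkage variables.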
Linkage classification
 
Two classes
match_score ≥ threshold → Matches
match_score < threshold → Non-Matches
 
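
The two-class rule above is a single comparison. A minimal sketch follows, with a hypothetical threshold of 85 on a 0-100 score scale; the slide does not state the threshold value used.

```python
def classify(match_score, threshold):
    """Two-class rule: scores at or above the threshold are matches."""
    return "match" if match_score >= threshold else "non-match"


if __name__ == "__main__":
    threshold = 85.0  # hypothetical cut-off, not taken from the slides
    for pair_id, score in [("A1-B7", 96.4), ("A2-B3", 85.0), ("A5-B9", 61.2)]:
        print(pair_id, classify(score, threshold))
```

Raising the threshold trades missed links for fewer false links, so the value is usually tuned against a manually reviewed sample.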
Validation method
 
Sample: Potential linkages
Adjudicators:
Two reviewers
One tie-breaker
Validation result
True match: >=2 yeses
Incorrect match: >=2 nos OR 1 no + 1 maybe
Undecided: >= 2 maybes
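
The adjudication rules above can be written as a small decision function. The rules overlap for some vote combinations (for example one "no" and two "maybe"), so the order in which they are checked below is an assumption, as is the "needs tie-breaker" outcome for two reviewers who split yes/no or yes/maybe.

```python
from collections import Counter


def adjudicate(votes):
    """Combine reviewer votes ('yes', 'no', 'maybe') into a validation result."""
    tally = Counter(v.strip().lower() for v in votes)
    if tally["yes"] >= 2:
        return "true match"
    if tally["no"] >= 2:
        return "incorrect match"
    if tally["maybe"] >= 2:
        return "undecided"  # assumed to take precedence over 1 no + 1 maybe
    if tally["no"] == 1 and tally["maybe"] == 1:
        return "incorrect match"
    return "needs tie-breaker"  # assumed: send the pair to the third reviewer


if __name__ == "__main__":
    print(adjudicate(["yes", "no"]))           # needs tie-breaker
    print(adjudicate(["yes", "no", "yes"]))    # true match
    print(adjudicate(["yes", "no", "maybe"]))  # incorrect match
```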
Linkage methods
 
Hybrid linkage methods
Summary
 
CURL = CU Record Linkage
Linkage data
Linkability measures
Data normalization
Data hashing
Linkage method
Deterministic
Probabilistic
Hybrid
 
Funding
 
D2V
CDIFund
PCORI
 
Acknowledgement
 
Michael Kahn
Lisa Schilling
Chan Voong
Bethany Kwan
Jenna Reno
Chris Uhrich
 
Gali Baler
Tessa Crume
Lindsey Duca
Jessica Toth
Hossein Esteri
Ibrahim Lazrig
 
Andrew Hill
Will Carter
Rachel Zucker
Kimberly Muller
Doreen Molk
James Roberts
 
Thank You!

Explore the methods and processes involved in linking patient health records to ensure data accuracy and integrity. Learn about objectives, data de-duplication, encryption, data normalization, and linkage variables. Discover CU Record Linkage (CURL) data flow and key quality measures. Dive into data hashing techniques for security and confidentiality.

  • Patient health
  • Record linkage
  • Data de-duplication
  • Encryption methods
  • Data normalization

Uploaded on Jul 15, 2024



Presentation Transcript


  1. Patient Health Record Linkage Methods Toan C. Ong, PhD

  2. Agenda Definition and objectives of record linkage Linkage process Linkage variables Linkage methods

  3. Definition Record linkage: a process to determine if two or more records belong to the same individual

  4. Objectives Data de-duplication versus data enrichment using record-linkage

  5. CU Record Linkage (CURL) Data Flow [Figure: data flow diagram; color code: hashed PHI versus clear-text PHI.] CURL Key Master: linkage configurations; generate key; hash key. Site A and Site B: source data; clean and profile data; hash data; review hashed data; hashed data; data transfer. CURL Honest Broker: load, block and profile hashed data; link data; linked data. NWI = Network identifier. Label in figure: (U of Colorado).

  6. www.curl.center

  7. Linkage variables

  8. Linkability measures Missing data ratio Unique data ratio Group size Information amount Shannon entropy Maximum theoretical entropy Percent of Maximum theoretical entropy

  9. Linkability measures Results of key quality and linkability measures using the 300K corrupted synthetic data set

  10. Data normalization First and last names: non-alphabetical characters, including spaces, were removed. Middle name: shortened to middle initial. Gender: all genders were converted to one of three possible values: F, M, or NULL. Date of birth: converted to YYYYMMDD format. SSN4: Extract last 4 digits from full SSN.

  11. Data hashing Data encryption/hashing: Advanced encryption standard (AES) methods; one-way hashing methods; Bloom filters, locality-sensitive hashing. Dice coefficient = 2 |A ∩ B| / (|A| + |B|)

  12. Deterministic linkage methods Exact match: Michael = Michael; Micheal ≠ Michael. Fuzzy match: Micheal ≈ Michael, because M240 = M240. Key match (key = first 3 letters of first name + birth year): Mic2010 = Mic2010

  13. Probabilistic linkage methods w_i: Weight of linkage variable i (value range: 0-1). d: Edit distance (value range: 0-100). Expectation Maximization (EM) for optimal weights. Edit distance

  14. Linkage classification Two classes: match_score ≥ threshold → Matches; match_score < threshold → Non-Matches

  15. Validation method Sample: Potential linkages Adjudicators: Two reviewers One tie-breaker Validation result True match: >=2 yeses Incorrect match: >=2 nos OR 1 no + 1 maybe Undecided: >= 2 maybes

  16. Linkage methods

  17. Hybrid linkage methods

  18. Summary CURL = CU Record Linkage Linkage data Linkability measures Data normalization Data hashing Linkage method Deterministic Probabilistic Hybrid

  19. Funding D2V CDIFund PCORI

  20. Acknowledgement Michael Kahn Lisa Schilling Chan Voong Bethany Kwan Jenna Reno Chris Uhrich Gali Baler Tessa Crume Lindsey Duca Jessica Toth Hossein Esteri Ibrahim Lazrig Andrew Hill Will Carter Rachel Zucker Kimberly Muller Doreen Molk James Roberts

  21. Thank You!
