blindfolded record linkage
![Page 1: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/1.jpg)
Blindfolded Record Linkage
Presented by Gautam Sanka
Susan C. Weber, Henry Lowe, Amar Das, Todd Ferris
![Page 2: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/2.jpg)
Introduction and Objectives
Challenge: patient privacy vs. building cross-site records.
Solutions:
- Mandate that identifiers be disclosed. Privacy officers find this unacceptable.
- Keep only de-identified information in the registry, but share with third parties an algorithm for generating an anonymous identifier.
![Page 3: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/3.jpg)
De-identification Explained
The anonymous identifier is created so that the probability of two different sites generating the same identifier is:
- high for the same person, and
- low for different people.
![Page 4: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/4.jpg)
What can be used?
- Using SSN: a bad idea.
- Using names and DOB may seem best, but:
  - a nickname at one site and the full name at another
  - misspellings
  - different titles (Mr., Ms., Mrs.)
![Page 5: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/5.jpg)
Goal of Project
Merge data on breast cancer patients at PAMF (Palo Alto Medical Foundation) and Stanford University Medical Center, with de-identification under HIPAA and IRB approval.
![Page 6: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/6.jpg)
Interesting Approaches
- Bigrams: for the names Ann and Anne, the bigram sets are [AN, NN] and [AN, NN, NE]. The Dice coefficient is 2 × 2 / (2 + 3) = 4/5.
- Bloom filters.
Neither approach was implemented because of its complexity.
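The bigram arithmetic above can be checked with a short sketch. The Dice coefficient is 2|A ∩ B| / (|A| + |B|) over the two bigram sets. The Bloom-filter encoding is included only for contrast, since the slide names it without details: the filter size, the number of hash functions, and the keyed HMAC-SHA-256 hashing are all illustrative assumptions, and `bloom_encode` is a hypothetical helper.

```python
import hashlib
import hmac


def bigrams(name: str) -> set[str]:
    """Set of adjacent letter pairs in a name, ignoring case and non-letters."""
    letters = "".join(ch for ch in name.upper() if ch.isalpha())
    return {letters[i:i + 2] for i in range(len(letters) - 1)}


def dice(a: set, b: set) -> float:
    """Dice coefficient: 2|A & B| / (|A| + |B|); two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def bloom_encode(name: str, key: bytes, size: int = 100, k: int = 4) -> set[int]:
    """Hypothetical Bloom-filter encoding: each bigram turns on k keyed-hash bit positions.

    Returns the set of 'on' bit positions, so dice() can compare two encodings directly.
    """
    bits = set()
    for gram in bigrams(name):
        for i in range(k):
            digest = hmac.new(key, f"{i}:{gram}".encode("utf-8"), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


print(dice(bigrams("Ann"), bigrams("Anne")))  # 0.8
```

Because the bits set for "Ann" are a subset of those set for "Anne", the Bloom-filter Dice score for this pair also stays high, while neither site ever sees the other's names in clear text.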
![Page 7: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/7.jpg)
A single SHA-1 hash string was constructed from gender, DOB, ZIP code, and a three-letter prefix of the last name.
In their case, only the first two letters of the patients' first and last names were used.
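A minimal sketch of the gender/DOB/ZIP/three-letter-prefix SHA-1 string described above. The field order, the `|` separator, and the letters-only uppercase normalization are assumptions; the slide does not specify them. The helper names are hypothetical.

```python
import hashlib
from datetime import date


def letters(text: str) -> str:
    """Uppercase a string and keep only its letters (assumed normalization)."""
    return "".join(ch for ch in text.upper() if ch.isalpha())


def gender_dob_zip_key(gender: str, dob: date, zip_code: str, last_name: str) -> str:
    """SHA-1 hex digest of gender, DOB, ZIP code and a three-letter last-name prefix."""
    fields = [letters(gender)[:1], dob.isoformat(), zip_code.strip(), letters(last_name)[:3]]
    return hashlib.sha1("|".join(fields).encode("utf-8")).hexdigest()


dob = date(1962, 8, 14)
# A misspelling after the third letter of the last name does not change the key.
print(gender_dob_zip_key("F", dob, "94301", "Thompson") == gender_dob_zip_key("f", dob, "94301", "Thomson"))  # True
# A patient who moved (different ZIP code) gets a different key.
print(gender_dob_zip_key("F", dob, "94301", "Thompson") == gender_dob_zip_key("F", dob, "94305", "Thompson"))  # False
```

The second comparison shows one cost of including ZIP code: an address change between sites turns a true match into a miss.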
![Page 8: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/8.jpg)
Composite Identifier
They felt that a combination of DOB and the first two letters of the first and last names would uniquely identify a patient. The approach is most applicable when:
- compliance restrictions preclude the exchange of actual identifiers,
- the total number of comparisons is less than 10^8,
- names and DOB are easily available, and
- DOB has a low error rate.
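A sketch of the composite identifier and an exact-match link between two sites. The slides do not say how the composite is hashed. This sketch assumes HMAC-SHA-256 with a secret known only to the sites, because a plain hash of a DOB plus four letters could be reversed by brute force. The names, dates, and secret are hypothetical. The last lines read "comparisons" as the m × n record pairs between the PAMF and Stanford counts reported later in the deck, which is an interpretation.

```python
import hashlib
import hmac
from datetime import date


def prefix(name: str, n: int = 2) -> str:
    """First n letters of a name, uppercased; non-letters are ignored."""
    return "".join(ch for ch in name.upper() if ch.isalpha())[:n]


def composite_id(first: str, last: str, dob: date, secret: bytes) -> str:
    """Keyed hash (assumed HMAC-SHA-256) of DOB plus two-letter name prefixes."""
    payload = f"{dob.isoformat()}|{prefix(first)}|{prefix(last)}"
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def link(site_a, site_b, secret: bytes) -> set:
    """Composite identifiers present at both sites (exact match on the hash)."""
    ids_a = {composite_id(f, l, d, secret) for f, l, d in site_a}
    ids_b = {composite_id(f, l, d, secret) for f, l, d in site_b}
    return ids_a & ids_b


secret = b"hypothetical-shared-secret"
site_a = [("Kathy", "Lee", date(1962, 8, 14)), ("Robert", "Jones", date(1955, 1, 30))]
site_b = [("Katherine", "Lee", date(1962, 8, 14)), ("Roberta", "Jonas", date(1971, 6, 2))]
# Kathy/Katherine Lee match despite the nickname; Robert/Roberta differ in DOB.
print(len(link(site_a, site_b, secret)))  # 1

# Record pairs between the two registries, using the counts from the results slide.
comparisons = 8_166 * 10_939
print(comparisons, comparisons < 10**8)  # 89327874 True
```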
![Page 9: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/9.jpg)
Methods
- Measured the rate of false positives in the data.
- Dropped name prefixes.
- Dropped records with a DOB of 1/1/1900 or 1/1/1901.
- Performed a self-join on three data sets of 1.5M, 0.5M, and 10,000 rows.
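A sketch of the self-join measurement, under stated assumptions. It assumes that each row is a distinct real person, so any two rows sharing a composite key count as a false-positive pair. It also reads "name prefixes" as titles such as Mr./Mrs. (our interpretation; the title list is illustrative) and computes the rate over all record pairs, which may differ from the authors' exact definition. The records are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Birth dates dropped before the self-join, as listed on the slide.
DROPPED_DOBS = {date(1900, 1, 1), date(1901, 1, 1)}
# Assumed reading of "name prefixes": title tokens to strip from names.
TITLES = {"MR", "MRS", "MS", "DR"}


def clean_name(name: str) -> str:
    """Uppercase, drop title tokens, keep only letters."""
    tokens = ("".join(ch for ch in t.upper() if ch.isalpha()) for t in name.split())
    return "".join(t for t in tokens if t and t not in TITLES)


def false_positive_rate(records):
    """Self-join (person_id, first, last, dob) rows on DOB + two-letter name prefixes.

    Returns (colliding pairs, total pairs compared, collision rate).
    """
    kept = [r for r in records if r[3] not in DROPPED_DOBS]
    groups = defaultdict(list)
    for pid, first, last, dob in kept:
        groups[(dob, clean_name(first)[:2], clean_name(last)[:2])].append(pid)
    # Every pair of rows inside one key group is a collision between different people.
    collisions = sum(len(p) * (len(p) - 1) // 2 for p in groups.values())
    total_pairs = len(kept) * (len(kept) - 1) // 2
    return collisions, total_pairs, collisions / total_pairs if total_pairs else 0.0


records = [
    (1, "Maria", "Garcia", date(1970, 5, 1)),
    (2, "Mr. Mark", "Gardner", date(1970, 5, 1)),  # MA|GA with the same DOB: collides with 1
    (3, "John", "Doe", date(1900, 1, 1)),          # dropped DOB
    (4, "Joan", "Doherty", date(1900, 1, 1)),      # dropped DOB
    (5, "Ann", "Lee", date(1980, 2, 2)),
]
print(false_positive_rate(records))  # (1, 3, 0.3333333333333333)
```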
![Page 10: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/10.jpg)
Specificity based on Data Set Size
![Page 11: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/11.jpg)
Measuring False Negatives
- Both sites exchanged cryptographic hashes based on SSNs.
- The number of matches found by matching SSNs but not by the composite identifier became the lower bound for false negatives.
- All false positives were removed based on real identifiers.
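The lower-bound computation above amounts to a set difference between two sets of matched record pairs. In this sketch, SHA-256 over the SSN digits stands in for the unspecified "cryptographic hashes", and composite keys are shown unhashed for readability. All records are hypothetical. In practice each SSN match would still be checked against real identifiers before being counted.

```python
import hashlib
from collections import defaultdict


def ssn_hash(ssn):
    """Assumed SHA-256 of the SSN's digits; None when the SSN is missing."""
    if not ssn:
        return None
    digits = "".join(ch for ch in ssn if ch.isdigit())
    return hashlib.sha256(digits.encode("ascii")).hexdigest()


def matched_pairs(site_a, site_b, field):
    """All (a_id, b_id) pairs whose value in `field` is present and equal."""
    index = defaultdict(list)
    for b_id, rec in site_b.items():
        if rec[field] is not None:
            index[rec[field]].append(b_id)
    return {(a_id, b_id)
            for a_id, rec in site_a.items() if rec[field] is not None
            for b_id in index.get(rec[field], [])}


site_a = {
    "A1": {"ssn": ssn_hash("000-00-0001"), "composite": "1950-03-02|KA|SM"},
    "A2": {"ssn": ssn_hash("000-00-0002"), "composite": "1960-07-04|JO|LE"},
    "A3": {"ssn": ssn_hash(None), "composite": "1970-01-15|MA|GA"},  # missing SSN
}
site_b = {
    "B1": {"ssn": ssn_hash("000000001"), "composite": "1950-03-02|KA|SM"},
    "B2": {"ssn": ssn_hash("000-00-0002"), "composite": "1960-07-14|JO|LE"},  # DOB typo
    "B3": {"ssn": ssn_hash("000-00-0003"), "composite": "1970-01-15|MA|GA"},
}

ssn_pairs = matched_pairs(site_a, site_b, "ssn")
composite_pairs = matched_pairs(site_a, site_b, "composite")
# Found by SSN but not by the composite identifier: a lower bound on its false negatives.
missed_by_composite = ssn_pairs - composite_pairs
print(sorted(missed_by_composite))  # [('A2', 'B2')]
```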
![Page 12: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/12.jpg)
Sensitivity: Specificity:
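The slide's own formulas survive only in the image. For reference, the standard definitions in terms of true/false positives (TP, FP) and true/false negatives (TN, FN) are given below. Counting these over candidate record pairs is an assumption, and the slide may phrase them differently.

$$
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}
$$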
![Page 13: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/13.jpg)
PAMF: 8,166 patients
Stanford: 10,939 patients
Common patients: 2,087
![Page 14: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/14.jpg)
Total found by composite identifier: 2,028
- Exact matches in names + DOB: 1,824
- Confirmed later by full identifiers: 204
“This was a very interesting result in that it provided us with a measure of how much better our approach is compared to using full names rather than two-letter prefixes.”
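A quick check of the slide's figures. This sketch reads the 204 "confirmed later" matches as ones found only through the two-letter prefixes, which is our interpretation. Under that reading, they are the matches an exact full-name + DOB approach would have missed.

```python
total_composite = 2028   # matches found by the composite identifier
exact_name_dob = 1824    # also found by exact full names + DOB
confirmed_later = 204    # assumed: found only via prefixes, confirmed by full identifiers

assert exact_name_dob + confirmed_later == total_composite
share_missed_by_full_names = confirmed_later / total_composite
print(f"{share_missed_by_full_names:.1%}")  # 10.1%
```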
![Page 15: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/15.jpg)
Reasons for False Negatives in Composite Identification
Found by SSN and later confirmed manually
![Page 16: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/16.jpg)
Simply Using SSN
- SSNs found only 1,806 of the 2,028 matches.
- The rate of false negatives is 10% higher than with the composite identifier.
- Reasons: 172 of the 222 false negatives had a missing SSN.
![Page 17: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/17.jpg)
What about the other 50?
In conclusion: 57 false positives for SSN matches vs. 3 false positives for the composite identifier, roughly 20 times worse.
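The SSN comparison numbers can be reproduced from the counts on these slides. Like the slide's "1,806 of 2,028", this sketch treats the 2,028 composite-identifier matches as the reference set.

```python
composite_matches = 2028
ssn_matches = 1806
missing_ssn = 172
ssn_false_pos, composite_false_pos = 57, 3

ssn_false_neg = composite_matches - ssn_matches    # matches SSN alone missed
other_reasons = ssn_false_neg - missing_ssn        # "the other 50"
fn_rate = ssn_false_neg / composite_matches        # share of matches missed
fp_ratio = ssn_false_pos / composite_false_pos     # "roughly 20 times worse"

print(ssn_false_neg, other_reasons, f"{fn_rate:.1%}", fp_ratio)  # 222 50 10.9% 19.0
```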
![Page 18: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/18.jpg)
![Page 19: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/19.jpg)
Which identifiers are best?
![Page 20: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/20.jpg)
![Page 21: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/21.jpg)
When should we use this tool?
- It is most useful where privacy policies preclude the full exchange of the identifiers required by more sophisticated and sensitive linkage algorithms.
- For high-quality data sets, this approach has advantages over complex algorithms: it is easy to explain, adheres to the minimum rules set by HIPAA, and is faster and less cumbersome.
![Page 22: Blindfolded Record Linkage](https://reader035.vdocuments.net/reader035/viewer/2022062222/56815ee7550346895dcd9293/html5/thumbnails/22.jpg)
Suggestions