Practical De-identification Methods
Khaled El Emam, Privacy Analytics Inc.

Posted on 22-Feb-2016

Page 2: Definition of De-identified Data

Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information (HIPAA Privacy Rule, 45 CFR 164.514(a)).

Page 3: Direct Identifiers

• Fields that would uniquely identify individuals in a database
• Name, address, telephone number, fax number, medical record number (MRN), health card number, health plan beneficiary number, license plate number, email address, photograph, biometrics, Social Security number (SSN), Social Insurance Number (SIN), implanted device number
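A minimal sketch of suppressing direct identifiers before release, assuming each record is a Python dict. The field names are hypothetical stand-ins for the items on this slide; a real data set will use its own column names, and replacing values with pseudonyms is a common alternative to dropping them outright.

```python
# Hypothetical field names for the direct identifiers listed above.
DIRECT_IDENTIFIERS = {
    "name", "address", "telephone", "fax", "mrn", "health_card_number",
    "health_plan_beneficiary_number", "license_plate", "email",
    "photograph", "biometrics", "ssn", "sin", "implanted_device_number",
}


def remove_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record with every direct-identifier field dropped."""
    return {field: value for field, value in record.items()
            if field not in DIRECT_IDENTIFIERS}


patient = {"name": "Jane Doe", "mrn": "A123", "sex": "F", "year_of_birth": 1970}
print(remove_direct_identifiers(patient))  # {'sex': 'F', 'year_of_birth': 1970}
```

The remaining fields (sex, year of birth) are quasi-identifiers, which still carry re-identification risk and need separate treatment.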

Page 4: Quasi-Identifiers

• Fields that do not identify individuals on their own but can do so when combined with each other or with other available information
• Sex, date of birth or age, geographic locations (such as postal codes, census geography, information about proximity to known or unique landmarks), language spoken at home, ethnic origin, aboriginal identity, total years of schooling, marital status, criminal history, total income, visible minority status, activity difficulties/reductions, profession, event dates (such as admission, discharge, procedure, death, specimen collection, visit/encounter), codes (such as diagnosis codes, procedure codes, and adverse event codes), country of birth, birth weight, and birth plurality
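Because quasi-identifiers identify people in combination, a useful first check is to group records by their quasi-identifier values and count the size of each group (equivalence class); a record alone in its class is the easiest to re-identify. This is a sketch with made-up records and hypothetical field names.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


# Made-up records for illustration only.
records = [
    {"sex": "F", "year_of_birth": 1970, "postal_prefix": "K1A"},
    {"sex": "F", "year_of_birth": 1970, "postal_prefix": "K1A"},
    {"sex": "M", "year_of_birth": 1985, "postal_prefix": "M5V"},
]

sizes = equivalence_class_sizes(records, ["sex", "year_of_birth", "postal_prefix"])
unique = [key for key, n in sizes.items() if n == 1]
print(unique)  # [('M', 1985, 'M5V')]
```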

Page 5: De-identification Standards

• The HIPAA Privacy Rule specifies two de-identification standards (45 CFR 164.514):
 – Safe Harbor
 – Statistical method (also known as the expert statistician method)

Page 6: HIPAA Safe Harbor

Safe Harbor direct identifiers and quasi-identifiers:

1. Names
2. ZIP codes (except the first three digits, and only when the three-digit area contains more than 20,000 people)
3. All elements of dates (except year); ages over 89 are aggregated into a single category of 90 or older
4. Telephone numbers
5. Fax numbers
6. Electronic mail addresses
7. Social Security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers, including license plate numbers
13. Device identifiers and serial numbers
14. Web Universal Resource Locators (URLs)
15. Internet Protocol (IP) address numbers
16. Biometric identifiers, including finger and voice prints
17. Full-face photographic images and any comparable images
18. Any other unique identifying number, characteristic, or code
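Items 2 and 3 are the Safe Harbor rules that generalize values rather than remove them outright. Safe Harbor keeps the first three ZIP digits only when the area they cover contains more than 20,000 people (otherwise they become 000), reduces dates to the year, and collapses ages over 89 into one 90-or-older category. The sketch below covers only those three rules. The set of restricted three-digit prefixes is a caller-supplied parameter (`restricted_zip3`, a hypothetical name) that must come from current Census data; the value used below is a placeholder, not the official list.

```python
from datetime import date


def generalize_zip(zip_code: str, restricted_zip3: set[str]) -> str:
    """Keep the first three ZIP digits unless that three-digit area is too small."""
    prefix = zip_code[:3]
    # restricted_zip3 holds prefixes whose area has 20,000 people or fewer;
    # it must be supplied by the caller (assumption, not built in here).
    return "000" if prefix in restricted_zip3 else prefix


def generalize_date(d: date) -> int:
    """Keep only the year of a date related to the individual."""
    return d.year


def generalize_age(age: int) -> str:
    """Collapse every age over 89 into a single 90+ category."""
    return "90+" if age > 89 else str(age)


restricted = {"036"}  # placeholder for illustration only
print(generalize_zip("03601", restricted))  # 000
print(generalize_zip("94110", restricted))  # 941
print(generalize_date(date(2012, 3, 14)))   # 2012
print(generalize_age(93))                   # 90+
```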

Page 12: Statistical Method (HIPAA)

• A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:
 I. Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and
 II. Documents the methods and results of the analysis that justify such determination
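The rule does not say how "very small" risk is measured. One widely used formulation, assumed here rather than taken from the rule, groups records into equivalence classes that share the same quasi-identifier values: an attacker who knows a target's values and picks at random within the matching class succeeds with probability one over the class size. With $n$ records, $f_i$ the size of the class containing record $i$, and $J$ the set of classes:

```latex
r_i = \frac{1}{f_i}, \qquad
R_{\max} = \max_{i} \frac{1}{f_i}, \qquad
R_{\text{avg}} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{f_i} = \frac{|J|}{n}
```

The last equality holds because the $f_j$ records in class $j$ each contribute $1/f_j$, so every class adds exactly 1 to the sum. For instance, a made-up data set of 1,000 records that falls into 150 classes has an average risk of $150/1000 = 0.15$.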

Page 13: Re-identification Risk Spectrum

Page 14: Managing Re-identification Risk

Page 15: Example – CA Hospital Discharges

• Context: data release to a data analytics company that will sign a data use agreement and has good practices for managing sensitive health information
• About 2.1 million patients with about 3 million visits
• Risk threshold = 0.2, using the average risk across all patients
• Variables:
 – Year of birth
 – Gender
 – Year of admission
 – Days since last visit
 – Length of stay
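A sketch of the average-risk check for a release like this one, using made-up rows. It assumes one row per patient and uses four of the five listed variables; the real data set is longitudinal (several visits per patient), which needs more careful handling than shown here. Average risk is computed as the number of equivalence classes divided by the number of rows, which equals the mean of one over each row's class size.

```python
from collections import Counter

RISK_THRESHOLD = 0.2  # average-risk threshold from the example


def average_risk(rows):
    """Mean of 1/f_i over rows, i.e. (number of equivalence classes) / (number of rows).

    Each row is a tuple of quasi-identifier values for one patient.
    """
    class_sizes = Counter(rows)
    return len(class_sizes) / len(rows)


# Made-up rows of (year_of_birth, gender, year_of_admission, length_of_stay).
rows = [
    (1950, "F", 2009, 3),
    (1950, "F", 2009, 3),
    (1950, "F", 2009, 4),
    (1962, "M", 2010, 2),
    (1962, "M", 2010, 2),
]

risk = average_risk(rows)
print(f"average risk = {risk:.2f}, acceptable: {risk <= RISK_THRESHOLD}")
# average risk = 0.60, acceptable: False
```

When the risk exceeds the threshold, the usual next step is to generalize variables (for example, banding year of birth or length of stay) and measure again.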

Page 16: Risk Level

Page 17: Hierarchy
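Assuming "Hierarchy" refers to a generalization hierarchy (a variable coarsened step by step until the risk is acceptable), the sketch below uses a hypothetical three-level hierarchy for year of birth: exact year, 5-year band, 10-year band. It searches for the least generalized level whose average risk meets a threshold such as the example's 0.2. The levels, band widths, and rows are illustrative assumptions.

```python
from collections import Counter

# Hypothetical hierarchy: level 0 = exact year, 1 = 5-year band, 2 = 10-year band.
BAND_WIDTHS = (1, 5, 10)


def generalize_year(year, level):
    """Map a year of birth to its value at the given hierarchy level."""
    width = BAND_WIDTHS[level]
    start = year - year % width
    return str(start) if width == 1 else f"{start}-{start + width - 1}"


def average_risk(rows):
    """Number of equivalence classes divided by number of rows."""
    return len(Counter(rows)) / len(rows)


def lowest_acceptable_level(rows, threshold):
    """Return the least generalized level whose average risk is within the threshold."""
    for level in range(len(BAND_WIDTHS)):
        generalized = [(generalize_year(y, level), g) for y, g in rows]
        if average_risk(generalized) <= threshold:
            return level
    return None  # even the coarsest level is too risky


# Made-up (year_of_birth, gender) rows for illustration.
rows = [
    (1950, "F"), (1951, "F"), (1953, "F"), (1955, "F"), (1956, "F"), (1958, "F"),
    (1950, "M"), (1951, "M"), (1952, "M"), (1954, "M"), (1957, "M"), (1959, "M"),
]

for level in range(len(BAND_WIDTHS)):
    generalized = [(generalize_year(y, level), g) for y, g in rows]
    print(level, round(average_risk(generalized), 3))  # 0 1.0 / 1 0.333 / 2 0.167
print("lowest acceptable level:", lowest_acceptable_level(rows, 0.2))  # 2
```

Because each 5-year band nests inside a 10-year band, moving up the hierarchy can only merge classes, so the average risk never increases; the search stops at the first acceptable level to keep as much analytic detail as possible.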

Page 18: De-identified Data

Page 19: More Information

www.privacyanalytics.ca

@PrivacyAnalytic