Practical De-identification Methods Khaled El Emam, Privacy Analytics Inc.



Definition of De-identified Data

Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information.

Direct Identifiers

• Fields that would uniquely identify individuals in a database
• Name, address, telephone number, fax number, MRN, health card number, health plan beneficiary number, license plate number, email address, photograph, biometrics, SSN, SIN, implanted device number

Quasi-Identifiers

• Sex, date of birth or age, geographic locations (such as postal codes, census geography, information about proximity to known or unique landmarks), language spoken at home, ethnic origin, aboriginal identity, total years of schooling, marital status, criminal history, total income, visible minority status, activity difficulties/reductions, profession, event dates (such as admission, discharge, procedure, death, specimen collection, visit/encounter), codes (such as diagnosis codes, procedure codes, and adverse event codes), country of birth, birth weight, and birth plurality
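As a rough illustration of the distinction, the sketch below sorts the fields of one record into direct identifiers, quasi-identifiers, and everything else. The field names and the two sets are hypothetical and cover only a few of the items listed above; a real dataset would need its own classification.

```python
# Hypothetical field names; each set echoes only a few items from the lists above.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "mrn", "email", "ssn"}
QUASI_IDENTIFIERS = {"sex", "date_of_birth", "postal_code", "admission_date", "diagnosis_code"}


def split_fields(record):
    """Return (direct, quasi, other) dictionaries for a single record."""
    direct = {k: v for k, v in record.items() if k in DIRECT_IDENTIFIERS}
    quasi = {k: v for k, v in record.items() if k in QUASI_IDENTIFIERS}
    known = DIRECT_IDENTIFIERS | QUASI_IDENTIFIERS
    other = {k: v for k, v in record.items() if k not in known}
    return direct, quasi, other


# Made-up record for illustration only.
example = {
    "name": "Jane Doe",
    "mrn": "A123",
    "sex": "F",
    "postal_code": "K1A 0B1",
    "lab_result": 4.2,
}
direct, quasi, other = split_fields(example)
print(sorted(direct), sorted(quasi), sorted(other))
```

Direct identifiers are typically removed or masked, while quasi-identifiers are the fields a re-identification risk assessment focuses on.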

De-identification Standards

• The HIPAA Privacy Rule specifies two de-identification standards (45 CFR 164.514):
– Safe Harbor
– Statistical method (also known as the expert statistician method)

HIPAA Safe Harbor

Safe Harbor Direct Identifiers and Quasi-identifiers

1. Names
2. ZIP Codes (except first three)
3. All elements of dates (except year)
4. Telephone numbers
5. Fax numbers
6. Electronic mail addresses
7. Social security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers, including license plate numbers
13. Device identifiers and serial numbers
14. Web Universal Resource Locators (URLs)
15. Internet Protocol (IP) address numbers
16. Biometric identifiers, including finger and voice prints
17. Full face photographic images and any comparable images
18. Any other unique identifying number, characteristic, or code
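The list above amounts to a set of per-field rules: most items are removed outright, while ZIP codes and dates are generalized. A minimal sketch of that idea follows, assuming hypothetical field names (`name`, `zip`, `mrn`, and so on) and covering only a handful of the 18 items. The full regulation attaches further conditions that the slide does not show (for example, a population threshold for three-digit ZIP codes and special handling of ages over 89), and this sketch ignores them.

```python
from datetime import date

# Hypothetical field names standing in for items 1 and 4-8 of the list above.
DROP_FIELDS = {"name", "telephone", "fax", "email", "ssn", "mrn"}


def safe_harbor_sketch(record):
    """Apply a simplified subset of the Safe Harbor rules to one record."""
    out = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue  # removed outright
        if key == "zip":
            out["zip3"] = str(value)[:3]  # item 2: keep only the first three digits
        elif isinstance(value, date):
            out[key + "_year"] = value.year  # item 3: keep only the year
        else:
            out[key] = value  # not covered by this sketch; passed through unchanged
    return out


# Made-up record for illustration only.
example = {
    "name": "Jane Doe",
    "zip": "90210",
    "admitted": date(2015, 3, 14),
    "mrn": "A123",
    "diagnosis": "J45",
}
print(safe_harbor_sketch(example))
```

Fields that the sketch passes through unchanged would still need review against item 18 (any other unique identifying number, characteristic, or code).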

Statistical Method (HIPAA)

• A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:
I. Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and
II. Documents the methods and results of the analysis that justify such determination

Re-identification Risk Spectrum

Managing Re-identification Risk

Example – CA Hospital Discharges

• Context: data release to a data analytics company who will sign a data use agreement, good practices for managing sensitive health information
• There were ~2.1m patients who had ~3m visits
• Risk threshold = 0.2; use average risk across all patients
• Variables:
– Year of birth
– Gender
– Year of admission
– Days since last visit
– Length of stay
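The slides do not say how average risk is computed. One common definition, assumed here, groups records that share the same quasi-identifier values into equivalence classes, assigns each record a risk of 1 divided by its class size, and averages over all records. The sketch below applies that definition to a toy dataset. The field names and records are hypothetical, and each record is treated as one patient, which simplifies the multi-visit data described above.

```python
from collections import Counter

# Hypothetical column names mirroring the five variables listed above.
QUASI_IDENTIFIERS = (
    "year_of_birth",
    "gender",
    "year_of_admission",
    "days_since_last_visit",
    "length_of_stay",
)
RISK_THRESHOLD = 0.2  # threshold stated in the example


def average_risk(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Mean of 1/(equivalence class size) over all records (assumed definition)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return sum(1 / class_sizes[k] for k in keys) / len(keys)


# Toy data for illustration only: two identical records and two unique ones.
shared = {"year_of_birth": 1970, "gender": "F", "year_of_admission": 2010,
          "days_since_last_visit": 30, "length_of_stay": 2}
toy = [
    dict(shared),
    dict(shared),
    {**shared, "year_of_birth": 1985},
    {**shared, "gender": "M", "length_of_stay": 5},
]

risk = average_risk(toy)
print(f"average risk = {risk:.2f}, acceptable = {risk <= RISK_THRESHOLD}")
```

If the computed risk exceeds the threshold, the usual next step is to generalize the quasi-identifiers further (for example, binning length of stay) and measure again.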

Risk Level

Hierarchy

De-identified Data

More Information

www.privacyanalytics.ca

@PrivacyAnalytic