accelerate responsible clinical trials data sharing while safeguarding participant privacy
www.privacyanalytics.ca | [email protected]
251 Laurier Avenue, Suite 200, Ottawa, Ontario, Canada K1P 5J6
WEBINAR: Accelerate Responsible Clinical Trials Data
Sharing While Safeguarding Participant Privacy
© 2014 Privacy Analytics, Inc. 2
Presenters
Chris Wright, Vice President, Marketing and
Today’s Moderator, Privacy Analytics, Inc.
Dr. Khaled El Emam, CEO and founder of
Privacy Analytics, Inc.
Some Housekeeping
1. Please be sure to mute your phones.
2. We’ll have a Q&A after the webinar. Please type your questions in the dialogue box you see to your right.
3. We’re giving away copies of our book, Anonymizing Health Data. Please click the link below to fill out the form. We’ll send the presentation to everyone after the webinar.
http://info.privacyanalytics.ca/anonymizinghealthcaredata.html
Agenda
1. Overview of Privacy Analytics
2. Background on clinical trials transparency
3. Special considerations when anonymizing clinical trials data
4. A risk-based methodology for data anonymization
About Privacy Analytics
For organizations that want to safeguard and enable their data for secondary use …
• Software that automates the de-identification and masking of data using a risk-based approach to anonymize personal information
• Integrated capabilities to anonymize structured and unstructured data from multiple sources
• Peer-reviewed methodologies and value-added services that certify data as de-identified using the expert statistical method under HIPAA
Industry Principles
• 30 April 2013: Final advice to the European Medicines Agency from the clinical trial advisory group on protecting patient confidentiality
• 24 June 2013: Publication and access to clinical trials data (draft policy)
• 14 May 2014: Finalisation of EMA policy on publication of and access to clinical trial data
• 12 June 2014: European Medicines Agency agrees policy on publication of clinical trial data with more user-friendly amendments
“Adequately de-identified data should be made available for wide access”
What About the FDA?
Direct & Quasi-identifiers
Examples of direct identifiers: Name, address, telephone number, fax number, MRN, health card number, health plan beneficiary number, VID, license plate number, email address, photograph, biometrics, SSN, SIN, device number, clinical trial record number
Examples of quasi-identifiers: sex, date of birth or age, geographic locations (such as postal codes, census geography, information about proximity to known or unique landmarks), language spoken at home, ethnic origin, total years of schooling, marital status, criminal history, total income, visible minority status, profession, event dates, number of children, high level diagnoses and procedures
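As a rough illustration of the distinction above, the sketch below sorts a data set's column names into direct identifiers, quasi-identifiers, and everything else. The column names and the two lookup sets are hypothetical and only abbreviate the slide's examples; in practice each variable would need to be reviewed individually.

```python
# Hypothetical column names, abbreviated from the examples on the slide.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "email", "ssn", "mrn"}
QUASI_IDENTIFIERS = {"sex", "date_of_birth", "postal_code", "marital_status", "profession"}


def classify_columns(columns):
    """Group column names as direct identifiers, quasi-identifiers, or other."""
    groups = {"direct": [], "quasi": [], "other": []}
    for column in columns:
        key = column.lower()
        if key in DIRECT_IDENTIFIERS:
            groups["direct"].append(column)
        elif key in QUASI_IDENTIFIERS:
            groups["quasi"].append(column)
        else:
            groups["other"].append(column)
    return groups


if __name__ == "__main__":
    print(classify_columns(["Name", "Telephone", "Sex", "Date_of_Birth", "lab_result"]))
```

A fixed lookup like this is only a starting point: whether a variable behaves as a quasi-identifier depends on what an adversary could plausibly know.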
Anonymization Landscape
De-identification Standards
HIPAA Safe Harbor Method
Safe Harbor Direct Identifiers and Quasi-identifiers
1. Names
2. ZIP Codes (except first three)
3. All elements of dates (except year)
4. Telephone numbers
5. Fax numbers
6. Electronic mail addresses
7. Social security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers, including license plate numbers
13. Device identifiers and serial numbers
14. Web Universal Resource Locators (URLs)
15. Internet Protocol (IP) address numbers
16. Biometric identifiers, including finger and voice prints
17. Full face photographic images and any comparable images
18. Any other unique identifying number, characteristic, or code
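To make items 2 and 3 of the list concrete, here is a minimal sketch that drops a few direct identifiers, keeps only the first three ZIP digits, and reduces a date to its year. The field names (`name`, `phone`, `email`, `zip`, `visit_date`) are hypothetical, and the sketch deliberately ignores the further conditions in the full rule (for example, sparsely populated ZIP prefixes and ages over 89), so it should not be read as a complete Safe Harbor implementation.

```python
from datetime import date

# Hypothetical field names standing in for a few of the 18 listed elements.
FIELDS_TO_DROP = ("name", "phone", "email")


def truncate_zip(zip_code: str) -> str:
    """Keep only the first three digits of a ZIP code."""
    return zip_code.strip()[:3]


def apply_safe_harbor_sketch(record: dict) -> dict:
    """Return a copy of the record with a simplified Safe Harbor treatment."""
    out = dict(record)
    for field in FIELDS_TO_DROP:
        out.pop(field, None)  # remove direct identifiers if present
    if "zip" in out:
        out["zip"] = truncate_zip(out["zip"])
    if "visit_date" in out:
        visit: date = out.pop("visit_date")
        out["visit_year"] = visit.year  # all date elements except the year are removed
    return out


if __name__ == "__main__":
    example = {"name": "John Smith", "zip": "15213",
               "visit_date": date(2014, 3, 5), "lab_result": 4.8}
    print(apply_safe_harbor_sketch(example))
```

Note how much precision is lost on dates: reducing every date to its year is one reason a risk-based alternative can be attractive for clinical trial data.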
Safe Harbor Implementations - I
Safe Harbor Implementations - II
Expert Determination (Statistical) Method
• A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:
I. Applying such principles and methods, determines that the risk is “very small” that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and
II. Documents the methods and results of the analysis that justify such determination
Section Takeaways
Current Status
• European regulators are moving in the direction of requiring clinical trials data release
• In two stages: redacted CSRs and then data
• Industry is already taking the initiative to develop mechanisms for data sharing
• There is a dearth of good standards to address privacy concerns
Anonymization Approaches
• Microdata release: individual-level participant data (IPD) is being provided to data recipients as flat files (CSV or SAS) or database files
  – Microdata can be public or available through controlled access
• Online portal: data recipients can access IPD through a portal and perform their analysis through the portal only
  – No raw data download allowed (different control mechanisms used)
  – Online portal registration can be public or through a qualification process
No Zero Risk
Anonymizing Portal Access
• Is it necessary to anonymize data if it is on a portal?
  – There are three types of attack:
    • Deliberate attack by recipient – manage that risk through contracts and audit trails
    • Data breach – managed by manufacturer through portal controls
    • Inadvertent re-identification – could happen if the data recipient lives in the same geography as some of the participants
  – It is inadvertent disclosure risk that needs to be managed in a portal – anonymization is still needed
Rare Diseases
• Clinical trials on participants with rare diseases have very small cohorts – can that data be anonymized?
• This depends on a number of factors:
  – Whether the trial participants represent a fraction of all patients in the relevant geographies with the disease
  – Whether the rare disease is visible or not
  – Whether an adversary would know if someone has a rare disease
  – Whether a portal is used or not
• It should not be taken for granted that it is not possible to anonymize rare disease trials
Data Quality Balance
Replicating Results
• Disclosed data should replicate the results of any published studies from the clinical trial
• This imposes a stringent standard on any anonymization techniques that are used
• It would be challenging for a manufacturer if it was not possible to replicate the results from published studies
What to Expect When Anonymizing
• With sophisticated anonymization techniques, the anonymized data analysis will replicate the conclusions but not necessarily the exact values
• With basic anonymization techniques, the conclusions may not be replicated
Anonymizing Dates
• Can convert all dates to intervals from enrollment
• However, if the enrollment period was short then reversing a range of possible enrollment dates may be plausible
  – That risk should be measured rather than assumed
  – Will depend on whether geography is also known
• Date shifting is another scheme which allows the disclosure of precise dates and can still provide assurances about re-identification risk
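The two date schemes above can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function names are hypothetical, and the date-shifting variant assumes one random offset per participant applied to all of that participant's dates, so that the gaps between events are preserved.

```python
import random
from datetime import date, timedelta


def to_intervals(enrollment: date, event_dates: list) -> list:
    """Replace each event date with the number of days since enrollment."""
    return [(d - enrollment).days for d in event_dates]


def shift_dates(event_dates: list, max_shift_days: int, rng: random.Random) -> list:
    """Shift all of one participant's dates by a single random offset.

    Assumption: the same offset is used for every date of the participant,
    which keeps the intervals between events intact.
    """
    offset = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
    return [d + offset for d in event_dates]


if __name__ == "__main__":
    enrollment = date(2014, 1, 10)
    events = [date(2014, 1, 10), date(2014, 2, 9), date(2014, 4, 1)]
    print(to_intervals(enrollment, events))
    print(shift_dates(events, max_shift_days=30, rng=random.Random()))
```

With intervals, a short enrollment window limits how far the true dates can be from a guess, which is why the slide asks for that risk to be measured rather than assumed.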
Anonymizing Patient Locations
• Most clinical trials do not collect patient location information for analysis purposes
• However, if that information is needed then geo-clustering of ZIP/postal codes is a good technique for protecting location information
• It maintains geospatial specificity
Poor Selection of Pseudonyms
Releasing Site Details
• Replacing the site name with an ID may not always be effective
• The highest recruiting sites are likely knowable from clinicaltrials.gov or equivalent registries
• A frequency analysis on the data would reveal which site was the highest recruiting (especially if country information is provided)
• The risk is from geoproxy attacks – many participants will seek care in facilities close to where they live
• For a nontrivial percentage of participants, it may be possible to predict their residence location with some accuracy
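The frequency analysis mentioned above takes only a few lines, which is why replacing site names with IDs offers limited protection. In this sketch the record layout and the `site_id` field are hypothetical; the point is that the site with the most records stands out regardless of what it is called, and can then be compared with publicly listed recruitment figures.

```python
from collections import Counter


def rank_sites(records):
    """Return (site_id, participant_count) pairs, highest recruiting first."""
    counts = Counter(record["site_id"] for record in records)
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical pseudonymized records: the site names are gone, the counts are not.
    demo_records = [
        {"participant": 1, "site_id": "S01"},
        {"participant": 2, "site_id": "S07"},
        {"participant": 3, "site_id": "S07"},
        {"participant": 4, "site_id": "S03"},
        {"participant": 5, "site_id": "S07"},
    ]
    for site_id, count in rank_sites(demo_records):
        print(site_id, count)
```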
Public IPD?
• Public IPD will be challenging to anonymize adequately and ensure exact replication of published results
• Public IPD is still useful with that caveat – may be good for summary statistics and the investigation of basic relationships
• Therefore this should not be discounted
• Needs to be augmented with other data release methods that would allow the disclosure of more detailed data
Data Release Strategy
• Strategy 1:
  – When a data request is received, the data set is anonymized to specifically meet the data request
  – Must be repeated for all data requests
• Strategy 2:
  – Create one anonymized data set for each trial and irrespective of the data request, the same complete anonymized data set is released
  – Much more cost effective, but probably provides more data than is needed
The Importance of Governance
• More than just technical approaches are needed
• Governance necessary for:
– Tracking data users
– Stigmatizing analytics reviews
– Audits where necessary
– Review of anonymization practices
– Monitoring legislative and regulatory environment
Section Takeaways
Special Considerations
• Multiple approaches to releasing IPD
• Challenges releasing high quality public IPD
• Sophisticated anonymization techniques are needed to ensure data quality
• Governance also needed (as well as technical approaches)
Current Status
• European regulators are moving in the direction of requiring clinical trials data release
• In two stages: redacted CSRs and then data
• Industry is already taking the initiative to develop mechanisms for data sharing
• There is a dearth of good standards to address privacy concerns
Identifiability Spectrum
[Figure: spectrum running from “Little De-identification” to “Significant De-identification”]
A range of operational precedents exists based on the situational context of the data’s use and available mitigating controls that protect it.
Re-identification Risk: Example
Column groups, left to right: DIRECT IDENTIFIERS (Name, Telephone No.), INDIRECT IDENTIFIERS (Sex, Year of Birth), SENSITIVE VARIABLES (Lab Test, Lab Result), OTHER (Pay Delay)
ID  Name               Telephone No.   Sex  Year of Birth  Lab Test                    Lab Result  Pay Delay
1   John Smith         (412) 668-5468  M    1959           Albumin, Serum              4.8         37
2   Alan Smith         (413) 822-5074  M    1969           Creatine Kinase             86          36
3   Alice Brown        (416) 886-5314  F    1955           Alkaline Phosphatase        66          52
4   Hercules Green     (613) 763-5254  M    1959           Bilirubin                   <0          36
5   Alicia Freds       (613) 586-6222  F    1942           BUN/Creatinine Ratio        17          82
6   Gill Stringer      (954) 699-5423  F    1975           Calcium, Serum              9.2         34
7   Marie Kirkpatrick  (416) 786-6212  F    1966           Free Thyroxine Index        2.7         23
8   Leslie Hall        (905) 668-6581  F    1987           Globulin, Total             3.5         9
9   Douglas Henry      (416) 423-5965  M    1959           B-type Natriuretic peptide  134         38
10  Fred Thompson      (416) 421-7719  M    1967           Creatine Kinase             80          21
Callout (3): two quasi-identifiers (Sex and Year of Birth) match in three records within the dataset (IDs 1, 4 and 9: M, 1959)
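The matching described in the callout can be reproduced by grouping the example records on their quasi-identifiers. The sketch below uses the Sex and Year of Birth values from the table; the per-record risk of 1 divided by the group size is one commonly used measure and is an assumption here, not something the slide states.

```python
from collections import Counter

# (ID, Sex, Year of Birth) taken from the example table.
RECORDS = [
    (1, "M", 1959), (2, "M", 1969), (3, "F", 1955), (4, "M", 1959),
    (5, "F", 1942), (6, "F", 1975), (7, "F", 1966), (8, "F", 1987),
    (9, "M", 1959), (10, "M", 1967),
]


def equivalence_class_sizes(records):
    """Count how many records share each (sex, year of birth) combination."""
    return Counter((sex, year) for _, sex, year in records)


def record_risks(records):
    """Assumed risk measure: 1 / size of the record's equivalence class."""
    sizes = equivalence_class_sizes(records)
    return {rid: 1 / sizes[(sex, year)] for rid, sex, year in records}


if __name__ == "__main__":
    for rid, risk in record_risks(RECORDS).items():
        print(rid, round(risk, 2))
```

Under this measure the three males born in 1959 share a group, while every other record is unique on these two variables and would need generalization or suppression to lower its risk.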
Spectrum of Identifiability
[Figure: spectrum running from “Little De-identification” to “Significant De-identification”]
Leading research organizations apply these precedents to data release for secondary purposes. We’ve embedded these precedents into our software, PARAT CORE.
Managing Re-identification Risk
Complexity Stifles Time to Insight
“… removing patient identifiers and formatting all data sets [ ..] can take up to six months.”
Roche Description of Their Clinical Trials Data Sharing Process for Research Requests
… and the volume of clinical trials data releases will continue to grow rapidly
Automating Anonymization
Reduce Complexity: Accelerate Data Releases
A scalable set of packaged capabilities that enables the release of anonymized data for analysis quickly, securely and cost-effectively:
Automate
Audit
Analyze
Creating Expertise to Govern Data Releases
• Course on risk-based anonymization (2-day): on-site or remote
• Exam on body of knowledge and work through case studies
• Maintaining knowledge over time through continuous education
• Coaching on two data sets
• Requires automated support to operationalize
Post-marketing Surveillance
• Wanted to anonymize data on 535,595 patients from general practices
• Longitudinal data needed to be used for on-going and on-demand analytics
Challenges:
• Significant size of the data set. Held more than five years of clinical, prescription, laboratory, scheduling and billing data of patients
• Numerous release requests from more than 2664 clinics and 5850 physicians
Analytic Outcomes:
De-identified data to analyze:
• Post-marketing surveillance of adverse events
• Public health surveillance
• Prescription pattern analysis
• Health services analysis
GI Protocol
• Two-arm protocol; GI events after taking NSAIDs with and without a PPI
Chlamydia Protocol
• Females 14-24 years old (inclusive) who were tested and tested positive for Chlamydia in the previous 12 months
Section Takeaways
Methodology & Software
• A risk-based methodology can be used to release high quality IPD
• The process can be automated to accelerate data release, reduce costs, ensure consistency, and provide a defensible result
• Can develop internal expertise or outsource the whole data release process
Special Considerations
• Multiple approaches to releasing IPD
• Challenges releasing high quality public IPD
• Sophisticated anonymization techniques are needed to ensure data quality
• Governance also needed (as well as technical approaches)
Current Status
• European regulators are moving in the direction of requiring clinical trials data release
• In two stages: redacted CSRs and then data
• Industry is already taking the initiative to develop mechanisms for data sharing
• There is a dearth of good standards to address privacy concerns
Balancing Privacy with Data Utility
The Analytic Benefits of our Approach
1. Data Quality: Ensuring de-identified data has analytic usefulness by minimizing the amount of distortion while still ensuring that re-identification risk is very small
2. Analytic Granularity: Allowing users to configure the extent of de-identification to match the characteristics of the analysis that is anticipated
3. Depth of Insight: Enabling analysis of the total patient health experience, to compile a complete picture of this experience from multiple data sources and types
Final Thoughts
We’re giving away copies of our book, Anonymizing Health Data: http://info.privacyanalytics.ca/anonymizinghealthcaredata.html
Anonymization Survey:
• http://surveys.ronin.com/wix/p1834200753.aspx?src=1
July 14-16, Health Analytics Expo and Symposium, Chicago, IL.
Also, contact me to learn more at [email protected].
We can set up a personalized demo or have a discussion on your current anonymization needs. Just drop me a line.
Question and Answer