Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Post on 30-Dec-2015

TRANSCRIPT

Page 1: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Curating and Managing Research Data for Re-Use

Confidential Data Management
Jared Lyle

Page 2: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

We Are Here: Confidential Data Management

Page 3: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Disclosure: Risk & Harm

• What do we promise when we conduct research about people?
– That benefits (usually to society) outweigh risk of harm (usually to individual)
– That we will protect confidentiality
• Why is confidentiality so important?
– Because people may reveal information to us that could cause them harm if revealed.
– Examples: criminal activity, antisocial activity, medical conditions...

Page 4: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

“There’s No Data Like No Data”

Data Producers “want to protect the confidentiality of survey respondents and avoid disclosure while at the same time maximizing data quality and data access.”

Confidentiality, Disclosure, and Data Access: Theory and Practical Application for Statistical Agencies (Doyle, Lane, Theeuwes, and Zayatz, Eds., 2001)

Page 5: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Two Kinds of Disclosure Risk

• The intruder knows the respondent, and is searching for them based on knowledge already in hand (parent, neighbor, spouse, etc.)

• The intruder does not know the respondent(s) and is searching against some comparison database
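The second kind of risk can be sketched in a few lines. This is only an illustration: the field names (`zip`, `birth_year`, `sex`) and all records are made-up assumptions, not taken from the slides. The sketch joins a released file to a comparison database on shared fields and reports released records that match exactly one named person.

```python
from collections import defaultdict

# Hypothetical fields present in both files (an assumption for illustration).
KEYS = ("zip", "birth_year", "sex")

def unique_matches(released, comparison, keys=KEYS):
    """Return (released_row, comparison_row) pairs where the key combination
    matches exactly one record in the comparison database."""
    index = defaultdict(list)
    for row in comparison:
        index[tuple(row[k] for k in keys)].append(row)
    matches = []
    for row in released:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # an unambiguous link is a re-identification
            matches.append((row, candidates[0]))
    return matches

# Made-up data: the released file has no names, the comparison file does.
released = [
    {"zip": "48104", "birth_year": 1950, "sex": "F", "diagnosis": "X"},
    {"zip": "48105", "birth_year": 1982, "sex": "M", "diagnosis": "Y"},
]
comparison = [
    {"name": "Person A", "zip": "48104", "birth_year": 1950, "sex": "F"},
    {"name": "Person B", "zip": "48105", "birth_year": 1982, "sex": "M"},
    {"name": "Person C", "zip": "48105", "birth_year": 1982, "sex": "M"},
]

for released_row, person in unique_matches(released, comparison):
    print(person["name"], "is linked to", released_row["diagnosis"])
```

In this made-up data only the first released record links to a single person; the second matches two people, so the link is ambiguous.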

Page 6: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Protecting Confidential Data

• Safe data
• Safe places
• Safe people
• Safe outputs

Page 7: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Safe Data

• Disclosure Review
• Disclosure Treatment

Page 8: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Disclosure Review

• [Documentation]
• [Data]
• Disclosure Review

Page 9: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Disclosure Risk Review

• General Considerations
– Intended uses
– Detail and sensitivity
– Is sampling frame identifiable?
– Outliers
– Subsets and unique combinations
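The "subsets and unique combinations" check can be sketched as a simple count of how many records share each combination of values. The variable names, the sample records, and the threshold `k=3` are assumptions chosen for illustration; the slides do not prescribe a threshold.

```python
from collections import Counter

def small_cells(rows, keys, k=3):
    """Return {combination: count} for key combinations shared by fewer than k rows."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}

# Made-up records; "state" and "occupation" stand in for any pair of variables.
rows = [
    {"state": "MI", "occupation": "teacher", "age": 34},
    {"state": "MI", "occupation": "teacher", "age": 41},
    {"state": "MI", "occupation": "teacher", "age": 29},
    {"state": "MI", "occupation": "judge", "age": 67},
]

risky = small_cells(rows, ("state", "occupation"))
print(risky)
```

Combinations returned by `small_cells` are candidates for closer review, not automatic proof of disclosure risk.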

Page 10: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Disclosure Risk Review

• Direct Identifiers?
– personal names
– addresses (including ZIP codes)
– telephone numbers
– social security numbers
– driver license numbers
– patient numbers
– certification numbers,
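A rough first pass for some of these identifiers is to scan values for tell-tale patterns. The sketch below is an assumption-laden illustration: the two regular expressions cover only US-style social security and telephone number formats, the column names are invented, and a real review would still require reading the data and documentation.

```python
import re

# Illustrative patterns only (assumptions); they will miss many real formats.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def flag_columns(rows):
    """Return {column: sorted pattern names} for columns whose values match a pattern."""
    flagged = {}
    for row in rows:
        for column, value in row.items():
            for name, pattern in PATTERNS.items():
                if pattern.search(str(value)):
                    flagged.setdefault(column, set()).add(name)
    return {column: sorted(names) for column, names in flagged.items()}

# Made-up records with dummy values.
rows = [
    {"id": 1, "notes": "call 734-555-0100", "ssn_raw": "123-45-6789"},
    {"id": 2, "notes": "no contact info", "ssn_raw": "987-65-4321"},
]

print(flag_columns(rows))
```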

Page 11: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Disclosure Risk Review

• Indirect Identifiers?
– detailed geography (e.g., state, county, or census tract of residence)
– exact date of birth
– exact occupations held
– exact dates of events
– detailed income

Page 12: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Disclosure Risk Review

• External Linkages?
– public patient/medical records
– court records
– police and correction records
– Social Security records
– Medicare records
– drivers licenses
– military records

Page 13: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Options for Restricting Content

• Removal
• Blanking: ‘abcd’ to ‘ ’
• Recoding: ‘1234’ to ‘9999’
• Bracketing and/or Collapsing: 13-29=1, 30-49=2
• Top-coding/Bottom-coding: >1,000=1,000
• Perturbing: noise addition (rounding, swapping)
• Pseudonyms (for qualitative responses)
• Restricting access
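Several of these treatments are simple transformations. The sketch below uses the slide's own example values for bracketing (13-29=1, 30-49=2) and top-coding (>1,000=1,000); the record layout, the function names, and the rounding base of 100 are assumptions made for illustration.

```python
def bracket_age(age):
    """Bracketing/collapsing with the slide's example codes: 13-29 -> 1, 30-49 -> 2."""
    if 13 <= age <= 29:
        return 1
    if 30 <= age <= 49:
        return 2
    return None  # other ranges are not specified on the slide

def top_code(value, ceiling=1000):
    """Top-coding: values above the ceiling are reported as the ceiling."""
    return min(value, ceiling)

def round_to(value, base=100):
    """Perturbing by rounding to the nearest multiple of base (assumed base)."""
    return base * round(value / base)

def treat(record):
    """Apply removal, bracketing, and top-coding to one hypothetical record."""
    treated = {k: v for k, v in record.items() if k != "name"}  # removal
    treated["age"] = bracket_age(record["age"])
    treated["income"] = top_code(record["income"])
    return treated

print(treat({"name": "Person A", "age": 35, "income": 2500}))
print(round_to(1234))
```

Note that Python's built-in `round` rounds exact halves to the nearest even number, so `round_to(250)` gives 200 rather than 300; a production treatment would state its rounding rule explicitly.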

Page 14: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Exercise

Page 15: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Further Resources
• Statistical Policy Working Paper 22 - Report on Statistical Disclosure Limitation Methodology http://www.fcsm.gov/working-papers/spwp22.html
• The American Statistical Association, Committee on Privacy and Confidentiality - Methods for Reducing Disclosure Risks When Sharing Data http://www.amstat.org/committees/pc/SDL.html
• ICPSR's Confidentiality and Privacy web page http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/confidentiality/

Page 16: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Training

• Joint Program in Survey Methodology http://projects.isr.umich.edu/jpsm/index.cfm

Page 17: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Discussion
• How far do you go in mitigating disclosure risk? What is the right balance?
• Is processing for indirect identifiers overkill?
• How long should data be restricted?
• What are the advantages and drawbacks of delayed dissemination?

Page 18: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Safe People, Places, Outputs…

Page 19: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

ICPSR Responsible Use Statement
Users of ICPSR data agree to a responsible use statement before downloading data from the Web site. It reads, in part:

• Any intentional identification of a RESEARCH SUBJECT (whether an individual or an organization) or unauthorized disclosure of his or her confidential information violates the PROMISE OF CONFIDENTIALITY given to the providers of the information. Therefore, users of data agree:

• To use these datasets solely for research or statistical purposes and not for investigation of specific RESEARCH SUBJECTS, except when identification is authorized in writing by ICPSR

• To make no use of the identity of any RESEARCH SUBJECT discovered inadvertently, and to advise ICPSR of any such discovery

• Agree not to redistribute data or other materials without the written agreement of ICPSR

Source: “Navigating Your IRB to Share Restricted Data” Webinar (http://bit.ly/Vi3RXd)

Page 20: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Restricted Data Use Agreement

• Sets requirements of the investigator and institution

• Defines ICPSR’s obligations
• Requires signatures from investigator and legal representative of researcher’s institution
• Incorporates by reference
– Information entered into the access system
– IRB approval or exemption for project
– Data security plan

Source: “Navigating Your IRB to Share Restricted Data” Webinar (http://bit.ly/Vi3RXd)

Page 21: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Data Use Agreement
To avoid inadvertent disclosure of persons, families, households, neighborhoods, schools or health services by using the following guidelines in the release of statistics derived from the dataset.

1. In no table should all cases in any row or column be found in a single cell.
2. In no case should the total for a row or column of a cross-tabulation be fewer than ten.
3. In no case should a quantity figure be based on fewer than ten cases.
4. In no case should a quantity figure be published if one case contributes more than 60 percent of the amount.
5. In no case should data on an identifiable case, or any of the kinds of data listed in preceding items 1-3, be derivable through subtraction or other calculation from the combination of tables released.
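Guidelines 1 through 4 lend themselves to an automated pre-release check. The following is a minimal sketch under stated assumptions: a cross-tabulation is represented as a list of rows of counts, a quantity figure as the list of per-case contributions behind it, and the function names are hypothetical. Guideline 5 (derivation by subtraction across tables) is not covered.

```python
def check_crosstab(table, min_total=10):
    """Return messages for rows/columns that break guidelines 1 and 2."""
    problems = []
    columns = [list(col) for col in zip(*table)]
    for kind, lines in (("row", table), ("column", columns)):
        for i, cells in enumerate(lines):
            total = sum(cells)
            if total > 0 and max(cells) == total:
                problems.append(f"{kind} {i}: all cases in a single cell")
            if total < min_total:
                problems.append(f"{kind} {i}: total {total} is fewer than {min_total}")
    return problems

def check_quantity(contributions, min_cases=10, max_share=0.60):
    """Return messages for a quantity figure that breaks guidelines 3 and 4."""
    problems = []
    if len(contributions) < min_cases:
        problems.append(f"based on {len(contributions)} cases, fewer than {min_cases}")
    total = sum(contributions)
    if total > 0 and max(contributions) > max_share * total:
        problems.append(f"one case contributes more than {max_share:.0%} of the total")
    return problems

# Made-up 2x2 table of counts and a made-up list of per-case amounts.
table = [[12, 0], [7, 8]]
for message in check_crosstab(table):
    print(message)
for message in check_quantity([700] + [30] * 10):
    print(message)
```

A table or figure that produces no messages still needs human review, since these checks look at one table at a time.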

Page 22: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Data Use Agreement

Page 23: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

What are the consequences of violating the terms of use agreement for ICPSR data?

Subjects who participate in surveys and other research instruments distributed by ICPSR expect their responses to remain confidential. The data distributed by ICPSR are for statistical analysis, and they may not be used to identify specific individuals or organizations. Although ICPSR takes steps to assure that subjects cannot be identified, users are also obligated to act responsibly and not to violate the privacy of subjects intentionally or unintentionally. If ICPSR determines that the terms of use agreement has been violated, one or more of the following steps may be taken:

ICPSR may revoke the existing agreement, demand the return of the data in question, and deny all future access to ICPSR data.

The violation may be reported to the Research Integrity Officer, Institutional Review Board, or Human Subjects Review Committee of the user’s institution. A range of sanctions are available to institutions including revocation of tenure and termination.

If the confidentiality of human subjects has been violated, the case may be reported to the Federal Office for Human Research Protections. This may result in an investigation of the user’s institution, which can result in institution-wide sanctions including the suspension of all research grants.

A court may award the payment of damages to any individual harmed by the breach of the agreement.

Page 24: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Customizing the RUDDDA
• Customize RUDDDA with information supplied by the data provider
– Institution’s legal name and address
– Official name AND familiar reference of project or dataset
– Contact name of legal representative
– Preference for electronic or hard copy

Source: “Navigating Your IRB to Share Restricted Data” Webinar (http://bit.ly/Vi3RXd)

Page 25: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Accessing Restricted-Use Data
• Use online data access request system
– Link in Access Notes on study homepage
• Must provide:
– Name, department, and title of investigator
– Description of the proposed research
– Approval or exemption from IRB
– Names of research staff accessing data
– CVs and signed confidentiality pledges
– Information on data formats needed and data storage technology

Source: “Navigating Your IRB to Share Restricted Data” Webinar (http://bit.ly/Vi3RXd)

Page 26: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

ICPSR Secure Data Services

Page 27: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

ICPSR Secure Data Services

Page 28: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Contracting ICPSR Restricted Data

Page 29: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

The Virtual Lab

This is the DP virtual machine Windows desktop (connected to the ICPSR data center).

This is your local computer’s desktop at your approved location.

Neither content nor data can be transferred between the DP and your local desktop.

You cannot import programs into the DP. You cannot cut and paste or move files outside the DP.

You cannot access the Internet within the DP so using web browsers, email, and ftp is not possible.

You cannot cut & paste or move files between your computer (or anywhere else) and the DP.

Page 30: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

The Virtual Video Lab

Page 31: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Further Resources
• ICPSR “Instructions for Preparing the Data Protection Plan” http://www.icpsr.umich.edu/files/ICPSR/access/restricted/all.pdf
• “Introducing ICPSR’s Virtual Data Enclave (SDE)” http://techaticpsr.blogspot.nl/2012/09/introducing-icpsrs-virtual-data-enclave.html
• ICPSR Physical Data Enclave http://www.icpsr.umich.edu/icpsrweb/content/ICPSR/access/restricted/enclave.html

Page 32: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Further Resources

• Example NAHDAP Restricted Data Use Agreement http://www.icpsr.umich.edu/files/NAHDAP/GenericRDAAgreement.pdf

• NAHDAP “Restricted-Use Data Deposit and Dissemination Procedures” http://www.icpsr.umich.edu/files/NAHDAP/NAHDAP-RestrictedDataProcedures.pdf

• “Navigating Your IRB to Share Restricted Data” Webinar http://bit.ly/Vi3RXd

Page 33: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

Discussion
• What are you or your organization doing to create ‘safe’ places and people?
• What resources would you need to successfully manage confidential data?
• Are there partnership opportunities for managing confidential data?

Page 34: Curating and Managing Research Data for Re-Use Confidential Data Management Jared Lyle

We Are Here: Confidential Data Management