Exploring New Methods for Protecting and Distributing Confidential Research Data

Bryan Beecher, Felicia LeClere
ICPSR/University of Michigan

Uploaded by bryan-beecher on 12-May-2015


DESCRIPTION

I gave this presentation at the Fall 2009 CNI Membership meeting.

TRANSCRIPT

Page 1

Exploring New Methods for Protecting and Distributing Confidential Research Data

Bryan Beecher, Felicia LeClere
ICPSR/University of Michigan

Page 2

Today’s Talk

• What’s ICPSR?
• How do organizations distribute confidential research data today?
• What are the problems?
• What can we improve?

Page 3

What’s ICPSR?

• Inter-university Consortium for Political and Social Research
  – JSTOR for social science data

• Serving billions since 1962

Page 4

Who does ICPSR serve?

• Research universities
  – Discover and download data

• Teaching universities and colleges
  – Online analysis

• Federal agencies
  – Data management, preservation, and dissemination

Page 5

Distributing data

Page 6

Distributing data

• Most of our content is public-use
  – Anonymized public opinion
  – Aggregate government data

• Little risk of disclosure
• But what about the good stuff?

Page 7

Distributing sensitive data

Page 8

Distributing sensitive data

• Higher risk of breach of confidentiality
  – Variables that give geographic information that might be combined with other data sources to identify a respondent

• Requires special handling
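The slide does not say how geographic disclosure risk is measured. One simple, widely used screen is a k-anonymity check (our choice for illustration, not necessarily what ICPSR or Add Health uses): count how many respondents share each combination of quasi-identifiers such as county, age, and sex, and flag combinations shared by fewer than k respondents, since those are the easiest to link to outside sources. A minimal sketch, assuming records are dictionaries with hypothetical field names:

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple

def small_cells(records: Iterable[Mapping[str, str]],
                quasi_identifiers: Sequence[str],
                k: int = 5) -> Dict[Tuple[str, ...], int]:
    """Return quasi-identifier combinations shared by fewer than k records.

    Each key is one combination of values (e.g. county, age, sex); the value
    is how many records carry exactly that combination.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}

# Hypothetical example: two respondents share one profile, a third is unique.
respondents = [
    {"county": "Washtenaw", "age": "15", "sex": "F"},
    {"county": "Washtenaw", "age": "15", "sex": "F"},
    {"county": "Keweenaw", "age": "16", "sex": "M"},
]
print(small_cells(respondents, ["county", "age", "sex"], k=2))
# {('Keweenaw', '16', 'M'): 1}
```

Real disclosure review weighs much more than cell counts, and the default threshold of k = 5 here is an arbitrary assumption.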

Page 9

Distributing sensitive data

• Researcher agrees to protect the data and identities

• Delivered securely

• Harsh penalty deterrent

http://www.flickr.com/photos/lwr/521394398

Page 10

National Longitudinal Study of Adolescent Health

• Add Health
  – Highly used and cited study

• Very frank questions
  – Kids in 7th through 12th grade

• Carolina Population Center
• Gold standard in data protection

Page 11

Traditional Approach

http://www.flickr.com/photos/videolux/2389320345/

http://www.flickr.com/photos/curiousexpeditions/3767246490/

Page 12

Traditional Approach

Diagram: Confidential research data; Apply for access; Write security plan; Repeat

Page 13

Can we improve upon it?

• Paperwork
  – How do we speed the application process?

• Security
  – How do we ensure the data are going to a good home?

Page 14

Paperwork

• Web portal
  – Research plan
  – IRB approval
  – CVs
  – Confidentiality agreements

Page 15

Paperwork

• Web portal
  – Behavioral questionnaire
  – Electronic copy of contract (HTML, PDF)
  – Database back-end to drive workflow systems

Page 16

Restricted Data Contracting System

• Integrated with ICPSR’s existing Web download mechanism

• Collects information that would ordinarily be provided on paper

• “Tickler” system to send reminders and nag about missing items
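The slide does not describe how the tickler decides whom to nag. A minimal sketch, assuming the required items are the ones listed on the paperwork slide and a hypothetical rule of at most one reminder per week for any application that is still missing something:

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Iterator, List, Optional, Set, Tuple

# Required items, taken from the paperwork slide; the exact list is an assumption.
REQUIRED_ITEMS = ("research plan", "IRB approval", "CVs", "confidentiality agreement")

@dataclass
class Application:
    applicant: str
    submitted: Set[str] = field(default_factory=set)
    last_reminder: Optional[date] = None  # None means never reminded

def missing_items(app: Application) -> List[str]:
    """Required items not yet received, in a fixed order."""
    return [item for item in REQUIRED_ITEMS if item not in app.submitted]

def reminders_due(apps: List[Application], today: date,
                  interval: timedelta = timedelta(days=7)) -> Iterator[Tuple[str, List[str]]]:
    """Yield (applicant, missing items) for incomplete applications not nagged recently."""
    for app in apps:
        missing = missing_items(app)
        if not missing:
            continue  # complete application: nothing to nag about
        if app.last_reminder is None or today - app.last_reminder >= interval:
            yield app.applicant, missing
```

In a working system each yielded pair would become an e-mail and the application's `last_reminder` would be updated; both steps are hypothetical here, since the talk does not describe them.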

Page 17

Security

• Current system relies on…
  – The data provider to maintain security templates
  – The researcher to write an IT security plan
  – The data provider to read and understand the plan
  – The researcher to execute the plan

Page 18

Current access model

Diagram: Researcher, Workstation, ICPSR

Page 19

A new access model?

Diagram: Researcher, Workstation, ICPSR, and a Secure Area

Page 20

Secure area = the cloud?

• Cloud-based access
  – Convenient
  – Scalable
  – Economical
  – Perfect?

http://www.flickr.com/photos/docbudie/2240764187/

Page 21

What could go wrong?

Page 22

Almost everything

• Is the cloud reliable?
• Will the data be safe?
• We are building an analytic environment for a researcher; how will we know what to provide?

• Will this perform well for the researcher?

• This is the main story…

Page 23

Cloud reliability

• Already using the cloud for disaster recovery (DR) since January 2009

• The Merit Network Operations Center monitors all of our stuff

• Ping and HTTP GET every minute, 24 x 7
• Results?
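The Merit NOC’s own monitoring tools are not described in the talk. The sketch below shows the general idea with Python’s standard library: one HTTP GET per probe, recorded as success or failure, and availability computed as the fraction of successful probes. The function names, the 10-second timeout, and treating any 2xx or 3xx response as “up” are our assumptions.

```python
import urllib.request
from typing import Iterable

def http_probe(url: str, timeout: float = 10.0) -> bool:
    """Perform one HTTP GET; True if the server answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError (4xx/5xx responses), and timeouts are all OSError subclasses.
        return False

def availability(results: Iterable[bool]) -> float:
    """Fraction of probes that succeeded."""
    results = list(results)
    if not results:
        raise ValueError("no probe results")
    return sum(results) / len(results)

def downtime_minutes(results: Iterable[bool]) -> int:
    """With one probe per minute, each failed probe counts as one minute down."""
    return sum(1 for ok in results if not ok)
```

A scheduler (cron, for instance) would call `http_probe` once a minute; a full year of such checks is 365 × 1,440 = 525,600 results per monitored host.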

Page 24

Local v. cloud – CY 2009

Page 25

Conclusion

• The cloud has been more reliable than our local environment

• If local power were better, the cloud would still be more reliable, but only slightly

• Certainly seems to be good enough

Page 26

Cloud security

• Absolute security?
  – Who cares?

• More secure than the typical Wintel desktop of a social science researcher?
  – That’s the goal

http://www.flickr.com/photos/amagill/235453953

Page 27

Current practice

• Data archive maintains per-platform guidelines on IT security

• Researcher downloads a template and writes his/her own IT security plan

• Data provider reviews plan; approves or iterates until approved or rejected
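The review loop on this slide behaves like a small state machine: a submitted plan is approved, rejected, or sent back for revision, and a revised plan is resubmitted. A minimal sketch of the transitions a workflow database might enforce; the state names are our own, not taken from ICPSR’s system:

```python
from enum import Enum
from typing import Dict, Set

class PlanState(Enum):
    SUBMITTED = "submitted"
    REVISION_REQUESTED = "revision requested"
    APPROVED = "approved"
    REJECTED = "rejected"

# Legal moves; approved and rejected are terminal states.
TRANSITIONS: Dict[PlanState, Set[PlanState]] = {
    PlanState.SUBMITTED: {PlanState.APPROVED, PlanState.REJECTED, PlanState.REVISION_REQUESTED},
    PlanState.REVISION_REQUESTED: {PlanState.SUBMITTED},
    PlanState.APPROVED: set(),
    PlanState.REJECTED: set(),
}

def advance(current: PlanState, target: PlanState) -> PlanState:
    """Return the new state, or raise if the move skips a step in the review loop."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target

# One round of revision, then approval.
state = PlanState.SUBMITTED
for step in (PlanState.REVISION_REQUESTED, PlanState.SUBMITTED, PlanState.APPROVED):
    state = advance(state, step)
print(state.value)  # approved
```

Encoding the loop this way lets the database reject out-of-order updates, such as approving a plan that is still waiting on revisions.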

Page 28

Sample items

– I secured the computer on which the Add Health data resides in a locked room, or secured the computer to a table with a lock and cable (locking the case so the battery cannot be removed).

– I turned off all unneeded services and disabled unneeded network protocols.

Page 29

Brutal facts

• Data providers are not IT experts
• Researchers are not experts in IT security
• Even if the system is secure on Day One, what assurance is there that it continues to be secure?

http://www.flickr.com/photos/42dreams/1878611309

Page 30

Our approach to security

• Leverage tools from the cloud provider (AWS access control lists)

• Leverage tools from UMich (regular Retina and Nessus scans)

• Engage a white hat hacker to probe and evaluate the system
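AWS security groups act as default-deny allow lists: inbound traffic is dropped unless a rule names its protocol, port, and source address range. The sketch below models that rule-matching logic with Python’s `ipaddress` module; it is not the AWS API, and the single rule (remote desktop on TCP port 3389 from the documentation network 198.51.100.0/24) is a hypothetical stand-in for a campus address range.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class IngressRule:
    port: int   # destination TCP port
    cidr: str   # source network allowed to connect

    def permits(self, source_ip: str, port: int) -> bool:
        return (port == self.port
                and ipaddress.ip_address(source_ip) in ipaddress.ip_network(self.cidr))

def allowed(rules: Iterable[IngressRule], source_ip: str, port: int) -> bool:
    """Default deny: a connection is allowed only if at least one rule matches it."""
    return any(rule.permits(source_ip, port) for rule in rules)

# Hypothetical rule set: remote desktop only from one network.
RULES = [IngressRule(port=3389, cidr="198.51.100.0/24")]
```

Writing the intended rules down this explicitly also gives the scans and the outside evaluator something concrete to test the deployed configuration against.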

Page 31

Conclusion

• Expecting researchers to build and maintain secure IT environments is not reasonable

• We think we can build something at least as secure in the cloud

• We’ll evaluate our environment using outside evaluators

Page 32

What to deploy?

• This model means we need to distribute a working analytic environment, not just the data

• It also gives the researcher the opportunity to limit access to a subset of contractees

Page 33

May I Take Your Order?

• Operating system?

• Analysis software?

• Who’s allowed to use the system?

• Anything else?

http://www.flickr.com/photos/stephenpougas/2267503544

Page 34

The ACI Chooser

• Analytic Cloud Instance
  – Cumulus

• The ACI Chooser
• Takes your order
• Brings your ACI to your table (in the cloud)
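The talk lists the questions the ACI Chooser asks (operating system, analysis software, who may use the system) but not the answers it accepts. A minimal sketch of an order record and its checks, assuming hypothetical menus of operating systems and packages, and assuming authorized users must be drawn from the people named on the data-use contract, in line with the talk’s point about limiting access to a subset of contractees:

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set

# Hypothetical menus; the slides ask these questions but do not list the choices.
OPERATING_SYSTEMS = {"Windows", "Linux"}
ANALYSIS_PACKAGES = {"R", "SAS", "SPSS", "Stata"}

@dataclass
class ACIOrder:
    study: str
    operating_system: str
    packages: Set[str] = field(default_factory=set)
    authorized_users: Set[str] = field(default_factory=set)

    def problems(self, contractees: Iterable[str]) -> List[str]:
        """List reasons the order cannot be filled; an empty list means it can."""
        issues = []
        if self.operating_system not in OPERATING_SYSTEMS:
            issues.append(f"unsupported OS: {self.operating_system}")
        for pkg in sorted(self.packages - ANALYSIS_PACKAGES):
            issues.append(f"unsupported package: {pkg}")
        if not self.authorized_users:
            issues.append("no authorized users")
        # Only people named on the contract may be given access.
        for user in sorted(self.authorized_users - set(contractees)):
            issues.append(f"not on contract: {user}")
        return issues
```

Returning a list of problems rather than raising on the first one lets a Web form show the researcher everything that needs fixing at once.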

Page 35

Conclusion

• We’re building this now
• Issues to resolve
  – How do we get passwords to people?
  – Remote access mechanism?
    • Citrix? Terminal Services?
  – Should we encrypt the data?
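The slide leaves open how passwords reach people. Whatever delivery channel is chosen, the passwords themselves should come from a cryptographically secure generator. A minimal sketch using Python’s standard `secrets` module; the 16-character default and the 12-character minimum are our assumptions, not ICPSR policy:

```python
import secrets
import string

# Letters and digits only (62 symbols): about 95 bits of entropy at 16 characters.
ALPHABET = string.ascii_letters + string.digits

def temporary_password(length: int = 16) -> str:
    """Random password drawn with the secrets module (a CSPRNG), for first login."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

The `random` module would be the wrong tool here: its generator is predictable and is documented as unsuitable for security purposes.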

Page 36

Performance

• Will a cloud-based analysis system meet the expectations of a researcher?

• Will one size fit all?

Page 37

Amazon EC2

Instance types (CPU, memory, price per hour):

• Regular
  – S (1 CPU, 2 GB, $0.12)
  – L (4 CPU, 7 GB, $0.48)
  – XL (8 CPU, 15 GB, $0.96)

• High memory
  – XXL (13 CPU, 34 GB, $1.44)
  – XXXXL (26 CPU, 68 GB, $2.88)

• High CPU
  – M (5 CPU, 2 GB, $0.29)
  – XL (20 CPU, 7 GB, $1.16)
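One simple way to act on a “balance cost and performance” goal is to pick the cheapest listed type that meets a researcher’s stated CPU and memory needs. The sketch below encodes the figures exactly as they appear on the slide (2009 prices, which have long since changed); the selection rule itself is our illustration, not ICPSR’s method.

```python
from typing import NamedTuple, Optional

class InstanceType(NamedTuple):
    family: str
    size: str
    cpu: int          # "CPU" as listed on the slide
    memory_gb: int
    price: float      # USD per hour, as listed on the slide

# Figures copied from the slide.
CATALOG = [
    InstanceType("Regular", "S", 1, 2, 0.12),
    InstanceType("Regular", "L", 4, 7, 0.48),
    InstanceType("Regular", "XL", 8, 15, 0.96),
    InstanceType("High memory", "XXL", 13, 34, 1.44),
    InstanceType("High memory", "XXXXL", 26, 68, 2.88),
    InstanceType("High CPU", "M", 5, 2, 0.29),
    InstanceType("High CPU", "XL", 20, 7, 1.16),
]

def cheapest_fit(cpu: int, memory_gb: int) -> Optional[InstanceType]:
    """Cheapest listed type with at least the requested CPU and memory, or None."""
    fits = [t for t in CATALOG if t.cpu >= cpu and t.memory_gb >= memory_gb]
    return min(fits, key=lambda t: t.price, default=None)
```

For example, a request for 5 CPUs and 2 GB lands on High CPU M at $0.29 rather than Regular XL at $0.96, which is the kind of trade-off the strategy slide is after.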

Page 38

Strategy

• Balance cost and performance
• Start small, but give opportunity to grow
  – Easy to move an image from one instance size to another

• Measure performance via researcher’s experience
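The strategy above (start small, then move the image to a larger instance if the researcher needs it) can be expressed as a one-step upgrade rule. A minimal sketch; the upgrade order (smallest to largest memory, using the EC2 slide’s labels) and the satisfied/not-satisfied signal are our assumptions, and the actual move of the machine image is not modeled:

```python
from typing import List

# Hypothetical upgrade path, ordered by memory, using the EC2 slide's labels.
UPGRADE_PATH: List[str] = [
    "Regular S", "Regular L", "Regular XL", "High memory XXL", "High memory XXXXL",
]

def next_size(current: str, researcher_satisfied: bool) -> str:
    """Keep the current size if the researcher is happy; otherwise go up one step.

    At the top of the path the size stays the same.
    """
    if researcher_satisfied:
        return current
    i = UPGRADE_PATH.index(current)
    return UPGRADE_PATH[min(i + 1, len(UPGRADE_PATH) - 1)]
```

Stepping up one size at a time keeps cost growth gradual; a researcher with known heavy needs could of course be launched larger from the start.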

Page 39

Conclusion

• Partners
  – Panel Study of Income Dynamics (PSID)
  – Los Angeles Family and Neighborhood Study (LA FANS)

• Start small; re-launch larger
• Ask how well it works

Page 40

Thanks and Final Thoughts

• Could preserve machine image + data + software + “program” for replication purposes

• enclavecloud.blogspot.com charts our adventures

• Cloud-related work sponsored by a recent NIH Challenge Grant
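Preserving a machine image, data, software, and program together is only useful for replication if someone can later confirm that nothing changed. One common approach, our illustration rather than anything the talk specifies, is a manifest of SHA-256 checksums; the `machine_image_id` value is a hypothetical placeholder:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(machine_image_id: str, root: Path) -> dict:
    """Record the image identifier plus a checksum for every file under root."""
    files = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    return {"machine_image": machine_image_id, "files": files}

def verify(manifest: dict, root: Path) -> bool:
    """True if every file listed in the manifest still has the recorded checksum."""
    return all((root / name).is_file() and sha256_of(root / name) == digest
               for name, digest in manifest["files"].items())
```

Note that `verify` checks only the files listed in the manifest; files added later are not detected.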