privacy in the digital age, helen cullyer

OPEN DATA AND PRIVACY IN THE HUMANITIES (AND ARTS AND SOCIAL SCIENCES) Helen Cullyer, Program Officer Scholarly Communications The Andrew W. Mellon Foundation


Post on 24-Jun-2015


DESCRIPTION

2014 Charleston Conference Friday, Nov 7, 11:30 AM

TRANSCRIPT

Page 1: Privacy in the Digital Age, Helen Cullyer

OPEN DATA AND PRIVACY IN THE HUMANITIES (AND ARTS AND SOCIAL SCIENCES)

Helen Cullyer, Program Officer

Scholarly Communications

The Andrew W. Mellon Foundation

Page 2: Privacy in the Digital Age, Helen Cullyer

OPEN DATA

Available for universal reuse and redistribution (though some open licenses do prevent commercial use)

Promotes transparency and reproducibility; advances research; and makes results of scholarly inquiry available to the public and policymakers in addition to other scholars

Page 3: Privacy in the Digital Age, Helen Cullyer

DATA IN THE HUMANITIES, ARTS, AND HUMANISTIC SOCIAL SCIENCES

Digitized and born-digital primary source collections (text, image, multimedia) and their associated metadata

Transcriptions and annotations

Survey or other data collected by researchers

Data that results from computational analysis of digital collections and raw datasets

Page 4: Privacy in the Digital Age, Helen Cullyer

OPEN KNOWLEDGE FOUNDATION ON OPEN DATA AND PRIVACY:

Our Data is data with no personal element, and a clear sense of shared ownership. Some examples would be where the buses run in my city, what the government decides to spend my tax money on, how the national census is structured and the aggregate data resulting from it. At the Open Knowledge Foundation, our default position is that our data should be open data – it is a shared asset we can and should all benefit from.

My Data is information about me personally, where I am identified in some way, regardless of who collects it. It should not be made open or public by others without my direct permission – but it should be “open” to me (I should have access to data about me in a useable form, and the right to share it myself, however I wish if I choose to do so).

Transformed Data is information about individuals, where some effort has been made to anonymise or aggregate the data to remove individually identified elements.

http://personal-data.okfn.org/2013/12/13/open-data-privacy/

Page 5: Privacy in the Digital Age, Helen Cullyer

IS THERE REALLY A PROBLEM?

Can’t you just anonymize and aggregate data and make those data openly accessible?

• Anonymized data are not necessarily de-identified data

• Aggregated data are not always granular enough in the humanities and humanistic social sciences
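The two bullets above can be illustrated with a minimal sketch. It assumes a tiny, invented table from which names have already been removed, and it uses a k-anonymity style count (a technique swapped in here for illustration, not one named in the talk). The field names `zip`, `birth_year`, and `occupation` are hypothetical.

```python
from collections import Counter

# Hypothetical "anonymized" records: names are gone, but other fields remain.
# All values are invented for illustration only.
records = [
    {"zip": "10025", "birth_year": 1950, "occupation": "teacher"},
    {"zip": "10025", "birth_year": 1950, "occupation": "teacher"},
    {"zip": "10025", "birth_year": 1962, "occupation": "conductor"},
    {"zip": "10027", "birth_year": 1950, "occupation": "teacher"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "occupation")


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def smallest_group(rows, keys=QUASI_IDENTIFIERS):
    """Return the size of the smallest group; 1 means some row is unique."""
    return min(group_sizes(rows, keys).values())


def unique_rows(rows, keys=QUASI_IDENTIFIERS):
    """Return the rows whose combination of field values appears only once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


# Names are removed, yet two rows are still unique on the remaining fields.
k_detailed = smallest_group(records)
exposed = unique_rows(records)

# Coarsening to a three-digit zip prefix hides everyone in one group of four,
# but most of the detail a researcher might need is gone.
coarse = [{"zip3": row["zip"][:3]} for row in records]
k_coarse = smallest_group(coarse, keys=("zip3",))
```

In this invented table, removing names leaves two people unique on the remaining fields, while coarsening until no one is unique discards nearly all of the detail. That is the tension the slide describes; real datasets would need a far more careful analysis.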

Page 6: Privacy in the Digital Age, Helen Cullyer

SOME QUESTIONS

• What types of transformations can be used to de-identify but retain the granularity and usefulness of data so that they can be made open?

• Does the push for open data distract us from the need to craft careful and multi-level access policies for certain data types?

• Is there a danger that in trying to make general rules and policies, we will ignore the differences among particular cases and: either (a) develop requirements that are too lax and endanger privacy; or (b) place unnecessary restrictions on data that could and should be open?

Page 7: Privacy in the Digital Age, Helen Cullyer

EXAMPLE #1: RECORDS OF THE CENTRAL LUNATIC ASYLUM FOR THE COLORED INSANE

(King Davis, University of Texas at Austin)

• Digitized organizational and medical records dating from 1868 through 1967

• Of interest to scholars in a number of fields (history, history of science and medicine, African American Studies) and to the families of former patients

• Privacy challenges: HIPAA regulations; state law; IRB regulations; and a host of ethical concerns

• What sort of access to different types of data will be given to different groups? How will access mechanisms for digital data be implemented?

Page 8: Privacy in the Digital Age, Helen Cullyer

EXAMPLE #2: SUBSCRIBERS TO THE NEW YORK PHILHARMONIC

(Shamus Khan, Columbia; and Barbara Haws, NY Philharmonic)

• Digitized and born-digital subscriber records (1842 to the present) that contain names and addresses

• Columbia researchers transcribing records and augmenting them with other publicly available data (e.g., census data, information from the New York Social Register)

• All names post-1953 are redacted in Columbia data

• What to share openly, and how, of the post-1953 data?

• NY Phil working on privacy and access policies for post-1953 archival records that they hold
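A redaction rule like the one described above might be sketched as follows. This is an illustration under stated assumptions, not the Columbia project's actual workflow: the record layout (`year`, `name`, `address`) and the sample values are hypothetical, and "post-1953" is assumed to mean years strictly after 1953.

```python
CUTOFF_YEAR = 1953  # from the slide; treating 1953 itself as unredacted is an assumption


def redact_recent_names(records, cutoff=CUTOFF_YEAR, placeholder="[REDACTED]"):
    """Return copies of the records with names masked for years after the cutoff."""
    redacted = []
    for record in records:
        clean = dict(record)  # shallow copy so the source records are not modified
        if clean["year"] > cutoff:
            clean["name"] = placeholder
        redacted.append(clean)
    return redacted


# Hypothetical subscriber records, invented for illustration only.
sample = [
    {"year": 1890, "name": "A. Example", "address": "1 Sample St"},
    {"year": 1953, "name": "B. Example", "address": "2 Sample St"},
    {"year": 1975, "name": "C. Example", "address": "3 Sample St"},
]

shared = redact_recent_names(sample)
```

In this sketch the address field is left untouched, so a redacted row could still point to a household. That is one reason the question of what to share openly from the post-1953 data remains open.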

Page 9: Privacy in the Digital Age, Helen Cullyer

EXAMPLE #3: EXCAVATING EPORTFOLIOS: DIGGING INTO A DECADE OF STUDENT-DRIVEN DATA

Amanda Licastro, CUNY Graduate Center, @amandalicastro

http://digitocentrism.commons.gc.cuny.edu/

• Data include a large sample of student writing (from publicly available WordPress eportfolio sites); anonymous survey responses; and interviews

• All private sites and private posts stripped out of eportfolios. No grading information or other official student records included within data

• Results of computational analysis will be published

• Raw data must be encrypted according to IRB requirements

Page 10: Privacy in the Digital Age, Helen Cullyer

SOME NEEDS:

• Technical help and mentorship regarding data management

• “…examples of data management plans and workflow sequences for large data projects that could serve as instructive models for humanists like myself. And this work would be done best in a collaborative maker space where scholars from across the disciplines could have designated sessions where we could trouble-shoot together.”

Amanda Licastro

Page 11: Privacy in the Digital Age, Helen Cullyer

ARE IRB REQUIREMENTS TOO STRICT IN MANY CASES?

See summary of recent National Academies report:

“To first determine if research activities fall within the scope of the Common Rule, the report recommends that HHS define “human subjects research” as a systematic investigation designed to develop or contribute to generalizable knowledge that involves direct interaction or intervention with a living individual or that involves obtaining identifiable private information about an individual.  Only research that fits this definition should be subject to IRB procedures and the Common Rule. Building on this definition, HHS should also clarify that research which relies on publicly available information, information in the public domain, or information that can be observed in public contexts does not meet the definition of human subjects research -- regardless of whether the information is personally identifiable -- as long as individuals whose information is used have no reasonable expectation of privacy. This includes digital data, some types of administrative records, and public-use data files...”

http://www8.nationalacademies.org/onpinews/newsitem.aspx?RecordID=18614

Page 12: Privacy in the Digital Age, Helen Cullyer

THE COMPLICATED HUMAN SUBJECTS PICTURE

• Does the research involve personally identifiable data? Is the source data publicly available anyway?

• Would the research involve presentation of data (digital or otherwise) to human subjects with the intention of deceiving or manipulating those subjects?

• What is the potential for harm in the research itself and dissemination of data?

For an excellent account of the current contested landscape, see Christopher Shea, “New Rules for Human-Subject Research are Delayed and Debated”, http://chronicle.com/article/New-Rules-for-Human-Subject/149767/?cid=wc&utm_source=wc&utm_medium=en

Page 13: Privacy in the Digital Age, Helen Cullyer

PRELIMINARY CONCLUSIONS

• The dichotomy between open / non-open is, in many cases, a false one. There are plenty of data types and versions of data that cannot be made fully open yet can be shared with a limited group of individuals in carefully controlled ways

• We need to be worried about privacy regulations and policies that are too stringent as well as those that are too lax

• What is a “reasonable expectation of privacy” in a networked environment?

• Need to generate robust regulations and policies, at a high level of generality, that both protect privacy and allow for collaborative and thoughtful discussion about what is appropriate in particular cases

Page 14: Privacy in the Digital Age, Helen Cullyer

FINAL THOUGHTS

Utilitarian approach: Quantify the risk of harm

Deontological (Kantian) approach: “Always do X”, “Never do Y”

Aristotelian particularist approach: The standard of judgment is the reasonable person

But how do we generate the “robust regulations and policies, at a high level of generality…” within which reasonable persons act? At what level of generality should those laws and policies function?

A rich typology of research projects that involve personal, identifiable data is needed