
Post on 02-Jan-2016


Page 1: Curating and Managing Research Data for Re-Use Appraisal & Acquisition Jared Lyle

Curating and Managing Research Data for Re-Use

Appraisal & Acquisition
Jared Lyle


We Are Here: Appraisal & Acquisition


Appraisal & Acquisition

• Collection development policy
• Appraisal
• Selection
• Acquisition


http://en.wikipedia.org/wiki/File:Schnorr_von_Carolsfeld_-_Die_Schlacht_Rudolfs_von_Habsburg_gegen_Ottokar_von_B%C3%B6hmen.jpg


Collection Development Policy

Identifies:
• Archive’s user base
• Types of data in which archive is interested
• Criteria used to determine archival value


Types of Data of Special Interest to ICPSR

• Diversity Data. Data that fosters understanding of the experiences of racial and ethnic minorities and other marginalized peoples living in the United States.

• Complex Data. Data arising from longitudinal research, survey research, and non-standard types: biological data, administrative records, video data, spatial data, remotely sensed data, and relational databases.

http://www.icpsr.umich.edu/icpsrweb/ICPSR/curation/selection.jsp


Types of Data of Special Interest to ICPSR

• Mixed Method Data. Data that can support both qualitative and quantitative analyses; data resulting from concurrent (both at the same time), sequential (one following the other), or conversion (one method to the other) mixed method study designs.

• Interdisciplinary Data. Data from interdisciplinary studies, and data resulting from studies using the research methods of multiple disciplines.

• International Data. Data originating outside the United States and data that support cross-national, comparative research. We are especially interested in data from countries and regions of the world that do not have a national structure for archiving, disseminating, and preserving research data.

http://www.icpsr.umich.edu/icpsrweb/ICPSR/curation/selection.jsp


ICPSR Criteria for Archival Value

• Nationally representative
• Theoretically/methodologically unique
• Representing underrepresented research populations
• Widely cited, appearing in top tier journals, or collected by an eminent scholar


Example Policies

• ICPSR Collection Development Policy http://www.icpsr.umich.edu/icpsrweb/ICPSR/org/policies/colldev.jsp

• UK Data Archive Collections Development Policy http://www.esds.ac.uk/news/publications/UKDACollectionsDevPolicy.pdf

• MSU Libraries Collection Development Policy Statement: Data Services http://libguides.lib.msu.edu/dataservicescollectiondevpolicy


A collection development policy is a living document and should be updated over time to follow the trends and output of the research community.


Example: Twitter @LOC

http://www.niemanlab.org/2012/07/that-plan-to-archive-every-tweet-in-the-library-of-congress-definitely-still-happening/


“Every public tweet, ever, since Twitter’s inception in March 2006, will be archived digitally at the Library of Congress.”

http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/


2010 = 50 million tweets per day
2012 = 400 million tweets per day

http://www.niemanlab.org/2012/07/that-plan-to-archive-every-tweet-in-the-library-of-congress-definitely-still-happening/
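To put the two daily figures side by side, here is a small back-of-the-envelope calculation. It assumes the 2012 rate holds constant for a full year, which is a simplification made only for illustration; the variable names are hypothetical.

```python
# Daily tweet volumes quoted on the slide.
TWEETS_PER_DAY_2010 = 50_000_000
TWEETS_PER_DAY_2012 = 400_000_000

# How much the daily volume grew between the two figures.
growth_factor = TWEETS_PER_DAY_2012 / TWEETS_PER_DAY_2010

# Assumption: the 2012 rate stays flat for 365 days.
tweets_per_year_2012 = TWEETS_PER_DAY_2012 * 365

print(f"growth 2010 to 2012: {growth_factor:.0f}x")
print(f"one year at the 2012 rate: {tweets_per_year_2012:,} tweets")
```

An eightfold increase in two years is the kind of change a collection development policy has to anticipate when committing to "every public tweet".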


“It’s critical the future generations know what flavor burrito I had for lunch.”

- first comment on the Library’s project FAQ page

http://www.niemanlab.org/2012/07/that-plan-to-archive-every-tweet-in-the-library-of-congress-definitely-still-happening/


“Research requests [of the Twitter archive] have included users looking for their own Twitter history, the study of the geographic spread of news, the study of the spread of epidemics, and the study of the transmission of new uses of language.”

https://www.conftool.net/or2012/index.php?page=browseSessions&form_session=2


“…if you’re looking for a place where important historical and other information in digital form should be preserved for the long haul, we’re it!”

http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/

http://www.loc.gov/acq/devpol/


Example: Twitter @ICPSR

http://wewillraakyou.com/2010/09/under-twitters-hood/


“We estimate the entire raw data will be about 15 TB and after processing and extraction, it may be less than 2~5 TB. Since we do not know at this point how large the data will be, it would be helpful if you can let us know ICPSR's upper bound on manageable data size so that we can quote that in the supplementary material for our initial proposal. Thank you.”


Example: Transactional Data @ICPSR

http://blog.mipimworld.com/wp-content/uploads/2012/01/Target-checkouts.jpg


Discussion

• What data are you pursuing?
• What do you do if you are offered things you don’t want?
• What new forms of data do you anticipate working with in the next year?
• How will that affect your collection development strategy and policy?


Possible New Forms of Data

• Continuous location information from cell phones or Fastlane transponders.
• Product radio-frequency identification (RFIDs), online product searches and purchases, and device fingerprinting.
• Electronic medical records, and new devices for continuous monitoring, passive heart beat measurement, movement indicators, skin conductivity.
• Satellite imagery.
• Social everything: networking, bookmarking, highlighting, commenting, product reviewing, recommending, and annotating.
• Online games and virtual worlds.

Gary King (http://www.sciencemag.org/content/331/6018/719.full)


Selection

• Passive
• Active
• Serendipitous


Source: Pienta, Gutmann, & Lyle, 2009


Why are data not shared?

• Preparing data and documentation can be enormously time consuming
• Limited resources for data preparation
• Need to protect the confidentiality of respondents
• Fear of getting “scooped”
• Lack of rewards for sharing

Source: Pienta, Gutmann, & Lyle, 2009


More about data sharing:

Pienta, Alter, & Lyle (2010). “The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data”. http://hdl.handle.net/2027.42/78307

Pienta, Gutmann, & Lyle (2009). “Research Data in The Social Sciences: How Much is Being Shared?” Presentation at the Research Conference on Research Integrity, Niagara Falls, NY.


Example: Selection @ICPSR

Pienta, Gutmann, Hoelter, Lyle, and Donakowski (2008). “The LEADS Database at ICPSR: Identifying Important ‘At Risk’ Social Science Data.”

http://www.data-pass.org/sites/default/files/Pienta_et_al_2008.pdf


Discussion

• What are you doing to actively build your collection?

• How are you creating serendipitous selection?


Acquisition (aka Deposit)

https://www.dropbox.com/tour/3


Acquisition Goals

Transfer:
• Content
• Metadata
• Legal permissions


Legal permissions

• Do they have authority to deposit the content with you?

• Can you then modify, reformat, preserve, describe, and redisseminate?

• Are there any human disclosure issues?


ICPSR’s Deposit Agreement

• I have implicit or explicit copyright to this work and have the right to make it publicly available through ICPSR.

[red highlights added by me]


ICPSR’s Deposit Agreement

• I give my permission for the Data Collection to be used by ICPSR for the following purposes, without limitation:
  – To redisseminate copies of the Data Collection in a variety of media formats
  – To promote and advertise the Data Collection in any publicity (in any form) for ICPSR
  – To describe, catalog, validate and document the Data Collection
  – To store, translate, copy or re-format the Data Collection in any way to ensure its future preservation and accessibility
  – To incorporate metadata or documentation in the Data Collection into public access catalogues
• I give my permission to ICPSR to enhance, transform and/or rearrange the Data Collection, including the data and metadata, for any of the following purposes:
  – Protect respondent confidentiality
  – Improve usability

[red highlights added by me]


ICPSR’s Deposit Agreement

• To the extent allowable by law or permitted by the sponsor of the data collection, in preparing this data collection for public archiving and distribution, I have removed all information directly identifying the research subjects in these data, and I have used due diligence in preventing information in the collection from being used to disclose the identity of research subjects.

• I further agree to release and hold harmless ICPSR (including staff and the ICPSR Council) and the University of Michigan from any and all liability from claims arising out of any legal action concerning identification of research subjects, breaches of confidentiality, or invasions of privacy by or on behalf of said subjects.

[red highlights added by me]


Deposit Mechanism

http://www.gizmodo.com.au/2010/08/rube-goldberg-the-man-behind-the-machines/


Deposit Mechanism

Atari’s “Star Trek” instructions:

Insert Quarter. Avoid Klingons.

-See Isaacson’s Steve Jobs


Example: Deposit Form @ICPSR

Pre-2007


2007


2010 Mock-up


Example: Deposit Form @DeepBlue


Example: Deposit Form @Dryad


2012

http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/deposit/index.jsp


Deposit: Behind the Scenes

• Checksum
• ID verification
• File type verification
• Data transferred to secure storage
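Two of the checks listed above, checksum and file type verification, can be sketched with Python's standard library. This is a minimal illustration, not a description of ICPSR's actual pipeline: the function names `sha256_of` and `check_deposit`, the choice of SHA-256, and the idea that the depositor supplies an expected checksum are all assumptions.

```python
import hashlib
import mimetypes
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_deposit(path, expected_sha256):
    """Compare a file with the checksum recorded at upload and guess its type."""
    actual = sha256_of(path)
    # guess_type looks only at the file extension; None means unrecognized.
    guessed_type, _ = mimetypes.guess_type(Path(path).name)
    return {
        "checksum_ok": actual == expected_sha256.lower(),
        "sha256": actual,
        "mime_type": guessed_type or "application/octet-stream",
    }
```

Because the type guess here relies on the extension alone, a production check would more likely inspect file contents as well; the sketch only shows where such a step sits in the deposit flow.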


Discussion

• How easy or complex should your deposit process be?
• What are incentives you can use to encourage depositors to do a thorough job?
• What legal issues do you address at deposit?


Other issues: Formats

DSpace @ MIT:
• Supported: DSpace fully supports the format.
• Known: DSpace can recognize the format, but we cannot guarantee full support.
• Unsupported: DSpace cannot recognize a format; such formats are listed as "application/octet-stream", or Unknown.

http://libraries.mit.edu/dspace-mit/build/policies/format.html
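The three support tiers can be modeled as a simple lookup. The sketch below is an illustration only: `SupportLevel`, `FORMAT_REGISTRY`, `support_level`, and the sample registry entries are hypothetical and are not taken from DSpace or from MIT's actual format list.

```python
from enum import Enum


class SupportLevel(Enum):
    SUPPORTED = "supported"
    KNOWN = "known"
    UNSUPPORTED = "unsupported"


# Hypothetical registry for illustration; an archive maintains its own list.
FORMAT_REGISTRY = {
    "text/plain": SupportLevel.SUPPORTED,
    "application/pdf": SupportLevel.SUPPORTED,
    "application/x-spss-sav": SupportLevel.KNOWN,
}


def support_level(mime_type):
    """Look up a format's tier; anything not in the registry is unsupported."""
    return FORMAT_REGISTRY.get(mime_type, SupportLevel.UNSUPPORTED)
```

For example, `support_level("application/octet-stream")` falls through to `SupportLevel.UNSUPPORTED`, mirroring the policy's treatment of unrecognized formats.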


File Types Deposited @ICPSR - April 2012

http://techaticpsr.blogspot.com/2012/05/april-2012-deposits-at-icpsr.html


Other issues: Length of Commitment

How long (and at what level) do we commit to preserving data?
• Forever?
• 10 years?
• 5 years after the last access?


We Are Here: Appraisal & Acquisition