
Posted on 02-Jan-2016


Curating and Managing Research Data for Re-Use

Appraisal & Acquisition

Jared Lyle

We Are Here: Appraisal & Acquisition

Appraisal & Acquisition

• Collection development policy
• Appraisal
• Selection
• Acquisition

http://en.wikipedia.org/wiki/File:Schnorr_von_Carolsfeld_-_Die_Schlacht_Rudolfs_von_Habsburg_gegen_Ottokar_von_B%C3%B6hmen.jpg

Collection Development Policy

Identifies:
• Archive’s user base
• Types of data in which archive is interested
• Criteria used to determine archival value

Types of Data of Special Interest to ICPSR

• Diversity Data. Data that fosters understanding of the experiences of racial and ethnic minorities and other marginalized peoples living in the United States.

• Complex Data. Data arising from longitudinal research, survey research, and non-standard types: biological data, administrative records, video data, spatial data, remotely sensed data, and relational databases.

http://www.icpsr.umich.edu/icpsrweb/ICPSR/curation/selection.jsp

Types of Data of Special Interest to ICPSR

• Mixed Method Data. Data that can support both qualitative and quantitative analyses; data resulting from concurrent (both at the same time), sequential (one following the other), or conversion (one method to the other) mixed method study designs.

• Interdisciplinary Data. Data from interdisciplinary studies, and data resulting from studies using the research methods of multiple disciplines.

• International Data. Data originating outside the United States and data that support cross-national, comparative research. We are especially interested in data from countries and regions of the world that do not have a national structure for archiving, disseminating, and preserving research data.

http://www.icpsr.umich.edu/icpsrweb/ICPSR/curation/selection.jsp

ICPSR Criteria for Archival Value

• Nationally representative
• Theoretically/methodologically unique
• Representing underrepresented research populations
• Widely cited, appearing in top-tier journals, or collected by an eminent scholar

Example Policies

• ICPSR Collection Development Policy http://www.icpsr.umich.edu/icpsrweb/ICPSR/org/policies/colldev.jsp

• UK Data Archive Collections Development Policy http://www.esds.ac.uk/news/publications/UKDACollectionsDevPolicy.pdf

• MSU Libraries Collection Development Policy Statement: Data Services http://libguides.lib.msu.edu/dataservicescollectiondevpolicy

A collection development policy is a living document and should be updated over time to follow the trends and output of the research community.

Example: Twitter @LOC

http://www.niemanlab.org/2012/07/that-plan-to-archive-every-tweet-in-the-library-of-congress-definitely-still-happening/

“Every public tweet, ever, since Twitter’s inception in March 2006, will be archived digitally at the Library of Congress.”

http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/

2010 = 50 million tweets per day
2012 = 400 million tweets per day

http://www.niemanlab.org/2012/07/that-plan-to-archive-every-tweet-in-the-library-of-congress-definitely-still-happening/
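To put the two daily rates in perspective, here is a rough calculation, assuming (for illustration only) that the 2012 rate held constant for a full year:

```latex
\frac{400 \times 10^{6}\ \text{tweets/day}}{50 \times 10^{6}\ \text{tweets/day}} = 8,
\qquad
(400 \times 10^{6}\ \text{tweets/day}) \times (365\ \text{days}) = 1.46 \times 10^{11}\ \text{tweets/year}.
```

That is, daily volume grew eightfold in two years, and a single year at the 2012 rate would add roughly 146 billion tweets to the archive.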

“It’s critical the future generations know what flavor burrito I had for lunch.”

-first comment on the Library’s project FAQ page

http://www.niemanlab.org/2012/07/that-plan-to-archive-every-tweet-in-the-library-of-congress-definitely-still-happening/

“Research requests [of the Twitter archive] have included users looking for their own Twitter history, the study of the geographic spread of news, the study of the spread of epidemics, and the study of the transmission of new uses of language.”

https://www.conftool.net/or2012/index.php?page=browseSessions&form_session=2

“…if you’re looking for a place where important historical and other information in digital form should be preserved for the long haul, we’re it!”

http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/

http://www.loc.gov/acq/devpol/

Example: Twitter @ICPSR

http://wewillraakyou.com/2010/09/under-twitters-hood/

“We estimate the entire raw data will be about 15 TB and after processing and extraction, it may be less than 2~5 TB. Since we do not know at this point how large the data will be, it would be helpful if you can let us know ICPSR's upper bound on manageable data size so that we can quote that in the supplementary material for our initial proposal. Thank you.”

Example: Transactional Data @ICPSR

http://blog.mipimworld.com/wp-content/uploads/2012/01/Target-checkouts.jpg

Discussion

• What data are you pursuing?
• What do you do if you are offered things you don’t want?
• What new forms of data do you anticipate working with in the next year?
• How will that affect your collection development strategy and policy?

Possible New Forms of Data

• Continuous location information from cell phones or Fastlane transponders.
• Product radio-frequency identification (RFIDs), online product searches and purchases, and device fingerprinting.
• Electronic medical records, and new devices for continuous monitoring, passive heart beat measurement, movement indicators, skin conductivity.
• Satellite imagery.
• Social everything: networking, bookmarking, highlighting, commenting, product reviewing, recommending, and annotating.
• Online games and virtual worlds.

Gary King (http://www.sciencemag.org/content/331/6018/719.full)

Selection
• Passive
• Active
• Serendipitous

Source: Pienta, Gutmann, & Lyle, 2009

Why are data not shared?

• Preparing data and documentation can be enormously time-consuming
• Limited resources for data preparation
• Need to protect the confidentiality of respondents
• Fear of getting “scooped”
• Lack of rewards for sharing

Source: Pienta, Gutmann, & Lyle, 2009

Pienta, Alter, & Lyle (2010). “The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data.” http://hdl.handle.net/2027.42/78307

Pienta, Gutmann, & Lyle (2009). “Research Data in the Social Sciences: How Much Is Being Shared?” Presentation at the Research Conference on Research Integrity, Niagara Falls, NY.

More about data sharing:

Pienta, Gutmann, Hoelter, Lyle, & Donakowski (2008). “The LEADS Database at ICPSR: Identifying Important ‘At Risk’ Social Science Data.” http://www.data-pass.org/sites/default/files/Pienta_et_al_2008.pdf

Example: Selection @ICPSR

Discussion

• What are you doing to actively build your collection?

• How are you creating serendipitous selection?

Acquisition (aka Deposit)

https://www.dropbox.com/tour/3

Acquisition Goals

Transfer:
• Content
• Metadata
• Legal permissions

Legal permissions

• Do they have authority to deposit the content with you?

• Can you then modify, reformat, preserve, describe, and redisseminate?

• Are there any disclosure risks for human subjects?
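The three acquisition goals and the legal-permission questions above can be read as a checklist that a deposit must pass before it is accepted. A minimal sketch in Python, assuming a hypothetical `Deposit` record and `missing_items` helper (neither is part of any ICPSR system; the field names are illustrative only):

```python
from dataclasses import dataclass, field


@dataclass
class Deposit:
    """Hypothetical record of one deposit; field names are illustrative only."""
    content_files: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)
    # The three legal-permission questions, answered yes/no.
    depositor_has_authority: bool = False               # may they deposit it with us?
    archive_may_modify_and_redisseminate: bool = False  # modify, reformat, preserve, describe, redisseminate
    disclosure_issues_reviewed: bool = False            # human-subject disclosure risk checked


def missing_items(deposit: Deposit) -> list[str]:
    """Return the acquisition goals not yet satisfied; an empty list means ready."""
    missing = []
    if not deposit.content_files:
        missing.append("content")
    if not deposit.metadata:
        missing.append("metadata")
    if not deposit.depositor_has_authority:
        missing.append("authority to deposit")
    if not deposit.archive_may_modify_and_redisseminate:
        missing.append("permission to modify and redisseminate")
    if not deposit.disclosure_issues_reviewed:
        missing.append("disclosure review")
    return missing


# Hypothetical deposit: files and metadata arrived, permissions not yet settled.
d = Deposit(content_files=["survey.sav"], metadata={"title": "Example study"})
print(missing_items(d))
```

Keeping legal permissions as explicit fields, rather than assuming them, makes it visible when content and metadata have arrived but the rights to preserve and redistribute them have not.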

ICPSR’s Deposit Agreement

• I have implicit or explicit copyright to this work and have the right to make it publicly available through ICPSR.

[red highlights added by me]

ICPSR’s Deposit Agreement

• I give my permission for the Data Collection to be used by ICPSR for the following purposes, without limitation:
  – To redisseminate copies of the Data Collection in a variety of media formats
  – To promote and advertise the Data Collection in any publicity (in any form) for ICPSR
  – To describe, catalog, validate and document the Data Collection
  – To store, translate, copy or re-format the Data Collection in any way to ensure its future preservation and accessibility
  – To incorporate metadata or documentation in the Data Collection into public access catalogues
• I give my permission to ICPSR to enhance, transform and/or rearrange the Data Collection, including the data and metadata, for any of the following purposes:
  – Protect respondent confidentiality
  – Improve usability

[red highlights added by me]

ICPSR’s Deposit Agreement

• To the extent allowable by law or permitted by the sponsor of the data collection, in preparing this data collection for public archiving and distribution, I have removed all information directly identifying the research subjects in these data, and I have used due diligence in preventing information in the collection from being used to disclose the identity of research subjects.

• I further agree to release and hold harmless ICPSR (including staff and the ICPSR Council) and the University of Michigan from any and all liability from claims arising out of any legal action concerning identification of research subjects, breaches of confidentiality, or invasions of privacy by or on behalf of said subjects.

[red highlights added by me]

Deposit Mechanism

http://www.gizmodo.com.au/2010/08/rube-goldberg-the-man-behind-the-machines/

Deposit Mechanism

Atari’s “Star Trek” instructions:

Insert Quarter. Avoid Klingons.

-See Isaacson’s Steve Jobs

Example: Deposit Form @ICPSR

Pre-2007

2007

2010 Mock-up

Example: Deposit Form @DeepBlue

Example: Deposit Form @Dryad

http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/deposit/index.jsp

2012

Deposit: Behind the Scenes

• Checksum
• ID verification
• File type verification
• Data transferred to secure storage
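Two of these behind-the-scenes steps, checksumming and file type verification, are easy to illustrate. The sketch below uses only the Python standard library: it computes a SHA-256 checksum in chunks and guesses a file's type from its leading "magic" bytes rather than trusting its extension. The function names, the choice of SHA-256, and the three-entry signature table are assumptions for illustration, not a description of ICPSR's actual pipeline.

```python
import hashlib
from pathlib import Path

# A few well-known leading-byte signatures. A real archive would use a much
# fuller format registry; this short table is an illustrative assumption.
MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",  # also matches .docx/.xlsx, which are ZIP containers
    b"\x89PNG\r\n\x1a\n": "png",
}


def sha256_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large deposits need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sniff_type(path: Path) -> str:
    """Guess the file type from its first bytes, not from its extension."""
    with path.open("rb") as fh:
        head = fh.read(16)
    for signature, kind in MAGIC_SIGNATURES.items():
        if head.startswith(signature):
            return kind
    return "unknown"


def verify_deposit_file(path: Path, expected_sha256: str) -> dict:
    """Compare against the depositor's checksum and report sniffed vs. claimed type."""
    return {
        "file": path.name,
        "checksum_ok": sha256_checksum(path) == expected_sha256,
        "sniffed_type": sniff_type(path),
        "extension": path.suffix.lstrip(".").lower(),
    }
```

A mismatch between `sniffed_type` and `extension` flags a file for a human to look at; it does not by itself mean the file is wrong, since, for example, a .docx file legitimately sniffs as "zip".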

Discussion

• How easy or complex should your deposit process be?
• What are incentives you can use to encourage depositors to do a thorough job?
• What legal issues do you address at deposit?

Other issues: Formats

DSpace @ MIT:
• Supported: DSpace fully supports the format.
• Known: DSpace can recognize the format, but we cannot guarantee full support.
• Unsupported: DSpace cannot recognize a format; such formats are listed as "application/octet-stream", or Unknown.

http://libraries.mit.edu/dspace-mit/build/policies/format.html
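The three-level scheme above amounts to a lookup from a file's format to a support level. A minimal sketch in Python, assuming hypothetical `SUPPORTED` and `KNOWN` sets (DSpace@MIT's actual format registry is much longer and is not reproduced here) and using the standard `mimetypes` module to identify formats by file name:

```python
import mimetypes

# Hypothetical registry: formats this repository fully supports, and formats
# it merely recognizes. These sets are illustrative, not DSpace@MIT's lists.
SUPPORTED = {"text/csv", "text/plain", "application/pdf"}
KNOWN = {"application/zip", "image/jpeg"}


def support_level(filename: str) -> str:
    """Classify a file into the Supported / Known / Unsupported levels."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        # Unrecognized formats end up labeled application/octet-stream.
        return "Unsupported (application/octet-stream)"
    if mime in SUPPORTED:
        return "Supported"
    if mime in KNOWN:
        return "Known"
    # Identified by name, but absent from this repository's registry.
    return "Unsupported"
```

Guessing by file name is the weakest form of identification; pairing it with a content check like the magic-byte sniffing used at deposit time gives a more trustworthy classification.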

http://techaticpsr.blogspot.com/2012/05/april-2012-deposits-at-icpsr.html

File Types Deposited @ICPSR - April 2012
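A monthly breakdown of deposited file types like this one can be produced by counting file extensions. A minimal sketch in Python, assuming a hypothetical `tally_file_types` helper and made-up file names (no ICPSR figures are reproduced here):

```python
from collections import Counter
from pathlib import PurePath


def tally_file_types(filenames: list[str]) -> Counter:
    """Count files by lower-cased final extension; "(none)" if a file has none."""
    counts: Counter = Counter()
    for name in filenames:
        suffix = PurePath(name).suffix.lower().lstrip(".")
        counts[suffix or "(none)"] += 1
    return counts


# Hypothetical file names, for illustration only.
deposit = ["survey.SAV", "followup.sav", "codebook.pdf", "README"]
for ext, n in tally_file_types(deposit).most_common():
    print(f"{ext}: {n}")
```

Only the final extension is counted, so "data.tar.gz" is tallied as "gz"; a tally based on content sniffing would be more accurate but slower.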

Other issues: Length of Commitment

How long (and at what level) do we commit to preserving data?
• Forever?
• 10 years?
• 5 years after the last access?
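The three options differ mainly in what starts the clock: nothing, the deposit date, or the most recent access. A minimal sketch in Python, assuming a hypothetical `review_date` helper that treats "forever" as having no end date (the dates in the example are made up):

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def review_date(policy: str, deposited: date, last_access: date) -> Optional[date]:
    """Return when the preservation commitment lapses; None means it never does."""
    if policy == "forever":
        return None
    if policy == "10 years":
        return add_years(deposited, 10)
    if policy == "5 years after last access":
        return add_years(last_access, 5)
    raise ValueError(f"unknown policy: {policy!r}")


# Hypothetical dates, for illustration only.
deposited = date(2012, 4, 15)
last_access = date(2015, 9, 1)
for policy in ("forever", "10 years", "5 years after last access"):
    print(policy, "->", review_date(policy, deposited, last_access))
```

Under the last-access rule, every new access pushes the date back, so heavily used data stay in the collection while unused data eventually come up for review.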

We Are Here: Appraisal & Acquisition
