Data Citation Implementation @ Dataverse · 2016. 3. 29.
TRANSCRIPT
-
Data Citation Implementation @ Dataverse
Mercè Crosas Chief Data Science and Technology Officer, IQSS, Harvard University @mercecrosas
Workshop: Data Citation Pilot Project Kick-off, bioCADDIE supplemental project, NIH Big Data to Knowledge, Feb 3, 2016
-
Data Citation in Dataverse complies with the Data Citation Principles
Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014
Altman, Crosas, The Evolution of Data Citation: From Principles to Implementation, IASSIST Quarterly; 2013
-
Data Citation Generated by Dataverse
Citation elements: Authors, Published Year, Dataset Title, Repository Name, Persistent Identifier (Handle or DOI), Version
Export Formats for users
-
Persistent Identifier Resolves to Dataset Landing Page
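As a small illustration of resolution, the sketch below turns a persistent identifier into the URL of the public resolver that redirects to the landing page. It assumes identifiers are written with a `doi:` or `hdl:` prefix; the function name `resolver_url` and the identifiers used are made up for the example.

```python
def resolver_url(pid: str) -> str:
    """Map a prefixed persistent identifier to its public resolver URL."""
    scheme, _, value = pid.partition(":")
    if scheme == "doi" and value:
        return "https://doi.org/" + value
    if scheme == "hdl" and value:
        return "https://hdl.handle.net/" + value
    raise ValueError(f"unsupported persistent identifier: {pid!r}")


# Made-up identifiers, for illustration only.
print(resolver_url("doi:10.1234/EXAMPLE"))
print(resolver_url("hdl:1234.5/example"))
```

Following the returned URL is what should land the reader on the dataset page; the resolver, not the citation, knows the current location.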
-
The Persistent Identifier applies to the entire Dataset, not to individual Files
-
The same Persistent Identifier applies to All Versions of the Dataset
Only major versions (not minor) appear in the generated data citation
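The versioning rule can be sketched in a few lines. This is a minimal illustration, not Dataverse's own code: the `Dataset` class, the `format_citation` function, the separators, and the `V<major>` rendering are assumptions, and the identifier is made up. The element order follows the citation elements named in these slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    authors: List[str]
    year: int
    title: str
    pid: str          # one identifier for the whole dataset, all versions
    repository: str
    major: int
    minor: int


def format_citation(ds: Dataset) -> str:
    """Build a citation string; the minor version is deliberately left out."""
    authors = "; ".join(ds.authors)
    return (f'{authors}, {ds.year}, "{ds.title}", {ds.pid}, '
            f'{ds.repository}, V{ds.major}')


v2_0 = Dataset(["Doe, Jane", "Roe, Richard"], 2016, "Example Survey Data",
               "doi:10.1234/EXAMPLE", "Example Dataverse", major=2, minor=0)
v2_3 = Dataset(["Doe, Jane", "Roe, Richard"], 2016, "Example Survey Data",
               "doi:10.1234/EXAMPLE", "Example Dataverse", major=2, minor=3)

print(format_citation(v2_0))
# Versions 2.0 and 2.3 produce the same citation under this sketch.
print(format_citation(v2_0) == format_citation(v2_3))
```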
-
Citation for Quantitative (tabular) Data
Authors, Published Year, Dataset Title, Persistent Identifier, Repository Name, Version, Universal Numerical Fingerprint (UNF), [File name], [var 1], [var 2], [var…]
Following: Altman, King, A Proposed Standard for the Scholarly Citation of Quantitative Data, D-Lib, 2007
• [File name]: specifies a File in the Dataset
• [var 1], [var 2], [var…]: specify a subset of variables in a Tabular Data File
• UNF: checksum independent of file format
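To make "checksum independent of file format" concrete, here is a toy fingerprint that hashes normalised cell values rather than file bytes. It is explicitly not the UNF algorithm; the function names, the rounding to 7 significant digits, and the use of SHA-256 are assumptions chosen only to show the idea that the same data stored as CSV or JSON can yield the same fingerprint.

```python
import csv
import hashlib
import io
import json


def canonical_value(v) -> str:
    """Normalise one cell: numbers to 7 significant digits, text stripped."""
    try:
        return format(float(v), ".6e")
    except (TypeError, ValueError):
        return str(v).strip()


def toy_fingerprint(rows) -> str:
    """Hash the normalised values, ignoring how the file encoded them."""
    h = hashlib.sha256()
    for row in rows:
        for v in row:
            h.update(canonical_value(v).encode("utf-8"))
            h.update(b"\x00")   # cell separator
        h.update(b"\n")         # row separator
    return h.hexdigest()


# The same table in two file formats.
csv_text = "1,2.50,abc\n3,4,def\n"
json_text = '[[1, 2.5, "abc"], [3.0, 4, "def"]]'

rows_from_csv = list(csv.reader(io.StringIO(csv_text)))
rows_from_json = json.loads(json_text)

print(toy_fingerprint(rows_from_csv) == toy_fingerprint(rows_from_json))
```

A plain file checksum would differ between the two encodings; hashing the normalised content is what lets the fingerprint in a citation survive a format migration.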
-
Dataverse – DataCite Workflow
EZID API
1. Dataset created in Dataverse
2. Mint DOI with status “reserved” in EZID, send citation metadata
3. Dataset published in Dataverse
4. Change status to “public” in EZID
5. New version of Dataset
6. Send updated citation metadata
DataCite API
1. Dataset created in Dataverse
2. Reserve local DOI in Dataverse
3. Dataset published in Dataverse
4. Mint DOI in DataCite, send citation metadata
5. New version of Dataset
6. Send updated citation metadata
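The difference between the two workflows is when the registration service first hears about the DOI. The sketch below models that with an in-memory stand-in; `FakeRegistry` and both workflow classes are hypothetical and do not call the EZID or DataCite APIs. Storing the status "public" on the DataCite path is also an assumption, since the slide names no status there.

```python
class FakeRegistry:
    """In-memory stand-in for a DOI registration service (not a real client)."""

    def __init__(self):
        self.records = {}  # doi -> {"status": str, "metadata": dict}

    def mint(self, doi, status, metadata):
        self.records[doi] = {"status": status, "metadata": dict(metadata)}

    def set_status(self, doi, status):
        self.records[doi]["status"] = status

    def update_metadata(self, doi, metadata):
        self.records[doi]["metadata"] = dict(metadata)


class EzidStyleWorkflow:
    def __init__(self, registry, doi, metadata):
        self.registry, self.doi = registry, doi
        # Steps 1-2: the DOI is minted as "reserved" at creation time.
        registry.mint(doi, "reserved", metadata)

    def publish(self):
        # Steps 3-4: publication flips the status to "public".
        self.registry.set_status(self.doi, "public")

    def new_version(self, metadata):
        # Steps 5-6: a new version resends the citation metadata.
        self.registry.update_metadata(self.doi, metadata)


class DataCiteStyleWorkflow:
    def __init__(self, registry, doi, metadata):
        # Steps 1-2: the DOI is only reserved locally; no registry call yet.
        self.registry, self.doi, self.metadata = registry, doi, dict(metadata)

    def publish(self):
        # Steps 3-4: the DOI is minted at publication time.
        self.registry.mint(self.doi, "public", self.metadata)

    def new_version(self, metadata):
        # Steps 5-6: a new version resends the citation metadata.
        self.metadata = dict(metadata)
        self.registry.update_metadata(self.doi, metadata)


registry = FakeRegistry()
draft = DataCiteStyleWorkflow(registry, "10.1234/B", {"title": "Draft"})
print("10.1234/B" in registry.records)   # not registered before publication
draft.publish()
print(registry.records["10.1234/B"]["status"])
```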
-
Additional Metadata in Dataverse
Citation Metadata
• Authors • Title • Description • Dates • Contact • Subject • …
Domain Metadata
• Life Sciences: based on ISA-Tab (and OBI and NCBI taxonomy)
• Other domains (social science, astronomy)
File Metadata
• File header metadata
• File description, type
• Variable metadata
-
What’s Coming Next
SBGrid Data Repository, Biomedical Dataverse (Sliz HMS, Crosas IQSS)
Social Science Big Data (King, Crosas at IQSS)
Data Provenance (Seltzer SEAS, Crosas, King IQSS)
Privacy Tools to share sensitive data (SEAS, Berkman, Privacy Lab, IQSS, MIT)
-
Future Data Citation Extensions
• Provenance Metadata to be used in citation services
• Extended Domain Metadata (e.g., Life Sciences) to be used in citation services
• Support for Privacy, Sensitive Datasets:
– A DataTag (blue, green, yellow, orange, red, crimson) assigned to each dataset that defines its sensitivity level, with security and access requirements
• Support for Large (Streaming) Datasets:
– Many files per Dataset. E.g., Primary Structure Dataset with thousands of images
– Large Streaming Dataset. E.g., Geospatial Tweets
Sweeney, Crosas, Bar-Sinai, Sharing Sensitive Data with Confidence: The DataTags System, JOTS, 2015
-
Citation for Big Data: Large, Streaming, or Sensitive Datasets
Authors, Published Year, Title, Persistent Identifier, Repository Name, Version, [Subset: Query or Variable], [DataTag]
• Be able to cite entire Big Data dataset (with one Persistent Identifier), as well as specify granularity when needed
• Should the query be a RESTful URL?
• Should the subset be defined by variable/attributes metadata?
• Should the DataTag be part of the citation for sensitive data?
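One possible shape for such a citation, offered only as a sketch since the questions above are still open: the optional subset and DataTag are appended as bracketed elements after the core citation. The `DataTag` enum, the `cite_big_data` function, the bracket syntax, and the query string are all assumptions, and the dataset details are made up. The tag names follow the six colours listed in these slides; treating their listed order as increasing sensitivity is also an assumption.

```python
from enum import IntEnum
from typing import Optional


class DataTag(IntEnum):
    BLUE = 1
    GREEN = 2
    YELLOW = 3
    ORANGE = 4
    RED = 5
    CRIMSON = 6


def cite_big_data(authors: str, year: int, title: str, pid: str,
                  repository: str, version: int,
                  subset: Optional[str] = None,
                  tag: Optional[DataTag] = None) -> str:
    """Core citation for the whole dataset, plus optional granularity."""
    parts = [authors, str(year), f'"{title}"', pid, repository, f"V{version}"]
    if subset:
        parts.append(f"[Subset: {subset}]")
    if tag is not None:
        parts.append(f"[DataTag: {tag.name.lower()}]")
    return ", ".join(parts)


whole = cite_big_data("Doe, Jane", 2016, "Example Geospatial Tweets",
                      "doi:10.1234/TWEETS", "Example Dataverse", 1)
part = cite_big_data("Doe, Jane", 2016, "Example Geospatial Tweets",
                     "doi:10.1234/TWEETS", "Example Dataverse", 1,
                     subset="lang=en&year=2015", tag=DataTag.YELLOW)
print(whole)
print(part)
```

Both citations share one Persistent Identifier; only the trailing bracketed elements narrow the reference to a subset or flag its sensitivity.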