gaining credit for sharing research data
![Page 1: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/1.jpg)
Varsha Khodiyar, PhD
Data Curation Editor, Scientific Data
Nature Publishing Group
@varsha_khodiyar
@scientificdata
Tweet with #SDJPN16
Gaining credit for sharing research data
Data publishing with Scientific Data
RIKEN Center for Life Science Technologies
4th March 2016
![Page 2: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/2.jpg)
My background
• Joined Scientific Data in October 2014
• Professional data curator since 2003
• PhD in Molecular Biology from the University of Leicester
• Contributed to the Human Genome Project as member of the Human Gene Nomenclature Committee (HGNC)
• Gene Ontology curator for 8 years, at University College London, UK
• 3 years of open data publishing experience
![Page 3: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/3.jpg)
Why share research data?
![Page 4: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/4.jpg)
Generating research data is expensive
Just 18.1% of NIH grant applications were funded in 2014*
• Hours spent writing grants?
• Hours spent reviewing grants?
Resources are finite/expensive
• Modified animals
• Specialized reagents
Time and effort taken in the laboratory to generate good, valid data
* report.nih.gov/success_rates/Success_ByIC.cfm
![Page 5: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/5.jpg)
Irreproducibility of published science
Figure 1 - Ioannidis JPA. et al. Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149–55 (2009) doi:10.1038/ng.295
![Page 6: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/6.jpg)
Withholding data impacts on human health
Clinical study reports, detailed data and software code available at Dryad Digital Repository doi:10.5061/dryad.bv8j6 and www.Study329.org
![Page 7: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/7.jpg)
Sharing data promotes
• Diversity of analyses and opinion
• New research
• testing of new hypotheses
• new analysis methods
• meta-analyses to create new datasets
• studies on data collection methods
• Education of new researchers
• Increased return on investment in research
Vickers AJ: Whose data set is it anyway? Sharing raw data from randomized trials. Trials 2006, 7:15
Hrynaszkiewicz I, Altman DG: Towards agreement on best practice for publishing raw clinical trial data. Trials 2009, 10:17
![Page 8: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/8.jpg)
Researchers already share data
• Most researchers are sharing data, and using the data of others
• Direct contact between researchers (on request) is a common way of sharing data
• Repositories are the second most common method of sharing
Kratz and Strasser (2015) doi: 10.1371/journal.pone.0117619
![Page 9: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/9.jpg)
Some problems…
• Sharing upon request relies heavily on trust
• Informally stored data associated with published works disappears at a rate of ~17% per year (Vines et al. 2014; doi: 10.1016/j.cub.2013.11.014)
• Datasets not referenced in a manuscript are essentially invisible (a.k.a. “Dark data”)
• If data are available, they are often not interpretable or reusable because sufficient detail is not included
• Data producers do not get appropriate credit for their work
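To give a feel for what a ~17% yearly loss means over time, here is a minimal sketch. It assumes the slide's figure can be read as a constant fraction of datasets lost each year, compounding annually; that is a simplification for illustration, not the statistical model used by Vines et al. The function name `fraction_remaining` is hypothetical.

```python
def fraction_remaining(years: float, annual_loss: float = 0.17) -> float:
    """Fraction of datasets still obtainable after `years`.

    Assumption (not from the cited paper): a constant share of the
    remaining datasets is lost every year, so the loss compounds.
    """
    if not 0.0 <= annual_loss <= 1.0:
        raise ValueError("annual_loss must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1.0 - annual_loss) ** years


if __name__ == "__main__":
    # Print the surviving share for a few illustrative time spans.
    for years in (1, 5, 10, 20):
        share = fraction_remaining(years)
        print(f"{years:>2} years: {share:.1%} of datasets still available")
```

Under this assumption, fewer than one in six informally stored datasets would still be obtainable a decade after publication.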
![Page 10: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/10.jpg)
www.nature.com/scientificdata
![Page 11: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/11.jpg)
Credit – Scholarly credit for publishing data; all publications are indexed and citeable.
Reuse – Standardized and detailed descriptions enable easier reuse of published research data.
Quality – Rigorous peer-review on technical quality and reusability. Editorial Board of experts in their field maintain community standards.
Discovery – Curated, machine-readable metadata for dataset discovery. Validated links to published data in each article.
Open – Use of CC-BY licence for articles and CC0 for metadata. Promote use of open licences for published data.
Service – Commitment to excellent service for authors and readers.
![Page 12: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/12.jpg)
What is a Data Descriptor?
![Page 13: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/13.jpg)
Data Descriptors have human and machine readable components
Human readable representation of study, i.e. article (HTML & PDF)
Machine readable representation of study, i.e. metadata
![Page 14: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/14.jpg)
Comparison of Data Descriptor to traditional article
Synthesis
Analysis
Conclusions
What did I do to generate the data?
How was the data processed?
Where is the data?
Who did what and when?
Methods and technical analyses supporting the quality of the measurements.
Do not contain tests of new scientific hypotheses
![Page 15: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/15.jpg)
What types of data can be published?
• Decades old dataset
• Standalone dataset
• Data that has been used in an analysis article
• Large consortium dataset
• Data from a single experiment
• Data that the researcher finds valuable and that others might find useful too
• Data associated with a high impact analysis article
![Page 16: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/16.jpg)
When can a Data Descriptor be published?
• After the data analysis has been published
• Before the analysis has been published
• Publication alongside analysis article
• Authors not intending to analyse data
Data Descriptors can be submitted and published at any point in the research workflow, i.e. whenever it makes most sense for your data
![Page 17: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/17.jpg)
Scientific Data accepts submissions from all quantitative research disciplines
![Page 18: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/18.jpg)
Helping authors find the right place for their data
![Page 19: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/19.jpg)
Scientific Data’s Repository List
Browse our recommended data repositories online.
• We currently list almost 80 repositories, across biological, medical, physical and social sciences
• When required, we provide guidance to authors on the best place to store their data
www.nature.com/sdata/data-policies/repositories
![Page 20: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/20.jpg)
Generation of machine readable metadata
![Page 21: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/21.jpg)
Metadata at Scientific Data
• We want to capture metadata about the dataset being described in each Data Descriptor
• The manuscript captures human readable metadata needed for data reuse
• The curated metadata records capture machine readable metadata needed for machine based data discovery
![Page 22: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/22.jpg)
ISA-Tab format for machine readable metadata
• Study workflow
• Key sample characteristics needed for data discovery
• Relates samples to data files
• Shows location of dataset
• Uses controlled vocabularies and ontologies (where possible)
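As a rough illustration of how tab-delimited metadata can relate samples to data files, the sketch below joins a tiny study table to a tiny assay table. The column headers follow the general ISA-Tab naming pattern, but both tables, all sample names and file names, and the helper functions are invented for this example; real ISA-Tab records also include an investigation file and many more columns.

```python
import csv
import io

# Hypothetical, heavily simplified study table: one row per sample.
STUDY_TSV = (
    "Source Name\tCharacteristics[organism]\tSample Name\n"
    "donor1\tHomo sapiens\tsample1\n"
    "donor2\tMus musculus\tsample2\n"
)

# Hypothetical, heavily simplified assay table: links samples to files.
ASSAY_TSV = (
    "Sample Name\tRaw Data File\n"
    "sample1\tsample1_scan.nii.gz\n"
    "sample2\tsample2_reads.fastq.gz\n"
)


def read_tsv(text):
    """Parse tab-delimited text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def files_by_organism(study_text, assay_text):
    """Group data files by the organism recorded for each sample."""
    organism_of = {
        row["Sample Name"]: row["Characteristics[organism]"]
        for row in read_tsv(study_text)
    }
    grouped = {}
    for row in read_tsv(assay_text):
        organism = organism_of.get(row["Sample Name"], "unknown")
        grouped.setdefault(organism, []).append(row["Raw Data File"])
    return grouped


if __name__ == "__main__":
    print(files_by_organism(STUDY_TSV, ASSAY_TSV))
```

Because the sample name appears in both tables, a machine can go from a sample characteristic to the files that hold the data without reading the article text.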
![Page 23: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/23.jpg)
Use of community endorsed ontologies and controlled vocabularies
Controlled vocabulary = list of standardized phrases of scientific concepts
Ontology = controlled vocabulary with defined relationships between terms
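The difference between the two definitions can be sketched in a few lines. The terms and relationships below are made up for illustration and are not taken from any specific ontology; the single-parent "is a" mapping is also a simplification, since real ontologies usually allow several parents and several relationship types.

```python
# A controlled vocabulary: just an agreed list of allowed terms.
CONTROLLED_VOCABULARY = {"cell", "neuron", "motor neuron", "hepatocyte"}

# An ontology adds defined relationships between those terms.
# Here: a hypothetical child -> parent "is a" mapping.
IS_A = {
    "neuron": "cell",
    "motor neuron": "neuron",
    "hepatocyte": "cell",
}


def ancestors(term):
    """Return the chain of broader terms, nearest parent first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_kind_of(term, candidate):
    """True if `term` is `candidate` or a narrower term beneath it."""
    return term == candidate or candidate in ancestors(term)


if __name__ == "__main__":
    # With the vocabulary alone we can only check that a term is allowed.
    print("motor neuron" in CONTROLLED_VOCABULARY)
    # With the ontology we can also reason about how terms relate.
    print(ancestors("motor neuron"))
    print(is_kind_of("motor neuron", "cell"))
```

This is why ontology-annotated metadata supports broader searches: a query for a general term can also return datasets annotated with its narrower terms.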
![Page 24: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/24.jpg)
Structured Summary table from curated metadata
Investigation file
Study file
Sample characteristics reported in Structured Summary table:
• Organism
• Organism part
• Cell line
• Geographical location
• Environment type
![Page 25: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/25.jpg)
Viewing the metadata
![Page 26: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/26.jpg)
Metadata for data discovery
Search by:
• Data Repositories
• Experiment design
• Measurements made
• Technologies used
• Factor types
• Sample Characteristics
• Organism
• Environment types
• Geographic locations
scientificdata.isa-explorer.org
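A minimal sketch of how machine readable metadata makes this kind of faceted search possible. The record structure, field names and example records are all hypothetical and do not describe how the site above is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical curated metadata for one Data Descriptor."""
    title: str
    repository: str
    organism: str
    technology: str


# Invented example records, for illustration only.
RECORDS = [
    MetadataRecord("Brain imaging set", "figshare", "Homo sapiens", "MRI"),
    MetadataRecord("Liver expression set", "GEO", "Mus musculus", "RNA-seq"),
    MetadataRecord("Blood expression set", "GEO", "Homo sapiens", "RNA-seq"),
]


def search(records, **facets):
    """Return records whose fields match every requested facet value."""
    return [
        record
        for record in records
        if all(getattr(record, name) == value for name, value in facets.items())
    ]


if __name__ == "__main__":
    for hit in search(RECORDS, organism="Homo sapiens", technology="RNA-seq"):
        print(hit.title)
```

Each extra facet narrows the result set, which only works when every record uses the same standardized terms for the same concept.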
![Page 27: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/27.jpg)
Citing Data
![Page 28: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/28.jpg)
Citing my own data
1. In the article text
2. In the Data Citation section
![Page 29: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/29.jpg)
Citing data I’ve reused
1. In the article text
2. In the References section
![Page 30: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/30.jpg)
Clinical researchers support sharing, but…
Rathi V, Dzara K, Gross CP, Hrynaszkiewicz I, Joffe S, Krumholz HM, Strait KM, Ross JS: Sharing of clinical trial data among trialists: a cross sectional survey. BMJ 2012;345:e7570
• Sharing de-identified data via repositories should be required (236 respondents, 74%)
• Investigators should share de-identified data on request (229 respondents, 72%)
![Page 31: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/31.jpg)
…clinical data producers have specific concerns
Rathi V, Dzara K, Gross CP, Hrynaszkiewicz I, Joffe S, Krumholz HM, Strait KM, Ross JS: Sharing of clinical trial data among trialists: a cross sectional survey. BMJ 2012;345:e7570
![Page 32: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/32.jpg)
Example initiatives for sharing clinical data
Yale Open Data Access (YODA) & Clinical Study Data Request (CSDR) projects:
• Data Use Agreements (DUAs)
• Controlled access environment
• Scientific validity of reanalysis checked
• Independent governance
• Data anonymisation checks
http://yoda.yale.edu/ https://www.clinicalstudydatarequest.com/
![Page 33: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/33.jpg)
Clinical data publication at Scientific Data
• Identify repositories able to archive clinical data
• Work with identified repositories to establish workflows for peer review and publication, whilst maintaining patient privacy
• Facilitate a specialist peer review process for clinical data, for example ensuring peer reviewers have agreed to the terms of the data use agreement
Hrynaszkiewicz, I., Khodiyar, V., Hufton, A. & Sansone, S. A. Publishing descriptions of non-public clinical datasets: guidance for researchers, repositories, editors and funding organisations. BioRxiv http://dx.doi.org/10.1101/021667 (2015).
![Page 34: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/34.jpg)
A robust data-on-request workflow?
![Page 35: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/35.jpg)
Published Data Descriptor with clinical data
Data Records section details how to access the data
![Page 36: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/36.jpg)
Links to restricted access data
Data Citations link to repository
Data files requiring permission to access
Freely accessible data files
![Page 37: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/37.jpg)
Data Reuse stories
![Page 38: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/38.jpg)
Data reuse by (some of) the same researchers
![Page 39: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/39.jpg)
Data reuse by other researchers in the same field
“The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.”
Professor Daniele Marinazzo
![Page 40: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/40.jpg)
Data reuse and citation by researchers
According to Google Scholar, cited 43 times! (February 2016)
![Page 41: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/41.jpg)
Data reuse by the non-research community
www.bbc.co.uk/news/science-environment-33057402
![Page 42: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/42.jpg)
Data reuse by the non-research community
http://www.nytimes.com/interactive/2014/12/30/science/history-of-ebola-in-24-outbreaks.html
![Page 43: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/43.jpg)
Data Descriptors…
• …enable you to gain scholarly credit for your data gathering efforts.
• …are human AND machine readable.
• …can be published with, or independently of, an analysis article.
• …can be published at any point in the research workflow.
• …allow the publication and discovery of clinical data, whilst maintaining your patients' privacy.
• …result in greater reuse and citation by fellow members of your research community.
• …extend the impact of your research data by enabling access to and reuse by the non-research community.
![Page 44: Gaining credit for sharing research data](https://reader033.vdocuments.net/reader033/viewer/2022042707/58eccc091a28ab293c8b46bf/html5/thumbnails/44.jpg)
Get more from your data
Preserve it
Encourage reuse
Get credit for it
Visit nature.com/sdata
Email [email protected]
Tweet @ScientificData #SDJPN16