Laurie Goodman: Overcoming Hurdles to Data Publication

Post on 08-Jan-2017








Overcoming Hurdles to Data Publication

Laurie Goodman, PhD
Editor-in-Chief, GigaScience

ORCID ID: 0000-0001-9724-5976
@GigaScience

(Personal Twitter account: @Grimhawk1, but this is mostly me whining about Donald Trump, Pitbull Discrimination, and why I hate TSA and Homeland Security)

Why should we “publish” data?

1. Ioannidis et al. (2009). Repeatability of published microarray gene expression analyses. Nature Genetics 41: 142.
Ioannidis JPA (2005). Why Most Published Research Findings Are False. PLoS Med 2(8).

Out of 18 microarray papers, results from 10 could not be reproduced

Deconstructing a paper into accessible, useable, trackable, interlinked units

Need to provide credit to reward sharing and proper organization of:
• Narrative
• Data/Metadata availability/curation
• Source Code, Software availability
• Interoperability
• Availability of workflows
• Transparent analyses



How we view publishing at GigaScience

• Paper in GigaScience: open-access journal
• Linked to data sets in GigaDB: data publishing platform (under CC0 waiver)
• Linked to analyses in GigaGalaxy: data analysis platform

DOIs from

GigaScience Publishes (or links to) All Research Objects

Article (Narrative) + Data + Software + Source Code + Methods + Workflows + Containers/Docker + VMs

• Linked to data sets in GigaDB
• Linked to analyses in GigaGalaxy
• Workflow DOI

What is Data Publication?

1. Publishing a standard article that describes the data.

2. Making the data itself citable.

Make it easy to cite

See where it got cited!

Describe the data

Current list of Darwin Finch Data Citations on Google Scholar

…And more


Data Publication Hurdles
If only it were easy…

• Data isn’t “scholarly” enough to be a citable entity (a ‘real’ paper)

• If I publish my data, I may not be able to publish the analysis paper later because journals will consider it Prior Publication

• If I publish my data, #DataParasites will use it!!*

* Response from Functional Genomics Data Society:

F1000 Research: Checked with Publishers and Journals about Data Publication being considered “Prior Publication”

The polar bear DATA was published, as a citable entity, in 2011, before publication of a data analysis paper

BUT #dataparasites!
Polar Bear Data were used before the data producer’s analysis paper was published. But it garnered 5 citations.

Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage. Science. 2012 Apr 20;336(6079):344-7. doi:10.1126/science.1216424.

Cahill, JA et al., Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345. doi:10.1371/journal.pgen.1003345.

Morgan, CC et al., Heterogeneous models place the root of the placental mammal phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56. doi:10.1093/molbev/mst117.

Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from Genome Sequences. J Hered. 2014; 105(3):312-23. doi:10.1093/jhered/est133.

Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased Gene Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4. doi:10.1093/molbev/msu109

However, this paper didn’t include the data citation…
The Data Publication has since garnered 6 more citations

Even though the data had been released 2 years earlier and been cited in other papers, the main analysis paper was published in Cell.
(And made the cover)

Data Publication is being tracked by this and other tracking resources

AND THAT MEANS You can get a Data IF (impact factor)!!

How are Data Citations Doing Overall?
Proportions of Citation Types Per Year

Looked at 1,125 Journal Articles with associated data in Dryad from 2011-2014

The Location of the Citation: Are Data Citation Recommendations Having an Effect? Elizabeth Hull, DataCite Blog

Highlights:
• Dryad DOI in the works cited, as recommended = only 6% of total articles
• Dryad DOI in the body only (including data availability sections) = 75%
• No citation (Dryad DOI not found anywhere in the article) = 20%

Good News:
• Articles with the Dryad DOI in the works cited increased from 5% to 8% from 2011-2014
• Articles with no data citation declined from 31% to 15%

Bad News: With the current growth rate, expect to see 90% in the works cited section in 2031
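The slide does not say how the 2031 projection was derived. As a rough cross-check, here is a minimal sketch that assumes only the rounded figures above (5% in 2011, 8% in 2014) and two simple growth models. The function names are illustrative and are not taken from the DataCite analysis.

```python
import math

def year_reaching_linear(y0, p0, y1, p1, target):
    """Year the share hits `target` if it rises by a fixed number of points per year."""
    points_per_year = (p1 - p0) / (y1 - y0)
    return y1 + (target - p1) / points_per_year

def year_reaching_compound(y0, p0, y1, p1, target):
    """Year the share hits `target` if it grows by a fixed ratio per year."""
    log_growth_per_year = math.log(p1 / p0) / (y1 - y0)
    return y1 + math.log(target / p1) / log_growth_per_year

# Assumed inputs: the rounded percentages quoted on the slide.
linear = year_reaching_linear(2011, 5, 2014, 8, 90)
compound = year_reaching_compound(2011, 5, 2014, 8, 90)
print(f"linear growth:   {linear:.0f}")
print(f"compound growth: {compound:.0f}")
```

Under these assumptions, compound growth reaches 90% around 2029-2030, in the same range as the 2031 estimate, while strictly linear growth would not get there until the 2090s. Either way, the pace is slow.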

More Education Needed
“Easiest” Way Forward is to Engage the Journal Community
• Organizations providing citation guidelines should engage “Editor Evangelists”
• Editor Evangelists will do the following:
  o Get Data Citation Guidelines in the Guide To Authors
  o Get Data Citation Guidelines in the Copy Editor Handbook
  o Tell All their Editor Friends and Get a Cult following

Example: The Standardization of Gene Nomenclature in articles
• The Human Genome Organisation (HUGO) worked with journal editors in the late 1990s to drive use of appropriate Gene Nomenclature, getting it into the guide to authors.
• Within about 3 years, standard nomenclature was used by all

Oh, and don’t forget to have the Editors tell the Production Department that DOIs shouldn’t be stripped out and replaced with URLs.
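One way a production team could catch stripped DOIs is to scan typeset references automatically. The following is a minimal sketch, not a GigaScience tool: the regular expression is a commonly used approximation of DOI syntax, and the helper name `find_dois` is hypothetical.

```python
import re

# Approximate pattern for modern DOIs (an assumption, not an official grammar):
# "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

def find_dois(reference: str) -> list:
    """Return every DOI-like string in a reference, without trailing punctuation."""
    return [match.rstrip(".,;") for match in DOI_PATTERN.findall(reference)]

# One of the polar bear citations listed earlier in this talk.
reference = (
    "Hailer, F et al., Nuclear genomic sequences reveal that polar bears are "
    "an old and distinct bear lineage. Science. 2012 Apr 20;336(6079):344-7. "
    "doi:10.1126/science.1216424."
)
print(find_dois(reference))
```

A reference for which `find_dois` returns an empty list after typesetting, but not before, is a candidate for a DOI that was stripped in production.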

Thanks to:
Scott Edmunds, Executive Editor
Nicole Nogoy, Commissioning Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Xiao (Jesse) Si Zhe, Database Developer
Sam Rose, Journal Development Manager
Rob Davidson, Open Data Lead, Office for National Statistics


