Laurie Goodman at #SSPBoston: Article+Data+Tools Reproducibility, Reuse, & Rapid Release

Article+Data+Tools Reproducibility, Reuse, & Rapid Release Laurie Goodman, PhD Editor-in-Chief GigaScience

Upload: gigascience-bgi-hong-kong

Post on 28-Jan-2015


Category:

Science



DESCRIPTION

Laurie Goodman's talk at Society for Scholarly Publishing, Boston: Article+Data+Tools Reproducibility, Reuse, & Rapid Release 29th May 2014

TRANSCRIPT

Page 1: Laurie Goodman at #SSPBoston: Article+Data+Tools Reproducibility, Reuse, & Rapid Release

Article+Data+Tools Reproducibility, Reuse, & Rapid Release

Laurie Goodman, PhD, Editor-in-Chief

GigaScience

Page 2

Current Scientific Communication Via Publication

• Scholarly articles are merely advertisement of scholarship. The actual scholarly artefacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible (Jon B. Buckheit and David L. Donoho, WaveLab and Reproducible Research, 1995)

• Core scientific statements or assertions are intertwined and hidden in the conventional scholarly narratives

• Lack of transparency, lack of credit for anything other than “regular” dead tree publication

Page 3

GigaSolution: deconstructing the paper

Publishing all the pieces:

• Data/software available

• Metadata/curation

• Interoperability

• Availability of workflows

• Transparent analyses

[Diagram labels: Data, Metadata, Methods, Analyses]

Page 4

How We Envision Research Publication (Communicating Science)

• Data Sets in GigaDB (Data Publishing Platform)

• Analyses in GigaGalaxy (Data Analysis Platform)

• Paper in GigaScience (Open-access journal)

The three components are linked together.

Page 5

It’s not just for ‘Omics anymore

Page 6

Example in Neuroscience

1. Neuroscience Data are not typically shared

2. For most papers: Data AND Tools are not typically made available to the reviewers

3. Journal Editors think Reviewers will not want to review data

GigaScience 2014, 3:3 doi:10.1186/2047-217X-3-3

Page 7

Example in Neuroscience

• Neuroscience Data are not typically shared
– Author Dr. Stephen Eglen said: “One way of encouraging neuroscientists to share their data is to provide some form of academic credit.”
– We hosted with a DOI: 366 recordings from 12 electrophysiology datasets
– GigaDB is included in the Thomson Reuters Data Citation Index

• Data AND Tools are not typically made available to the reviewers
– We made manuscript, data and tools all available to the reviewers.
– We make sure to include reviewers who are able to properly assess the data itself and rerun the tools
– To reduce the burden, we sometimes select a reviewer who ONLY looks at the data.

• Journal Editors think Reviewers will not want to review data
– What Reviewer Dr. Thomas Wachtler said: “The paper by Eglen and colleagues is a shining example of openness in that it enables replicating the results almost as easily as by pressing a button.”
– What Reviewer Dr. Christophe Pouzat said: “In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewers job more fun!”

Page 8

Data Citation Really is a Major Incentive

On Wednesday this week we released the genome sequences from 3000 rice strains (13.4 TB of data)

• These data were also deposited in the NIH SRA repository

• So why did we do it too?

1. It is linked directly to the Data Paper that provides details of data production, quality, and basic analysis

2. Authors were hesitant to release these data (a HUGE community resource) prior to the analysis paper publication (which, for 3000 strains… would take years…). The opportunity to have these data citable (and trackable) encouraged the authors and led to their releasing these data and doing so in collaboration with GigaScience’s Biocurator

The 3,000 Rice Genomes Project. (2014) GigaScience 3:7 http://dx.doi.org/10.1186/2047-217X-3-7; The 3000 Rice Genomes Project (2014) GigaScience Database. http://dx.doi.org/10.5524/200001
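As a rough illustration of what makes a dataset "citable (and trackable)", here is a minimal Python sketch that builds the dataset citation above from its parts. The `DataCitation` class and the `bare_doi` and `format_citation` helpers are hypothetical names made up for this example; they are not part of GigaDB or any citation tool, and the output style simply mirrors the dataset reference on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical container for the parts of a dataset citation."""
    creator: str
    year: int
    publisher: str
    doi: str  # bare DOI, e.g. "10.5524/200001"


def bare_doi(identifier: str) -> str:
    """Strip a resolver prefix (e.g. http://dx.doi.org/) from a DOI link."""
    text = identifier.strip()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if text.lower().startswith(prefix):
            return text[len(prefix):]
    return text


def format_citation(c: DataCitation) -> str:
    """Render the citation in the style used on the slide."""
    return f"{c.creator} ({c.year}): {c.publisher}. http://dx.doi.org/{c.doi}"


# Values taken from the dataset reference on this slide.
rice = DataCitation(
    creator="The 3000 Rice Genomes Project",
    year=2014,
    publisher="GigaScience Database",
    doi=bare_doi("http://dx.doi.org/10.5524/200001"),
)
print(format_citation(rice))
```

Keeping the DOI as a separate field is the design point: the same identifier can go in a reference list and be counted by citation indexes, which is what gives authors credit for the data itself.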

Page 9

Consider Cross Journal Support

Competition is good…

…but sometimes we should collaborate for the community good

• PLOS’s recent data deposition policies have led to community concerns about feasibility.

• We support (and applaud) this… we have an even stricter data deposition policy

• But PLOS ONE received a submission that was a comparative study of earthworm morphology and anatomy using a 3D non-invasive imaging technique called micro-computed tomography (or microCT)… And there is no good place to put this

• These data are extremely complex: videos, multiple files, with several folders of ~10 GB

Page 10

Consider Cross Journal Support

• GigaScience and PLOS ONE collaborated. They published the main article; we published a Data Note describing the data itself and hosted all the data on GigaDB under separate citation.

• With our Aspera connection, reviewers could download even the ~10 GB folders in about half an hour

• Reviewer Dr. Sarah Faulwetter noted the usefulness of having these data available, saying: “Instead of having to go through the lengthy process of obtaining the physical specimen from a museum, I can now download a fairly accurate representation from the web.”

Lenihan et al (2014). GigaScience, 3:6 http://dx.doi.org/10.1186/2047-217X-3-6; Lenihan, et al (2014): GigaScience Database. http://dx.doi.org/10.5524/100092; Fernández et al (2014) PLOS ONE 9 (5) e96617 http://dx.doi.org/10.1371/journal.pone.0096617
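For a sense of scale, the short Python sketch below works out the sustained transfer rate implied by moving one folder of about 10 GB (the size quoted on the previous slide) in about half an hour. It assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead. The function name `required_rate_mbit_s` is made up for this example and says nothing about how Aspera itself works.

```python
def required_rate_mbit_s(size_gb: float, minutes: float) -> float:
    """Sustained rate in Mbit/s needed to move size_gb gigabytes in the given minutes.

    Assumes decimal units: 1 GB = 1e9 bytes, 1 Mbit = 1e6 bits.
    """
    bits = size_gb * 1e9 * 8
    seconds = minutes * 60
    return bits / seconds / 1e6


rate = required_rate_mbit_s(size_gb=10, minutes=30)
print(f"{rate:.1f} Mbit/s")  # prints "44.4 Mbit/s"
```

The arithmetic is 10 × 10^9 bytes × 8 bits = 8 × 10^10 bits, divided by 1800 seconds, which gives roughly 44 Mbit/s.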

Page 11

Think about what you do… and what you can do…

• Promote, rather than inhibit, prepublication data sharing

• Promote Data Citation in the reference section
– Incentivizes data release
– Makes it easier for readers to find

• Promote Data Sharing upon publication
– Consider your data release policies

• Form collaborations with repositories to aid authors in depositing their work
– Identify community organizations with metadata standards

• Make data available for reviewers (author website, community repositories, Dryad and similar, your publisher?)
– At least do a sanity check
– Use “data reviewers”

No, this isn’t easy, but do what you can now

And work toward the rest

Evolve

Page 12

It’s Time to Move Beyond Dead Trees

1812 1665 1869

Page 13

Thanks to:
Scott Edmunds, Executive Editor
Nicole Nogoy, Commissioning Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Rob Davidson, Data Scientist
Xiao (Jesse) Si Zhe, Database Developer
Amye Kenall, Journal Development Manager

Contact us:
[email protected]@gigasciencejournal.com

Follow us:
@GigaScience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog

www.gigasciencejournal.com
www.gigadb.org