Laurie Goodman at #SSPBoston: Article+Data+Tools: Reproducibility, Reuse, & Rapid Release
Laurie Goodman's talk at the Society for Scholarly Publishing, Boston: Article+Data+Tools: Reproducibility, Reuse, & Rapid Release, 29th May 2014
Article+Data+Tools: Reproducibility, Reuse, & Rapid Release
Laurie Goodman, PhD, Editor-in-Chief
GigaScience
Current Scientific Communication Via Publication
• Scholarly articles are merely advertisement of scholarship. The actual scholarly artefacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible (Jon B. Buckheit and David L. Donoho, WaveLab and Reproducible Research, 1995)
• Core scientific statements or assertions are intertwined and hidden in the conventional scholarly narratives
• Lack of transparency, and lack of credit for anything other than “regular” dead-tree publication
GigaSolution: Deconstructing the Paper
Publishing all the pieces:
• Data/software available
• Metadata/curation
• Interoperability
• Availability of workflows
• Transparent analyses
[Diagram: the paper deconstructed into Data, Metadata, Methods, and Analyses]
How We Envision Research Publication (Communicating Science)
[Diagram: Data Sets in GigaDB (Data Publishing Platform), Analyses in GigaGalaxy (Data Analysis Platform), and the Paper in GigaScience (Open-access journal), linked to one another]
It’s not just for ‘Omics anymore
Example in Neuroscience
1. Neuroscience Data are not typically shared
2. For most papers: Data AND Tools are not typically made available to the reviewers
3. Journal Editors think Reviewers will not want to review data
GigaScience 2014, 3:3 doi:10.1186/2047-217X-3-3
Example in Neuroscience
• Neuroscience Data are not typically shared
– Author Dr. Stephen Eglen said: “One way of encouraging neuroscientists to share their data is to provide some form of academic credit.”
– We hosted 366 recordings from 12 electrophysiology datasets, under a DOI.
– GigaDB is included in the Thomson Reuters Data Citation Index.
• Data AND Tools are not typically made available to the reviewers
– We made the manuscript, data, and tools all available to the reviewers.
– We make sure to include reviewers who are able to properly assess the data itself and rerun the tools.
– To reduce the burden, we sometimes select a reviewer who looks ONLY at the data.
• Journal Editors think Reviewers will not want to review data
– What reviewer Dr. Thomas Wachtler said: “The paper by Eglen and colleagues is a shining example of openness in that it enables replicating the results almost as easily as by pressing a button.”
– What reviewer Dr. Christophe Pouzat said: “In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewers job more fun!”
Data Citation Really Is a Major Incentive
On Wednesday this week, we released the genome sequences of 3,000 rice strains (13.4 TB of data).
• These data were also deposited in the NIH SRA repository.
• So why did we do it too?
1. The release is linked directly to the Data Paper, which provides details of data production, quality, and basic analysis.
2. The authors were hesitant to release these data (a HUGE community resource) before publication of the analysis paper (which, for 3,000 strains, would take years). The opportunity to have these data citable (and trackable) encouraged the authors to release them, and to do so in collaboration with GigaScience’s Biocurator.
The 3,000 Rice Genomes Project (2014) GigaScience 3:7 http://dx.doi.org/10.1186/2047-217X-3-7; The 3000 Rice Genomes Project (2014) GigaScience Database. http://dx.doi.org/10.5524/200001
Consider Cross-Journal Support
Competition is good… but sometimes we should collaborate for the community good
• PLOS’s recent data deposition policies have raised community concerns about feasibility.
• We support (and applaud) this; we have an even stricter data deposition policy.
• But PLOS ONE received a submission that was a comparative study of earthworm morphology and anatomy using a 3D non-invasive imaging technique called micro-computed tomography (microCT), and there was no good place to put these data.
• These data are extremely complex: videos and multiple files, with several folders of ~10 GB.
Consider Cross-Journal Support
• GigaScience and PLOS ONE collaborated: PLOS ONE published the main article, and we published a Data Note describing the data itself and hosted all the data in GigaDB under a separate citation.
• With our Aspera connection, reviewers could download even the ~10 GB folders in about half an hour.
• Reviewer Dr. Sarah Faulwetter noted the usefulness of having these data available, saying: “Instead of having to go through the lengthy process of obtaining the physical specimen from a museum, I can now download a fairly accurate representation from the web.”
Lenihan et al (2014). GigaScience, 3:6 http://dx.doi.org/10.1186/2047-217X-3-6; Lenihan et al (2014): GigaScience Database. http://dx.doi.org/10.5524/100092; Fernández et al (2014) PLOS ONE 9 (5) e96617 http://dx.doi.org/10.1371/journal.pone.0096617
Think about what you do… and what you can do…
• Promote, rather than inhibit, prepublication data sharing
• Promote data citation in the reference section
– Incentivizes data release
– Makes it easier for readers to find the data
• Promote data sharing upon publication
– Consider your data release policies
• Form collaborations with repositories to aid authors in depositing their work
– Identify community organizations with metadata standards
• Make data available for reviewers (author website, community repositories, Dryad and similar, or your publisher?)
– At least do a sanity check
– Use “data reviewers”
No, this isn’t easy, but do what you can now, and work toward the rest.
Evolve
It’s Time to Move Beyond Dead Trees
[Image: timeline marked 1665, 1812, 1869]
Thanks to:
Scott Edmunds, Executive Editor
Nicole Nogoy, Commissioning Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Rob Davidson, Data Scientist
Xiao (Jesse) Si Zhe, Database Developer
Amye Kenall, Journal Development Manager
Contact us:
www.gigasciencejournal.com
www.gigadb.org
Follow us:
@GigaScience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog