results may vary: reproducibility, open science and all that jazz
Professor Carole Goble, The University of Manchester, UK
[email protected] | @caroleannegoble
Keynote ISMB/ECCB 2013, Berlin, Germany, 23 July 2013

Upload: carole-goble

Post on 27-Jan-2015


DESCRIPTION

Keynote given by Carole Goble on 23rd July 2013 at ISMB/ECCB 2013 http://www.iscb.org/ismbeccb2013 How could we evaluate research and researchers? Reproducibility underpins the scientific method: at least in principle, if not in practice. The willing exchange of results and the transparent conduct of research can only be expected up to a point in a competitive environment. Contributions to science are acknowledged, but not if the credit is for data curation or software. From a bioinformatics viewpoint, how far could our results be reproducible before the pain is just too high? Is open science a dangerous, utopian vision or a legitimate, feasible expectation? How do we move bioinformatics from one where results are post-hoc "made reproducible" to pre-hoc "born reproducible"? And why, in our computational information age, do we communicate results through fragmented, fixed documents rather than cohesive, versioned releases? I will explore these questions drawing on 20 years of experience in both the development of technical infrastructure for Life Science and the social infrastructure in which Life Science operates.

TRANSCRIPT

Page 1: ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and  who gets the credit?

results may vary

reproducibility, open science

and all that jazz

Professor Carole Goble

The University of Manchester, UK

[email protected]

@caroleannegoble

Keynote ISMB/ECCB 2013 Berlin, Germany, 23 July 2013

Page 2:

“knowledge turning”

[Josh Sommer, Chordoma Foundation]

• life sciences
• systems biology
• translational medicine
• biodiversity
• chemistry
• heliophysics
• astronomy
• social science
• digital libraries
• language analysis

New Insight

Goble et al Communications in Computer and Information Science 348, 2013

Page 3:

automate: workflows, pipeline & service

integrative frameworks

pool, share & collaborate

web systems

nanopub

semantics & ontologies
machine-readable documentation

scientific software engineering

CSSE

Page 4:

coordinated execution of services, codes, resources
transparent, step-wise methods
auto documentation, logging
reuse variants

Page 6:

• PALS

Page 7:

reproducibility: a principle of the scientific method

separates scientists from other researchers and normal people

http://xkcd.com/242/

Page 8:

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995

datasets, data collections, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software, infrastructure, compilers, hardware

Morin et al., Shining Light into Black Boxes, Science 13 April 2012: 336(6078) 159-160

Ince et al The case for open computer programs, Nature 482, 2012

Page 9:

• Workshop Track (WK03) What Bioinformaticians need to know about digital publishing beyond the PDF

• Workshop Track (WK02): Bioinformatics Cores Workshop,

• ISCB Public Policy Statement on Access to Data

Page 10:

“an experiment is reproducible until another laboratory tries to repeat it.”

Alexander Kohn

hope over experience

even computational ones

Page 11:

hand-wringing, weeping, wailing, gnashing of teeth.

Nature checklist.

Science requirements for data and code availability.

attacks on authors, editors, reviewers, publishers, funders, and just about everyone.

http://www.nature.com/nature/focus/reproducibility/index.html

Page 12:

47/53 “landmark” publications could not be replicated

[Begley, Ellis Nature, 483, 2012]

Page 13:

Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)

Alsheikh-Ali et al Public Availability of Published Research Data in High-Impact Journals. PLoS ONE 6(9) 2011

59% of papers in the 50 highest-IF journals comply with (often weak) data sharing rules.

Page 14:

Stodden V, Guo P, Ma Z (2013) Toward Reproducible Computational Research: An Empirical Analysis of Data and Code Policy Adoption by Journals. PLoS ONE 8(6): e67111. doi:10.1371/journal.pone.0067111

Policy strength categories (applied separately to data and to code):
• Required as condition of publication
• Required but may not affect decisions
• Explicitly encouraged, may be reviewed and/or hosted
• Implied
• No mention

170 journals, 2011-2012

Page 15:

replication gap

1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 142.
2. Science publishing: The trouble with retractions. http://www.nature.com/news/2011/111005/full/478026a.html
3. Björn Brembs: Open Access and the looming crisis in science. https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950

Out of 18 microarray papers, results from 10 could not be reproduced

More retractions: a >15× increase in the last decade.
At the current rate of increase, by 2045 as many papers will be retracted as are published.

Page 16:

re-compute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle

conceptual replication: “show A is true by doing B rather than doing A again”; verify but not falsify [Yong, Nature 485, 2012]

regenerate the figure

redo

“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean - neither more nor less.” [Lewis Carroll]

Page 17:

repeat: same experiment, same lab
replicate: same experiment, different lab
reproduce: same experiment, different set-up
reuse: different experiment, some of the same test

Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online.
Peng RD, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227.

Page 18:

validation: assurance that it meets the needs of a stakeholder (e.g. error measurement, documentation)

verification: assurance that it complies with a regulation, requirement, specification, or imposed condition (e.g. a model)

science review: articles, algorithms, methods
technical review: code, data, systems

V. Stodden, “Trust Your Science? Open Your Data and Code!” Amstat News, 1 July 2011

Page 19:

Design → Execution → Result Analysis → Collection → Prediction → Publish → Peer Review → Peer Reuse

defend: repeat
review/certify: replicate
review/compare: reproduce
transfer: reuse

make & run & document; report & review & support

* Adapted from Mesirov, J. Accessible Reproducible Research, Science 327(5964), 415-416 (2010)

Page 20:

fraud (Corbyn, Nature Oct 2012)

“I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper. Data/software versions. Workflows are maturing and becoming helpful”

disorganisation

Phil Bourne

Garijo et al. 2013 Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome PLOS ONE under review.

inherent

Page 21:

rigour: reporting & experimental design
• cherry-picking data
• misapplication and use of black-box software*
• software misconfigurations, unreported random seeds
• non-independent bias, poor positive and negative controls
• dodgy normalisation, arbitrary cut-offs, premature data triage
• un-validated materials, improper statistical analysis, poor statistical power, stopping when you “get to the right answer”

* 8% validation: Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013
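The rigour list above names unreported random seeds as a reproducibility killer. A minimal sketch of the remedy, fixing a seed and reporting it alongside the result (the analysis itself is a toy stand-in, not anything from the talk):

```python
import json
import random

def run_analysis(seed: int) -> dict:
    """Toy stochastic analysis: the seed fully determines the result."""
    rng = random.Random(seed)          # isolated RNG, not the global one
    sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return {"mean": sum(sample) / len(sample), "n": len(sample)}

SEED = 42                              # chosen arbitrarily; the point is to record it
result = run_analysis(SEED)

# Report the seed alongside the result so others can re-run exactly.
record = {"seed": SEED, "result": result}
print(json.dumps(record, sort_keys=True))

# Re-running with the same seed must give the identical record.
assert run_analysis(SEED) == result
```

Publishing `record` rather than just `result` is the difference between a claim and a repeatable computation.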

Page 22:

http://www.nature.com/authors/policies/checklist.pdf

Page 23:

• anyone, anything, anytime

• publication access, data, models, source codes, resources, transparent methods, standards, formats, identifiers, apis, licenses, education, policies

• “accessible, intelligible, assessable, reusable”

http://royalsociety.org/policy/projects/science-public-enterprise/report/

Page 24:

G8 open data charter

http://opensource.com/government/13/7/open-data-charter-g8

Page 25:

republic of science*

regulation of science

institution cores, libraries

*Merton’s four norms of scientific behaviour (1942)

public services

Page 26:

a meta-manifesto (I)
• all X should be available and assessable forever
• the copyright of X should be clear
• X should have citable, versioned identifiers
• researchers using X should visibly credit X’s creators
• credit should be assessable and count in all assessments
• X should be curated, available, linked to all necessary materials, and intelligible

What’s the real issue?

Page 27:

we do pretty well:
• major public data repositories
• multiple declarations for depositing data
• thriving open source community
• plethora of data standardisation efforts
• core facilities
• heroic data campaigns
• international and national bioinformatics coordination
• diy biology movement

• great stories: the Shiga-toxin strain of E. coli, Hamburg, May 2011; BGI China’s open-data crowd-sourcing effort.

• Oh, wait… the University of Münster/University of Göttingen squabble http://www.nature.com/news/2011/110721/full/news.2011.430.html

Page 28:

hard: patient data
(inter)national complications
bleeding-heart paternalism
defensive research
informed consent
fortresses

Kotz, J. SciBX 5(25) 2012

[John Wilbanks]

http://www.broadinstitute.org/files/news/pdfs/GAWhitePaperJune3.pdf

Page 29:

massive centralisation – clouds, curated core facilities

long tail: massive decentralisation - investigator-held datasets

fragmentation & fragility

data scarcity at the point of delivery

RIP data

quality/trust/utility

Acta Crystallographica section B or C

data/code as first class citizen

Page 30:

we are not bad people; we make progress

there was never a golden age; there never is

Page 31:

a reproducibility paradox

big, fast, complicated, multi-step, multi-type, multi-field

expectations of reproducibility

diy publishing, greater access

Page 32:

pretty stories, shiny results: a feedback loop

novel, attention grabbing

neat, only positive

review: the direction of science, the next paper, how I would do it.

reject papers purely based on public data

obfuscate to avoid scrutiny

PLoS and F1000 counter

announce a result, convince us it’s correct

Page 33:

the scientific sweatshop: no resources, time, accountability

getting it published, not getting it right
a game-changing benefit is needed to justify disruption

Page 34:

citation distortion

Greenberg How citation distortions create unfounded authority: analysis of a citation network. British Medical Journal 2009, 339:b2680.

[Tim Clark]

Micropublications arXiv reference

Simkin, Roychowdhury Stochastic modeling of citation slips. Scientometrics 2005, 62(3):367-384.

Clark et al Micropublications 2013 arXiv:1305.3506

Page 35:

independent replication studies: self-correcting science
• hostility
• hard
• resource intensive
• no funding, time, recognition, place to publish
• invisible to originators: “blue collar science”

John Quackenbush

Page 36:

independent review: self-correcting science
• hostility
• hard
• resource intensive
• no funding, time, recognition, place to publish
• invisible to originators: “blue collar science”

John Quackenbush

Page 37:

“the questions don’t change but the answers do”*
• in two years’ time when the paper is written
• reviewers want additional work
• statistician wants more runs
• analysis may need to be repeated
• post-doc leaves, student arrives
• new data, revised data
• updated versions of algorithms/codes

quid pro quo citizenship
• trickle-down theory: more open, more use, more credit

*others might:
• meta-analysis
• novel discovery
• other methods

what is the point: “no one will want it”

* Dan Reed

Page 38:

emerging reproducible-system ecosystem: App Store needed!

Sweave

ReproZip

instrumented desktop tools, hosted services
packaging and archiving
repositories, catalogues
online sharing platforms
integrated authoring
integrative frameworks

XworX
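Tools in this ecosystem (ReproZip, Sweave and the like) instrument a run so that environment and inputs are captured automatically. A hand-rolled sketch of the kind of record they produce; the helper names and the toy step are illustrative, not any real tool's API:

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone

def fingerprint(data: bytes) -> str:
    """Content hash so the exact input can be verified later."""
    return hashlib.sha256(data).hexdigest()

def instrumented_run(step_name, func, data: bytes) -> dict:
    """Run one analysis step and emit a record of what ran, on what,
    and where: the sort of capture instrumented tools automate."""
    result = func(data)
    return {
        "step": step_name,
        "input_sha256": fingerprint(data),
        "result": result,
        "python": sys.version.split()[0],
        "platform": platform.system(),
        "ran_at": datetime.now(timezone.utc).isoformat(),
    }

record = instrumented_run("count_lines", lambda d: d.count(b"\n"), b"a\nb\nc\n")
print(json.dumps(record, indent=2))
```

A result shipped with such a record is "born reproducible": the reader can check the input hash and interpreter version before attempting a re-run.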

Page 39:
Page 40:
Page 41:

integrated database and journal

http://www.gigasciencejournal.com

copy-editing computational workflows: from 10 scripts + 4 modules + >20 parameters to Galaxy workflows

galaxy.cbiit.cuhk.edu.hk

from 2-3 months to 2-3 weeks

[Peter Li]

made reproducible

Page 42:

supporting data reproducibility

Data sets and analyses, linked to each other, each with a DOI.

Open-Paper: DOI:10.1186/2047-217X-1-18; >11,000 accesses
Open-Review: 8 reviewers tested the data on the ftp server and their named reports were published
Open-Pipelines, Open-Workflows: DOI:10.5524/100044
Open-Data: DOI:10.5524/100038; 78GB CC0 data
Open-Code: in SourceForge under GPLv3: http://soapdenovo2.sourceforge.net/; >5,000 downloads

Enabled the code to be picked apart by bloggers in a wiki: http://homolog.us/wiki/index.php?title=SOAPdenovo2

[Scott Edmunds]

Page 43:

Here is What I Want – The Paper As Experiment

0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results

1. User clicks on a thumbnail
2. Metadata and a webservices call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers; the composite view has links to pertinent blocks of literature text and back to the PDB

PLoS Comp. Biol. 2005 1(3) e34[Phil Bourne]

Page 44:

"A single pass approach to reducing sampling variation, removing errors, and scaling de novo assembly of shotgun sequences" http://arxiv.org/abs/1203.4802

http://ivory.idyll.org/blog/replication-i.html

born reproducible

http://ged.msu.edu/papers/2012-diginorm/

[C. Titus Brown]

Page 45:

made reproducible

[Pettifer, Attwood]

http://getutopia.com

Page 46:
Page 47:

The Research Lifecycle

IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION

Authoring Tools
Lab Notebooks
Data Capture
Software Repositories
Analysis Tools
Visualization
Scholarly Communication
Commercial & Public Tools
Git-like Resources
By Discipline
Data Journals
Discipline-Based Metadata Standards
Community Portals
Institutional Repositories
New Reward Systems
Commercial Repositories
Training

[Phil Bourne]

Page 48:

the neylon equation

Process = (Interest / Friction) × Number of people reached

Cameron Neylon, BOSC 2013, http://cameronneylon.net/
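The slide's equation layout is garbled in this transcript; one plausible reading of the fragments (the exact form is an assumption, not confirmed by the original slide) is:

```latex
\text{Process} \;=\; \frac{\text{Interest}}{\text{Friction}} \times \text{Number of people reached}
```

On this reading, lowering friction raises uptake of a reproducible process even when interest and audience stay fixed, which is the point of message #1.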

message #1: lower friction; born reproducible

Page 49:

4+1 architecture of reproducibility

“development” view, “logical” view
“process” view, “physical” view

social scenarios

Page 50:

rigour, reporting
reassembly, recognition
review, reuse
resources, responsibility
reskilling

“logical view”

Page 51:

reporting

availability

documentation

Page 52:

observations
• the strict letter of the law
• (methods) modeller/workflow makers vs (data) experimentalists
• young researchers, support from PIs
• buddy reproducibility testing, curation help
• just enough, just in time
• staff leaving and project ends
• public scrutiny, competition
• decaying local systems
• long-term safe-haven commitment
• funder commitment from the start

Page 53:

(Harris and Miller 2011) (Benkler 2011) (Thomson, Perry, and Miller 2009) (Nowak 2006) (Malone 2010) (Lusch, Vargo 2008) (Wood and Gray 1991) (Roberts and Bradley 1991) (Shrum and Chompalov 2007) (Clutton-Brock 2009) (Tenopir et al 2011) (Borgman 2012)

[Kristian Garza]

Page 54:

scientific ego-system: trust, reciprocity, collaboration to compete

fame, competitive advantage, productivity, credit, adoption, kudos, for love

blame, scooped, uncredited, misinterpretation, scrutiny, cost, loss, distraction, left behind, dependency

Fröhlich’s principles of scientific communication (1998)
Merton’s four norms of scientific behaviour (1942)

Malone, Laubacher & Dellarocas The Collective Intelligence Genome, Sloan Management Review,(2010)

Page 55:

• local investment: protective
• collective purchasing: share
• sole provider: broadcast
• trade

[Nielson] [Roffel]

local asset economies: the economics of scarce, prized commodities

(Harris and Miller 2011) (Lusch, Vargo 2008)

Page 56:

• hugging

• flirting

• voyeurism

• inertia

• sharing creep

• credit drift

• local control

• code throwaway

Tenopir et al., Data Sharing by Scientists: Practices and Perceptions, PLoS ONE 6(6) 2011

asymmetrical reciprocity

Borgman The conundrum of sharing research data, JASIST 2012

family

friends

acquaintances

strangers

rivals

ex-friends

Page 57:

10 January 2013 | Vol 493 | Nature | 159

“all research products and all scholarly labour are equally valued except by promotion and review committees”

recognition

Page 58:

message #2

citation is like ♥ not $
large data providers, infrastructure codes
“click and run” instrument platforms
make credit count

Rung, Brazma, Reuse of public genome-wide gene expression data, Nature Reviews Genetics 2012
Duck et al., bioNerDS: exploring bioinformatics’ database and software use through literature mining, BMC Bioinformatics 2013
Piwowar et al., Sharing Detailed Research Data Is Associated with Increased Citation Rate, PLoS ONE 2007

visible reciprocity contract

Page 59:

Victoria Stodden, AMP 2011 http://www.stodden.net/AMP2011/, Workshop: Reproducible Research: Tools and Strategies for Scientific Computing

Special Issue Reproducible Research Computing in Science and Engineering July/August 2012, 14(4)

Page 60:

in perpetuity

“it’s not ready yet”, “I need another publication”

shame

“it’s too ugly”, “I didn’t work out the details”

effort

“we don’t have the skills/resources”, “the reviewers don’t need it”

loss

“the student left”, “we can’t find it”

insecurity

“you wouldn’t understand it”, “I made it so no one could understand it”.

Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), SIAM News, April 2013

Page 61:

the goldilocks paradox

“the description needed to make an experiment reproducible is too much for the author and too little for the reader”

just enough just in time

José Enrique Ruiz (IAA-CSIC)

Galaxy Luminosity Profiling

Page 62:

RightField: reducing the friction of curation

1. Enrich Spreadsheet Template

2. Use in Excel or OpenOffice

3. Extract and Process

RDF Graph

http://www.rightfield.org.uk
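The three steps above (enrich a spreadsheet template, fill it in a normal spreadsheet tool, extract an RDF graph) can be sketched in miniature. This is a hand-rolled illustration, not RightField's actual format or API: the sheet text, header URIs and helper names are all hypothetical, with ontology-term URIs standing in for the template's embedded annotations:

```python
import csv
import io

# Hypothetical annotated sheet: the header row carries ontology-term URIs
# (standing in for the annotations a template would embed), data rows are
# plain values typed by a curator in Excel or OpenOffice.
SHEET = """\
sample,http://purl.obolibrary.org/obo/NCBITaxon,http://example.org/terms/duration
s1,Escherichia coli,24h
s2,Homo sapiens,48h
"""

def extract_triples(sheet_text: str):
    """Flatten each annotated cell into a (subject, predicate, object) triple."""
    rows = list(csv.reader(io.StringIO(sheet_text)))
    header, triples = rows[0], []
    for row in rows[1:]:
        subject = row[0]
        for predicate, value in zip(header[1:], row[1:]):
            triples.append((subject, predicate, value))
    return triples

for t in extract_triples(SHEET):
    print(t)
```

The curation friction drops because the semantics live in the template, not in the curator's head: filling a cell is all the annotation work there is.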

Page 63:

anonymous reuse is hard

nearly always negotiated

Page 64:

reskilling: software making practices

“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”

Zeeya Merali, Computational science: …Error… why scientific programming does not compute, Nature 467, 775-777 (2010) | doi:10.1038/467775a

Page 65:

http://sciencecodemanifesto.org/
http://matt.might.net/articles/crapl/

Page 66:

better software

better research

C Titus Brown

Greg Wilson

data carpentry

Page 67:

a word on reinventing

innovation is algorithms and methodology.

rediscovery of profile stochastic context-free grammars

(re)coding is reproducing.
reinvent what is innovative.
reuse what is utility.

Sean Eddy

author HMMER and Infernal software suites for sequence analysis

Goble, seven deadly sins of bioinformatics, 35.5K views
http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics

Page 68:

message #3 placing value on reproducibility

take action

Execution

Organisation

Metrics

Culture

Process

[Daron Green]

Page 69:

(re)assembly: gather the bits together

Find and get the bits

Bits broken/changed/lost

Have other bits

Understand the bits and how to put together

Bits won’t work together

What bit is critical?

Can I use a different tool?

Can’t operate the tool

Whose job is this?

Page 70:

specialist codes libraries, platforms, tools

service based

(cloud) hosted services

commodity platforms

data collections, catalogues

software repositories

my datamy processmy codes

integrative frameworks

gateways

Page 71:

Methods(techniques, algorithms, spec of the steps)

Instruments(codes, services, scripts, underlying libraries)

Laboratory(sw and hw infrastructure, systems software, integrative platforms)

Materials(datasets, parameters, seeds)

Experiment

repeat(re-run)

replicate(regenerate)

reproduce(recreate)

reuse(repurpose/extend)

Setup

Actors

Results

Orig / Diff

snapshot spectrum

Page 72:

interactive
local & 3rd-party independent resources
shielded heterogeneous infrastructures

BioSTIF

materials

method

instruments and laboratory

use workflows: capture the steps

standardised pipelines
auto record of experiment and set-up
report & variant reuse
buffered infrastructure
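The workflow idea above, every step and its set-up recorded automatically as a side effect of running, can be sketched in a few lines. This is a toy runner for illustration, not Taverna/Galaxy or any real engine; the class and step names are invented:

```python
import json

class Workflow:
    """Minimal pipeline runner: every step and its parameters are logged,
    so the run doubles as its own methods section."""
    def __init__(self):
        self.log = []

    def step(self, name, func, value, **params):
        out = func(value, **params)
        self.log.append({"step": name, "params": params,
                         "in": repr(value), "out": repr(out)})
        return out

wf = Workflow()
x = wf.step("normalise", lambda v, scale: [i / scale for i in v],
            [2, 4, 6], scale=2)
x = wf.step("threshold", lambda v, cutoff: [i for i in v if i >= cutoff],
            x, cutoff=2)
print(json.dumps(wf.log, indent=2))   # auto record of experiment and set-up
```

Because the log is produced by the run itself, the "transparent, step-wise method" costs nothing extra, and swapping a parameter value yields a documented reuse variant.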

Page 73:

use provenance: the link between computation and results

static verifiable record
track changes
repair
partially repeat/reproduce
carry citation
calculate data quality/trust
select data to keep/release
compare diffs/discrepancies

W3C PROV standard
PDIFF: comparing provenance traces to diagnose divergence across experimental results [Woodman et al., 2011]
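A provenance trace of this kind can be sketched as a chain of generation/usage assertions. The structure below is merely inspired by W3C PROV vocabulary (`wasGeneratedBy`, `used`); it is a hand-rolled JSON shape, not the actual PROV-JSON schema, and the helper names are invented:

```python
import hashlib
import json

def h(x) -> str:
    """Short content id for an entity, so identical values share an id."""
    return hashlib.sha256(repr(x).encode()).hexdigest()[:12]

def derive(activity: str, inputs: dict, output):
    """One PROV-style assertion: the output entity wasGeneratedBy an
    activity that used the named input entities."""
    return {
        "entity": {"id": h(output), "value": output},
        "wasGeneratedBy": activity,
        "used": [{"id": h(v), "role": k} for k, v in inputs.items()],
    }

raw = [3, 1, 2]
cleaned = sorted(raw)
mean = sum(cleaned) / len(cleaned)

trace = [
    derive("sort", {"raw": raw}, cleaned),
    derive("mean", {"cleaned": cleaned}, mean),
]
print(json.dumps(trace, indent=2))

# The trace lets a reader walk back from the result to its inputs, or
# diff two traces to localise where runs diverged (cf. PDIFF).
```

Matching ids across assertions is what makes the record verifiable: the `used` id of the second step equals the `entity` id generated by the first.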

Page 74:

“an experiment is as transparent as the visibility of its steps”

black boxes

closed codes & services, proprietary licences, magic cloud services, manual manipulations, poor provenance/version reporting, unknown peer review, mis-use, platform calculation dependencies

Joppa et al SCIENCE 340 May 2013; Morin et al Science 336 2012

Page 75:

dependencies & change
degree of self-contained preservation
open world, distributed, alien-hosted

data/software versions and accessibility hamper replication
spin-rate of versions

[Zhao et al. e-Science 2012]

“all you need to do is copy the box that the internet is in”

Page 76:

portability, variability, sameness
availability, openness
description, intelligibility

[Adapted Freire, 2013]

preservation & distribution: packaging

gather dependencies

capture steps

VM

Reproducibility framework
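The "gather dependencies" step above can be made concrete: snapshot the interpreter, OS and package versions into a manifest before freezing an analysis into a VM or archive. A minimal stdlib-only sketch (the function name is invented, and 'numpy' is just an example package name):

```python
import importlib.metadata
import json
import platform
import sys

def environment_manifest(packages):
    """Snapshot interpreter, OS and package versions: the 'gather
    dependencies' step before packaging an analysis."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None          # absent: flag it, don't guess
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }

# List whatever the analysis actually imports; 'numpy' is illustrative.
print(json.dumps(environment_manifest(["numpy"]), indent=2))
```

Shipping the manifest alongside the packaged run addresses the "spin-rate of versions" problem: even when a dependency has moved on, the exact versions used are on record.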

Page 77:

packaging bickering

byte execution

virtual machine

black box

repeat

description

archived record

white box

reproduce

data+compute co-location cloud

packaging: ELIXIR Embassy Cloud

“in-nerd-tia”

Page 78:

big data big compute

community facilities
cloud host costs and confidence

data scales: dump and file

capability

Page 79:

“the reproducible window”: all experiments become less reproducible over time


how, why and what matters; benchmarks for codes
plan to preserve; repair on demand
description persists; use frameworks
partial replication; approximate reproduction
verification; results may vary

message #4:

Sandve, Nekrutenko, Taylor, Hovig Ten simple rules for reproducible in silico research, PLoS Comp Bio submitted

Page 80:

message #5: puppies aren’t free
long-term reliability of hosts

multiple, fragmented stewardship

business models
a reproducibility service industry

24% of NAR-listed services unmaintained after three years: Schultheiss et al. (2010) PLoS Comp Bio

Page 81:

the meta-manifesto
• all X should be available and assessable forever
• the copyright of X should be clear
• X should have citable, versioned identifiers
• researchers using X should visibly credit X’s creators
• credit should be assessable and count in all assessments
• X should be curated, available, linked to all necessary materials, and intelligible

• making X reproducible/open should be from cradle to grave, continuous, routine, and easier

• tools/repositories should be made to help, be maintained and be incorporated into working practices

• researchers should be able to adapt their working practices, use resources, and be trained to reproduce

• cost and responsibility should be transparent, planned for, accounted and borne collectively

• we all should start small, be imperfect but take action. Today.

http://www.force11.org
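One reading of "citable, versioned identifiers" from the manifesto: an identifier that embeds an immutable version number plus a content checksum, so a citation pins both the release and its bytes. A sketch under that assumption; the scheme is illustrative, not any registry's actual format:

```python
# Hedged sketch of a versioned, citable identifier. A real deployment would
# use DOIs or similar; this only illustrates the version + checksum idea.
import hashlib


def versioned_id(namespace, name, version, content):
    """Mint an illustrative identifier: namespace, name, version, digest."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{namespace}:{name}.v{version}#{digest}"


v3 = versioned_id("example", "dataset42", 3, b"gene,score\nBRCA1,0.9\n")
v4 = versioned_id("example", "dataset42", 4, b"gene,score\nBRCA1,0.91\n")
# Two releases of the same dataset get distinct, stable identifiers,
# so "researchers using X" can credit exactly the version they used.
```

The checksum also lets a reader verify that what they downloaded is what was cited.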

Page 82:

research is like software

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

• evolution of a body
• fork, pull, merge
• subparts with different cycles, stewardship, authors
• refactored granularity
• software release practices for workflows, scripts, services, data and articles

• thread the salami across parts, repositories and journals

• chop up and micro-attribute (Faculty of 1000)

Page 83:

http://www.researchobject.org/

bundles and relates the digital resources of a scientific experiment or investigation using standard mechanisms

http://www.w3.org/community/rosc/
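A minimal sketch of the bundling idea, assuming only "an archive plus a manifest that names the aggregated resources"; the manifest fields are illustrative, not the Research Object specification itself:

```python
# Hedged sketch: bundle an experiment's resources into one archive whose
# manifest lists what it aggregates. File paths and fields are invented.
import io
import json
import zipfile

resources = {
    "workflow/analysis.t2flow": b"<workflow/>",
    "data/inputs.csv": b"gene,score\nBRCA1,0.9\n",
}
manifest = {
    "id": "ro-example-001",           # hypothetical identifier
    "aggregates": sorted(resources),  # what the bundle contains
}

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as bundle:
    bundle.writestr("manifest.json", json.dumps(manifest, indent=2))
    for path, blob in resources.items():
        bundle.writestr(path, blob)

# Re-open the archive to show the bundle is self-describing.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as bundle:
    names = set(bundle.namelist())
```

The point is not the zip format but the relation: one citable unit that keeps workflow, data and description together instead of scattered across hosts.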

Page 84:

towards a release app store
• checklists for descriptive reproducibility
• packaging for multi-hosted research (executable) components
• exchange between tools and researchers
• framework for research release and threaded publishing using core standards

TT43 Lounge 81

Page 85:

those messages again

• lower friction, born reproducible
• credit is like love
• take action, use (workflow) frameworks
• prepare for the reproducible window
• puppies aren’t free

Page 86:

final message

“The revolution is not an apple that falls when it is ripe. You have to make it drop.” [Che Guevara]

Page 87:

acknowledgements
• David De Roure, Tim Clark, Sean Bechhofer, Robert Stevens, Christine Borgman, Victoria Stodden, Marco Roos, Jose Enrique Ruiz del Mazo, Oscar Corcho, Ian Cottam, Steve Pettifer, Magnus Rattray, Chris Evelo, Katy Wolstencroft, Robin Williams, Pinar Alper, C. Titus Brown, Greg Wilson, Kristian Garza

• Wf4ever, SysMO, BioVel, UTOPIA and myGrid teams

• Juliana Freire, Jill Mesirov, Simon Cockell, Paolo Missier, Paul Watson, Gerhard Klimeck, Matthias Obst, Jun Zhao, Daniel Garijo, Yolanda Gil, James Taylor, Alex Pico, Sean Eddy, Cameron Neylon, Barend Mons, Kristina Hettne, Stian Soiland-Reyes, Rebecca Lawrence

Page 88:

Mr Cottam

10th anniversary today!

Page 89:

https://twitter.com/csmcr/status/361835508994813954

[Jenny Cham]

summary

Page 90:

Further Information
• myGrid – http://www.mygrid.org.uk
• Taverna – http://www.taverna.org.uk
• myExperiment – http://www.myexperiment.org
• BioCatalogue – http://www.biocatalogue.org
• SysMO-SEEK – http://www.sysmo-db.org
• RightField – http://www.rightfield.org.uk
• UTOPIA Documents – http://www.getutopia.com
• Wf4ever – http://www.wf4ever-project.org
• Software Sustainability Institute – http://www.software.ac.uk
• BioVeL – http://www.biovel.eu
• Force11 – http://www.force11.org

• http://reproducibleresearch.net
• http://reproduciblescience.org