Research data and reproducibility at Nature Philip Campbell Publishing Better Science through Better Data meeting NPG 14-11-14



Contents
Data: opportunities and costs
Reproducibility: Nature's approaches

Aspiration: all scientific literature online, all data online, and for them to interoperate
Just one idea of what this aspiration might look like: the Collage Authoring Environment for an executable paper.

Static content (the body of the publication) is extended by interactive elements. Readers can access primary data and re-enact computations in order to validate the presented conclusions or navigate result spaces. Subject to the author's approval, readers can also obtain access to the underlying code of the experiments presented in the publication. It is a web-based infrastructure, which can be integrated with the publisher's portal.

Closing the concept-data gap

Maintaining the credibility of science

Exploiting the data deluge & computational potential

Combating fraud

Addressing planetary challenges

Supporting citizen science

Responding to citizens' demands for evidence

Restraining the Database State

Why is open data an urgent issue?

Henry Oldenburg: the scientific journal and the process of peer review

Henry Oldenburg (1619-1677) was a German theologian who became the first Secretary of the Royal Society. He corresponded with the leading scientists of Europe, and believed that rather than waiting for entire books to be published, letters were much better suited to quick communication of facts or new discoveries. He invited people to write to him, even laymen who were not involved with science but had discovered some item of knowledge. He no longer required that science be conveyed in Latin, but allowed any vernacular language. From these letters the idea of printing scientific papers or articles in a scientific journal was born.

In creating the Philosophical Transactions of the Royal Society in 1665, he wrote: "It is therefore thought fit to employ the [printing] press, as the most proper way to gratify those [who] . . . delight in the advancement of Learning and profitable Discoveries [and who are] invited and encouraged to search, try, and find out new things, impart their knowledge to one another, and contribute what they can to the Grand Design of improving Natural Knowledge . . . for the Glory of God . . . and the Universal Good of Mankind."

Oldenburg also initiated the process of peer review by asking three of the Society's Fellows, who had more knowledge of the matters in question than he did, to comment on submissions before he decided whether to publish.

REF
Aaron Klug (2000), Address of the President, Sir Aaron Klug, O.M., P.R.S., Given at the Anniversary Meeting on 30 November 1999, Notes Rec. R. Soc. Lond. 54, 99-108.
Marie Boas Hall, Henry Oldenburg: Shaping the Royal Society (Oxford: Oxford University Press, 2002).

Intelligent openness
Openness of data per se has no value. Open science is more than disclosure.

Data must be:
Accessible
Intelligible
Assessable
Re-usable

Only when these four criteria are fulfilled are data properly open

METADATA
In science at least, it is clear that openness must be intelligent in order to really deliver the scrutiny that is key to scientific progress. It must be accessible, assessable, intelligible and reusable.

Metadata must be audience-sensitive

Scientific data rarely fit neatly into an Excel spreadsheet!

The transition to open data
Pathfinder disciplines where benefit is recognised and habits are changing

Bioinformatics (-omics disciplines)
Biological science
Particle physics
Nanotechnology
Environmental science
Longitudinal societal data
Astronomy & space science

e.g. Gene Expression Omnibus (GEO): 2700 GEO uploads by non-contributors in 2000 led to 1150 papers (>1000 additional papers over the 16 that would be expected from an investment of $400,000)
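
As a rough check on the GEO figures quoted above, the sketch below reproduces the arithmetic. The numbers are taken from the slide as stated; the variable names are illustrative and do not come from the original presentation.

```python
# Figures as quoted on the slide (GEO, year 2000).
papers_observed = 1150   # papers attributed to reuse of GEO data
papers_expected = 16     # papers expected from a $400,000 investment

# Additional papers beyond the expected baseline.
additional_papers = papers_observed - papers_expected

# How many times larger the observed output is than the baseline.
output_ratio = papers_observed / papers_expected

print(additional_papers)          # 1134, consistent with ">1000 additional papers"
print(round(output_ratio, 1))
```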

Databases as publications
Hosts/suppliers of databases are publishers.
They have a responsibility to curate and provide reliable access to content.
They may also deliver other services around their products.
They may provide the data as a public good or charge for access.

Worldwide Protein Data Bank (wwPDB)

The Worldwide Protein Data Bank (wwPDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids. As of January 2012, it held 78477 structures. 8120 were added in 2011, at a rate of 677 per month. In 2011, an average of 31.6 million data files were downloaded per month. The total storage requirement for the archive was 135 GB. The total cost of the project is approximately $11-12 million per year (total costs, including overhead), spread out over the four member sites. It employs 69 FTE staff. wwPDB estimates that $6-7 million of this is for "data in" expenses relating to the deposition and curation of data.
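
The wwPDB figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: the inputs are the numbers quoted on the slide, the variable names are illustrative, and dividing the annual deposition and curation cost by the structures added in 2011 is an assumption made here for illustration, not a figure reported by wwPDB.

```python
# Figures as quoted on the slide (wwPDB, 2011).
structures_added_2011 = 8120
data_in_cost_usd = (6e6, 7e6)   # deposition and curation estimate, low and high

# Average monthly deposition rate; the slide rounds this to 677.
added_per_month = structures_added_2011 / 12

# Assumption: a crude deposition-and-curation cost per structure added in 2011,
# obtained by dividing the annual estimate by the yearly additions.
data_in_per_structure = [cost / structures_added_2011 for cost in data_in_cost_usd]

print(round(added_per_month))
print([round(cost) for cost in data_in_per_structure])
```

On these assumptions, the deposition and curation estimate works out to several hundred dollars per newly added structure.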

UK Data Archive
The UK Data Archive (UKDA), founded in 1967, is curator of the largest collection of digital data in the social sciences in the United Kingdom. UKDA is funded mainly by the Economic and Social Research Council, the University of Essex and JISC, and is hosted at the University of Essex.
On average around 2,600 (new or revised) files are uploaded to the repository monthly. (This includes file packages, so the absolute number of files is higher.) The baseline size of the main storage repository is