pulverer-embo-source data-nfdp13

EMBO SourceData – Next Gen Open Access | Bernd Pulverer, Chief Editor, The EMBO Journal; Head, Scientific Publications

Upload: datadryad

Post on 27-Jan-2015


DESCRIPTION

Presentation by Bernd Pulverer on EMBO's 'Source Data' and the next generation of open access given at the Now and Future of Data Publishing Symposium, 22 May 2013, Oxford, UK

TRANSCRIPT

Page 1: Pulverer-embo-source data-nfdp13

EMBO SourceData – Next Gen Open Access

Bernd Pulverer
Chief Editor | The EMBO Journal
Head | Scientific Publications

Page 2: Pulverer-embo-source data-nfdp13

Data transparency

Page 3: Pulverer-embo-source data-nfdp13

Scientific publishing

– Dominant channel for the dissemination of peer-reviewed data.
– Journals function as a proxy for quality in research assessment.
– The rate of publishing keeps increasing.
– Papers are human-readable but poorly machine-readable.

Page 4: Pulverer-embo-source data-nfdp13

Title

Abstract

Synopsis

Main paper

Supp Info

Datasets

The Research Paper

Page 5: Pulverer-embo-source data-nfdp13

Title

Abstract

Synopsis

Main paper

Supp Info

Datasets

Expert View

The Research Paper

Page 6: Pulverer-embo-source data-nfdp13

‘Expert View’

• All the data required to support the conclusions included in the paper.
• ‘General reader’ vs. ‘expert’ view of the paper:
– Expandable/collapsible ‘inline’ sections
– Copy edited
• Restricted to select types of data and information:
– Replicates
– Controls, experimental optimization
– ‘Negative’ results
– Extended experimental protocols
– Computational algorithms
• Datasets presented as separate files.
• No further reaching data

Page 7: Pulverer-embo-source data-nfdp13

Title

Abstract

Synopsis

Main paper

Expert View

Datasets

Source data

Page 8: Pulverer-embo-source data-nfdp13

What is a figure?

A scientific result converted into a collection of pixels


Page 9: Pulverer-embo-source data-nfdp13

Discoverable, rich content

‘I’m a great believer in seeing all the data – this is a very important lever that we have for transparency’

Michael Farthing, founder COPE

Page 10: Pulverer-embo-source data-nfdp13

SourceData

Tools to publish figures as structured digital objects that link the human-readable illustrations with machine-readable metadata and ‘source data’ in order to
• improve data transparency (ethics)
• make published data (re)usable
• enable data-oriented search

Page 11: Pulverer-embo-source data-nfdp13

SourceData

Metadata
• Focus on the biological content
• Use standard identifiers and existing controlled vocabularies

Search
• Data-oriented semantic search of the literature
• Overcome some of the limitations of keyword-based search

Data
• Figure source data files hosted by the journals
• Link to data repositories

Page 12: Pulverer-embo-source data-nfdp13

•Archive

•Transparency

•Revisualization

•Reuse

•Integration

•Search

•Discourage manipulation

o voluntary
o ~40% papers

Page 13: Pulverer-embo-source data-nfdp13


Page 14: Pulverer-embo-source data-nfdp13

Data Transparency

[Figure labels: No, No, Yes, Yes]

Page 15: Pulverer-embo-source data-nfdp13
Page 16: Pulverer-embo-source data-nfdp13

SourceData

Metadata
• Focus on the biological content
• Use standard identifiers and existing controlled vocabularies

Search
• Data-oriented semantic search of the literature
• Overcome some of the limitations of keyword-based search

Data
• Figure source data files hosted by the journals
• Link to data repositories

Page 17: Pulverer-embo-source data-nfdp13

Structured metadata: ‘perturbation-observation-assay’

1. ‘Object-oriented’ representation of experimental variables: list biological components.

2. Retain the causality of the experimental design: “Measurement of Y as a function of A, B, C, using assay P in biological system S.”

3. Machine-readable representation with standard identifiers.

[Diagram labels: measured component, perturbed component, experimental system, assayed property]
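The three points above can be sketched as a small data structure. This is only an illustration, not the SourceData schema: the class names `Component` and `Experiment`, their fields, and the `example:` identifiers are hypothetical placeholders chosen to mirror the slide's wording.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    """A biological component paired with an identifier (hypothetical fields)."""
    name: str
    identifier: str  # placeholder such as "example:Y", not a real accession


@dataclass
class Experiment:
    """One figure panel: measurement of Y as a function of A, B, C."""
    measured: Component         # Y, the observed component
    perturbed: List[Component]  # A, B, C, the perturbed components
    assay: str                  # P, the assay used
    system: str                 # S, the biological system

    def describe(self) -> str:
        # Rebuild the slide's sentence from the structured fields.
        factors = ", ".join(c.name for c in self.perturbed)
        return (
            f"Measurement of {self.measured.name} as a function of {factors}, "
            f"using assay {self.assay} in biological system {self.system}."
        )


panel = Experiment(
    measured=Component("Y", "example:Y"),
    perturbed=[
        Component("A", "example:A"),
        Component("B", "example:B"),
        Component("C", "example:C"),
    ],
    assay="P",
    system="S",
)
print(panel.describe())
```

Keeping perturbed and measured components in separate fields is what preserves the causality of the design; a flat keyword list would lose it.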

Page 18: Pulverer-embo-source data-nfdp13

Data copy editors


Page 19: Pulverer-embo-source data-nfdp13

SourceData

Data
• Figure source data files hosted by the journals
• Link to data repositories

Metadata
• Focus on the biological content
• Use standard identifiers and existing controlled vocabularies

Search
• Data-oriented semantic search of the literature
• Overcome some of the limitations of keyword-based search

Page 20: Pulverer-embo-source data-nfdp13

Data-oriented search

Page 21: Pulverer-embo-source data-nfdp13

Data-oriented search

[Figure: findings from three papers linked through shared components]

Paper 1: drug Z, kinase Y, activity
Paper 2: kinase Y, protein X, P
Paper 3: gene x, tissue T, disease D

Resulting hypothesis: test drug Z in disease D.
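One way to picture how such a hypothesis could fall out of structured annotations is a path search over (perturbed, measured) pairs. The sketch below is an assumption-laden toy, not the SourceData search engine: the `FINDINGS` list, the `SAME_ENTITY` mapping (which assumes gene x encodes protein X), and the function `find_chain` are all hypothetical.

```python
from collections import deque

# (perturbed component, measured component, source), one entry per paper.
FINDINGS = [
    ("drug Z", "kinase Y", "Paper 1"),
    ("kinase Y", "protein X", "Paper 2"),
    ("gene x", "disease D", "Paper 3"),
]

# Assumption: gene x encodes protein X, so both resolve to one identifier.
SAME_ENTITY = {"gene x": "X", "protein X": "X"}


def normalize(name):
    """Map a component name to its shared identifier, if it has one."""
    return SAME_ENTITY.get(name, name)


def find_chain(findings, start, goal):
    """Breadth-first search for a chain of findings linking start to goal."""
    graph = {}
    for perturbed, measured, source in findings:
        graph.setdefault(normalize(perturbed), []).append(
            (normalize(measured), source)
        )
    queue = deque([(normalize(start), [])])
    seen = {normalize(start)}
    while queue:
        node, path = queue.popleft()
        if node == normalize(goal):
            return path
        for nxt, source in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [source]))
    return None  # no chain connects the two components


chain = find_chain(FINDINGS, "drug Z", "disease D")
print(chain)  # ['Paper 1', 'Paper 2', 'Paper 3']
```

The chain only closes because components carry normalized identifiers; with free-text names, "gene x" and "protein X" would never meet.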

Page 22: Pulverer-embo-source data-nfdp13

Data-oriented search

[Figure: an example query and its ‘more-like-this’ results; panel labels: CREB, forskolin, time]
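A ‘more-like-this’ lookup can be approximated by ranking annotated panels by how many components they share with the query. The following is a minimal sketch under that assumption; the panel IDs, the `PANELS` contents, and the use of Jaccard similarity are illustrative choices, not a description of the actual SourceData ranking.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of component names."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical annotated panels: panel ID -> components tagged on that panel.
PANELS = {
    "panel-1": {"forskolin", "CREB"},
    "panel-2": {"forskolin", "CREB", "time"},
    "panel-3": {"time", "CREB"},
    "panel-4": {"drug Z", "kinase Y"},
}


def more_like_this(query, panels):
    """Return IDs of panels sharing any component with the query, best first."""
    scored = [(jaccard(query, comps), pid) for pid, comps in panels.items()]
    hits = [(score, pid) for score, pid in scored if score > 0]
    # Sort by descending score, then by panel ID for a stable order.
    return [pid for score, pid in sorted(hits, key=lambda t: (-t[0], t[1]))]


results = more_like_this({"forskolin", "CREB"}, PANELS)
print(results)  # ['panel-1', 'panel-2', 'panel-3']
```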

Page 23: Pulverer-embo-source data-nfdp13

sdAnnotations:annotationID a sdCore:PerturbationMeasurmentExp;
    :linkedToPanel sdPanels:panelID;
    :hasVariable sdVariables:variable1;
    :hasVariable sdVariables:variable2;
    :usingBiologicalSystem sdBiolSystem:biolSystemNode;
    :basedOnSourcedataset sdSourceDatasets:dsID .

‘Next Generation’ Open Access

Data
Search
Metadata
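The annotation on this slide follows a Turtle-like pattern: one subject, a type, and a list of predicate-object pairs. A minimal sketch of generating that shape from plain Python values follows. The function `annotation_to_turtle` and its parameters are hypothetical; the prefixes and property names are copied from the slide as written (including the spelling of `PerturbationMeasurmentExp`), and prefix declarations are omitted because the slide does not show them.

```python
def annotation_to_turtle(annotation_id, panel_id, variable_ids, system_id, dataset_id):
    """Render one annotation in the shape shown on the slide."""
    # Class name spelled exactly as it appears on the slide.
    lines = [f"sdAnnotations:{annotation_id} a sdCore:PerturbationMeasurmentExp"]
    lines.append(f"    :linkedToPanel sdPanels:{panel_id}")
    for var in variable_ids:
        lines.append(f"    :hasVariable sdVariables:{var}")
    lines.append(f"    :usingBiologicalSystem sdBiolSystem:{system_id}")
    lines.append(f"    :basedOnSourcedataset sdSourceDatasets:{dataset_id}")
    # ';' separates predicates of the same subject; '.' ends the statement.
    return " ;\n".join(lines) + " ."


text = annotation_to_turtle(
    "annotationID", "panelID", ["variable1", "variable2"], "biolSystemNode", "dsID"
)
print(text)
```

Building the statement from a list of variables keeps one `:hasVariable` line per experimental variable, however many a panel has.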

Page 24: Pulverer-embo-source data-nfdp13


Page 25: Pulverer-embo-source data-nfdp13

Raw, rare, well done...?

Page 26: Pulverer-embo-source data-nfdp13
Page 27: Pulverer-embo-source data-nfdp13

From raw to processed data

Page 28: Pulverer-embo-source data-nfdp13

A data ‘ecosystem’

[Diagram labels: Author, Reader, SourceData, Journals, Data repositories; flows: paper, data, data access, search]

Page 29: Pulverer-embo-source data-nfdp13

Distributed infrastructure

[Diagram labels: Database, Journals, Users, Research data]

Page 30: Pulverer-embo-source data-nfdp13
Page 31: Pulverer-embo-source data-nfdp13
Page 32: Pulverer-embo-source data-nfdp13
Page 33: Pulverer-embo-source data-nfdp13

[Figure labels: Smad3, Hey1, TGFbeta, VE-cdh, Rad51 foci, AR, Tsc2, Rad51, Nuclear complexes, TGFb; numbered items 1 to 6]

Page 34: Pulverer-embo-source data-nfdp13

Literature search engines

PubMed: 72%
Europe PMC: <2%
Google: 17%

Page 35: Pulverer-embo-source data-nfdp13

Data are published in papers


Page 36: Pulverer-embo-source data-nfdp13

‘Publishing’ papers

‘Depositing’ datasets

Page 37: Pulverer-embo-source data-nfdp13

Availability of published data and software

• Datasets obtained by experimentation, computation or data mining, should be made freely available, without restriction.

• Software should be described in sufficient detail to allow reproduction. If a specific implementation is the focus of the study, free access for non-commercial users is strongly recommended.

• Deposition of data should preferably be in one of the public databases prior to submission.

Page 38: Pulverer-embo-source data-nfdp13

Data deposition

Large-scale datasets, sequences, atomic coordinates and computational models should be deposited in one of the relevant public databases prior to submission (provided private access is available at the database) and authors should include accession codes in the Materials & Methods section.

Page 39: Pulverer-embo-source data-nfdp13

BigData

Page 40: Pulverer-embo-source data-nfdp13

Public databases

Structural data: PDB, NDB, EMDataBank
Functional genomics: GEO, ArrayExpress
Proteomics: PRIDE, PeptideAtlas, PASSEL
PPI: IMEx consortium
Clinical genomics datasets: EGA, dbGaP
Metagenomics: GenBank
Computational models: BioModels, JWS

Page 41: Pulverer-embo-source data-nfdp13

search

Page 42: Pulverer-embo-source data-nfdp13

SourceData

Data
• Figure source data files hosted by the journals
• Link to ‘unstructured data’ repositories

Metadata
• Focus on the biological content
• Use standard identifiers and existing controlled vocabularies

Search
• Data-oriented semantic search of the literature
• Overcome some of the limitations of keyword-based search

Page 43: Pulverer-embo-source data-nfdp13


Page 44: Pulverer-embo-source data-nfdp13
Page 45: Pulverer-embo-source data-nfdp13
Page 46: Pulverer-embo-source data-nfdp13