Data models for preserving and publishing digital research material beyond the PDF


Upload: leiden-university-medical-center

Post on 10-May-2015


DESCRIPTION

Slides for the Technology Track of ISMB/ECCB 2013 in Berlin on digital publishing, highlighting the Research Object model, Nanopublications, and ISA as a means to capture methods and results when research is carried out digitally. This work was supported by the EU workflow forever project (http://wf4ever-project.org).

TRANSCRIPT

Page 1: Data models for preserving and publishing digital research material beyond the PDF

Data models for digital preservation and publishing beyond the PDF

Jun Zhao, Mark Thompson, Kristina Hettne, Stian Soiland, Susana Garcia, Marco Roos

Acknowledging Harish Dharuri, Susanna Sansone, Philipe Rocca-Sera, Alejandra Gonzales-Beltran, Albert Mons, Arie Baak, Erik Schultes, Carole Goble, Barend Mons

The Workflow Forever project (EU FP7 nr. 270192), Digital Libraries and Digital Preservation. (ICT-2009.4.1)

Page 2: Data models for preserving and publishing digital research material beyond the PDF

Recording your computational steps…

Bioinformaticians have no lab books! And no training in digital note-keeping.

http://graemefielder.wordpress.com/2010/09/17/lab-books-evolution-required/

Page 3: Data models for preserving and publishing digital research material beyond the PDF

State of the art study capture?

Page 4: Data models for preserving and publishing digital research material beyond the PDF

How then? Workflows encapsulate in silico analysis

http://ap27-cgla.blogspot.nl/ http://openi.nlm.nih.gov/detailedresult.php?img=2743669_1471-2105-10-252-2&req=4

Page 5: Data models for preserving and publishing digital research material beyond the PDF

Components to understand an experiment: is a workflow enough?

Research Question (Genome Wide Association Studies, GWAS): In 1000+ people, which gene mutations are associated with metabolic syndrome, and why?

Hypothesis: Genes involved in inflammation pathways are involved in the onset of metabolic syndrome.

Download data:
– External DB
– Existing Knowledge

Workflow: Which biological pathways explain the associations?

Interpret results (interaction pathways in the cell)

Page 6: Data models for preserving and publishing digital research material beyond the PDF

Components to understand an experiment: is a workflow enough?

The diagram from the previous slide, now with a "Preserve" label on each of its five components:
– Research Question (GWAS): In 1000+ people, which gene mutations are associated with metabolic syndrome, and why?
– Hypothesis: Genes involved in inflammation pathways are involved in the onset of metabolic syndrome.
– Download data (External DB, Existing Knowledge)
– Workflow: Which biological pathways explain the associations?
– Interpret results (interaction pathways in the cell)

Page 7: Data models for preserving and publishing digital research material beyond the PDF

Capture more than workflows

Research Object

Types of resources:
– Data
– Method/Experimental protocol
– Findings

Data Models:
– ISA-TAB/ISA2OWL
– Wfdesc
– Nanopublication

Page 8: Data models for preserving and publishing digital research material beyond the PDF

Research Object Model: preservation for understanding

Preserve at least the:

– Hypothesis

– A workflow-like sketch

– One or more workflows

– Input data

– Workflow runs

– Results

– Conclusion

My Research Book
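
The minimum contents listed on this slide can be read as a simple completeness check. The sketch below is only an illustration of that idea: the component names, the file names, and the `missing_components` helper are assumptions made for the example, and it is not the Wf4Ever checklist service or its API.

```python
# Component types taken from the slide's "preserve at least" list.
REQUIRED_COMPONENTS = (
    "hypothesis",
    "sketch",
    "workflow",
    "input_data",
    "workflow_run",
    "results",
    "conclusion",
)


def missing_components(resources):
    """Return the required component types that no resource provides.

    `resources` maps a file path inside the Research Object to the
    component type it plays (an illustrative layout, not a Wf4Ever format).
    """
    present = set(resources.values())
    return [c for c in REQUIRED_COMPONENTS if c not in present]


# Hypothetical contents of a GWAS Research Object that is still in progress.
gwas_ro = {
    "hypothesis.txt": "hypothesis",
    "sketch.png": "sketch",
    "pathway_analysis.t2flow": "workflow",
    "gwas_associations.csv": "input_data",
    "pathways.csv": "results",
}

print(missing_components(gwas_ro))  # ['workflow_run', 'conclusion']
```

Keeping the required list as ordered data means the report names the gaps in the same order as the slide.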

Page 9: Data models for preserving and publishing digital research material beyond the PDF

Fame and Glory

"It was me, me, me!"

• What I found: HDAC1 interacts with Parvb
• How I found it
• Discovered by: me

Nanopublication:
• Assertion
• Provenance of Assertion
• Metadata of nanopublication

Page 10: Data models for preserving and publishing digital research material beyond the PDF

Prototyping the models

• Create: myExperiment
• Better: Checklist service
• Evolution: Digital Library software
• Curation: Quality Monitoring Service
• Credit original assertions: LandMark Tool
• Applications by private partners

Page 11: Data models for preserving and publishing digital research material beyond the PDF

myExperiment: create Research Objects

Prototyping the Research Object Data Model

Page 12: Data models for preserving and publishing digital research material beyond the PDF
Page 13: Data models for preserving and publishing digital research material beyond the PDF
Page 14: Data models for preserving and publishing digital research material beyond the PDF
Page 15: Data models for preserving and publishing digital research material beyond the PDF
Page 16: Data models for preserving and publishing digital research material beyond the PDF
Page 17: Data models for preserving and publishing digital research material beyond the PDF
Page 18: Data models for preserving and publishing digital research material beyond the PDF
Page 19: Data models for preserving and publishing digital research material beyond the PDF
Page 20: Data models for preserving and publishing digital research material beyond the PDF
Page 21: Data models for preserving and publishing digital research material beyond the PDF
Page 22: Data models for preserving and publishing digital research material beyond the PDF
Page 23: Data models for preserving and publishing digital research material beyond the PDF
Page 24: Data models for preserving and publishing digital research material beyond the PDF
Page 25: Data models for preserving and publishing digital research material beyond the PDF
Page 26: Data models for preserving and publishing digital research material beyond the PDF

Checklist service: make better Research Objects

Prototyping the Research Object Data Model

Page 27: Data models for preserving and publishing digital research material beyond the PDF
Page 28: Data models for preserving and publishing digital research material beyond the PDF

http://www.wf4ever-project.org/wiki/display/docs/RO+checklist+evaluation+API

Page 29: Data models for preserving and publishing digital research material beyond the PDF

http://www.wf4ever-project.org/wiki/display/docs/RO+checklist+evaluation+API

Page 30: Data models for preserving and publishing digital research material beyond the PDF

RELEASE! http://www.wf4ever-project.org/wiki/display/docs/RO+checklist+evaluation+API

Page 31: Data models for preserving and publishing digital research material beyond the PDF

Digital Library software: evolution of a Research Object

Prototyping the Research Object Data Model

Pages 32-34: Data models for preserving and publishing digital research material beyond the PDF
Page 35: Data models for preserving and publishing digital research material beyond the PDF

Research Object ‘under construction’

Page 36: Data models for preserving and publishing digital research material beyond the PDF

Snapshots to record intermediate states

Page 37: Data models for preserving and publishing digital research material beyond the PDF

Full copy ‘Ready for Release’
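
The three stages on these slides ("under construction", snapshots, and a full copy "ready for release") can be sketched as copies of a live object. This is a minimal illustration under stated assumptions: the `ResearchObject` class, its state names, and the file names are hypothetical and do not reproduce the Digital Library software or the Wf4Ever evolution model.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ResearchObject:
    name: str
    resources: dict = field(default_factory=dict)
    state: str = "live"  # the Research Object 'under construction'

    def snapshot(self, label):
        """Freeze a copy of the current contents as an intermediate state."""
        frozen = copy.deepcopy(self)
        frozen.name = f"{self.name}@{label}"
        frozen.state = "snapshot"
        return frozen

    def release(self, version):
        """Make a full copy that is marked as ready for release."""
        released = copy.deepcopy(self)
        released.name = f"{self.name}-v{version}"
        released.state = "released"
        return released


# Hypothetical working session.
live = ResearchObject("gwas-study", {"workflow.t2flow": "draft 1"})
snap = live.snapshot("2013-07-01")

# Work continues on the live object; the snapshot is unaffected.
live.resources["workflow.t2flow"] = "draft 2"
live.resources["results.csv"] = "pathway table"
final = live.release("1.0")

print(snap.name, sorted(snap.resources))
print(final.name, sorted(final.resources))
```

Deep copies are used so that later edits to the live object cannot change a snapshot or a release.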

Page 38: Data models for preserving and publishing digital research material beyond the PDF

Quality Monitoring Service: long-term curation

Prototyping the Research Object Data Model

Pages 39-44: Data models for preserving and publishing digital research material beyond the PDF
Page 45: Data models for preserving and publishing digital research material beyond the PDF

Landmark Claim Tool: mark and credit the first discovery

Prototyping the Nanopublication Model

Page 46: Data models for preserving and publishing digital research material beyond the PDF

Landmark Claim Tool

Core data

Attribution

Qualification

Page 47: Data models for preserving and publishing digital research material beyond the PDF

Applications from private partners: robust tools for business stakeholders

Prototyping the Nanopublication Model

Page 48: Data models for preserving and publishing digital research material beyond the PDF

Nanopublication applications: Euretos Company

Releases planned for 2014

Copyright Euretos b.v. 2013

Page 49: Data models for preserving and publishing digital research material beyond the PDF

Some gory detail: data models ‘under the hood’

Page 50: Data models for preserving and publishing digital research material beyond the PDF

Research Object Model at a glance

• Research Object ore:aggregates Resources and Annotations
• Research Object ore:isDescribedBy Manifest
• Annotation oa:hasTarget Resource
• Annotation oa:hasBody Annotation graph

For more information and extensions (Evolution model, MINIM) see http://wf4ever-project.org/
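
The relations named on this slide can be written out as RDF triples. The sketch below uses plain Python tuples rather than an RDF library and prints N-Triples. The property names come from the slide; the `http://example.org/` URIs, the file names, the `.ro/` layout, and the helper functions are assumptions made for illustration only.

```python
ORE = "http://www.openarchives.org/ore/terms/"
OA = "http://www.w3.org/ns/oa#"


def research_object_triples(base, resources, annotations):
    """Build (subject, predicate, object) triples for one Research Object.

    `resources` is a list of paths relative to `base`; `annotations` is a
    list of (target path, annotation-graph path) pairs.
    """
    manifest = base + ".ro/manifest.rdf"  # assumed manifest location
    triples = [(base, ORE + "isDescribedBy", manifest)]
    for path in resources:
        triples.append((base, ORE + "aggregates", base + path))
    for number, (target, body) in enumerate(annotations, start=1):
        annotation = f"{base}.ro/annotations/{number}"
        triples.append((base, ORE + "aggregates", annotation))
        triples.append((annotation, OA + "hasTarget", base + target))
        triples.append((annotation, OA + "hasBody", base + body))
    return triples


def to_ntriples(triples):
    """Serialise IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical Research Object with two resources and one annotation.
BASE = "http://example.org/ro/gwas-study/"
triples = research_object_triples(
    BASE,
    resources=["workflow.t2flow", "input.csv"],
    annotations=[("workflow.t2flow", ".ro/annotations/workflow-description.ttl")],
)
print(to_ntriples(triples))
```

Each annotation is itself aggregated by the Research Object, so a client can find the annotations for any resource by following `oa:hasTarget` backwards.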

Page 51: Data models for preserving and publishing digital research material beyond the PDF

Extensions

Page 52: Data models for preserving and publishing digital research material beyond the PDF

Wf4Ever architecture

• Portal, Checklist service, Command line, Workflow runner, ...
• Semantic REST API
• RDF triple store (RO structure, Annotations), RO index, Uploaded files

Page 53: Data models for preserving and publishing digital research material beyond the PDF

Nanopublication Data Model

Nanopublication URL: unique, persistent and resolvable identifier

Integrity Key: guarantee immutability after publication

Assertion: an individual association between concepts
• statement or declaration
• measurement
• hypothetical inference
• quantitative or qualitative

Example assertion:
– association a sio:statisticalAssociation
– association sio:has-measurementValue Association_1_p_value
– Association_1_p_value a sio:probability-value
– Association_1_p_value sio:has-value 6.56e-5^^xsd:float
– sio:refers-to http://bio2rdf.org/omim:210600
– …http://bio2rdf.org/geneid:55835

Provenance: how this assertion came to be, methods, evidence, context, etc.
– assertion opm:wasDerivedFrom http://rdf.biosemantics.org/…profiles_matching_1980_2010
– opm:wasGeneratedBy

PublicationInfo
• Detailed attribution for authors, institutions, lab technicians, curators
• License info
• Publication date

Example publication info:
– thisnanopub dcterms:created 2012-03-28T11:32^^xsd:dateTime
– pav:authoredBy researcherid.com/rB-6035-2012
– dcterms:DOI http://dx.doi.org/….
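
The three parts of the model (Assertion, Provenance, PublicationInfo) are commonly held as separate named graphs tied together by a small head graph. The sketch below illustrates that layout with plain Python and prints TriG-like text without prefix declarations. The property labels and values are copied from the slide as written; the `np:` terms follow the nanopublication schema; the `http://example.org/` identifiers, the workflow-run and author URIs, and the `Nanopublication` class are assumptions for illustration.

```python
from dataclasses import dataclass, field


def iri(value):
    """Wrap a full IRI in angle brackets, Turtle style."""
    return f"<{value}>"


@dataclass
class Nanopublication:
    uri: str
    assertion: list = field(default_factory=list)
    provenance: list = field(default_factory=list)
    pubinfo: list = field(default_factory=list)

    def graphs(self):
        """Return the four named graphs, keyed by graph IRI."""
        a_graph = iri(self.uri + "#assertion")
        p_graph = iri(self.uri + "#provenance")
        i_graph = iri(self.uri + "#pubinfo")
        head = [
            (iri(self.uri), "a", "np:Nanopublication"),
            (iri(self.uri), "np:hasAssertion", a_graph),
            (iri(self.uri), "np:hasProvenance", p_graph),
            (iri(self.uri), "np:hasPublicationInfo", i_graph),
        ]
        return {
            iri(self.uri + "#head"): head,
            a_graph: self.assertion,
            p_graph: self.provenance,
            i_graph: self.pubinfo,
        }

    def to_trig(self):
        """Serialise the graphs as TriG-like text (prefixes not declared)."""
        lines = []
        for name, triples in self.graphs().items():
            lines.append(f"{name} {{")
            lines.extend(f"  {s} {p} {o} ." for s, p, o in triples)
            lines.append("}")
        return "\n".join(lines)


base = "http://example.org/nanopub/1"  # hypothetical identifier
pub = Nanopublication(base)
assoc = iri(base + "#association")
pval = iri(base + "#Association_1_p_value")
pub.assertion = [
    (assoc, "a", "sio:statisticalAssociation"),
    (assoc, "sio:has-measurementValue", pval),
    (pval, "a", "sio:probability-value"),
    (pval, "sio:has-value", '"6.56e-5"^^xsd:float'),
    (assoc, "sio:refers-to", iri("http://bio2rdf.org/omim:210600")),
    (assoc, "sio:refers-to", iri("http://bio2rdf.org/geneid:55835")),
]
pub.provenance = [
    # Hypothetical workflow run; the slide does not show this object.
    (iri(base + "#assertion"), "opm:wasGeneratedBy", iri("http://example.org/run/42")),
]
pub.pubinfo = [
    # Seconds added to the slide's timestamp to form a valid xsd:dateTime.
    (iri(base), "dcterms:created", '"2012-03-28T11:32:00"^^xsd:dateTime'),
    (iri(base), "pav:authoredBy", iri("http://example.org/researcher/me")),
]
print(pub.to_trig())
```

Keeping provenance and publication info in their own graphs lets them describe the assertion graph and the nanopublication as wholes without altering the assertion itself.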

Page 54: Data models for preserving and publishing digital research material beyond the PDF

Nanopublication: http://www.store.net/mynanopub.rdf
• Assertion: SoapDenovo 2 increases correct assembly length by 3-80 times over SoapDenovo 1
• Provenance
• Publication-Info: pav:authoredBy, dc:rights, dc:created

Research object (ro:aggregates):
• A Galaxy workflow
• results
• slides
• hypothesis

Research object can link to a nanopub as an experimental result (ro:aggregates)

Page 55: Data models for preserving and publishing digital research material beyond the PDF

Nanopublication: http://www.store.net/mynanopub.rdf
• Assertion: SoapDenovo 2 increases correct assembly length by 3-80 times over SoapDenovo 1
• Provenance
• Publication-Info: pav:authoredBy, dc:rights, dc:created

Research object (ro:aggregates):
• A Galaxy workflow
• results
• slides
• hypothesis

Nanopublication gains detailed workflow provenance by linking to RO (ro:aggregates, rdf:describedBy)
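
The two-way link on these slides can be sketched as a handful of triples plus a lookup. The property labels `ro:aggregates` and `rdf:describedBy` are copied from the slides as written, and the nanopublication URL is the one shown there. The Research Object URI, the file names, the choice of the provenance part as the subject of `rdf:describedBy`, and both helper functions are assumptions for illustration.

```python
NANOPUB = "http://www.store.net/mynanopub.rdf"  # URL shown on the slides
RO = "http://example.org/ro/assembly-study/"    # hypothetical RO URI

triples = [
    (RO, "ro:aggregates", RO + "galaxy-workflow.ga"),
    (RO, "ro:aggregates", RO + "results/"),
    (RO, "ro:aggregates", RO + "slides.pdf"),
    (RO, "ro:aggregates", RO + "hypothesis.txt"),
    # The Research Object links to the nanopub as an experimental result.
    (RO, "ro:aggregates", NANOPUB),
    # Assumed direction: the nanopub's provenance points back at the RO.
    (NANOPUB + "#provenance", "rdf:describedBy", RO),
]


def objects(graph, subject, predicate):
    """Return every object of the matching (subject, predicate) pairs."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def workflow_provenance(graph, nanopub):
    """List the resources a nanopub can reach through its Research Object."""
    reachable = []
    for research_object in objects(graph, nanopub + "#provenance", "rdf:describedBy"):
        for resource in objects(graph, research_object, "ro:aggregates"):
            if resource != nanopub:
                reachable.append(resource)
    return reachable


for resource in workflow_provenance(triples, NANOPUB):
    print(resource)
```

Following the link from the nanopublication side yields the workflow, results, slides, and hypothesis, which is the detailed provenance the slide refers to.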

Page 56: Data models for preserving and publishing digital research material beyond the PDF

Nanopublication: http://www.store.net/mynanopub.rdf
• Assertion: SoapDenovo 2 increases correct assembly length by 3-80 times over SoapDenovo 1
• Provenance
• Publication-Info: pav:authoredBy, dc:rights, dc:created

Research object (ro:aggregates):
• A Galaxy workflow
• results (ro:aggregates)
• slides
• hypothesis

Extend your provenance! E.g. link the claim to the original data elements from which it was derived (rdf:describedBy)

Page 57: Data models for preserving and publishing digital research material beyond the PDF

Nanopublication: http://www.store.net/mynanopub.rdf
• Assertion: SoapDenovo 2 increases correct assembly length by 3-80 times over SoapDenovo 1
• Provenance
• Publication-Info: pav:authoredBy, dc:rights, dc:created

Research object (ro:aggregates):
• A Galaxy workflow
• results (ro:aggregates)
• slides
• hypothesis

? rdf:describedBy

Page 58: Data models for preserving and publishing digital research material beyond the PDF

Community effort

• Research Objects: http://researchobjects.org/ http://wf4ever-project.org/

• Nanopublication: http://Nanopub.org/

• ISA-tools: http://www.isa-tools.org/

• Research Objects Community Group at W3C: http://w3.org/community/rosc

Page 59: Data models for preserving and publishing digital research material beyond the PDF

W3C community group for RO: http://www.w3.org/community/rosc/

Page 60: Data models for preserving and publishing digital research material beyond the PDF

Conclusions (1/2)

• Applications of RO and Nanopublication data models to capture the bioinformatics research process ‘beyond the PDF’

• Data models: ISA, Research Objects, Nanopublications

Page 61: Data models for preserving and publishing digital research material beyond the PDF

Conclusions (2/2)

• Reference implementations / first to adopt: myExperiment, DLibra, Checklist service, Curation/monitoring, Landmark tool

• Private partners developing stable nanopublication applications

• Prevent perfectionism of the developers: get involved now!

Page 62: Data models for preserving and publishing digital research material beyond the PDF

THANK YOU FOR YOUR ATTENTION

http://researchobject.org/ http://nanopub.org/ http://isa-tools.org/ Research Object Community group at W3C: http://w3.org/community/rosc