Scientific Data Management -
From the Lab to the Web
Semantic Data Management, Dagstuhl Seminar, 22-27 April 2012
José Manuel Gómez Pérez, iSOCO
www.wf4ever-project.org
Some facts
The data deluge
Source: IDC's The 2011 Digital Universe Study – Extracting Value from Chaos
» In 2010 the size of the digital universe exceeded 1 zettabyte (= 1 trillion GB)
» 1.8 ZB in 2011
» 35 ZB expected in 2020
» 90% unstructured data
» 70% user-generated
» 75% resulting from data copying, merging, and transforming
» Metadata is the fastest growing data category
» Much of such data is dynamic, real-time, volatile
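To make the scale of these figures concrete, the arithmetic can be sketched as below. The sketch assumes decimal (SI) units and steady compound growth between the 2011 and 2020 figures; the growth rate it derives is an illustration, not a number taken from the IDC study.

```python
# Decimal (SI) units are assumed: 1 GB = 10**9 bytes, 1 ZB = 10**21 bytes.
GB = 10**9
ZB = 10**21

# 1 ZB expressed in GB: one trillion (10**12), as stated on the slide.
gb_per_zb = ZB // GB

# Figures quoted on the slide.
size_2011_zb = 1.8
size_2020_zb = 35.0
years = 2020 - 2011

# Implied compound annual growth rate. This is an illustrative derivation
# that assumes steady growth between the two quoted figures.
annual_growth = (size_2020_zb / size_2011_zb) ** (1 / years) - 1

print(f"1 ZB = {gb_per_zb:,} GB")
print(f"Implied annual growth 2011-2020: {annual_growth:.1%}")
```

Under these assumptions the digital universe would have to grow by roughly 39% per year to reach the 2020 estimate.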
Two main challenges
Dealing with dynamicity
» Challenge 1: Identifying and structuring the relevant portions of the data for the task at hand
› First-class data citizens
» Challenge 2: Managing the lifecycle of data entities
› Preservation
› Evolution and versioning
› Decay
Both technical and social aspects involved
Workflows in the Scientific Method
The Research Lifecycle
Example: Genome-Wide Association Studies
[Figure: research lifecycle diagram. Labels: Background, Hypothesis, Assumptions, Input data, Method, Experiment Results (data), Scientific Interpretation, Publication, Results (Data)]
Workflow-based Science
What is a Scientific Workflow?
» A mechanism for coordinating the execution of services and linking together resources.
» The combination of data and processes into a configurable, structured set of steps that implement semi-automated computational solutions in scientific problem-solving
Scientific workflows are at the core of scientific data management
› Enable automation
› Encourage best practices
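The definition above (data and processes combined into a configurable, structured set of steps) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular workflow system works: the `Step` class, the `run_workflow` function and the two sample steps are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """One workflow step: named inputs, one named output, and a process."""
    name: str
    inputs: List[str]
    output: str
    run: Callable[..., Any]


def run_workflow(steps: List[Step], data: Dict[str, Any]) -> Dict[str, Any]:
    """Execute the steps in order, passing results along by name."""
    results = dict(data)  # copy so the caller's input data is not modified
    for step in steps:
        args = [results[key] for key in step.inputs]
        results[step.output] = step.run(*args)
    return results


# A hypothetical two-step workflow: drop missing values, then average.
steps = [
    Step("filter", ["raw"], "clean", lambda xs: [x for x in xs if x is not None]),
    Step("average", ["clean"], "mean", lambda xs: sum(xs) / len(xs)),
]

out = run_workflow(steps, {"raw": [1.0, None, 3.0]})
print(out["mean"])
```

Because every step names its inputs and outputs, the intermediate results stay available for inspection, which is where automation and provenance capture meet.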
Challenge 1
Identifying and structuring the relevant portions of the data for the task at hand
First-class data citizens
Questions for Scientific Data and Workflows, with the issues each one raises
» Who are you? Where and when were you born? Who were your parents (creators)?
› Issues: Identity and Description, Authenticity, Uniqueness
» For which purpose were you conceived, and how have you been used?
› Issues: Reuse, Repurpose
» What do you have inside?
› Issues: Inspection, Visualization, Annotations
» How is your content linked?
› Issues: Graphical Representation
» May I access all your parts?
› Issues: Access Rights
» Which parts can I replace?
› Issues: Adaptability
» What have they done to you? Who and when? Why did they do that?
› Issues: Provenance, Versioning
» Why have you been recommended to me? Can I believe what you are saying or trust your results?
› Issues: Information Quality
» Do you still produce the same results?
› Issues: Reproducibility
» Are you still working? How could I repair you?
› Issues: Completeness, Stability
» How could I thank you? How could I talk about you?
› Issues: Credit
Research Objects as Technical Objects
Challenge 1: Identifying and structuring the relevant data
Carriers of Research Context
» Referentiable
» Aggregation, Dispersed
› Heterogeneous
› Local and External
» Annotated metadata
› Provenance
› Structured: Manifests, Recipes, Permissions, Discourse
» Lifecycle
› Publishing, Evolution
› Versioning
» Mixed Stewardship
› Graceful Degradation
» Sharing
» Security & Privacy
» Stereotypical User Profiles
» Services
[Figure labels: Distributed Third Party Tenancy; Alien Store; Technical Objects; Social Objects; OAI-ORE]
Research Objects as Social Objects
Package, Explore, Inspect, Review, Exchange, Share, Reuse, Publish, Credit
Research Object model core (simplified)
http://purl.org/wf4ever/ro#
[Figure: simplified RO core diagram. Classes: ro:ResearchObject, ro:Manifest, ro:Resource, ro:AggregatedAnnotation, wfdesc:Workflow. Properties: ore:aggregates, ore:isDescribedBy, ro:annotatesAggregatedResource]
Note: This figure shows a simplified view of the RO core.
RO specification: http://wf4ever.github.com/ro
› ro (aggregation and annotation)
› wfdesc (workflow description)
› Minim* (minimum info model)
› wfprov (workflow provenance)
› roprov (RO provenance)
› roevo (evolution model)
*Minim based on M. Gamble’s MIM
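The terms of the RO core can be illustrated with a handful of subject-predicate-object statements. This is a sketch under assumptions: the way the classes and properties are wired together is inferred from the names on the slide, the `ex:` identifiers are hypothetical, and plain Python tuples stand in for a real RDF library. The RO specification remains the authority on the actual model.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Namespace given on the slide; the prefixed names below abbreviate it.
RO_NS = "http://purl.org/wf4ever/ro#"

# Hypothetical research object with one workflow, one data file,
# and one annotation about the data file.
ro_id = "ex:my-ro"
triples: Set[Triple] = {
    (ro_id, "rdf:type", "ro:ResearchObject"),
    (ro_id, "ore:isDescribedBy", "ex:my-ro/manifest"),
    ("ex:my-ro/manifest", "rdf:type", "ro:Manifest"),
    (ro_id, "ore:aggregates", "ex:my-ro/workflow"),
    ("ex:my-ro/workflow", "rdf:type", "wfdesc:Workflow"),
    (ro_id, "ore:aggregates", "ex:my-ro/input.csv"),
    ("ex:my-ro/input.csv", "rdf:type", "ro:Resource"),
    ("ex:my-ro/note-1", "rdf:type", "ro:AggregatedAnnotation"),
    ("ex:my-ro/note-1", "ro:annotatesAggregatedResource", "ex:my-ro/input.csv"),
}


def objects(subject: str, predicate: str) -> List[str]:
    """All objects of the given subject and predicate, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def annotations_of(resource: str) -> List[str]:
    """All annotations that point at the given aggregated resource."""
    return sorted(
        s for s, p, o in triples
        if p == "ro:annotatesAggregatedResource" and o == resource
    )


print(objects(ro_id, "ore:aggregates"))
print(annotations_of("ex:my-ro/input.csv"))
```

Asking what a research object aggregates and which annotations describe a resource corresponds to the "What do you have inside?" question raised earlier.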
Challenge 2
Managing the lifecycle of data entities
Evolution and Decay
RO Evolution & Versioning
Challenge 2: Managing the lifecycle of data entities
RO Decay
Challenge 2: Managing the lifecycle of data entities
Workflow Decay
• Component level
• flux/decay/unavailability
• Data level
• Infrastructure level
Experiment Decay
• Methodological changes
• New technologies
• New resources/components
• New data
Preservation, Conservation, Recreating
Preserving: Archived Record, Fixed Snapshots, Review, Rerun & Replay
Conserving: Active Instrument, Live, Rerun & Reuse, Repair & Restore
Recreating: Archived Record, Active Instrument, Live, Rebuild, Recycle, Repurpose
Possible types of decay (an example)
Challenge 2: Managing the lifecycle of data entities
A Taxonomy of RO decay
Decay Analysis
1. Service tool is missing
2. Service file descriptor disappeared
3. Service up but not contactable
4. Service up but functionality changed
5. Local software dependencies
6. Data unavailability
7. Changes in data formats
8. Chained dependency
9. Credentials deprecated
10. Input data superseded by other data
11. RO metadata outdated (upon versioning)
12. Old-fashioned RO
13. External references lose credit
14. Execution framework no longer available
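One way to make the taxonomy usable by tooling is to encode it as an enumeration, so that decay reports can refer to each type by number and label. This is a sketch only: the member names and the `decay_report` helper are hypothetical, and only the numbers and labels come from the list above.

```python
from enum import Enum
from typing import Iterable, List


class RODecay(Enum):
    """The 14 decay types listed above; member names are illustrative."""
    SERVICE_TOOL_MISSING = (1, "Service tool is missing")
    SERVICE_DESCRIPTOR_DISAPPEARED = (2, "Service file descriptor disappeared")
    SERVICE_NOT_CONTACTABLE = (3, "Service up but not contactable")
    SERVICE_FUNCTIONALITY_CHANGED = (4, "Service up but functionality changed")
    LOCAL_SOFTWARE_DEPENDENCIES = (5, "Local software dependencies")
    DATA_UNAVAILABILITY = (6, "Data unavailability")
    DATA_FORMAT_CHANGES = (7, "Changes in data formats")
    CHAINED_DEPENDENCY = (8, "Chained dependency")
    CREDENTIALS_DEPRECATED = (9, "Credentials deprecated")
    INPUT_DATA_SUPERSEDED = (10, "Input data superseded by other data")
    METADATA_OUTDATED = (11, "RO metadata outdated (upon versioning)")
    OLD_FASHIONED_RO = (12, "Old-fashioned RO")
    EXTERNAL_REFERENCES_LOSE_CREDIT = (13, "External references lose credit")
    EXECUTION_FRAMEWORK_UNAVAILABLE = (14, "Execution framework no longer available")

    def __init__(self, number: int, label: str) -> None:
        self.number = number
        self.label = label


def decay_report(observed: Iterable[RODecay]) -> List[str]:
    """Deduplicate observed decay types and list them in taxonomy order."""
    unique = sorted(set(observed), key=lambda decay: decay.number)
    return [f"{decay.number}. {decay.label}" for decay in unique]


report = decay_report([
    RODecay.DATA_UNAVAILABILITY,
    RODecay.SERVICE_TOOL_MISSING,
    RODecay.DATA_UNAVAILABILITY,
])
print("\n".join(report))
```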
Sample decay type
A taxonomy of workflow decay
1.0 Certificate – Evaluation of Stability and Completeness
Decay Analysis
Stability
Is the RO free from any form of decay preventing workflow execution?
» Focus on reproducibility
» Assisted detection of RO decay
» Active monitoring of decay forms
» RO and workflow provenance
Completeness
Is the minimal aggregation of resources encapsulated by the RO consistent?
» RO checklists
» Produced by scientists
» Automatically checked against minimal model (minim)
» RO evolution
1.0 Certificate of quality
» Notification
» Explanation
1.0 Certificate notion originally proposed by Yde de Jong
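The two questions behind the certificate can be sketched as a small check: stability holds when no decay blocks workflow execution, and completeness holds when every resource required by a checklist is aggregated by the RO. This is an illustration under assumptions, not the Minim model or any Wf4Ever tool; the `Certificate` class, the `evaluate` function and the sample checklist are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Certificate:
    stable: bool            # no decay prevents workflow execution
    complete: bool          # every checklist item is aggregated by the RO
    explanation: List[str]  # reasons to report back to the scientist

    @property
    def granted(self) -> bool:
        return self.stable and self.complete


def evaluate(aggregated: Set[str],
             checklist: Dict[str, str],
             blocking_decay: List[str]) -> Certificate:
    """Check an RO against a checklist and a list of detected decay.

    `checklist` maps each required resource to the reason it is required.
    """
    explanation = [f"Decay: {decay}" for decay in blocking_decay]
    missing = sorted(item for item in checklist if item not in aggregated)
    explanation += [f"Missing: {item} ({checklist[item]})" for item in missing]
    return Certificate(
        stable=not blocking_decay,
        complete=not missing,
        explanation=explanation,
    )


# Hypothetical checklist written by a scientist.
checklist = {
    "workflow": "needed to rerun the experiment",
    "input data": "needed to reproduce the results",
    "hypothesis": "states the research question",
}

cert = evaluate(
    aggregated={"workflow", "input data"},
    checklist=checklist,
    blocking_decay=["Service up but not contactable"],
)
print(cert.granted)
for line in cert.explanation:
    print(line)
```

Returning the explanation alongside the verdict mirrors the slide's pairing of notification with explanation.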
Lessons learnt
Recap
» Data with a Purpose
» Encapsulate & Conquer
› Goal-driven (purpose)
› Aggregation
› Community-managed
» Nothing is immutable, especially data.
› Foster evolution
› Monitor decay
Scalability
Provenance
Questions
Thanks for your Attention!
Any Questions?
http://www.wf4ever-project.org/