Provenance of scientific information as experienced in DRIVER
6th e-Infrastructure Concertation Event
Lyon, 24th November 2008
Wolfram Horstmann, Bielefeld University / DRIVER


Notions of Provenance

• Where do data objects* originate from?
  – Scientific work (examples)
    • Instrumentation techniques
      – Manufacturers of hardware and software
    • Methodologies
      – Processes, e.g. gene sequencing
  – Technical/local (examples)
    • (Web) identifiers
    • Database, repository name

* Primary data, documents, metadata …

Why Provenance?

• Quoting / citing / referencing as a global scientific principle
  – "Reproducible research"
• Giving credit to authors / creators in distributed environments
• Original location / context has to be known
• Experienced in grid environments [1]

Provenance & Interoperability

• Re-use / sharing: "Addressing/Accessing"
  – Common view, common use
  – Unidirectional: no change of data objects!
• Federation: "Discovering in Context"
  – Remote representation of distributed data objects (DOs)
• Aggregation: "Contextualizing"
  – Add unchanged object in a context
• Processing/annotation: "Changing"
  – Uni- vs. bidirectional: change of DOs and remote representation vs. back-storage (e.g. CVS)

Scenarios in DRIVER

[Diagram. Labels: Digital Scientific Data; Digital Object Collections; Digital Object Repositories (several, combined); Digital Information Space; Conventional Web Data; "Simple" Applications; Metadata Infrastructure]

Basic Provenance Settings

• Indicate production situation
  – Metadata
    • Author, instrumentation etc.
• Remote representation
  – Indicate place of origin in remote systems
    • Metadata as digital objects / first-order citizens
  – Allow lineage representation
    • Credits in remote environments / versioning

Orders of Provenance

• 1st order: metadata
  – Provenance attached to data
  – Minimal "knowledge" required in application
  – Allows remote handling of data objects
  – Requires metadata infrastructure
  – Metadata introduce 2 objects: requires linkage
• 2nd order: context / compounds
  – Express multiple relations between objects
  – May introduce semantic model
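
The two orders can be pictured with a small sketch. It is an illustration only, not DRIVER's data model: the class names, the relation names (`isBasedOn`, `isDescribedBy`) and all `example.org` URIs are assumptions made up for this example. It shows why 1st order provenance introduces two objects that need linking, and how 2nd order provenance expresses several relations between objects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataObject:
    uri: str  # web identifier of the data object itself


@dataclass(frozen=True)
class MetadataRecord:
    # 1st order: the record is a second object with its own identifier ...
    uri: str
    # ... so it needs an explicit link back to the object it describes.
    describes: str
    author: str
    origin: str  # repository or database the object comes from


@dataclass
class Context:
    # 2nd order: (subject URI, relation name, object URI) statements
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def relate(self, subject: str, relation: str, obj: str) -> None:
        self.relations.append((subject, relation, obj))

    def related_to(self, subject: str) -> list[tuple[str, str]]:
        return [(r, o) for s, r, o in self.relations if s == subject]


# Hypothetical objects, for illustration only
dataset = DataObject("http://repo.example.org/data/42")
record = MetadataRecord(
    uri="http://repo.example.org/meta/42",
    describes=dataset.uri,
    author="A. Author",
    origin="repo.example.org",
)

context = Context()
context.relate("http://repo.example.org/doc/7", "isBasedOn", dataset.uri)
context.relate("http://repo.example.org/doc/7", "isDescribedBy", record.uri)
```

Here `record.describes` is the linkage the slide refers to; without it, the metadata and the data object cannot be matched up in a remote system.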

Provenance in DRIVER #1

• Simple objects: OAI-PMH [2]
  – 1st order provenance
    • Metadata: minimum OAI-DC
  – 2nd order provenance
    • DRIVER: explicit identifiers for repositories
    • OAI-PMH: inline representation ("about")
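
The inline "about" representation can be read with a few lines of Python. This is a sketch, assuming a record shaped like the OAI-PMH provenance guidelines [2] describe (a `provenance` container with an `originDescription` inside `about`). The sample record, its identifiers and the `example.org` URLs are invented for illustration, and the empty `metadata` element is a shortcut that a real record would not have.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
PROV = "http://www.openarchives.org/OAI/2.0/provenance"

# Hypothetical re-exposed record; values are made up for this example.
RECORD_XML = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:aggregator.example.org:123</identifier>
    <datestamp>2008-11-20</datestamp>
  </header>
  <metadata/>
  <about>
    <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
      <originDescription harvestDate="2008-11-19T10:00:00Z" altered="false">
        <baseURL>http://repo.example.org/oai</baseURL>
        <identifier>oai:repo.example.org:42</identifier>
        <datestamp>2008-10-01</datestamp>
        <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
      </originDescription>
    </provenance>
  </about>
</record>
"""


def origin_of(record_xml: str):
    """Return the origin description of a harvested record, or None."""
    ns = {"oai": OAI, "prov": PROV}
    root = ET.fromstring(record_xml)
    desc = root.find("oai:about/prov:provenance/prov:originDescription", ns)
    if desc is None:
        return None  # record carries no provenance container
    return {
        "base_url": desc.findtext("prov:baseURL", namespaces=ns),
        "identifier": desc.findtext("prov:identifier", namespaces=ns),
        "datestamp": desc.findtext("prov:datestamp", namespaces=ns),
        "harvest_date": desc.get("harvestDate"),
        "altered": desc.get("altered"),
    }


origin = origin_of(RECORD_XML)
```

The returned `base_url` and `identifier` are what an aggregator needs in order to point back to the repository of origin.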

[Diagram. Labels: Semantic/Compound Data; "Semantic" Applications]

Provenance in DRIVER #2

• "Enhanced Publications"
  – Research project in DRIVER-II
  – Representation of data/document packages
  – Use of OAI-ORE

Provenance in OAI-ORE

• OAI-ORE: Object Reuse and Exchange [4]
  – Uses Resource Maps < Named Graphs
  – Uses "lineage" to represent explicit provenance
  – Future: explicit provenance model [7]?
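
A rough sketch of how lineage might be followed in a Resource Map, using plain Python tuples as triples instead of an RDF library. The property names (`describes`, `aggregates`, `proxyFor`, `proxyIn`, `lineage`) are taken from the ORE vocabulary as far as recalled here and should be checked against the specification; all `example.org` URIs and the helper `lineage_chain` are hypothetical.

```python
ORE = "http://www.openarchives.org/ore/terms/"

# Hypothetical URIs, for illustration only
REM = "http://agg.example.org/rem/2"             # Resource Map
AGG = "http://agg.example.org/aggregation/2"     # Aggregation it describes
DATA = "http://repo.example.org/data/42"         # aggregated resource
PROXY_NEW = "http://agg.example.org/proxy/2"     # DATA as seen in AGG
PROXY_OLD = "http://other.example.org/proxy/1"   # DATA in an earlier aggregation

# The Resource Map as a set of (subject, predicate, object) triples
resource_map = {
    (REM, ORE + "describes", AGG),
    (AGG, ORE + "aggregates", DATA),
    (PROXY_NEW, ORE + "proxyFor", DATA),
    (PROXY_NEW, ORE + "proxyIn", AGG),
    (PROXY_NEW, ORE + "lineage", PROXY_OLD),
}


def lineage_chain(triples, proxy):
    """Follow lineage links from a proxy back towards its origin."""
    chain = [proxy]
    seen = {proxy}
    while True:
        targets = [
            o for s, p, o in triples
            if s == chain[-1] and p == ORE + "lineage"
        ]
        if not targets or targets[0] in seen:  # end of chain, or a cycle
            return chain
        chain.append(targets[0])
        seen.add(targets[0])


chain = lineage_chain(resource_map, PROXY_NEW)
```

The chain lists the contexts in which the resource appeared before, which is the trace-back that lineage is meant to provide.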

Summary

• Provenance essential for …
  – Indicating origin in distributed data spaces
    • Accessing / addressing
    • Federation / aggregation
    • Processing / annotation
  – Document and data citation / trace-back
  – 1st order: describing data > metadata
  – 2nd order: describing context > semantic data

Lessons learnt in DRIVER

• Use web-enabled identification (URI/UDDI etc.)
  – "Dark" databases don't interoperate
• 1st order provenance at place of origin
  – Requires metadata to describe origin
  – Enables a metadata infrastructure
  – Introduces linkage problem
• 2nd order provenance in contexts
  – Requires data provider identification in federators / aggregators in order to link back
  – May require semantic model for context
  – Would benefit from a semantic infrastructure
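
Linking back from an aggregator could look roughly like the sketch below. It assumes the aggregator keeps, for every harvested record, an identifier of the data provider plus the record's original identifier. The registry `REPOSITORIES`, the repository id `repo-a` and the function `link_back` are hypothetical; the request parameters (`verb=GetRecord`, `identifier`, `metadataPrefix`) are standard OAI-PMH.

```python
from urllib.parse import urlencode

# Hypothetical registry: data provider identifier -> OAI-PMH base URL
REPOSITORIES = {
    "repo-a": "http://repo-a.example.org/oai",
}


def link_back(repository_id: str, original_identifier: str,
              metadata_prefix: str = "oai_dc") -> str:
    """Build a GetRecord request against the repository of origin."""
    base_url = REPOSITORIES[repository_id]  # KeyError for unknown providers
    query = urlencode({
        "verb": "GetRecord",
        "identifier": original_identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


url = link_back("repo-a", "oai:repo-a.example.org:42")
```

An unknown repository id raises `KeyError`, which mirrors the lesson above: without data provider identification there is nothing to link back to.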

Resources

[1] On provenance in the eScience / grid environment
  – http://www.sigmod.org/sigmod/record/issues/0509/p31-special-sw-section-5.pdf
  – In gLite
    • http://www.cesnet.cz/doc/techzpravy/2007/glite-job-provenance/
    • http://twiki.ipaw.info/bin/view/Challenge

[2] On provenance in OAI-PMH
  – http://www.openarchives.org/OAI/2.0/guidelines-provenance.htm

[3] On provenance in OAI-ORE (referred to as ore:lineage)
  – http://www.openarchives.org/ore/meetings/Soton/ore_beyond_basics.pdf (general)
  – http://www.openarchives.org/ore/1.0/vocabulary (definition)

[4] Named Graphs, Provenance and Trust (Carroll et al.)
  – http://www4.wiwiss.fu-berlin.de/bizer/SWTSGuide/carroll-ISWC2004.pdf

[5] W3C: On provenance in RDF
  – http://www.w3.org/2001/12/attributions/

[6] Open Provenance Model
  – http://eprints.ecs.soton.ac.uk/14979/1/opm.pdf

[7] DRIVER: Digital Repository Infrastructure for European Research
  – http://www.driver-community.eu