Open Provenance Model Tutorial Session 2: OPM Overview and Semantics Luc Moreau [email protected] University of Southampton

Post on 24-Feb-2016


Page 1: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Open Provenance Model Tutorial Session 2: OPM Overview and Semantics

Luc Moreau [email protected] University of Southampton

Page 2: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Session 2: Aims

In this session, you will learn about:
• The Open Provenance Model
• The definition of its abstract model
• The inferences it supports
• Various efforts to provide OPM with a semantics

Page 3: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Session 2: Contents

• Requirements and non-requirements
• Definition of OPM
• Specialization of OPM with Profiles
• Formalizations of OPM

Page 4: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

OPM (NON-)REQUIREMENTS

Page 5: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

OPM Requirements

• To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model.

• To allow developers to build and share tools that operate on such a provenance model.

• To define the model in a precise, technology-agnostic manner.

• To define bindings to XML/RDF separately.

• To support a digital representation of provenance for any “thing”, whether produced by computer systems or not.

Page 6: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

OPM Non-Requirements

• OPM does not specify the internal representations that systems have to adopt to store and manipulate provenance internally.

• OPM does not specify protocols to store such provenance information in provenance repositories.

• OPM does not specify protocols to query provenance repositories.

Page 7: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

OPM Layered Model

OPM Core

OPM Essential Profiles: Collections, Attribution

OPM Domain Specialization: Workflow, Web

Technology Bindings: XML, RDF

OPM Sig

OPM-based APIs: record, query

Page 8: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

THE OPEN PROVENANCE MODEL (OPM)

Page 9: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Open Provenance Model

• Allow us to express all the causes of an item
– e.g., provenance of a bottle of wine includes:
   • Grapes from which it is made
   • Where those grapes grew
   • Process in the wine’s preparation
   • How the wine was stored
   • Between which parties the wine was transported, e.g. producer to distributor to retailer
   • Where it was auctioned

• Allow for process-oriented and dataflow-oriented views

• Based on a notion of annotated causality graph

Page 10: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Nodes

• Artifact: Immutable piece of state, which may have a physical embodiment in a physical object, or a digital representation in a computer system.

• Process: Action or series of actions performed on or caused by artifacts, and resulting in new artifacts.

• Agent: Contextual entity acting as a catalyst of a process, enabling, facilitating, controlling, affecting its execution.

[Figure: node notation: A (artifact), P (process), Ag (agent)]

Page 11: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Edges

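• used(R): a process P used an artifact A in role R
• wasGeneratedBy(R): an artifact A was generated by a process P in role R
• wasControlledBy(R): a process P was controlled by an agent Ag in role R
• wasTriggeredBy: a process P2 was triggered by a process P1
• wasDerivedFrom: an artifact A2 was derived from an artifact A1

Edge labels are in the past tense to express that they describe past executions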
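
The three node kinds and five edge kinds can be written down as a small data structure. The following is only a minimal sketch: the class names, field names and the `make_edge` helper are assumptions made for illustration and are not taken from any OPM binding. Each edge points from an effect to its cause.

```python
from dataclasses import dataclass
from typing import Optional

# For each edge kind: (kind of the effect node, kind of the cause node).
# The edge kinds come from the slide; the table itself is an illustrative assumption.
EDGE_SIGNATURES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}


@dataclass(frozen=True)
class Node:
    ident: str
    kind: str  # "artifact", "process" or "agent"


@dataclass(frozen=True)
class Edge:
    kind: str
    effect: Node
    cause: Node
    role: Optional[str] = None  # only meaningful for the edges written with (R)


def make_edge(kind, effect, cause, role=None):
    """Build an edge, rejecting endpoints whose kinds do not fit the edge kind."""
    expected = EDGE_SIGNATURES[kind]
    actual = (effect.kind, cause.kind)
    if actual != expected:
        raise ValueError(f"{kind} expects {expected}, got {actual}")
    return Edge(kind, effect, cause, role)


# Hypothetical usage: process P used artifact A in role R.
a = Node("A", "artifact")
p = Node("P", "process")
e = make_edge("used", p, a, role="R")
```

Checking the endpoint kinds at construction time keeps ill-formed edges, such as an artifact that "used" a process, out of the graph.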

Page 12: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Illustration

• A process “used” artifacts and “generated” artifacts

• Edge “roles” indicate the function of the artifact with respect to the process (akin to function parameters)

• Edges and nodes can be typed

Causation chain:
• P was caused by A1 and A2
• A3 and A4 were caused by P
• Does it mean that A3 and A4 were caused by A1 and A2?

[Figure: process P (type=division) with edges used(dividend) and used(divisor) to artifacts A1 and A2; artifacts A3 and A4 are linked to P by edges wasGeneratedBy(quotient) and wasGeneratedBy(rest)]

Page 13: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Hierarchical Descriptions (1)

[Figure: process P with edges used(r1) and used(r2) to artifacts A1 and A2; artifacts A3 and A4 are linked to P by edges wasGeneratedBy(r3) and wasGeneratedBy(r4)]

Page 14: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Hierarchical Descriptions (2)

[Figure: drill-down view of the same description: processes P1 and P2, artifacts A1, A2, A3 and A4, with edges used(r1), used(r2), wasGeneratedBy(r3) and wasGeneratedBy(r4)]

Page 15: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Hierarchical Descriptions (3)

[Figure: two graphs over artifacts A1, A2, A3 and A4 with edges used(r1), used(r2), wasGeneratedBy(r3) and wasGeneratedBy(r4): one with a single process P, the other with processes P1 and P2]

If these two graphs denote the same execution, it is not true that A4 was caused by A1; hence dependencies between artifacts need to be asserted explicitly

Page 16: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Explicit Data Derivations (1)

[Figure: the same two graphs (single process P; processes P1 and P2) over artifacts A1, A2, A3 and A4, now with explicit wasDerivedFrom edges between artifacts]

If these two graphs denote the same execution, it is not true that A4 was caused by A1; hence dependencies between artifacts need to be asserted explicitly

Page 17: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Explicit Data Derivations (2)

Causation chain:
• P was caused by A1 and A2
• A3 and A4 were caused by P
• A3 was caused by A1 and A2
• A4 was caused by A1 and A2

[Figure: process P (type=division) with edges used(dividend) and used(divisor) to artifacts A1 and A2, edges wasGeneratedBy(quotient) and wasGeneratedBy(rest) from artifacts A3 and A4, and four wasDerivedFrom edges from A3 and A4 to A1 and A2]

Page 18: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Provenance of Physical Objects

Page 19: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Another Account of the Same Execution

Page 20: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Accounts

• Mechanism by which multiple descriptions of the same execution can co-exist in the same OPM graph

• Different accounts may be provided by different observers (or asserters)

• Accounts can overlap if they have some OPM subgraph in common

• An account can be a refinement of another, if it provides more details
– Support for hierarchical descriptions

• Accounts may be conflicting!

Page 21: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Accounts

• Account is like a graph colouring

• Nodes/edges are asserted to belong to some accounts

[Figure legend: Bake execution; Bad Bake execution; Both executions]
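
The "graph colouring" view of accounts can be sketched by tagging each edge with the set of accounts it belongs to. This is only an illustration: the edge tuples, the node names (`bake`, `flour`, `cake`, `burnt_cake`) and the account names (`good`, `bad`) are hypothetical and do not reproduce the slide's figure.

```python
# Each edge: (kind, effect, cause, accounts). All identifiers are hypothetical.
edges = [
    ("used", "bake", "flour", {"good", "bad"}),         # asserted in both accounts
    ("wasGeneratedBy", "cake", "bake", {"good"}),       # only in the "good" account
    ("wasGeneratedBy", "burnt_cake", "bake", {"bad"}),  # only in the "bad" account
]


def account_view(edges, account):
    """Return the edges asserted in one account, without their account sets."""
    return [
        (kind, effect, cause)
        for kind, effect, cause, accounts in edges
        if account in accounts
    ]


def accounts_overlap(edges, first, second):
    """Two accounts overlap if at least one edge belongs to both."""
    return any(
        first in accounts and second in accounts
        for _, _, _, accounts in edges
    )
```

Selecting one colour with `account_view` yields the description given by a single observer, while `accounts_overlap` tests whether two accounts share part of the graph.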

Page 22: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

OPM SEMANTICS

Page 23: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Completion Rules

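[Figure: two completion rules]

• Equivalence: P2 wasTriggeredBy P1 if and only if there exists an artifact A such that P2 used A and A wasGeneratedBy P1

• A2 wasDerivedFrom A1 implies that there exists a process P such that A2 wasGeneratedBy P and P used A1. The converse does not necessarily hold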
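
One direction of the completion rule for processes can be sketched as a function that infers wasTriggeredBy edges from used and wasGeneratedBy edges. The function name and the representation of edges as pairs of strings are assumptions made for this example.

```python
def infer_triggers(used, generated):
    """Infer wasTriggeredBy edges by eliminating the intermediate artifact.

    used:      set of (process, artifact) pairs, "process used artifact"
    generated: set of (artifact, process) pairs, "artifact wasGeneratedBy process"
    Returns a set of (p2, p1) pairs meaning "p2 wasTriggeredBy p1".
    """
    generators = {}
    for artifact, process in generated:
        generators.setdefault(artifact, set()).add(process)
    return {
        (p2, p1)
        for p2, artifact in used
        for p1 in generators.get(artifact, ())
    }


# Hypothetical usage: P2 used A, and A was generated by P1.
triggers = infer_triggers({("P2", "A")}, {("A", "P1")})
```

The other direction of the equivalence would introduce a fresh, unnamed artifact for each wasTriggeredBy edge; it is left out of this sketch.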

Page 24: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Inferences

• Transitivity of edges connecting an artifact

• Starred edge “was Caused by”

• What we can infer is defined by transitive closure

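[Figure: a chain of edges through intermediate artifacts A, from a node A/P2 to a node A/P1, summarised by a single starred edge (*) between A/P2 and A/P1]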
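
As an illustration of inference by transitive closure, the sketch below computes the starred edge WasDerivedFrom* from a set of one-step wasDerivedFrom edges. It covers only this one edge kind, and the function name and the pair representation are assumptions made for the example.

```python
def derived_from_star(derived):
    """Transitive closure of wasDerivedFrom.

    derived: set of (effect, cause) pairs, "effect wasDerivedFrom cause".
    Returns every (effect, cause) pair connected by one or more steps.
    """
    successors = {}
    for effect, cause in derived:
        successors.setdefault(effect, set()).add(cause)

    closure = set()
    for start in successors:
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(successors.get(node, ()))
    return closure


# Hypothetical usage: A3 was derived from A2, which was derived from A1.
star = derived_from_star({("A3", "A2"), ("A2", "A1")})
```

Here the closure contains the inferred pair `("A3", "A1")` in addition to the two asserted edges.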

Page 25: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

WasTriggeredBy is not transitive

• By completion, there exists A12 generated by P1 and used by P2

• By completion, there exists A23 generated by P2 and used by P3

• A23 could have been generated before A12 was used

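[Figure: processes P1, P2 and P3 linked by wasTriggeredBy edges, compared with a direct starred (*) edge between P3 and P1]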

Page 26: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

OPM Inferences

Page 27: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Valid OPM Graphs

• WasDerivedFrom* is acyclic within one account
– Intuition: a data item cannot be derived from itself
– Note: cycles may exist in multiple accounts

• An artifact can be generated by at most one process in a given account
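
Both validity conditions can be checked mechanically for the edges of a single account. The following is a sketch under assumed names and an assumed pair representation; it is not taken from the OPM specification.

```python
def has_derivation_cycle(derived):
    """Depth-first search for a cycle in wasDerivedFrom edges (effect, cause)."""
    successors = {}
    for effect, cause in derived:
        successors.setdefault(effect, set()).add(cause)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in successors.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True  # back edge: nxt is on the current path
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(
        colour.get(node, WHITE) == WHITE and visit(node)
        for node in list(successors)
    )


def is_valid_account(derived, generated):
    """Check the two conditions for the edges of one account.

    derived:   set of (effect, cause) wasDerivedFrom pairs
    generated: set of (artifact, process) wasGeneratedBy pairs
    """
    generators = {}
    for artifact, process in generated:
        generators.setdefault(artifact, set()).add(process)
    single_generator = all(len(ps) <= 1 for ps in generators.values())
    return single_generator and not has_derivation_cycle(derived)
```

The check has to be run per account: as the note above says, the union of several accounts may legitimately contain cycles.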

Page 28: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Time Information

• Causality implies time ordering, but not the converse

• Time regarded as crucial information in the provenance of data (though time does not imply causality)

• The model specifies constraints that time information must satisfy with respect to causal dependencies

Page 29: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Time Constraints

[Figure: an artifact A, generated (wasGeneratedBy(R)) at time T1, is used (used(R)) at time T3 by a process P; P is controlled by an agent Ag (wasControlledBy(R), start: T2, end: T5); P generates (wasGeneratedBy(R)) an artifact A at time T4, which is used (used(R)) at time T6]

• T1<T3 (artifact must exist before being used)
• T2<T3 (process must have started before using artifacts)
• T3<T5 (process uses artifacts before it ends)
• T2<T4 (process must have started before generating artifacts)
• T4<T5 (process generates artifacts before it ends)
• T4<T6 (artifact must exist before being used)
• T2<T5 (process must have started before ending)
• No constraint between T3 and T4
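
The seven constraints can be checked directly against recorded timestamps. This is a minimal sketch: the function name and the dictionary keyed by "T1" to "T6" are assumptions made for the example, and any values that support `<` (numbers, `datetime` objects) can serve as times.

```python
# The constraints listed on the slide, as (earlier, later) pairs.
TIME_CONSTRAINTS = [
    ("T1", "T3"),  # artifact must exist before being used
    ("T2", "T3"),  # process must have started before using artifacts
    ("T3", "T5"),  # process uses artifacts before it ends
    ("T2", "T4"),  # process must have started before generating artifacts
    ("T4", "T5"),  # process generates artifacts before it ends
    ("T4", "T6"),  # artifact must exist before being used
    ("T2", "T5"),  # process must have started before ending
]


def violated_constraints(times):
    """Return the (earlier, later) pairs that the given timestamps violate."""
    return [
        (earlier, later)
        for earlier, later in TIME_CONSTRAINTS
        if not times[earlier] < times[later]
    ]


# Hypothetical timestamps; T3 and T4 may occur in either order.
ok = violated_constraints({"T1": 0, "T2": 1, "T3": 3, "T4": 2, "T5": 4, "T6": 5})
bad = violated_constraints({"T1": 0, "T2": 1, "T3": 5, "T4": 2, "T5": 4, "T6": 6})
```

In the second call the artifact is used at T3=5, after the process has ended at T5=4, so the pair `("T3", "T5")` is reported.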

Page 30: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Annotations

• All OPM entities (edges, nodes, graphs, accounts) can be annotated

• All annotations should be addressable (allowing for annotations of annotations)

• Bindings to formalize how annotations can be serialized (standard in RDF, custom in XML)

• Reserved properties: hasType, hasValue, ...

Let’s not reinvent the wheel!

Page 31: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

OPM SPECIALIZATIONS

Page 32: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Concept of a Profile

• A specialisation of an OPM graph for a specific domain or to handle a specific problem

• Profile definitions are welcome!
• Note: profile multiplicity challenges inter-operability
• A profile has a unique identity
• Defines vocabulary, guidelines, expansion guidance, serialisation format

Page 33: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Profile Compliance

PROFILE
• Id
• Vocabulary
• Guidance
• Expansion directives
• Serialisation

[Figure: profile expansion turns a profile-compliant graph into a profile-expanded graph]

Page 34: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Profile Compliance

[Figure: profile-compliant graph, profile-expanded graph, Inferred Graph 1 and Inferred Graph 2, related by OPM inference]

Page 35: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Emerging Profiles

• Emerging Profiles
– Collections
– Dublin Core
– D-Profile

• Will be discussed in separate session

Page 36: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

OPM FORMALIZATIONS

Page 37: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Early Formalizations

• OPM v1.00 and OPMv1.01 contained a set-theoretic definition of OPM and permitted inferences

• Moved out of OPMv1.1 since it is difficult to keep specification and formalization in sync

• While the formalization is useful in defining OPM precisely, it does not give OPM a meaning!

Page 38: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Reproducibility Semantics (Moreau 2010)

• Sees an OPM graph as an executable program:
– Each process is associated with the name of an executable primitive
– Primitive environment maps primitive names to primitives
   • PrimitiveEnv = PrimitiveName → Primitive
   • Primitive = P(RoleValue) → P(RoleValue)
– Graph factories to create new artifacts, new processes …

Page 39: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Reproducibility Semantics (Moreau 2010)

• An execution of an OPM graph results in
– A new OPM graph, describing re-execution
– A mapping between nodes of the original graph and the resulting graph
• Execution proceeds by ordering processes (assumes acyclicity) and re-executing them, one by one; for each process executed, new process node and new output artifacts are created by factory
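
The re-execution loop can be sketched as follows. This is a loose illustration of the idea, not the semantics defined in Moreau 2010: the dictionary-based graph encoding, the `divide` primitive, the assignment of roles to artifacts, and the convention of naming new nodes by appending `'` are all assumptions made for the example. Ordering relies on `graphlib` (Python 3.9+), which raises `CycleError` on cyclic input.

```python
from graphlib import TopologicalSorter


def divide(inputs):
    """Hypothetical primitive: maps role-to-value inputs to role-to-value outputs."""
    quotient, rest = divmod(inputs["dividend"], inputs["divisor"])
    return {"quotient": quotient, "rest": rest}


# Primitive environment: primitive names mapped to primitives.
primitive_env = {"division": divide}


def re_execute(processes, used, generated, values, env):
    """Re-execute a graph, process by process.

    processes: {process id: primitive name}
    used:      {process id: {role: artifact id}}
    generated: {process id: {role: artifact id}}
    values:    {artifact id: value} for artifacts not generated by any process
    Returns (values including new artifacts, mapping from original to new ids).
    """
    producer = {a: p for p, outs in generated.items() for a in outs.values()}
    # A process depends on the processes that generated its inputs.
    dependencies = {
        p: {producer[a] for a in used.get(p, {}).values() if a in producer}
        for p in processes
    }
    new_values = dict(values)
    mapping = {}
    for p in TopologicalSorter(dependencies).static_order():
        inputs = {
            role: new_values[mapping.get(a, a)]
            for role, a in used.get(p, {}).items()
        }
        outputs = env[processes[p]](inputs)
        mapping[p] = p + "'"  # stand-in for a process factory
        for role, a in generated.get(p, {}).items():
            mapping[a] = a + "'"  # stand-in for an artifact factory
            new_values[a + "'"] = outputs[role]
    return new_values, mapping


new_values, mapping = re_execute(
    processes={"P": "division"},
    used={"P": {"dividend": "A1", "divisor": "A2"}},
    generated={"P": {"quotient": "A3", "rest": "A4"}},
    values={"A1": 17, "A2": 5},
    env=primitive_env,
)
```

With inputs 17 and 5, the re-execution creates new artifacts `A3'` and `A4'` holding 3 and 2, and `mapping` relates `P`, `A3` and `A4` to their counterparts in the new graph.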

Page 40: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Reproducibility Semantics (Moreau 2010)

Page 41: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Temporal Semantics (Kwasnikowska, Moreau, Van den Bussche 2010)

• Timepoints
– create(A): creation of artifact A
– begin(P), end(P): beginning and end of process P
– use(P,r,A): use of artifact A in role r, by process P

• Temporal theory Th(G) of a graph G is a set of inequalities: e.g.,
– begin(P)≤create(A) for any generated-by edge A → P
– create(A)≤end(P) for any used edge P → A

• Temporal interpretation of G is a triple (T, ≤, τ)
• A temporal interpretation satisfies u≤v if τ(u) ≤ τ(v)
• A temporal model of G is a temporal interpretation that satisfies all inequalities from Th(G)
• Logical consequence: G ⊨ u≤v if u≤v is satisfied in every temporal model of G.
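
A small sketch can make the temporal theory concrete. It encodes only the two example inequalities listed on this slide (the "e.g." indicates that the full theory contains others), and it assumes that the order on T is reflexive and transitive, so that an inequality is a logical consequence exactly when it can be reached by chaining inequalities of Th(G). Function names and the string encoding of timepoints are assumptions made for the example.

```python
def temporal_theory(generated, used):
    """Build the example inequalities of Th(G) as (u, v) pairs meaning u <= v.

    generated: set of (artifact, process) generated-by edges
    used:      set of (process, artifact) used edges
    """
    theory = set()
    for artifact, process in generated:
        theory.add((f"begin({process})", f"create({artifact})"))
    for process, artifact in used:
        theory.add((f"create({artifact})", f"end({process})"))
    return theory


def entails(theory, u, v):
    """Decide u <= v by reachability over the inequalities in the theory."""
    if u == v:
        return True  # reflexivity
    successors = {}
    for x, y in theory:
        successors.setdefault(x, set()).add(y)
    stack, seen = [u], {u}
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, ()):
            if nxt == v:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Hypothetical graph: A was generated by P1 and used by P2.
theory = temporal_theory(generated={("A", "P1")}, used={("P2", "A")})
```

For this graph, `entails(theory, "begin(P1)", "end(P2)")` holds by chaining the two inequalities through create(A), whereas the reverse inequality does not follow.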

Page 42: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Temporal Semantics (Kwasnikowska, Moreau, Van den Bussche 2010)

• OPM Inference: G ⊢ A → P
• Why this set of inference rules?
• Characterization of OPM inference rules in the form of a soundness and completeness result

Cases not involving use-timepoints
– G ⊨ begin(P)≤create(A) iff G ⊢ A → P

Cases involving use-timepoints
– G ⊨ begin(P)≤use(Q,r,A) iff G ⊢ some pattern

Page 43: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Temporal Semantics (Kwasnikowska, Moreau, Van den Bussche 2010)

Refinement of two OPM graphs

• Let us consider two OPM graphs G and H
• G is refined by H if, for any timepoints u, v of both G and H:
– If G ⊨ u≤v then H ⊨ u≤v

Page 44: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Causality Semantics (Cheney 2010)

• Exploits Halpern and Pearl’s causal theory of explanation

• The semantics of an OPM graph is a causal function, mapping graph inputs to outputs

• Provenance semantics P f approximates locally a function f if, for any u1, …, un, [[P f(u1, …, un)]]τ = fτ(u1, …, un) for some intervention τ fixing some inputs of f

Page 45: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Workflow Semantics (Missier and Goble 2010)

• Two functions:
– W2G: Workflow × Trace → OPM Graph
– G2W: OPM Graph → Workflow

• Two properties:
– Plausible workflow:
   • W2G(G2W(g),T)=g
– Lossless-ness:
   • G2W(W2G(w,T))=w

• Define W2G and G2W for Taverna workflow language
• Introduce annotations to be able to reconstruct Taverna iterations
• In essence, provide a semantics for OPM by composing G2W and Taverna semantics

Page 46: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Provenance Vocabulary Mappings (Sahoo et al 2010)

OPM selected as the reference provenance model.

• First, because OPM is a general and broad model that encompasses many aspects of provenance.

• Second, it already represents a community effort that spans several years and is still ongoing, already benefiting from many discussions, practical use, and several versions.

• Finally, many groups are already undergoing efforts to map their vocabularies to OPM, and in addition there are already some mappings (called profiles in OPM) developed by the OPM group to some existing vocabularies.

Page 47: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Conclusions on OPM Semantics

• Four novel semantics of OPM published in 2010
• Deal with different subsets of OPM
• Not all fully “compatible” with OPM v1.1
• Grand theory of OPM is still an open problem

Page 48: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

CONCLUSION AND OPEN ISSUES

Page 49: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Conclusions

• Over 14 teams have implemented the OPM specification for a successful inter-operability exercise PC3
• Open source governance model for OPM
• OPM1.1 published and to be used in PC4
• OPM consists of a common core found in many provenance vocabularies
• What beyond?
– Define useful profiles
– Finalize semantics

Page 50: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Open Issues (inter-operability)

• List of technical issues: agents, annotations, time, streamed data, collections, mutable objects
• How to express queries over OPM graphs?
• Security: attribution and non-repudiation
• API for recording and querying
• How to inter-operate in a distributed system?

Page 51: Open Provenance Model Tutorial Session 2:  OPM Overview and Semantics

Open Issues (research)

• Accounts
• Relations between accounts: refinement, overlap, alternate
• Reasoning with conflicting provenance
• Reasoning with incomplete provenance
• Can we formalise profiles?