a graph model for data and workflow provenance · 2019. 2. 25. · a graph model for data and...

68
A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan van den Bussche, & Stijn Vansummeren TaPP 2010

Upload: others

Post on 19-Sep-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

A graph model for data and workflow provenance

Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska,

Jan van den Bussche, & Stijn Vansummeren

TaPP 2010

Page 2: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Provenance in ...• Databases

• Mainly for (nested) relational model

• Where-provenance ("source location")

• Lineage, why ("witnesses")

• How/semiring model

• Relatively formal

• Workflows

• Many different systems

• Many different models

• (converging on OPM?)

• Graphs/DAGs

• Relatively informal

Page 3: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Provenance in ...• Databases

• Mainly for (nested) relational model

• Where-provenance ("source location")

• Lineage, why ("witnesses")

• How/semiring model

• Relatively formal

• Workflows

• Many different systems

• Many different models

• (converging on OPM?)

• Graphs/DAGs

• Relatively informal

?????

Page 4: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

This talk

• Relate database & workflow "styles"

• Develop a common graph formalism

• Need a common, expressive language that

• supports many database queries

• describes some (simple) workflows

Page 5: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Previous work

• Dataflow calculus (DFL), based on nested relational calculus (NRC)

• Provenance "run" model by Kwasnikowska & Van den Bussche (DILS 07, IPAW 08)

• "Provenance trace" model for NRC

• by (Acar, Ahmed & C. '08)

• Open Provenance Model (bipartite graphs)

• (Moreau et al. 2008-9), used in many WF systems

Page 6: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

NRC/DFL background

• A very simple, functional language:

• basic functions +, *,... & constants 0,1,2,3...

• variables x,y,z

• pair/record types (A:e,...,B:e), πA (e)

• collection (set) types

• {e,...} e ∪ e {e | x in e'} ∪e

Page 7: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

Page 8: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

• Suppose R = {(1,2,3), (4,5,6), (9,8,7)}

Page 9: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

• Suppose R = {(1,2,3), (4,5,6), (9,8,7)}

sum { x * y | (x,y,z) in R, x < y}

Page 10: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

• Suppose R = {(1,2,3), (4,5,6), (9,8,7)}

sum { x * y | (x,y,z) in R, x < y}

= sum { x * y | (x,y,z) in {(1,2,3), (4,5,6)}}

Page 11: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

• Suppose R = {(1,2,3), (4,5,6), (9,8,7)}

sum { x * y | (x,y,z) in R, x < y}

= sum { x * y | (x,y,z) in {(1,2,3), (4,5,6)}}

= sum {1 * 2, 4 * 5}

Page 12: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

• Suppose R = {(1,2,3), (4,5,6), (9,8,7)}

sum { x * y | (x,y,z) in R, x < y}

= sum { x * y | (x,y,z) in {(1,2,3), (4,5,6)}}

= sum {1 * 2, 4 * 5}

= sum {2,20}

Page 13: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

• Suppose R = {(1,2,3), (4,5,6), (9,8,7)}

sum { x * y | (x,y,z) in R, x < y}

= sum { x * y | (x,y,z) in {(1,2,3), (4,5,6)}}

= sum {1 * 2, 4 * 5}

= sum {2,20}

= 22

Page 14: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Another example

• In DFL, built-in functions / constants can be whole programs & files,

• as in Provenance Challenge 1 workflow:

let WarpParams := {align_warp(img,hdr})

| (img,hdr) in Inputs} in

let Reslices := {reslice(wp)

| wp in WarpParams} in

softmean(Reslices)

Page 15: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Goal: Define "provenance graphs" for DFL

Page 16: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Goal: Define "provenance graphs" for DFL

let WarpParams := {align_warp(img,hdr}) | (img,hdr) in Inputs} inlet Reslices := {reslice(wp) | wp in WarpParams} inin softmean(Reslices)

Page 17: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Goal: Define "provenance graphs" for DFL

let WarpParams := {align_warp(img,hdr}) | (img,hdr) in Inputs} inlet Reslices := {reslice(wp) | wp in WarpParams} inin softmean(Reslices)

http://www.flickr.com/photos/schneertz/679692806/

Page 18: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

First step: values

c

<>{}

...

elem

elem A1

An

v

v

v

or

v

v

or ...copyvor

Page 19: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Example value

1

<>

{}elem

elem

A

B2

3

<>A

B

Page 20: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Next step: evaluation nodes ("process")

c ...

1

n

e

fe

x letx

body

e

e

Constants,primitive functions

Variables & temporary bindings

head

Page 21: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Pairing

...

A1

An

e

<>e

πAe

Record building

Field lookup

Page 22: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Conditionals

iftest

then

e

eif

test

else

e

e

Note: Only taken branch is recorded

Page 23: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Sets: basic operations

{}e

∪1

2

e

e

Empty set

Singleton

Union

Page 24: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Sets: complex operations

∪e

forx

head

body

e

e

ebody...

Flattening

Iteration

Page 25: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Provenance graphs

• are graphs with "both value and evaluation structure"

!

" #

$

% &

!

"

#$%&" '

(

)

#$%&"

'

(

*

+

, '

'-(

./01"

2/34

(5 6%4"

./01!

2/34

$%&"

6%4"$%&"

!

" #

$

%

&'(

)

#

*%

+,-&'(

#

%

./01

Page 26: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

A bigger example

!

"

#$%

&'()

*+,$-.

/&'()0

&'() 1

0

0

2+3

4#

5678 %8$%

2+3

%98-

"

#$%

&'()

*+,

$-./

&'() 0

&'()1

0

1 8:(%) 4#

;<=$8 %8$%

2+38=$8

#'6+"&'() 98<.

&'()

>'.)

&'()>'.)

2+3

?2+3

@

)

#$%

&'() $-.A

&'()

0&'()

1

#'6+)&'() 98<.

1

>'.)

2+3=8%+@

98<.

2+3

>'.)

=8%+!

98<.

&'()

>'.)

0

12+3

0

12+3

&'()

Page 27: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

!

"

#$%

&'()

*+,$-.

/&'()0

&'() 1

0

0

2+3

4#

5678 %8$%

2+3

%98-

"

#$%

&'()

*+,

$-./

&'() 0

&'()1

0

1 8:(%) 4#

;<=$8 %8$%

2+38=$8

#'6+"&'() 98<.

&'()

>'.)

&'()>'.)

2+3

?2+3

@

)

#$%

&'() $-.A

&'()

0&'()

1

#'6+)&'() 98<.

1

>'.)

2+3=8%+@

98<.

2+3

>'.)

=8%+!

98<.

&'()

>'.)

0

12+3

0

12+3

&'()

Value structure

Page 28: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

!

"

#$%

&'()

*+,$-.

/&'()0

&'() 1

0

0

2+3

4#

5678 %8$%

2+3

%98-

"

#$%

&'()

*+,

$-./

&'() 0

&'()1

0

1 8:(%) 4#

;<=$8 %8$%

2+38=$8

#'6+"&'() 98<.

&'()

>'.)

&'()>'.)

2+3

?2+3

@

)

#$%

&'() $-.A

&'()

0&'()

1

#'6+)&'() 98<.

1

>'.)

2+3=8%+@

98<.

2+3

>'.)

=8%+!

98<.

&'()

>'.)

0

12+3

0

12+3

&'()

Value structure

{} {}{}

C

C

2

C

CC

C

C

C

T

{}

{}

{}

{}<>

1

2

1 <>

1 C

{}

C

C

CC

C F

C

Page 29: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

!

"

#$%

&'()

*+,$-.

/&'()0

&'() 1

0

0

2+3

4#

5678 %8$%

2+3

%98-

"

#$%

&'()

*+,

$-./

&'() 0

&'()1

0

1 8:(%) 4#

;<=$8 %8$%

2+38=$8

#'6+"&'() 98<.

&'()

>'.)

&'()>'.)

2+3

?2+3

@

)

#$%

&'() $-.A

&'()

0&'()

1

#'6+)&'() 98<.

1

>'.)

2+3=8%+@

98<.

2+3

>'.)

=8%+!

98<.

&'()

>'.)

0

12+3

0

12+3

&'()

Input values

{} {}{}

C

C

2

C

CC

C

C

C

T

{}

{}

{}

{}<>

1

2

1 <>

1 C

{}

C

C

CC

C F

C

Page 30: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

!

"

#$%

&'()

*+,$-.

/&'()0

&'() 1

0

0

2+3

4#

5678 %8$%

2+3

%98-

"

#$%

&'()

*+,

$-./

&'() 0

&'()1

0

1 8:(%) 4#

;<=$8 %8$%

2+38=$8

#'6+"&'() 98<.

&'()

>'.)

&'()>'.)

2+3

?2+3

@

)

#$%

&'() $-.A

&'()

0&'()

1

#'6+)&'() 98<.

1

>'.)

2+3=8%+@

98<.

2+3

>'.)

=8%+!

98<.

&'()

>'.)

0

12+3

0

12+3

&'()

Return value

{} {}{}

C

C

2

C

CC

C

C

C

T

{}

{}

{}

{}<>

1

2

1 <>

1 C

{}

C

C

CC

C F

C

Page 31: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

!

"

#$%

&'()

*+,$-.

/&'()0

&'() 1

0

0

2+3

4#

5678 %8$%

2+3

%98-

"

#$%

&'()

*+,

$-./

&'() 0

&'()1

0

1 8:(%) 4#

;<=$8 %8$%

2+38=$8

#'6+"&'() 98<.

&'()

>'.)

&'()>'.)

2+3

?2+3

@

)

#$%

&'() $-.A

&'()

0&'()

1

#'6+)&'() 98<.

1

>'.)

2+3=8%+@

98<.

2+3

>'.)

=8%+!

98<.

&'()

>'.)

0

12+3

0

12+3

&'()

Expression structure

Page 32: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

!

"

#$%

&'()

*+,$-.

/&'()0

&'() 1

0

0

2+3

4#

5678 %8$%

2+3

%98-

"

#$%

&'()

*+,

$-./

&'() 0

&'()1

0

1 8:(%) 4#

;<=$8 %8$%

2+38=$8

#'6+"&'() 98<.

&'()

>'.)

&'()>'.)

2+3

?2+3

@

)

#$%

&'() $-.A

&'()

0&'()

1

#'6+)&'() 98<.

1

>'.)

2+3=8%+@

98<.

2+3

>'.)

=8%+!

98<.

&'()

>'.)

0

12+3

0

12+3

&'()

Expression structure

=

fstx

snd

empty

let R

let S

for ysUfor x

if

if

R=

{}x

snd

fst

fst

sndy

+

Page 33: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Building provenance graphs

• is complicated

• Here we'll use high-level "graph rewrite rule" formalism

• Mostly because it is nicer to look at than formal version

Page 34: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

cc c

ff f(v1,...,vn)

1

nvn

v1

...

1

nvn

v1

...

letx

head

body

v

ex

head

body

v

ex

letx copy

copy

Page 35: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

πAi<>

A1

Anvn

v1

...

<>

A1

Anv

v

...πAi...

vi copy...

...vi

<>A1

Anvn

v1

...

A1

Anvn

v1

... <> <>

A1

An

Page 36: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

if

e2

e1

True

e1

Trueif

test

then

test

then

else

if

e2

e1

False

e2

Falseif

test

else

test

then

elsecopy

copy

Page 37: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

empty?{} {} empty? False

...

elem

elem

v

v

...

elem

elem

v

v

empty?{} {} empty? True

Page 38: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

∅ ∅∅

{}

...

elem

elem

v

v

{}

...

elem

elem

v

v

{}

v

elem

v {}{}

{}∪...

v

v

...

v

v

elem

elem

{}

elem

elem

{}

elem

elem

......

Page 39: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

OK, take a deep breath!

Page 40: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

e

e

x copy

x copy

forx

head

body

{}

ex

...

elem

elem

vn

v1

head

body

{}

...elem

elem

vn

v1

body

forx {}

elem

elem

...

...

elem

elem

v

v

{}

...

elem

elem

v

v

{}

elem

elem

{} ∪

...

elem

elem

v

v

{}

...

elem

elem

v

v

{}

elem

elem

{} {}∪

elem

elem

......

...

Page 41: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

forx

head

body

{}elem

elem

2

1

+1

x

Page 42: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

forx

head

body

{}elem

elem

2

1

+1

x

Page 43: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

head

body

{}elem

elem

2

1

+1

forx {}

elem

elem

+1

x C

x C

Page 44: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

head

body

{}elem

elem

2

1

+1

forx {}

elem

elem

+1

x C

x C

Page 45: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

head

body

{}elem

elem

2

1

+

forx {}

elem

elem

+

x C

x C

1 1

1 1

Page 46: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

head

body

{}elem

elem

2

1

+

forx {}

elem

elem

+

x C

x C

1 1

1 1

Page 47: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

An example

head

body

{}elem

elem

2

1

forx {}

elem

elem

x C

x C

1 1

1 1

+ 2

+ 3

Page 48: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Graphs can "lie" (inconsistency)

•+ 5

2

2

Page 49: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Graphs can "lie" (inconsistency)

•+ 5

2

2

if copy

2

True test

else

Page 50: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Graphs can "lie" (inconsistency)

•+ 5

2

2

if copy

2

True test

else

4

3

2

1

head

body

elem

elem

body

forx {}

elem

elem

{}

3

4

Page 51: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Graphs can "lie" (inconsistency)

•+ 5

2

2

if copy

2

True test

else

4

3

2

1

head

body

elem

elem

body

forx {}

elem

elem

{}

3

4

"Locally" but not "globally" consistent

Page 52: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Graph queries

• Many possible approaches

• In paper: some Datalog

• Maybe overkill, seems fragile

• In code: some "annotation propagation" traversals

• Seems to handle where, "explanations", "summaries"

Page 53: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

!

"

#$%

&'()

*+,$-.

/&'()0

&'() 1

0

0

2+3

4#

5678 %8$%

2+3

%98-

"

#$%

&'()

*+,

$-./

&'() 0

&'()1

0

1 8:(%) 4#

;<=$8 %8$%

2+38=$8

#'6+"&'() 98<.

&'()

>'.)

&'()>'.)

2+3

?2+3

@

)

#$%

&'() $-.A

&'()

0&'()

1

#'6+)&'() 98<.

1

>'.)

2+3=8%+@

98<.

2+3

>'.)

=8%+!

98<.

&'()

>'.)

0

12+3

0

12+3

&'()

Explaining

Page 54: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

!

"

#$%

&'()

*+,$-.

/&'()0

&'() 1

0

0

2+3

4#

5678 %8$%

2+3

%98-

"

#$%

&'()

*+,

$-./

&'() 0

&'()1

0

1 8:(%) 4#

;<=$8 %8$%

2+38=$8

#'6+"&'() 98<.

&'()

>'.)

&'()>'.)

2+3

?2+3

@

)

#$%

&'() $-.A

&'()

0&'()

1

#'6+)&'() 98<.

1

>'.)

2+3=8%+@

98<.

2+3

>'.)

=8%+!

98<.

&'()

>'.)

0

12+3

0

12+3

&'()

Explaining

Page 55: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

!

"

#$%

&'()

*+,$-.

/&'()0

&'() 1

0

0

2+3

4#

5678 %8$%

2+3

%98-

"

#$%

&'()

*+,

$-./

&'() 0

&'()1

0

1 8:(%) 4#

;<=$8 %8$%

2+38=$8

#'6+"&'() 98<.

&'()

>'.)

&'()>'.)

2+3

?2+3

@

)

#$%

&'() $-.A

&'()

0&'()

1

#'6+)&'() 98<.

1

>'.)

2+3=8%+@

98<.

2+3

>'.)

=8%+!

98<.

&'()

>'.)

0

12+3

0

12+3

&'()

Explaining

Page 56: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

!

"

#$%

&'()

*+,$-.

/&'()0

&'() 1

0

0

2+3

4#

5678 %8$%

2+3

%98-

"

#$%

&'()

*+,

$-./

&'() 0

&'()1

0

1 8:(%) 4#

;<=$8 %8$%

2+38=$8

#'6+"&'() 98<.

&'()

>'.)

&'()>'.)

2+3

?2+3

@

)

#$%

&'() $-.A

&'()

0&'()

1

#'6+)&'() 98<.

1

>'.)

2+3=8%+@

98<.

2+3

>'.)

=8%+!

98<.

&'()

>'.)

0

12+3

0

12+3

&'()

Explaining

Note: Smallest consistent subgraph (NOT

transitive closure!)

Page 57: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

!

"

#$%

&'()

*+,$-.

/&'()0

&'() 1

0

0

2+3

4#

5678 %8$%

2+3

%98-

"

#$%

&'()

*+,

$-./

&'() 0

&'()1

0

1 8:(%) 4#

;<=$8 %8$%

2+38=$8

#'6+"&'() 98<.

&'()

>'.)

&'()>'.)

2+3

?2+3

@

)

#$%

&'() $-.A

&'()

0&'()

1

#'6+)&'() 98<.

1

>'.)

2+3=8%+@

98<.

2+3

>'.)

=8%+!

98<.

&'()

>'.)

0

12+3

0

12+3

&'()

Summarizing

Page 58: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

!

"

#$%

&'()

*+,$-.

/&'()0

&'() 1

0

0

2+3

4#

5678 %8$%

2+3

%98-

"

#$%

&'()

*+,

$-./

&'() 0

&'()1

0

1 8:(%) 4#

;<=$8 %8$%

2+38=$8

#'6+"&'() 98<.

&'()

>'.)

&'()>'.)

2+3

?2+3

@

)

#$%

&'() $-.A

&'()

0&'()

1

#'6+)&'() 98<.

1

>'.)

2+3=8%+@

98<.

2+3

>'.)

=8%+!

98<.

&'()

>'.)

0

12+3

0

12+3

&'()

+ 2

Summarizing

{}1

1

Page 59: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Graphs are partially "replayable"

• If we change a value node, can try to "readjust" to recover consistency

• Formalized in (Acar, Ahmed, Cheney 08)

+ 4

2

2

Page 60: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Graphs are partially "replayable"

• If we change a value node, can try to "readjust" to recover consistency

• Formalized in (Acar, Ahmed, Cheney 08)

+ 4

2

17

Page 61: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Graphs are partially "replayable"

• If we change a value node, can try to "readjust" to recover consistency

• Formalized in (Acar, Ahmed, Cheney 08)

+

2

17

19

Page 62: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Graphs are partially "replayable"

• If we change a value node, can try to "readjust" to recover consistency

• Formalized in (Acar, Ahmed, Cheney 08)

+

2

17

19

if copy

2

test

else

False

Page 63: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Graphs are partially "replayable"

• If we change a value node, can try to "readjust" to recover consistency

• Formalized in (Acar, Ahmed, Cheney 08)

+

2

17

19

if copy

2

True test

else

Page 64: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Graphs are partially "replayable"

• If we change a value node, can try to "readjust" to recover consistency

• Formalized in (Acar, Ahmed, Cheney 08)

+

2

17

19

if

2

True test

else

Stuck!

????

Page 65: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Implementation in Haskell

• Summarized in paper, full code on request

• roughly 250 LOC for basic evaluator

• another 300 for graphviz translation, basic queries, examples

• Point?

• No claim of efficiency/scalability but easy to understand, experiment

• Elucidates some tricky details that pictures hide

• Similar "lightweight modeling" might be valuable for understanding/relating other WF/DB models

Page 66: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Related work• This work synthesizes/rearranges ideas from

several previous works & "folklore"

• traces (Acar, Ahmed, Cheney 2008)

• runs (Kwasnikowska, van den Bussche, DILS 2007, IPAW 2008)

• OPM graphs (Moreau et al. IPAW 2008 etc.)

• and many workflow systems

• More can be done to relate DB & workflow models

Page 67: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Future work

• This is work in progress

• Next steps:

• Extending to understand/model other workflow features

• Better grasp of "real" queries and features needed

• Implementa(tion|ability)?

• Optimization?

Page 68: A graph model for data and workflow provenance · 2019. 2. 25. · A graph model for data and workflow provenance Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan

Conclusions

• DB & WF provenance have much in common

• We develop common graph model

• with both intuitive & precise presentations

• Still much to do to relate and integrate DB & WF models

• let alone integrate models at scale in real systems