Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18 Yogesh L. Simmhan Beth Plale, Dennis Gannon, Srinath Perera Indiana University

Page 1: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

Karma Provenance Framework v2

Provenance Challenge Workshop/GGF18

Yogesh L. Simmhan
Beth Plale, Dennis Gannon, Srinath Perera

Indiana University

Page 2: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[2/25]2006-09-13

Outline

Architecture of Karma

Workflow Setup & Collecting Provenance

Provenance Traces

“canonical” Challenge Queries

Suggested Variations

Page 3: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[3/25]2006-09-13

Provenance Collection: Challenges & Uses

Linked Environments for Atmospheric Discovery (LEAD) project
  Weather & Severe Storm Prediction Applications
Provenance on workflow (process) & data products at fine granularity
Dynamic, long-running workflows
Helps scientists to
  search for workflows & data products,
  estimate data quality,
  track workflow execution, and
  analyze & mine data products from runs

Page 4: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[4/25]2006-09-13

Karma Provenance Framework

Lightweight – do not duplicate existing metadata cataloging effort
  myLEAD personal metadata catalog
  ResCat service & data registry
Glue to integrate metadata on data & services with runtime workflow information
Scalability [1] – 500 users, 100's of workflows, 10,000's of data products

[1] Performance Evaluation of the Karma Provenance Framework, Simmhan, Y., et al.; IPAW, 2006

Page 5: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[5/25]2006-09-13

Karma Provenance Framework

Key Provenance Activities generated during lifetime of workflow
  Workflow | Service Invoked
  Data Consumed
  Data Produced
  Sending Response
Activities modeled as XML messages
  Published asynchronously by service | workflow | client
  Presently use WS-Eventing messaging system
Activities stored in relational database
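As an illustration of "activities modeled as XML messages", the sketch below serializes one activity to XML and parses it back. The element names (`source`, `timestamp`, `dataId`) and the identifiers are assumptions made for this example; they are not the Karma message schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_activity(activity_type, source_id, timestamp, **details):
    """Serialize one provenance activity as an XML string.

    The element names used here are hypothetical, not Karma's schema.
    """
    root = ET.Element(activity_type)
    ET.SubElement(root, "source").text = source_id
    ET.SubElement(root, "timestamp").text = timestamp.isoformat()
    for name, value in details.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_activity(xml_text):
    """Return (activity type, {child element name: text}) for a message."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: child.text for child in root}


# Example: a DataProduced activity from an illustrative service.
message = build_activity(
    "DataProduced",
    "example:service:Softmean",
    datetime(2006, 9, 13, 12, 0, tzinfo=timezone.utc),
    dataId="example:data:atlas.hdr",
)
activity_type, fields = parse_activity(message)
```

Keeping the activity type as the root element makes it cheap for a listener to route a message before it reads the rest of the payload.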

Page 6: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[6/25]2006-09-13

Karma Architecture [1]

[Figure: architecture diagram. Recoverable labels:]
Workflow Engine; Workflow Instance: Service 1, Service 2, … Service 9, Service 10; 10 data products consumed (10C) & produced (10P) by each service
Message Bus: WS-Eventing Service API; WS-Messenger Notification Broker
Publish provenance activities as async notifications: ServiceInvoked & SendingResponse, DataProduced & DataConsumed activities; WorkflowInvoked & SendingResponse activities
Karma Provenance Service: Provenance Listener (subscribe & listen to activity notifications), Activity DB, Provenance Query API
Provenance Browser Client: query for workflow, process, & data provenance

[1] A Framework for Collecting Provenance in Data-Centric Scientific Workflows, Simmhan, Y., et al., Submitted to ICWS Conference, 2006
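The publish/subscribe flow in this architecture can be sketched with a toy, in-process broker. This is a simplified stand-in: delivery here is synchronous, whereas Karma publishes asynchronously over WS-Eventing, and the class names, method names, and table layout are assumptions for illustration only.

```python
import sqlite3
from collections import defaultdict


class NotificationBroker:
    """Toy in-process stand-in for a notification broker (synchronous)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, activity_type, source_id, payload):
        for callback in self._subscribers[topic]:
            callback(activity_type, source_id, payload)


class ProvenanceListener:
    """Subscribes to a topic and stores every activity in a relational table."""

    def __init__(self, broker, topic):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE activity (activity_type TEXT, source_id TEXT, payload TEXT)"
        )
        broker.subscribe(topic, self.on_activity)

    def on_activity(self, activity_type, source_id, payload):
        self.db.execute(
            "INSERT INTO activity VALUES (?, ?, ?)",
            (activity_type, source_id, payload),
        )

    def activities_from(self, source_id):
        rows = self.db.execute(
            "SELECT activity_type FROM activity WHERE source_id = ? ORDER BY rowid",
            (source_id,),
        )
        return [row[0] for row in rows]


broker = NotificationBroker()
listener = ProvenanceListener(broker, "provenance")
broker.publish("provenance", "ServiceInvoked", "Service1", "<ServiceInvoked/>")
broker.publish("provenance", "DataProduced", "Service1", "<DataProduced/>")
broker.publish("provenance", "ServiceInvoked", "Service2", "<ServiceInvoked/>")
```

Because services only talk to the broker, the listener and its database can be replaced without touching the services themselves.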

Page 7: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[7/25]2006-09-13

Provenance Challenge Workflow

Applications modeled as web-services
  Generic Factory toolkit creates web-service wrappers for command-line applications
  Service invokes a shell-script/application, passing command-line arguments
  Created services automatically instrumented to generate provenance using Karma client library
Workflow composed as GPEL* script
  XBaya Workflow composer GUI
  Central GPEL workflow engine orchestrates execution

*Grid Process Execution Language, an extension of the Business Process Execution Language (BPEL)
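The idea of an instrumented wrapper around a command-line application can be sketched as follows. This is not the Generic Factory toolkit or the Karma client library; `run_instrumented` and its `notify` callback are hypothetical names, and the wrapped command is a trivial Python one-liner chosen only so the example runs anywhere.

```python
import subprocess
import sys


def run_instrumented(service_id, command, inputs, outputs, notify):
    """Run a command-line application and report provenance activities.

    notify(kind, source, detail) is a hypothetical callback that stands in
    for publishing a notification.
    """
    notify("ServiceInvoked", service_id, " ".join(command))
    for data_id in inputs:
        notify("DataConsumed", service_id, data_id)
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        # Report outputs only when the application succeeded.
        for data_id in outputs:
            notify("DataProduced", service_id, data_id)
    notify("SendingResponse", service_id, f"exit={result.returncode}")
    return result


events = []
result = run_instrumented(
    "EchoService",
    [sys.executable, "-c", "print('hello')"],
    inputs=["in.dat"],
    outputs=["out.dat"],
    notify=lambda kind, source, detail: events.append((kind, source, detail)),
)
```

The application itself is unchanged; all provenance reporting lives in the wrapper, which is what lets wrapped services be instrumented automatically.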

Page 8: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[8/25]2006-09-13

Provenance Challenge Workflow

Page 9: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[9/25]2006-09-13

Provenance Traces – Building Block Queries

Data Provenance: get[Recursive]DataProvenance
  What (ID), where (URL), when (Timestamp)
  How (Process, inputs)

Page 10: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[10/25]2006-09-13

Provenance Traces – Building Block Queries

Process Provenance: getProcessProvenance
  What (ID), when (Timestamp), who (Invoker)
  State (execution/completion status)
  Input & Output data products

Page 11: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[11/25]2006-09-13

Provenance Traces – Building Block Queries

Workflow Trace: getWorkflowTrace
  What (ID), when (Timestamp), who (Invoker)
  State (execution/completion status)
  Process provenance of workflow steps
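One way to picture what the three building-block queries return is as nested records. The dataclasses below are a sketch that follows the slide bullets only; the field names and sample identifiers are assumptions and do not reproduce the XML documents that the Karma service returns.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataProvenance:
    data_id: str                        # what
    url: str                            # where
    timestamp: str                      # when
    produced_by: Optional[str] = None   # how: producing process
    using_data: List[str] = field(default_factory=list)  # how: inputs


@dataclass
class ProcessProvenance:
    process_id: str   # what
    timestamp: str    # when
    invoker: str      # who
    status: str       # execution/completion status
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


@dataclass
class WorkflowTrace:
    workflow_id: str
    timestamp: str
    invoker: str
    status: str
    steps: List[ProcessProvenance] = field(default_factory=list)

    def data_produced(self):
        """All data products produced by any step, in step order."""
        return [d for step in self.steps for d in step.outputs]


# Illustrative two-step trace with made-up identifiers.
trace = WorkflowTrace(
    "wf-1", "2006-09-13T12:00:00", "user-a", "completed",
    steps=[
        ProcessProvenance("step-1", "2006-09-13T12:00:01", "wf-1", "completed",
                          inputs=["a.img"], outputs=["b.img"]),
        ProcessProvenance("step-2", "2006-09-13T12:00:05", "wf-1", "completed",
                          inputs=["b.img"], outputs=["c.gif"]),
    ],
)
```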

Page 12: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[12/25]2006-09-13

Page 13: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[13/25]2006-09-13

Provenance Challenge Queries

! Answered by Karma Service API directly
Answered by Karma Service API, with post-processing by client
~ Answered by access to backend DB (SQL)
Not answered

Query 1 2 3 4 5 6 7 8 9

Result ! ! ~ ~ ~ ~

Page 14: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[14/25]2006-09-13

Provenance Challenge Queries: Q1

Find everything that caused Atlas X Graphic to be as it is
! Answered by Karma Service API directly
This is the recursive data provenance of the Atlas X Graphic file
A call to getRecursiveDataProvenance('lead:uuid:1157946992-atlas-x.gif') returns this [www]
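What a recursive data provenance query computes can be sketched over a small in-memory graph. The graph below is a toy with made-up file names, and `recursive_data_provenance` is a hypothetical helper; it is not the Karma service call, which answers this query on the server side.

```python
# Toy store: data product id -> (producing process, input data ids).
# File names are illustrative only.
PRODUCED_BY = {
    "atlas-x.gif": ("convert", ["atlas-x.pgm"]),
    "atlas-x.pgm": ("slicer", ["atlas.img", "atlas.hdr"]),
    "atlas.img": ("softmean", ["reslice1.img"]),
    "atlas.hdr": ("softmean", ["reslice1.img"]),
    "reslice1.img": ("reslice", ["warp1.warp"]),
    "warp1.warp": ("align_warp", ["anatomy1.img", "reference.img"]),
}


def recursive_data_provenance(data_id, store, seen=None):
    """Depth-first walk from a data product back to its source data.

    Returns a list of (data id, producing process, inputs). Data products
    with no entry in the store are treated as workflow inputs.
    """
    seen = set() if seen is None else seen
    if data_id in seen or data_id not in store:
        return []
    seen.add(data_id)
    process, inputs = store[data_id]
    trace = [(data_id, process, inputs)]
    for input_id in inputs:
        trace.extend(recursive_data_provenance(input_id, store, seen))
    return trace


trace = recursive_data_provenance("atlas-x.gif", PRODUCED_BY)
processes = [process for _, process, _ in trace]
```

The `seen` set keeps a data product that feeds several downstream steps from being expanded more than once.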

Page 15: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[15/25]2006-09-13

Provenance Challenge Queries: Q2

Find the process that led to Atlas X Graphic, excluding all prior to softmean
Answered by Karma Service API, with post-processing by client
1. First call getDataProvenance
2. Then recursively get data provenance till 'SoftmeanService' is seen
Returns this [www]

1. let $dataList := ['lead:uuid:1157946992-atlas-x.gif']
2. while ($dataList != empty) do
     // get data provenance for this level
     a. $dataProvenance = karma.getDataProvenance($dataList[0])
     // print process information & remove data from list
     b. Print $dataProvenance; $dataList.delete(0)
     c. if ($dataProvenance.getProducedBy() == 'SoftmeanService') break; // found Softmean. Stop.
     // get input data used by this data & recurse up the tree
     d. foreach ($inputData in $dataProvenance.getUsingData()) do
          i. $dataList.add($inputData)
3. End
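The client-side loop above can be written as a runnable sketch. The `lookup` function stands in for the service's getDataProvenance call, and the store contents, file names, and every service name other than 'SoftmeanService' and 'AlignWarpService' are assumptions for illustration.

```python
from collections import deque

# Toy store: data id -> (producing service, input data ids). Illustrative only.
STORE = {
    "atlas-x.gif": ("ConvertService", ["atlas-x.pgm"]),
    "atlas-x.pgm": ("SlicerService", ["atlas.img", "atlas.hdr"]),
    "atlas.img": ("SoftmeanService", ["reslice1.img"]),
    "atlas.hdr": ("SoftmeanService", ["reslice1.img"]),
    "reslice1.img": ("ResliceService", ["warp1.warp"]),
    "warp1.warp": ("AlignWarpService", ["anatomy1.img"]),
}


def lookup(data_id):
    """Hypothetical stand-in for a single-level data provenance query."""
    return STORE.get(data_id)


def provenance_until(start_id, stop_service):
    """Walk back from start_id, stopping once stop_service is seen."""
    queue = deque([start_id])
    visited = []
    while queue:
        data_id = queue.popleft()
        record = lookup(data_id)
        if record is None:
            continue  # workflow input: nothing produced it
        produced_by, using_data = record
        visited.append((data_id, produced_by))
        if produced_by == stop_service:
            break  # found the stop service; do not go further back
        queue.extend(using_data)
    return visited


steps = provenance_until("atlas-x.gif", "SoftmeanService")
```

Like the pseudocode, this stops at the first data product produced by the stop service, so anything still waiting in the queue at that point is not visited.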

Page 16: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[16/25]2006-09-13

Provenance Challenge: Q4

Find all invocations of align_warp with parameter "-m 12" that ran on a Monday
~ Answered by access to backend DB (SQL)
1. Use SQL query to get matching invocations
2. Call getProcessProvenance to get description of align_warp
Returns this [www]

SELECT invokee.workflow_id, invokee.service_id, invokee.workflow_node_id, invokee.workflow_timestep,
       invoker.workflow_id, invoker.service_id, invoker.workflow_node_id, invoker.workflow_timestep
FROM invocation_state_table invocation, entity_table invokee, entity_table invoker, notification_table notifications
WHERE invokee.entity_id = invocation.invokee_id
  AND invoker.entity_id = invocation.invoker_id
  AND notifications.source_id = invocation.invokee_id
  AND notifications.notification_type = 'ServiceInvoked'
  AND invokee.service_id = 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService'
  AND notifications.notification_xml LIKE '%<ModelMenuNumber>12</ModelMenuNumber>%'
  AND DayOfWeek(invocation.request_receive_time) = 2; -- 1=Sunday, 2=Monday, ...
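The shape of this query can be tried out on a much smaller table. The sketch below uses SQLite from Python with a single hypothetical `invocation` table, not the Karma schema, and sample rows invented for the example. SQLite has no DayOfWeek function, so the day test uses `strftime('%w', ...)`, which numbers days from 0 (Sunday), making Monday '1'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical table; the real schema spreads this over several tables.
conn.execute(
    "CREATE TABLE invocation ("
    "service_id TEXT, notification_xml TEXT, request_receive_time TEXT)"
)
rows = [
    # Monday, parameter 12: should match
    ("AlignWarpService",
     "<ServiceInvoked><ModelMenuNumber>12</ModelMenuNumber></ServiceInvoked>",
     "2006-09-11 10:00:00"),
    # Tuesday, parameter 12: wrong day
    ("AlignWarpService",
     "<ServiceInvoked><ModelMenuNumber>12</ModelMenuNumber></ServiceInvoked>",
     "2006-09-12 10:00:00"),
    # Monday, parameter 3: wrong parameter
    ("AlignWarpService",
     "<ServiceInvoked><ModelMenuNumber>3</ModelMenuNumber></ServiceInvoked>",
     "2006-09-11 11:00:00"),
    # Monday, different service
    ("SoftmeanService", "<ServiceInvoked/>", "2006-09-11 12:00:00"),
]
conn.executemany("INSERT INTO invocation VALUES (?, ?, ?)", rows)

matches = conn.execute(
    """
    SELECT request_receive_time
    FROM invocation
    WHERE service_id = 'AlignWarpService'
      AND notification_xml LIKE '%<ModelMenuNumber>12</ModelMenuNumber>%'
      AND strftime('%w', request_receive_time) = '1'
    """
).fetchall()
```

Matching the parameter with LIKE on the stored XML is simple but depends on the exact serialization of the message; a parameter stored in its own column would be more robust.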

Page 17: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[17/25]2006-09-13

Provenance Challenge: Q9

Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files.
Not answered
We do not expect to answer such queries through the provenance system
We push the provenance information to external metadata management systems such as myLEAD, which can answer such "join" queries on data product metadata and provenance

Page 18: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[18/25]2006-09-13

Variations of Workflow

Workflows with loops
Workflows whose structure changes dynamically
  or, as a simpler case, workflows with conditional branches
Hierarchical composition of workflows
  workflows invoking other workflows
  ~Similar to user-views (UPenn), nested-workflows (myGrid), …

Page 19: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[19/25]2006-09-13

Variations of Queries

Find all [workflows | processes] with a particular execution status [completed | failed | waiting for input]
  Dynamic attribute of provenance?
Query for client view and service view of the provenance
  Check for differences

Page 20: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

Acknowledgements

Alek Slominski (GPEL Engine)

Satoshi Shirasuna (XBaya Composer)

LEAD Members

NSF

Questions

www.extreme.indiana.edu/karma

Page 21: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[21/25]2006-09-13

Sample Activities Published

More here [www]

Page 22: Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18

[22/25]2006-09-13

Karma DB Schema