Karma Provenance Framework v2 Provenance Challenge Workshop/GGF18 Yogesh L. Simmhan Beth Plale, Dennis Gannon, Srinath Perera Indiana University



[2/25]2006-09-13

Outline

Architecture of Karma

Workflow Setup & Collecting Provenance

Provenance Traces

“canonical” Challenge Queries

Suggested Variations

[3/25]2006-09-13

Provenance Collection: Challenges & Uses

Linked Environments for Atmospheric Discovery (LEAD) project

  Weather & severe storm prediction applications

Provenance on workflow (process) & data products at fine granularity

Dynamic, long-running workflows

Helps scientists search for workflows & data products, track workflow execution, and analyze & mine data products from runs

[4/25]2006-09-13

Karma Provenance Framework

Lightweight – do not duplicate existing metadata cataloging effort

  myLEAD personal metadata catalog

  ResCat service & data registry

Glue to integrate metadata on data & services with runtime workflow information

Scalability [1] – 500 users, 100's of workflows, 10,000's of data products

[1] Performance Evaluation of the Karma Provenance Framework, Simmhan, Y., et al.; IPAW, 2006

[5/25]2006-09-13

Karma Architecture [2]

[Architecture diagram. Recoverable labels:]

Workflow Engine

Workflow Instance: Service 1, Service 2, …, Service 9, Service 10

10 Data Products Consumed & Produced by each Service (10C / 10P)

Publish Provenance Activities as Notifications

Workflow–Started & –Finished Activities

Application–Started & –Finished, Data–Produced & –Consumed Activities

Message Bus: WS-Messenger Notification Broker, WS-Eventing Service API

Karma Provenance Service: Provenance Listener, Activity DB, Provenance Query API

Subscribe & Listen to Activity Notifications

Provenance Browser Client

Query for Workflow, Process, & Data Provenance

[2] A Framework for Collecting Provenance in Data-Centric Scientific Workflows, Simmhan, Y., et al., Submitted to ICWS Conference, 2006
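
The publish/subscribe flow on the architecture slide can be sketched in a few lines. This is only an in-memory illustration, not Karma's implementation: `NotificationBroker`, `ProvenanceListener`, `Activity`, and `produced_by` are hypothetical names, and the real system exchanges WS-Eventing notifications through WS-Messenger rather than Python callbacks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Activity:
    """One provenance activity (hypothetical shape, for illustration only)."""
    kind: str                      # e.g. "DataProduced", "WorkflowStarted"
    workflow_id: str
    service_id: str
    data_id: Optional[str] = None  # set only for data activities


class NotificationBroker:
    """Hypothetical in-memory stand-in for the notification broker."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, activity):
        # Deliver the activity to every subscriber.
        for callback in self._subscribers:
            callback(activity)


class ProvenanceListener:
    """Subscribes to the broker and appends each activity to an 'activity DB'."""

    def __init__(self, broker):
        self.activity_db = []
        broker.subscribe(self.activity_db.append)

    def produced_by(self, data_id):
        # Answer a simple data-provenance question from the stored activities.
        for activity in self.activity_db:
            if activity.kind == "DataProduced" and activity.data_id == data_id:
                return activity.service_id
        return None


broker = NotificationBroker()
listener = ProvenanceListener(broker)
broker.publish(Activity("WorkflowStarted", "wf-1", "engine"))
broker.publish(Activity("ApplicationStarted", "wf-1", "Service1"))
broker.publish(Activity("DataConsumed", "wf-1", "Service1", "data-in"))
broker.publish(Activity("DataProduced", "wf-1", "Service1", "data-out"))
broker.publish(Activity("ApplicationFinished", "wf-1", "Service1"))
broker.publish(Activity("WorkflowFinished", "wf-1", "engine"))
print(listener.produced_by("data-out"))
```

The point of the sketch is the decoupling: services only publish activities, and provenance is assembled later by whoever listens.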

[6/25]2006-09-13

Provenance Challenge Workflow

Applications modeled as web-services

  GFac toolkit creates service for command-line applications

  Service invokes a shell-script wrapper of the application, passing command-line arguments

  Created services automatically instrumented to generate provenance using Karma client library

Workflow composed as GPEL* script

  XBaya Workflow composer GUI

  Central GPEL workflow engine orchestrates execution

*Grid Process Execution Language, an extension of the Business Process Execution Language (BPEL)

[7/25]2006-09-13

Provenance Challenge Workflow

[8/25]2006-09-13

Provenance Traces

Data Provenance: get[Recursive]DataProvenance

  What (ID), where (URL), when (Timestamp)

  How (Process, inputs)

[9/25]2006-09-13

Provenance Traces

Process Provenance: getProcessProvenance

  What (ID), when (Timestamp), who (Invoker)

  State (execution/completion status)

  Input & Output data products

[10/25]2006-09-13

Provenance Traces

Workflow Trace: getWorkflowTrace

  What (ID), when (Timestamp), who (Invoker)

  State (execution/completion status)

  Process provenance of workflow steps

[11/25]2006-09-13

[12/25]2006-09-13

Provenance Challenge Queries

! Answered by Karma Service API directly

Answered by Karma Service API, with post-processing by client

~ Answered by access to backend DB (SQL)

Not answered

Query 1 2 3 4 5 6 7 8 9

Result ! ! ~ ~ ~ ~

[13/25]2006-09-13

Provenance Challenge Queries: Q1

Find everything that caused Atlas X Graphic to be as it is

! Answered by Karma Service API directly

This is the recursive data provenance of the Atlas X Graphic file

A call to getRecursiveDataProvenance('lead:uuid:1157946992-atlas-x.gif') returns this [www]
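
What a recursive data provenance call conceptually returns can be illustrated by expanding one-level provenance records into a tree. This is a sketch under assumptions, not the Karma API: the `PROVENANCE` table, the short data IDs, and every service name other than SoftmeanService are hypothetical.

```python
# Hypothetical one-level records: data ID -> (producing service, input data IDs).
PROVENANCE = {
    "atlas-x.gif": ("ConvertService", ["atlas-x.pgm"]),
    "atlas-x.pgm": ("SlicerService", ["atlas.img"]),
    "atlas.img": ("SoftmeanService", ["reslice1.img", "reslice2.img"]),
    "reslice1.img": ("ResliceService", ["warp1.warp"]),
    "reslice2.img": ("ResliceService", ["warp2.warp"]),
}


def get_recursive_data_provenance(data_id):
    """Expand one-level provenance into a nested tree rooted at data_id."""
    if data_id not in PROVENANCE:
        # No producer recorded: treat the data product as a leaf of the tree.
        return {"data": data_id, "produced_by": None, "inputs": []}
    service, inputs = PROVENANCE[data_id]
    return {
        "data": data_id,
        "produced_by": service,
        "inputs": [get_recursive_data_provenance(i) for i in inputs],
    }


tree = get_recursive_data_provenance("atlas-x.gif")
print(tree["produced_by"])
```

The recursion terminates because a workflow's data dependencies form an acyclic graph; a cyclic record set would need a visited set.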

[14/25]2006-09-13

Provenance Challenge Queries: Q2

Find the process that led to Atlas X Graphic, excluding all prior to softmean

Answered by Karma Service API, with post-processing by client

1. First call getDataProvenance

2. Then recursively get data provenance till 'SoftmeanService' is seen

Returns this [www]

1. let $dataList := ['lead:uuid:1157946992-atlas-x.gif']
2. while ($dataList != empty) do
     // get data provenance for this level
     a. $dataProvenance = karma.getDataProvenance($dataList[0])
     // print process information & remove data from list
     b. Print $dataProvenance; $dataList.delete(0)
     c. if ($dataProvenance.getProducedBy() == 'SoftmeanService') break; // found Softmean. Stop.
     // get input data used by this data & recurse up the tree
     d. foreach ($inputData in $dataProvenance.getUsingData()) do
          i. $dataList.add($inputData)
3. End
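
The client-side post-processing in the pseudocode can be written as a runnable sketch. `MockKarma`, its `get_data_provenance` method, the record layout, and all IDs and service names except the atlas-x.gif ID and SoftmeanService are assumptions made for illustration; the real Karma client library is not shown on the slides.

```python
from collections import deque


class MockKarma:
    """Hypothetical stand-in for a Karma client; returns one-level provenance."""

    def __init__(self, records):
        self._records = records

    def get_data_provenance(self, data_id):
        return self._records.get(data_id)


def provenance_until(karma, start_id, stop_service):
    """Walk up the provenance chain from start_id, stopping at stop_service."""
    pending = deque([start_id])
    trace = []
    while pending:
        data_id = pending.popleft()
        record = karma.get_data_provenance(data_id)
        if record is None:
            continue  # no producer recorded for this data product
        trace.append((data_id, record["produced_by"]))
        if record["produced_by"] == stop_service:
            break  # found the stop service; ignore everything before it
        pending.extend(record["using_data"])
    return trace


RECORDS = {
    "lead:uuid:1157946992-atlas-x.gif": {
        "produced_by": "ConvertService", "using_data": ["atlas-x.pgm"]},
    "atlas-x.pgm": {"produced_by": "SlicerService", "using_data": ["atlas.img"]},
    "atlas.img": {"produced_by": "SoftmeanService", "using_data": ["reslice1.img"]},
    "reslice1.img": {"produced_by": "ResliceService", "using_data": []},
}

trace = provenance_until(
    MockKarma(RECORDS), "lead:uuid:1157946992-atlas-x.gif", "SoftmeanService")
for data_id, service in trace:
    print(data_id, "<-", service)
```

Like the pseudocode, this stops the whole traversal at the first match, which suits a linear chain; a branching workflow would need to prune only the matching branch.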

[15/25]2006-09-13

Provenance Challenge: Q4

Find all invocations of align_warp (with parameter "-m 12") that ran on a Monday

~ Answered by access to backend DB (SQL)

1. Use SQL query to get matching invocations

2. Call getProcessProvenance to get description of align_warp

Returns this [www]

SELECT invokee.workflow_id, invokee.service_id, invokee.workflow_node_id, invokee.workflow_timestep,
       invoker.workflow_id, invoker.service_id, invoker.workflow_node_id, invoker.workflow_timestep
FROM invocation_state_table invocation, entity_table invokee, entity_table invoker,
     notification_table notifications
WHERE invokee.entity_id = invocation.invokee_id
  AND invoker.entity_id = invocation.invoker_id
  AND notifications.source_id = invocation.invokee_id
  AND notifications.notification_type = 'ServiceInvoked'
  AND invokee.service_id = 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService'
  AND notifications.notification_xml LIKE '%<ModelMenuNumber>12</ModelMenuNumber>%'
  AND DayOfWeek(invocation.request_receive_time) = 2;  -- 1=Sunday, 2=Monday, ...
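
The filtering logic of this query can be tried out against an in-memory SQLite database. The sketch below is not the Karma schema: it flattens the joined tables into one hypothetical `invocation` table with made-up rows, and it uses SQLite's `strftime('%w', ...)`, which numbers days from 0=Sunday (so Monday is '1'), in place of the slide's `DayOfWeek` convention of 1=Sunday.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table standing in for the joined Karma tables.
conn.execute(
    "CREATE TABLE invocation ("
    "service_id TEXT, request_receive_time TEXT, notification_xml TEXT)"
)
conn.executemany(
    "INSERT INTO invocation VALUES (?, ?, ?)",
    [
        # 2006-09-11 was a Monday, 2006-09-12 a Tuesday.
        ("AlignWarpService", "2006-09-11 10:00:00",
         "<ModelMenuNumber>12</ModelMenuNumber>"),
        ("AlignWarpService", "2006-09-12 10:00:00",
         "<ModelMenuNumber>12</ModelMenuNumber>"),
        ("AlignWarpService", "2006-09-11 11:00:00",
         "<ModelMenuNumber>3</ModelMenuNumber>"),
        ("SoftmeanService", "2006-09-11 12:00:00",
         "<ModelMenuNumber>12</ModelMenuNumber>"),
    ],
)

rows = conn.execute(
    """
    SELECT service_id, request_receive_time
    FROM invocation
    WHERE service_id = ?
      AND notification_xml LIKE ?
      AND strftime('%w', request_receive_time) = '1'
    """,
    ("AlignWarpService", "%<ModelMenuNumber>12</ModelMenuNumber>%"),
).fetchall()
print(rows)
```

Matching the parameter with `LIKE` on the raw notification XML mirrors the slide's approach; it is simple but depends on the exact serialization of the element.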

[16/25]2006-09-13

Provenance Challenge: Q9

Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files.

Not answered

We do not expect to answer such queries through the provenance system

We push the provenance information to external metadata management systems such as myLEAD, which can answer such "join" queries on data product metadata and provenance

[17/25]2006-09-13

Variations of Workflow

Workflows with loops

Workflows whose structure changes dynamically or, as a simpler case, workflows with conditional branches

Hierarchical composition of workflows

  Workflows invoking other workflows

[18/25]2006-09-13

Variations of Queries

Find all [workflows | processes] with a particular execution status [completed | failed | waiting for input]

Show the client view and service view of the provenance and check for differences

Acknowledgements

Alek Slominski (GPEL Engine)

Satoshi Shirasuna (XBaya Composer)

LEAD Members

NSF

Questions

www.extreme.indiana.edu/karma

[20/25]2006-09-13

More here [www]

Sample Activities Published

[21/25]2006-09-13

Karma DB Schema