Karma Provenance Framework v2: Provenance Challenge Workshop/GGF18. Yogesh L. Simmhan, Beth Plale, Dennis Gannon, Srinath Perera, Indiana University

Author: cooper

Posted on 20-Jan-2016








  • Karma Provenance Framework v2

    Provenance Challenge Workshop/GGF18
    Yogesh L. Simmhan
    Beth Plale, Dennis Gannon, Srinath Perera
    Indiana University

  • Outline

    Architecture of Karma
    Workflow Setup & Collecting Provenance
    Provenance Traces
    “Canonical” Challenge Queries
    Suggested Variations

  • Provenance Collection: Challenges & Uses

    Linked Environments for Atmospheric Discovery (LEAD) project
      Weather & severe storm prediction applications
    Provenance on workflow (process) & data products at fine granularity
    Dynamic, long-running workflows
    Helps scientists to search for workflows & data products, estimate data quality, track workflow execution, and analyze & mine data products from runs

  • Karma Provenance Framework

    Lightweight
      Does not duplicate existing metadata cataloging efforts
        myLEAD personal metadata catalog
        ResCat service & data registry
    Glue to integrate metadata on data & services with runtime workflow information
    Scalability[1]: 500 users, 100s of workflows, 10,000s of data products

    [1] Performance Evaluation of the Karma Provenance Framework, Simmhan, Y., et al.; IPAW, 2006

  • Karma Provenance Framework

    Key provenance activities generated during the lifetime of a workflow
      Workflow | Service Invoked
      Data Consumed
      Data Produced
      Sending Response
    Activities modeled as XML messages
    Published asynchronously by service | workflow | client
      Presently use the WS-Eventing messaging system
    Activities stored in a relational database
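The activity model on this slide can be sketched in a few lines of Python. This is a minimal sketch, not Karma's implementation: the XML element names (`activity`, `source`, `timestamp`, `dataId`), the `activity` table, the helpers `make_activity`/`store_activity`, and the file names are all hypothetical, and the asynchronous WS-Eventing publish step is replaced by a direct function call into an in-memory SQLite database.

```python
import sqlite3
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Optional

# Activity kinds from the slide (workflow/service invoked, data consumed,
# data produced, sending response). The exact names are illustrative.
ACTIVITY_TYPES = {"WorkflowInvoked", "ServiceInvoked", "DataConsumed",
                  "DataProduced", "SendingResponse"}


def make_activity(kind: str, source: str, data_id: Optional[str] = None) -> str:
    """Serialize one provenance activity as an XML message (hypothetical schema)."""
    if kind not in ACTIVITY_TYPES:
        raise ValueError(f"unknown activity type: {kind}")
    root = ET.Element("activity", {"type": kind})
    ET.SubElement(root, "source").text = source
    ET.SubElement(root, "timestamp").text = datetime.now(timezone.utc).isoformat()
    if data_id is not None:
        ET.SubElement(root, "dataId").text = data_id
    return ET.tostring(root, encoding="unicode")


def store_activity(db: sqlite3.Connection, message: str) -> None:
    """Parse an XML activity message and store it as one relational row."""
    root = ET.fromstring(message)
    db.execute(
        "INSERT INTO activity (type, source, data_id, ts, xml) VALUES (?, ?, ?, ?, ?)",
        (root.get("type"), root.findtext("source"), root.findtext("dataId"),
         root.findtext("timestamp"), message),
    )


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE activity (type TEXT, source TEXT, data_id TEXT, ts TEXT, xml TEXT)")

# One service invocation, as an instrumented service might report it.
for kind, data in [("ServiceInvoked", None), ("DataConsumed", "anatomy1.img"),
                   ("DataProduced", "warp1.warp"), ("SendingResponse", None)]:
    store_activity(db, make_activity(kind, "AlignWarpService", data))

rows = db.execute("SELECT type, data_id FROM activity ORDER BY rowid").fetchall()
```

Keeping the full XML alongside the extracted columns mirrors the slide's split between XML messages and relational storage: indexed columns answer common queries, while the raw message stays available for anything else.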

  • Karma Architecture

    [Figure: 1 workflow instance, orchestrated by the Workflow Engine, invoking Service1, Service2, ..., Service9, Service10; 10 data products consumed (C) and produced (P) by each service]

    [1] A Framework for Collecting Provenance in Data-Centric Scientific Workflows, Simmhan, Y., et al.; submitted to ICWS, 2006

  • Provenance Challenge Workflow

    Applications modeled as web services
      Generic Factory toolkit creates web-service wrappers for command-line applications
      Service invokes a shell script/application, passing command-line arguments
      Created services are automatically instrumented to generate provenance using the Karma client library
    Workflow composed as a GPEL* script
      XBaya Workflow Composer GUI
      Central GPEL workflow engine orchestrates execution

    *Grid Process Execution Language, an extension of the Business Process Execution Language (BPEL)
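The wrapping idea (a service that runs a command-line application with arguments and reports provenance about the run) can be sketched as below. This is not the Generic Factory toolkit or the Karma client library: `wrap_command`, the tuple-based activity format, and the `publish` callback are hypothetical stand-ins, and a tiny Python one-liner takes the place of a real application such as align_warp.

```python
import subprocess
import sys
from typing import Callable, List, Tuple

# (activity type, service name, detail): a hypothetical, simplified activity.
Activity = Tuple[str, str, str]


def wrap_command(service: str, argv: List[str],
                 publish: Callable[[Activity], None]) -> Callable[..., str]:
    """Return a 'service' that runs a command-line app and publishes provenance."""

    def invoke(*inputs: str) -> str:
        publish(("ServiceInvoked", service, " ".join(argv)))
        for data_id in inputs:
            publish(("DataConsumed", service, data_id))
        # Pass the input data product names to the application as arguments.
        # check=True raises on a non-zero exit, so a failed run publishes
        # neither DataProduced nor SendingResponse.
        result = subprocess.run(argv + list(inputs), capture_output=True,
                                text=True, check=True)
        output = result.stdout.strip()  # the app prints the produced file name
        publish(("DataProduced", service, output))
        publish(("SendingResponse", service, "ok"))
        return output

    return invoke


log: List[Activity] = []
# Stand-in application: prints "<input>.warp" as its "output file".
fake_app = [sys.executable, "-c", "import sys; print(sys.argv[1] + '.warp')"]
align_warp = wrap_command("AlignWarpService", fake_app, log.append)
produced = align_warp("anatomy1.img")
```

Because instrumentation lives in the wrapper rather than in the application, the command-line tool itself needs no changes, which is the point of generating the wrappers automatically.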

  • Provenance Challenge Workflow

  • Provenance Traces: Building Block Queries

    Data Provenance: get[Recursive]DataProvenance
      What (ID), where (URL), when (Timestamp)
      How (process, inputs)
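A minimal Python sketch of the two data-provenance calls, assuming provenance is already available as an in-memory map from data ID to record. The record fields follow the what/where/when/how breakdown above; the function names are snake_case stand-ins for getDataProvenance and getRecursiveDataProvenance, not the Karma API, and the data IDs, URLs, timestamps, and process names are hypothetical (the graph is also shortened, treating the resliced headers as raw inputs).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataProvenance:
    data_id: str                # what
    url: str                    # where
    timestamp: str              # when
    produced_by: Optional[str]  # how: producing process (None for raw inputs)
    inputs: List[str] = field(default_factory=list)  # how: that process's inputs


def get_data_provenance(store: Dict[str, DataProvenance], data_id: str) -> DataProvenance:
    """One level: the record for a single data product."""
    return store[data_id]


def get_recursive_data_provenance(store: Dict[str, DataProvenance],
                                  data_id: str) -> List[DataProvenance]:
    """Walk depth-first from a data product back to its raw inputs."""
    result: List[DataProvenance] = []
    seen, stack = set(), [data_id]
    while stack:
        current = stack.pop()
        if current in seen:  # shared ancestors are reported once
            continue
        seen.add(current)
        record = get_data_provenance(store, current)
        result.append(record)
        stack.extend(reversed(record.inputs))  # visit inputs in listed order
    return result


def rec(data_id: str, produced_by: Optional[str] = None, inputs=()) -> DataProvenance:
    # Hypothetical URL and timestamp, for illustration only.
    return DataProvenance(data_id, f"gsiftp://host.example.org/{data_id}",
                          "2006-09-11T10:00:00Z", produced_by, list(inputs))


store = {r.data_id: r for r in [
    rec("atlas-x.gif", "convert", ["atlas-x.pgm"]),
    rec("atlas-x.pgm", "slicer", ["atlas.hdr"]),
    rec("atlas.hdr", "softmean", ["resliced1.hdr", "resliced2.hdr"]),
    rec("resliced1.hdr"),
    rec("resliced2.hdr"),
]}
trace = [r.data_id for r in get_recursive_data_provenance(store, "atlas-x.gif")]
```

The recursive form is just the one-level call applied repeatedly, which is why both can be offered as building blocks.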

  • Provenance Traces: Building Block Queries

    Process Provenance: getProcessProvenance
      What (ID), when (Timestamp), who (Invoker)
      State (execution/completion status)
      Input & output data products
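Process provenance can be sketched as a fold over the activities one process published. In this sketch the tuple layout, the state names ("not started", "executing", "completed"), and the rule that a SendingResponse activity marks completion are assumptions, not Karma's documented behaviour. A process with no response yet stays "executing", so the sketch cannot tell a slow run from a failed one.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (ISO timestamp, activity type, invoker, data id or ""): hypothetical format.
Activity = Tuple[str, str, str, str]


@dataclass
class ProcessProvenance:
    process_id: str                   # what
    invoked_at: Optional[str] = None  # when
    invoker: Optional[str] = None     # who
    state: str = "not started"        # execution/completion status
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def get_process_provenance(process_id: str,
                           activities: List[Activity]) -> ProcessProvenance:
    """Fold one process's activities, in time order, into a single record."""
    record = ProcessProvenance(process_id)
    # ISO-8601 strings in one time zone sort chronologically.
    for ts, kind, invoker, data_id in sorted(activities):
        if kind == "ServiceInvoked":
            record.invoked_at, record.invoker, record.state = ts, invoker, "executing"
        elif kind == "DataConsumed":
            record.inputs.append(data_id)
        elif kind == "DataProduced":
            record.outputs.append(data_id)
        elif kind == "SendingResponse":
            record.state = "completed"
    return record


acts = [
    ("2006-09-11T10:00:00Z", "ServiceInvoked", "WorkflowEngine", ""),
    ("2006-09-11T10:00:01Z", "DataConsumed", "WorkflowEngine", "anatomy1.img"),
    ("2006-09-11T10:00:05Z", "DataProduced", "WorkflowEngine", "warp1.warp"),
    ("2006-09-11T10:00:06Z", "SendingResponse", "WorkflowEngine", ""),
]
done = get_process_provenance("align_warp-1", acts)
running = get_process_provenance("align_warp-1", acts[:2])
```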

  • Provenance Traces: Building Block Queries

    Workflow Trace: getWorkflowTrace
      What (ID), when (Timestamp), who (Invoker)
      State (execution/completion status)
      Process provenance of workflow steps
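A workflow trace can then be sketched as the per-step process provenance grouped by workflow. Everything here is hypothetical: the `Step` and `WorkflowTrace` types, passing the invoker in as a parameter, and deriving the workflow state from its recorded steps are illustrative choices. Steps that have not been invoked yet leave no record, so a trace whose recorded steps have all completed would be reported as "completed" even if the workflow still has steps to run.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    workflow_id: str
    started_at: str   # ISO timestamp of the step's invocation
    process_id: str
    state: str        # e.g. "executing" or "completed"


@dataclass
class WorkflowTrace:
    workflow_id: str   # what
    started_at: str    # when: earliest recorded step
    invoker: str       # who (passed in here; hypothetical)
    state: str         # derived from the recorded steps' states
    steps: List[Step]  # process provenance of workflow steps, in time order


def get_workflow_trace(workflow_id: str, invoker: str,
                       all_steps: List[Step]) -> WorkflowTrace:
    steps = sorted((s for s in all_steps if s.workflow_id == workflow_id),
                   key=lambda s: s.started_at)
    if not steps:
        raise KeyError(f"no steps recorded for {workflow_id}")
    done = all(s.state == "completed" for s in steps)
    return WorkflowTrace(workflow_id, steps[0].started_at, invoker,
                         "completed" if done else "executing", steps)


steps = [
    Step("wf-1", "2006-09-11T10:05:00Z", "reslice-1", "executing"),
    Step("wf-1", "2006-09-11T10:00:00Z", "align_warp-1", "completed"),
    Step("wf-2", "2006-09-11T11:00:00Z", "align_warp-1", "completed"),
]
trace = get_workflow_trace("wf-1", "scientist@lead", steps)
```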

  • Provenance Challenge Queries

    ! Answered directly by Karma Service API
    Answered by Karma Service API, with post-processing by client
    ~ Answered by access to backend DB (SQL)
    Not answered

  • Provenance Challenge Queries: Q1

    Find everything that caused Atlas X Graphic to be as it is
    ! Answered directly by Karma Service API
      This is the recursive data provenance of the Atlas X Graphic file
      A call to getRecursiveDataProvenance(lead:uuid:1157946992-atlas-x.gif) returns this [www]

  • Provenance Challenge Queries: Q2

    Find the process that led to Atlas X Graphic, excluding everything prior to softmean
    Answered by Karma Service API, with post-processing by client
      First call getDataProvenance
      Then recursively get data provenance until SoftmeanService is seen
      Returns this [www]

        let $dataList := ['lead:uuid:1157946992-atlas-x.gif']
        while ($dataList != empty) do
            // get data provenance for this level
            $dataProvenance := karma.getDataProvenance($dataList[0])
            // print process information & remove data from list
            print $dataProvenance; $dataList.delete(0)
            // found Softmean: stop
            if ($dataProvenance.getProducedBy() == 'SoftmeanService') break
            // get input data used by this data & recurse up the tree
            foreach ($inputData in $dataProvenance.getUsingData()) do
                $dataList.add($inputData)
            end foreach
        end while
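The Q2 pseudocode above can be made runnable in Python, with a dictionary standing in for calls to karma.getDataProvenance. Only the start ID and the service name SoftmeanService come from the slide; the other data IDs and service names are hypothetical. One small change from the pseudocode: a `seen` set keeps a data product that feeds several processes from being queued twice. Like the pseudocode, the walk stops entirely at the first record produced by SoftmeanService, which is adequate when, as assumed here, every path back from the graphic passes through a single softmean run.

```python
from collections import deque
from typing import Dict, List, NamedTuple


class DataProvenance(NamedTuple):
    data_id: str
    produced_by: str        # service that produced this data product
    using_data: List[str]   # inputs consumed by that service


def provenance_until(store: Dict[str, DataProvenance], start: str,
                     stop_service: str) -> List[DataProvenance]:
    """Breadth-first walk up the provenance tree; stop once stop_service is seen."""
    result: List[DataProvenance] = []
    queue = deque([start])
    seen = {start}
    while queue:
        prov = store[queue.popleft()]  # stand-in for karma.getDataProvenance(...)
        result.append(prov)
        if prov.produced_by == stop_service:
            break  # found Softmean: stop
        for input_id in prov.using_data:  # recurse up the tree
            if input_id not in seen:
                seen.add(input_id)
                queue.append(input_id)
    return result


START = "lead:uuid:1157946992-atlas-x.gif"
store = {p.data_id: p for p in [
    DataProvenance(START, "ConvertService", ["atlas-x.pgm"]),
    DataProvenance("atlas-x.pgm", "SlicerService", ["atlas.hdr"]),
    DataProvenance("atlas.hdr", "SoftmeanService", ["resliced1.hdr", "resliced2.hdr"]),
    DataProvenance("resliced1.hdr", "ResliceService", ["warp1.warp"]),
]}
services = [p.produced_by for p in provenance_until(store, START, "SoftmeanService")]
```

A `deque` gives the same front-of-list removal as `$dataList.delete(0)` without the linear-time cost of popping from the head of a Python list.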

  • Provenance Challenge: Q4

    Find all invocations of align_warp with parameter "-m 12" that ran on a Monday
    ~ Answered by access to backend DB (SQL)
      Use an SQL query to get matching invocations
      Call getProcessProvenance to get the description of align_warp
      Returns this [www]

        SELECT invokee.workflow_id, invokee.service_id, invokee.workflow_node_id, invokee.workflow_timestep,
               invoker.workflow_id, invoker.service_id, invoker.workflow_node_id, invoker.workflow_timestep
        FROM invocation_state_table invocation, entity_table invokee,
             entity_table invoker, notification_table notifications
        WHERE invokee.entity_id = invocation.invokee_id
          AND invoker.entity_id = invocation.invoker_id
          AND notifications.source_id = invocation.invokee_id
          AND notifications.notification_type = 'ServiceInvoked'
          AND invokee.service_id = 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService'
          AND notifications.notification_xml LIKE '%12%'
          AND DayOfWeek(invocation.request_receive_time) = 2;  -- 1=Sunday, 2=Monday, ...
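To try the shape of the Q4 query without a Karma deployment, the sketch below builds a tiny in-memory SQLite database. The tables reuse table and column names from the query above but keep only a subset of columns (the invoker join is dropped), and all rows and notification XML are invented. Two differences from the slide: the DayOfWeek convention above numbers Monday as 2, whereas SQLite's `strftime('%w', ...)` returns '0' for Sunday and '1' for Monday; and this sketch assumes the align_warp arguments appear verbatim in the notification XML, so it can use the tighter pattern '%-m 12%'.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entity_table (entity_id INTEGER PRIMARY KEY, workflow_id TEXT, service_id TEXT);
CREATE TABLE invocation_state_table (invokee_id INTEGER, invoker_id INTEGER,
                                     request_receive_time TEXT);
CREATE TABLE notification_table (source_id INTEGER, notification_type TEXT,
                                 notification_xml TEXT);
"""

ALIGN_WARP = "urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService"

QUERY = """
SELECT invokee.workflow_id, invokee.service_id, invocation.request_receive_time
FROM invocation_state_table invocation, entity_table invokee,
     notification_table notifications
WHERE invokee.entity_id = invocation.invokee_id
  AND notifications.source_id = invocation.invokee_id
  AND notifications.notification_type = 'ServiceInvoked'
  AND invokee.service_id = ?
  AND notifications.notification_xml LIKE '%-m 12%'
  AND strftime('%w', invocation.request_receive_time) = '1'  -- SQLite: 0=Sunday, 1=Monday
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO entity_table VALUES (?, ?, ?)", [
    (1, "wf-1", ALIGN_WARP), (2, "wf-2", ALIGN_WARP),
    (3, "wf-3", ALIGN_WARP), (9, "wf-1", "engine"),
])
db.executemany("INSERT INTO invocation_state_table VALUES (?, ?, ?)", [
    (1, 9, "2006-09-11 10:00:00"),  # Monday, -m 12: should match
    (2, 9, "2006-09-12 10:00:00"),  # Tuesday: excluded
    (3, 9, "2006-09-11 11:00:00"),  # Monday, but -m 8: excluded
])
db.executemany("INSERT INTO notification_table VALUES (?, ?, ?)", [
    (1, "ServiceInvoked", "<invocation><args>-m 12 -q</args></invocation>"),
    (2, "ServiceInvoked", "<invocation><args>-m 12 -q</args></invocation>"),
    (3, "ServiceInvoked", "<invocation><args>-m 8 -q</args></invocation>"),
])
rows = db.execute(QUERY, (ALIGN_WARP,)).fetchall()
```

Matching on raw XML text with LIKE is brittle (it depends on how arguments are serialized), which is one reason this query needs backend access rather than a dedicated service call.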

  • Provenance Challenge: Q9

    Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files
    Not answered
      We do not expect to answer such queries through the provenance system
      We push the provenance information to external metadata management systems such as myLEAD, which can answer such join queries on data product metadata and provenance

  • Variations of Workflow

    Workflows with loops
    Workflows whose structure changes dynamically
      or, as a simpler case, workflows with conditional branches
    Hierarchical composition of workflows
      Workflows invoking other workflows
      ~Similar to user-views (UPenn), nested-workflows (myGrid),

  • Variations of Queries

    Find all [workflows | processes] with a particular execution status [completed | failed | waiting for input]
      Dynamic attribute of provenance?
    Query for the client view and the service view of the provenance
      Check for differences
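Both query variations reduce to simple filters and set operations once provenance records are available. In this sketch, the status map, the activity tuples, and modelling the "client view" (activities reported by the invoker) and the "service view" (activities reported by the invoked service) as two sets are assumptions for illustration; the slide poses these only as open questions. Because execution status changes over time, the status map is a snapshot.

```python
from typing import Dict, List, Set, Tuple

# (process id, activity type, data id): hypothetical format.
Activity = Tuple[str, str, str]


def processes_with_status(status: Dict[str, str], wanted: str) -> List[str]:
    """Return process ids whose current execution status equals `wanted`."""
    return sorted(pid for pid, s in status.items() if s == wanted)


def view_differences(client_view: Set[Activity],
                     service_view: Set[Activity]) -> Dict[str, Set[Activity]]:
    """Activities reported by one side but missing from the other."""
    return {"client_only": client_view - service_view,
            "service_only": service_view - client_view}


status = {"align_warp-1": "completed", "align_warp-2": "failed",
          "softmean-1": "waiting for input"}
failed = processes_with_status(status, "failed")

# The service reports an output that the client never recorded,
# e.g. because the response was lost.
client = {("reslice-1", "DataConsumed", "warp1.warp")}
service = {("reslice-1", "DataConsumed", "warp1.warp"),
           ("reslice-1", "DataProduced", "resliced1.img")}
diff = view_differences(client, service)
```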

  • Acknowledgements

    Alek Slominski (GPEL Engine)
    Satoshi Shirasuna (XBaya Composer)
    LEAD Members
    NSF

  • Questions

    www.extreme.indiana.edu/karma

  • Sample Activities Published

    More here [www]

  • Karma DB Schema

    Cannot be guided just by static information about the workflow; workflows are long-running

    Dynamic: the structure & semantics of a workflow may change due to external events, e.g. detection of a hurricane during a run may trigger another workflow, or steer the current workflow's direction
    Adaptive: how & where the workflow executes is determined at runtime
    WS-Eventing