Provenance Exchange, Integration and Querying
Marta Mattoso
Federal University of Rio de Janeiro, Brazil
Provenance Exchange, Integration and Querying
Contributors:
• M. David Allen, Adriane Chapman, Barbara Blaustein, Len Seligman [5 Getting It Together: Enabling Multi-organization Provenance Exchange]
• Anderson Marinho, Marta Mattoso, Claudia Werner, Vanessa Braganholo and Leonardo Murta [33 Challenges in managing implicit and abstract provenance data: experiences with ProvManager]
• Luiz M. R. Gadelha Jr., Marta Mattoso, Michael Wilde, Ian Foster [26 Provenance Query Patterns for Many-Task Scientific Computing]
Mattoso - TaPP 2011
Importance of provenance in Science
• Interpret and reproduce data
• Understand the experiment and chain of reasoning that was used in the production of a result
• Verify that an experiment was performed according to acceptable procedures
• Identify what the inputs to an experiment were and where they came from
• Assess data quality
• Track who performed an experiment and who is responsible for its results (patents)
Provenance is as important as (or more important than!) the results (Davidson, Freire, Provenance and Workflows - SIGMOD 2008)
Provenance along Wf levels
Provenance can support analyzing scientific experiments
• Before execution:
o What programs may be used? Is there any alternative methodology to explore?
o Is there any dependency between activities? Which activities are mandatory?
• After execution:
o What were the parameters used in the critical result?
o What were the scientific workflow activities used to obtain such a result?
o Where are the output files generated by the distributed activity A using the parameters P?
o How many times was the activity A in version V used in the experiment E?
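The last two post-execution questions can be pictured as queries over a relational provenance store. The following is only a minimal sketch: the table `activity_execution`, its columns, and the sample rows are hypothetical and are not taken from the talk or from any specific provenance system.

```python
import sqlite3


def build_demo_store():
    """Create an in-memory store with a hypothetical provenance table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity_execution ("
        "experiment TEXT, activity TEXT, version TEXT, "
        "parameters TEXT, output_file TEXT)"
    )
    # Illustrative rows only; names A, V, P, E mirror the placeholders in the questions.
    conn.executemany(
        "INSERT INTO activity_execution VALUES (?, ?, ?, ?, ?)",
        [
            ("E", "A", "V", "P", "out_1.dat"),
            ("E", "A", "V", "P2", "out_2.dat"),
            ("E", "A", "V2", "P", "out_3.dat"),
            ("E2", "A", "V", "P", "out_4.dat"),
            ("E", "B", "V", "P", "out_5.dat"),
        ],
    )
    return conn


def count_activity_uses(conn, experiment, activity, version):
    """How many times was `activity` in `version` used in `experiment`?"""
    row = conn.execute(
        "SELECT COUNT(*) FROM activity_execution "
        "WHERE experiment = ? AND activity = ? AND version = ?",
        (experiment, activity, version),
    ).fetchone()
    return row[0]


def output_files(conn, activity, parameters):
    """Where are the output files generated by `activity` using `parameters`?"""
    rows = conn.execute(
        "SELECT output_file FROM activity_execution "
        "WHERE activity = ? AND parameters = ? ORDER BY output_file",
        (activity, parameters),
    ).fetchall()
    return [r[0] for r in rows]
```

Calling `count_activity_uses(build_demo_store(), "E", "A", "V")` counts only the rows that match all three placeholders, which is the shape of the question on the slide.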
Experiment Life Cycle*
[Figure: experiment life cycle centered on Provenance Data. Labels: Composition, Execution, Analysis, Conception, Reuse, Distribution, Monitoring, Visualization, Query, Discovery]
*Mattoso et al., 2010 - Towards Supporting the Life Cycle of Large Scale Scientific Experiments. IJBPIM
Exchange, Integrate, Querying
Experiment Life Cycle & TaPP Sessions
[Figure: the experiment life cycle diagram annotated with the TaPP sessions]
Provenance Models
Provenance in the Wild
Provenance Analysis
Exchange, Integrate, Querying
Experiment Life Cycle & TaPP Papers
[Figure: the experiment life cycle diagram annotated with TaPP sessions and papers]
Provenance Models
Provenance Analysis
Exchange, Integrate, Querying
• Paolo Missier [2 Incremental Workflow Improvement Through Analysis of Its Data Provenance]
• Martin Doerr and Maria Theodoridou, FORTH-ICS, Crete, Greece [3 CRMdig: A Generic Digital Provenance Model for Scientific Observation]
Provenance in the Wild
• Imad M. Abbadi and John Lyle [6 Provenance Challenges in Cloud Computing]
• Peter Macko, Marc Chiarini, and Margo Seltzer [18 Collecting Provenance via the Xen Hypervisor]
• Elaine Angelino, Uri Braun, David A. Holland, Peter Macko, Daniel Margo, and Margo Seltzer [23 Provenance Integration Requires Reconciliation]
• Reng Zeng, Xudong He, Jiafei Li, Zheng Liu, W.M.P. van der Aalst [1 A Method to Build and Analyze Scientific Workflows from Provenance through Process Mining]
Risers' fatigue analysis in oil elevation from ultra-deep waters following a coupled analysis
Estimate risers' lifetime
Input data to simulate movements: waves, wind, currents, bathymetry, etc.
1. Coupled movement analysis (TPN or Prosim)
Generates a large quantity of data (finite element meshes) ...
2. ... to do structural analysis of risers (ANFLEX)
3. Results are analyzed (POSFAL)
Scientific experiment
Distributed Provenance
[Figure: three workflows (Wf 1, Wf 2, Wf 3) and provenance systems. Activity labels: Analysis of Movements of Platform (coupled analysis of the platform movements), Movements Filtering, Analysis of Risers' Structure, Analysis of Fatigue. Other labels: sub-workflow parallel execution in HPC clusters, clouds; offline analysis (vis cave); visualize and share provenance data with other scientists; publish experiment and workflow]
Some aspects of Composition
[Figure: the experiment life cycle diagram and an EdgeCFD workflow]
<<Semi-Automated>> Visualization
<<Automated>> EdgeCFD Preprocessor
<<Sub-Workflow, Sweep>> EdgeCFD Solver and Control Applications
Files: nn.part.in, nn.part.msh, part.mat, part.ic, part.edg
Visualization files: .case, nn.geo, velo_nnnn.vecnn, press_0000_sdnn, scal_nnnn_sdnn, DD_nnnn_sdnn
Derivation
Pillars of composition
The composition should be supported in scientific experiments
[Figure: pillars of Composition: Conception, Reuse, Configuration Management, with Provenance drawn across them]
Provenance is orthogonal to those pillars and it is generated in each one of them
Concrete Workflows for Ultra Deep Water Oil Exploration
[Figure: Workflow #1 and Workflow #2. Program labels: Prosim, Orcaflex, Posfal; TPN, Posinal, Posfal, Anflex]
Conceptual Workflows and Concrete Workflows Limitations
[Figure: conceptual workflow. Activity labels: Analysis of Movements of Platform, Movements Filtering, Analysis of Risers' Structure, Analysis of Fatigue]
Experiment Lines* - abstract workflow
[Figure: an experiment line with Derived Workflow #1 and Derived Workflow #2. Legend: mandatory activity, optional activity, variation point. Activity labels: Platform Movement Analysis, Movements Filtering, Risers Structural Analysis, Fatigue Analysis. Program labels: Prosim, TPN, Anflex, Orcaflex, Posfal, Posinal]
*Experiment Line: Software Reuse in Scientific Workflows. SSDBM 2009: 264-272
Workflow Derivation - VisTrails
[Figure: workflow derivation in VisTrails. Activity labels: Analysis of Movements of Platform, Movements Filtering, Analysis of Risers' Structure, Analysis of Fatigue. Program labels: Prosim, Orcaflex, Posfal; TPN, Posinal, Posfal, Anflex]
Workflow Derivation - Kepler
[Figure: workflow derivation in Kepler. Activity labels: Analysis of Movements of Platform, Movements Filtering, Analysis of Risers' Structure, Analysis of Fatigue]
Derivation in GExpLine
1. Experiment Line Modeler
2. Configuration Management features
3. Workflow Importing
4. Workflow derivation
5. Prospective Provenance Querying Support
Derivation Process
Derive concrete workflows from a conceptual workflow
Derivation information is important provenance data
It relates all concrete workflows (trials) for a single experiment (conceptual)
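The derivation idea can be illustrated with a small sketch that enumerates concrete workflows from an abstract one and records, for each, which conceptual workflow it came from. This is not GExpLine's implementation. The `Activity` class, the `derive` function, the assignment of programs to activities, and the choice of which activity is optional are all assumptions made for illustration; only the activity and program names come from the slides.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Activity:
    name: str
    programs: tuple          # more than one program marks a variation point
    mandatory: bool = True   # optional activities may be left out of a derivation


def derive(line_id, activities):
    """Enumerate the concrete workflows derivable from an experiment line.

    Each result keeps `derived_from`, the derivation provenance that ties
    every concrete workflow (trial) back to the same conceptual workflow.
    """
    choices = []
    for act in activities:
        options = [(act.name, program) for program in act.programs]
        if not act.mandatory:
            options.append(None)  # None means "activity skipped"
        choices.append(options)

    workflows = []
    for n, combo in enumerate(product(*choices), start=1):
        steps = [step for step in combo if step is not None]
        workflows.append(
            {"id": f"{line_id}-wf{n}", "derived_from": line_id, "steps": steps}
        )
    return workflows


# Hypothetical experiment line; mandatory/optional flags are assumed.
RISER_LINE = [
    Activity("Platform Movement Analysis", ("Prosim", "TPN")),
    Activity("Movements Filtering", ("Posinal",), mandatory=False),
    Activity("Risers Structural Analysis", ("Anflex", "Orcaflex")),
    Activity("Fatigue Analysis", ("Posfal",)),
]
```

With `derive("riser-line", RISER_LINE)`, every returned workflow carries the same `derived_from` value, so all trials of one experiment can be retrieved together.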
Workflow Derivation - VisTrails and HPC
[Figure: workflow derivation in VisTrails with HPC execution. Activity labels: Analysis of Movements of Platform, Movements Filtering, Analysis of Risers' Structure, Analysis of Fatigue. Program labels: TPN, Posinal, Posfal, Anflex]
Anflex parallel execution reduced from 37 hours to 3 hours
Workflow Execution Galileu Portal
[Figure: portal architecture. Labels: Portlet Experiments, Server, Workflows, workflow Id, Workflow data, HPC Scheduler, Call Hydra, PBS, Condor, Hydra (several instances), Cluster, Cloud, Provenance]
Issues in distributed provenance
• Provenance integration (local SWfMS and HPC workflow execution)
• Provenance gathering in distributed/heterogeneous environments
• Controlling provenance from parallel execution in distributed environments
• Using provenance for steering activities in distributed environments
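As a rough illustration of the first issue, the sketch below merges provenance recorded by a local SWfMS with task records gathered on the HPC side. It assumes that both sides record a shared workflow id; the record fields (`workflow_id`, `task`, `start`) and the `integrate` function are hypothetical and do not describe any particular system.

```python
from collections import defaultdict


def integrate(local_records, hpc_records):
    """Attach HPC task records to the local workflow record with the same id.

    Returns (integrated, orphans): orphans are HPC tasks whose workflow id is
    unknown to the local SWfMS and therefore still need reconciliation.
    """
    tasks_by_wf = defaultdict(list)
    for task in hpc_records:
        tasks_by_wf[task["workflow_id"]].append(task)

    integrated = []
    for wf in local_records:
        # Order parallel tasks by start time so the merged trace is readable.
        tasks = sorted(tasks_by_wf.get(wf["workflow_id"], []),
                       key=lambda t: t["start"])
        integrated.append({**wf, "hpc_tasks": tasks})

    known_ids = {wf["workflow_id"] for wf in local_records}
    orphans = [t for t in hpc_records if t["workflow_id"] not in known_ids]
    return integrated, orphans
```

Keeping the unmatched tasks in a separate list, rather than dropping them, makes gaps between the two provenance sources visible instead of hiding them.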
Provenance can support analyzing scientific experiments
• Before execution:
o What programs may be used? Is there any alternative methodology to explore?
o Is there any dependency between activities? Which activities are mandatory?
• After execution:
o What were the parameters used in the best result?
o What was the scientific workflow used to obtain such a result?
o Where are the output files generated by the distributed activity A using the parameters P?
o How many times was the activity A in version V used in the experiment E?