Provenance Exchange, Integration and Querying - USENIX


Page 1

Provenance Exchange, Integration and Querying  

Marta Mattoso

 Federal  University  of  Rio  de  Janeiro,  Brazil  

Page 2

Provenance Exchange, Integration and Querying

Contributors:

• M. David Allen, Adriane Chapman, Barbara Blaustein, Len Seligman [5 Getting It Together: Enabling Multi-organization Provenance Exchange]

• Anderson Marinho, Marta Mattoso, Claudia Werner, Vanessa Braganholo and Leonardo Murta [33 Challenges in managing implicit and abstract provenance data: experiences with ProvManager]

• Luiz M. R. Gadelha Jr., Marta Mattoso, Michael Wilde, Ian Foster [26 Provenance Query Patterns for Many-Task Scientific Computing]

Mattoso - TaPP 2011 - 2

Page 3

Importance of provenance in science

• Interpret and reproduce data
• Understand the experiment and the chain of reasoning used to produce a result
• Verify that an experiment was performed according to acceptable procedures
• Identify the inputs to an experiment and where they came from
• Assess data quality
• Track who performed an experiment and who is responsible for its results (patents)

Provenance is as important as the results, or even more so (Davidson, Freire, Provenance and Workflows, SIGMOD 2008)


Page 4

Provenance  along  Wf  levels  


Page 5

Provenance can support analyzing scientific experiments

• Before execution:
o What programs may be used? Is there any alternative methodology to explore?
o Is there any dependency between activities? Which activities are mandatory?

• After execution:
o What were the parameters used to obtain the critical result?
o Which scientific workflow activities were used to obtain that result?
o Where are the output files generated by the distributed activity A using the parameters P?
o How many times was activity A in version V used in experiment E?

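The after-execution questions above become simple filters once each activity execution is stored as a retrospective provenance record. This sketch assumes a hypothetical in-memory record type (`ActivityExecution`, with experiment, activity, version, parameters, outputs and host fields); it is not the schema of any particular provenance system, and a real store would usually answer these queries in SQL or a graph query language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityExecution:
    """One retrospective provenance record (hypothetical schema)."""
    experiment: str
    activity: str
    version: str
    params: tuple   # sorted (name, value) pairs, so records stay hashable
    outputs: tuple  # paths of the files produced by this execution
    host: str       # node where the distributed activity ran


def count_uses(records, activity, version, experiment):
    """How many times was activity A in version V used in experiment E?"""
    return sum(
        1
        for r in records
        if r.activity == activity and r.version == version and r.experiment == experiment
    )


def outputs_for(records, activity, params):
    """Where are the output files generated by activity A using parameters P?"""
    wanted = tuple(sorted(params.items()))
    return [
        (r.host, path)
        for r in records
        if r.activity == activity and r.params == wanted
        for path in r.outputs
    ]
```

For instance, `count_uses(records, "Anflex", "2.0", "E1")` counts the recorded executions of Anflex 2.0 in experiment E1, and `outputs_for` returns (host, path) pairs, which is what locating files of a distributed activity requires.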

Page 6

Experiment  Life  Cycle*  

[Figure: experiment life cycle. Labels: Provenance Data; Composition, Execution, Analysis; Conception, Reuse, Distribution, Monitoring, Query, Visualization, Discovery]

*Mattoso et al., 2010 - Towards Supporting the Life Cycle of Large Scale Scientific Experiments. IJBPIM

Exchange,  Integrate,  Querying  


Page 7

Experiment Life Cycle & TaPP Sessions

[Figure: the experiment life cycle (Provenance Data; Composition, Execution, Analysis; Conception, Reuse, Distribution, Monitoring, Query, Visualization, Discovery) annotated with the TaPP 2011 sessions: Provenance Models, Provenance in the Wild, Provenance Analysis, and Exchange, Integrate, Querying]


Page 8

Experiment Life Cycle & TaPP Papers

[Figure: the experiment life cycle (Provenance Data; Composition, Execution, Analysis; Conception, Reuse, Distribution, Monitoring, Query, Visualization, Discovery) annotated with the papers of each session]

Provenance  Models  

Provenance  Analysis  

Exchange,  Integrate,  Querying  

•  Paolo  Missier  [2  Incremental  Workflow  Improvement  Through  Analysis  of  Its  Data  Provenance]  

• Martin Doerr and Maria Theodoridou, FORTH-ICS, Crete, Greece [3 CRMdig: A Generic Digital Provenance Model for Scientific Observation]

Provenance in the Wild

• Imad M. Abbadi and John Lyle [6 Provenance Challenges in Cloud Computing]

• Peter Macko, Marc Chiarini, and Margo Seltzer [18 Collecting Provenance via the Xen Hypervisor]

• Elaine Angelino, Uri Braun, David A. Holland, Peter Macko, Daniel Margo, and Margo Seltzer [23 Provenance Integration Requires Reconciliation]

• Reng Zeng, Xudong He, Jiafei Li, Zheng Liu, W.M.P. van der Aalst [1 A Method to Build and Analyze Scientific Workflows from Provenance through Process Mining]

Page 9

Fatigue analysis of the risers that lift oil from ultra-deep waters, following a coupled analysis

Estimate the risers' lifetime

Input data to simulate movements: waves, wind, currents, bathymetry, etc.

1. Coupled movement analysis (TPN or Prosim)

Generates a large quantity of data ... (finite element meshes)

2. ... to do the structural analysis of the risers (ANFLEX)

3. Results are analyzed (POSFAL)
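A minimal way to capture retrospective provenance for this three-step chain is to wrap each activity so that every invocation logs its program, inputs, outputs and timing. The functions below are stubs that only derive file names; they do not call Prosim, ANFLEX or POSFAL, and the in-memory `PROVENANCE` list and the `stresses`/`report` outputs are assumptions made for illustration.

```python
import functools
import time

PROVENANCE = []  # hypothetical in-memory retrospective provenance log


def provenance(activity, program):
    """Decorator: log program, inputs, outputs and timing of every call."""
    def wrap(fn):
        @functools.wraps(fn)
        def run(**inputs):
            start = time.time()
            outputs = fn(**inputs)
            PROVENANCE.append({
                "activity": activity,
                "program": program,
                "inputs": dict(inputs),
                "outputs": dict(outputs),
                "start": start,
                "end": time.time(),
            })
            return outputs
        return run
    return wrap


# Stubs standing in for the real programs: they only derive file names.
@provenance("Coupled movement analysis", "Prosim")
def coupled_analysis(sea_state):
    return {"meshes": f"meshes_{sea_state}.fem"}


@provenance("Structural analysis of risers", "ANFLEX")
def structural_analysis(meshes):
    return {"stresses": meshes.replace(".fem", ".str")}


@provenance("Results analysis", "POSFAL")
def fatigue_analysis(stresses):
    return {"report": stresses.replace(".str", ".fat")}


def run_chain(sea_state):
    """Run the three steps in order; each call is logged by the decorator."""
    meshes = coupled_analysis(sea_state=sea_state)
    stresses = structural_analysis(meshes=meshes["meshes"])
    return fatigue_analysis(stresses=stresses["stresses"])
```

Because each record stores the files an activity consumed and produced, the chain from a fatigue report back to the sea state that generated it can be reconstructed from the log alone.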

Page 10

Scientific experiment

[Figure: distributed provenance for the riser experiment, split into Wf 1, Wf 2 and Wf 3 over the activities Analysis of Movements of Platform (coupled analysis of platform movements), Movements Filtering, Analysis of Risers' Structure and Analysis of Fatigue. Other labels: Provenance Systems; sub-workflow parallel execution in HPC clusters and clouds; offline analysis (vis cave); visualize and share provenance data with other scientists; publish experiment and workflow]

Page 11

Some aspects of Composition

[Figure: the experiment life cycle (Provenance Data; Composition, Execution, Analysis; Conception, Reuse, Distribution, Monitoring, Query, Visualization, Discovery)]

[Figure: EdgeCFD workflow with the activities <<Automated>> EdgeCFD Preprocessor, <<Sub-Workflow, Sweep>> EdgeCFD Solver and Control Applications, and <<Semi-Automated>> Visualization. Files: nn.part.in, nn.part.msh, part.mat, part.ic, part.edg, and the visualization files .case, nn.geo, velo_nnnn.vecnn, press_0000_sdnn, scal_nnnn_sdnn and DD_nnnn_sdnn. One edge is labelled Derivation]
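File-level derivation edges like those in the EdgeCFD figure support lineage queries: walking the edges backwards tells which inputs a visualization file depends on, and walking them forwards tells what must be regenerated when an input changes. The edges in `DERIVED_FROM` are a hypothetical reading of the figure, used only to illustrate a breadth-first lineage traversal.

```python
from collections import deque

# Hypothetical derivation edges (file -> files it was derived from), an
# illustrative reading of the EdgeCFD figure rather than its exact content.
DERIVED_FROM = {
    "nn.part.msh": ["nn.part.in"],
    "part.mat": ["nn.part.in"],
    "part.ic": ["nn.part.in"],
    "part.edg": ["nn.part.msh"],
    "velo_nnnn.vecnn": ["nn.part.msh", "part.mat", "part.ic", "part.edg"],
    "press_0000_sdnn": ["nn.part.msh", "part.mat", "part.ic"],
}


def lineage(path):
    """Backward query: every file that `path` transitively derives from (BFS)."""
    seen = set()
    queue = deque(DERIVED_FROM.get(path, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(DERIVED_FROM.get(current, []))
    return seen


def impacted_by(path):
    """Forward query: derived files that must be regenerated if `path` changes."""
    return {out for out in DERIVED_FROM if path in lineage(out)}
```

`impacted_by` recomputes lineage for every derived file, which is fine for a sketch; a provenance store would index the reverse edges instead.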

Page 12

Pillars of composition

Composition should be supported in scientific experiments. It rests on three pillars: Conception, Reuse and Configuration Management.

Provenance is orthogonal to these pillars and is generated in each of them.

Page 13

Concrete Workflows for Ultra-Deep Water Oil Exploration

[Figure: Workflow #1 and Workflow #2, two concrete workflows built from the programs Prosim, Orcaflex, Posfal, TPN, Posinal and Anflex]

Page 14

Conceptual Workflows and Concrete Workflows: Limitations

[Figure: a conceptual workflow (Analysis of Movements of Platform, Analysis of Risers' Structure, Movements Filtering, Analysis of Fatigue) alongside a concrete workflow (Prosim, Orcaflex, Posfal)]

Page 15

Experiment Lines* - abstract workflow

[Figure: an experiment line with the activities Platform Movement Analysis (Prosim, TPN), Risers Structural Analysis (Anflex, Orcaflex), Movements Filtering and Fatigue Analysis (Posfal), from which Derived Workflow #1 and Derived Workflow #2 are obtained using programs such as Prosim, TPN, Anflex, Orcaflex, Posinal and Posfal. Legend: mandatory activity, optional activity, variation point]

*Experiment Line: Software Reuse in Scientific Workflows. SSDBM 2009: 264-272
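One way to see how an experiment line yields concrete workflows is to treat each variation point as a choice among implementations and each optional activity as include-or-omit, then take the Cartesian product of the choices. The encoding below, including which activity is optional, the activity order, and which programs bind to which activity, is an assumed reading of the figure and of the riser workflow described earlier; the experiment line model in the cited SSDBM 2009 paper may differ.

```python
from itertools import product

# Hypothetical encoding of the riser experiment line: (conceptual activity,
# kind, candidate programs). Kinds and bindings are an assumed reading of the
# figure, not a definitive model. Variation points are treated as mandatory.
EXPERIMENT_LINE = [
    ("Platform Movement Analysis", "variation", ["Prosim", "TPN"]),
    ("Movements Filtering", "optional", ["Posinal"]),
    ("Risers Structural Analysis", "variation", ["Anflex", "Orcaflex"]),
    ("Fatigue Analysis", "mandatory", ["Posfal"]),
]


def derive_all(line):
    """Yield every concrete workflow derivable from the experiment line."""
    choices = []
    for activity, kind, programs in line:
        options = [(activity, program) for program in programs]
        if kind == "optional":
            options.append(None)  # an optional activity may be omitted
        choices.append(options)
    for combo in product(*choices):
        yield [step for step in combo if step is not None]
```

With two choices at each variation point and one optional activity, this line admits 2 × 2 × 2 = 8 concrete workflows, which is why recording which one a given trial used matters.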

Page 16

Workflow Derivation - VisTrails

[Figure: the conceptual activities (Analysis of Movements of Platform, Analysis of Risers' Structure, Movements Filtering, Analysis of Fatigue) and the concrete VisTrails workflows derived from them, using Prosim, Orcaflex, Posfal, TPN, Posinal and Anflex]

Page 17

Workflow Derivation - Kepler

[Figure: the conceptual activities (Analysis of Movements of Platform, Analysis of Risers' Structure, Movements Filtering, Analysis of Fatigue) and a Kepler workflow derived from them]

Page 18

Derivation in GExpLine

1. Experiment Line Modeler
2. Configuration Management features
3. Workflow Importing
4. Workflow Derivation
5. Prospective Provenance Querying Support

Page 19

Derivation Process

Derive concrete workflows from a conceptual workflow.

Derivation information is important provenance data: it relates all concrete workflows (trials) to a single experiment (the conceptual workflow).
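Recording the derivation relation explicitly makes it queryable: given an experiment, one can list all of its trials and see which program implemented each conceptual activity across them. The `ConcreteWorkflow` record and the query helpers below are hypothetical, a sketch under the assumption that each derived workflow stores the experiment it came from and its activity-to-program bindings.

```python
from dataclasses import dataclass, field


@dataclass
class ConcreteWorkflow:
    """A trial: one concrete workflow plus its derivation provenance."""
    name: str
    derived_from: str  # experiment (conceptual workflow) it was derived from
    bindings: dict = field(default_factory=dict)  # conceptual activity -> program


def trials_of(workflows, experiment):
    """All concrete workflows (trials) derived for one experiment."""
    return [w.name for w in workflows if w.derived_from == experiment]


def implementations_used(workflows, experiment, activity):
    """Programs that implemented a conceptual activity across the trials."""
    return {
        w.bindings[activity]
        for w in workflows
        if w.derived_from == experiment and activity in w.bindings
    }
```

Keeping `derived_from` on every trial is what lets the conceptual level answer questions that no single concrete workflow can, such as which alternatives for an activity have already been explored.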

Page 20

Workflow Derivation - VisTrails and HPC

[Figure: the conceptual activities (Analysis of Movements of Platform, Analysis of Risers' Structure, Movements Filtering, Analysis of Fatigue) and a VisTrails workflow derived from them using TPN, Posinal, Posfal and Anflex]

Parallel execution of Anflex reduced the execution time from 37 hours to 3 hours.
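Expressed as a speedup, the two reported times correspond to roughly a twelvefold reduction. The slide does not say how many processors were used, so parallel efficiency cannot be computed from these figures.

```latex
S = \frac{T_{\text{sequential}}}{T_{\text{parallel}}} = \frac{37\ \text{h}}{3\ \text{h}} \approx 12.3
```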

Page 21

Workflow Execution - Galileu Portal

[Figure: Galileu portal architecture. Labels: Experiments portlet; Server; Workflows (workflow Id, workflow data); HPC Scheduler (PBS, Condor) calling several Hydra instances on a cluster or cloud; Provenance]
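In this architecture the Hydra instances run sub-workflows as parameter sweeps on a cluster or cloud. The sketch below illustrates the general idea only and is not Hydra's interface: it expands a parameter space into all combinations, runs one activity per combination in a thread pool (a stand-in for jobs submitted through a scheduler such as PBS or Condor), and emits one provenance record per run. `anflex_stub` and the record fields are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def anflex_stub(wave_height, current):
    """Hypothetical stand-in for one activity run; returns its output path."""
    return f"out_h{wave_height}_c{current}.dat"


def sweep(param_space, activity, workers=4):
    """Run `activity` once per parameter combination and return one
    provenance record per run."""
    names = sorted(param_space)
    combos = [
        dict(zip(names, values))
        for values in product(*(param_space[name] for name in names))
    ]
    # The thread pool stands in for jobs dispatched through an HPC scheduler.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(lambda params: activity(**params), combos))
    return [
        {"run_id": i, "activity": activity.__name__, "params": params, "output": out}
        for i, (params, out) in enumerate(zip(combos, outputs))
    ]
```

`pool.map` returns results in input order, so each output is paired with the parameters that produced it; that pairing is exactly what the question "where are the outputs of activity A with parameters P?" needs.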

Page 22

Issues in distributed provenance

• Provenance integration (local SWfMS and HPC workflow execution)
• Provenance gathering in distributed/heterogeneous environments
• Controlling provenance from parallel execution in distributed environments
• Using provenance for steering activities in distributed environments
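The first issue, integrating provenance from the local SWfMS with provenance gathered during HPC execution, can be illustrated as a join on a shared correlation key. The sketch assumes both sides record a workflow id and an activity name; the field names and values are hypothetical, and real systems usually also need to reconcile identifiers for records that do not match, which this sketch only collects as orphans.

```python
def integrate(local_records, hpc_records):
    """Attach HPC-side execution records to the local SWfMS activity records
    that share the same (workflow_id, activity) key."""
    merged = {
        (r["workflow_id"], r["activity"]): dict(r, runs=[]) for r in local_records
    }
    orphans = []
    for job in hpc_records:
        key = (job["workflow_id"], job["activity"])
        if key in merged:
            run = {k: v for k, v in job.items() if k not in ("workflow_id", "activity")}
            merged[key]["runs"].append(run)
        else:
            orphans.append(job)  # no matching local activity: needs reconciliation
    return list(merged.values()), orphans
```

Returning the orphans instead of dropping them keeps the unmatched HPC records visible, since silently losing them would leave gaps in the integrated provenance.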

Page 23

Provenance can support analyzing scientific experiments

• Before execution:
o What programs may be used? Is there any alternative methodology to explore?
o Is there any dependency between activities? Which activities are mandatory?

• After execution:
o What were the parameters used to obtain the best result?
o Which scientific workflow was used to obtain that result?
o Where are the output files generated by the distributed activity A using the parameters P?
o How many times was activity A in version V used in experiment E?
