YesWorkflow: Retrospective Provenance without a Runtime Provenance Recorder


Post on 06-Aug-2015




TRANSCRIPT

  1. Retrospective Provenance without a Runtime Provenance Recorder
     Timothy McPhillips, Shawn Bowers, Khalid Belhajjame, Bertram Ludäscher*
  2. Overview / Exec Summary
     - Scientific Workflows: ASAP!
     - Scripts are Scientific Workflows, too! So how can we help workflow script authors?
     - YesWorkflow (YW)
       - Prospective provenance through YW annotations: script + annotations (@begin, @end, @in, @out) => workflow model
       - YW-Recon: retrospective provenance without a provenance recorder:
         add @URI template annotations => linking script-persisted (meta-)data with the workflow model
         => YW retrospective provenance queries, using the YW workflow model to query YW-reconstructed runtime provenance!
     YesWorkflow Provenance @ TaPP'15
  3. Scientific Workflows: ASAP!
     - Automation: wfs to automate computational aspects of science
     - Scaling (exploit and optimize machine cycles): wfs should make use of parallel compute resources; wfs should be able to handle large data
     - Abstraction, Evolution, Reuse (human cycles): wfs should be easy to (re-)use, evolve, share
     - Provenance: wfs should capture processing history and data lineage; traceable data- and wf-evolution; Reproducible Science
     [Screenshots: Trident Workbench, VisTrails]
     Once upon a time ...
  4. Scientific Workflows
     [Figure from Cabellos et al., Computer Physics Communications 182, 2011]
  5. are a wonderful thing
     [Figure credit: Dr. Norbert Podhorszki (then: UC Davis)]
  6. after simplifying a bit (here: Kepler/COMAD)
     [Figure credit: Dr. Sven Köhler (then: UC Davis)]
  7. I beg your pardon, I never promised you ..
     "Thanks to our Graphical UI your scientific workflows will be much easier to develop, understand and maintain!"
     "Hmm ... this was supposed to be easier than programming!"
  8. Meanwhile, on a nearby planet ...
     Interactive Visualization
  9. SKOPE: Synthesized Knowledge Of Past Environments
     - Bocinsky, Kohler et al. study the rain-fed maize of the Anasazi (Four Corners; AD 600-1500).
     - Climate change influenced the Mesa Verde migrations (late 13th century AD).
     - Uses a network of tree-ring chronologies to reconstruct a spatio-temporal climate field at a fairly high resolution (~800 m) from AD 1-2000.
     - The algorithm estimates joint information in tree rings and a climate signal to identify the best tree-ring chronologies for climate reconstruction.
     - Implemented as an R script.
     K. Bocinsky, T. Kohler, A 2000-year reconstruction of the rain-fed maize agricultural niche in the US Southwest. Nature Communications. doi:10.1038/ncomms6618
  10. HPCBio Workflows @ Illinois
      - National Petascale Computing Facility
      - Broad Institute: recommended workflow for variant analysis
      - Liudmila Mainzer, Victor Jongeneel (HPCBio @ Illinois)
      Quickly, say: #!/bin/bash
  11. It's time to shift control
      - .. back from being consumers of someone else's (= our) tools ("Just click here!") ...
      - ... to toolmakers! Scientists who author workflows as scripts!
      - Go where the wild things (users!) are.
      - Yes, develop for end users, but don't forget the toolmakers!
      - Can we do this together?
  12. YesWorkflow: Yes, scripts are workflows, too!
      [Figure: workflow graph of the paleoclimate reconstruction script; node labels follow]
      Steps: GetModernClimate, SubsetAllData, CAR_Analysis_unique, CAR_Analysis_union, CAR_Reconstruction_union, CAR_Reconstruction_union_output
      Data: master_data_directory, prism_directory, tree_ring_data, calibration_years, retrodiction_years, PRISM_annual_growing_season_precipitation, dendro_series_for_calibration, dendro_series_for_reconstruction, cellwise_unique_selected_linear_models, cellwise_union_selected_linear_models, raster_brick_spatial_reconstruction, raster_brick_spatial_reconstruction_errors, ZuniCibola_PRISM_grow_prcp_ols_loocv_union_recons.tif, ZuniCibola_PRISM_grow_prcp_ols_loocv_union_errors.tif
      Script vs Workflows / ASAP:
        Automation:  *****
        Scaling:     **
        Abstraction: *
        Provenance:  **
  13. Enter: YesWorkflow! (yesworkflow.org)
      - YesWorkflow (YW): a grass-roots effort, meeting the scientists/users where they R! (R, Matlab, (i)Python, Jupyter, ...)
      - Scripts + simple user annotations
        => reveal the workflow model/abstraction that underlies the (script) implementation
        => YW can give us more of ASAP!
      - First YW: ASAP (Abstraction) ...
      - Then YW-Recon: ASAP (reconstructing runtime Provenance)
  14. YesWorkflow.org
  15. Related Work, other Approaches
      .. to bring workflow/provenance benefits to scripts:
      - Runtime provenance recorders: use (R, Python, ..) libraries and/or code instrumentation to capture runtime observables: file read/write, function calls, program variables & state, ...
      - noWorkflow system [Murta-Braganholo-Chirigati-Koop-Freire-IPAW14]: exploits a Python profiling library to capture runtime provenance => helps with "S" and "P"
      - OS-level capture of (system) provenance
      - Some talks at TaPP!?
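
To make the contrast with recorder-based approaches concrete, here is a minimal sketch of what capturing function calls at runtime can look like in Python. It uses the standard `sys.setprofile` hook; it is not the noWorkflow implementation, and the helper `record_calls` and the toy functions `load`, `total`, and `pipeline` are hypothetical names chosen for illustration.

```python
import sys


def record_calls(func, *args, **kwargs):
    """Run func and return (result, names of Python functions called, in order)."""
    calls = []

    def profiler(frame, event, arg):
        # "call" fires for Python-level functions; C builtins raise "c_call" instead.
        if event == "call":
            calls.append(frame.f_code.co_name)

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always detach the recorder
    return result, calls


# Hypothetical toy script to observe.
def load(n):
    return list(range(n))


def total(values):
    return sum(values)


def pipeline(n):
    return total(load(n))


result, calls = record_calls(pipeline, 4)
print(result, calls)
```

A recorder like this sees every call but knows nothing about the scientist's conceptual steps, which is the gap the YW annotations are meant to fill.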
  16. YW (prospective) and YW-Recon (retrospective) Provenance
      1. YW: Annotate script => YW model
         - Annotate @BEGIN..@END, @IN, @OUT
         - Visualize, share, be happy :-)
      2. Run script
         - Files are read and written
         - Folder and file names carry metadata
      3. YW-Recon
         - Use @URI tags that link the YW model and the persisted data
         - Run URI-template queries (cf. "ls -R" and RegEx matching)
      4. YW-Query
         - Answer the user's provenance queries
  17. YW annotations: Model your Workflow!
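
As a rough illustration of how comment annotations can be turned into a workflow model, the sketch below scans a script's comments for the tags named on the slides (@begin, @end, @in, @out, @uri). This is a toy parser, not YesWorkflow's own; `parse_annotations` is a hypothetical helper, nested steps are not handled, and the assignment of ports to steps in the sample text is an assumption made for the example. Step names, data names, and URI templates are taken from the slides.

```python
import re

# Sample annotated comments; which ports belong to which step is assumed here.
SCRIPT = """
# @begin collect_data_set
# @in accepted_sample
# @out raw_image @uri file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw
# @end collect_data_set

# @begin transform_images
# @in raw_image
# @in calibration_image @uri file:calibration.img
# @out corrected_image @uri file:data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img
# @end transform_images
"""

TAG = re.compile(r"@(begin|end|in|out|uri)\s+(\S+)", re.IGNORECASE)


def parse_annotations(text):
    """Return {step: {"in": {name: uri_or_None}, "out": {name: uri_or_None}}}."""
    model, step, last_port = {}, None, None
    for line in text.splitlines():
        for tag, value in TAG.findall(line):
            tag = tag.lower()
            if tag == "begin":
                step, last_port = value, None
                model[step] = {"in": {}, "out": {}}
            elif tag == "end":
                step, last_port = None, None
            elif tag in ("in", "out") and step is not None:
                model[step][tag][value] = None
                last_port = (tag, value)
            elif tag == "uri" and last_port is not None:
                kind, name = last_port  # @uri qualifies the preceding port
                model[step][kind][name] = value
    return model


model = parse_annotations(SCRIPT)
for name, ports in model.items():
    print(name, sorted(ports["in"]), "->", sorted(ports["out"]))
```

Because the annotations live in comments, the same idea carries over to R or Matlab scripts by changing only the comment syntax.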
  18. YesWorkflow: Prospective & Retrospective Provenance ... (almost) for free!
      YW annotations in the script (R, Python, Matlab) are used to recreate the workflow view from the script.
      [Figure: annotated script => YW! => workflow graph; node labels follow]
      Steps: initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, log_average_image_intensity
      Data: cassette_id, sample_score_cutoff, sample_name, sample_quality, rejected_sample, accepted_sample, num_images, energies, sample_id, energy, frame_number, total_intensity, pixel_count, corrected_image_path
      Data with URI templates:
        sample_spreadsheet  file:cassette_{cassette_id}_spreadsheet.csv
        calibration_image   file:calibration.img
        run_log             file:run/run_log.txt
        rejection_log       file:/run/rejected_samples.txt
        raw_image           file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw
        corrected_image     file:data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img
        collection_log      file:run/collected_images.csv
  19. Voila! The Workflow revealed!
      [Figure: the workflow graph from slide 18, shown on its own]
  20. Get 3 views for the price of 1!
      - Process view
      - Data view
      - Combined view
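
One way to read the three views is as three projections of the same model. The sketch below assumes that the combined view shows both steps and data as nodes, the process view shows steps connected by the data they exchange, and the data view shows data connected by the steps that derive one from the other. The `MODEL` dictionary and the three function names are hypothetical, and the port assignments are assumed for illustration.

```python
# Hypothetical two-step model; step and data names come from the slides.
MODEL = {
    "collect_data_set": {"in": ["accepted_sample"], "out": ["raw_image"]},
    "transform_images": {"in": ["raw_image", "calibration_image"], "out": ["corrected_image"]},
}


def combined_view(model):
    """Edges data -> step (inputs) and step -> data (outputs)."""
    edges = []
    for step, ports in model.items():
        edges += [(data, step) for data in ports["in"]]
        edges += [(step, data) for data in ports["out"]]
    return edges


def process_view(model):
    """Edges (producer step, data, consumer step): steps as nodes, data on edges."""
    return [
        (producer, data, consumer)
        for producer, out_ports in model.items()
        for data in out_ports["out"]
        for consumer, in_ports in model.items()
        if data in in_ports["in"]
    ]


def data_view(model):
    """Edges (input data, step, output data): data as nodes, steps on edges."""
    return [
        (source, step, target)
        for step, ports in model.items()
        for source in ports["in"]
        for target in ports["out"]
    ]


print(process_view(MODEL))
print(data_view(MODEL))
print(combined_view(MODEL))
```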
  21. Paleoclimate Reconstruction (EnviRecon.org) ... explained using YesWorkflow!
      [Figure: the paleoclimate reconstruction workflow graph from slide 12]
      Kyle B., (computational) archaeologist: "It took me about 20 minutes to comment. Less than an hour to learn and YW-annotate, all-told."
  22. Provenance Lands
      - Workflow Modeling & Design (a.k.a. prospective provenance, Workflow-land)
      - Runtime Provenance (a.k.a. traces, logs, retrospective provenance, Trace-land)
  23. YW-Recon: Prospective & Retrospective Provenance ... (almost) for free!
      URI templates link conceptual entities to runtime provenance "left behind" by the script author, facilitating provenance reconstruction.
      [Figure: the workflow graph and URI templates from slide 18, next to the directory listing of one run]
      run/
        raw/
          q55/
            DRT240/
              e10000/  image_001.raw ... image_037.raw
              e11000/  image_001.raw ... image_037.raw
            DRT322/
              e10000/  image_001.raw ... image_030.raw
              e11000/  image_001.raw ... image_030.raw
        data/
          DRT240/  DRT240_10000eV_001.img ... DRT240_11000eV_037.img
          DRT322/  DRT322_10000eV_001.img ... DRT322_11000eV_030.img
        collected_images.csv
        rejected_samples.txt
        run_log.txt
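
The "ls -R plus RegEx matching" idea can be sketched as follows: turn a URI template into a regular expression with one named group per template variable, then match it against every file path found on disk. This is a minimal sketch under stated assumptions, not the YW-Recon implementation; `template_to_regex` and `reconstruct` are hypothetical helpers, and the demo builds a small temporary directory whose file names are taken from the listing on the slide.

```python
import re
import tempfile
from pathlib import Path

RAW_IMAGE_URI = "file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw"


def template_to_regex(template):
    """Compile a URI template into a regex with one named group per variable."""
    pattern, seen = "", set()
    for part in re.split(r"(\{\w+\})", template):
        if re.fullmatch(r"\{\w+\}", part):
            name = part[1:-1]
            if name in seen:
                pattern += f"(?P={name})"  # repeated variable must match the same text
            else:
                pattern += f"(?P<{name}>[^/]+)"  # a variable never spans a path separator
                seen.add(name)
        else:
            pattern += re.escape(part)
    return re.compile(pattern)


def reconstruct(root, uri_template):
    """Return sorted (relative path, variable bindings) for files matching the template."""
    regex = template_to_regex(uri_template.removeprefix("file:"))
    found = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            match = regex.fullmatch(rel)
            if match:
                found.append((rel, match.groupdict()))
    return sorted(found, key=lambda item: item[0])


# Demo on a throwaway directory (an assumption standing in for a real run).
with tempfile.TemporaryDirectory() as root:
    for rel in [
        "run/raw/q55/DRT240/e10000/image_001.raw",
        "run/raw/q55/DRT322/e11000/image_030.raw",
        "run/run_log.txt",
    ]:
        target = Path(root, rel)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    matches = reconstruct(root, RAW_IMAGE_URI)

for rel, bindings in matches:
    print(rel, bindings)
```

Each match binds the template variables (cassette_id, sample_id, energy, frame_number) to concrete values, so the file system itself serves as the record of what the run produced.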
  24. YW (prospective) and YW-Recon (retrospective) Provenance
      1. YW: Annotate script => YW model
         - Annotate @BEGIN..@END, @IN, @OUT
         - Visualize, share, be happy :-)
      2. Run script
         - Files are read and written
         - Folder and file names carry metadata
      3. YW-Recon
         - Use @URI tags that link the YW model and the persisted data
         - Run URI-template queries (cf. "ls -R" and RegEx matching)
      4. YW-Query
         - Answer the user's provenance queries
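
For the final step, a provenance query such as "which raw image does this corrected image derive from?" could be answered by comparing the variable bindings recovered from file names. The sketch below assumes that two files are related when they agree on every template variable they share; that join rule, the helper `upstream`, and the hand-written binding lists are assumptions for illustration, not the query mechanism of YesWorkflow itself. File names follow the directory listing on slide 23.

```python
# Bindings as they might be recovered from file names via the URI templates.
RAW_IMAGES = [
    ("run/raw/q55/DRT240/e10000/image_001.raw",
     {"cassette_id": "q55", "sample_id": "DRT240", "energy": "10000", "frame_number": "001"}),
    ("run/raw/q55/DRT240/e11000/image_001.raw",
     {"cassette_id": "q55", "sample_id": "DRT240", "energy": "11000", "frame_number": "001"}),
    ("run/raw/q55/DRT322/e10000/image_001.raw",
     {"cassette_id": "q55", "sample_id": "DRT322", "energy": "10000", "frame_number": "001"}),
]

CORRECTED_IMAGE = (
    "data/DRT240/DRT240_10000eV_001.img",
    {"sample_id": "DRT240", "energy": "10000", "frame_number": "001"},
)


def upstream(target_bindings, candidates):
    """Return candidate paths that agree with the target on every shared variable."""
    return [
        path
        for path, bindings in candidates
        if all(bindings[key] == value
               for key, value in target_bindings.items()
               if key in bindings)
    ]


path, bindings = CORRECTED_IMAGE
sources = upstream(bindings, RAW_IMAGES)
print(path, "<-", sources)
```

The workflow model supplies the direction of the dependency (raw_image feeds transform_images, which writes corrected_image); the bindings only decide which concrete files belong together.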
