TeraGrid Science Workflow Survey Using Taverna and Swift in OLSG and SIDGrid
Wenjun Wu, Aashish Adhikari, Thomas D. Uram, Michael Wilde, Michael E. Papka


Post on 25-Dec-2015


TeraGrid Science Workflow Survey
Using Taverna and Swift in OLSG and SIDGrid

Wenjun Wu, Aashish Adhikari

Thomas D. Uram, Michael Wilde, Michael E. Papka


Outline

• Open Life Science Gateway (OLSG)
  • Provides Web services for commonly used bio-applications such as BLAST, CLUSTALW, InterProScan, …
  • Bioinformatics researchers can compose Taverna workflows using OLSG services
  • OLSG doesn’t manage Taverna workflows
• Social Informatics Data Grid (SIDGrid)
  • Uses the Swift engine inside the gateway for workflow management
  • The Swift workflow engine enables users to write parallel scripts for their domain applications on TeraGrid
  • With existing Swift scripts, gateway developers can focus on application management and web interfaces
• Open Protein Simulation Science Gateway
  • Reuses the application framework from the SIDGrid science gateway
  • Generates the OOPS web UI from the Swift script


Using Taverna in Open Life Science Gateways


Compose Taverna workflows based on OLSG Services

• Add WSDL scavengers
• Compose a workflow
  • Add methods
  • Define input/output
  • Link components together
• Wrap bio-applications as Web services
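The composition steps on this slide (add methods, define inputs and outputs, link components together) amount to building a small dataflow graph. The following is a minimal Python sketch of that idea, not Taverna's actual API: the `Workflow` class, the processor names, and the two stand-in functions are hypothetical, and plain Python callables take the place of remote OLSG Web-service operations.

```python
from graphlib import TopologicalSorter


class Workflow:
    """Toy dataflow graph: processors joined by links (hypothetical, not Taverna's API)."""

    def __init__(self):
        self.processors = {}   # name -> callable taking keyword inputs
        self.links = {}        # (target, port) -> source processor name

    def add_processor(self, name, func):
        self.processors[name] = func

    def link(self, source, target, port):
        """Feed the output of `source` into input `port` of `target`."""
        self.links[(target, port)] = source

    def run(self, inputs):
        """Run processors in dependency order; `inputs` maps (processor, port) to a value."""
        deps = {name: set() for name in self.processors}
        for (target, _port), source in self.links.items():
            deps[target].add(source)
        results = {}
        for name in TopologicalSorter(deps).static_order():
            # Start with the workflow-level inputs bound to this processor.
            kwargs = {port: value for (proc, port), value in inputs.items() if proc == name}
            # Add the outputs of upstream processors linked to this one.
            for (target, port), source in self.links.items():
                if target == name:
                    kwargs[port] = results[source]
            results[name] = self.processors[name](**kwargs)
        return results


# Stand-ins for wrapped bio-application services (assumed behavior, illustration only).
def fake_search(sequence):
    return [sequence, sequence[::-1]]      # pretend these are the search hits


def fake_align(sequences):
    return "\n".join(sorted(sequences))    # pretend this is an alignment


wf = Workflow()
wf.add_processor("search", fake_search)
wf.add_processor("align", fake_align)
wf.link("search", "align", port="sequences")
outputs = wf.run({("search", "sequence"): "MKV"})
print(outputs["align"])
```

In a real gateway each processor would be a call to a service operation discovered from its WSDL; the graph and the ordering logic would stay the same.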


Social Informatics Data Grid (SIDGrid)
http://sidgrid.ci.uchicago.edu

• SIDGrid enables social and behavioral scientists to collect and annotate data, collaborate and share data, and analyze and mine large data repositories
• Data include speech, gesture, facial expression, and physiological measurements
• SIDGrid services
  • Data importing/exporting
  • Query
  • Streaming
  • Large-scale data analysis on TeraGrid, especially multimedia processing and data mining tasks


SIDGrid Workflow Management http://www.ci.uchicago.edu/swift/

• Swift supports data-intensive scientific applications that execute many tasks coupled by disk-resident datasets
• A simple scripting language for describing workflows
• A flexible data mapping mechanism for accessing large-scale scientific datasets
• An efficient task execution framework for high-throughput computation
• Easy to integrate into science gateways
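Two of the points above, data mapping and many independent tasks over disk-resident files, can be illustrated with a small sketch. This is not Swift's implementation: `map_files`, `count_lines`, and `run_many` are hypothetical Python names, the file pattern is made up, and a thread pool stands in for Swift's task execution framework.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def map_files(directory, pattern):
    """Hypothetical mapper: bind the files matching `pattern` to an ordered list of paths."""
    return sorted(Path(directory).glob(pattern))


def count_lines(path):
    """Stand-in for a domain application that is run on one input file."""
    return len(path.read_text().splitlines())


def run_many(paths, task, workers=4):
    """Run one independent task per mapped file; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, paths))


with tempfile.TemporaryDirectory() as tmp:
    # Create a tiny disk-resident dataset plus one file the mapper should skip.
    for i in range(3):
        Path(tmp, f"sample{i}.txt").write_text("x\n" * (i + 1))
    Path(tmp, "notes.log").write_text("ignored\n")

    dataset = map_files(tmp, "sample*.txt")
    line_counts = run_many(dataset, count_lines)

print(line_counts)
```

The script only ever sees `dataset` as an ordered collection; where the files live and how they are named is the mapper's concern, which is the separation the slide describes.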


SIDGrid Science Gateway Framework

[Figure labels: Render Gadgets Instance; Gadgets XML; SIDGrid Data URLs; Run application-specific workflows; Application Mobyle XML; Swift workflow scripts]

• Integrates social applications and provides a Web 2.0 interface
• Extended Mobyle Application XML for application description: Swift script templates as applications
• Swift workflow engine: start/stop/resume workflows


Run Social and Behavior Science Tools as SIDGrid Gadgets

SIDGrid experiment browsing page:
• Lists project files and available analysis tools
• Provides a browser-side gadget execution environment

Three steps to launch SIDGrid application gadgets:
1. Select data files to analyze
2. Select an analysis application
3. Launch SIDGrid gadgets (Praat and workflow history gadget) to run the analysis and monitor progress


Build a domain-specific scientific workspace

[Figure labels: Fluid Flow; Breadcrumbs; ParaView; Workflow Configurator Monitor]

• Develop Swift workflow scripts for domain applications
• Define the application execution descriptions
• Generate the execution profiles and Web gadgets
• Reuse general-purpose gadgets from the gadget library
• Create a customized layout manager for these gadgets
• Describe the event channels between the gadgets
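The last step, event channels between gadgets, is essentially publish/subscribe messaging. A minimal sketch under stated assumptions: `EventBus`, the two gadget classes, and the channel name `workflow.status` are hypothetical and are written in Python for illustration; the gateway's gadgets run in the browser and their actual channel description format is not shown in the slides.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical event channels shared by the gadgets on one page."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # channel name -> callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver `message` to every subscriber; return how many received it."""
        for callback in self._subscribers[channel]:
            callback(message)
        return len(self._subscribers[channel])


class HistoryGadget:
    """Stand-in for a workflow history gadget that records status updates."""

    def __init__(self, bus):
        self.entries = []
        bus.subscribe("workflow.status", self.entries.append)


class ConfiguratorGadget:
    """Stand-in for a gadget that announces workflow status changes."""

    def __init__(self, bus):
        self.bus = bus

    def submit(self, run_id):
        self.bus.publish("workflow.status", {"run": run_id, "state": "submitted"})


bus = EventBus()
history = HistoryGadget(bus)
ConfiguratorGadget(bus).submit("run-001")
print(history.entries)
```

Because gadgets only know channel names, a general-purpose gadget from the library can be reused in another workspace without changes to the gadgets that publish to it.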


TeraGrid Science Gateway: Open Protein Simulator (OOPS) for the UChicago Department of Chemistry and Institute for Biophysical Dynamics


Science topic: 3D Protein structure prediction

• 3D protein structure prediction “ab initio”
  • Specifically, protein targets that have few or no known homologies

• Primary algorithmic scaffold for the lab is called “ItFix” – iterative fixing

• Uses many parallel “rounds” of simulation: independent, randomly seeded simulated annealing runs

• Consensus structures are formed after each round to seed the next round

• Various exploratory algorithms and variations – e.g. “Loop Modeling” and “SPEED” are hung off of this framework
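The round structure described above (many independent, randomly seeded annealing runs, then a consensus that seeds the next round) can be sketched on a toy problem. Everything here is an assumption made for illustration: the one-dimensional `energy` function, the step size and cooling schedule, and the rule of averaging the lower-energy half of the runs. It is not the OOPS code and says nothing about how ItFix actually builds consensus structures.

```python
import math
import random


def energy(x):
    """Toy one-dimensional energy with its minimum at x = 2 (stand-in for a protein energy)."""
    return (x - 2.0) ** 2


def anneal(start, seed, steps=200, temp=1.0, cooling=0.98):
    """One independent, randomly seeded simulated annealing run."""
    rng = random.Random(seed)
    x = start
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, 0.5)
        delta = energy(candidate) - energy(x)
        # Metropolis rule: always accept downhill moves, sometimes accept uphill ones.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            x = candidate
        temp *= cooling
    return x


def consensus(results):
    """Average the lower-energy half of the runs (an assumed consensus rule)."""
    best = sorted(results, key=energy)[: max(1, len(results) // 2)]
    return sum(best) / len(best)


def iterative_fixing(start, nsim=20, max_rounds=5, tol=1e-2):
    """Repeat rounds of independent runs, seeding each round with the previous consensus."""
    current = start
    for rnd in range(max_rounds):
        runs = [anneal(current, seed=rnd * nsim + i) for i in range(nsim)]
        updated = consensus(runs)
        if abs(updated - current) < tol:
            return updated, rnd + 1
        current = updated
    return current, max_rounds


estimate, rounds = iterative_fixing(start=-5.0)
print(f"estimate={estimate:.3f} after {rounds} round(s)")
```

The runs inside one round do not depend on each other, which is what makes each round a natural unit of parallel work.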


ItFix: iterative fixing for structure prediction

Slide courtesy of Glen Hocky


Science Value

• The special realm of OOPS/ItFix in the folding world is to predict the structure of new proteins whose sequence is known, but for which few or no homologies to known structures exist (making the common “template-based” approaches ineffective)

• OOPS is now being applied to massive-scale prediction on pathogens (e.g. Staph. aureus) and on metagenomes with biomedical and energy applications.

• The CPU demands for these applications are significant: tens of millions of hours are expected to be required over the next few years

• The core OOPS protein library was recently re-written and is now being recalibrated for resource estimation


Protein structure prediction data flow

[Figure: data flow diagram. Labels recovered from the slide:]
• Per-protein data: 4 files, ~100 KB
• Common stat data: 16 files, ~90 MB
• BioPython and Protlib libs: 1 tar, ~2 MB
• Simulate: 100 to 10K×
• Analyze: 1 to 10×
• Per-protein output: 3 files, ~3 MB
• Per-protein stats: 4 MB, 3 files
• All-protein stats: 600 MB, 8000 files
• Other labels: “Unique, per protein”; “refined”

Repeated for each protein for each parameter set. Target: 100s of proteins, 100s of parameter sets.


Protein structure prediction

    ItFix( Protein p, int nsim, int maxr, float temp, float dt)
    {
        ProtSim prediction[ ][ ];
        boolean converged[ ];
        PSimCf config;

        config.st = temp;
        config.tui = dt;
        config.coeff = 0.1;

        iterate r {
            prediction[r] =
                doRoundCf(p, nsim, config);
            converged[r] =
                analyze(prediction[r], r, maxr);
        } until ( converged[r] );
    }


Protein structure prediction

    Sweep( )
    {
        int nSim = 1000;
        int maxRounds = 3;
        Protein pSet[ ] <ext; exec="Protein.map">;
        float startTemp[ ] = [ 100.0, 200.0 ];
        float delT[ ] = [ 1.0, 1.5, 2.0, 5.0, 10.0 ];
        foreach p, pn in pSet {
            foreach t in startTemp {
                foreach d in delT {
                    ItFix(p, nSim, maxRounds, t, d);
                }
            }
        }
    }

    Sweep();


10 proteins × 1000 simulations × 3 rounds × 2 temps × 5 deltas = 300K tasks
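The task count follows directly from the nested loops of the sweep. A short Python check of the arithmetic, assuming 10 proteins (the protein names are placeholders) and counting only simulation tasks, with every ItFix call running all three rounds:

```python
from itertools import product

proteins = [f"protein{i}" for i in range(10)]   # assumed: 10 targets, placeholder names
start_temps = [100.0, 200.0]
delta_ts = [1.0, 1.5, 2.0, 5.0, 10.0]
n_sim = 1000
max_rounds = 3

# One ItFix call per (protein, start temperature, delta) combination.
itfix_calls = list(product(proteins, start_temps, delta_ts))

# Each call runs up to max_rounds rounds of n_sim simulations.
max_tasks = len(itfix_calls) * max_rounds * n_sim

print(len(itfix_calls), max_tasks)
```

The figure is an upper bound on simulation tasks: a call that converges before the last round runs fewer of them, and the per-round analysis steps are not counted here.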


Structure prediction runs on TeraGrid and Blue Gene/P

Work of Tobin Sosnick, Karl Freed, Glen Hocky, Joe DeBartolo, Aashish Adhikari.


[Figure panels: T1af7, T1r69, T1b72]

Work of Tobin Sosnick, Karl Freed, Glen Hocky, Joe DeBartolo, Aashish Adhikari.



Generating a Mobyle XML for the OOPS Swift script

XML for generating the Web gadget:

    <parameter ismandatory="1" issimple="1" ismaininput="1">
      <name>plist</name>
      <prompt lang="en">input protein fasta file</prompt>
      <type>
        <datatype>
          <class>File</class>
        </datatype>
      </type>
      <format>
        <code proglang="python"> ("","-plist="+str(value))[value is not None] </code>
      </format>
      <argpos>1</argpos>
    </parameter>

OOPS script:

    main(){
      string plistfile=@arg("plist","");        // input protein fasta file
      string indir=@arg("indir","oops.input");  //…
      string outdir=@arg("outdir","output");    //…
      string nsims=@arg("nsims","1");           // simulation num
      string st=@arg("st","100");               // start temperature
      string tui=@arg("tui","100");             // time update interval
      string coeff=@arg("coeff","0.1");
      string plist[] = readData(plistfile);
      RAMAIn ramain[] <ext; exec="RAMAInProts.map.sh",i=indir,p=plistfile>;
      RAMAOut ramaout[][] <ext;exec="RandProtRadialMapper.py",o=outdir,p=plistfile,n=nsims,c=create>;
      foreach sim in [ 0 : @toint(nsims) -1 ] {
        foreach prot,index in plist {
          ramaout[index][sim] = predictCf(prot, ramain[index], st, tui, coeff);
          VizOut outpng[] <ext; exec="pngmapper.py", o=metadir, p=@filename(ramaout[index][sim].pdb) >;
          outpng[0] = pngviz(ramaout[index][sim]);
        }
      }
    }
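The link between the two fragments is that every `@arg("name","default")` in the script becomes one `<parameter>` in the XML, and the `<code>` expression turns the user's value into a command-line argument. The sketch below shows one way this could be automated in Python. It is an illustration, not the gateway's generator: `parameters_from_script`, `to_parameter_xml`, and `render_argument` are hypothetical helpers, only the elements visible in the slide fragment are written (no type, no default value), and the script excerpt is shortened.

```python
import re
import xml.etree.ElementTree as ET

# Shortened excerpt of the argument declarations shown on the slide.
SCRIPT = '''
string plistfile=@arg("plist","");  // input protein fasta file
string nsims=@arg("nsims","1");     // simulation num
string st=@arg("st","100");         // start temperature
'''

# Captures the argument name, its default, and an optional trailing // comment.
ARG_PATTERN = re.compile(r'@arg\("(\w+)"\s*,\s*"([^"]*)"\)\s*;\s*(?://\s*(.*))?')


def parameters_from_script(script):
    """Collect (name, default, prompt) for every @arg(...) declaration."""
    return [
        (m.group(1), m.group(2), (m.group(3) or m.group(1)).strip())
        for m in ARG_PATTERN.finditer(script)
    ]


def to_parameter_xml(name, prompt, position):
    """Build one <parameter> element shaped like the fragment on the slide."""
    param = ET.Element("parameter")
    ET.SubElement(param, "name").text = name
    ET.SubElement(param, "prompt", lang="en").text = prompt
    fmt = ET.SubElement(param, "format")
    code = ET.SubElement(fmt, "code", proglang="python")
    code.text = f'("","-{name}="+str(value))[value is not None]'
    ET.SubElement(param, "argpos").text = str(position)
    return param


def render_argument(name, value):
    """Same tuple-indexing idiom as the <code> element: empty string when value is None."""
    return ("", "-" + name + "=" + str(value))[value is not None]


params = parameters_from_script(SCRIPT)
elements = [
    to_parameter_xml(name, prompt, position)
    for position, (name, _default, prompt) in enumerate(params, start=1)
]
print(ET.tostring(elements[0], encoding="unicode"))
print(render_argument("plist", "targets.fasta"))
```

The tuple-indexing expression works because `value is not None` evaluates to `False` or `True`, which index the tuple as 0 or 1; an unset parameter therefore contributes nothing to the command line.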


Build the Protein Folding Simulation Workspace Prototype using the framework

• A prototype portal for large-scale protein 3D structure simulations
• Domain-specific gadget
  • Protein folding simulation gadget
• Reusable gadgets
  • Workflow history gadget
  • Viewing results gadget
  • File browsing gadget
• Integrate all these gadgets into a desktop application layout
• Message passing between gadgets