Workflow as a Service: An Approach to Workflow Farming
Reginald Cushing, Adam Belloum, Vladimir Korkhov, Dmitry Vasyunin, Marian Bubak, Carole Leguy
Institute for Informatics, University of Amsterdam
3rd International Workshop on Emerging Computational Methods for the Life Sciences
18th June 2012
Outline
● Scientific Workflows
● Farming Concepts
● Workflow as a Service (WfaaS)
● System overview
– Task Harnessing
– Messaging
● Application Use Case
● Results
● Conclusions
Scientific Workflows
● Composing experiments from reusable modules
● Vertices represent computation
● Edges represent data dependency and data communication
● Modules/tasks communicate through channels represented by ports
● Workflow engines distribute workload onto resources such as grids and clouds
● Modules run in parallel, thus achieving better throughput
Farming Concepts
● Many scientific applications require a parameter space study, a.k.a. a parameter sweep
● In workflows, parameter sweeps can be achieved by running multiple identical workflows with different parameter inputs
● Cons: every instance of a workflow has to be submitted to distributed resources, where queue waiting times play a significant role in throughput
Farming Concepts
● Parameters are organized on message queues
● A task processes data sequentially
● Adding more tasks increases the message consumption rate
● Challenge: how many tasks to create?
– Too many: tasks get stuck on queues
– Too few: optimal performance is not achieved
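The farming idea on this slide can be sketched with Python's standard library: several task threads consume parameters from one shared queue. This is only an illustration. The names `run_farm` and `simulate` are hypothetical, and an in-process `queue.Queue` stands in for a real message broker.

```python
import queue
import threading


def simulate(param):
    """Placeholder for the scientific computation (hypothetical)."""
    return param * param


def run_farm(params, n_tasks):
    """Process all params with n_tasks workers sharing one input queue."""
    inbox = queue.Queue()
    for p in params:
        inbox.put(p)

    results = {}
    lock = threading.Lock()

    def task():
        while True:
            try:
                p = inbox.get_nowait()  # take the next parameter, if any
            except queue.Empty:
                return                  # queue drained: this task is done
            value = simulate(p)
            with lock:
                results[p] = value

    workers = [threading.Thread(target=task) for _ in range(n_tasks)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


farm_results = run_farm(range(10), n_tasks=3)
```

Each parameter is consumed by exactly one task, so raising `n_tasks` raises the consumption rate only while there are free resources to run the extra tasks.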
Workflow as a Service
● Workflow execution is persistent, i.e. it runs, processes data, and does NOT terminate but waits for more data
● An active workflow instance can process multiple parameters
● Makes better use of computing resources
● A parameter space can be partitioned amongst a pool of active workflow instances (a farm of workflows)
● A workflow acts as a service by accepting requests to process data with given parameters
– Request 1: data A, parameters {p1,p2,...}
– Request 2: data A, parameters {k1,k2,...}
● Multiple WfaaS processing requests form a farm of workflows
System Overview
Loosely coupled modules revolving around message queues
Enactment Engine
Dataflow engine (top-level scheduler) based on the Freefluo§ engine
Models workflows as dataflow graphs
Vertices are tasks, while edges are data dependencies
Tasks have ports to simulate data channels
The dataflow model dictates that only tasks which have input are scheduled for execution
§ http://freefluo.sourceforge.net
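The dataflow rule on this slide (a task is scheduled only once it has input) can be illustrated with a toy scheduler. This is a sketch under simplifying assumptions, not the Freefluo-based engine itself: `run_dataflow` and the example task names are hypothetical, and each task fires at most once.

```python
def run_dataflow(tasks, edges, external):
    """Run a dataflow graph once.

    tasks:    {name: function taking one argument per input port}
    edges:    list of (producer, consumer); edge order fixes port order
    external: {name: value} fed to source tasks from outside the graph
    """
    producers = {name: [] for name in tasks}
    for src, dst in edges:
        producers[dst].append(src)

    outputs = {}  # task name -> value it produced
    order = []    # execution order, for inspection

    def is_ready(t):
        if producers[t]:
            return all(src in outputs for src in producers[t])
        return t in external

    pending = set(tasks)
    while pending:
        ready = sorted(t for t in pending if is_ready(t))
        if not ready:
            break  # remaining tasks never receive input, so never run
        for t in ready:
            args = [outputs[s] for s in producers[t]] or [external[t]]
            outputs[t] = tasks[t](*args)
            order.append(t)
            pending.discard(t)
    return outputs, order


tasks = {
    "load": lambda x: x + 1,
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "merge": lambda a, b: a + b,
    "orphan": lambda x: x,  # has no input, so it is never scheduled
}
edges = [("load", "double"), ("load", "square"),
         ("double", "merge"), ("square", "merge")]
outputs, order = run_dataflow(tasks, edges, {"load": 2})
```

Here `double` and `square` become ready together once `load` has produced data, which is where a real engine would run them in parallel.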
Message Broker
Message broker plays a pivotal role in the system
Message queues act as a data buffer
Communicating tasks are decoupled in time
Through queue sharing we can achieve scaling
Tasks communicate through messaging where messages contain references to actual data
Submission System
Pluggable schedulers (bottom-level) for task match-making
Submitters (drivers) abstract actual resources such as cluster, grid, cloud
Scheduler matches a task to a submitter
Submitter does actual task/job submission
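The split between scheduler and submitter might look like the sketch below. All class and function names are hypothetical, the core-count check is an assumed matching criterion, and `submit` only records the task where a real driver would call a cluster, grid, or cloud interface.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cores: int


class Submitter:
    """Driver that hides one kind of resource (cluster, grid, cloud)."""

    def __init__(self, name, max_cores):
        self.name = name
        self.max_cores = max_cores
        self.submitted = []

    def can_run(self, task):
        return task.cores <= self.max_cores

    def submit(self, task):
        # A real driver would perform the actual job submission here.
        self.submitted.append(task.name)
        return f"{task.name}@{self.name}"


def first_fit_scheduler(task, submitters):
    """One pluggable policy: pick the first submitter able to run the task."""
    for s in submitters:
        if s.can_run(task):
            return s
    raise LookupError(f"no submitter can run {task.name}")


def dispatch(task, submitters, scheduler=first_fit_scheduler):
    """Match the task to a submitter, then let that submitter submit it."""
    return scheduler(task, submitters).submit(task)


cluster = Submitter("cluster", max_cores=8)
cloud = Submitter("cloud", max_cores=64)
job_a = dispatch(Task("small", cores=4), [cluster, cloud])
job_b = dispatch(Task("large", cores=32), [cluster, cloud])
```

Because the policy is passed in as a function, a different match-making strategy can be plugged in without touching the submitters.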
Task Harnessing
The task harness is a late-binding, pilot-job mechanism
A pilot job (the harness) is submitted first and then pulls the actual job
The harness separates data transport from scientific logic
Better control of tasks
Task Auto-Scaling
Messages between tasks are monitored
Size of queued data and mean data processing time are used to calculate task load
Auto-scaling replicates a particular task to ameliorate the task load
Replicated tasks (clones) partition data by sharing the same input message queues
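The slide names the inputs to the load calculation (size of queued data and mean processing time) but not the formula. The sketch below assumes one plausible form: load is the estimated time a single instance needs to drain its queue, and the clone count is chosen so the queue drains within a target time. The function names, `target_seconds`, and `max_clones` are all assumptions for illustration.

```python
import math


def task_load(queued_bytes, mean_seconds_per_byte):
    """Estimated seconds one task instance needs to drain its queue."""
    return queued_bytes * mean_seconds_per_byte


def clones_needed(queued_bytes, mean_seconds_per_byte,
                  target_seconds, max_clones):
    """Instances needed to drain the queue within target_seconds."""
    load = task_load(queued_bytes, mean_seconds_per_byte)
    wanted = math.ceil(load / target_seconds)
    # Always keep one instance alive and never exceed the resource cap.
    return max(1, min(wanted, max_clones))


clones = clones_needed(queued_bytes=1000, mean_seconds_per_byte=0.5,
                       target_seconds=120, max_clones=8)
```

The cap reflects the earlier concern that creating too many tasks only leaves them stuck on resource queues.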
Parameter Mapping
● One to one mapping: each parameter is mapped to one workflow instance
– Generates many workflow instances, which end up stuck on queues awaiting execution
– High scheduling overhead, high concurrency
● Many to one mapping: all parameters are mapped to the same workflow instance
– Only one workflow to schedule, but it takes long to process the whole parameter space
– Low scheduling overhead, low concurrency
● Many to many mapping: the parameter space is partitioned amongst a farm of workflows
– A number of workflows are scheduled, which accelerates processing
– Low scheduling overhead, high concurrency
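The three mappings differ only in how many workflow instances share the parameter space, which a single hypothetical helper can show. Note the simplification: `partition` splits the parameters statically and round-robin, whereas in the system described here instances divide the work dynamically by sharing message queues.

```python
def partition(params, n_workflows):
    """Split params round-robin into n_workflows shares."""
    if n_workflows < 1:
        raise ValueError("need at least one workflow instance")
    return [params[i::n_workflows] for i in range(n_workflows)]


params = list(range(10))
one_to_one = partition(params, len(params))  # one instance per parameter
many_to_one = partition(params, 1)           # one instance does everything
many_to_many = partition(params, 3)          # a farm of three instances
```

With ten parameters, the farm of three needs only three scheduling operations instead of ten while still processing three parameters at a time.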
Task Harnessing
● WfaaS is enabled through task harnessing
● A harness is caretaker code that runs alongside the module on the resource/worker node
● It implements a plugin architecture
● Modules are dynamically loaded at runtime
● Data communication to and from the module is taken care of by the harness
● The harness invokes the module with new data processing requests
● The harness is akin to a container, while the module is akin to a service
● The harness enables asynchronous module execution, as communication is done through messaging
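A harness of this kind can be sketched in a few lines. This is an illustration, not the WS-VLAM implementation: the `"package.module:function"` spec format, the `STOP` sentinel, and the use of `math.sqrt` as a stand-in module are all assumptions, and in-process queues replace the message broker.

```python
import importlib
import queue

STOP = object()  # sentinel used only so this demo can terminate


def load_module(spec):
    """Load a callable from a 'package.module:function' string at runtime."""
    module_name, func_name = spec.split(":")
    return getattr(importlib.import_module(module_name), func_name)


def harness(spec, inbox, outbox):
    """Caretaker loop: fetch a request, invoke the module, publish the result."""
    module = load_module(spec)
    while True:
        request = inbox.get()        # blocks until more data arrives
        if request is STOP:
            break
        outbox.put(module(request))  # the module never touches the queues


inbox, outbox = queue.Queue(), queue.Queue()
for value in (4, 9, 16):
    inbox.put(value)
inbox.put(STOP)
harness("math:sqrt", inbox, outbox)
harness_results = [outbox.get() for _ in range(3)]
```

The module is loaded once and invoked once per request, which is what lets a single submitted job serve many parameters.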
Messaging
● In WfaaS, modules communicate through messaging
● Message queues allow multiple instances of modules to share the same input space
● Through message queues, data is partitioned amongst modules
● Messaging circumvents the need to co-allocate resources
● A pull model implies that each module can process data at its own pace
● Once a module has finished processing data, it asks for more (pull)
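The effect of the pull model can be shown with a small deterministic simulation. The function `simulate_pull` and the module names are hypothetical. Each module asks for the next message as soon as it becomes free, so a faster module naturally takes a larger share of the queue.

```python
import heapq


def simulate_pull(n_messages, seconds_per_message):
    """Simulate modules pulling from one shared queue.

    seconds_per_message: {module name: processing time per message}
    Returns ({module: messages processed}, total elapsed seconds).
    """
    processed = {name: 0 for name in seconds_per_message}
    # Heap of (time the module becomes free, module name); all free at t=0.
    free_at = [(0, name) for name in sorted(seconds_per_message)]
    heapq.heapify(free_at)
    finished = 0
    for _ in range(n_messages):
        now, name = heapq.heappop(free_at)  # next module to ask for data
        done = now + seconds_per_message[name]
        processed[name] += 1
        finished = max(finished, done)
        heapq.heappush(free_at, (done, name))
    return processed, finished


shares, elapsed = simulate_pull(9, {"fast": 1, "slow": 2})
```

No module waits for another, and no central component has to know each module's speed in advance.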
Application Use Case
● Arterial tree model geometry and representation of model parameters constrained to uncertainties
● Parameters: flow velocity; brachial, radial, and ulnar radii; lengths of the brachial, radial, and ulnar arteries; etc.
● Biomedical study for which 3000 runs were required to perform global sensitivity analysis
● Patient-specific simulation includes many parameters based on data measured in vivo
Results
● Left: with WfaaS, 100 simulations take around 3h:15min
● Right: without WfaaS, 100 simulations take 5h:15min
● In the WfaaS approach, each workflow instance performs multiple simulations, which drastically reduces queue waiting times
● The non-WfaaS approach generates 100 workflow instances, most of which get stuck on job queues
● In both cases workflows were competing for 28 worker nodes
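As a quick worked check of the two reported timings, the speedup and time saving can be computed directly. The timings are approximate as reported on the slide, so the derived figures are rough too. The helper `minutes` is introduced here only for the arithmetic.

```python
def minutes(hours, mins):
    """Convert an h:min duration to minutes."""
    return hours * 60 + mins


wfaas = minutes(3, 15)          # 195 minutes with WfaaS
baseline = minutes(5, 15)       # 315 minutes without WfaaS
speedup = baseline / wfaas      # roughly 1.6 times faster
saving = 1 - wfaas / baseline   # roughly 38% less wall-clock time
```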
Conclusions
● WfaaS is an ideal approach to large parametric studies
● WfaaS reduces common scheduling overhead associated with queue waiting times
● WfaaS is achieved through task harnessing whereby caretaker routines can invoke the task multiple times
● A farm of workflows can progress at its own pace through a parameter pulling mechanism
Further Information
● WSVLAM workflow management system
– http://staff.science.uva.nl/~gvlam/wsvlam/
● Computational Sciences at University of Amsterdam
– http://uva.computationalscience.nl
● COMMIT
– http://www.commit-nl.nl/new