
Applications and Requirements for Scientific Workflow

Introduction

May 1, 2006, NSF

Geoffrey Fox, Indiana University

Major Themes
• What is different now and why
– Scientific workflow is now within the realm of possibility
• What are the application requirements (rather than the CS requirements)?
– Prioritize, identify new issues, and determine which old requirements have been satisfied
• Ground these in scenarios or application descriptions that lead to these requirements
• Phrase as transformative research: does the term “scientific workflow” conjure up an innovative future or perhaps a bureaucratic past?

Applications
• Extreme weather (LEAD)
• Bioinformatics (myGrid, BIRN); high throughput screening
• Virtual Observatory in Astronomy
• Particle Physics
• Generic Data Analysis
• Earthquake Science
• Ocean Data Assimilation

• Note that most of the following topics come from Computer Science; one needs to identify the driving higher-level application requirement
– Preserve the mapping from application requirements to computer science topics

Topics – Application/Component Specific
• [Evangelinos] Support Ocean Data Assimilation
– Matlab, Fortran, parallel simulations
– Dataflow standards for “large I/O”
– Metascheduling
– Customization of execution parameters (provenance)
• [AGray] Need workflow components supporting powerful data analysis across fields
• [Gil] Support workflows needed in “open access” data accompanying scientific publications
• [Hendler] Support information management as well as computation

Topics – Overarching
• [Ellisman] What do we mean by workflow? The word means different things to different people; should we use different terms? We need a better word (distributed scientific method)

• [JMyers, Barga] Categorize workflows and study their use; evaluate and compare them; identify common patterns

• [Discussion] What has changed? The data deluge is one critical change; is data a curse or a blessing?

• [Ellisman] What is the “scientific method” (versus the “Google method”), and what are its implications for workflow?

• [Barga] What’s wrong with commercial solutions?
• [Laszewski] Support common Grid patterns
• [Fox] Build a benchmark set analogous to the NAS Parallel Benchmarks in parallel computing
• [Fahringer] Include all costs (e.g. Web Service security, SOAP) in performance models
• [Deelman1] Support restructuring and planning for performance optimization
• [JMyers] Manage workflows like content

• [Ackerman, KMyers, Scacchi, Deelman2] Support the full people (scientific process) workflow, including social and organizational issues

Topics – Desired Qualities
• [Goble] Support users who are often under-resourced
• [Discussion] Multiple classes of users: “power”, “common case”, “education”; do users know what they want or not?
– Note that industry workflow captures WELL-understood business processes

• [Several] Workflows will be re-used and shared
• [Ellisman] Enable reproducible science
• [Livny] Support high-quality software
• [Laszewski] Balance features, performance, and completeness
• [Goble] Easily assemble workflows, find services, and adapt previous workflows
• [Goble] The workflow has to reflect the science, not the service invocation interface
• [Goble] Automated workflow design is unlikely, unpopular, and undesirable, as scientists know which services they want
• [Goble] Support all services that users want, whether they have a WSDL interface or not

Topics – Desired Features
• [Several] Workflows should be scalable, fault-tolerant, restartable, adaptive, and repeatable; support heterogeneous resources spanning multiple administrative domains
• [Discussion] What do application scientists mean by the above qualities?
– [Livny] Why is size important? Complexity counts
• [Altintas] Support end-to-end science, from instruments to interactive data analysis
• [Szalay] Interactive analysis as well as batch
• [Gannon] Workflows triggered by events without user interaction
• [Knoblock] Techniques for rapidly constructing models of new sources or services so that they can be rapidly and correctly integrated
• [Knoblock] Support for dynamically integrating data across multiple data sources (i.e., databases or web services) that were not designed to work together

• [Curbera] Support reasoning about correctness and composability
• [Livny] What is the meaning of correctness and reproducibility (e.g. with random numbers)?
• [Gil] Support collections of workflows addressing common scientific questions
• [Discussion] Need to support workflows composed of heterogeneous workflows of different types; note that industry worries about linking intra-enterprise systems across enterprises
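Livny’s reproducibility question can be made concrete for stochastic steps. The sketch below is a minimal illustration, assuming a hypothetical Monte Carlo step (`monte_carlo_step` is an invented name) whose only source of nondeterminism is a private, explicitly seeded random generator; recording the seed alongside the result lets the run be replayed exactly.

```python
import random
import secrets


def monte_carlo_step(n_samples, seed=None):
    """Hypothetical stochastic workflow step: estimate pi by random sampling.

    If no seed is supplied, a fresh one is drawn and returned in the run
    record, so this exact run can be replayed later.
    """
    if seed is None:
        seed = secrets.randbits(64)  # fresh seed for a new run
    rng = random.Random(seed)  # private generator, no shared global state
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    estimate = 4.0 * inside / n_samples
    # The run record holds everything needed to reproduce the result.
    return {"seed": seed, "n_samples": n_samples, "estimate": estimate}


first = monte_carlo_step(10_000)
replay = monte_carlo_step(first["n_samples"], seed=first["seed"])
print(first["estimate"] == replay["estimate"])
```

This only covers the random-number part of the question; reproducibility across different library versions or hardware is a separate issue the sketch does not address.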

Topics – Detailed Technology
• [Laszewski] Extend the workflow language through a set of core libraries, such as fault tolerance and checkpointing
• [Goble] Need a higher-level language than BPEL
• [Goble] There will be no single workflow language or workflow system, just as there is no single word processor, programming language, or operating system

• [Ellisman, Livny] Role of portals (science gateways) as the “common case” user interface versus distributed programming for the “power user”

• [Altintas] User interface customizable for different domains
• [Deelman2] Virtual data to efficiently capture past and future actions
• [Curbera] Integrate internet-scale execution (REST) and the enterprise service bus (ESB)
• [Discussion] Web 2.0, like Google Maps; industry distinction between interoperability and implementation
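As an illustration of the kind of checkpointing library Laszewski mentions, here is a minimal sketch of a restartable linear workflow. It assumes steps are uniquely named Python functions with JSON-serializable results and a local checkpoint file; `run_with_checkpoints` is a hypothetical helper, not part of any existing workflow system.

```python
import json
import os


def run_with_checkpoints(steps, checkpoint_path):
    """Run (name, fn) steps in order, checkpointing after each one.

    Each fn receives the previous step's result (None for the first step).
    Completed results are saved to a JSON file, so a rerun after a failure
    skips finished steps and resumes from the last saved result.
    """
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    result = None
    for name, fn in steps:
        if name in done:
            result = done[name]  # restored from checkpoint, not recomputed
            continue
        result = fn(result)
        done[name] = result
        # Write to a temporary file and then rename it, so a crash never
        # leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, checkpoint_path)
    return result
```

Rerunning with the same checkpoint path after a failure recomputes only the step that failed and those after it. A real system would also need to invalidate checkpoints when a step’s code or parameters change.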

Topics – Provenance

• [Freire] Support computational (workflow) steering and provenance generation

• [Goble] Workflows must allow effective management of resultant data and provenance

• [Barga, Moreau] Define provenance of execution in general terms, even though there are multiple paradigms

• [Altintas] Track provenance of workflow design, execution, and intermediate and final results

• [Gannon] Initializations of workflow components depend on each other

• [Seth] Design provenance supporting customization of adaptable workflows
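One simple way to approach Altintas’s point about tracking intermediate and final results is to record, for every step, its parameters and content hashes of its inputs and outputs. The sketch below assumes JSON-serializable data and uses hypothetical names (`run_with_provenance`, `scale`, `add_up`); it is one possible record layout, not a provenance standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def digest(obj):
    """SHA-256 content hash of a JSON-serializable value (stable key order)."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def run_with_provenance(step_name, fn, inputs, params, log):
    """Run fn(inputs, **params) and append a provenance record to log."""
    started = datetime.now(timezone.utc).isoformat()
    output = fn(inputs, **params)
    log.append({
        "step": step_name,
        "params": params,
        "input_sha256": digest(inputs),
        "output_sha256": digest(output),
        "started_utc": started,
    })
    return output


def scale(values, factor):
    return [v * factor for v in values]


def add_up(values):
    return sum(values)


log = []
raw = [1.0, 2.0, 3.0]
scaled = run_with_provenance("scale", scale, raw, {"factor": 2.0}, log)
total = run_with_provenance("sum", add_up, scaled, {}, log)
# An intermediate result links two steps: one step's output hash is the
# next step's input hash.
print(log[1]["input_sha256"] == log[0]["output_sha256"])
```

Linking records by content hash rather than by file name means the lineage still holds if intermediate data are moved or renamed.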

