
Post on 18-Dec-2015


Page 1: Copyright ©2003 TurboWorx, Inc. 1 High Performance Workflows for Networks and Grids Andrew H. Sherman Chief Technology Officer sherman@turboworx.com

Copyright ©2003 TurboWorx, Inc. 1

High Performance Workflows for Networks and Grids

Andrew H. Sherman

Chief Technology Officer

sherman@turboworx.com

Page 2:

Outline

Technical Computing Workflows

Deploying Workflows in HPC Environments

TurboWorx Workflow Products

Page 3:

Complex Technical Computations are Critical in Many Industries

• Complex technical computing problems and algorithms have become “business critical”

• Solutions often involve integrating several applications and many data sources into workflows

• Automated coarse-grain parallelism and grid computing are emerging as key technologies

Page 4:


Complex Technical Computations are Critical in Many Industries

Life Sciences & Medicine

Discovery and Development
• Data- & compute-intensive applications
• Huge databases from multiple sources & in diverse formats
• Manual workflows

Information-Based Medicine
• Complex, heterogeneous databases & applications
• Better and more effective diagnosis & treatment from faster, more accurate information interpretation

Automotive/Aero

Design and Development
• Concurrent Engineering requires integration and collaboration between Concept, Design and Development processes
• Global design teams that work around the clock
• Suppliers part of the design and development process

Finance

Portfolio Management/Pricing
• Scenario-based modeling
• Huge quantities of real-time data
• Time is money!

Page 5:

What is a Workflow?

“The automation of a business process, in whole or parts, where documents, information or tasks are passed from one participant to another to be processed, according to a set of procedural rules”

— Workflow Management Coalition

Page 6:

Technical Computing Workflows

How do technical computing workflows differ from traditional business process workflows?

Data flow vs. control flow

Widely distributed data (often with multiple owners)

Dynamic operating environment (e.g., the Grid)

Hierarchical workflow constructs

Requirement for parameterized executions

Evolving/Customized workflow definitions

Significance of collaboration and reuse

Page 7:

Characterizing Technical Computing Workflows

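[Figure: workflow classification chart. Axes: Repetition and Business Value. Quadrants: Collaborative, Production (top row); Ad Hoc, Administrative (bottom row). “Technical Workflows” is marked on the chart.]

Ref: Production Workflow (Leymann, Roller)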

Page 8:

HPC Platforms: SMPs & Clusters

Linux Clusters
• Cost-effective
• Scalable
• Modular: easy to upgrade to faster, better CPUs (e.g., 64-bit)
• Great for computation

Blade Solutions
• Similar attributes to Linux clusters
• More compact: better flops/ft³
• Often cheaper

Shared Memory Multiprocessor
• Expensive to buy, costly to upgrade
• Poor scalability for computation
• Best use: Data storage & access

[Diagram labels: Linux, UNIX, Computation Cluster, Database Server]

Page 9:

HPC Platforms: Enterprise Grids

Enterprise Grids
• Efficient: uses all the hardware available
• Provides user comfort and familiarity
• More than cycle stealing on idle desktops; usually includes computing on heterogeneous collections of servers
• Great for computation, particularly for Life Sciences, where desktop platforms are appropriate for many algorithms

[Diagram: a Computation Cluster and a Database Server (labeled Linux, UNIX) alongside machines labeled AIX, Linux, Windows, and Mac OS X]

Page 10:

Technical Computing and Workflows

Workflows can address some critical computing challenges:

• Integrate, manage, and accelerate collections of heterogeneous applications, data, and platforms
• Provide horsepower to process massive amounts of data by applying parallelism without source code modification
• Address the needs of key user groups (end users, application experts, and IT staff) through easy-to-use interfaces
• Facilitate collaboration and reuse to save time in the design, trials and testing, and deployment of new computing solutions

Page 11:

But . . .

There are difficulties to overcome:

• Scalability & performance: going beyond multithreading with “transparent parallelism”
• Management of dynamic computing environments
• Automated data and application staging
• Integration with rapidly evolving grid standards (to support reuse and collaboration)
• Desktop tools for workflow creation; portals for execution
• Debugging and monitoring interfaces

Page 12:

Traditional Workflow Implementation

• Large, complex scripts to orchestrate applications
• Static embedded infrastructure control; usually aimed at a single machine
• Communication via temp files
• “Human-in-the-loop” operation

What’s wrong with this?

Page 13:

Traditional Workflow Implementation


What’s wrong with this?

• Poor performance: mainly aimed at SMPs (but scalability often limited)
• Lack of automation is inefficient and error-prone
• No support for application integration or data conversion
• Difficult to create, maintain, modify (even for skilled programmers)
• Little reusability or portability

Page 14:

Traditional Life Science Workflows

Typical “Human-in-the-Loop” Workflow:

• Manual component startup
• “Cut and paste” data movement
• Sequential execution
• Limited throughput due to “bottleneck components”

[Diagram: Access Data → A → B → C → Store Data, with one stage labeled Slow and two labeled Fast]

Page 15:

A Better Way: Automation & Parallelism

TurboWorx High-Performance Workflow:

• Automated component startup & data conversion
• Transparent data-driven parallelism to eliminate bottlenecks
• Pipeline acceleration: asynchronous, dynamic, concurrent execution on distributed machines

[Diagram: Access Data → A → B → C → Store Data, with stage B replicated and the stages labeled Fast]
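The bottleneck argument can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the per-item stage times are hypothetical (the slides give none), and it assumes a steady-state pipeline whose throughput is set by its slowest stage, with k copies of a stage handling k items in the time one copy handles one.

```python
def pipeline_throughput(stage_times, replicas=None):
    """Items per second for a steady-state pipeline.

    stage_times: seconds per item for each stage (hypothetical values).
    replicas: optional number of copies per stage; k copies are assumed
              to process k items in the time one copy processes one.
    """
    replicas = replicas or [1] * len(stage_times)
    effective = [t / k for t, k in zip(stage_times, replicas)]
    return 1.0 / max(effective)


# Hypothetical timings: A and C are fast (1 s/item), B is slow (4 s/item).
times = [1.0, 4.0, 1.0]

sequential = 1.0 / sum(times)                       # one item at a time, A then B then C
pipelined = pipeline_throughput(times)              # stages overlap, limited by B
replicated = pipeline_throughput(times, [1, 4, 1])  # four copies of B

print(f"sequential: {sequential:.3f} items/s")
print(f"pipelined:  {pipelined:.3f} items/s")
print(f"replicated: {replicated:.3f} items/s")
```

Under these assumed numbers, overlapping the stages helps only modestly because B still dominates; replicating B is what removes the bottleneck.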

Page 16:

TurboWorx Enterprise Architecture

[Architecture diagram. Labeled elements: User; Interfaces (Builder, Command Line, Web Portal); TurboWorx Hub; Component Library; Data Repository; Data Storage; Workstations (AIX, Linux, Windows, Mac OS X); Compute Clusters (Linux; managed by BQS/DRM systems)]

Page 17:

Workflow Lifecycle

Design
– End user or developer??
– Component & workflow development environment
– Integration with data
– Testing & Debugging

Deployment
– Local storage vs. centralized storage
– Sharing & Collaboration

Execution
– Execution interface: CLI, Proprietary GUI, Portal, Web/Grid Service
– Access Control for workflows and data
– Resource management

Monitoring
– Events reflecting from workflow and services execution

Refinement & Reuse

Page 18:

TurboWorx Workflows

Design & Deployment

Atomic Components
– Command-line programs (e.g., C/C++/Fortran, Perl), Java, Jython
– XML wrappers created by wizards or by editing templates

Dataflow Components
– Workflows built from other components (including other workflows)
– Automated data flow & transformations between components
– Created using visual programming tool

Deployment
– Components stored in a “Component Library” (Local or Centralized)
– Import/Export and component sharing (collaboration)
– Data references via a virtual “Data Repository” interface (supports WebDAV, Avaki, FTP, NFS)
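To make the idea of an atomic component concrete, here is a minimal Python sketch of wrapping a command-line program as a workflow step. It is not the TurboWorx XML wrapper format, which the slides do not show; the class `CommandLineComponent` and its `run` method are hypothetical names, and the wrapped "program" is a Python one-liner standing in for a real executable.

```python
import subprocess
import sys


class CommandLineComponent:
    """Hypothetical wrapper that runs an external program as one workflow step.

    Input text is sent on stdin and the program's stdout is returned, so
    steps can be chained without hand-managed temp files.
    """

    def __init__(self, name, argv):
        self.name = name
        self.argv = list(argv)

    def run(self, data: str) -> str:
        result = subprocess.run(
            self.argv,
            input=data,
            capture_output=True,
            text=True,
            check=True,  # raise if the program exits with a non-zero status
        )
        return result.stdout


# Stand-in "program": a Python one-liner that upper-cases its stdin.
# A real component would point argv at a tool such as a C or Perl executable.
upper = CommandLineComponent(
    "upper",
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
)

print(upper.run("acgt"))
```

Keeping the command line and I/O convention in one declarative object is what lets a tool be reused across workflows without editing its source.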

Page 19:

TurboWorx Builder

[Diagram labels: ClustalW; Application, Java Method, Jython Script; Wizard; TurboWorx Component; Component Library; Atomic Component Creation; Workflow Component Creation]

Page 20:

Special Components: Conditionals

Page 21:

Special Components: Loops

Support for: “For”, “While”, “Do Until”

While Loop:

Page 22:

Special Components: Splitters & Joiners

• Components to convert between groups of many data elements and sequences of the individual data elements
• Support “Fork-Join” data parallelism
• Standard splitters/joiners provided with the TurboWorx system. Examples:
  – Arrays: Convert between array and individual elements (in order)
  – Collections: Convert between a java.util.Collection and its elements
  – Strings/Patterns: Split input stream based on regular expressions
• Users may create additional types using Jython or Java
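A splitter/joiner pair can be sketched in a few lines of plain Python. This is an illustration of the fork-join pattern, not TurboWorx code: the function names `regex_splitter`, `array_joiner`, and `fork_join` are hypothetical, and the FASTA-like input is made up for the example.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def regex_splitter(text, pattern):
    """Split an input string into elements on a regular expression."""
    return [piece for piece in re.split(pattern, text) if piece]


def array_joiner(elements):
    """Collect individual results back into one list, preserving order."""
    return list(elements)


def fork_join(text, pattern, work, max_workers=4):
    """Split the input, apply `work` to each element in parallel, then join."""
    elements = regex_splitter(text, pattern)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the join is ordered.
        return array_joiner(pool.map(work, elements))


def sequence_length(body):
    """Toy per-element work: length of one record's sequence."""
    return len(body.strip())


# Hypothetical FASTA-like input: records are separated by header lines
# that start with ">". (?m) makes ^ match at the start of every line.
records = ">seq1\nACGT\n>seq2\nGGCCAA\n>seq3\nTT\n"
lengths = fork_join(records, r"(?m)^>.*\n", sequence_length, max_workers=2)
print(lengths)
```

The component doing the work never sees the group, only one element at a time, which is why it can be replicated without source changes.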

Page 23:

Parallelism in Practice

TurboWorx High-Performance Workflow:

[Diagram: Access Data → A → B → C → Store Data, with SPLIT and JOIN components added; one stage labeled Slow and two labeled Fast]

Splitting enables pipeline parallelism (A, B, C run concurrently on different data)
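Pipeline parallelism of this kind can be sketched with one thread per stage connected by queues. The code below is a generic illustration, not the TurboWorx runtime; `stage`, `run_pipeline`, and the three toy stage functions are hypothetical names standing in for components A, B, and C.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Run one pipeline stage: read items, apply func, pass results on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown downstream
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Stream items through funcs, with one thread per stage."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:  # the "split": feed elements one at a time
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []  # the "join": collect elements as they arrive
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for components A, B, and C.
def stage_a(x):
    return x + 1


def stage_b(x):
    return x * 2


def stage_c(x):
    return f"item-{x}"


print(run_pipeline([1, 2, 3], [stage_a, stage_b, stage_c]))
```

While C handles the first element, B can already be working on the second and A on the third. In CPython, threads show the structure but do not speed up CPU-bound work; separate processes or machines would be needed for that.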

Page 24:

Parallelism in Practice

TurboWorx High-Performance Workflow:

[Diagram: Access Data → A → B → C → Store Data, with SPLIT and JOIN components and stage B replicated; stages labeled Fast]

Scheduler determines amount of data parallelism dynamically at run time
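One way to picture a run-time decision about data parallelism is a policy that looks at the workload and the available processors just before execution. The sketch below is an assumption for illustration, since the slides do not describe the scheduler's actual policy; `choose_workers` and `run_replicated` are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def choose_workers(n_items, available=None, cap=32):
    """Pick a replica count at run time (hypothetical policy).

    Never start more replicas than there are items or available
    processors, and always start at least one.
    """
    if available is None:
        available = os.cpu_count() or 1
    return max(1, min(n_items, available, cap))


def run_replicated(func, items, available=None):
    """Run func over items with a replica count chosen just before execution."""
    items = list(items)
    workers = choose_workers(len(items), available)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return workers, list(pool.map(func, items))


def square(x):
    """Toy stand-in for the replicated component."""
    return x * x


workers, results = run_replicated(square, range(5), available=8)
print(workers, results)
```

Because the count is computed from the input and the environment rather than written into the workflow, the same definition can run on a laptop or a cluster.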

Page 25:

Protein Characterization Example

Overall Task: Group protein domains into families

Workflow steps shown in the diagram:
• Identify homologous pairs
• Build families around pairs
• Refine & optimize protein families
• Find consensus sequences
• Compute identity scores vs. leaders

Key Programs: BLASTP, clustalw, hmmbuild, hmmsearch

“ProcessFamily” Subworkflow
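The subworkflow idea, a workflow used as a single step inside a larger one, can be sketched as follows. Everything here is a toy: the `Workflow` class is hypothetical, the step functions are stand-ins that do not call BLASTP or clustalw, and the choice of which steps sit inside "ProcessFamily" is an assumption, since the slide does not say.

```python
class Workflow:
    """A named sequence of steps; each step is a callable or another Workflow."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = list(steps)

    def __call__(self, data):
        for step in self.steps:
            data = step(data)  # a nested Workflow is called like any other step
        return data

    def describe(self, indent=0):
        """Return an indented outline of the workflow hierarchy."""
        lines = [" " * indent + self.name]
        for step in self.steps:
            if isinstance(step, Workflow):
                lines.extend(step.describe(indent + 2))
            else:
                lines.append(" " * (indent + 2) + step.__name__)
        return lines


# Toy stand-ins; real steps would invoke tools such as BLASTP or clustalw.
def find_pairs(domains):
    return [tuple(domains[i:i + 2]) for i in range(0, len(domains), 2)]


def build_families(pairs):
    return [list(pair) for pair in pairs]


def refine(families):
    return [sorted(family) for family in families]


def score(families):
    return {family[0]: len(family) for family in families}


process_family = Workflow("ProcessFamily", [refine, score])
top_level = Workflow("GroupDomains", [find_pairs, build_families, process_family])

print("\n".join(top_level.describe()))
print(top_level(["d3", "d1", "d4", "d2"]))
```

Because a workflow exposes the same call interface as a single step, it can be stored, shared, and reused like any other component.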

Page 26:

Example: “Process Family” Workflow

Page 27:

Protein Family Example

Page 28:

Take-Home Points

Technical computing workflows are important in various industries

Effective application of workflows requires HPC, including fault-tolerant automation and dynamic parallelism in a grid-like computing environment

TurboWorx workflow products offer one end-to-end solution for developing and deploying high performance technical workflows