A Semantic Type System and Propagation Mechanism for Scientific Workflows




Shawn Bowers² and Bertram Ludäscher¹,²,³

¹ Dept. of Computer Science, ² Genome Center, UC Davis; ³ San Diego Supercomputer Center, UC San Diego

Kepler contributors include GEON, SEEK, SDM Center and Ptolemy II, supported by NSF ITRs 022567 (SEEK), EAR-0225673 (GEON), DOE DE-FC02-01ER25486 (SciDAC/SDM), and DARPA F33615-00-C-1703 (Ptolemy).

CYBERINFRASTRUCTURE FOR THE GEOSCIENCES

The Problem: Design and Reuse of Scientific Workflows and their Components

Scientific workflows are becoming increasingly important as a unifying mechanism for interlinking scientific data management, analysis, simulation, and visualization tasks. While current systems like Kepler permit the creation of executable workflows (e.g., from local components and web services), conceptual modeling and design of scientific workflows have been largely neglected so far. As a result, the design and reuse of components, actors, and workflows (possibly thousands of them, many legacy) is difficult.

Our Approach: We have developed a formal model for scientific workflows based on an actor-oriented modeling and design approach, originally developed for studying models of complex concurrent systems. Actor-oriented modeling separates two modeling concerns: component communication (dataflow) and overall workflow coordination (orchestration). Our framework includes a novel hybrid type system, separating further the concerns of conventional data modeling (structural data type) and conceptual modeling (semantic type). In our design methodology, semantic and structural mismatches can be handled independently or simultaneously via different types of adapters, giving rise to new methods of workflow design.

The Benefits:
• Separation of modeling concerns: transport, structure, semantics port types
• “smart” discovery and linkage of components and data sets
• Workflow graph is an artifact that can be described, analyzed, shared
• More independently reusable components
• Mix of design strategies: step-wise refinement, bottom-up, top-down strategies, data-oriented, task-oriented, …
• Some costly semantic annotations can be automatically derived

GEON Web Service Based Information Integration

(1) Actor-oriented SWF Modeling & Design

www.kepler-project.org

(1) Semantic Extensions for SWF Design

Annotation Propagation Problem:
Given
- structural schemas S (input) and S’ (output) and an ontology O,
- a semantic annotation α,
- a query annotation q
Goal: compute α’
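The propagation problem can be illustrated with a deliberately small sketch. It assumes that α maps input attributes of S to ontology concepts and that q only copies or renames attributes from S to S’; both encodings, the attribute names, and the concept names are hypothetical and not taken from the poster. The Chase-like procedure the poster refers to handles more general queries than this.

```python
def propagate(alpha, q):
    """Compute an output annotation alpha' for schema S'.

    alpha: dict mapping input attributes of S to ontology concepts (assumed encoding).
    q:     dict mapping each output attribute of S' to the input attribute it is
           copied from, or None if the value is newly computed (assumed encoding).
    """
    # An output attribute inherits the concept of the input attribute it copies.
    # Attributes with no annotated source stay unannotated.
    return {out: alpha[src] for out, src in q.items() if src in alpha}


# Hypothetical example data.
alpha = {"spp": "Species", "cnt": "Abundance"}
q = {"species": "spp", "total": None}
alpha_prime = propagate(alpha, q)
```

Here `alpha_prime` annotates only `species`, because `total` has no annotated source attribute.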

Scientific Workflow with Semantic Query Annotations

Future Plans: Workflow engineers evolve workflows by applying design primitives (left), shown as transformations t; certain primitives can be grouped to form design strategies (right), where each design strategy is shown as a distinct dimension of a design space.
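One way to read "design primitives as transformations t" is as functions from workflows to workflows that can be chained from W0 towards Wn. The sketch below assumes a workflow is just a set of (producer, consumer) edges and shows one hypothetical primitive, a step-wise refinement that replaces an actor by a two-actor chain. The representation and the primitive are illustrative assumptions, not the poster's formal model.

```python
from functools import reduce


def refine(actor, first, last):
    """Hypothetical primitive: replace `actor` by the chain first -> last."""
    def t(workflow):
        edges = set()
        for src, dst in workflow:
            # Edges leaving the refined actor now leave `last`;
            # edges entering it now enter `first`.
            new_src = last if src == actor else src
            new_dst = first if dst == actor else dst
            edges.add((new_src, new_dst))
        edges.add((first, last))
        return frozenset(edges)
    return t


def apply_all(workflow, transformations):
    """Apply transformations t1, t2, ... in order, yielding W1, W2, ..."""
    return reduce(lambda w, t: t(w), transformations, workflow)


# Hypothetical actor names.
w0 = frozenset({("Read", "Analyze"), ("Analyze", "Plot")})
w1 = apply_all(w0, [refine("Analyze", "Clean", "Fit")])
```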

Specific Challenges in Scientific Workflow Design: How to support ...
(1) ... scientific workflow design process in general?
(2) ... "smart" discovery of components (out of thousands ...)
(3) ... "smart" linking of data to components (data binding)
(4) ... "smart" linking of components to one another (service composition)
(5) ... overall orchestration semantics
(6) ... propagation of (semantic) type information

Approach: Separation of Concerns in SWF Modeling and Design: Data ports have ...
- a transport type (move data via: object, reference, SRB, GridFTP, scp, ...)
- a structural type (XML DTD-ish)
- a semantic type (OWL-ish)
- a token consumption type (in/out rates of tokens/actor firing)
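The four port concerns listed above can be pictured as independent fields of a port description. The following is a minimal sketch; the class name, field names, and example values are assumptions for illustration, and plain equality stands in for the real compatibility relations (subtyping, subsumption, rate matching).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PortType:
    transport: str            # e.g. "object", "reference", "GridFTP"
    structural: str           # name of a structural schema (hypothetical)
    semantic: Optional[str]   # ontology concept, or None if not annotated
    token_rate: int           # tokens consumed or produced per actor firing


def mismatches(out_port: PortType, in_port: PortType) -> List[str]:
    """List the concerns on which two ports differ (equality used as a stand-in)."""
    fields = ("transport", "structural", "semantic", "token_rate")
    return [f for f in fields if getattr(out_port, f) != getattr(in_port, f)]


# Hypothetical ports that agree on everything except structure.
producer = PortType("reference", "SiteTable", "Measurement", 1)
consumer = PortType("reference", "SiteList", "Measurement", 1)
differing = mismatches(producer, consumer)
```

Because each concern is a separate field, a mismatch in one (here only `structural`) can be addressed without touching the others.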

(1) Design methodology based on an abstract model of SWFs; allows mixes of top-down (stepwise refinement), bottom-up, data-driven, task-driven, structure-driven, semantics-driven ... design
(2) concept/ontology-based actor discovery
(3) semantic annotations of data and actors
(4) use of both structural and semantic types to type-check desired connections and guide suitable pre-/post-actors; introduction of structural and/or semantic adapters ("shims"); basic idea: use logic constraints to express types
(5) employ Ptolemy's Models of Computation/Directors: Process Network, SDF, ...
(6) use query annotations of actors and a procedure similar to the "Chase" (resolution)
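Item (4) can be sketched as a connection check that tests the structural and the semantic type separately and reports which kind of adapter ("shim") a connection would need. Everything concrete here is an assumption: the toy is-a hierarchy, the dictionary encoding of ports, and the rule that the producer must provide every field the consumer expects. The poster expresses types as logic constraints, which this sketch does not attempt.

```python
# Hypothetical toy ontology: concept -> direct parent concept.
IS_A = {"SpeciesAbundance": "Measurement", "Measurement": "Observation"}


def subsumed_by(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it in IS_A."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def structurally_compatible(out_schema, in_schema):
    """The producer must offer every field the consumer expects, with the same type."""
    return all(out_schema.get(name) == typ for name, typ in in_schema.items())


def check_connection(out_port, in_port):
    """Return the adapters a connection would need; an empty list means it type-checks."""
    needed = []
    if not structurally_compatible(out_port["schema"], in_port["schema"]):
        needed.append("structural adapter")
    if not subsumed_by(out_port["concept"], in_port["concept"]):
        needed.append("semantic adapter")
    return needed


# Hypothetical ports.
producer = {"schema": {"site": "string", "count": "int"}, "concept": "SpeciesAbundance"}
consumer_ok = {"schema": {"count": "int"}, "concept": "Measurement"}
consumer_bad = {"schema": {"count": "float"}, "concept": "Taxon"}
```

With these ports, `check_connection(producer, consumer_ok)` returns an empty list, while `check_connection(producer, consumer_bad)` reports that both a structural and a semantic adapter are needed.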

Interplay between structural and semantic type information

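[Figure: design space. Workflows W0, W1, W2, ..., Wm, ..., Wn are linked by transformations t, leading from Workflow Design to Workflow Implementation. Design-strategy dimensions: Top-Down, Bottom-Up, Input Driven, Output Driven, Structure Driven, Semantic Driven, Task Driven, Data Driven.]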