Requirements for caBIG Infrastructure to Support Semantic Workflows

Yolanda Gil ([email protected])

USC Information Sciences Institute

January 10, 2010

Requirements for caBIG Infrastructure to Support Semantic Workflows

Yolanda Gil, PhD
Information Sciences Institute and Department of Computer Science
University of Southern California

[email protected]

http://www.isi.edu/~gil

Outline

Brief background on semantic workflows
• Semantic workflow representations in Wings

Five uses of semantic workflows to assist users and their resulting requirements
• Reproducibility
• Validation
• Metadata generation
• Data discovery
• Workflow discovery

Requirements for architecture components
• Ontology repositories and services
• Data/metadata catalogs and services
• Component/service catalogs and services
• Workflow catalogs and services

Benefits of Semantic Workflows [Gil JSP-09]

Execution management:
• Automation of workflow execution
• Managing distributed computation
• Managing large data sets
• Security and access control
• Provenance recording
• Low-cost high fidelity reproducibility

Semantics and reasoning:
• Workflow retrieval and discovery
• Automation of workflow generation
• Systematic exploration of design space
• Validation of workflows
• Automated generation of metadata
• Guarantees of data pedigree
• “Conceptual” reproducibility

Semantic Workflows in Wings [Kim et al CCPEJ 08; Gil et al IEEE eScience 09; Gil et al K-CAP 09; Kim et al IUI 06; Gil et al IEEE IS 2010]

Workflows augmented with semantic constraints
• Each workflow constituent has a variable associated with it
  – Nodes, links, workflow components, datasets
  – Workflow variables can represent collections of data as well as classes of software components
• Constraints are used to restrict variables, and include:
  – Metadata properties of datasets
  – Constraints across workflow variables
• Incorporate function of workflow components: how data is transformed

Reasoning about semantic constraints in a workflow
• Algorithms for semantic enrichment of workflow templates
• Algorithms for matching queries against workflow catalogs
• Algorithms for generating workflows from high-level user requests
• Algorithms for generating metadata of new data products
• Algorithms for assisting users with the creation of valid workflow templates

Semantic Workflows in WINGS

Workflow templates
• Dataflow diagram
• Each constituent (node, link, component, dataset) has a corresponding variable

Semantic properties
• Constraints on workflow variables

(TestData dcdom:isDiscrete false)
(TrainingData dcdom:isDiscrete false)
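
The two constraints above read as (variable, property, value) triples on workflow variables. A minimal Python sketch of how such triples might be stored and looked up follows; the `Constraint` tuple and the `properties_of` helper are illustrative names and not part of Wings.

```python
from typing import NamedTuple

class Constraint(NamedTuple):
    """One semantic constraint: (variable, property, value)."""
    variable: str
    prop: str
    value: object

# The two constraints shown on the slide.
constraints = [
    Constraint("TestData", "dcdom:isDiscrete", False),
    Constraint("TrainingData", "dcdom:isDiscrete", False),
]

def properties_of(variable, constraints):
    """Collect the property/value pairs that restrict one workflow variable."""
    return {c.prop: c.value for c in constraints if c.variable == variable}

print(properties_of("TestData", constraints))
```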

Semantic Constraints as Metadata Properties

Constraints on reusable template (shown below)

Constraints on current user request (shown above)

[modelerInput_not_equal_to_classifierInput:
  (:modelerInput wflow:hasDataBinding ?ds1)
  (:classifierInput wflow:hasDataBinding ?ds2)
  equal(?ds1, ?ds2)
  (?t rdf:type wflow:WorkflowTemplate)
  -> (?t wflow:isInvalid "true"^^xsd:boolean)]
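
Read procedurally, the rule marks a template as invalid when the modeler input and the classifier input are bound to the same dataset. The Python sketch below mirrors that check under the assumption that bindings are held in a plain dictionary; the function name and the dataset identifiers are hypothetical.

```python
def template_is_invalid(bindings):
    """Mirror the rule: invalid when modelerInput and classifierInput share a dataset.

    `bindings` maps a workflow variable name to the dataset bound to it.
    """
    ds1 = bindings.get("modelerInput")
    ds2 = bindings.get("classifierInput")
    # The rule only fires when both variables are bound and the bindings are equal.
    return ds1 is not None and ds2 is not None and ds1 == ds2

same = {"modelerInput": "dataset-A", "classifierInput": "dataset-A"}
different = {"modelerInput": "dataset-A", "classifierInput": "dataset-B"}
print(template_is_invalid(same), template_is_invalid(different))
```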

Uses of Semantic Workflows: 1) Easily Replicate Previously Published Results

A catalog of carefully crafted workflows of select state-of-the-art methods to cover a wide range of common analyses
• Many implementations of same algorithm, some proprietary
• Same implementation but new versions and bug fixes

With such a catalog, the effort involved in reproducing results is greatly reduced

Semantics needed to assist users to use workflows correctly

Resulting Requirements (1)

Semantic representations of workflows need to abstract from software implementation
• Representing abstract classes of software components
  – Instances are the implemented codes
  – Workflow steps refer to component classes
• Representing abstract kinds of data (e.g., exclude format)

Semantic reasoning needed to specialize workflow
• To map the abstract workflow into an execution-ready workflow
• To insert lower level steps (e.g., data transformations)
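
The specialization step can be sketched as follows. This is a simplified illustration, not the Wings algorithm: the catalog contents, component names, and formats are all hypothetical, and the sketch simply picks the first implementation of each component class.

```python
# Hypothetical component catalog: abstract class -> implemented codes.
IMPLEMENTATIONS = {
    "Classifier": [
        {"name": "TreeClassifierV2", "input_format": "arff"},
        {"name": "BayesClassifierV1", "input_format": "arff"},
    ],
}
# Hypothetical conversion steps, keyed by (from_format, to_format).
CONVERTERS = {("csv", "arff"): "CsvToArff"}

def specialize(abstract_steps, data_format):
    """Expand abstract steps into executable ones, adding conversions as needed."""
    concrete = []
    for component_class in abstract_steps:
        impl = IMPLEMENTATIONS[component_class][0]  # naive choice: first implementation
        needed = impl["input_format"]
        if data_format != needed:
            # Insert a lower-level transformation step; KeyError if none is known.
            concrete.append(CONVERTERS[(data_format, needed)])
            data_format = needed
        concrete.append(impl["name"])
    return concrete

print(specialize(["Classifier"], "csv"))
```

Keeping the data format out of the abstract workflow is what lets the same template run on differently formatted inputs.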

Uses of Semantic Workflows: 2) Ensure Correct Use of State-of-the-Art Methods

Analytic software and methods are well documented but all is text (papers, manuals, etc.)
• Time consuming, hard to spot interdependencies, no validation

Semantics needed to guide users to set up workflows correctly and customize them to their datasets and goals

Requirements (2)

Semantic workflows can check constraints and guide users
• Representing requirements of software components
  – Constraints on input data
  – Constraints on parameter settings given properties of input data
• Representing metadata properties of datasets

Semantic reasoning needed:
• To check constraints of each workflow step
• To propagate constraints across the workflow
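
One way to picture constraint propagation is to walk a linear workflow backward and collect what its input data must satisfy. The sketch below assumes a simple model in which each step declares the properties it requires and the properties it sets; the step names and property names are hypothetical.

```python
# Hypothetical linear workflow, listed from first step to last.
# "requires": properties the step's input must have.
# "sets": properties the step guarantees on its output; others pass through.
STEPS = [
    {"name": "Normalize", "requires": {"isDiscrete": False},
     "sets": {"isNormalized": True}},
    {"name": "Modeler", "requires": {"isNormalized": True, "hasMissingValues": False},
     "sets": {}},
]

def propagate_backward(steps):
    """Return the constraints that the workflow's input data must satisfy."""
    required = {}
    for step in reversed(steps):
        for prop, value in step["sets"].items():
            # A downstream requirement on a property this step sets is
            # either discharged here or can never be met.
            if prop in required and required[prop] != value:
                raise ValueError(f"{step['name']} cannot satisfy {prop}={required[prop]}")
            required.pop(prop, None)
        for prop, value in step["requires"].items():
            if prop in required and required[prop] != value:
                raise ValueError(f"conflicting requirements on {prop}")
            required[prop] = value
    return required

def check(metadata, required):
    """List the properties of a candidate dataset that violate the constraints."""
    return [p for p, v in required.items() if metadata.get(p) != v]

required = propagate_backward(STEPS)
print(required)
print(check({"isDiscrete": False, "hasMissingValues": True}, required))
```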

Uses of Semantic Workflows: 3) Automatic Generation of Metadata

Metadata annotations are tedious and involved
• Often not done, an obstacle to sharing and to reuse

Semantic workflows can automate the generation of metadata for analysis data products

Requirements (3)

Semantic representations needed:
• Representing expected characteristics of output dataset for each software component given the input metadata
• Representing metadata properties of input datasets

Semantic reasoning needed:
• To propagate metadata for each workflow step
• To propagate metadata across the workflow
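
Forward metadata propagation can be sketched by giving each component a rule that maps input metadata to output metadata and applying the rules in workflow order. The two rules and all property names below are hypothetical examples chosen only to show the mechanism.

```python
def sample_rule(meta):
    """Hypothetical rule for a sampling component: halves the instance count."""
    out = dict(meta)
    out["numInstances"] = meta["numInstances"] // 2
    return out

def discretize_rule(meta):
    """Hypothetical rule for a discretizer: output is discrete, the rest is inherited."""
    return {**meta, "isDiscrete": True}

def propagate_metadata(input_meta, rules):
    """Apply each step's rule in order; return the metadata of every data product."""
    products = [input_meta]
    for rule in rules:
        products.append(rule(products[-1]))
    return products

source = {"domain": "genotypes", "isDiscrete": False, "numInstances": 1000}
for meta in propagate_metadata(source, [sample_rule, discretize_rule]):
    print(meta)
```

Every intermediate and final data product ends up described without manual annotation.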

Uses of Semantic Workflows: 4) Discovery of Relevant Data

Need a dataset of updated common (known) loci to annotate findings, where can I find one?

Workflows reused from a catalog may require additional data besides what is provided by the user

Semantic workflows can help identify characteristics of required datasets and query data catalogs to find them for the user

Requirements (4)

Semantic representations needed:
• Metadata properties of any additional input datasets in the workflow, including:
  – Default properties for the given workflow
  – Augmented properties that result from the specific input data provided by the user

Semantic reasoning needed:
• Propagation of semantic constraints through the workflow
• Formulation of queries to data catalogs based on semantic properties required of datasets in the workflow
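
A data catalog query can be formed by merging the workflow's default properties with those derived from the user's own data. The sketch below uses an in-memory list as a stand-in for a real catalog service; the entries, identifiers, and property names are hypothetical.

```python
def formulate_query(default_props, augmented_props):
    """Combine workflow defaults with constraints derived from the user's data."""
    query = dict(default_props)
    query.update(augmented_props)  # user-derived constraints refine the defaults
    return query

def search_catalog(catalog, query):
    """Return ids of datasets whose metadata satisfies every queried property."""
    return [
        entry["id"]
        for entry in catalog
        if all(entry["metadata"].get(p) == v for p, v in query.items())
    ]

# Hypothetical catalog entries and property names.
CATALOG = [
    {"id": "loci-old", "metadata": {"type": "KnownLoci", "genomeBuild": "36"}},
    {"id": "loci-new", "metadata": {"type": "KnownLoci", "genomeBuild": "37"}},
]
query = formulate_query({"type": "KnownLoci"}, {"genomeBuild": "37"})
print(search_catalog(CATALOG, query))
```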

Uses of Semantic Workflows: 5) Retrieval of Workflows

Hard to find workflows for the type of analysis a user wants
• Semantic information is not provided when creating the workflow
• However, retrieval queries are often based on metadata properties of data
  – e.g., “Find workflows that can normalize data which is continuous and has missing values [<- constraints on inputs] to create a decision tree model [constraint on intermediate data products]”

Semantic workflows needed to augment user-provided workflows with semantic constraints from metadata catalogs and component catalogs

Requirements (5)

Semantic representations are needed:
• For workflow constituents
  – Metadata properties of input, intermediate and final data products
  – Metadata properties of workflow and component function
• For user queries
  – Express workflow sketches containing partial data descriptions (constraints)

Reasoning capabilities
• Automatic creation of metadata for expected workflow data products
• Workflow matching to queries (exact and partial)
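
Exact and partial matching can be illustrated by scoring each catalog workflow by the fraction of query constraints it satisfies. This is a deliberately simple set-overlap sketch, not the matching algorithm used in Wings; the workflow names and constraint triples are hypothetical.

```python
def match_score(query, workflow):
    """Fraction of query constraints (as triples) that the workflow satisfies."""
    if not query:
        return 1.0
    return len(query & workflow["constraints"]) / len(query)

def find_workflows(query, catalog, threshold=1.0):
    """Rank workflows scoring at least `threshold`; 1.0 means exact matches only."""
    scored = [(match_score(query, wf), wf["name"]) for wf in catalog]
    return sorted([s for s in scored if s[0] >= threshold], reverse=True)

# Hypothetical catalog; constraints are (variable, property, value) triples.
WORKFLOWS = [
    {"name": "NormalizeThenTree", "constraints": {
        ("input", "isContinuous", True), ("input", "hasMissingValues", True),
        ("model", "type", "DecisionTree")}},
    {"name": "NormalizeThenBayes", "constraints": {
        ("input", "isContinuous", True), ("input", "hasMissingValues", True),
        ("model", "type", "NaiveBayes")}},
]
QUERY = {("input", "isContinuous", True), ("input", "hasMissingValues", True),
         ("model", "type", "DecisionTree")}

print(find_workflows(QUERY, WORKFLOWS))        # exact matches only
print(find_workflows(QUERY, WORKFLOWS, 0.5))   # partial matches allowed
```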

Requirements on Core Ontology Repositories and Services

Component/service ontologies
• Extend with semantic representations that support reasoning, not just their execution

Workflow ontologies
• Develop workflow ontologies that enable shared workflow repositories
• Develop semantic layer for the workflow ontologies
  – Workflow steps must be able to represent component classes
  – Support reasoning about workflows in all architecture components

Requirements on Data/Metadata Catalogs and Services

Representing abstract kinds of data (e.g., exclude format)

Representing metadata properties that are relevant to data analysis
• E.g., the organization that contributed the data may be less relevant than the instrument used to collect it, its calibration, its quality and accuracy, etc.

Requirements on Component/Service Catalogs and Services

Represent abstract classes of software components
• Instances correspond to implemented codes/services

Represent constraints on input data
• Metadata properties that make the component appropriate for a given input dataset

Represent constraints on output data
• Metadata properties of expected input datasets given the required outcome of the execution of the component

Represent constraints on parameter values
• Constraints on parameter settings given properties of input or output data

Represent how metadata properties of inputs are related to metadata of outputs
• Metadata properties of output datasets given the properties of the input datasets
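
A catalog entry that carries these kinds of information might look like the sketch below. The `ComponentClass` structure, its field names, and the example discretizer (including its parameter rule) are assumptions made for illustration and do not describe an existing caBIG or Wings schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ComponentClass:
    """Hypothetical catalog entry for an abstract class of software components."""
    name: str
    implementations: List[str]               # implemented codes/services
    input_constraints: Dict[str, object]     # required input metadata
    parameter_rule: Callable[[dict], dict]   # input metadata -> parameter settings
    output_rule: Callable[[dict], dict]      # input metadata -> output metadata

discretizer = ComponentClass(
    name="Discretizer",
    implementations=["EqualWidthDiscretizer"],
    input_constraints={"isDiscrete": False},
    # Illustrative rule: cap the number of bins by the number of instances.
    parameter_rule=lambda meta: {"maxBins": min(10, meta["numInstances"])},
    output_rule=lambda meta: {**meta, "isDiscrete": True},
)

meta = {"isDiscrete": False, "numInstances": 4}
applicable = all(meta.get(p) == v for p, v in discretizer.input_constraints.items())
print(applicable, discretizer.parameter_rule(meta), discretizer.output_rule(meta))
```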

Requirements on Workflow Catalogs and Services

Semantic reasoning to specialize workflows
• Given user requirements and a high-level workflow, automatically generate valid execution-ready workflows
• Automatically insert lower level steps when needed (e.g., data format conversions)

Semantic reasoning to propagate constraints of each workflow step
• Check constraints of each workflow step and propagate them throughout the workflow
• Incorporate constraints coming from the user’s requirements with constraints from the individual steps of the workflow

Formulation of data catalog queries based on the metadata properties of a given dataset in the workflow

Workflow discovery and matching for a given user query
• Need a language to express user queries as workflow sketches containing partial data descriptions (constraints) and partial dataflow patterns
• Need semantic reasoning for matching such queries, both exact and partial matching