Requirements for caBIG Infrastructure to Support Semantic Workflows

Post on 02-Jan-2016
Requirements for caBIG Infrastructure to Support Semantic Workflows. Yolanda Gil, PhD, Information Sciences Institute and Department of Computer Science, University of Southern California. gil@isi.edu, http://www.isi.edu/~gil

Outline
Brief background on semantic workflows
Semantic workflow representations in Wings
Five uses of semantic workflows to assist users and their resulting requirements
Reproducibility
Validation
Execution management:
Validation of workflows
January 10, 2010
Semantic Workflows in Wings [Kim et al CCPEJ 08; Gil et al IEEE eScience 09; Gil et al K-CAP 09; Kim et al IUI 06; Gil et al IEEE IS 2010]
Workflows augmented with semantic constraints
Each workflow constituent has a variable associated with it
Nodes, links, workflow components, datasets
Workflow variables can represent collections of data as well as classes of software components
Constraints are used to restrict variables, and include:
Metadata properties of datasets
Constraints across workflow variables
Reasoning about semantic constraints in a workflow
Algorithms for semantic enrichment of workflow templates
Algorithms for matching queries against workflow catalogs
Algorithms for generating workflows from high-level user requests
Algorithms for generating metadata of new data products
Algorithms for assisting users with the creation of valid workflow templates
*
Each constituent (node, link, component, dataset) has a corresponding variable
Semantic properties
Constraints on reusable template (shown below)
Constraints on current user request (shown above)
[modelerInput_not_equal_to_classifierInput:
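A constraint across workflow variables, such as the truncated rule above whose name suggests that the modeler's input must differ from the classifier's input, could be checked roughly as follows. This is a minimal sketch and not Wings rule syntax; the `not_equal` and `violated` helpers, the `bindings` dictionary, and the file names are all hypothetical.

```python
# Hypothetical sketch: checking a cross-variable constraint on a workflow request.
# Helper names, bindings, and file names are illustrative; they do not come from Wings.

def not_equal(var_a, var_b):
    """Build a constraint requiring two workflow variables to be bound to different datasets."""
    def check(bindings):
        return bindings[var_a] != bindings[var_b]
    return check

def violated(constraints, bindings):
    """Return the names of all constraints that the given bindings fail."""
    return [name for name, check in constraints.items() if not check(bindings)]

constraints = {
    "modelerInput_not_equal_to_classifierInput":
        not_equal("modelerInput", "classifierInput"),
}

# A request that trains and tests on different data satisfies the constraint;
# one that reuses the training data for classification does not.
good = {"modelerInput": "train.csv", "classifierInput": "test.csv"}
bad = {"modelerInput": "train.csv", "classifierInput": "train.csv"}
```

Keeping each constraint as a named check makes it possible to report to the user which rule a request violates, rather than only that the request is invalid.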
1) Easily Replicate Previously Published Results
A catalog of carefully crafted workflows of select state-of-the-art methods to cover a wide range of common analyses
Many implementations of same algorithm, some proprietary
Same implementation but new versions and bug fixes
With such a catalog, the effort involved in reproducing results is greatly reduced
Semantics needed to assist users to use workflows correctly
*
Representing abstract classes of software components
Instances are the implemented codes
Workflow steps refer to component classes
Representing abstract kinds of data (e.g., excluding format)
Semantic reasoning needed to specialize workflow
To map the abstract workflow into an execution-ready workflow
To insert lower-level steps (e.g., data transformations)
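The specialization step described above can be sketched as follows, assuming a linear workflow and a toy catalog. The component classes, implementation names, formats, and the `specialize` function are hypothetical and only illustrate mapping component classes to instances and inserting a conversion step.

```python
# Hypothetical sketch: specializing an abstract workflow into an execution-ready one.
# Component classes, implementations, and formats are made up for illustration.

# Component class -> implemented codes (instances), each with the input format it accepts.
CATALOG = {
    "Normalizer": [{"name": "normalize_v2", "accepts": "csv"}],
    "DecisionTreeModeler": [{"name": "id3_modeler", "accepts": "arff"}],
}

# Available data transformations, keyed by (source format, target format).
CONVERTERS = {("csv", "arff"): "csv_to_arff"}

def specialize(abstract_steps, input_format):
    """Bind each component class to an instance, inserting format conversions where needed.

    Simplifying assumption: each step outputs data in the format it accepts.
    """
    plan, fmt = [], input_format
    for component_class in abstract_steps:
        for impl in CATALOG[component_class]:
            if impl["accepts"] == fmt:
                break  # usable as is
            if (fmt, impl["accepts"]) in CONVERTERS:
                # Insert a lower-level transformation step before the component.
                plan.append(CONVERTERS[(fmt, impl["accepts"])])
                fmt = impl["accepts"]
                break
        else:
            raise ValueError(f"no usable implementation for {component_class}")
        plan.append(impl["name"])
    return plan

plan = specialize(["Normalizer", "DecisionTreeModeler"], "csv")
```

The abstract workflow mentions only component classes and no formats; the conversion step appears only in the execution-ready plan.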
*
2) Ensure Correct Use of State-of-the-Art Methods
Analytic software and methods are well documented, but only as text (papers, manuals, etc.)
Reading it is time-consuming, interdependencies are hard to spot, and nothing is validated
*
Representing requirements of software components
Constraints on input data
Representing metadata properties of datasets
Semantic reasoning needed:
To propagate constraints across the workflow
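One way to picture constraint propagation is to push each component's input requirements backward through a linear workflow until they become requirements on the user's input data. The sketch below assumes each step declares `requires` (properties its input must have) and `ensures` (properties it sets on its output); these field names, the component names, and the properties are hypothetical.

```python
# Hypothetical sketch: backward propagation of input constraints in a linear workflow.
# Field names ("requires", "ensures") and the example components are assumptions.

def required_of_input(steps):
    """Return the metadata properties the workflow's input dataset must satisfy."""
    needed = {}
    for step in reversed(steps):
        upstream = {}
        for prop, value in needed.items():
            if prop in step["ensures"]:
                # The step sets this property itself; it must agree with what is needed later.
                if step["ensures"][prop] != value:
                    raise ValueError(f"{step['name']} cannot provide {prop}={value}")
            else:
                upstream[prop] = value  # passes through the step unchanged
        for prop, value in step["requires"].items():
            if upstream.get(prop, value) != value:
                raise ValueError(f"conflicting requirement on {prop} at {step['name']}")
            upstream[prop] = value
        needed = upstream
    return needed

def is_valid_input(metadata, steps):
    """Check a dataset's metadata against the propagated constraints."""
    return all(metadata.get(p) == v for p, v in required_of_input(steps).items())

steps = [
    {"name": "Discretize", "requires": {"has_missing_values": False},
     "ensures": {"is_discrete": True}},
    {"name": "ID3Modeler", "requires": {"is_discrete": True, "has_missing_values": False},
     "ensures": {}},
]
```

Here the modeler's need for discrete data is satisfied by the discretization step, so only the missing-values constraint reaches the workflow input.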
*
Metadata annotations are tedious and involved
Often not done, an obstacle to sharing and to reuse
*
Representing metadata properties of input datasets
Semantic reasoning needed:
To propagate metadata across the workflow
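Forward propagation of metadata can be sketched in a similar spirit: each component carries a rule that derives the metadata of its output from the metadata of its input, so every new data product is annotated without manual effort. The rules, property names, and the `propagate` function below are hypothetical.

```python
# Hypothetical sketch: generating metadata for new data products by forward propagation.
# The component rules and property names are assumptions made for illustration.

def sample_rule(meta, fraction=0.5):
    """A sampler keeps every property except the number of rows."""
    return {**meta, "num_rows": int(meta["num_rows"] * fraction)}

def normalize_rule(meta):
    """A normalizer marks its output as normalized and leaves the rest untouched."""
    return {**meta, "is_normalized": True}

def propagate(input_meta, steps):
    """Apply each step's rule in order and record the metadata of every data product."""
    produced, meta = [], dict(input_meta)
    for name, rule in steps:
        meta = rule(meta)
        produced.append((name, meta))
    return produced

input_meta = {"domain": "gene-expression", "num_rows": 1000, "is_normalized": False}
products = propagate(input_meta, [("Sampler", sample_rule), ("Normalizer", normalize_rule)])
```

The final data product inherits the domain of the original dataset and picks up the properties changed along the way.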
*
4) Discovery of Relevant Data
Workflows reused from a catalog may require additional data besides what is provided by the user
Semantic workflows can help identify characteristics of required datasets and query data catalogs to find them for the user
Need a dataset of
*
Semantic representations needed:
Metadata properties of any additional input datasets in the workflow, including:
Default properties for the given workflow
Augmented properties that result from the specific input data provided by the user
Semantic reasoning needed:
Propagation of semantic constraints through the workflow
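Once the properties of an additional input dataset are known, a data catalog query can be formed from them. The sketch below combines default properties for the workflow with properties carried over from the user's own data; the catalog entries, property names, and both functions are hypothetical.

```python
# Hypothetical sketch: forming a data catalog query for an additional workflow input.
# Catalog contents, property names, and function names are illustrative assumptions.

def build_query(default_props, user_meta, shared_keys):
    """Combine the workflow's default properties with those implied by the user's data."""
    query = dict(default_props)
    for key in shared_keys:
        query[key] = user_meta[key]  # augmented property, e.g. same domain as user data
    return query

def find_datasets(catalog, query):
    """Return the ids of catalog entries whose metadata satisfies every queried property."""
    return [d["id"] for d in catalog
            if all(d.get(k) == v for k, v in query.items())]

catalog = [
    {"id": "ds1", "domain": "weather", "has_labels": True},
    {"id": "ds2", "domain": "weather", "has_labels": False},
    {"id": "ds3", "domain": "finance", "has_labels": True},
]

user_meta = {"domain": "weather", "is_discrete": True}
query = build_query({"has_labels": True}, user_meta, shared_keys=["domain"])
matches = find_datasets(catalog, query)
```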
*
Uses of Semantic Workflows:
5) Retrieval of Workflows
Hard to find workflows for the type of analysis a user wants
Semantic information is not provided when creating the workflow
However, retrieval queries are often based on metadata properties of data
e.g., “Find workflows that can normalize data which is continuous and has missing values [<- constraints on inputs] to create a decision tree model [constraint on intermediate data products]”
Semantic workflows needed to augment user-provided workflows with semantic constraints from metadata catalogs and component catalogs
*
Metadata properties of workflow and component function
For user queries
Reasoning capabilities
Workflow matching to queries (exact and partial)
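Exact and partial matching of workflows to a query could look like the sketch below, which scores each cataloged workflow by the fraction of query constraints it satisfies. The query mirrors the example above (continuous data with missing values, producing a decision tree model); the workflow names, annotations, and scoring scheme are assumptions.

```python
# Hypothetical sketch: retrieving workflows by constraints on inputs and data products.
# Workflow names, annotations, and the scoring scheme are illustrative assumptions.

def match_score(query, workflow):
    """Fraction of the query's constraints that the workflow's annotations satisfy."""
    total = len(query["input"]) + len(query["products"])
    hits = sum(workflow["input"].get(k) == v for k, v in query["input"].items())
    hits += len(query["products"] & workflow["products"])
    return hits / total

def retrieve(query, catalog):
    """Rank workflows with any match, labeling each as an exact or partial match."""
    scored = sorted(((match_score(query, wf), name) for name, wf in catalog.items()),
                    reverse=True)
    return [(name, "exact" if score == 1.0 else "partial")
            for score, name in scored if score > 0]

catalog = {
    "normalize_then_tree": {
        "input": {"is_continuous": True, "has_missing_values": True},
        "products": {"NormalizedData", "DecisionTreeModel"},
    },
    "tree_only": {
        "input": {"is_continuous": True, "has_missing_values": False},
        "products": {"DecisionTreeModel"},
    },
    "clustering": {"input": {"is_discrete": True}, "products": {"Clusters"}},
}

query = {
    "input": {"is_continuous": True, "has_missing_values": True},
    "products": {"DecisionTreeModel"},
}
results = retrieve(query, catalog)
```

Matching is only possible because each cataloged workflow carries semantic annotations on its inputs and products, which is the information users rarely supply by hand.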
*
Component/service ontologies
Extend with semantic representations that support reasoning, not just their execution
Workflow ontologies
Develop semantic layer for the workflow ontologies
Workflow steps must be able to represent component classes
Support reasoning about workflows in all architecture components
*
Representing metadata properties that are relevant to data analysis
*
Instances correspond to implemented codes/services
Represent constraints on input data
Metadata properties that make the component appropriate for a given input dataset
Represent constraints on output data
Metadata properties of expected input datasets given the required outcome of the execution of the component
Represent constraints on parameter values
Constraints on parameter settings given properties of input or output data
Represent how metadata properties of inputs are related to metadata properties of outputs
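The component catalog requirements listed above could be captured in a record like the one below. This is a minimal sketch rather than a caBIG or Wings schema; the `ComponentClass` fields, the sampler example, and the 1000-row cap are all hypothetical.

```python
# Hypothetical sketch: a component class entry in a semantic component catalog.
# Field names and the example component are assumptions, not an existing schema.
from dataclasses import dataclass
from typing import Callable, Dict, List

Metadata = Dict[str, object]

@dataclass
class ComponentClass:
    name: str
    instances: List[str]                                      # implemented codes/services
    input_constraints: Metadata                               # properties the input must have
    parameter_rule: Callable[[Metadata], Dict[str, object]]   # parameters from input metadata
    output_rule: Callable[[Metadata], Metadata]               # output metadata from input metadata

    def accepts(self, meta: Metadata) -> bool:
        """True if the dataset metadata satisfies every input constraint."""
        return all(meta.get(k) == v for k, v in self.input_constraints.items())

sampler = ComponentClass(
    name="Sampler",
    instances=["random_sampler_v1"],
    input_constraints={"is_tabular": True},
    # Illustrative rule: sample at most 1000 rows, fewer if the dataset is smaller.
    parameter_rule=lambda m: {"sample_size": min(1000, m["num_rows"])},
    output_rule=lambda m: {**m, "num_rows": min(1000, m["num_rows"])},
)

meta = {"is_tabular": True, "num_rows": 250}
params = sampler.parameter_rule(meta)
out_meta = sampler.output_rule(meta)
```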
*
Given user requirements and a high-level workflow, automatically generate valid execution-ready workflows
Automatically insert lower-level steps when needed (e.g., data format conversions)
Semantic reasoning to propagate constraints of each workflow step
Check constraints of each workflow step and propagate them throughout the workflow
Incorporate constraints coming from the user’s requirements with constraints from the individual steps of the workflow
Formulation of data catalog queries based on the metadata properties of a given dataset in the workflow
Workflow discovery and matching for a given user query
Need a language to express user queries as workflow sketches containing partial data descriptions (constraints) and partial dataflow patterns
