Post on 23-Jan-2016


Page 1: Scientific workflow management in the VL-e framework Sub-program 2.5 Department of Computer Science Universiteit van Amsterdam

Scientific workflow management in the VL-e framework

Sub-program 2.5 Department of Computer Science

Universiteit van Amsterdam

Page 2

Outline

• Background
– Scientific experiments, workflow and e-Science framework

• Workflow management in the VL-e framework
– The approach followed
– Review of the related work
– Application use cases and workflow support

• Future work

Page 3

Scientific experiments & e-Science

Step 1: designing an experiment
Step 2: performing the experiment
Step 3: analyzing the experiment results
success

Complex experiments:
• have complex processes
• require interdisciplinary expertise
• require large-scale resources

Grid & high-level support

Scientific workflows

Page 4

Scientific Workflow Management Systems in an e-Science environment

• Functionalities:
– Automating experiment routines;
– Rapid prototyping of experimental computing systems;
– Hiding integration details between resources;
– Managing the experiment lifecycle.

• Spans different layers of middleware for managing:
– Data;
– Computing;
– Information;
– Knowledge.

[Figure: e-Science framework. Labels: Domain specific applications; Workflow management system (SWMS: high level workflow services, engine, user support); Generic Grid middleware (data management, computing tasks, information, knowledge); Grid infrastructure.]

In the VL-e project the targeted e-science framework is …

Page 5

VL-e workflow wish list

• Classified in 4 categories:
– Functionality and capability
– User interface characteristics
– Run-time capabilities
– Software engineering aspects

VL-e SIG Workflow meeting, Jan 11th, 2005, 10:00–11:30, H220 (NIKHEF building)
• Present: Belleman, Belloum, Bouwhuis, Breanndán, Kaletas, Konijnenburg, Marshall, Rauwerda, Sterk, Sluiter, Terpstra, Vasunin, Wibisono, Yakali.

• A list of 36 points was established to characterise the ideal workflow for VL-e.

Page 6

Prioritize the workflow requirements based on the VL-e Applications

• Classified in 4 categories:
– Application domain model;
– Engineering;
– Underlying middleware;
– Workflow management system:
  • Composition / Engine (runtime issues) / User support

• A list of 12 points was established to characterise the practical workflow for VL-e.

VL-e sub-program 2.5 in collaboration with SP1.X developers
• SP1.X contributors: Belleman, Klous, Konijnenburg, Marshall, Rauwerda, Sluiter, Terpstra

Page 7

Application use cases and workflow requirements

Application use cases
– Different rounds: a series of meetings
– Distinguish workflow requirements

Summary
– From the resource perspective:
  • To support legacy tools;
  • To support standard middleware, e.g. web/grid services;
  • To be able to invoke resources from different systems;
  • To provide a rich library of workflow components.

– From the application process perspective:
  • To efficiently manage parallel processes/tasks in an experiment (job farming);
  • To efficiently explore large parameter spaces (parameter sweep);
  • To support knowledge-based information processing (semantic-level data integration).

– From the perspective of using a SWMS:
  • To provide a friendly user interface (preferably a GUI);
  • To support the development of new workflow components (Java, scripts, C++, documentation and support);
  • To be able to execute tasks on distributed resources (clusters or Grid);
  • To be stable at runtime;
  • To be able to interoperate with different workflow management systems.
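The two application-process requirements above, parameter sweep and job farming, can be illustrated with a minimal sketch. It is not taken from VLAM-G or any VL-e system: `run_experiment`, the parameter names, and the local thread pool standing in for cluster or Grid workers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_experiment(params):
    """Hypothetical stand-in for one experiment task (one 'job')."""
    return {"params": params, "score": params["temperature"] * params["pressure"]}


def parameter_sweep(grid):
    """Expand a dict of value lists into every parameter combination."""
    names = sorted(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def farm_jobs(task, jobs, max_workers=4):
    """Run independent jobs on a pool of workers; results keep the job order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, jobs))


# Assumed parameter grid: 3 temperatures x 2 pressures = 6 independent jobs.
grid = {"temperature": [280, 300, 320], "pressure": [1, 2]}
jobs = parameter_sweep(grid)
results = farm_jobs(run_experiment, jobs)
best = max(results, key=lambda r: r["score"])
```

The sweep only generates the job list; the farm only distributes independent jobs. Keeping the two apart means the same sweep could, in principle, be handed to a different execution back end.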

Page 8

Workflow management in VL-e

• First prototype
– VLAM-G
– Shortcomings (GUI, control flow, monitoring etc. + software engineering)

• Approach
– Collect and analyze application use cases
– Review the state of the art of workflow systems
– Propose workflow systems for the PoC environment
– Be active in use case projects
– Learn lessons from use cases
– Propose a new design

Against the list of 36 items established to characterize the ideal workflow for VL-e, VLAM-G scored: 13 yes; 5 yes, but in need of reimplementation; 9 no; 2 partially supported; 6 in progress or planned.

Page 9

Survey of existing workflow systems

http://staff.science.uva.nl/~gvlam//doc/P2/WorkflowSurvey

Participants: Belloum, De Boer, Guevara-Masis, Korkhov, Mirzadeh, Terpstra, van Hooft, Vasunin, Wibisono, Yakali, Zhao.

Page 10

Survey results

• Based on the survey and the practical tests on the nine workflow systems, we learn:

– All of the systems are still beta versions (some even alpha) and tend to crash in relatively complex tests.

– None of the systems supports collaboration, data sharing, or information management.

– None of the systems enforces best practice or provides support for knowledge capture.

– Most of the systems are not geared to Grid-based systems; they were built to work on a single system, with some features for submitting jobs to a remote host (the user is still exposed to some Grid-related issues, such as writing RSLs).

– We ran into problems when testing some of the features described in the documentation.

http://staff.science.uva.nl/~gvlam//doc/P2/SWMSRecommendationReport.pdf

Participants: Belloum, De Boer, Korkhov, Terpstra, van Hooft, Vasunin, Wibisono, Zhao.

Page 11

Recommendation for PoC R1 (part of the short-term solution)

http://staff.science.uva.nl/~gvlam//doc/P2/SWMSRecommendationReport.pdf

Participants: Belloum, De Boer, Korkhov, Terpstra, van Hooft, Vasunin, Wibisono, Zhao.

Page 12

Use cases and small project teams

• Use case project teams
– Participants from SPs from P1, P2, P3 and P4.
– Contributions from the workflow team: distinguish reusable components and provide an integration solution.
– We are also active in project management, such as decomposing the implementation into concrete tasks and tracking progress.

• Inside SP2.5, group members are divided over the use cases:
– SP1.2 Belloum & Korkhov
– SP1.3 Belloum & De Boer
– SP1.4 Zhao & Vasunin
– SP1.5 Zhao & Wibisono
– SP1.6 Belloum & Paul & De Boer

Page 13

Collaboration with VL-e Applications

• SP1.2 – AID-Food informatics-IvI
– WCFS case: searching in “Research Management System” (selected by the VLeIT) (ongoing …)

• SP1.3 – AMC-IvI
– High-volume data management in the PoC SRB (selected by the VLeIT) (ongoing …)

• SP1.4 – IBED-IvI
– Run KansK toolbox in workflow environment (Master thesis project) (ongoing …)

Page 14

Collaboration with VL-e Applications

• SP1.5 – IBU-IvI
– Histone code: semantic data integration (selected by VLeIT) (ongoing …)
– Running R scripts on multiple nodes using web service (finished)
– Running R scripts in workflows (ongoing …)
– Ridge-O-grammer (ongoing …)

• SP1.6 – AMOLF-IvI
– SRB metadata update from file header (selected by VLeIT) (ongoing …)

Page 15

SP1.2: WCFS case: searching in “Research Management System”

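[Figure: AID tools and the research process. AID tools labels: documents, indexer, index, config, searcher, query formulation, question, list, ontology repositories, interface. Research process labels: Situation / Problem; Research question; Literature (Lit. report); Lab. exp. (In: sample, Out: data); Analysis (In: data, Out: data); Answer / conclusion.]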

• Much data in scientific research

• But:
– No reuse: data not available across projects
– No context: meaning of data not known
– Experiments not reproducible
– Only successful experiments traceable

• Wish:
– Research Management System: manage experimental data for WCFS researchers

Page 16

SP1.3: High-volume data management in the PoC SRB

• The goal of the use case is to:
– Facilitate data management and analysis for functional MRI studies by using PoC resources for computation and data management:
  • Matrix cluster
  • SRB

• An fMRI pilot will be developed as a first step.

Page 17

SP1.4: Run KansK toolbox in Workflow environment

• To be integrated in a workflow system: VLAM

• The toolbox's main processes deal with data preparation, evaluation, prediction, and display

• The workflow predicts the location of birds

Page 18

SP1.5: Histone code - semantic data integration

[Figure: semantic data integration steps]

• Model Alignment / Model Extension
• Data Acquisition (e.g. DB connection, API, screen scraper)
• Map (e.g. Table -> RDF + model)
– Flat map to RDF
– RDF to structured RDF
– Assign LSIDs
• Scaling problems
– Sesame
– Jena
• Data Import: UCSC tables, RDF repository
• Data Exploration: extract overlapping genome locations
• Knowledge & Data Discovery
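The "flat map to RDF" step can be sketched as follows. This is only an illustration under assumptions: the column names, the `example.org` URIs, and the choice of N-Triples output are hypothetical, no RDF library (Sesame, Jena) is used, and LSID assignment and the later structuring step are not shown.

```python
def row_to_triples(row, key_column, base_uri, vocab_uri):
    """Turn one flat table row into N-Triples lines, one per non-key column."""
    subject = f"<{base_uri}{row[key_column]}>"
    triples = []
    for column, value in row.items():
        if column == key_column:
            continue  # the key column becomes the subject, not a property
        predicate = f"<{vocab_uri}{column}>"
        # Escape backslashes and double quotes so the literal stays valid.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} {predicate} "{escaped}" .')
    return triples


# Hypothetical rows, loosely shaped like a genome annotation table.
rows = [
    {"id": "feature1", "chrom": "chr1", "start": 100, "end": 250},
    {"id": "feature2", "chrom": "chr1", "start": 200, "end": 400},
]
BASE = "http://example.org/feature/"   # assumed subject namespace
VOCAB = "http://example.org/vocab#"    # assumed property namespace

triples = [t for row in rows for t in row_to_triples(row, "id", BASE, VOCAB)]
```

Every value is emitted as a plain string literal here; a real mapping would likely attach datatypes and a model, which is what the "RDF to structured RDF" step suggests.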

Page 19

SP1.5: Running R scripts in workflows

[Figure: R workflow. Activities: Read data, Normalization, F test, Gene data generator (R web services). Data: Raw data, Normalized data, Model, FILE, V plot, Matrix, FDR, Gene data. Legend: Local, Grid, Activity, Data.]

Participants: SP1.5 side (Frans and Han); SP2.5 side (Wibi, Zhiming)

• Define concrete description
• Provide UML-based analysis diagrams
• Have a meeting: decompose the task
• Implement the functionality in the modules (Kepler Actor or VLAM module)
• Work together and give necessary support
• Integrate modules into a workflow (an integration meeting)
• Refine the modules; refine the workflow
• Final demonstration

Page 20

SP1.5: Ridge-O-grammer

Goal: identify ridges (regions of increased gene expression)

Input: Transcriptome map
• Sliding Window Median (SWM)
• Sliding Window Median Probability (SWMP)
• Histogram of frequencies (HF)
• Histogram of probabilities (HP)
• False Discovery Rate (FDR)
Output: List of ridges

The outcome of this work will be presented at the “Netherlands Bioinformatics Conference”, 24 April 2006.
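The first stage of this pipeline, the sliding window median, can be sketched in a few lines. This is a simplified illustration, not the Ridge-O-grammer implementation: the expression values are made up, and a fixed threshold (an assumption) stands in for the probability, histogram, and FDR stages, which are not reproduced here.

```python
from statistics import median


def sliding_window_median(values, window):
    """Median of each full window of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [median(values[i:i + window]) for i in range(len(values) - window + 1)]


def candidate_ridges(medians, threshold):
    """Return (start, end) index ranges where the median stays above threshold."""
    ridges, start = [], None
    for i, m in enumerate(medians):
        if m > threshold and start is None:
            start = i                      # a candidate region opens
        elif m <= threshold and start is not None:
            ridges.append((start, i - 1))  # the region closes
            start = None
    if start is not None:                  # region runs to the end of the data
        ridges.append((start, len(medians) - 1))
    return ridges


# Made-up expression levels along a chromosome, for illustration only.
expression = [1, 2, 1, 8, 9, 10, 9, 2, 1, 1]
medians = sliding_window_median(expression, window=3)
ridges = candidate_ridges(medians, threshold=5)
```

The returned indices refer to window positions, not to positions in the original gene list; mapping them back depends on how windows are aligned.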

Page 21

Ongoing development activities on the rapid prototyping environment

• Simple file management tools for SRB and GridFTP

• R scripts in the workflow system

• Parameter sharing between workflow components

• Service discovery using P2P approach

• Parameter Sweep and Job farming

Page 22

Future work

• By far the most active and rapidly progressing WMS is Kepler

• Beta version: March 2006

• Kepler/Ptolemy has two ways of extending the system:
– Actors
– Directors
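In Ptolemy II, on which Kepler is built, actors are the processing components and a director decides how and when they execute. The split can be illustrated with a minimal sketch in plain Python. It is not Kepler's Java API: the class names, the `fire` method signature, and the single-input pipeline are all simplifying assumptions.

```python
class Actor:
    """A processing step with one input and one output (hypothetical interface)."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def fire(self, token):
        return self.func(token)


class SequentialDirector:
    """Fires each actor once per token, in pipeline order."""

    def run(self, actors, tokens):
        outputs = []
        for token in tokens:
            for actor in actors:
                token = actor.fire(token)
            outputs.append(token)
        return outputs


class LoggingDirector(SequentialDirector):
    """Same schedule, but records which actor fired; the actors are unchanged."""

    def __init__(self):
        self.trace = []

    def run(self, actors, tokens):
        outputs = []
        for token in tokens:
            for actor in actors:
                self.trace.append(actor.name)
                token = actor.fire(token)
            outputs.append(token)
        return outputs


pipeline = [Actor("normalize", lambda x: x / 10), Actor("square", lambda x: x * x)]
plain = SequentialDirector().run(pipeline, [10, 20])
logging_director = LoggingDirector()
logged = logging_director.run(pipeline, [10, 20])
```

Adding an actor extends what a workflow can compute; swapping the director changes how the same actors are executed, which is why the two extension points are listed separately.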

Page 23

Summary

• Survey results showed that the e-Science WMS targeted in VL-e:
– Does not exist yet
– Collaboration with other workflow projects will likely speed up the development process

• Project teams working on application use cases are the only way to make progress

• VLAM is still quite useful for rapid prototyping

Page 24

References

People:
Adam Belloum (SP2.5 leader), Zhiming Zhao, Paul van Hooft (post doc), Adianto Wibisono, Dmitry Vasyunin, Vladimir Korkhov, Frank Terpstra (Ph.D. students), Piter de Boer (programmer)

VL-e Reports:
1. PoC recommendation report.

Publications:
1. Z. Zhao, A. Belloum, H. Yakali, P.M.A. Sloot and L.O. Hertzberger: Dynamic Workflow in a Grid Enabled Problem Solving Environment, in Proceedings of the 5th International Conference on Computer and Information Technology, pp. 339-345. IEEE Computer Society Press, Shanghai, China, September 2005.

2. Z. Zhao, A. Belloum, A. Wibisono, F. Terpstra, P.T. de Boer, P.M.A. Sloot and L.O. Hertzberger: Scientific workflow management: between generality and applicability, in Proceedings of the International Workshop on Grid and Peer-to-Peer based Workflows, pp. 357-364. IEEE Computer Society Press, Melbourne, Australia, September 19th-21st 2005.

3. Z. Zhao, A. Belloum, P.M.A. Sloot and L.O. Hertzberger: Agent technology and scientific workflow management in an e-Science environment, in Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence, pp. 19-23. IEEE Computer Society Press, Hong Kong, China, November 14th-16th 2005.

Activity:
1. Int’l workshop on Workflow systems in e-Science, organized by Zhiming Zhao and Adam Belloum, in the context of ICCS06, Reading University, May 28, 2006.

2. Workshop on Workflow systems in e-Science, to be held during the next e-Science conference in Amsterdam, December 2006.