DCIs and Workflows C. Vuerli Contributions by G. Sipos, K. Varga, P. Kacsuk

Upload: loraine-gaines

Post on 25-Dec-2015


Page 1

DCIs and Workflows

C. Vuerli

Contributions by G. Sipos, K. Varga, P. Kacsuk

Page 2

Definitions

• Distributed computing: the use of distributed systems to solve computational problems
  – Computer clusters: sets of loosely or tightly connected computers that work together so that in many respects they can be viewed as a single system
  – Grids: distributed, loosely coupled, heterogeneous and geographically dispersed systems with non-interactive workloads that involve a large number of files
  – HPC systems: enable users to run a single instance of parallel software over many processors
  – Cloud computing: applications and services offered over the Internet from data centers all over the world, which collectively are referred to as the “cloud”. The main enabling technology for cloud computing is virtualization, which abstracts the physical infrastructure (the most rigid component) and makes it available as a soft component that is easy to use and manage.

Page 3

DCI Projects

• EDGI (European Desktop Grid Initiative)
  – http://edgi-project.eu/
• EGI-InSPIRE (Integrated Sustainable Pan-European Infrastructure for Researchers in Europe)
  – http://www.egi.eu/projects/egi-inspire/
• EMI (European Middleware Initiative)
  – http://www.eu-emi.eu/
• IGE (Initiative for Globus in Europe)
  – http://www.ige-project.eu/
• StratusLab (Enhancing Grid Infrastructures with Virtualization and Cloud Technologies)
  – http://www.stratuslab.eu/
• VENUS-C (Virtual multidisciplinary EnviroNments USing Cloud Infrastructures)
  – http://www.venus-c.eu/

Page 4

EGI

• European
  – Over 35 countries
• Grid
  – Secure sharing
• Infrastructure
  – Computers
  – Data
  – Instruments
  – … and beyond!

Page 5

EGI Resource Infrastructure Providers: National Grid Initiatives

(Snapshot: April 2012)

Page 6

EGI, EGI.eu, EGI-InSPIRE

• EGI – an open collaboration
  – To support the digital European Research Area through a pan-European research infrastructure based on services federated from the NGIs
• EGI.eu – a Dutch foundation owned by the NGIs
  – To coordinate the work of EGI (operations, technology, user support, policy, community & communications, administration)
  – Sustainable small coordinating organisation
• EGI-InSPIRE – an FP7 project
  – Supports EGI.eu and EGI in the transition to a sustainable operational model

Page 7

EGI’s Strategic Focus

• Community & Coordination
  – Community building through events
    • Next event: EGI Technical Forum, Madrid, Sept. 2013
    • http://tf2013.egi.eu/
  – Community networking, marketing and outreach through the NGIs
• Operational Infrastructure
  – Operate a Europe-wide infrastructure
  – Offer its use to other research infrastructures
  – Build a federated cloud environment
• Virtual Research Environments
  – Enable 3rd party integration & operation of VREs

Page 8

Guiding Principles

• Be a neutral resource provider
  – Any application, any domain, any technology
  – A platform for domain specific innovation & use
  – Integration of any compliant resource
• End-user needs and technologies change
  – Allow communities to deploy their own services
  – Give communities the power to meet their own needs

Page 9

Virtual Organizations

• VOs – Virtual Organizations
  – Groups of researchers with similar scientific interests and requirements, who are able to work collaboratively with other members and/or share resources (e.g. data, software, expertise, CPU, storage space), regardless of geographical location.
• Researchers must join a VO in order to use grid computing resources provided by EGI.
• Each virtual organization manages its own membership list, according to the VO’s requirements and goals.
• EGI provides support, services and tools to allow VOs to make the most of their resources.
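The membership rule on this slide can be illustrated with a small sketch. The `VirtualOrganization` class, its methods, and the VO and user names are hypothetical and exist only for illustration; real VOs are administered through dedicated membership services, not through code like this.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class VirtualOrganization:
    """Toy model of a VO that manages its own membership list (hypothetical)."""
    name: str
    members: Set[str] = field(default_factory=set)

    def add_member(self, user: str) -> None:
        self.members.add(user)

    def may_use_resources(self, user: str) -> bool:
        # Only researchers who joined the VO may use its grid resources.
        return user in self.members


vo = VirtualOrganization("example-vo")  # hypothetical VO name
vo.add_member("alice")

print(vo.may_use_resources("alice"))  # member of the VO
print(vo.may_use_resources("bob"))    # never joined the VO
```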

Page 10

Page 11

Virtual Research Communities

• VRCs – Virtual Research Communities
  – Self-organized research communities which give individuals within their community a clear mandate to represent the interests of their research field within the EGI ecosystem.
  – They can include one or more VOs and act as the main communication channel between the researchers they represent and EGI.
• EGI establishes partnerships with individual VRCs through a Memorandum of Understanding (MoU).
• VRCs can access the computing resources and data storage provided by the EGI community through open source software solutions. VRC members can store, process and index large datasets and can interact with partners using the secured services of EGI’s production infrastructure.

Page 12

Virtual Research Environments

• VREs – Virtual Research Environments
  – A combination of environments that provide the researcher with easy access to the services deployed in EGI to enable their data analysis activities.
  – Initially, the VRE consisted of a command line interface for the researcher to access the services deployed across the infrastructure.
  – Over the years, higher-level generic tools and domain specific environments have been developed by many research communities to simplify the data analysis process.

Page 13

SSO Authentication

• SSO – Single Sign On
  – A session/user authentication process that permits a user to present his/her credentials once to access multiple applications and resources.
  – The process authenticates the user for all the applications they have been given rights to and eliminates further prompts when users switch from one application to another during a particular session.
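A minimal sketch of the single sign-on idea described on this slide, assuming an in-memory service: the user presents credentials once, and a session token is then checked for every application. The `SingleSignOn` class, the user, the password, and the application names are all hypothetical, and plain-text passwords are used only to keep the example short.

```python
import secrets


class SingleSignOn:
    """Toy SSO service: one login, many applications (hypothetical names)."""

    def __init__(self, passwords, rights):
        self._passwords = passwords  # user -> password (plain text only for the demo)
        self._rights = rights        # user -> set of applications the user may access
        self._sessions = {}          # session token -> user

    def login(self, user, password):
        # Credentials are presented once per session.
        if self._passwords.get(user) != password:
            raise PermissionError("authentication failed")
        token = secrets.token_hex(16)
        self._sessions[token] = user
        return token

    def can_access(self, token, application):
        # No further prompt: the session token is checked instead of the password.
        user = self._sessions.get(token)
        return user is not None and application in self._rights.get(user, set())


sso = SingleSignOn({"alice": "s3cret"}, {"alice": {"portal", "storage"}})
token = sso.login("alice", "s3cret")
print(sso.can_access(token, "portal"))   # True
print(sso.can_access(token, "storage"))  # True
print(sso.can_access(token, "billing"))  # False
```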

Page 14

SSO Authentication Concepts

• Authentication
  – The process of determining whether someone or something is, in fact, who or what it is declared to be, on the basis of provided credentials.
  – Authentication is commonly done through the use of username/password pairs.
  – The use of digital certificates issued and verified by a Certificate Authority (CA) as part of a PKI (Public Key Infrastructure) is considered likely to become the standard way to perform authentication on the Internet.

Page 15

SSO Authentication Concepts

• IdP – Identity Provider
  – Provides Single Sign-On services. In addition to a simple yes/no response to an authentication request, the Identity Provider can provide a rich set of user-related data to the Service Provider.
• IdF – Identity Federation
  – Enables IdPs to exchange identity information securely across domains, providing browser-based SSO.

Page 16

EGI Platforms

[Figure: the EGI platforms. Labels: EGI resources (cores & storage); EGI Core Infrastructure Platform; EGI Cloud Infrastructure Platform (VMs); Community Platform (gLite Grid) with Grid services; Community Platform; EGI Collaboration platform; Virtual Research Communities]

Page 17

Core Infrastructure Platform Services

• Service monitoring
• Service usage accounting
• Security (Authentication, Authorization)
• Staged rollout of services
• ...

Page 18

Unified Middleware Distribution

• External technology providers: EMI (ARC, gLite, UNICORE, dCache), IGE (Globus), EDGI (desktop grids), SAGA, …
• UMD capabilities captured in UMD roadmap
• Staged Rollout: updates of the supported middleware are first released to and tested by Early Adopter sites before being made available to all sites through the production repositories.

[Figure: UMD software provisioning on the core infrastructure platform. Labels: External Technology Providers; Software; Requirements; Helpdesk unit; Provisioning Infrastructure; Criteria Definition; Criteria Verification; Staged Rollout; Production; Deployed Software; Operations]
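The staged rollout rule above can be sketched as a simple gate: an update reaches the production repositories only if it passes at every Early Adopter site. The function, the site names, the version strings, and the fake test outcome are hypothetical and are not part of any UMD tooling.

```python
def staged_rollout(update, early_adopter_sites, test_at_site):
    """Return True if the update may be published to the production repositories."""
    for site in early_adopter_sites:
        if not test_at_site(site, update):
            print(f"{update}: rejected after failing at {site}")
            return False
    print(f"{update}: released to production repositories")
    return True


# Hypothetical test outcome: version 1.1 breaks at one Early Adopter site.
def fake_test(site, update):
    return not (site == "site-b" and update == "middleware-1.1")


sites = ["site-a", "site-b"]
ok_10 = staged_rollout("middleware-1.0", sites, fake_test)
ok_11 = staged_rollout("middleware-1.1", sites, fake_test)
```

In this sketch a single failing Early Adopter site blocks the release; a real process may apply more nuanced acceptance criteria.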

Page 19

User communities of the UMD

[Figure: three-level hierarchy. Members (users) belong to Virtual Organisations (VOs), which are grouped into research communities (Virtual Research Communities)]

Largest user communities:

• High Energy Physics
• Astronomy & Astrophysics
• Life Sciences
• Earth Sciences
• Computational Chemistry
• Fusion
• …

~25,000 users in ~200 registered VOs: http://operations-portal.egi.eu/vo

Page 20

UMD use case 1: workload management with gLite

[Figure: job flow between the user environment, the broker service, and Site X of YOUR VO (computing service, storage service), on top of the core infrastructure platform. Labelled interactions:]

• createProxy (~login) with the VO Management Service (VOMS)
• Create job definition
• Submit job (batch executable + small inputs) to the broker service
• The broker service queries the Information System, to which sites publish their state
• The job is submitted to the computing service, which processes it and reads/writes files on the storage service
• Job status is logged in the Logging and bookkeeping service
• Retrieve status & (small) output files
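The interactions on this slide can be mimicked with a toy model. This is only a sketch under strong simplifications: the class names, the "most free slots" selection rule, the status strings, and the site names are assumptions made for illustration and do not reproduce the gLite services or their job states.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InformationSystem:
    free_slots: Dict[str, int] = field(default_factory=dict)

    def publish(self, site: str, slots: int) -> None:
        self.free_slots[site] = slots  # sites publish their state

    def best_site(self) -> str:
        # Assumed selection rule: pick the site with the most free slots.
        return max(self.free_slots, key=self.free_slots.get)


@dataclass
class Broker:
    info: InformationSystem
    log: Dict[int, List[str]] = field(default_factory=dict)  # stands in for logging and bookkeeping

    def submit(self, job_id: int, executable: str) -> str:
        self.log[job_id] = ["SUBMITTED"]
        site = self.info.best_site()  # the broker queries the information system
        self.log[job_id].append(f"SCHEDULED at {site}")
        output = f"ran {executable} at {site}"  # stands in for execution at the computing service
        self.log[job_id].append("DONE")
        return output


info = InformationSystem()
info.publish("site-x", 12)
info.publish("site-y", 3)

broker = Broker(info)
result = broker.submit(1, "analysis.sh")
print(broker.log[1])  # status history of the job
print(result)         # the (small) output retrieved by the user
```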

Page 21

UMD use case 2: data management with gLite

[Figure: data flow between the user (command line or portal) and Site X of YOUR VO (computing service, storage service), on top of the core infrastructure platform. Labelled interactions:]

• createproxy with the VO Management Service (VOMS)
• Query the Information System, to which sites publish their state
• Upload file / Download file: file content is exchanged with the storage service
• Register file / Lookup file: file references are kept in the File Catalog and/or Metadata catalog
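The split between file content (held by a storage service) and file references (held by a catalog) can be sketched as follows. The classes, the `lfn:` naming, and the host and path names are hypothetical simplifications for illustration, not the gLite data management API.

```python
class StorageService:
    """Holds file content at one site (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._files = {}

    def upload(self, path, content):
        self._files[path] = content
        return f"{self.name}:{path}"  # physical reference to the stored file

    def download(self, reference):
        _, path = reference.split(":", 1)
        return self._files[path]


class FileCatalog:
    """Maps logical file names to physical references (hypothetical)."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_name, reference):
        self._entries.setdefault(logical_name, []).append(reference)

    def lookup(self, logical_name):
        return list(self._entries.get(logical_name, []))


storage = StorageService("se.site-x")
catalog = FileCatalog()

ref = storage.upload("/data/run1.dat", b"42")  # upload file content
catalog.register("lfn:/myvo/run1.dat", ref)    # register the file reference

replicas = catalog.lookup("lfn:/myvo/run1.dat")  # look up where the file lives
print(replicas)
print(storage.download(replicas[0]))             # download the content
```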

Page 22

Cloud infrastructure platform

[Figure: layered software stacks. Labels: Hardware; Hypervisor; OS (several instances); Grid middleware; Community specific, grid enabled services; Scientific portals (SaaS environments); Custom services]

Page 23

Some of the cloud infrastructure pilot use cases

• Structural biology – We-NMR project: Gromacs training environments
• Ecology – BioVel project: Remote hosting of OpenModeller service
• Linguistics – CLARIN project: Scalable ‘British National Corpus’ service (BNCWeb)
• Software development – SCI-BUS project: Simulated environments for portal testing
• Space science – ASTRA-GAIA project: Data integration with scalable workflows

Join as pilot use case, or as provider of high level service on top of the EGI Cloud!

Page 24

What does the EGI.eu User Community Support Team do?

• European level coordination and support to enable research communities to evolve into routine EGI users
  – Knowing and driving technical activities (Virtual Team projects)
  – Knowing and responding to technical needs
• Support the development and deployment of innovative, collaborative VREs
  – Integrating applications with EGI platforms
  – Training and consultancy
  – Facilitate the reuse of services across communities
  – Provide services for the EGI Collaboration Platform
    • Applications Database, Training Marketplace, Requirements tracker, CRM

Page 25

Services of the Collaborative platform 1: Applications Database

Benefits:

• Gives recognition to reusable scientific applications, application developer tools, portals, workflow systems, etc.
• Gives recognition to application developers (people profiles)
• Access through web page AND web gadget
• Community features such as commenting, rating, tag-based groupings

http://appdb.egi.eu

Page 26

Services of the Collaborative platform 2: Training Marketplace

Benefits:

• Register & share
  – training events, expertise, services, materials, resources, online courses, university courses
• Browse and search items
• Community features such as commenting, rating
• Access through web page and web gadget

http://training.egi.eu

Page 27

Services of the Collaborative platform 3: Customer Relationship Management system (CRM)

• To capture leads and needs of scientific communities
• EGI CRM – main capabilities (based on vTiger CRM):
  – Register projects/communities
  – Register personal leads
  – Register details about ‘Potential for EGI use’ and ‘Interest in e-infrastructures’
• Available to NGI International Liaisons and their collaborators at http://crm.egi.eu

Page 28

Services of the Collaborative platform 4: Requirements tracker

[Figure: flow of community requirements through the EGI Requirements Tracker. Input channels for community requirements: NGIs, VRCs, VOs, projects, events, EGI Helpdesk, User Community Support Team of EGI.eu. Other labels: User Community Board; Operations Management Board; Technology Coordination Board; Technology providers; Resource centres; Structured scientific communities; NGIs & projects]

http://go.egi.eu/requirements

Page 29

EGI Web Gadgets: Embed EGI services into Websites!

• The simplest way to reuse EGI services: embed EGI gadgets into your website
• Benefits
  – Customisability
  – Reusability
  – Compatibility
• Existing gadgets:
  – Applications Database
  – Training Marketplace
  – Requirements Tracker
  – Grid Relational Catalogue
• Use and develop gadgets & share them through EGI: www.egi.eu/user-support/gadgets

Page 30

EGI Virtual Team projects

• NGI International Liaisons (NILs)
  – Single point of contact in NGIs for various activities, including Technical Outreach
• Virtual Team projects: https://wiki.egi.eu/wiki/Virtual_Team_Projects
  – Short (1-6 month) projects with multiple NGIs involved
  – Set up through NILs
  – Help EGI enlarge its user base
• Active VT projects with UCST’s involvement:
  – Science Gateway Primer (Final editing of the Primer document)
  – NGI-ELIXIR ESFRI collaboration (Since October)
  – Technology study for CTA ESFRI (Since January)
  – Towards a Chemistry, Molecular & Materials Science and Technology Virtual Research Community (In setup)

Page 31

Workflows

• Progression of steps (tasks, events, interactions) that
  – comprise a work process
  – involve two or more persons
  – create or add value to the organization's activities.
• Sequential workflow
  – each step is dependent on occurrence of the previous step
• Parallel workflow
  – two or more steps can occur concurrently
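The difference between the two workflow types can be sketched in a few lines. The step names are hypothetical, and the thread pool is just one possible way to run independent steps concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def step(name, value):
    """Stand-in for a unit of work; it only records what it was applied to."""
    return f"{name}({value})"


# Sequential workflow: each step consumes the result of the previous one.
result = "input"
for name in ["clean", "analyse", "report"]:
    result = step(name, result)
print(result)  # report(analyse(clean(input)))

# Parallel workflow: independent steps may run concurrently.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(step, name, "input") for name in ["fit_a", "fit_b"]]
    parallel_results = [f.result() for f in futures]
print(parallel_results)  # ['fit_a(input)', 'fit_b(input)']
```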

Page 32

Workflow Management Systems

• A Workflow Management System (WMS) is a piece of software that provides an infrastructure to set up, execute, and monitor scientific workflows
  – An important function of a WMS during the workflow execution, or enactment, is the coordination of operation of individual components that constitute the workflow; this process is also often referred to as orchestration
  – Scientific workflows and WMSs have emerged to provide an easy-to-use way of specifying the tasks that have to be performed during a specific in silico experiment. The need to combine several tools into a single research analysis still holds, but technical details of workflow execution are now delegated to Workflow Management Systems.
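Orchestration, in the sense used above, can be sketched as running each task only once everything it depends on has finished. The task names and the dependency table are hypothetical; the example relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) and is a toy, not how any particular WMS is implemented.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each task maps to the set of tasks it depends on (hypothetical workflow).
dependencies = {
    "fetch": set(),
    "align": {"fetch"},
    "stats": {"fetch"},
    "report": {"align", "stats"},
}

# Stand-ins for the real components that a WMS would invoke.
tasks = {name: (lambda n=name: f"{n} done") for name in dependencies}

executed = []
sorter = TopologicalSorter(dependencies)
sorter.prepare()
while sorter.is_active():
    for name in sorter.get_ready():  # tasks whose dependencies have all completed
        tasks[name]()
        executed.append(name)
        sorter.done(name)

print(executed)  # "fetch" comes first and "report" last
```

The two middle tasks do not depend on each other, so a real enactor would be free to run them in parallel.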

Page 33

Workflow Editor

The skeleton of a workflow is represented by a graph. Jobs denote the activities, which envelop insulated computations. Channels are directed edges of the graph, directed from the output ports towards the input ports.
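One possible in-memory representation of such a graph is sketched below. The `Port` and `WorkflowGraph` types and the job names are hypothetical and are not taken from the workflow editor itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Port:
    job: str   # the job that owns the port
    name: str  # port name within that job


@dataclass
class WorkflowGraph:
    jobs: List[str]
    channels: List[Tuple[Port, Port]]  # each channel goes from an output port to an input port

    def successors(self, job: str) -> List[str]:
        # Jobs that receive data from any output port of the given job.
        return [dst.job for src, dst in self.channels if src.job == job]


graph = WorkflowGraph(
    jobs=["generate", "simulate", "collect"],
    channels=[
        (Port("generate", "out"), Port("simulate", "in")),
        (Port("simulate", "out"), Port("collect", "in")),
    ],
)

print(graph.successors("generate"))  # ['simulate']
```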

Page 34

Workflow Editor

We can rely on a local and a public workflow repository.

Page 35

Workflow services

• Graph creation
• Concrete workflow creation
• Concrete workflow configuration
  – Job types and corresponding properties
  – Port properties
• Certificate handling
• Submission
  – Log examination
• Submitted instance management
• Result evaluation
• Repository handling (export/import)

Page 36

CTA Gateway: workflow instances

Page 37

CTA Gateway: workflow instances

Page 38

SHIWA Motivation

Isolation by technology
  – Many different workflow systems exist. We expect no change
  – Technological choice isolates users and user communities
  – Technological choice determines the adopted DCI/middleware (WS, gLite, Globus, etc.)

Benefits of the SSP – SHIWA Simulation Platform
  – Share your own workflow, re-use workflows of others
  – Create and execute ‘meta-workflows’: built from smaller workflows that use different workflow languages/technologies
  – Combine workflows and DCIs

SHIWA ER-flow
  – SHIWA: FP7 R&D project. Created the SHIWA Simulation Platform (2010-2012)
  – ER-flow: FP7 support action project. Disseminates the SHIWA technology (2012-2014)

Page 39

Generic use-cases

[Figure: two infrastructures, DCI #1 and DCI #2, each with a CE and an SE]

DCI – distributed computing infrastructure; CE/SE – computing/storage element

Page 40

Generic use-cases

[Figure: the same two infrastructures, DCI #1 and DCI #2, each with a CE and an SE, annotated with the user's needs: find and re-use; combine WFs; with own data; on own CE?]

DCI – distributed computing infrastructure; CE/SE – computing/storage element

Page 41

SHIWA Simulation Platform

Page 42

What does the developer of a (meta)workflow need?

• SHIWA Repository: access to a large set of ready-to-run scientific workflow applications
• SHIWA Portal: a portal/desktop to integrate, parameterize and run these applications
• Access to various workflow engines and DCIs where these WF applications can be run
• Web services + Workflow engines

Distributed Computing Infrastructures (DCIs)

• Clouds
• Local clusters
• Supercomputers
• Grid systems
  – Desktop grids (BOINC, Condor, etc.)
  – Cluster grids (EGI, OSG, …)
  – Supercomputer grids (PRACE, XSEDE)

Page 43

SHIWA Repository

Facilitates publishing and sharing workflows

Supports:

• Abstract workflows with multiple implementations of over 10 workflow systems
• Concrete workflows with execution specific data

Available:

• From the SHIWA Portal: http://ssp.shiwa-workflow.eu
• Standalone interface: http://repo.shiwa-workflow.eu

Page 44

Example: Create meta-workflow from Moteur and WS-PGRADE workflows

[Figure: the SHIWA portal with the SHIWA Repository and connected workflow engines, including the Moteur WF engine (access to Moteur WF), on top of several infrastructures: DCI 1 (gLite), DCI 2 (Cloud), DCI n (ARC)]
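The idea of a meta-workflow, in which each sub-workflow is handed to the engine that understands it, can be sketched as follows. The engine table, the workflow names, and the string-based "execution" are hypothetical stand-ins; they do not reflect the real interfaces of Moteur, WS-PGRADE, or the SHIWA Simulation Platform.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for workflow engines; each runs a named workflow on some input.
engines: Dict[str, Callable[[str, str], str]] = {
    "moteur": lambda wf, data: f"moteur:{wf}({data})",
    "ws-pgrade": lambda wf, data: f"ws-pgrade:{wf}({data})",
}


def run_meta_workflow(steps: List[Tuple[str, str]], data: str) -> str:
    """Run sub-workflows in order, each on the engine that understands it."""
    for engine_name, workflow in steps:
        data = engines[engine_name](workflow, data)
    return data


# A meta-workflow built from two smaller workflows written for different engines.
meta = [("moteur", "preprocess"), ("ws-pgrade", "simulate")]
meta_result = run_meta_workflow(meta, "dataset")
print(meta_result)  # ws-pgrade:simulate(moteur:preprocess(dataset))
```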

Page 45

Remarks

• Scientific workflow landscape is fragmented
  – Many workflow systems with specialised & small user base
  – Many workflows with specialised & small user base
• SHIWA Simulation Platform enables integration
  – Within disciplines
  – Across disciplines
• ER-flow project provides user support
  – Consultancy & Workshops
  – User forum
  – Open for new communities