Page 1: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE and gLite are registered trademarks

Gergely Sipos

MTA SZTAKI, Laboratory of Parallel and Distributed Systems

www.lpds.sztaki.hu

[email protected]

Life sciences applications on the EGEE Grid

Page 2: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

The EGEE Project

• Aim of EGEE: “to establish a seamless European Grid infrastructure for the support of the European Research Area (ERA)”
• EGEE
– 1 April 2004 – 31 March 2006
– 71 partners in 27 countries, federated in regional Grids
• EGEE-II
– 1 April 2006 – 30 April 2008
– Expanded consortium
• EGEE-III
– 1 May 2008 – 30 April 2010
– Transition to sustainable model

Page 3: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

Life sciences cluster in EGEE

Life sciences is one of the strategic communities for EGEE

• Life sciences cluster in EGEE:
– To increase the impact of EGEE on this community
– To drive the development of the EGEE services
– To develop domain specific, high level services
– Main topics: drug discovery, medical imaging, bioinformatics

Page 4: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

Biomed Virtual Organization

Size of the infrastructure today:
• > 250 sites in 48 countries
• > 68 000 CPU cores
• ~ 20 PB disk + tape MSS
• > 150 000 jobs/day
• > 9000 registered users

Out of which, Biomed VO:
• > 100 sites in 30 countries
• ~ 17 000 CPUs
• > 150 registered users

Page 5: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

Life sciences applications

[Figure: layered architecture. Production grid infrastructure level: Resources; Communication layer (GEANT, Internet...); EGEE middleware services. Applications level: Domain-specific services; Applications.]

Page 6: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

Application example 1: WISDOM

[Figure: layered architecture. Production grid infrastructure level: Resources; Communication layer (GEANT, Internet...); Biomed Virtual Organization, EGEE middleware services. Applications level: AMGA metadata catalog; DIANE grid job scheduler; GAP user interface module; WISDOM.]

Page 7: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

WISDOM In silico Drug Discovery

• WISDOM: http://wisdom.healthgrid.org/
• Goal: find new drugs for neglected and emerging diseases
– Neglected diseases lack R&D
– Emerging diseases require very rapid response time
• Need for an optimized environment
– To achieve production in a limited time
– To optimize performance
• Method: grid-enabled virtual docking
– Cheaper than in vitro tests
– Faster than in vitro tests

Page 8: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

High throughput virtual docking

[Figure: drug discovery pipeline]

• Inputs
– Chemical compounds: Chembridge – 500,000; Drug like – 500,000
– Targets (enzymes): Plasmepsin II (1lee, 1lf2, 1lf3), Plasmepsin IV (1ls5)
• Millions of chemical compounds available in laboratories
• High Throughput Screening: 1-10 $/compound, nearly impossible
• Molecular docking (FlexX, Autodock): ~80 CPU years, 1 TB data
• Computational data challenge: ~6 weeks on ~1000/1600 computers
• Hits screening using assays performed on living cells
• Leads
• Clinical testing
• Drug

Sources and tools:
• Chemical compounds: ZINC
• Molecular docking: FlexX, Autodock
• Target structures: PDB
• Grid infrastructure: EGEE

Page 9: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

Computing model & workflow

[Figure: computing model and workflow]
• Simulation jobs run on the EGEE Grid
• Simulation results stored on the EGEE Grid
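As a rough illustration of how a docking campaign can be cut into independent grid jobs, the sketch below groups (target, compound) pairs into fixed-size batches, one batch per job. This is not the WISDOM implementation: the function name `make_jobs`, the batch size and the compound identifiers are hypothetical. Only the target names are PDB entries quoted in this presentation.

```python
from itertools import product


def make_jobs(targets, compounds, dockings_per_job):
    """Group every (target, compound) pair into batches, one batch per grid job."""
    if dockings_per_job < 1:
        raise ValueError("dockings_per_job must be at least 1")
    pairs = list(product(targets, compounds))
    return [pairs[i:i + dockings_per_job]
            for i in range(0, len(pairs), dockings_per_job)]


# Hypothetical input: 2 targets and 5 placeholder compound identifiers.
targets = ["1lee", "1ls5"]
compounds = [f"compound_{n}" for n in range(5)]
jobs = make_jobs(targets, compounds, dockings_per_job=4)
print(len(jobs), [len(job) for job in jobs])
```

With 10 pairs and 4 dockings per job, this yields three jobs, the last one only partly filled. Each job is independent of the others, which is what allows them to run concurrently on different grid resources.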

Page 10: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

Efficiency

Estimated duration on 1 CPU: 88.3 years
Duration on EGEE: 6 weeks
Cumulative number of Grid jobs: 54,000
Maximum number of concurrent CPUs used: 2,000
Approximated throughput: 2 sec/docking

• Second data challenge for avian flu drug analysis
– 8 targets against 300,000 compounds (2,400,000 simulations)
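The efficiency figures above can be checked with a little arithmetic. The sketch below computes the speed-up implied by 88.3 CPU-years of work finishing in 6 weeks, and the number of simulations for 8 targets against 300,000 compounds. The year length of 365.25 days and the function name `speedup` are assumptions made for this example.

```python
DAYS_PER_YEAR = 365.25  # assumption: average calendar year


def speedup(cpu_years, wall_clock_weeks):
    """Ratio of single-CPU duration to the elapsed time on the grid."""
    cpu_days = cpu_years * DAYS_PER_YEAR
    wall_clock_days = wall_clock_weeks * 7
    return cpu_days / wall_clock_days


# Figures quoted on the slide.
first_challenge_speedup = speedup(cpu_years=88.3, wall_clock_weeks=6)
second_challenge_simulations = 8 * 300_000  # targets x compounds

print(f"speed-up: about {first_challenge_speedup:.0f}x")
print(f"simulations: {second_challenge_simulations:,}")
```

The speed-up comes out at about 768, well below the 2,000 peak concurrent CPUs. That gap is expected, since the peak would not be sustained for the whole six weeks. The simulation count reproduces the 2,400,000 stated on the slide.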

Page 11: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

Statistics of deployment

• First Data Challenge: July 1st - August 15th 2005
– Target: malaria
– 80 CPU years
– 1 TB of data produced
– 1700 CPUs used in parallel
– 1st large scale docking on world-wide e-infrastructure
• Second Data Challenge: April 15th - June 30th 2006
– Target: avian flu
– 100 CPU years
– 800 GB of data produced
– 1700 CPUs used in parallel
– Infrastructure was configured in 45 days
• Third Data Challenge: October 1st - December 15th 2006
– Target: malaria
– 400 CPU years
– 1.6 TB of data produced
– Up to 5000 CPUs used in parallel
– Very high docking throughput: > 100,000 compounds per hour
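To put the three data challenges side by side, the sketch below derives the average number of busy CPUs from the CPU-years consumed and the elapsed days, and compares it with the reported parallel CPU count. The dates and figures are those listed above. The 365.25-day year, the treatment of the end date as exclusive, and the reading of "CPUs used in parallel" as a peak value are assumptions of this example.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumption: average calendar year

# (name, start, end, CPU-years consumed, reported parallel CPUs), from the slide.
CHALLENGES = [
    ("First (malaria)", date(2005, 7, 1), date(2005, 8, 15), 80, 1700),
    ("Second (avian flu)", date(2006, 4, 15), date(2006, 6, 30), 100, 1700),
    ("Third (malaria)", date(2006, 10, 1), date(2006, 12, 15), 400, 5000),
]


def average_cpus(start, end, cpu_years):
    """Average number of busy CPUs implied by the CPU time and the elapsed days."""
    elapsed_days = (end - start).days
    return cpu_years * DAYS_PER_YEAR / elapsed_days


for name, start, end, cpu_years, peak in CHALLENGES:
    avg = average_cpus(start, end, cpu_years)
    print(f"{name}: ~{avg:.0f} CPUs on average, {avg / peak:.0%} of the reported figure")
```

Under these assumptions, each challenge kept on average well under half of its reported parallel CPU count busy.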

Page 12: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

Application example 2: Bronze standard

[Figure: layered architecture. Production grid infrastructure level: Resources; Communication layer (GEANT, Internet...); Biomed Virtual Organization, EGEE middleware services. Applications level: MOTEUR workflow manager; Bronze standard workflow.]

Page 13: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

Scientific challenge

• Medical image registration is the process by which two images acquired independently are registered into a common frame.

[Figure: unregistered and registered images of objects O1 and O2, related by transformation T]

• Registration accuracy is critical for many image analysis procedures
• Bronze Standard is a statistical procedure to estimate the performance of registration algorithms
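To make the notion of a registration transformation T concrete, the sketch below applies a 2D rigid transformation (rotation plus translation) to a few landmark points and measures how far an estimated transformation lands from a reference one. It is a simplified illustration only and not the Bronze Standard procedure. The function names, landmark coordinates and transformation parameters are all hypothetical.

```python
import math


def rigid_transform(point, angle_deg, translation):
    """Apply a 2D rotation about the origin, then a translation, to a point (x, y)."""
    x, y = point
    a = math.radians(angle_deg)
    tx, ty = translation
    return (x * math.cos(a) - y * math.sin(a) + tx,
            x * math.sin(a) + y * math.cos(a) + ty)


def mean_error(points_a, points_b):
    """Mean Euclidean distance between corresponding points of two lists."""
    return sum(math.dist(p, q) for p, q in zip(points_a, points_b)) / len(points_a)


# Hypothetical landmarks in the first image.
landmarks = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]

# Hypothetical reference transformation and the estimate of one algorithm.
reference = [rigid_transform(p, 30.0, (5.0, -2.0)) for p in landmarks]
estimated = [rigid_transform(p, 31.0, (5.2, -2.1)) for p in landmarks]

print(f"mean landmark error: {mean_error(reference, estimated):.3f}")
```

In practice no exact reference transformation is available for real images, which is why a statistical procedure is used to estimate the performance of registration algorithms.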

Page 14: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

Implementation on EGEE

[Figure: workflow diagram of services. Node labels: A, B, Params, CrestLines, PFMatchICP, PFRegister, Yasmina, Baladin, GetFromEGEE, FormatConv, MultiTransfoTest, MethodToTest, WriteResults. Result labels: Accuracy Translation, Accuracy Rotation.]

~100 image pairs
~800 EGEE jobs

Page 15: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

Application example 3: Bioinformatics Grid Portal

[Figure: layered architecture. Production grid infrastructure level: Resources; Communication layer (GEANT, Internet...); Biomed Virtual Organization, EGEE middleware services. Applications level: Bioinformatics Grid Portal.]

Page 16: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

GPSA: Bioinformatics Grid Portal

• Scientific objectives
– Protein sequence analysis
– Analyse data from high-throughput biology: genome projects, structural biology, …
• Tools
– Web interface: NPS@
– Protein databases are stored on grid storage as flat files: SWISS-PROT, SP-TrEMBL, NRL_3D, PATTINPROT, …
– Legacy bioinformatics applications: FASTA, BLAST, PSI-BLAST, SSEARCH, …
• Contact
– http://npsa-pbil.ibcp.fr/
– [email protected]

Page 17: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

How to get involved with EGEE

• More information on EGEE:
– http://www.eu-egee.org
– Life Sciences cluster: http://technical.eu-egee.org/index.php?id=258
– Coordinator of life sciences cluster: Vincent BRETON ([email protected])
• To get your own application ported to EGEE:
– Support team: http://www.lpds.sztaki.hu/gasuc
• To get access to Biomed Virtual Organization:
– Obtain a certificate from NIIF CA: http://www.ca.niif.hu/
– Register to Virtual Organization: https://voms.cnaf.infn.it:8443/voms/bio/webui/request/user/create
– Access grid from P-GRADE Portal, Bioinformatics Grid Portal, etc.
• EGEE User Forum, Catania, Italy, 2-6 March, 2009:
– http://indico.cern.ch/conferenceDisplay.py?confId=40435

Page 18: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu

www.eu-egee.org

www.lpds.sztaki.hu

Gergely Sipos

[email protected]