“Grid Platform for Drug Discovery” Project

2003/10/3 UK Japan N+N Meeting. “Grid Platform for Drug Discovery” Project. Mitsuhisa Sato, Center for Computational Physics, University of Tsukuba, Japan.


DESCRIPTION

“Grid Platform for Drug Discovery” Project. Mitsuhisa Sato, Center for Computational Physics, University of Tsukuba, Japan. Our Grid Project: JST-ACT program “Grid platform for drug discovery”, funded by JST (Japan Science and Technology Corporation), US$1.3M over 3 years, started in 2001.

TRANSCRIPT

Page 1: “Grid Platform for Drug Discovery” Project


“Grid Platform for Drug Discovery” Project

Mitsuhisa Sato

Center for Computational Physics, University of Tsukuba, Japan

Page 2: “Grid Platform for Drug Discovery” Project


Our Grid Project

• JST-ACT program: “Grid platform for drug discovery”, funded by JST (Japan Science and Technology Corporation), US$1.3M over 3 years, started in 2001
  – Tokushima University, Toyohashi Inst. of Tech., University of Tsukuba, Fuji Res. Inst. Corp.
• ILDG: International Lattice QCD Data Grid
  – CCP, U. of Tsukuba, EPCC (UK), SciDAC (US)
  – Design of QCDML
  – QCD metadata database via web services; QCD data sharing by SRM and Globus replica …

Page 3: “Grid Platform for Drug Discovery” Project


High Throughput Computing for drug discovery

• Exhaustive parallel conformation search and docking over Grid

• Accumulation of computing results into a large-scale database for reuse

• High performance ab initio MO calculation for large molecules on clusters

“Combinatorial Computing” Using Grid

Page 4: “Grid Platform for Drug Discovery” Project


Grid applications of our drug discovery

• Conformation search: find possible conformations
• Docking search: compute the energy of combinations of molecules
• Quantitative Structure-Activity Relationships (QSAR) analysis: finding rules of drug design

[Workflow diagram: drug libraries → Conformation Search (CONFLEX-G, a Grid-enabled conformation search application) → conformations → Docking Computation (using ab initio MO calculation; job submission for MO to clusters; coarse-grain MO for Grid: REMD, FMO) → results → QSAR analysis → target. Results are described in XML and accessed through a web service interface.]

Page 5: “Grid Platform for Drug Discovery” Project


CONFLEX

• Algorithm: tree search
  – Local conformation changes (Stepwise Rotation, Corner Flap, Edge Flip)
  – Initial conformation selection
• We are implementing it with OmniRPC
  – Tree search activity is dynamic!

[Figure: conformation search tree, with conformers Anti (E = 0.0 kcal/mol), Gauche+ (E = 0.9 kcal/mol), and Gauche- (E = 0.9 kcal/mol)]

Page 6: “Grid Platform for Drug Discovery” Project


Grid Platform for drug discovery

[Diagram: Univ. of Tsukuba and AIST, Toyohashi Inst. of Tech., and Tokushima Univ., exchanging requests over a wide-area network]

• Control & monitoring: scheduling and monitoring of computations, distributed database management, design of Grid middleware
• Development of a large-scale ab initio MO program
• Cluster for CONFLEX: development of the conformation search program (CONFLEX)
• Databases: CONFLEX results, MO calculation results, and a 3D structure database for drug design

Page 7: “Grid Platform for Drug Discovery” Project


What can Grid do? Parallel applications, programming, and our view of the grid

• “Typical” Grid Applications
  – Parametric execution: execute the same program with different parameters using a large amount of computing resources
  – Master-workers type of parallel program
• “Typical” Grid Resources
  – A cluster of clusters: several PC clusters are available
  – Dynamic resources: load and status change from time to time

[Figure: our view of the Grid environment as a collection of PC clusters]

Page 8: “Grid Platform for Drug Discovery” Project


Parallel programming in Grid

– Using Globus shell (GSH)
  • Submit batch job scripts to remote nodes
  • Staging and workflow
– Grid MPI (MPICH-G, PACX-MPI, …)
  • General-purpose, but difficult and error-prone
  • No support for dynamic resources and fault tolerance
  • No support for firewalls or clusters on private networks
– Grid RPC
  • A good and intuitive programming interface
  • Ninf, NetSolve, … → OmniRPC

Page 9: “Grid Platform for Drug Discovery” Project


Overview of OmniRPC: A Grid RPC system for parallel computing

• Provides a seamless parallel programming environment from clusters to grid
  – It uses “rsh” for a cluster, “GRAM” for a grid managed by Globus, and “ssh” for conventional remote nodes
  – Program development and testing on PC clusters
  – Production runs on the Grid to exploit huge computing resources
  – Users can switch configurations with a “host file” without any modification
• Makes use of remote PC/SMP clusters as Grid computing resources
  – Support for clusters behind firewalls and on private addresses

[Figure: a client connected through the Grid environment to several PC clusters]

Host file:

<?xml version="1.0" ?>
<OmniRpcConfig>
  <Host name="dennis.omni.hpcc.jp">
    <Agent invoker="globus" mxio="on"/>
    <JobScheduler type="rr" maxjob="20"/>
  </Host>
</OmniRpcConfig>
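The slides note that the same client can also target “ssh”-managed nodes just by editing the host file. A plausible ssh variant of the example above might look like the following; the hostname and maxjob value are invented, and the attribute layout simply mirrors the globus example.

```xml
<?xml version="1.0" ?>
<OmniRpcConfig>
  <!-- hypothetical host reached over plain ssh instead of Globus GRAM -->
  <Host name="cluster.example.ac.jp">
    <Agent invoker="ssh"/>
    <JobScheduler type="rr" maxjob="8"/>
  </Host>
</OmniRpcConfig>
```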

Page 10: “Grid Platform for Drug Discovery” Project


Overview of OmniRPC (cont.)

• Easy-to-use parallel programming interface
  – A Grid RPC based on the Ninf Grid RPC
  – Parallel programming using an asynchronous call API
  – The thread-safe RPC design allows the use of OpenMP in client programs
• Supports master-workers parallel programs for parametric search grid applications
  – Persistent data support in remote workers for applications that require large data
• Monitor and performance tools

int main(int argc, char **argv)
{
    int i, A[100][100], B[100][100][100], C[100][100][100];
    OmniRpcRequest reqs[100];

    OmniRpcInit(&argc, &argv);
    for (i = 0; i < 100; i++)
        reqs[i] = OmniRpcCallAsync("mul", 100, B[i], A, C[i]);
    OmniRpcWaitAll(100, reqs);
    /* ... */
    OmniRpcFinalize();
    return 0;
}

Page 11: “Grid Platform for Drug Discovery” Project


OmniRPC features

• Need Globus?
  – No, you can use “ssh” as well as “globus”.
  – This is very useful for application people.
  – “ssh” also solves the firewall problem.
• Data persistence model?
  – Parametric search applications need to share the initial data.
  – OmniRPC supports it.
• Can it use many (remote) clusters?
  – Yes, OmniRPC supports a “cluster of clusters”.
• How to use it on different machines and environments?
  – You can switch the configuration by “config file” without modifying the source program.
• Why not the “Grid RPC” standard?
  – OmniRPC provides a high-level interface, to hide “scheduling” and “fault tolerance” from users.

Page 12: “Grid Platform for Drug Discovery” Project


OmniRPC Home Page

http://www.omni.hpcc.jp/omnirpc/

Page 13: “Grid Platform for Drug Discovery” Project


Conflex from Cluster to Grid

• For large biomolecules, the number of combinatorial trial structures is huge!
• Geometry optimization of large molecular structures requires more time to compute!
• The geometry optimization phase takes more than 90% of the total execution time
• So far, executed on a PC cluster using MPI

The Grid allows us to use huge computing resources to overcome these problems!

Page 14: “Grid Platform for Drug Discovery” Project


Our Grid Platform

• Univ. of Tsukuba: Dennis Cluster (dual P4 Xeon 2.4 GHz, 10 nodes), Alice Cluster (dual Athlon 1800+, 14 nodes)
• AIST: UME Cluster (dual P3 1.4 GHz, 32 nodes)
• Tokushima Univ.: Toku Cluster (P3 1.0 GHz, 8 nodes)
• Toyohashi Univ. of Tech.: Toyo Cluster (dual Athlon 2000+, 8 nodes)

Connected via the Tsukuba WAN and SINET.

Page 15: “Grid Platform for Drug Discovery” Project


Summary of Our Grid Environment

Cluster   Machine                 # of nodes   Throughput (MB/s)*   RTT (ms)*
Dennis    Dual P4 Xeon 2.4 GHz    10           --                   --
Alice     Dual Athlon 1800+       14           11.22                0.18
UME       Dual P3 1.4 GHz         32           2.12                 2.73
Toyo      Dual Athlon 2000+       8            0.55                 13.00
Toku      P3 1.0 GHz              8            0.69                 24.40

* RTT = round-trip time; all measurements are between the Dennis cluster and each cluster.

Page 16: “Grid Platform for Drug Discovery” Project


CONFLEX-G: Grid-enabled CONFLEX

• Parallelizes the molecular geometry optimization phase using the master/worker model
• The OmniRPC persistent data model (automatic initializable remote module facility) allows workers to be reused for each call
  – Eliminates initializing the worker program at every RPC

[Figure: CONFLEX loop (selection of initial structure → local perturbation → geometry optimization → comparison & store → conformation database), with geometry optimization farmed out to PC clusters A, B, and C]

Page 17: “Grid Platform for Drug Discovery” Project


Experiment Setting

• CONFLEX version: 402q
• Test data: two molecular samples
  – C17 (51 atoms)
  – AlaX16a (181 atoms)
• Authentication method: SSH
• The CONFLEX-G client program was executed on the server node of the Dennis cluster
• We used all nodes in the clusters of our grid

Page 18: “Grid Platform for Drug Discovery” Project


Sample Molecules

• C17 (51 atoms): 48 trial structures at one optimization phase (degree of parallelism); avg. 1.6 s to optimize one trial structure; 522 optimized trial structures in total; estimated total execution time 835 s on a single Dennis CPU
• AlaX16a (181 atoms): 160 trial structures at one optimization phase (degree of parallelism); avg. 300 s per trial structure; 320 optimized trial structures in total; estimated total execution time 96000 s (= 26.7 h) on a single Dennis CPU

Page 19: “Grid Platform for Drug Discovery” Project


Comparison between OmniRPC and MPI in the Dennis Cluster

[Figure: total execution time (s) for C17 (51 atoms, degree of parallelism 48) versus number of workers (1, 2, 4, 8, 16, 20), comparing Sequential, MPI, and OmniRPC. The chart shows the overhead of on-demand initialization of the worker program in OmniRPC, and a 10x speedup using OmniRPC.]

Page 20: “Grid Platform for Drug Discovery” Project


Execution time of AlaX16a (181 atoms, degree of parallelism 160)

[Figure: total execution time (s) for the configurations Dennis MPI (20w), Dennis (20w), Alice (28w), UME (64w), Dennis+Alice (48w), Dennis+UME (84w), Alice+UME (92w), and Dennis+Alice+UME (112w); the last achieves a 64x speedup.]

Page 21: “Grid Platform for Drug Discovery” Project


Discussion

• The performance of CONFLEX-G was observed to be almost equal to that of CONFLEX with MPI
  – Overhead to initialize workers was found; it needs to be improved
• We achieved a performance improvement using multiple clusters
  – A speedup of 64 on 112 workers for AlaX16a (181 atoms)
  – However, in our experiment:
    • Each worker takes only one or two trial structures, which is too few!
    • Load imbalance occurs because the execution time of each optimization varies
• We expect more speedup for larger molecules

Page 22: “Grid Platform for Drug Discovery” Project


Discussion (cont’d)

• Possible improvements:
  – Exploit more parallelism
    • Parallelize the outer loop to increase the number of structure optimizations at a time
  – Efficient job scheduling
    • Heavy jobs -> fast machines; light jobs -> slow machines
    • Can we estimate execution time?
  – Parallelize the worker program with SMP (OpenMP)
    • Increases the performance of each worker
    • Reduces the number of workers

Page 23: “Grid Platform for Drug Discovery” Project


Summary and Future Work

• Conflex-G: Grid-enabled molecular conformation search
  – We used OmniRPC to make it Grid-enabled.
  – We are now doing production runs.
• For MO simulation (docking), we are working on coarse-grain MO as well as job submission
  – REMD (replica exchange program using NAMD)
  – FMO (Fragment MO)
• For QSAR
  – Design of a markup language (ML) to describe computation results
  – Web service interface to access the database