cloud computing for chemical property prediction

24
Cloud Computing for Chemical Property Prediction Paul Watson School of Computing Science Newcastle University, UK [email protected] Microsoft Cloud Futures Conference, Redmond, 8 th April am: David Leahy, Jacek Cala, Hugo Hiden, c Searson, Vladimir Sykora, Martyn Taylor, Simon Woodman hanks to: soft External Research for their financial support for Projec tophe Poulain, Savas Parastatidis

Upload: flynn-humphrey

Post on 02-Jan-2016

37 views

Category:

Documents


4 download

DESCRIPTION

Microsoft Cloud Futures Conference, Redmond, 8 th April 2010. Cloud Computing for Chemical Property Prediction. Paul Watson School of Computing Science Newcastle University, UK [email protected]. The team: David Leahy, Jacek Cala, Hugo Hiden, - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Cloud Computing for Chemical Property Prediction

Cloud Computing for Chemical Property Prediction

Paul Watson

School of Computing Science

Newcastle University, UK

[email protected]

Microsoft Cloud Futures Conference, Redmond, 8th April 2010

The team: David Leahy, Jacek Cala, Hugo Hiden,Dominic Searson, Vladimir Sykora, Martyn Taylor, Simon Woodman

With thanks to:- Microsoft External Research for their financial support for Project Junior- Christophe Poulain, Savas Parastatidis

Page 2: Cloud Computing for Chemical Property Prediction

Chemists want to know:

Q1. What are the properties of this molecule?

Toxicity

Solubility

Biological Activity

Q2. What molecule would have aqueous solubility of 0.1 μg/mL?

Page 3: Cloud Computing for Chemical Property Prediction

Answering the Question by performing experiments

..... time consuming, expensive, ethical Issues

Page 4: Cloud Computing for Chemical Property Prediction

Quantitative Structure Activity Relationship - predict properties based on similar molecules

An alternative to experimentation: QSAR

Activity ≈ f( )

quantifiable structural attributes, e.g. #atoms

logp shape

.....

Page 5: Cloud Computing for Chemical Property Prediction

New/ImprovedModels

New DataorModel-Builders

DataModel-Builders

Model Generation

Models

Generating the models -Discovery Bus (Leahy et al)

www.openqsar.com

Page 6: Cloud Computing for Chemical Property Prediction

Selected Descriptors + Responses

Separate Training & Test Data

Chemical Structures & their Activities

Calculate Descriptors from Structures

Training Data

Combine Descriptors

Descriptors + Responses

Filter Descriptors

Combined Descriptors + Responses

MultipleLinear

Regression

Neural Network

Partial Least Squares

Classification Trees .....Build &

Test ModelsIndependently

Select Best Models

Add to Model Database

Test Data

Page 7: Cloud Computing for Chemical Property Prediction

Increasing amounts of data for model building...

CHEMBL : data on 622,824 compounds,collected from 33,956 publications

WOMBAT-PK: data on 1230 compounds,for over 13,000 clinical measurements

WOMBAT : data on 251,560 structures,for over 1,966 targets

All contain structure information & numerical activity data

More models Better models

Computationally expensive:5 years for new datasets on existing server

Page 8: Cloud Computing for Chemical Property Prediction

JUNIOR Project Aim

Use Azure to generate models in weeks not years.... using as much of the available data as possible.... make models available on www.openqsar.com

... so that researchers can generate predictions for their own molecules

Page 9: Cloud Computing for Chemical Property Prediction

Selected Descriptors + Responses

Separate Training & Test Data

Chemical Structures & their Activities

Calculate Descriptors from Structures

Training Data

Combine Descriptors

Descriptors + Responses

Filter Descriptors

Combined Descriptors + Responses

MultipleLinear

Regression

Neural Network

Partial Least Squares

Classification Trees .....Build &

Test ModelsIndependently

Select Best Models

Add to Model Database

Test Data

Potential for concurrency...

Page 10: Cloud Computing for Chemical Property Prediction

Approach

• avoid rewriting all existing Discovery Bus software

• move existing Discovery Bus to Amazon Cloud– without parallelisation

• move critical tasks to run concurrently on Azure• base solution around e-Science Central ....

Page 11: Cloud Computing for Chemical Property Prediction

Clouds to the rescue?

• Building scalable, dependable, science applications is still hard .....

• e-Science Central– Science Cloud Platform

Page 12: Cloud Computing for Chemical Property Prediction

Science Cloud Options

Cloud Infrastructure: Storage & Compute

Science Cloud Platform

ScienceApp 1 .... Science

App n

Users

Cloud Infrastructure:Storage & Compute

Scie

nce

Ap

p 1

....

Scie

nce

Ap

p n

Users

Page 13: Cloud Computing for Chemical Property Prediction

Cloud Infrastructure: Storage & Compute

Science Cloud Platform

ScienceApp 1 .... Science

App n

Users

e-Science Central

Science Cloud Platformfor developers

Science as a Servicefor users

Page 14: Cloud Computing for Chemical Property Prediction

What should the Science Cloud Platform Include?

Store

Analyse

Automate

Share

Data (instruments, experimental data, sensors...)

Identify the common needs of our e-Science Users

Page 15: Cloud Computing for Chemical Property Prediction

Science CloudPlatform

App ....

Workflow Enactment

App API

Social Networking

Security

Processing

e-Science Central

Storage

App

Analysis Services

CloudInfrastructure

Provenance

Metadata

Page 16: Cloud Computing for Chemical Property Prediction
Page 17: Cloud Computing for Chemical Property Prediction
Page 18: Cloud Computing for Chemical Property Prediction
Page 19: Cloud Computing for Chemical Property Prediction

Workflow Enactment

App API

Social Networking

Security

Processing

e-Science Central

Storage

Analysis Services

Azure

Provenance

Metadata

Discovery BusPlanner

Amazon

Page 20: Cloud Computing for Chemical Property Prediction

Discovery Bus invokes e-Science Central Workflow via API

Workflow decomposed to Message Plan1

2

3

Temporary workflow storage assigned, Message Plan queuedfor execution.

Workflow temporary storage

4

Mes

sage

s se

nt in

seq

uenc

e

Call Message

Response Message

Internal Service

Azure Service

NFS

HTTPCall Message

Response Message

RMI / JMS

HTTP Post

5 Results data stored in e-Science Central folder

Workflow ExecutionCompletes

5 Discovery Bus notified with results

Message Plan

Page 21: Cloud Computing for Chemical Property Prediction

Task Task TaskWeb Node

Worker Node

Worker Node

Worker Node

Queue

e-Science Central

Azure

Blob Storage

Results

Page 22: Cloud Computing for Chemical Property Prediction

• Running across up to 100 Azure nodes• Azure utilisation increasing (average ~60% over runs)• Moving more admin tasks to Azure / e-Science Central

– model validation– co-ordination

Current Status – Work in Progress

Page 23: Cloud Computing for Chemical Property Prediction

CPU Utilization

Page 24: Cloud Computing for Chemical Property Prediction

Summary

• Discovery Bus exemplifies a good Cloud pattern– large, variable, bursty requirements– proposal to apply to software verification

• clouds do NOT make it easier to build complex, scalable, dependable distributed systems– we need higher-level “Science Cloud Platforms”– e-Science Central is our attempt at this

• using the Azure Cloud we have a scalable system that can handle large new datasets

• the models are being made freely available– www.openqsar.com