quantitative medicine feb 2009

1

Quantitative medicine: A “killer app” for grid


Ian Foster

Computation Institute

Argonne National Lab & University of Chicago

2

Thanks in particular to …

Carl Kesselman, Stephan Erberich, Steve Tuecke, Ravi Madduri, Jonathan Silverstein

3

Quantitative medicine is the key to reducing healthcare costs and improving healthcare outcomes

Patients with same diagnosis

4

Quantitative medicine is the key to reducing healthcare costs and improving healthcare outcomes

Patients with same diagnosis

Misdiagnosed

Non-responders, toxic responders

Non-toxic responders

5

Major drugs ineffective for many…

Asthma drugs (beta-2-agonists): 40-70%
Hypertension drugs (ACE inhibitors): 10-30%
Heart failure drugs (beta blockers): 15-25%
Antidepressants (SSRIs): 20-50%
Cholesterol drugs (statins): 30-70%

Source: Amy Miller, Personalized Medicine Coalition

6

Colorectal cancer: clinical trial data

[Figure axes: Patient ID Number; Danenberg Tumor Profile Scale]

Salonga et al. Clin Cancer Res 2000; 6: 1322-1327.

Same clinical disease, but different response to same chemotherapy, depending on gene expression profile

8

Current Practice: one size fits all; trial and error

Personalized Medicine: the right treatment for the right person at the right time

Personalized medicine is quantitative

Source: Amy Miller, Personalized Medicine Coalition

9

To realize the promise of quantitative medicine, we must break down barriers to information sharing …

Discovering effective personalized treatments

Determining the right treatment for the individual

… and deliver new analytical tools to make sense of large quantities of data

10

Why is it hard?

A large, dispersed community
Huge quantities of data
Great diversity of data
Inadequate computing capabilities
Lack of a culture of sharing
Privacy concerns

[Figure: Basic Research, Clinical Trials, and Clinical Practice, connected by flows labeled: trial subjects, outcomes; library; outcomes, tissue banks; screening tests; ongoing investigative studies; pathways]

11

Healthcare and infrastructure

Increased recognition that information systems and data understanding are limiting factors
…much of the promise associated with health IT requires high levels of adoption … and high levels of use of interoperable systems (in which information can be exchanged across unrelated systems) ….
RAND COMPARE

Health system is a complex, adaptive system
There is no single point(s) of control. System behaviors are often unpredictable and uncontrollable, and no one is “in charge.”
W. Rouse, NAE Bridge

Need to blur boundary from research to clinical
…I advocate … a model of virtual integration rather than true vertical integration….
George Halvorson, CEO Kaiser

12

Virtual organizations

Grids and SOA

13

Children’s Oncology Group Grid

Globus

14

Children’s Oncology Grid: clinical imaging trials (Erberich)

15

Wide-area medical interface service

Maps local medical workflow actions to wide area ops
  Image workflow, EHR, …

Transparently manages federation of
  Security
  Data replication and recovery
  Data discovery

[Figure labels: Enterprise/Grid Interface Service; DICOM protocols; Grid protocols (Web services); Wide Area Service Actor; plug-in adapters: DICOM, XDS, HL7, vendor-specific]
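The plug-in adapter idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual interface service: the class names, the wide-area operation names ("replicate", "discover"), and the mapping from DICOM actions to those operations are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class WideAreaOp:
    """A protocol-neutral operation forwarded to the wide-area (grid) side."""
    name: str       # hypothetical operation name, e.g. "replicate"
    payload: dict


# An adapter translates one local protocol's action into a WideAreaOp.
Adapter = Callable[[str, dict], WideAreaOp]


class InterfaceService:
    """Routes local workflow actions through protocol-specific plug-ins."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}

    def register(self, protocol: str, adapter: Adapter) -> None:
        self._adapters[protocol] = adapter

    def handle(self, protocol: str, action: str, data: dict) -> WideAreaOp:
        try:
            adapter = self._adapters[protocol]
        except KeyError:
            raise ValueError(f"no adapter registered for {protocol!r}") from None
        return adapter(action, data)


def dicom_adapter(action: str, data: dict) -> WideAreaOp:
    # Illustrative mapping only: storing an image locally triggers replication,
    # and a local query becomes a wide-area discovery request.
    mapping = {"C-STORE": "replicate", "C-FIND": "discover"}
    return WideAreaOp(mapping[action], {"source": "DICOM", **data})


service = InterfaceService()
service.register("DICOM", dicom_adapter)
op = service.handle("DICOM", "C-STORE", {"study": "example-study"})
print(op.name, op.payload)
```

Adapters for XDS, HL7, or a vendor-specific interface would be registered the same way, which keeps the local workflow unaware of the wide-area protocols.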

17

US National Institutes of Health infrastructure activities

Biomedical Informatics Research Network (BIRN)
  National Center for Research Resources (NCRR)
  General infrastructure, with initial focus on neuroscience applications

cancer Biomedical Informatics Grid (caBIG)
  National Cancer Institute (NCI)
  Initial focus on the cancer research community; BIGhealth initiative seeks to broaden it

Globus

19

Service oriented medicine: caGrid, Introduce, and gRAVI

Introduce
  Define service
  Create skeleton
  Discover types
  Add operations
  Configure security

Grid Remote Application Virtualization Infrastructure (gRAVI)
  Wrap executables

[Figure labels: Appln Service; Introduce; Create; Store; Repository Service; Advertise; Index service; Discover; Invoke; get results; Container; Transfer GAR; Deploy]

Ohio State University and Argonne/U.Chicago

Globus
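The advertise, discover, and invoke steps, together with the idea of wrapping an executable as a service, can be sketched as follows. This is a toy, in-process stand-in and makes no use of the caGrid, Introduce, or gRAVI APIs; `IndexService`, `wrap_executable`, and the service name `text/upper` are hypothetical names chosen for illustration.

```python
import subprocess
import sys
from typing import Callable, Dict, List

Service = Callable[[List[str]], str]


class IndexService:
    """Toy in-memory stand-in for an index service."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def advertise(self, name: str, service: Service) -> None:
        self._services[name] = service

    def discover(self, keyword: str) -> List[str]:
        return sorted(n for n in self._services if keyword in n)

    def invoke(self, name: str, args: List[str]) -> str:
        return self._services[name](args)


def wrap_executable(command: List[str]) -> Service:
    """Wrap a command-line program as a callable that returns its stdout."""

    def service(args: List[str]) -> str:
        result = subprocess.run(
            command + args, capture_output=True, text=True, check=True
        )
        return result.stdout.strip()

    return service


# Hypothetical "application": the Python interpreter upper-casing its argument.
upper = wrap_executable(
    [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]
)

index = IndexService()
index.advertise("text/upper", upper)      # advertise
found = index.discover("upper")           # discover
result = index.invoke(found[0], ["acgt"])  # invoke; get results
print(found, result)
```

A real deployment would add the steps this sketch omits: packaging, transfer to a container, and security configuration.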

20

As of Oct 19, 2008:

122 participants
105 services: 70 data, 35 analytical

21

Microarray clustering using Taverna

1. Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub

2. Normalize microarray data using GenePattern analytical service: node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService

3. Hierarchical clustering using geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage

[Figure legend: workflow in/output; caGrid services; “Shim” services; others]

Wei Tan
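The three-step shape of this workflow (query, normalize, cluster) can be illustrated locally. The sketch below makes no network calls and does not reproduce the caArray, GenePattern, or geWorkbench services: the toy data, the row-wise z-score normalization, and the single-linkage clustering are stand-ins chosen for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def query_microarray() -> Matrix:
    # Stand-in for step 1: hard-coded toy expression values (genes x samples).
    return [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [9.0, 1.0, 5.0]]


def normalize(rows: Matrix) -> Matrix:
    # Stand-in for step 2: z-score each row (assumes no row is constant).
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        sd = (sum((x - mean) ** 2 for x in row) / len(row)) ** 0.5
        out.append([(x - mean) / sd for x in row])
    return out


def hierarchical_clustering(rows: Matrix) -> List[Tuple[int, ...]]:
    # Stand-in for step 3: single-linkage agglomerative clustering.
    # Returns the merged clusters in the order they were formed.
    def dist(a: int, b: int) -> float:
        return sum((x - y) ** 2 for x, y in zip(rows[a], rows[b])) ** 0.5

    clusters: List[Tuple[int, ...]] = [(i,) for i in range(len(rows))]
    merges: List[Tuple[int, ...]] = []
    while len(clusters) > 1:
        _, a, b = min(
            (min(dist(i, j) for i in ca for j in cb), a, b)
            for a, ca in enumerate(clusters)
            for b, cb in enumerate(clusters)
            if a < b
        )
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        merges.append(merged)
    return merges


# Chain the three steps, each consuming the previous step's output.
merges = hierarchical_clustering(normalize(query_microarray()))
print(merges)
```

Rows 0 and 1 have the same profile after normalization, so they merge first; in a workflow system each function would instead be a call to a remote service, with "shim" steps converting data formats in between.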

22

Outsourcing analysis: caBIG’s geWorkbench/TeraGrid interface

R. Madduri, U.Chicago, Taverna team

23

Schizophrenia as a neuropsychiatric model (Potkin, UCI)

A brain illness with subtle structural and functional changes

Active area of imaging research with many competing theories and approaches

Progress hampered by
  Inconsistent data & lack of replications
  Noncomparable imaging techniques
  Small, diverse patient populations

24

Functional BIRN (fBIRN) information integration vision

[Figure labels: fMRI Scanner; Clinical Data Input; DICOM, NIFTI; FMRI/MRI Images Processing Pipelines; Derived data processing; Data Provenance Information; HIDB(s) (Distributed); Data Grid; Multi-Site User Query; FIPS Results]

25

FBIRN multi-site study, 2006

[Map of sites: UNM, UMN, UI, UCI, BWH, MGH, UCLA, UCSD, Stanford, Duke/UNC, Yale. Legend: 3 or 4T site; 1.5T site; development site]

26

Lessons learned from BIRN (G. Farber)

There is little point in sharing data unless there is community agreement on how to standardize data collection

There continues to be a communications/ease of use gap between computer scientists and biomedical researchers

Sharing heterogeneous data from biomedical experiments is a challenge to existing data sharing infrastructures

Complex queries are a really hard problem

27

Health informatics services model

[Figure labels: Applications; Decision Support; Analysis; Management; Integration; Publication; Policy and Security; data sources: Radiology, Medical Records, Labs, Pathology, Genomics]

Source: Carl Kesselman

28

Decision support for HIV drug ranking (Peter Sloot et al.)

29

[Figure: ViroLab drug ranking decision support. Labels: Clinical parameters (weight; opportunistic infections and tumors; survival); Molecular dynamics, binding affinity; Protein structure & binding affinity; Text mining; Drug ranking, 1st order logic; Complex networks, epidemics; Agent-based entry simulation; Phenotype; CA-based immune response; Protease and RT mutations]

30

Virolab: DSS Virtual Laboratory

Users: Experiment developer; Scientist; Clinical Virologist

Interfaces: Experiment Planning Environment; Experiment scenario; ViroLab Portal; Drug Ranking Scenario

Runtime: Virtual Laboratory runtime components (required to select resources and execute experiment scenarios)

Services: Computational services (WS, WSRF, components, jobs); Data services (DAS data sources, standalone databases)

Infrastructure: Grids (EGEE), Clusters, Computers, Network

31

Many many tasks: Identifying potential drug targets

2M+ ligands x Protein target(s)

(Mike Kubal, Benoit Roux, and others)

32

[Figure: docking workflow, from start to report]

Inputs
  PDB protein descriptions: 1 protein (1 MB)
  ZINC 3-D structures: 2M structures (6 GB)
  Manually prep DOCK6 rec file: DOCK6 Receptor (1 per protein: defines pocket to bind to)
  Manually prep FRED rec file: FRED Receptor (1 per protein: defines pocket to bind to)
  NAB script parameters (defines flexible residues, # MD steps): BuildNABScript, NAB Script Template, NAB Script

Stages
  DOCK6, FRED: ~4M x 60s x 1 cpu, ~60K cpu-hrs; select best ~5K (twice)
  Amber: ~10K x 20m x 1 cpu, ~3K cpu-hrs; select best ~500
    Amber prep: 2. AmberizeReceptor; 4. perl: gen nabscript
    Amber Score: 1. AmberizeLigand; 3. AmberizeComplex; 5. RunNABScript
  GCMC: ~500 x 10hr x 100 cpu, ~500K cpu-hrs

Intermediate data: ligands, complexes

For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
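The per-stage cost estimates on this slide follow from multiplying task count, time per task, and CPUs per task. The short calculation below redoes that arithmetic from the slide's rounded inputs; the variable names and the 365-day year are assumptions made for the sketch.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year

# cpu-hours per stage = tasks x time per task (in hours) x CPUs per task
stages = {
    "DOCK6/FRED": 4_000_000 * 60 / SECONDS_PER_HOUR,  # ~4M x 60 s x 1 cpu
    "Amber": 10_000 * 20 / 60,                        # ~10K x 20 min x 1 cpu
    "GCMC": 500 * 10 * 100,                           # ~500 x 10 hr x 100 cpu
}

total = sum(stages.values())
for name, hours in stages.items():
    print(f"{name}: {hours:,.0f} cpu-hrs")
print(f"total: {total:,.0f} cpu-hrs, {total / HOURS_PER_YEAR:.0f} cpu-years")
```

Computed this way, the GCMC stage dominates the total, and the sum comes out somewhat above the slide's rounded 500,000 cpu-hrs because the slide rounds each stage down.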

33

DOCK on BG/P: ~1M tasks on 118,000 CPUs

CPU cores: 118784
Tasks: 934803
Elapsed time: 7257 sec
Compute time: 21.43 CPU years
Average task time: 667 sec
Relative Efficiency: 99.7% (from 16 to 32 racks)
Utilization: Sustained: 99.6%; Overall: 78.3%

• GPFS
  • 1 script (~5KB)
  • 2 file read (~10KB)
  • 1 file write (~10KB)
• RAM (cached from GPFS on first task per node)
  • 1 binary (~7MB)
  • Static input data (~45MB)

Ioan Raicu, Zhao Zhang, Mike Wilde

[Plot x-axis: Time (secs)]
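The overall utilization figure can be cross-checked against the other numbers on the slide. The sketch below assumes that overall utilization means compute time divided by the CPU time available (cores times elapsed time) and that a CPU year is 365 days; neither definition is stated on the slide.

```python
cores = 118_784
elapsed_s = 7_257
compute_cpu_years = 21.43
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

# CPU time that was available if every core had been busy for the whole run.
available_cpu_years = cores * elapsed_s / SECONDS_PER_YEAR

# Assumed definition: fraction of the available CPU time spent computing.
overall_utilization = compute_cpu_years / available_cpu_years

print(f"available: {available_cpu_years:.2f} CPU years")
print(f"overall utilization: {overall_utilization:.1%}")
```

Under these assumptions the result lands within about a tenth of a percentage point of the reported 78.3%, with the small gap attributable to rounding in the slide's inputs.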

34

NAE Grand Challenges


35

Computation Institute
www.ci.uchicago.edu
www.ci.anl.gov

Thank you!
