quantitative medicine feb 2009
1
Quantitative medicine: A “killer app” for grid
Ian Foster
Computation Institute
Argonne National Lab & University of Chicago
2
Thanks in particular to …
Carl Kesselman, Stephan Erberich, Steve Tuecke, Ravi Madduri, Jonathan Silverstein
3
Quantitative medicine is the key to reducing healthcare costs and improving healthcare outcomes
Patients with same diagnosis
4
Quantitative medicine is the key to reducing healthcare costs and improving healthcare outcomes
Patients with same diagnosis
Misdiagnosed
Non-responders, toxic responders
Non-toxic responders
5
Major drugs ineffective for many…
Asthma drugs (beta-2-agonists): 40-70%
Hypertension drugs (ACE inhibitors): 10-30%
Heart failure drugs (beta blockers): 15-25%
Antidepressants (SSRIs): 20-50%
Cholesterol drugs (statins): 30-70%
Source: Amy Miller, Personalized Medicine Coalition
6
[Figure axes: Patient ID Number; Danenberg Tumor Profile Scale]
Colorectal cancer: clinical trial data Salonga et al. Clin Cancer Res 2000; 6: 1322-1327.
Same clinical disease, but different response to same chemotherapy,
depending on gene expression profile
7
8
Personalized medicine is quantitative
Current practice: one size fits all; trial and error
Personalized medicine: the right treatment for the right person at the right time
Source: Amy Miller, Personalized Medicine Coalition
9
To realize the promise of quantitative medicine, we must break down barriers
to information sharing …
Discovering effective
personalized treatments
Determining the right
treatment for the individual
… and deliver new analytical tools to make sense of large quantities of data
10
Why is it hard?
A large, dispersed community
Huge quantities of data
Great diversity of data
Inadequate computing capabilities
Lack of a culture of sharing
Privacy concerns
[Diagram labels: Basic Research; Clinical Practice; Clinical Trials; trial subjects, outcomes; library; outcomes, tissue bank; screening tests; ongoing investigative studies; pathways]
11
Healthcare and infrastructure
Increased recognition that information systems and data understanding are limiting factor
  …much of the promise associated with health IT requires high levels of adoption … and high levels of use of interoperable systems (in which information can be exchanged across unrelated systems) ….
  RAND COMPARE
Health system is complex, adaptive system
  There is no single point(s) of control. System behaviors are often unpredictable and uncontrollable, and no one is “in charge.”
  W. Rouse, NAE Bridge
Need to blur boundary from research to clinical
  …I advocate … a model of virtual integration rather than true vertical integration….
  George Halvorson, CEO Kaiser
12
Virtual organizations
Grids and SOA
13
Children’s Oncology Group Grid
Globus
14
Children’s Oncology Grid: clinical imaging trials (Erberich)
15
Wide-area medical interface service
Maps local medical workflow actions to wide-area ops
  Image workflow, EHR, …
Transparently manages federation of
  Security
  Data replication and recovery
  Data discovery
[Diagram labels: Enterprise/Grid Interface Service; DICOM protocols; Grid protocols (Web services); Plug-in adapters: DICOM, XDS, HL7, Vendor-specific; Wide Area Service Actor]
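The plug-in adapter idea on this slide can be sketched as a small registry that translates a local protocol action into a protocol-neutral wide-area operation. This is only an illustrative sketch: `WideAreaOp`, `InterfaceService`, and `dicom_adapter` are hypothetical names, not part of Globus or of the service shown on the slide, and the mapping of DICOM service names to actions is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class WideAreaOp:
    """A protocol-neutral wide-area operation (hypothetical shape)."""
    action: str    # e.g. "store", "query", "retrieve"
    resource: str  # identifier of the study, record, or document


# An adapter turns (local action, resource) into a WideAreaOp.
Adapter = Callable[[str, str], WideAreaOp]


class InterfaceService:
    """Routes local workflow actions to wide-area ops via plug-in adapters."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}

    def register(self, protocol: str, adapter: Adapter) -> None:
        self._adapters[protocol] = adapter

    def translate(self, protocol: str, local_action: str, resource: str) -> WideAreaOp:
        if protocol not in self._adapters:
            raise KeyError(f"no adapter registered for protocol {protocol!r}")
        return self._adapters[protocol](local_action, resource)


def dicom_adapter(local_action: str, resource: str) -> WideAreaOp:
    # Assumed, illustrative mapping of DICOM service names to neutral actions.
    mapping = {"C-STORE": "store", "C-FIND": "query", "C-MOVE": "retrieve"}
    return WideAreaOp(mapping[local_action], resource)


service = InterfaceService()
service.register("DICOM", dicom_adapter)
op = service.translate("DICOM", "C-STORE", "study-001")
print(op)
```

An XDS, HL7, or vendor-specific adapter would be registered the same way, which is the point of the plug-in design: the wide-area side never sees the local protocol.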
16
17
US National Institutes of Health infrastructure activities
Biomedical Informatics Research Network (BIRN)
  National Center for Research Resources (NCRR)
  General infrastructure, with initial focus on neuroscience applications
cancer Biomedical Informatics Grid (caBIG)
  National Cancer Institute (NCI)
  Initial focus on the cancer research community; BIGhealth initiative seeks to broaden it
Globus
18
19
Service-oriented medicine: caGrid, Introduce, and gRAVI
Introduce
  Define service
  Create skeleton
  Discover types
  Add operations
  Configure security
Grid Remote Application Virtualization Infrastructure (gRAVI)
  Wrap executables
[Diagram labels: Introduce; Appln Service; Repository Service; Index service; Container; steps: Create, Store, Advertise, Discover, Invoke; get results, Transfer GAR, Deploy]
Ohio State University and Argonne/U.Chicago
Globus
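The advertise, discover, and invoke steps in the diagram can be sketched with a toy in-memory registry. This is a sketch under stated assumptions only: `IndexService` and `mean_service` are hypothetical stand-ins, and nothing here reflects the actual caGrid, Introduce, or gRAVI interfaces.

```python
from typing import Callable, Dict, List, Tuple

# A service is modeled as a plain function from a list of numbers to a number.
Endpoint = Callable[[List[float]], float]


class IndexService:
    """Toy in-memory registry standing in for a grid index service."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, Endpoint]] = {}

    def advertise(self, name: str, kind: str, endpoint: Endpoint) -> None:
        # Record the service under its name, tagged "data" or "analytical".
        self._entries[name] = (kind, endpoint)

    def discover(self, kind: str) -> List[str]:
        # Return the names of all advertised services of the given kind.
        return sorted(n for n, (k, _) in self._entries.items() if k == kind)

    def invoke(self, name: str, payload: List[float]) -> float:
        # Call the service and hand back its result.
        _, endpoint = self._entries[name]
        return endpoint(payload)


def mean_service(values: List[float]) -> float:
    # Stand-in for a wrapped executable exposed as an analytical service.
    return sum(values) / len(values)


index = IndexService()
index.advertise("MeanService", "analytical", mean_service)
found = index.discover("analytical")
result = index.invoke(found[0], [2.0, 4.0, 6.0])
print(found, result)
```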
20
As of Oct 19, 2008:
122 participants
105 services (70 data, 35 analytical)
21
Microarray clustering using Taverna
1. Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub
2. Normalize microarray data using GenePattern analytical service: node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService
3. Hierarchical clustering using geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage
[Workflow diagram legend: Workflow in/output; caGrid services; “Shim” services; others]
Wei Tan
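The three workflow steps (retrieve, normalize, cluster) can be mimicked locally to show how the stages chain together. This sketch does not call the caGrid services listed above. The expression values are made up, the log2-and-center normalization is an assumption (the slide does not say what the GenePattern service does), and single linkage is an assumed choice of hierarchical clustering.

```python
import math
from typing import Dict, List, Tuple

Profile = Dict[str, List[float]]


def retrieve() -> Profile:
    # Stand-in for step 1: made-up expression values per gene.
    return {
        "geneA": [1.0, 2.0, 4.0],
        "geneB": [1.0, 2.0, 4.5],
        "geneC": [8.0, 4.0, 1.0],
    }


def normalize(data: Profile) -> Profile:
    # Stand-in for step 2: log2 transform, then center each profile at zero.
    out: Profile = {}
    for gene, values in data.items():
        logs = [math.log2(v) for v in values]
        mean = sum(logs) / len(logs)
        out[gene] = [x - mean for x in logs]
    return out


def distance(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster(data: Profile) -> List[Tuple[Tuple[str, ...], Tuple[str, ...]]]:
    # Stand-in for step 3: agglomerative clustering with single linkage.
    # Returns the sequence of merges, closest pair first.
    clusters: List[Tuple[str, ...]] = [(gene,) for gene in sorted(data)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(data[p], data[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = cluster(normalize(retrieve()))
for left, right in merges:
    print(left, "+", right)
```

In the real workflow each function would be a remote service invocation, with “shim” services converting data formats between steps.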
22
Outsourcing analysis: caBIG’s geWorkbench/TeraGrid interface
R. Madduri, U.Chicago, Taverna team
23
Schizophrenia as a neuropsychiatric model (Potkin, UCI)
A brain illness with subtle structural and functional changes
Active area of imaging research with many competing theories and approaches
Progress hampered by
  Inconsistent data & lack of replications
  Noncomparable imaging techniques
  Small, diverse patient populations
24
Functional BIRN (fBIRN) information integration vision
[Diagram labels: Multi-Site User Query; Data Provenance Information; Derived data processing; FIPS Results; FMRI/MRI Images Processing Pipelines; HIDB(s) (Distributed); Data Grid; fMRI Scanner; Clinical Data Input; DICOM, NIFTI]
25
FBIRN multi-site study, 2006
[Map of sites: UNM, UMN, UI, UCI, BWH, MGH, UCLA, UCSD, Stanford, Duke/UNC, Yale]
[Legend: 3 or 4T site; 1.5T site; Development site]
26
Lessons learned from BIRN (G. Farber)
There is little point in sharing data unless there is community agreement on how to standardize data collection
There continues to be a communications/ease of use gap between computer scientists and biomedical researchers
Sharing heterogeneous data from biomedical experiments is a challenge to existing data sharing infrastructures
Complex queries are a really hard problem
27
Health informatics services model
[Diagram labels: Applications; Decision Support; Analysis; Management; Integration; Publication; Policy and Security; Radiology, Medical Records, Labs, Pathology, Genomics]
Source: Carl Kesselman
28
Decision support for HIV drug ranking (Peter Sloot et al.)
29
[Diagram: ViroLab drug ranking decision support, with these labels]
Clinical parameters: weight; opportunistic infections and tumors; survival
Molecular Dynamics, Binding Affinity
Protein Structure & Binding Affinity
Text Mining, Drug ranking, 1st order logic
Complex Networks, Epidemics
Agent-Based Entry Simulation
Phenotype
CA-Based Immune Response
Protease and RT mutations
30
Virolab: DSS Virtual Laboratory
Users: Experiment developer; Scientist; Clinical Virologist
Interfaces: Experiment Planning Environment; Experiment scenario; ViroLab Portal; Drug Ranking Scenario
Runtime: Virtual Laboratory runtime components (required to select resources and execute experiment scenarios)
Services: Computational services (WS, WSRF, components, jobs); Data services (DAS data sources, standalone databases)
Infrastructure: Grids (EGEE), Clusters, Computers, Network
31
Many many tasks: Identifying potential drug targets
2M+ ligands x protein target(s)
(Mike Kubal, Benoit Roux, and others)
32
[Workflow diagram, from start to report]
Inputs
  PDB protein descriptions: 1 protein (1 MB)
  ZINC 3-D structures: 2M structures (6 GB)
  Manually prep DOCK6 rec file: DOCK6 Receptor (1 per protein: defines pocket to bind to)
  Manually prep FRED rec file: FRED Receptor (1 per protein: defines pocket to bind to)
  NAB script parameters (defines flexible residues, # MD steps); NAB Script Template; BuildNABScript; NAB Script
DOCK6, FRED: ~4M x 60s x 1 cpu, ~60K cpu-hrs (ligands, complexes)
  Select best ~5K
Amber: ~10K x 20m x 1 cpu, ~3K cpu-hrs
  Amber prep: 2. AmberizeReceptor; 4. perl: gen nabscript
  Amber Score: 1. AmberizeLigand; 3. AmberizeComplex; 5. RunNABScript
  Select best ~500
GCMC: ~500 x 10hr x 100 cpu, ~500K cpu-hrs
For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
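The per-stage costs on this slide are simple products of task count, time per task, and CPUs per task. The sketch below redoes that arithmetic. The stage figures come from the slide; the dictionary layout and the 365-day year are assumptions made for illustration. The slide's totals are heavily rounded, so the straight products come out somewhat higher than the quoted 500,000 cpu-hrs.

```python
HOURS_PER_YEAR = 24 * 365  # assumed 365-day year

# (tasks, hours per task, CPUs per task), taken from the slide's stage labels.
stages = {
    "DOCK6/FRED": (4_000_000, 60 / 3600, 1),  # ~4M x 60 s x 1 cpu
    "Amber": (10_000, 20 / 60, 1),            # ~10K x 20 min x 1 cpu
    "GCMC": (500, 10, 100),                   # ~500 x 10 hr x 100 cpu
}

# CPU-hours for each stage: tasks x hours per task x CPUs per task.
stage_hours = {
    name: tasks * hours * cpus
    for name, (tasks, hours, cpus) in stages.items()
}

total_tasks = sum(tasks for tasks, _, _ in stages.values())
total_cpu_hours = sum(stage_hours.values())
total_cpu_years = total_cpu_hours / HOURS_PER_YEAR

for name, hours in stage_hours.items():
    print(f"{name}: {hours:,.0f} cpu-hrs")
print(f"tasks: {total_tasks:,}")
print(f"total: {total_cpu_hours:,.0f} cpu-hrs = {total_cpu_years:.1f} cpu-years")
```

The GCMC stage dominates the total even though it has the fewest tasks, which is why the early docking stages are used to filter 2M+ ligands down to a few hundred candidates.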
33
DOCK on BG/P: ~1M tasks on 118,000 CPUs
CPU cores: 118784
Tasks: 934803
Elapsed time: 7257 sec
Compute time: 21.43 CPU years
Average task time: 667 sec
Relative Efficiency: 99.7% (from 16 to 32 racks)
Utilization: Sustained: 99.6%; Overall: 78.3%
• GPFS
  • 1 script (~5KB)
  • 2 file read (~10KB)
  • 1 file write (~10KB)
• RAM (cached from GPFS on first task per node)
  • 1 binary (~7MB)
  • Static input data (~45MB)
Ioan Raicu, Zhao Zhang, Mike Wilde
[Plot x-axis: Time (secs)]
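The overall utilization figure can be cross-checked from the other numbers on the slide. The sketch below assumes that overall utilization means compute time divided by the CPU time available (cores x elapsed time) and that a CPU year is 365 days; neither definition is stated on the slide. Under those assumptions the ratio lands close to the reported 78.3%.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumed 365-day year

# Figures reported on the slide.
cores = 118_784
tasks = 934_803
elapsed_s = 7_257
compute_cpu_years = 21.43

# CPU-seconds that were available versus CPU-seconds spent computing.
available_cpu_s = cores * elapsed_s
used_cpu_s = compute_cpu_years * SECONDS_PER_YEAR

# Assumed definition: fraction of available CPU time spent in tasks.
overall_utilization = used_cpu_s / available_cpu_s

# Average rate at which tasks completed over the whole run.
throughput = tasks / elapsed_s

print(f"overall utilization: {overall_utilization:.1%}")
print(f"throughput: {throughput:.1f} tasks/sec")
```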
34
NAE Grand Challenges
35
Computation Institute
www.ci.uchicago.edu
www.ci.anl.gov
Thank you!