Stephen Friend, Koo Foundation / Sun Yat-Sen Cancer Center, 2012-03-12
Post on 28-Nov-2014
Moving beyond linear investigations, both of the science and of how we work
Integrating layers of omics data models and building compute spaces capable of enabling models to be evolved by teams of teams
Koo Foundation / Sun Yat-Sen Cancer Center, March 12, 2012
Stephen Friend, MD, PhD, Sage Bionetworks (Non-Profit Organization)
Seattle / Beijing / Amsterdam
So what is the problem?
Most approved therapies were assumed to be monotherapies for diseases representing homogeneous populations
Our existing disease models often assume pathway knowledge sufficient to infer correct therapies
Familiar but Incomplete
Reality: Overlapping Pathways
Explosion of Biological, Genomic & Clinical Information
• Computational methods for integrating massive molecular and clinical datasets obtained across sizable populations into predictive disease models can recapitulate complex biological systems
• Data should feed and refine a set of models that inform our understanding of disease causality as well as generate new mechanisms, targets, diagnostics and knowledge.
The value of appropriate representations/ maps
Equipment capable of generating massive amounts of data
“Data Intensive” Science- Fourth Scientific Paradigm
Open Information System
IT Interoperability
Host evolving computational models in a “Compute Space”
WHY NOT USE “DATA INTENSIVE” SCIENCE TO BUILD BETTER DISEASE MAPS?
What will it take to understand disease?
DNA RNA PROTEIN (dark matter)
MOVING BEYOND ALTERED COMPONENT LISTS
2002 Can one build a “causal” model?
Preliminary Probabilistic Models - Rosetta / Schadt
Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA
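The table reports the share of trait variance (OFPM) explained by each gene's expression. The slide does not say how that figure was computed. One common reading is the R² of a single-predictor least-squares fit, sketched below under that assumption. The function name and the toy numbers are illustrative and are not taken from the study.

```python
def variance_explained(expression, trait):
    """Return R^2 for a least-squares fit of trait on a single expression profile.

    Assumes both lists have the same length and neither is constant.
    """
    n = len(expression)
    mean_x = sum(expression) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in expression)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expression, trait))
    # For one predictor, R^2 equals the squared Pearson correlation.
    return (sxy ** 2) / (sxx * syy)


# Hypothetical values for five animals (not real measurements).
toy_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_trait = [2.1, 3.9, 6.2, 7.8, 10.1]
r_squared = variance_explained(toy_expression, toy_trait)
```

A value such as 0.68 would correspond to the 68% reported for Zfp90, if the slide's figure was indeed computed this way.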
Networks facilitate direct identification of genes that are causal for disease
Evolutionarily tolerated weak spots
Nat Genet (2005) 205:370
Building Realistic, Predictive Models of Disease: Can this lead us from gene to drug?
Metabolic Disease
"Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
"Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
"Genetics of gene expression and its effect on disease." Nature. (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009)
... Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp. Biology, etc.
CVD
"Identification of pathways for atherosclerosis." Circ Res. (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
... Plus 5 additional papers in Genome Res., Genomics, Mamm. Genome
Bone
"Integrating genotypic and expression data ...for bone traits..." Nat Genet. (2005)
"..approach to identify candidate genes regulating BMD..." J Bone Miner Res. (2009)
Methods
"An integrative genomics approach to infer causal associations ..." Nat Genet. (2005)
"Increasing the power to detect causal associations..." PLoS Comput Biol. (2007)
"Integrating large-scale functional genomic data ..." Nat Genet. (2008)
... Plus 3 additional papers in PLoS Genet., BMC Genet.
Extensive Publications now Substantiating Scientific Approach
Probabilistic Causal Bionetwork Models
• >60 Publications from Rosetta Genetics Group (~30 scientists) over 5 years, including high-profile papers in PLoS, Nature and Nature Genetics
50 network papers http://sagebase.org/research/resources.php
List of Influential Papers in Network Modeling
(Eric Schadt)
“Data Intensive” Science - Fourth Scientific Paradigm: Score Card for Medical Sciences
Equipment capable of generating massive amounts of data: A-
Open Information System: D-
IT Interoperability: D
Host evolving computational models in a “Compute Space”: F
We still conduct much clinical research as if we were “hunter-gatherers”, not sharing
TENURE FEUDAL STATES
Clinical/genomic data are accessible but minimally usable
Little incentive to annotate and curate data for other scientists to use
Mathematical models of disease are not built to be reproduced or versioned by others
Lack of standard forms for future rights and consents
Sage Mission
Sage Bionetworks is a non-profit organization with a vision to create a “commons” where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease
Sagebase.org
Data Repository
Discovery Platform
Building Disease Maps
Commons Pilots
Sage Bionetworks Collaborators
Pharma Partners: Merck, Pfizer, Takeda, AstraZeneca, Amgen, Roche, Johnson & Johnson
Foundations: Kauffman, CHDI, Gates Foundation
Government: NIH, LSDF, NCI
Academic: Levy (Framingham), Rosengren (Lund), Krauss (CHORI)
Federation: Ideker, Califano, Nolan, Schadt
Model of Breast Cancer: Co-expression
A) Miller, 159 samples
B) Christos, 189 samples
C) NKI, 295 samples
D) Wang, 286 samples
E) Super modules
Modules: Cell cycle, Pre-mRNA, ECM, Immune response, Blood vessel
Zhang B et al., Towards a global picture of breast cancer (manuscript).
Data sources:
NKI: N Engl J Med. 2002 Dec 19;347(25):1999.
Wang: Lancet. 2005 Feb 19-25;365(9460):671.
Miller: Breast Cancer Res. 2005;7(6):R953.
Christos: J Natl Cancer Inst. 2006 15;98(4):262.
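A co-expression model groups genes whose expression rises and falls together across samples. The sketch below shows one very simple way to form such groups: link two genes when the absolute Pearson correlation of their profiles passes a threshold, then take connected components as modules. This is an illustration only and is not claimed to be the method used in the manuscript. The gene names, values and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def coexpression_modules(profiles, threshold=0.9):
    """Group genes into modules: connected components of the thresholded graph."""
    neighbors = {gene: set() for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            neighbors[g1].add(g2)
            neighbors[g2].add(g1)

    modules, seen = [], set()
    for gene in profiles:
        if gene in seen:
            continue
        # Depth-first walk collects every gene reachable from this one.
        stack, module = [gene], set()
        while stack:
            current = stack.pop()
            if current in module:
                continue
            module.add(current)
            stack.extend(neighbors[current] - module)
        seen |= module
        modules.append(sorted(module))
    return modules


# Hypothetical expression profiles across four samples.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
    "geneD": [10.0, 2.0, 8.5, 4.0],
}
modules = coexpression_modules(toy_profiles)
```

In this toy input, geneA and geneB track each other, as do geneC and geneD, so two modules result. Comparing modules built independently from several cohorts is one way to arrive at shared "super modules".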
What is this?
Bayesian networks enriched in inflammation genes correlated with disease severity in prefrontal cortex of 250 Alzheimer’s patients.
What does it mean?
Inflammation in AD is an interactive multi-pathway system. More broadly, network structure organizes complex disease effects into coherent sub-systems and can prioritize key genes.
Are you joking?
Gene validation shows novel key drivers increase Abeta uptake and decrease neurite length through an ROS burst (highly relevant to AD pathology).
CHRIS GAITERI - ALZHEIMER’S
RULES GOVERN
PLATFORM
NEW MAPS
PLATFORM: Sage Platform and Infrastructure Builders (Academic, Biotech and Industry IT Partners...)
PILOTS = PROJECTS FOR COMMONS: Data Sharing Commons Pilots (Federation, CCSB, Inspire2Live....)
Why not share clinical/genomic data and model building in the ways currently used by the software industry (power of tracking workflows and versioning)?
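Software teams track versions by giving every state of the code a content-derived identifier and recording which state it was derived from. A minimal sketch of the same idea applied to a dataset and a model follows. It is an illustration only and does not describe how Synapse or any other named platform works. The record fields and function names are hypothetical.

```python
import hashlib
import json


def fingerprint(obj):
    """Content-derived identifier: SHA-256 of a canonical JSON encoding."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def provenance_record(dataset, model_params, parent=None):
    """Describe one model version by what it was built from and what preceded it."""
    return {
        "data_id": fingerprint(dataset),
        "model_id": fingerprint(model_params),
        "parent": parent,  # model_id of the version this one was derived from
    }


# Hypothetical dataset and two successive model configurations.
toy_data = {"samples": ["s1", "s2"], "values": [0.4, 0.9]}
v1 = provenance_record(toy_data, {"method": "regression", "alpha": 0.1})
v2 = provenance_record(
    toy_data, {"method": "regression", "alpha": 0.5}, parent=v1["model_id"]
)
```

Because the identifiers depend only on content, two groups who hold the same data and parameters derive the same identifiers, and any change to either produces a new version that still points back to its parent.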
Leveraging Existing Technologies
Taverna
Addama
tranSMART
Watch What I Do, Not What I Say
Sage Bionetworks Synapse project
Most of the People You Need to Work with Don’t Work with You
My Other Computer is Cloudera Amazon Google
Sage Metagenomics Project
• > 10k genomic and expression standardized datasets indexed in SCR
• Error detection, normalization in mG
• Access raw or processed data via download or API in downstream analysis
• Building towards open, continuous community curation
Processed Data (S3)
Sage Metagenomics using Amazon Simple Workflow
Full case study at http://aws.amazon.com/swf/testimonials/swfsagebio/
Synapse Roadmap
Timeline: Q1-2012, Q2-2012, Q3-2012, Q4-2012, Q1-2013, Q2-2013, Q3-2013, Q4-2013
Releases: Internal Alpha, Public Beta Testing, Synapse 1.0, Synapse 1.5, Future
Tracks: Synapse Platform Functionality; Data / Analysis Capabilities
• Data Repository
• Projects and security
• R integration
• Analysis provenance
• Search
• Controlled Vocabularies
• Governance of restricted data
• 40+ manually curated clinical studies
• 8000+ GEO / Array Express datasets
• Clinical, genomic, compound sensitivity
• Bioconductor and custom R analysis
• TCGA
• METABRIC breast cancer challenge
• Workflow templates
• Publishing figures
• Wiki & collaboration tools
• Integrated management of cloud resources
• Social networking
• User-customized dashboards
• R Studio integration
• Curation tool integration
• Predictive modeling workflows
• Automated processing of common genomics platforms
• TBD: Integrations with other visualization and analysis packages
Four Pilots involving Sage Bionetworks
CTCAP
The Federation
Portable Legal Consent
Sage Congress Project
Clinical Trial Comparator Arm Partnership (CTCAP)
Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.
Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.
Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].
Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create a powerful new tool for drug development.
Started Sept 2010
Sharing clinical/genomic data and analysis will maximize clinical impact and enable discovery
• Graphic: curated data to QC'd data to models
The Federation
(Nolan and Haussler)
Sage Federation: model of biological age
[Figure: Predicted Age (liver expression) plotted against Chronological Age (years). The Age Differential separates Faster Aging from Slower Aging. Panel labels: Clinical Association (Gender, BMI, Disease), Genotype Association, Gene, Pathway, Expression.]
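The figure contrasts an age predicted from liver expression with chronological age. A natural reading, assumed here rather than stated on the slide, is that the age differential is predicted age minus chronological age, with positive values meaning faster aging. The sketch below takes the predicted ages as given and does not show how they are predicted. The subject identifiers and ages are hypothetical.

```python
def age_differentials(subjects):
    """Map subject id -> (differential, label).

    `subjects` maps an id to (predicted_age, chronological_age) in years.
    The differential is assumed to be predicted minus chronological age.
    """
    results = {}
    for subject_id, (predicted, chronological) in subjects.items():
        diff = predicted - chronological
        if diff > 0:
            label = "faster aging"
        elif diff < 0:
            label = "slower aging"
        else:
            label = "on pace"
        results[subject_id] = (diff, label)
    return results


# Hypothetical subjects: (predicted age, chronological age).
toy_subjects = {"s1": (62.0, 55.0), "s2": (40.0, 48.0), "s3": (30.0, 30.0)}
aging = age_differentials(toy_subjects)
```

The differential can then be tested for association with clinical variables or genotype, which is what the remaining panel labels suggest.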
Reproducible science == shareable science
Sweave: combines programmatic analysis with narrative
Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors, Compstat 2002 – Proceedings in Computational Statistics, pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
Federated Aging Project: Combining analysis + narrative = Sweave Vignette
Sage Lab
Califano Lab
Ideker Lab
Shared Data Repository
JIRA: Source code repository & wiki
R code + narrative
PDF(plots + text + code snippets)
Data objects
HTML
Submitted Paper
[Figure: ranked predictive features per compound. Rankings shown: #1 BRAF mut, #2 NRAS mut, #3 KRAS mut; #1 EGFR mut; #1 ERBB2 expr; #1 EGFR mut, #2 CML lineage; #1 CML lineage; #1 HGF expr; #1 MDM2 expr, #2 TP53 mut, #3 CDKN2A copy.]
Can the approach make new discoveries?
For 11/12 compounds, the #1 predictive feature in an unbiased analysis corresponds to the known stratifier of sensitivity
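The slide does not say how features were ranked. As a stand-in, the sketch below ranks features by the absolute Pearson correlation between each feature and compound sensitivity across cell lines, then reads off the top feature. This swapped-in scoring is an assumption made for illustration. The feature values and sensitivities are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def rank_features(features, sensitivity):
    """Return (feature, |correlation|) pairs, strongest association first."""
    scored = [
        (name, abs(pearson(values, sensitivity)))
        for name, values in features.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data for six cell lines: 1 = feature present, 0 = absent.
toy_features = {
    "BRAF mut": [1, 1, 0, 0, 1, 0],
    "KRAS mut": [0, 1, 0, 1, 0, 0],
    "TP53 mut": [1, 0, 1, 0, 1, 0],
}
toy_sensitivity = [0.9, 0.8, 0.1, 0.2, 0.95, 0.15]
ranking = rank_features(toy_features, toy_sensitivity)
top_feature = ranking[0][0]
```

Checking whether `top_feature` matches the known stratifier for each compound gives a tally of the kind quoted on the slide.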
Portable Legal Consent
Activating Patients
Enabling Sharing
Becoming Partners
(John Wilbanks)
weconsent.us
Sage Congress Project April 20 2012
RealNames Parkinson’s Project Fanconi’s Anemia
Revisiting Breast Cancer Prognosis
(Responders Competitions - IBM-DREAM)
Networking Disease Model Building