stephen friend koo foundation / sun yat-sen cancer center 2012-03-12

Post on 28-Nov-2014

768 Views

Category:

Health & Medicine

2 Downloads

Preview:

Click to see full reader

DESCRIPTION

Stephen Friend, Mar 12, 2012. Koo Foundation / Sun Yat-Sen Cancer Center, Taipei, Taiwan

TRANSCRIPT

Moving beyond linear investigations Both of the science and of how we work

Integrating layers of omics data models and building compute spaces capable of enabling models

to be evolved by teams of teams

Koo Foundation / Sun Yat-Sen Cancer Center March12, 2012

Stephen Friend MD PhD Sage Bionetworks (Non-Profit Organization)

Seattle/ Beijing/ Amsterdam

So  what  is  the  problem?  

     Most  approved  therapies  were  assumed  to  be  monotherapies  for  diseases  represen4ng  homogenous  popula4ons  

 Our  exis4ng  disease  models  o9en  assume  pathway  knowledge  sufficient  to  infer  correct  therapies  

Familiar but Incomplete

Reality: Overlapping Pathways

Explosion  of  Biological  Genomic  &  Clinical  Informa<on  

•  Computa<onal  methods  for  integra<ng  massive  molecular  and  clinical  datasets  obtained  across  sizable  popula<ons  into  predic<ve  disease  models  can  recapitulate  complex  biological  systems  

•  Data  should  feed  and  refine  a  set  of  models  that  inform  our  understanding  of  disease  causality  as  well  as  generate  new  mechanisms,  targets,  diagnos<cs  and  knowledge.  

The value of appropriate representations/ maps

Equipment capable of generating massive amounts of data

“Data Intensive” Science- Fourth Scientific Paradigm

Open Information System

IT Interoperability

Host evolving computational models in a “Compute Space”

WHY  NOT  USE    “DATA  INTENSIVE”  SCIENCE  

TO  BUILD  BETTER  DISEASE  MAPS?  

what will it take to understand disease?

                   DNA    RNA  PROTEIN  (dark  maWer)    

MOVING  BEYOND  ALTERED  COMPONENT  LISTS  

2002 Can one build a “causal” model?

Preliminary Probabalistic Models- Rosetta /Schadt

Gene symbol Gene name Variance of OFPM explained by gene expression*

Mouse model

Source

Zfp90 Zinc finger protein 90 68% tg Constructed using BAC transgenics Gas7 Growth arrest specific 7 68% tg Constructed using BAC transgenics Gpx3 Glutathione peroxidase 3 61% tg Provided by Prof. Oleg

Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]

Lactb Lactamase beta 52% tg Constructed using BAC transgenics Me1 Malic enzyme 1 52% ko Naturally occurring KO Gyk Glycerol kinase 46% ko Provided by Dr. Katrina Dipple

(UCLA) [13] Lpl Lipoprotein lipase 46% ko Provided by Dr. Ira Goldberg

(Columbia University, NY) [11] C3ar1 Complement component

3a receptor 1 46% ko Purchased from Deltagen, CA

Tgfbr2 Transforming growth factor beta receptor 2

39% ko Purchased from Deltagen, CA

Networks facilitate direct identification of genes that are

causal for disease Evolutionarily tolerated weak spots

Nat Genet (2005) 205:370

Building Realistic, Predictive Models of Disease: Can this lead us from gene to drug?

"Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003) "Variations in DNA elucidate molecular networks that cause disease." Nature. (2008) "Genetics of gene expression and its effect on disease." Nature. (2008)

"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009) ….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp.Biology, etc

"Identification of pathways for atherosclerosis." Circ Res. (2007) "Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)

…… Plus 5 additional papers in Genome Res., Genomics, Mamm.Genome

"Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005) “..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)

"An integrative genomics approach to infer causal associations ...” Nat Genet. (2005) "Increasing the power to detect causal associations… “PLoS Comput Biol. (2007) "Integrating large-scale functional genomic data ..." Nat Genet. (2008) …… Plus 3 additional papers in PLoS Genet., BMC Genet.

d  

Metabolic Disease

CVD

Bone

Methods

Extensive Publications now Substantiating Scientific Approach Probabilistic Causal Bionetwork Models

• >60 Publications from Rosetta Genetics Group (~30 scientists) over 5 years including high profile papers in PLoS Nature and Nature Genetics

  50 network papers   http://sagebase.org/research/resources.php

List of Influential Papers in Network Modeling

(Eric Schadt)

Equipment capable of generating massive amounts of data A-

“Data Intensive” Science- Fourth Scientific Paradigm Score Card for Medical Sciences

Open Information System D-

IT Interoperability D

Host evolving computational models in a “Compute Space F

.

We still consider much clinical research as if we were “hunter gathers”- not sharing

 TENURE      FEUDAL  STATES      

Clinical/genomic data are accessible but minimally usable

Little incentive to annotate and curate data for other scientists to use

Mathematical models of disease are not built to be

reproduced or versioned by others

Lack of standard forms for future rights and consents

Sage Mission

Sage Bionetworks is a non-profit organization with a vision to create a “commons” where integrative bionetworks are evolved by

contributor scientists with a shared vision to accelerate the elimination of human disease

Sagebase.org

Data Repository

Discovery Platform

Building Disease Maps

Commons Pilots

Sage Bionetworks Collaborators

  Pharma Partners   Merck, Pfizer, Takeda, Astra Zeneca, Amgen,Roche, Johnson &Johnson

27

  Foundations   Kauffman CHDI, Gates Foundation

  Government   NIH, LSDF, NCI

  Academic   Levy (Framingham)   Rosengren (Lund)   Krauss (CHORI)

  Federation   Ideker, Califano, Nolan, Schadt

A) Miller 159 samples B) Christos 189 samples

C) NKI 295 samples

D) Wang 286 samples

Cell cycle

Pre-mRNA

ECM

Immune response

Blood vessel

E) Super modules

Zhang B et al., Towards a global picture of breast cancer (manuscript).

28

NKI: N Engl J Med. 2002 Dec 19;347(25):1999.

Wang: Lancet. 2005 Feb 19-25;365(9460):671.

Miller: Breast Cancer Res. 2005;7(6):R953.

Christos: J Natl Cancer Inst. 2006 15;98(4):262.

Model of Breast Cancer: Co-expression

What  is  this?  

Bayesian  networks  enriched  in  inflamma<on  genes    correlated  with  disease  severity  in  pre-­‐frontal  cortex  of  250  Alzheimer’s  pa<ents.  

What  does  it  mean?  

Inflamma<on    in  AD  is  an  interac<ve  mul<-­‐pathway  system.    More  broadly,  network  structure  organizes  complex  disease  effects  into  coherent  sub-­‐systems  and  can  priori<ze  key  genes.  

Are  you  joking?  

Gene  valida<on  shows  novel  key  drivers  increase  Abeta  uptake  and  decrease  neurite  length  through  an  ROS  burst.  (highly  relevant  to  AD  pathology)  

CHRIS  GAITERI-­‐ALZHEIMER’S  

RULES GOVERN

PLAT

FORM

NEW

MAP

S

PLATFORM Sage Platform and Infrastructure Builders-

( Academic Biotech and Industry IT Partners...)

PILOTS= PROJECTS FOR COMMONS Data Sharing Commons Pilots-

(Federation, CCSB, Inspire2Live....)

Why not share clinical /genomic data and model building in the ways currently used by the software industry (power of tracking workflows and versioning

Leveraging Existing Technologies

Taverna

Addama

tranSMART

Watch What I Do, Not What I Say sage bionetworks synapse project

Most of the People You Need to Work with Don’t Work with You

sage bionetworks synapse project

My Other Computer is Cloudera Amazon Google

sage bionetworks synapse project

Sage Metagenomics Project

•  > 10k genomic and expression standardized datasets indexed in SCR •  Error detection, normalization in mG •  Access raw or processed data via download or API in downstream analysis •  Building towards open, continuous community curation

Processed Data (S3)

Sage Metagenomics using Amazon Simple Workflow

Full case study at http://aws.amazon.com/swf/testimonials/swfsagebio/

Synapse Roadmap

Q1-2012 Q2-2012 Q3-2012 Q4-2012 Q1-2013 Q2-2013

Synapse Platform Functionality

Data / Analysis Capabilities

Q3-2013 Q4-2013

Internal Alpha Public Beta Testing Synapse 1.0 Synapse 1.5 Future

•  Data Repository •  Projects and security •  R integration •  Analysis provenance

• Search • Controlled Vocabularies • Governance of restricted data

•  40+ manually curated clinical studies •  8000 + GEO / Array Express datasets •  Clinical, genomic, compound sensitivity •  Bioconductor and custom R analysis

• TCGA •  METABRIC breast cancer challenge

•  Workflow templates •  Publishing figures •  Wiki & collaboration tools •  Integrated management of cloud resources

•  Social networking •  User-customized dashboards •  R Studio integration •  Curation tool integration

•  Predictive modeling workflows •  Automated processing of common genomics platforms

•  TBD: Integrations with other visualization and analysis packages

CTCAP  The  Federa<on  Portable  Legal  Consent  Sage  Congress  Project  

Four  Pilots  involving  Sage  Bionetworks  

RULES GOVERN

PLAT

FORM

NEW

MAP

S

Clinical Trial Comparator Arm Partnership (CTCAP)

  Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.

  Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.

  Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].

  Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create powerful new tool for drug development.

Started Sept 2010

Shared clinical/genomic data sharing and analysis will maximize clinical impact and enable discovery

•  Graphic  of  curated  to  qced  to  models  

The  Federa<on  

(Nolan  and  Haussler)  

sage federation: model of biological age

Faster Aging

Slower Aging

Clinical Association -  Gender -  BMI -  Disease Genotype Association Gene Pathway Expression Pr

edicted  Age  (liver  expression)  

Chronological  Age  (years)  

Age Differential

Reproducible  science==shareable  science  

Sweave: combines programmatic analysis with narrative

Sweave.Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz,editors, Compstat 2002 –

Proceedings in Computational Statistics,pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9

Dynamic generation of statistical reports using literate data analysis

Federated  Aging  Project  :    Combining  analysis  +  narra<ve    

=Sweave Vignette Sage Lab

Califano Lab Ideker Lab

Shared  Data  Repository  

JIRA:  Source  code  repository  &  wiki  

R code + narrative

PDF(plots + text + code snippets)

Data objects

HTML

Submitted Paper

TP53 mut

CDKN2A copy

MDM2 expr

HGF expr

CML linage EGFR mut

EGFR mut

EGFR mut

CML lineage

ERBB2 expr

BRAF mut

BRAF mut

NRAS mut

BRAF mut

NRAS mut

KRAS mut

BRAF mut

NRAS mut

KRAS mut

#1  BRAF  mut  

#2  NRAS  mut  #1  BRAF  mut  

#3  KRAS  mut  #2  NRAS  mut  #1  BRAF  mut  

#3  KRAS  mut  #2  NRAS  mut  #1  BRAF  mut  

#1  EGFR  mut  

#1  ERBB2  expr  

#1  EGFR  mut  

#2  CML  lineage  #1  EGFR  mut  

#1  CML  lineage  

#1  HGF  expr  

#2  TP53  mut  #3  CDKN2A  copy  #1  MDM2  expr  

Can  the  approach  make  new  discoveries?  

For 11/12 compounds, the #1 predictive feature in an unbiased analysis corresponds to the known stratifier of sensitivity

48  

Portable  Legal  Consent  

Ac<va<ng  Pa<ents  

Enabling  Sharing  

Becoming  Partners  

(John  Wilbanks)  

weconsent.us  

Sage  Congress  Project  April  20  2012  

RealNames  Parkinson’s  Project  Fanconi’s  Anemia  

Revisi<ng  Breast  Cancer  Prognosis  

(Responders  Compe<<ons-­‐  IBM-­‐DREAM)  

Networking  Disease  Model  Building  

top related