computational challenges in precision medicine and genomics

COMPUTATIONAL CHALLENGES IN PRECISION MEDICINE AND GENOMICS GARY BADER WWW.BADERLAB.ORG GOOGLE WATERLOO, JUNE 9, 2014

Upload: gary-bader

Post on 07-May-2015


DESCRIPTION

Genomics is mapping complex data about human biology and promises major medical advances. In particular, genomics is enabling precision medicine: the use of a patient's genome and physiological state to improve therapeutic efficacy and outcome. However, routine use of genomics data in medical research is in its infancy, mainly because of the challenges of working with "big data". These data are so large and complex that typical researchers cannot cope with them. Processing and interpreting them correctly requires an understanding of many aspects of experimental biology and medicine. Data size is also an issue: an individual researcher may need to handle tens of terabytes (genomes from a few hundred patients), which is challenging to download and store on a typical workstation. To effectively support precision medicine, scientists from a wide range of disciplines, including computer science, must develop algorithms to improve diagnostics and prognostics, genome interpretation, raw data processing and secure high-performance computing.

TRANSCRIPT

Page 1: Computational challenges in precision medicine and genomics

COMPUTATIONAL CHALLENGES IN PRECISION MEDICINE AND GENOMICS

GARY BADER WWW.BADERLAB.ORG

GOOGLE WATERLOO, JUNE 9, 2014

Page 2: Computational challenges in precision medicine and genomics

PRECISION MEDICINE

•  TRADITIONAL MEDICINE, WITH MORE DATA

•  DIAGNOSIS: ASSIGNING PATIENTS TO GROUPS

–  BIOLOGY, DISEASE PROGRESSION, TREATMENT RESPONSE

•  PERSONALIZED, BUT NOT EVERYONE HAS A DIFFERENT DISEASE

NATURE MEDICINE 19, 249 (2013) DOI:10.1038/NM0313-249

Page 3: Computational challenges in precision medicine and genomics

NATIONAL COMPREHENSIVE CANCER NETWORK (NCCN)

Breast Cancer

–  Noninvasive: Lobular Carcinoma In Situ, Ductal Carcinoma In Situ

–  Invasive: Lobular Carcinoma, Ductal Carcinoma, Inflammatory

Page 4: Computational challenges in precision medicine and genomics

IMPROVING PRECISION WITH GENOMICS

•  BRCA1/BRCA2 MUTATIONS PREDICT RISK

•  COMMERCIAL PROGNOSTIC TESTS BASED ON GENE SIGNATURES

HTTP://THEBIGCANDME.BLOGSPOT.CA/

Page 5: Computational challenges in precision medicine and genomics

GENOMICS

•  NEW TECHNOLOGY FOR READING/WRITING DNA

•  MEASURE OUR GENETIC CODE AND SYSTEM STATE

•  LOTS OF VARIABLES

–  WHOLE GENOME, TRANSCRIPT AND PROTEIN EXPRESSION, SPLICING, CHROMATIN STRUCTURE, MOLECULAR INTERACTION, TRANSCRIPTION FACTOR, METHYLATION, METABOLITE, PATIENT PHENOTYPE

Page 6: Computational challenges in precision medicine and genomics


Page 7: Computational challenges in precision medicine and genomics

HTTP://WWW.LHSC.ON.CA/  

SOURCE CODE ON DISK

LOAD TO ACTIVE MEMORY

COMPILER

RUNNING SOFTWARE

ACTIVE MEMORY

4 LETTER CODE (DNA/RNA BASES): GATGGGATTGGGGTTTTCCCCTCCCAT…

20 LETTER CODE (AMINO ACIDS): MEEPQSDPSVEPPLSQETFSDLWKLLPEN…

Page 8: Computational challenges in precision medicine and genomics

A PROTEIN IS A MOLECULAR MACHINE

Page 9: Computational challenges in precision medicine and genomics

DNA SEQUENCING

•  RECENT MASSIVE BREAKTHROUGH

•  CURRENT TECH:

–  ~10 HUMAN GENOMES, 1TB DATA/6 DAY RUN

ILLUMINA, GEORGE CHURCH

Page 10: Computational challenges in precision medicine and genomics

FEB. 1, 2013: DR. LEE HOOD RECEIVES HIS NATIONAL MEDAL OF SCIENCE FROM PRESIDENT OBAMA AT WHITE HOUSE CEREMONY

Page 11: Computational challenges in precision medicine and genomics

MORE BREAKTHROUGHS COMING

WWW.NANOPORETECH.COM

20-NODE INSTALLATION = COMPLETE HUMAN GENOME IN 15 MINUTES

MINION = USB CONNECTION, MINIMAL SAMPLE PREPARATION, $1000 DEVICE + CONSUMABLES

Page 12: Computational challenges in precision medicine and genomics

WHERE DOES THE DATA COME FROM?

MOLECULAR BIOLOGY LABS AROUND THE WORLD

BARODA, INDIA

TORONTO, CANADA

VERMONT, USA

CAMBRIDGE, UK

Page 13: Computational challenges in precision medicine and genomics

THE FACTORY

BGI, >160 MACHINES

Page 14: Computational challenges in precision medicine and genomics

COMPUTING NEEDS: 1 HUMAN GENOME

•  ~125 BASE READ LENGTH X MILLIONS

•  >30X COVERAGE

•  ALIGNMENT TO REFERENCE GENOME

•  COMPUTE VARIANTS (MUTATIONS)

•  ANNOTATE VARIANTS

•  COMPUTE TIME: UP TO 2 DAYS/GENOME

–  OPTIMIZED 4 HOURS: 128G/2CPU/SSD, 3.1GHZ

•  MEDICALLY IMPORTANT TO BE FAST
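The read length and coverage figures above imply a rough read count per genome. The following is a minimal sketch of that arithmetic, using the standard relation coverage = reads x read length / genome size. The genome size of about 3.1 billion bases is an assumption that does not appear on the slide, and the function name is illustrative only.

```python
import math

GENOME_SIZE_BP = 3_100_000_000  # assumption: ~3.1 Gb haploid human genome (not from the slide)
READ_LENGTH_BP = 125            # from the slide: ~125 base read length
TARGET_COVERAGE = 30            # from the slide: >30x coverage


def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Smallest read count N with N * read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


n_reads = reads_needed(GENOME_SIZE_BP, READ_LENGTH_BP, TARGET_COVERAGE)
print(f"{n_reads:,} reads for {TARGET_COVERAGE}x coverage")
```

Under these assumptions the answer is in the hundreds of millions of reads, which is consistent with the "X MILLIONS" on the slide and explains why alignment dominates the compute time.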

Page 15: Computational challenges in precision medicine and genomics

THE POWER OF GENOMICS IN MEDICINE

•  7000 RARE MONOGENIC DISEASES

–  50% HAVE A KNOWN GENE RESPONSIBLE

–  QUADRUPLED RATE OF IDENTIFICATION SINCE 2012

•  BRAIN DOPAMINE-SEROTONIN VESICULAR TRANSPORT DISEASE AND ITS TREATMENT

–  TWO YEARS FROM DISEASE DEFINITION TO GENE IDENTIFICATION TO TREATMENT

NAT REV GENET. 2013 OCT;14(10):681-91

N ENGL J MED. 2013 FEB 7;368(6):543-50

Page 16: Computational challenges in precision medicine and genomics
Page 17: Computational challenges in precision medicine and genomics

NON-INVASIVE PRENATAL TEST

HTTP://WWW.PANORAMATEST.COM/

Page 18: Computational challenges in precision medicine and genomics

CANCER GENOMICS

•  GERM LINE VS. SOMATIC MUTATIONS

•  AIM: IDENTIFY FREQUENT MUTATIONS IN CANCER

•  >11,000 TUMOUR GENOMES, 9M MUTATIONS

HUMAN COLORECTAL CARCINOMA

HTTPS://DCC.ICGC.ORG/  

Page 19: Computational challenges in precision medicine and genomics

COMPUTING CHALLENGES

•  EXPONENTIAL DATA GROWTH (>MOORE’S LAW)

–  BILLIONS OF GENOMES

–  SIZE: >100GB/HUMAN GENOME, 4GB PROCESSED, MBS (JUST MUTATIONS)

•  HETEROGENEOUS, NOISY, COMPLEX DATA

–  DATA SCIENTISTS, DOMAIN EXPERTS

Page 20: Computational challenges in precision medicine and genomics

COMPUTING WILL TRANSFORM MEDICINE

GOAL: IMPROVE PATIENT OUTCOME

Page 21: Computational challenges in precision medicine and genomics

COMPUTATIONAL BIOLOGY

•  RESEARCH: USING COMPUTERS TO ANSWER BIOLOGICAL/BIOMEDICAL QUESTIONS

•  EXPLORE, INTERPRET AND DISCOVER: SEARCH

•  SPEED AND ACCURACY: ALGORITHMS

•  PREDICTING FUNCTIONAL MUTATIONS, PATIENT CLASSIFICATION: MACHINE LEARNING

•  PRIVACY: DIFFERENTIAL PRIVACY, ENCRYPTION

•  USABLE APPLICATIONS: SOFTWARE ENGINEERING

Page 22: Computational challenges in precision medicine and genomics

MedSavant search engine for genetic variants

WWW.MEDSAVANT.COM

Developers: Marc Fiume, James Vlasblom, Ron Ammar, Orion Buske, Eric Smith, Andrew Brook, Misko Dzamba, Khushi Chachcha, Sergiu Dumitriu

Scientific Advisors: Christian Marshall, Kym Boycott, Marta Girdea, Peter Ray, Gary Bader, Michael Brudno

Page 23: Computational challenges in precision medicine and genomics

WWW.MEDSAVANT.COM

Page 24: Computational challenges in precision medicine and genomics

GENOMIC READS

GENOMIC VARIANTS

MARC FIUME, MIKE BRUDNO

Page 25: Computational challenges in precision medicine and genomics

GENOMIC READS

GENOMIC VARIANTS

MARC FIUME, MIKE BRUDNO

Page 26: Computational challenges in precision medicine and genomics

GLIOBLASTOMA MULTIFORME (N=215)

IDENTIFY DISEASE SUBTYPE

GOLDENBERG, BRUDNO NATURE METHODS, 2014

SURVIVAL

CLUSTERING

SPEED

DATA FUSION (NON-LINEAR, MESSAGE PASSING), UNSUPERVISED CLUSTERING

Page 27: Computational challenges in precision medicine and genomics

PREDICT TREATMENT RESPONSE

•  SUPERVISED MACHINE LEARNING E.G. RHEUMATOID ARTHRITIS METHOTREXATE RESPONSE

[Figure: Personal Medical Network. Nodes are patients labelled A or B by response to treatment (Responder, Non-Responder), plus a new patient (Predicted Non-Responder). Edges link patients that are similar, e.g. by SNP or smoking status, and are drawn as weakly similar or highly similar.]

SHIRLEY HUI, RUTH ISSERLIN, HUSSAM KACA, TABITHA KUNG, KATHY SIMINOVITCH
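The idea on this slide, predicting a new patient's response from the responses of similar patients, can be illustrated with a similarity-weighted vote. This is only a sketch of the general idea and not the method used by the authors. The feature names (`snp_a`, `smoker`), the patient records and the agreement-based similarity measure are all hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, int]

# Hypothetical patients with binary features and a known treatment response.
KNOWN_PATIENTS: List[Tuple[Features, str]] = [
    ({"snp_a": 1, "smoker": 0}, "responder"),
    ({"snp_a": 1, "smoker": 0}, "responder"),
    ({"snp_a": 0, "smoker": 1}, "non-responder"),
    ({"snp_a": 0, "smoker": 1}, "non-responder"),
    ({"snp_a": 0, "smoker": 0}, "non-responder"),
]


def similarity(a: Features, b: Features) -> float:
    """Fraction of shared features on which two patients agree (0.0 to 1.0)."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[k] == b[k] for k in shared) / len(shared)


def predict_response(new_patient: Features,
                     known: List[Tuple[Features, str]]) -> str:
    """Each known patient votes for their own label, weighted by similarity."""
    scores: Dict[str, float] = {}
    for features, label in known:
        scores[label] = scores.get(label, 0.0) + similarity(new_patient, features)
    return max(scores, key=scores.get)


new_patient = {"snp_a": 0, "smoker": 1}
print(predict_response(new_patient, KNOWN_PATIENTS))
```

A plain weighted vote favours the larger class, so a real classifier would need to correct for class imbalance and be evaluated on held-out patients.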

Page 28: Computational challenges in precision medicine and genomics

EXPLAINING GENOMICS DATA

•  SNAPSHOTS OF SYSTEM STATE

–  E.G. CANCER VS. NORMAL

•  EXPLAIN WHY STATES DIFFER

–  E.G. REGULATOR PERTURBATION

–  CAUSAL MODELING

–  PRIOR KNOWLEDGE ABOUT MECHANISM: PATHWAYS

WITT H ET AL. CANCER CELL. 2011 AUG 16;20(2):143-57

Page 29: Computational challenges in precision medicine and genomics

GENOME++ MOLECULAR, PHYSIOLOGICAL

PHENOTYPE

ENCODES EXPLAINS

ENVIRONMENT CELL MECHANISM

THE HUMAN BODY

•  A WETWARE COMPUTING SYSTEM

MODULATES

Page 30: Computational challenges in precision medicine and genomics

A PROTEIN IS A MOLECULAR MACHINE

Page 31: Computational challenges in precision medicine and genomics
Page 32: Computational challenges in precision medicine and genomics

1 INTERACTION (EDGE)

Page 33: Computational challenges in precision medicine and genomics

HO ET AL. NATURE 415(6868) 2002

Page 34: Computational challenges in precision medicine and genomics

LOGIC CIRCUIT (PATHWAY)

HTTP://DISCOVER.NCI.NIH.GOV/KOHNK/INTERACTION_MAPS.HTML

Page 35: Computational challenges in precision medicine and genomics

THE CELL

Page 36: Computational challenges in precision medicine and genomics

ALAIN VIEL, HARVARD UNIVERSITY, 2007

Page 37: Computational challenges in precision medicine and genomics

HTTP://WWW.ENDOSZKOP.COM/

~40 TRILLION CELLS, +TRILLIONS OF MICROBES (PARALLEL PROCESSING)

BIANCONI ET AL. ANN HUM BIOL. 2013 NOV-DEC;40(6):471

Page 38: Computational challenges in precision medicine and genomics

[Figure: enrichment map of gene-sets. Node type (gene-set): enriched in deletions (colour scale by FDR, 0% to 12.5%), known disease genes (ID, ASD, both), or enriched only in disease genes. Edge type (gene-set overlap): from disease genes to enriched gene-sets; between gene-sets enriched in deletions; between sets enriched in deletions and in disease genes, or between disease sets only. Labelled clusters include Microtubule Cytoskeleton, Cell Projection & Cell Motility, Cell Proliferation, Glycosylation, Adhesion, Regulation of GTPase, Kinase Activity/Regulation, GTPase/Ras Signaling and CNS Development; disease gene-sets are Intellectual Disability and Autism. Inset: zoom of CNS Development.]

Pinto et al. Functional impact of global rare copy number variation in autism spectrum disorders. Nature. 2010 Jun 9.

Page 39: Computational challenges in precision medicine and genomics

[Figure: the gene-set enrichment map from Page 38, shown again.]

Page 40: Computational challenges in precision medicine and genomics

[Figure: pathway GSi drawn for each patient, with CNV-affected genes marked.]

PATIENT #1: COUNT = 1

PATIENT #2: COUNT = 1

PATIENT #3: COUNT = 1

PATIENT #I: COUNT = 0

•  IF AT LEAST ONE CNV AFFECTS AT LEAST ONE GENE IN A CERTAIN PATHWAY GSI, THEN WE HAVE A PERTURBATION POTENTIAL IN THAT PATHWAY

•  WE COUNT THE PRESENCE / ABSENCE OF SUCH PERTURBATION POTENTIAL IN PATIENTS

       Patient #1   Patient #2   Patient #3   …   Patient #i   …   Patient #n
GS1    1            1            1            …   0            …   0
GS2    0            0            1            …   1            …   0
GS3    0            0            0            …   0            …   0

DANIELE MERICO  

PATHWAY ASSOCIATION TEST
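The presence/absence table above can be built directly from per-patient lists of CNV-affected genes. The sketch below assumes hypothetical gene-set contents and patient data (the `GENE_*` names and patient identifiers are invented for illustration); only the rule itself, a count of 1 when at least one CNV-affected gene falls in the gene-set, comes from the slide.

```python
from typing import Dict, Set

# Hypothetical gene-sets (pathways) and per-patient CNV-affected genes.
GENE_SETS: Dict[str, Set[str]] = {
    "GS1": {"GENE_A", "GENE_B"},
    "GS2": {"GENE_C"},
    "GS3": {"GENE_D"},
}
PATIENT_CNV_GENES: Dict[str, Set[str]] = {
    "patient_1": {"GENE_A"},
    "patient_2": {"GENE_B", "GENE_X"},
    "patient_3": {"GENE_A", "GENE_C"},
    "patient_4": set(),
}


def perturbation_matrix(gene_sets: Dict[str, Set[str]],
                        patient_genes: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Gene-set by patient table: 1 if any CNV-affected gene is in the set, else 0."""
    return {
        gs_name: {
            patient: int(bool(gs_genes & affected))
            for patient, affected in patient_genes.items()
        }
        for gs_name, gs_genes in gene_sets.items()
    }


matrix = perturbation_matrix(GENE_SETS, PATIENT_CNV_GENES)
for gs_name, row in matrix.items():
    print(gs_name, list(row.values()))
```

Collapsing gene-level hits into one binary value per gene-set is what makes patients with different affected genes in the same pathway comparable.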

Page 41: Computational challenges in precision medicine and genomics

DESCRIPTION:

• THE SIGNIFICANCE OF A GENE-SET IS THEN ASSESSED USING FISHER’S EXACT TEST FOR ASSOCIATION

• A SIGNIFICANT GENE-SET IS AFFECTED BY A MUTATION POTENTIAL MORE FREQUENTLY IN CASES THAN CONTROLS

• THE FDR IS ESTIMATED BY SHUFFLING THE COLUMNS IN THE ‘GENE-SET BY PATIENT’ COUNT TABLE

              Case        Control
GSi           13          1
Not in GSi    1146 - 13   889 - 1

       Patient #1   Patient #2   Patient #3   …   Patient #i   …   Patient #n
GS1    1            1            1            …   0            …   0
GS2    0            0            1            …   1            …   0
GS3    0            0            0            …   0            …   0

PATHWAY ASSOCIATION TEST
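For the 2x2 table on this slide (13 of 1146 cases and 1 of 889 controls with a perturbation in GSi), a one-sided Fisher's exact test can be sketched with the standard library alone. Choosing the one-sided alternative (more frequent in cases) is an assumption based on the second bullet; the slide does not say which variant was used. The FDR estimate by column shuffling is not shown here.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: cases with the gene-set perturbed      b: controls with it perturbed
    c: cases without                          d: controls without
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    in_set = a + b          # patients with the gene-set perturbed
    cases = a + c           # total number of cases
    total = a + b + c + d
    k_max = min(in_set, cases)
    favourable = sum(
        comb(in_set, k) * comb(total - in_set, cases - k)
        for k in range(a, k_max + 1)
    )
    return favourable / comb(total, cases)


# Counts from the slide: 13 of 1146 cases, 1 of 889 controls.
p_value = fisher_exact_greater(13, 1, 1146 - 13, 889 - 1)
print(f"one-sided p-value: {p_value:.2e}")
```

Exact integer arithmetic avoids overflow and rounding problems with the large binomial coefficients involved; in practice a statistics library would normally be used instead.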

Page 42: Computational challenges in precision medicine and genomics
Page 43: Computational challenges in precision medicine and genomics

BENEFITS OF SYSTEMS THINKING

•  IMPROVES STATISTICAL POWER

–  FEWER TESTS

•  MORE REPRODUCIBLE

–  E.G. GENE EXPRESSION SIGNATURES

•  EASIER TO INTERPRET

–  FAMILIAR CONCEPTS E.G. CELL CYCLE

•  IDENTIFIES MECHANISM

–  CAN EXPLAIN CAUSE

VS. PARTS THINKING

Page 44: Computational challenges in precision medicine and genomics

DATABASES EXPERIMENTS, PREDICTIONS

LITERATURE EXPERTS

GENOME++ MOLECULAR, PHYSIOLOGICAL

PHENOTYPE

ENCODES EXPLAINS

ENVIRONMENT CELL MECHANISM

MODULATES

Page 45: Computational challenges in precision medicine and genomics

HTTP://PATHWAYCOMMONS.ORG

Page 46: Computational challenges in precision medicine and genomics

THE FACTOID PROJECT

MAX FRANZ, IGOR RODCHENKOV, OZGUN BABUR, EMEK DEMIR, CHRIS SANDER

HELPING AUTHORS DIGITIZE THEIR PUBLISHED KNOWLEDGE

HTTP://FACTOID.BADERLAB.ORG/

Page 47: Computational challenges in precision medicine and genomics

NETWORK VISUALIZATION AND ANALYSIS

UCSD, ISB, AGILENT, MSKCC, PASTEUR, UCSF

HTTP://CYTOSCAPE.ORG

PATHWAY COMPARISON, LITERATURE MINING, GENE ONTOLOGY ANALYSIS, ACTIVE MODULES, COMPLEX DETECTION, NETWORK MOTIF SEARCH

Page 48: Computational challenges in precision medicine and genomics

CYTOSCAPE.JS: HTML5 – TOUCH

CYTOSCAPE.GITHUB.COM/CYTOSCAPE.JS/

MAX FRANZ

Page 49: Computational challenges in precision medicine and genomics

GENE FUNCTION PREDICTION

HTTP://WWW.GENEMANIA.ORG

QUAID MORRIS (DONNELLY) RASHAD BADRAWI, OVI COMES, SYLVA DONALDSON, MAX FRANZ, CHRISTIAN LOPES, FARZANA KAZI, JASON MONTOJO, HAROLD RODRIGUEZ, KHALID ZUBERI

•  GUILT-BY-ASSOCIATION PRINCIPLE

•  BIOLOGICAL NETWORKS ARE COMBINED INTELLIGENTLY TO OPTIMIZE PREDICTION ACCURACY

•  ALGORITHM IS FASTER AND MORE ACCURATE THAN ITS PEERS
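The guilt-by-association principle can be illustrated with a simple neighbour-voting score: a gene is scored by the share of its edge weight that connects it to genes already known to have the function. This is a deliberately simplified sketch and not the GeneMANIA algorithm. The network, its weights and the gene names `G1` to `G5` are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical weighted, undirected interaction network.
EDGES: Dict[Tuple[str, str], float] = {
    ("G1", "G2"): 1.0,
    ("G1", "G3"): 0.5,
    ("G2", "G3"): 1.0,
    ("G3", "G4"): 0.2,
    ("G4", "G5"): 1.0,
}
SEED_GENES: Set[str] = {"G1", "G2"}  # genes already known to have the function


def neighbour_scores(edges: Dict[Tuple[str, str], float],
                     seeds: Set[str]) -> Dict[str, float]:
    """Score each non-seed gene by the fraction of its edge weight that goes to seeds."""
    to_seeds: Dict[str, float] = {}
    total: Dict[str, float] = {}
    for (u, v), weight in edges.items():
        # Treat each edge as undirected by visiting it from both ends.
        for gene, neighbour in ((u, v), (v, u)):
            total[gene] = total.get(gene, 0.0) + weight
            if neighbour in seeds:
                to_seeds[gene] = to_seeds.get(gene, 0.0) + weight
    return {g: to_seeds.get(g, 0.0) / total[g] for g in total if g not in seeds}


scores = neighbour_scores(EDGES, SEED_GENES)
for gene, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(gene, round(score, 3))
```

Direct neighbour voting only looks one step away from the seed genes; methods that propagate scores through the whole network can also rank genes that are indirectly connected.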

Page 50: Computational challenges in precision medicine and genomics

SOCIAL CHALLENGES

•  BIOETHICS AND DATA SHARING

•  ENGAGING RESEARCHERS

–  CROWDSOURCING: TCGA PAN CANCER, DREAM

•  ENCOURAGING RESEARCHERS TO EXPLORE UNCHARTED TERRITORY

•  NEED FOR QUANTITATIVE THINKING IN BIOLOGY

–  NEW PH.D. PROGRAM IN THE MOLECULAR GENETICS DEPARTMENT AT THE UNIVERSITY OF TORONTO

NATURE. 2011 FEB 10;470(7333):163-5

WWW.NATURE.COM/TCGA/

Page 51: Computational challenges in precision medicine and genomics

EPENDYMOMA

•  3RD MOST COMMON BRAIN TUMOUR IN CHILDREN

•  INCURABLE IN UP TO 45% OF PATIENTS

STEVE MACK, MICHAEL TAYLOR, RUTH ISSERLIN - CANCER CELL. 2011 AUG 16;20(2):143-57

[Figure panels: GENE EXPRESSION, PATIENT AGE, OVERALL SURVIVAL]

Page 52: Computational challenges in precision medicine and genomics

EPENDYMOMA GENOMIC ANALYSIS

•  EPENDYMOMA BRAIN CANCER - MOST COMMON AND MORBID LOCATION FOR CHILDHOOD IS THE POSTERIOR FOSSA (PF = BRAINSTEM + CEREBELLUM)

•  TWO SUBTYPES BY GENE EXPRESSION: PFA - YOUNG, DISMAL PROGNOSIS, PFB - OLDER, EXCELLENT PROGNOSIS.

•  WHOLE GENOME SEQUENCING (47 SAMPLES) SHOWED ALMOST NO MUTATIONS, HOWEVER DNA METHYLATION ARRAYS SHOWED CLEAR CLUSTERING INTO PFA AND PFB (79 SAMPLES)

•  PFA MORE TRANSCRIPTIONALLY SILENCED BY CPG METHYLATION

STEVE MACK, MICHAEL TAYLOR, SCOTT ZUYDERDUYN NATURE, FEB. 2014

Page 53: Computational challenges in precision medicine and genomics

POLYCOMB REPRESSOR COMPLEX 2

–  INHIBITED BY DZNEP AND GSK343

–  KILLED PFA CELLS

NO KNOWN TREATMENT, SO NOW GOING TO CLINICAL TRIAL, COMPASSIONATE USE IN ONE PATIENT

Page 54: Computational challenges in precision medicine and genomics

2 MONTHS 3 MONTHS 3 CYCLES VIDAZA

9 YO WITH METASTATIC PF EPENDYMOMA TO LUNG TREATED WITH AZACYTIDINE

TREATMENT OF METASTATIC PF EPENDYMOMA WITH VIDAZA

MICHAEL TAYLOR

Page 55: Computational challenges in precision medicine and genomics

ACKNOWLEDGEMENTS

BADER LAB

DOMAIN INTERACTION TEAM: SHOBHIT JAIN, BRIAN LAW, JÜRI REIMAND, MOHAMED HELMY, ANDREA UETRECHT, MARINA OLHOVSKY

CANCER GENOMICS: FLORENCE CAVALLI, DAVID SHIH, ASHA ROSTAMIANFAR

PRECISION MEDICINE: RON AMMAR, SHIRLEY HUI

FUNDING

HTTP://BADERLAB.ORG

PATHWAY AND NETWORK ANALYSIS: RUTH ISSERLIN, IGOR RODCHENKOV, SCOTT ZUYDERDUYN, RUTH WONG, VERONIQUE VOISIN, SHAHEENA BASHIR, KHALID ZUBERI, CHRISTIAN LOPES, JASON MONTOJO, MAX FRANZ, HAROLD RODRIGUEZ