
Upload: ruth-jefferson

Post on 29-Dec-2015


Computational personal genomics: selection, regulation, epigenomics, disease

Manolis Kellis

MIT Computer Science & Artificial Intelligence Laboratory

Broad Institute of MIT and Harvard

Family inheritance: recombination breakpoints

Me vs. my brother

My dad, Dad’s mom, Mom’s dad

Human ancestry

Disease risk

Genomics: Regions → mechanisms → drugs

Systems: genes → combinations → pathways

Personal genomics today: 23 and We

Goal: A systems-level understanding of genomes and gene regulation:
• The regulators: Transcription factors, microRNAs, sequence specificities
• The regions: enhancers, promoters, and their tissue-specificity
• The targets: TFs → targets, regulators → enhancers, enhancers → genes
• The grammars: Interplay of multiple TFs → prediction of gene expression

The parts list = Building blocks of gene regulatory networks

CATGACTGCATGCCTG

Disease-associated variant (SNP/CNV/…)

Gene annotation (coding, 5’/3’ UTR, RNAs): evolutionary signatures

Non-coding annotation: chromatin signatures

Roles in gene/chromatin regulation: activator/repressor signatures

Other evidence of function: signatures of selection (sp/pop)

Understanding human variation and human disease

• Challenge: from loci to mechanism, pathways, drug targets

Compare 29 mammals: Reveal constrained positions

• Reveal individual transcription factor binding sites
• Within motif instances reveal position-specific bias
• More species: motif consensus directly revealed

NRSF motif
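As a rough illustration of how constrained positions can stand out in a multi-species alignment, the sketch below scores each alignment column by the fraction of species sharing the most common base. This is a deliberate simplification: the function names, the toy alignment, and the 0.9 threshold are assumptions for illustration, and analyses such as the 29 mammals comparison rely on phylogeny-aware models rather than simple majority counts.

```python
from collections import Counter

def column_conservation(alignment):
    """Per column, the fraction of sequences that share the most common base.

    `alignment` is a list of equal-length aligned sequences (one per species).
    Gaps ('-') never count as the shared base, so gappy columns score lower.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        bases = [b.upper() for b in column if b != "-"]
        if not bases:
            scores.append(0.0)
            continue
        top_count = Counter(bases).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores

def constrained_positions(alignment, threshold=0.9):
    """Indices of columns whose conservation score reaches the (assumed) threshold."""
    return [i for i, s in enumerate(column_conservation(alignment)) if s >= threshold]

# Toy alignment of four hypothetical species; column 3 varies, the rest are identical.
toy_alignment = ["ACGTTCA", "ACGATCA", "ACGTTCA", "ACGCTCA"]
print(constrained_positions(toy_alignment))
```

With more species, a variable column like the fourth one here becomes easier to separate from genuinely constrained columns, which is the intuition behind "more species: motif consensus directly revealed".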

Chromatin state dynamics across nine cell types

• Single annotation track for each cell type
• Summarize cell-type activity at a glance
• Can study 9-cell activity pattern across

Correlated activity

Predicted linking

Revisiting disease-associated variants

• Disease-associated SNPs enriched for enhancers in relevant cell types
• E.g. lupus SNP in GM enhancer disrupts Ets1 predicted activator

HaploReg: Automate search for any disease study (compbio.mit.edu/HaploReg)

• Start with any list of SNPs or select a GWA study
– Mine publicly available ENCODE data for significant hits
– Hundreds of assays, dozens of cells, conservation, motifs
– Report significant overlaps and link to info/browser
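The overlap step can be pictured with a small sketch: given a SNP list and a set of annotated regions (for example enhancers in one cell type), count the fraction of SNPs that fall inside a region and compare it with a background SNP set. This is not HaploReg's implementation; the function names, the half-open coordinate convention, and the toy coordinates are all assumptions, and a real analysis would also attach a significance test.

```python
from bisect import bisect_right

def overlaps_any(pos, intervals):
    """True if `pos` lies in one of the sorted, non-overlapping half-open intervals."""
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < intervals[i][1]

def fraction_in_regions(snps, regions):
    """Fraction of (chrom, pos) SNPs that fall inside the regions for their chromosome."""
    if not snps:
        raise ValueError("SNP list is empty")
    hits = sum(1 for chrom, pos in snps if overlaps_any(pos, regions.get(chrom, [])))
    return hits / len(snps)

def fold_enrichment(study_snps, background_snps, regions):
    """Ratio of the overlap fraction in the study set to that in the background set."""
    background = fraction_in_regions(background_snps, regions)
    if background == 0:
        raise ValueError("no background SNP overlaps the regions")
    return fraction_in_regions(study_snps, regions) / background

# Hypothetical enhancer regions and SNP positions, for illustration only.
enhancers = {"chr1": [(100, 200), (500, 600)]}
study = [("chr1", 150), ("chr1", 550), ("chr1", 700), ("chr2", 10)]
background = [("chr1", p) for p in (50, 120, 300, 400, 800, 900, 950, 1000)]
print(fold_enrichment(study, background, enhancers))
```

The same counting can be repeated per cell type or per assay, which is how a report of "significant overlaps" across many annotation tracks could be assembled.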

Experimental dissection of regulatory motifs for 10,000s of human enhancers

54,000+ measurements (x2 cells, x2 replicates)

Example activator: conserved HNF4 motif match

WT expression specific to HepG2

Non-disruptive changes maintain expression

Motif match disruptions reduce expression to background

The effect of random changes depends on how they affect the motif match

Allele-specific chromatin marks: cis-vs-trans effects

• Maternal and paternal GM12878 genomes sequenced
• Map reads to phased genome, handle SNPs, indels
• Correlate activity changes with sequence differences
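Once reads are assigned to the maternal or paternal haplotype, one simple way to flag allele-specific activity at a site is an exact binomial test against a 50/50 expectation. The slides do not say which test was used, so the sketch below is an assumption, as are the function names; real pipelines also correct for reference mapping bias and multiple testing.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than observing k successes in n trials."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are not lost to floating-point error.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

def allelic_imbalance(maternal_reads, paternal_reads):
    """Return (maternal read fraction, p-value under equal allelic activity)."""
    n = maternal_reads + paternal_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    return maternal_reads / n, binomial_two_sided_p(maternal_reads, n)

# Hypothetical read counts at two heterozygous sites.
print(allelic_imbalance(10, 0))   # strongly skewed toward the maternal allele
print(allelic_imbalance(5, 5))    # balanced
```

Sites with a small p-value are candidates for cis effects, since both alleles share the same trans environment within one cell.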

Brain methylation in 750 Alzheimer patients/controls

500,000 methylation probes

750 individuals

• 10+ years of cognitive evaluations, post-mortem brains
• 93% of functional epigenomic variation is genotype driven!
• Global repression in 7,000 enhancers, brain-specific targets

Phil de Jager, Roadmap disease epigenomics

Brad Bernstein, REMC mapping

Genome / Epigenome: meQTL

Phenotype / Epigenome: classification, MWAS


Global hyper-methylation in 1000s of AD-associated loci

Alzheimer’s-associated probes are hypermethylated

480,000 probes, ranked by Alzheimer’s association

P-value

Methylation

Top 7000 probes

• Global effect across 1000s of probes
– Rank all probes by Alzheimer’s association
– 7000 probes increase methylation (repressed)
– Enriched in brain-specific enhancers
– Near motifs of brain-specific regulators
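The ranking step can be sketched as follows: sort probes by association p-value, take the top N, and count how many show higher methylation in cases than in controls. The probe identifiers, the tuple layout, and the toy values are hypothetical; the slides do not describe the statistical model behind the association scores.

```python
def top_probe_direction(probes, top_n):
    """Summarize methylation direction among the most strongly associated probes.

    `probes` is a list of (probe_id, p_value, methylation_difference), where the
    difference is assumed to be mean(cases) - mean(controls).
    """
    ranked = sorted(probes, key=lambda probe: probe[1])  # smallest p-value first
    top = ranked[:top_n]
    hyper = sum(1 for _, _, diff in top if diff > 0)
    hypo = sum(1 for _, _, diff in top if diff < 0)
    return {
        "hyper": hyper,
        "hypo": hypo,
        "fraction_hyper": hyper / len(top) if top else 0.0,
    }

# Hypothetical probes: (id, association p-value, case-minus-control methylation).
toy_probes = [
    ("probe_a", 1e-8, 0.05),
    ("probe_b", 0.5, -0.01),
    ("probe_c", 1e-6, 0.03),
    ("probe_d", 1e-4, -0.02),
    ("probe_e", 0.9, 0.00),
]
print(top_probe_direction(toy_probes, top_n=3))
```

A strong skew toward hypermethylation among the top-ranked probes, relative to all probes, is the kind of global signal the slide describes.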

Complex disease: genome-wide effects

Human constraint outside conserved regions

• Non-conserved regions:
– ENCODE-active regions show reduced diversity

Lineage-specific constraint in biochemically-active regions

• Conserved regions:
– Non-ENCODE regions show increased diversity

Loss of constraint in human when biochemically-inactive

Average diversity (heterozygosity)

Aggregate over the genome

Active regions
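To make "average diversity (heterozygosity)" concrete, here is a minimal sketch that assumes biallelic sites and uses the expected heterozygosity 2p(1-p) at each site, averaged over all sites in a class of regions. The slides do not state the exact estimator, so this formula choice, the function names, and the toy allele frequencies are assumptions.

```python
def site_heterozygosity(freq):
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return 2.0 * freq * (1.0 - freq)

def mean_heterozygosity(freqs):
    """Average expected heterozygosity over the sites of one region class."""
    if not freqs:
        raise ValueError("no sites given")
    return sum(site_heterozygosity(f) for f in freqs) / len(freqs)

# Hypothetical allele frequencies; invariant sites are included with frequency 0.
active_sites = [0.01, 0.05, 0.0]
inactive_sites = [0.5, 0.3, 0.1]
print(mean_heterozygosity(active_sites), mean_heterozygosity(inactive_sites))
```

Lower average heterozygosity in one class than in a matched background is what "reduced diversity" refers to, and it is read as a sign of purifying selection within the human population.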

Covers computational challenges associated with personal genomics:
- genotype phasing and haplotype reconstruction: resolve mom/dad chromosomes
- exploiting linkage for variant imputation: co-inheritance patterns in human population
- ancestry painting for admixed genomes: result of human migration patterns
- predicting likely causal variants using functional genomics: from regions to mechanism
- comparative genomics annotation of coding/non-coding elements: gene regulation
- relating regulatory variation to gene expression or chromatin: quantitative trait loci
- measuring recent evolution and human selection: selective pressure shaped our genome
- using systems/network information to decipher weak contributions: combinatorics
- challenge of complex multi-genic traits (height, diabetes, Alzheimer's): 1000s of genes
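For the first item in the list, the simplest case of resolving mom/dad chromosomes is a parent-child trio, where Mendelian inheritance alone often determines which allele came from which parent. The sketch below handles one biallelic site with genotypes coded as alternate-allele counts (0, 1, 2). The function names and the coding are assumptions, and phasing without family data typically relies on statistical, population-based methods instead.

```python
def transmissible_alleles(genotype):
    """Alleles (0 = reference, 1 = alternate) a parent with this genotype can pass on."""
    if genotype == 0:
        return {0}
    if genotype == 1:
        return {0, 1}
    if genotype == 2:
        return {1}
    raise ValueError("genotype must be 0, 1, or 2")

def phase_child_site(child, mother, father):
    """Return (maternal_allele, paternal_allele) for the child at one site.

    Returns None when the site is ambiguous (all three are heterozygous) and
    raises ValueError when the genotypes are not Mendelian-consistent.
    """
    if child not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    options = [
        (m, f)
        for m in sorted(transmissible_alleles(mother))
        for f in sorted(transmissible_alleles(father))
        if m + f == child
    ]
    if not options:
        raise ValueError("genotypes are not consistent with Mendelian inheritance")
    if len(options) > 1:
        return None
    return options[0]

print(phase_child_site(child=1, mother=0, father=1))  # alternate allele from father
print(phase_child_site(child=1, mother=1, father=1))  # ambiguous site
```

Sites that stay ambiguous in a trio are the ones where linkage to neighboring phased sites is needed to finish the haplotype.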

Personal genomics tomorrow: Already 100,000s of complete genomes

• Health, disease, quantitative traits:
– Genomics regions → disease mechanism, drug targets
– Protein-coding → cracking regulatory code, variation
– Single genes → systems, gene interactions, pathways

• Human ancestry:
– Resolve all of human ancestral relationships
– Complete history of all migrations, selective events
– Resolve common inheritance vs. trait association

• What’s missing is the computation
– New algorithms, machine learning, dimensionality reduction
– Individualized treatment from 1000s genes, genome
– Understand missing heritability
– Reveal co-evolution between genes/elements
– Correct for modulating effects in GWAS

Collaborators and Acknowledgements

• Chromatin state dynamics
– Brad Bernstein, ENCODE consortium

• Methylation in Alzheimer’s disease
– Phil de Jager, Brad Bernstein, Epigenome Roadmap

• Mammalian comparative genomics
– Kerstin Lindblad-Toh, Eric Lander, 29 mammals consortium

• Massively parallel enhancer reporter assays
– Tarjei Mikkelsen, Broad Institute

• Funding
– NHGRI, NIH, NSF, Sloan Foundation

MIT Computational Biology group
Compbio.mit.edu

Daniel Marbach
Mike Lin
Jason Ernst
Jessica Wu
Rachel Sealfon
Pouya Kheradpour
Manolis Kellis
Chris Bristow
Loyal Goff
Irwin Jungreis
Sushmita Roy
Luke Ward
Louisa DiStefano
Dave Hendrix
Angela Yen
Ben Holmes
Soheil Feizi
Mukul Bansal
Bob Altshuler
Stefan Washietl
Matt Eaton

Stata4

Stata3