
Post on 24-Feb-2016


Big Data Opportunities and Challenges in Human Disease Genetics & Genomics

Manolis Kellis

MIT Computer Science & Artificial Intelligence Laboratory
Broad Institute of MIT and Harvard

Big data Opportunities & Challenges in human disease genetics & genomics

• The goal: Mechanistic basis of human disease
  – Epigenomics: Enhancers, networks, regulators, motifs
  – Genetics: GWAS, QTLs, molecular epidemiology

• The challenges / opportunities:
  – Effects are very small, huge number of hypotheses
  – Much larger cohorts are needed, consent limitations
  – Technologies for privacy vs. excuse for data hoarding

• Overcoming the challenges:
  – Case study: Schizophrenia, Alzheimer’s
  – Collaboration & sharing: personal & technological

Bridging the knowledge gap from genetics to disease

[Figure: diagram linking a genetic variant (sequence CATGACTGCATGCCTG) and environment to disease. Labeled layers: chromatin states (promoter, enhancer, insulator, silencer); circuitry (control regions, factors, target genes; protein, miRNA, ncRNA; TIMP3); tissue / cell type (retina, heart, cortex, lung, blood, skin, nerve); intermediate effects (lipids, tension, eye drusen, metabolism, drug response).]

Requires: systematic understanding of genome function

The most complete map of human gene regulation

• 2.3M regulatory elements across 127 tissue/cell types
• High-resolution map of individual regulatory motifs
• Circuitry: regulators → regions → motifs → target genes

Non-coding variants lie in tissue-specific regulatory regions

• Yield new insights on relevant tissues and pathways
• Enable linking non-coding elements to relevant target genes
• Provide a mechanistic basis for developing therapeutics

Control regions harbor 1000s of weak-effect disease SNPs

• GWAS top hits explain only a small fraction of trait heritability
• Functional enrichments extend well past genome-wide significance

[Figure panels: "Poorly ranked SNP nearby", "Highly ranked SNP nearby"]
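
One way to check whether functional enrichment persists below the top hits is a rank-threshold enrichment test. The following is a minimal sketch, not the method used in the talk: it assumes a hypothetical list of SNPs already sorted by GWAS p-value, each flagged for overlap with a regulatory region, and applies a hypergeometric test to the top-ranked slice. The function names and the toy counts are invented for illustration.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items without replacement
    from N items, K of which are annotated."""
    upper = min(K, n)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)

def enrichment_at_rank(in_region, top_n):
    """Fold enrichment and p-value for annotated SNPs among the top_n ranked SNPs.

    in_region: list of booleans, ordered from best to worst GWAS p-value.
    """
    N = len(in_region)
    K = sum(in_region)           # annotated SNPs overall
    k = sum(in_region[:top_n])   # annotated SNPs in the top slice
    fold = (k / top_n) / (K / N)
    return fold, hypergeom_sf(k, N, K, top_n)

# Toy data (hypothetical): 30 of the top 100 SNPs are annotated,
# versus 90 of the remaining 900.
toy = [i % 10 < 3 for i in range(100)] + [i % 10 == 0 for i in range(900)]
fold, p = enrichment_at_rank(toy, top_n=100)
print(f"fold enrichment = {fold:.2f}, p = {p:.2e}")
```

Repeating the call with increasing `top_n` shows how far down the ranking the enrichment remains detectable.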

Bayesian integration of weak effects → disease modules

• MAZ has no direct association, but clusters with many T1D hits
• MAZ is indeed a known regulator of insulin expression

[Figure legend: disease gene, genetic association, disease SNP]

Brain methylation changes in Alzheimer’s patients

• Variation in methylation patterns largely genotype driven
• Global signature of repression in 1000s of regulatory regions: hypermethylation, enhancer states, brain regulator targets

[Figure: data sources. Genotype (1M SNPs x 700 ind.); Methylation (450k probes x 700 ind.); Reference chromatin states; Dorsolateral PFC; MAP (Memory and Aging Project) + ROS (Religious Order Study)]
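
The dataset sizes on this slide give a feel for the "huge number of hypotheses" challenge. As a rough arithmetic sketch, assuming every SNP were tested against every methylation probe (real meQTL scans typically restrict to nearby pairs, so this is an upper bound), the number of tests and the matching Bonferroni threshold can be computed directly. The helper name and the 0.05 family-wise error rate are assumptions for illustration.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

n_snps = 1_000_000    # genotyped SNPs (slide figure)
n_probes = 450_000    # methylation probes (slide figure)

all_pairs = n_snps * n_probes
print(f"SNP-probe pairs: {all_pairs:.2e}")
print(f"Bonferroni cutoff, all pairs: {bonferroni_threshold(all_pairs):.2e}")

# For comparison: about one million independent tests gives the
# conventional GWAS genome-wide significance level of 5e-08.
print(f"Bonferroni cutoff, 1M tests: {bonferroni_threshold(1_000_000):.0e}")
```

With 4.5e11 candidate pairs the cutoff drops to roughly 1.1e-13, which is why weak effects need much larger cohorts to be detected.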


Scaling of QTL discovery power with sample size

• Number of meQTLs continues to increase linearly
• Weak-effect meQTLs: median R^2 < 0.1 after 400 individuals
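
Why weaker QTLs only appear at larger sample sizes can be illustrated with a textbook power calculation. This is a sketch under simplifying assumptions, not the analysis behind the slide: it uses a normal approximation for a single SNP-trait test in which the SNP explains a fraction `r2` of trait variance, and the significance level of 1e-8 is an arbitrary illustrative choice.

```python
from statistics import NormalDist

def qtl_power(r2, n, alpha):
    """Approximate power of a two-sided test for one SNP-trait association.

    r2:    fraction of trait variance explained by the SNP
    n:     number of individuals
    alpha: per-test significance level
    """
    z = NormalDist()
    ncp = (n * r2 / (1.0 - r2)) ** 0.5      # expected test statistic
    crit = z.inv_cdf(1.0 - alpha / 2.0)     # two-sided critical value
    return z.cdf(ncp - crit) + z.cdf(-ncp - crit)

for n in (100, 200, 400, 700):
    strong = qtl_power(0.10, n, alpha=1e-8)
    weak = qtl_power(0.02, n, alpha=1e-8)
    print(f"n={n:4d}  power(R^2=0.10)={strong:.3f}  power(R^2=0.02)={weak:.3f}")
```

Under these assumptions, strong effects saturate early while weak effects keep gaining power as individuals are added, consistent with discoveries shifting toward lower R^2 as the cohort grows.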

Inflection point in complex trait GWAS

[Figure: Schizophrenia GWAS, number of significant loci. Axis ticks: years 2006 to 2015; 0 to 120. Marked points: 3,500 cases, 0 loci; 10,000 cases, 5 loci; 35,000 cases, 62 loci! Annotations: WCPG Hamburg 2012 (~65K); Freeze Jan. 2013 (~70K); Incl. SWE + CLOZUK (~60K); Freeze May 2013 (~80K); Incl. replication (~100K).]

Similar inflection point found in every complex trait!

Significantly associated regions (p < 5e-08)

Sample size   Adult height      Crohn’s           Schizophrenia
              (per 5000/5000)   (per 1000/1000)   (per 3000/3000)
1x            0                 2                 1
2x            2                 4                 2
3x            7                 5                 6
9x            68                51                62
18x           180               -                 -

Same story in:
• Type 1 diabetes
• Type 2 diabetes
• Serum cholesterol level
• Every common chronic disease
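
The inflection point can be made concrete by dividing each locus count by its sample-size multiple. The sketch below uses the counts as read from the slide's table; the helper name is hypothetical, and the missing 18x entries for Crohn's and schizophrenia are simply omitted.

```python
# Significant loci by sample-size multiple, as read from the slide's table.
loci = {
    "Adult height":  {1: 0, 2: 2, 3: 7, 9: 68, 18: 180},
    "Crohn's":       {1: 2, 2: 4, 3: 5, 9: 51},
    "Schizophrenia": {1: 1, 2: 2, 3: 6, 9: 62},
}

def loci_per_unit(counts):
    """Loci found per base unit of sample size at each multiple."""
    return {multiple: n_loci / multiple for multiple, n_loci in counts.items()}

for trait, counts in loci.items():
    yields = loci_per_unit(counts)
    summary = ", ".join(f"{m}x: {y:.2f}" for m, y in yields.items())
    print(f"{trait:14s} {summary}")
```

For all three traits the yield per unit of sample size is higher at 9x than at 3x, so past the inflection point each added block of cases and controls contributes more loci than the blocks before it.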

Larger samples lead to new biological insights

• Proof that schizophrenia is a heritable, medical disorder
• Genetic architecture similar to non-brain diseases and traits
• Many genes → recognition of key pathways and processes

• Voltage-gated calcium channels (CACNA1C, CACNA1D, CACNA1I, CACNB2)
• Proteins interacting with FMRP, fragile X gene
• Neuron organization: Postsynaptic density, dendritic spine heads
• Enhancers: brain (angular gyrus, inferior temporal lobe), immune

Big data Opportunities & Challenges in human disease genetics & genomics

• The goal: Mechanistic basis of human disease
  – Epigenomics: Enhancers, networks, regulators, motifs
  – Genetics: GWAS, QTLs, molecular epidemiology

• The challenges / opportunities:
  – Effects are very small, huge number of hypotheses
  – Much larger cohorts are needed, consent limitations
  – Technologies for privacy vs. excuse for data hoarding

• Overcoming the challenges:
  – Collaboration, consortia, sharing of datasets
  – Case study: Schizophrenia, Alzheimer’s