Big Data Opportunities and Challenges in Human Disease Genetics & Genomics
Manolis Kellis
MIT Computer Science & Artificial Intelligence Laboratory
Broad Institute of MIT and Harvard
Big data Opportunities & Challenges in human disease genetics & genomics
• The goal: Mechanistic basis of human disease
  – Epigenomics: Enhancers, networks, regulators, motifs
  – Genetics: GWAS, QTLs, molecular epidemiology
• The challenges / opportunities:
  – Effects are very small, huge number of hypotheses
  – Much larger cohorts are needed, consent limitations
  – Technologies for privacy vs. excuse for data hoarding
• Overcoming the challenges:
  – Case study: Schizophrenia, Alzheimer’s
  – Collaboration & sharing: personal & technological
Bridging the knowledge gap from genetics to disease
[Figure: from genetic variant (CATGACTGCATGCCTG) and environment to disease. Panel labels:
  Circuitry: control regions, factors, target genes
  Chromatin states: promoter, enhancer, insulator, silencer
  Tissue / cell type: retina, heart, cortex, lung, blood, skin, nerve
  Intermediate effects: lipids, tension, eye drusen, metabolism, drug response
  Other labels: protein, miRNA, ncRNA, TIMP3]
Requires: systematic understanding of genome function
The most complete map of human gene regulation
• 2.3M regulatory elements across 127 tissue/cell types
• High-resolution map of individual regulatory motifs
• Circuitry: regulators → regions → motifs → target genes
Non-coding variants lie in tissue-specific regulatory regions
• Yield new insights on relevant tissues and pathways
• Enable linking non-coding elements to relevant target genes
• Provide a mechanistic basis for developing therapeutics
Control regions harbor 1000s of weak-effect disease SNPs
• GWAS top hits only explain a small fraction of trait heritability
• Functional enrichments well past genome-wide significance
[Figure labels: poorly ranked SNP nearby; highly ranked SNP nearby]
Bayesian integration of weak effects → disease modules
• MAZ: no direct association, but clusters w/ many T1D hits
• MAZ is indeed a known regulator of insulin expression
[Figure legend: disease gene; genetic association; disease SNP]
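A minimal sketch of the idea behind integrating weak effects, assuming nothing about the actual model used in the talk: Bayes' rule in odds form shows how the same weak association signal yields a much higher posterior when a gene clusters with known disease hits. The function name, the priors, and the likelihood ratio are all hypothetical and chosen only for illustration.

```python
def posterior_probability(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# All numbers below are hypothetical, not taken from the talk.
weak_evidence = 3.0      # likelihood ratio from a weak association signal
isolated_prior = 0.01    # gene with no known disease neighbors
clustered_prior = 0.20   # gene that clusters with many known disease hits

print("isolated gene: ", round(posterior_probability(isolated_prior, weak_evidence), 3))
print("clustered gene:", round(posterior_probability(clustered_prior, weak_evidence), 3))
```

With identical evidence, only the network-informed prior differs between the two calls, which is the intuition behind a gene like MAZ being prioritized without a direct association.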
Brain methylation changes in Alzheimer’s patients
• Variation in methylation patterns largely genotype driven
• Global signature of repression in 1000s of regulatory regions: hypermethylation, enhancer states, brain regulator targets
[Figure: Genotype (1M SNPs × 700 ind.); Methylation (450k probes × 700 ind.); reference chromatin states; dorsolateral PFC]
Cohorts: MAP (Memory and Aging Project) + ROS (Religious Order Study)
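To make the "huge number of hypotheses" concrete, here is a small arithmetic sketch using the dataset sizes stated above. It assumes an exhaustive scan of every SNP-probe pair and a Bonferroni correction at alpha = 0.05; both are illustrative assumptions, not a description of the analysis actually performed.

```python
# Dataset sizes as stated on the slide.
n_snps = 1_000_000     # ~1M genotyped SNPs
n_probes = 450_000     # 450k methylation probes
n_individuals = 700

# Assumption: every SNP-probe pair is tested once (exhaustive meQTL scan).
n_tests = n_snps * n_probes

# Assumption: Bonferroni correction, a conservative illustrative choice.
alpha = 0.05
per_test_threshold = alpha / n_tests

print(f"{n_tests:.2e} SNP-probe tests on {n_individuals} individuals")
print(f"per-test p-value threshold: {per_test_threshold:.2e}")
```

Under these assumptions there are roughly 4.5e11 tests and a per-test threshold near 1.1e-13, which is why weak effects need large cohorts to be detected.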
Big data Opportunities & Challenges in human disease genetics & genomics
• The goal: Mechanistic basis of human disease
  – Epigenomics: Enhancers, networks, regulators, motifs
  – Genetics: GWAS, QTLs, molecular epidemiology
• The challenges / opportunities:
  – Effects are very small, huge number of hypotheses
  – Much larger cohorts are needed, consent limitations
  – Technologies for privacy vs. excuse for data hoarding
• Overcoming the challenges:
  – Case study: Schizophrenia, Alzheimer’s
  – Collaboration & sharing: personal & technological
Scaling of QTL discovery power w/ sample size
• Number of meQTLs continues to increase linearly
• Weak-effect meQTLs: median R² < 0.1 after 400 indiv.
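The link between effect size and sample size can be sketched with a standard power approximation. The example below uses the Fisher z-transform for a correlation test; it is an illustrative approximation, not the analysis behind the slide, and the function name `qtl_power` and the significance threshold of 1e-8 are hypothetical choices.

```python
from math import atanh, sqrt
from statistics import NormalDist


def qtl_power(r2, n, alpha):
    """Approximate two-sided power to detect a correlation with squared
    value r2 in n samples, using the Fisher z-transform approximation."""
    std_normal = NormalDist()
    # Critical value computed from the lower tail to stay accurate for tiny alpha.
    z_crit = -std_normal.inv_cdf(alpha / 2)
    # Mean of the test statistic under the alternative hypothesis.
    mu = atanh(sqrt(r2)) * sqrt(n - 3)
    return (1 - std_normal.cdf(z_crit - mu)) + std_normal.cdf(-z_crit - mu)


# Hypothetical per-test threshold; the sample sizes are illustrative only.
alpha = 1e-8
for n in (100, 200, 400, 700):
    print(n, round(qtl_power(0.10, n, alpha), 3), round(qtl_power(0.02, n, alpha), 3))
```

The two columns contrast a stronger effect (R² = 0.10) with a weaker one (R² = 0.02): power for the weaker effect stays low until the sample grows, consistent with newly discovered meQTLs having smaller R² as more individuals are added.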
Inflection point in complex trait GWAS
[Figure: Schizophrenia GWAS: Number of significant loci. Axis ticks: years 2006 to 2015; 0 to 120.]
• 3,500 cases: 0 loci
• 10,000 cases: 5 loci
• 35,000 cases: 62 loci!
Annotations: WCPG Hamburg 2012 (~65K); Freeze Jan. 2013 (~70K); incl. SWE + CLOZUK (~60K); Freeze May 2013 (~80K); incl. replication (~100K)
Similar inflection point found in every complex trait!
Significantly associated regions (p < 5e-08)
Sample size   Adult height       Crohn’s            Schizophrenia
(multiple)    (per 5000/5000)    (per 1000/1000)    (per 3000/3000)
1x            0                  2                  1
2x            2                  4                  2
3x            7                  5                  6
9x            68                 51                 62
18x           180                -                  -
Same story in:
• Type 1 diabetes
• Type 2 diabetes
• Serum cholesterol level
• Every common chronic disease
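One way to see the inflection point is to divide the number of significant regions by the sample-size multiple. The sketch below uses the counts as read from the table above; the helper name `loci_per_unit` is hypothetical, and the dashes in the 18x row are treated as missing values.

```python
# Significant regions (p < 5e-08) keyed by sample-size multiple,
# as read from the table above. Missing entries (dashes) are omitted.
loci = {
    "Adult height (per 5000/5000)": {1: 0, 2: 2, 3: 7, 9: 68, 18: 180},
    "Crohn's (per 1000/1000)": {1: 2, 2: 4, 3: 5, 9: 51},
    "Schizophrenia (per 3000/3000)": {1: 1, 2: 2, 3: 6, 9: 62},
}


def loci_per_unit(counts):
    """Significant regions per unit of sample size at each multiple."""
    return {multiple: n / multiple for multiple, n in counts.items()}


for trait, counts in loci.items():
    rates = loci_per_unit(counts)
    print(trait, {m: round(r, 2) for m, r in rates.items()})
```

If discovery scaled linearly with sample size, the per-unit yield would stay flat. In all three traits it is several times higher at 9x than at 3x, which is the inflection the slide describes.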
• Proof that Schizophrenia is a heritable, medical disorder
• Genetic architecture similar to non-brain diseases and traits
• Many genes → recognition of key pathways and processes
• Voltage-gated calcium channels (CACNA1C, CACNA1D, CACNA1I, CACNB2)
• Proteins interacting with FMRP, fragile X gene
• Neuron organization: Postsynaptic density, dendritic spine heads
• Enhancers: brain (angular gyrus, inferior temporal lobe), immune
Larger samples lead to new biological insights
Big data Opportunities & Challenges in human disease genetics & genomics
• The goal: Mechanistic basis of human disease
  – Epigenomics: Enhancers, networks, regulators, motifs
  – Genetics: GWAS, QTLs, molecular epidemiology
• The challenges / opportunities:
  – Effects are very small, huge number of hypotheses
  – Much larger cohorts are needed, consent limitations
  – Technologies for privacy vs. excuse for data hoarding
• Overcoming the challenges:
  – Collaboration, consortia, sharing of datasets
  – Case study: Schizophrenia, Alzheimer’s