Motivating questions
• How do phenotypes vary across individuals?
  – Regulatory changes drive cellular and organismal traits
  – Likely also drive evolutionary differences
• How are genes (co)regulated?
  – Pathways, processes, contexts
Regulatory variation
• What do “interesting” variants do?
• Genetic changes to:
  – Coding sequence
  – Gene expression levels
  – Splice isoform levels
  – Methylation patterns
  – Chromatin accessibility
  – Transcription factor binding kinetics
  – Cell signaling
  – Protein-protein interactions
~88% of GWAS hits lie in noncoding sequence and are presumed to act through regulation
Genetic variation alters regulation
• Protein levels
  – Maize (Damerval 94)
• Expression levels
  – Yeast, maize, mouse, humans (Brem 02, Schadt 03, Stranger 05, Stranger 07)
• RNA splicing
  – Humans (Pickrell 12, Lappalainen 13)
• Methylation and DNase I peak strength
  – Humans (Degner 12; Gibbs 12)
• cis-eQTL
  – The eQTL maps near the physical position of the gene.
  – Promoter polymorphism?
  – Insertion/deletion?
  – Methylation, chromatin conformation?
• trans-eQTL
  – The eQTL does not map near the physical position of the gene.
  – Regulator?
  – Direct or indirect?
Modified from Cheung and Spielman 2009 Nat Genet
Genetics of gene expression (eQTL)
cis-eQTL analysis: test SNPs within a predefined distance of the gene
[Figure: SNPs tested within a 1 Mb window on either side of the gene/probe]
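A minimal sketch of the cis window selection, assuming SNP and gene positions are base-pair coordinates on the same assembly and that the 1 Mb window is measured from the gene's start and end (some pipelines measure from the transcription start site instead). The function name `cis_snps` and the tuple layouts are hypothetical.

```python
from typing import Dict, List, Tuple

WINDOW_BP = 1_000_000  # 1 Mb on either side of the gene, as on the slide


def cis_snps(
    gene: Tuple[str, int, int],        # hypothetical layout: (chromosome, start, end)
    snps: Dict[str, Tuple[str, int]],  # SNP id -> (chromosome, position)
    window: int = WINDOW_BP,
) -> List[str]:
    """Return the IDs of SNPs lying within `window` bp of the gene body."""
    chrom, start, end = gene
    lo, hi = max(0, start - window), end + window
    return sorted(
        snp_id
        for snp_id, (snp_chrom, pos) in snps.items()
        if snp_chrom == chrom and lo <= pos <= hi
    )
```

For a gene at chr5:40,500,000-40,520,000, a SNP at chr5:40,000,000 is tested, while one at chr5:42,000,000 or on another chromosome is not.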
QT association
• Analysis of the relationship between a dependent (outcome) variable, the phenotype, and one or more independent (predictor) variables, such as SNP genotype
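Linear regression equation (continuous trait):
$Y_i = b_0 + b_1 X_i + e_i$
where $Y_i$ is the trait value and $X_i$ the number of A1 alleles (0, 1, 2) carried by individual $i$; $b_0$ is the intercept and $b_1$ the slope.
[Figure: continuous trait value plotted against number of A1 alleles (0, 1, 2), with intercept b_0 and slope b_1]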
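The additive linear model can be fitted by ordinary least squares. A minimal sketch with NumPy, assuming genotypes are coded as A1 allele counts; the function name `additive_eqtl` is hypothetical, optional covariates (e.g. sex) enter as extra columns of the design matrix, and the p-value uses a normal approximation to the t distribution rather than an exact t test, so it is only reasonable for large samples.

```python
import math

import numpy as np


def additive_eqtl(genotype, expression, covariates=None):
    """Fit Y_i = b0 + b1*X_i (+ covariates) + e_i by ordinary least squares.

    genotype: number of A1 alleles per individual (0, 1 or 2).
    Returns (b0, b1, t statistic for b1, approximate two-sided p-value).
    """
    x = np.asarray(genotype, dtype=float)
    y = np.asarray(expression, dtype=float)
    cols = [np.ones_like(x), x]  # intercept and additive genotype term
    if covariates is not None:
        # One extra column per covariate (n_individuals x n_covariates input).
        cols.extend(np.asarray(covariates, dtype=float).reshape(len(x), -1).T)
    X = np.column_stack(cols)
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance of the estimates
    t = beta[1] / math.sqrt(cov[1, 1])
    p = math.erfc(abs(t) / math.sqrt(2.0))         # normal approximation (assumption)
    return beta[0], beta[1], t, p
```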
Logistic regression equation (binary trait; $p_i$ is the probability that individual $i$ is a case):
$\ln\left(\frac{p_i}{1-p_i}\right) = b_0 + b_1 X_i$
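The logistic model is usually fitted by maximum likelihood. A minimal sketch using Newton-Raphson (equivalently, iteratively reweighted least squares), assuming case/control status coded 1/0 and A1 allele counts as the predictor; the function name `logistic_fit` is hypothetical, and the sketch does not handle complete separation, where the maximum-likelihood estimate does not exist.

```python
import numpy as np


def logistic_fit(genotype, status, n_iter=50, tol=1e-10):
    """Maximum-likelihood fit of ln(p_i / (1 - p_i)) = b0 + b1*X_i.

    genotype: A1 allele counts; status: 1 for cases, 0 for controls.
    Returns the coefficient vector [b0, b1].
    """
    x = np.asarray(genotype, dtype=float)
    y = np.asarray(status, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))       # fitted case probabilities
        grad = X.T @ (y - p)                         # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(hess, grad)           # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

As a check, with case fractions of 0.25, 0.5 and 0.75 among carriers of 0, 1 and 2 A1 alleles, the log-odds are exactly linear in genotype and the fit recovers b1 = ln 3 and b0 = -ln 3.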
DOES REGULATORY VARIATION ALTER PHENOTYPE? APPLICATION TO GWAS
Candidate genes, perturbations underlying organismal phenotypes
Rationale
• How do disease/trait variants actually alter biology?
• If they change regulation, then:
  – Change in gene expression/isoform use
  – Phenotypic consequence*
Pearson’s covariance between –log(p) values in 2 traits, computed over windows of 51 SNPs
[Figure: –log(p) tracks for CD GWAS p and eQTL p]
• Detect a peak when the effect is the same
• No peak when there are independent hits near each other
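A minimal sketch of the windowed comparison, assuming both p-value vectors are aligned to the same SNPs in genomic order. The slide says "Pearson’s covariance"; this sketch computes the Pearson correlation coefficient (the normalized covariance) with `np.corrcoef`, and `np.cov(wa, wb)[0, 1]` could be substituted for the unnormalized version. The function name and the convention of centring each window on a SNP are hypothetical.

```python
import numpy as np


def windowed_correlation(p_gwas, p_eqtl, window=51):
    """Pearson correlation of -log10(p) between two traits in sliding windows.

    p_gwas, p_eqtl: p-values for the same SNPs, in genomic order.
    Returns one value per window centred on each SNP; windows that would run
    past either end of the region are skipped, so the output is shorter.
    """
    a = -np.log10(np.asarray(p_gwas, dtype=float))
    b = -np.log10(np.asarray(p_eqtl, dtype=float))
    half = window // 2
    out = []
    for centre in range(half, len(a) - half):
        wa = a[centre - half:centre + half + 1]
        wb = b[centre - half:centre + half + 1]
        out.append(np.corrcoef(wa, wb)[0, 1])
    return np.array(out)
```

A shared causal effect shows up as a run of high correlations where both traits peak together; two nearby but independent hits leave the correlation low.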
Crohn’s/eQTL analysis
• CD meta-analysis (GWAS only)
• CEU HapMap LCL eQTL data
• Overlapping SNPs only (the eQTL data have 610K SNPs, most of them in the CD meta-analysis)
• Test 133 associations (1054 tests in total)
[Figure: GWAS peak; eQTL for gene 1; eQTL for gene 2]
Crohn’s/eQTL analysis
SNP CHR Gene
rs11742570 5 PTGER4
rs12994997 2 ATG16L1
rs11401 16 SPNS1
rs10781499 9 INPP5E
rs2266959 22 C22orf29
A peak implies that the same effect drives GWAS and eQTL
MS/eQTL analysis
SNP CHR Gene
rs6880778 5 PTGER4
rs7132277 12 CDK2AP
rs7665090 4 CISD2
rs2255214 3 GOLGB1 & EAF2
rs201202118 12 METTL1 & TSFM
rs12946510 17 ORMDL3, STARD3 & ZPBP2
rs2283792 22 PPM1F
rs7552544 1 SLC30A7
rs34536443 19 SLC44A2
A peak implies that the same effect drives GWAS and eQTL
Whole-genome eQTL analysis is an independent GWAS for the expression of each gene
[Figure: one association scan per gene: gene 1, gene 2, gene 3, gene 4, gene 5, ..., gene N]
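Because every gene gets its own GWAS, all gene × SNP tests can be computed at once. A minimal sketch, assuming genotype dosages (individuals × SNPs) and expression (individuals × genes) with no missing values and no covariates: after standardizing every column, the Pearson correlations for all pairs are one matrix product, and the simple-regression t statistic follows from r as t = r√(n−2)/√(1−r²). The function name `all_pairs_t` is hypothetical.

```python
import numpy as np


def all_pairs_t(genotypes, expression):
    """t statistics for simple regression of every gene on every SNP.

    genotypes: (n_individuals, n_snps) allele counts.
    expression: (n_individuals, n_genes) expression values.
    Returns an (n_snps, n_genes) matrix of t statistics.
    """
    G = np.asarray(genotypes, dtype=float)
    E = np.asarray(expression, dtype=float)
    n = G.shape[0]
    # Standardize each column (mean 0, population SD 1) so G.T @ E / n is Pearson's r.
    Gz = (G - G.mean(axis=0)) / G.std(axis=0)
    Ez = (E - E.mean(axis=0)) / E.std(axis=0)
    r = Gz.T @ Ez / n
    return r * np.sqrt((n - 2) / (1.0 - r**2))
```

The matrix form makes the scale of trans mapping concrete: every entry is a separate test, which is what drives the multiple-testing burden.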
Issues with trans mapping
• Power
  – Genome-wide significance is 5e-8
  – Multiple testing on ~20K genes
  – Sample sizes clearly inadequate
• Data structure
  – Bias corrections deflate variance
  – Non-normal distributions
• Sample sizes
  – Far too small
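As a rough illustration of the power problem, assuming a Bonferroni-style correction of the genome-wide threshold across ~20K genes (conservative, because expression levels of different genes are correlated):

```latex
\alpha_{\text{trans}} \approx \frac{5 \times 10^{-8}}{2 \times 10^{4}} = 2.5 \times 10^{-12}
```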
CPMA for correlated traits
• Empirical assessment to account for correlation
• Simulate Z scores under covariance, recalculate CPMA
• Construct distribution of CPMA for dataset, call significance
with Ben Voight, U Penn
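A sketch of the empirical procedure, under stated assumptions. The CPMA statistic here follows one reading of the cross-phenotype idea: under the null, −ln(p) for each trait is exponential with rate 1, and the statistic is the likelihood-ratio test of the fitted rate λ̂ against λ = 1. Treat this exact form, the two-sided p-values computed from Z scores, and the function names as hypothetical rather than the published implementation.

```python
import math

import numpy as np


def cpma_statistic(pvalues):
    """Likelihood-ratio statistic comparing an exponential rate fitted to
    -ln(p) with the null rate of 1 (an assumed form of CPMA)."""
    x = -np.log(np.asarray(pvalues, dtype=float))
    n, s = len(x), float(x.sum())
    lam_hat = n / s  # maximum-likelihood exponential rate
    return 2.0 * (n * math.log(lam_hat) - n + s)


def two_sided_p(z):
    """Two-sided normal p-values for an array of Z scores."""
    return np.vectorize(math.erfc)(np.abs(np.asarray(z, dtype=float)) / math.sqrt(2.0))


def empirical_cpma_threshold(cov, n_sim=1000, quantile=95.0, seed=0):
    """Simulate null Z scores under the traits' covariance, recompute CPMA for
    each draw, and return (percentile threshold, simulated CPMA values)."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(cov)), cov, size=n_sim)
    null = np.array([cpma_statistic(two_sided_p(row)) for row in z])
    return np.percentile(null, quantile), null
```

Simulating under the observed covariance, rather than assuming independent traits, keeps correlated expression levels from inflating the statistic; a SNP would then be called when its observed CPMA exceeds the chosen percentile of the simulated values.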
Experimental design
610,180 SNPs with MAF > 0.15 in CEU and YRI
LD pruned (r² < 0.2)
8368 transcripts detectable on Illumina arrays
108 CEU individuals*
109 YRI individuals*
* Stranger et al Nat Genet 2007 (LCL data; publicly available)
Analysis pipeline (run separately in CEU and YRI):
• plink association, Transcript ~ SNP + sex → CEU and YRI p-values
• CPMA → CEU and YRI CPMA scores
• Call SNPs whose CPMA exceeds the 95th percentile of simulated CPMA
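A simplified sketch of LD pruning to r² < 0.2, assuming a genotype matrix of allele counts with SNPs in genomic order. It greedily keeps a SNP only if its squared genotype correlation with every already-kept SNP among the preceding `window` SNPs is below the threshold. This is a hypothetical illustration, not plink's `--indep-pairwise` algorithm, which slides a window in steps and may make different choices about which SNP of a correlated pair to drop.

```python
import numpy as np


def ld_prune(genotypes, r2_max=0.2, window=50):
    """Greedy LD pruning.

    genotypes: (n_individuals, n_snps) allele counts, SNPs in genomic order.
    A SNP is kept if r^2 with every previously kept SNP among the preceding
    `window` SNPs is below `r2_max`. Returns the indices of kept SNPs.
    """
    G = np.asarray(genotypes, dtype=float)
    kept = []
    for j in range(G.shape[1]):
        ok = True
        for k in reversed(kept):
            if j - k > window:
                break  # kept is sorted, so all earlier SNPs are farther away
            r = np.corrcoef(G[:, j], G[:, k])[0, 1]
            if r * r >= r2_max:
                ok = False  # in LD with a SNP we already kept
                break
        if ok:
            kept.append(j)
    return kept
```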
Target sets of genes
• trans-acting variant: SNP with CPMA evidence
• Target genes: genes affected by the trans-acting variant (i.e. its regulon)
Prediction 1
• Allelic effects should be conserved between the two populations
  – Binomial test on paired observations (effect directions) for all genes with P < 0.05 in at least one population
True for 1124/1311 SNPs (binomial p < 0.05)
[Figure: effect directions of genes with p_CEU < 0.05 or p_YRI < 0.05. Concordant: CEU + + - - +, YRI + + - - +. Discordant: YRI - - + + -]
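A minimal sketch of the Prediction 1 test, assuming each gene contributes one paired observation (the signs of its CEU and YRI effect estimates), that only genes with p < 0.05 in at least one population are counted, and that the test is one-sided against a null concordance rate of 0.5. The exact binomial tail is computed with `math.comb`; the function and argument names are hypothetical.

```python
import math


def sign_concordance_test(beta_ceu, beta_yri, p_ceu, p_yri, alpha=0.05):
    """One-sided exact binomial test that effect directions agree between
    populations more often than the 50% expected by chance.

    Returns (concordant genes, genes tested, p-value).
    """
    pairs = [
        (b1, b2)
        for b1, b2, q1, q2 in zip(beta_ceu, beta_yri, p_ceu, p_yri)
        if q1 < alpha or q2 < alpha  # significant in at least one population
    ]
    n = len(pairs)
    k = sum(1 for b1, b2 in pairs if b1 * b2 > 0)  # same sign in both
    # P(K >= k) under Binomial(n, 0.5)
    p_value = sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n
    return k, n, p_value
```

For example, five significant genes that all share the same direction give p = 1/32 ≈ 0.031.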
Prediction 2
• Target genes should overlap
  – Identify targets by mixture-of-Gaussians classification
  – Empirical p from the distribution of overlaps between N_CEU and N_YRI genes across SNPs
True for 600/1311 SNPs (empirical p < 0.05)
[Figure: overlap between genes with p_CEU < 0.05 and genes with p_YRI < 0.05]
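A sketch of the Prediction 2 overlap test, assuming the target gene sets per SNP have already been called in each population (the mixture-of-Gaussians classification is not reproduced here). Each SNP's observed overlap is compared with the overlaps of all other SNPs; the raw overlap count and the +1 correction in the empirical p are choices made for this sketch, not details given on the slide, and the names are hypothetical.

```python
from typing import Dict, Set


def overlap_empirical_p(
    targets_ceu: Dict[str, Set[str]],
    targets_yri: Dict[str, Set[str]],
) -> Dict[str, float]:
    """For each SNP, count genes in both its CEU and YRI target sets and
    return an empirical p-value against the overlaps of all other SNPs."""
    overlaps = {
        snp: len(genes & targets_yri.get(snp, set()))
        for snp, genes in targets_ceu.items()
    }
    pvals = {}
    for snp, obs in overlaps.items():
        others = [v for s, v in overlaps.items() if s != snp]
        # Fraction of SNPs (counting this one) with at least as much overlap.
        pvals[snp] = (1 + sum(v >= obs for v in others)) / (1 + len(others))
    return pvals
```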
What about the target genes?
• Regulons:
  – Encode proteins more connected than expected by chance
www.broadinstitute.org/mpg/dapple.php
Rossin et al 2011 PLoS Genetics
What about the target genes?
• Regulons:
  – Encode proteins enriched for TF targets (ENCODE LCL data)
  – 24/67 filtered TFs significant
  – Binomial overlap test
TF       p-value
CEBPB    3.7 × 10^-142
HDAC8    7.8 × 10^-122
FOS      2.5 × 10^-96
JUND     3.7 × 10^-88
NFYB     3.3 × 10^-71
ETS1     3.8 × 10^-63
FAM48A   2.1 × 10^-61
FOXA1    1.4 × 10^-33
GATA1    4.6 × 10^-33
HEY1     7.8 × 10^-32
[Figure: overlap of trans target genes with ChIP-seq LCL target genes]
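A minimal sketch of a binomial overlap test, assuming the background rate is the fraction of all tested genes that are ChIP-seq targets of the TF, and that the test asks whether a regulon contains more TF targets than that rate predicts (one-sided upper tail). The slide does not say which background was used, and the names are hypothetical. For p-values as small as those in the table, a log-space sum would be needed to avoid floating-point underflow.

```python
import math
from typing import Set


def binomial_overlap_p(regulon: Set[str], tf_targets: Set[str], background: Set[str]) -> float:
    """P(X >= observed overlap) with X ~ Binomial(|regulon|, |targets| / |background|)."""
    regulon = regulon & background
    n = len(regulon)
    k = len(regulon & tf_targets)  # regulon genes bound by the TF
    rate = len(tf_targets & background) / len(background)
    return sum(math.comb(n, i) * rate**i * (1 - rate) ** (n - i) for i in range(k, n + 1))
```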
Summary
• Regulatory variation is common
• It affects gene expression levels
• Likely many other types:
  – DNA accessibility, chromatin states
  – Transcript splicing, processing, turnover
• Has phenotypic consequences
  – GWAS
  – Some cellular assays (not discussed here)
Open questions
• Discover regulatory elements (cis)
  – Promoters, enhancers, etc.
• Gene regulatory circuits (trans)
• Dynamics of regulation
  – Splicing variation, processing, degradation
• Phenotypic consequences
  – Cellular assays required
• Tie in to organismal phenotype
GTEx – Genotype-Tissue EXpression
An NIH Common Fund project
Current: 35 tissues from 50 donors
Scale up: 20K tissues from 900 donors.
Novel methods groups: 5 current + RFA
How can we make RNAseq useful?
• Standard eQTLs
  – Montgomery et al, Pickrell et al Nature 2010
• Isoform eQTLs
  – Depth of sequence!
• Long genes are preferentially sequenced
• Abundant genes/isoforms are likewise preferentially sequenced
• Power!?
• Mapping biases due to SNPs
RNAseq combined with other techs
• Regulons: TF gene sets via ChIP-seq
  – Look for trans effects
• Open chromatin states (DNase I; methylation)
  – Find active genes
  – Changes in epigenetic marks correlated with RNA
  – Genetic effects
• RNA/DNA comparisons
  – Simultaneous SNP detection/genotyping
  – RNA editing?