Gene expression group presentation at GAW19
Presentation slides of the gene expression group at GAW19, Genetic Analysis Workshop, Vienna 2014.
Gene expression group
• Fearless leader: Rita Cantor
General structure of our group presentations
• 3 subgroups
– Gene expression alone (Renaud Tissier)
– Genetics of gene expression (August Blackburn)
– Genetics of gene expression and phenotype (Heather Cordell)
Biological/technical background
[Figure: cell-type composition of the expression samples (slices of 60%, 30%, 5%, 2% and 1%; cytotoxic and helper T cells labeled)]
~11,022 genes from 20,634 probes
Number of probes per gene symbol:
1 probe: 10,528 genes
2 probes: 469 genes
3 probes: 23 genes
4 probes: 2 genes
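A table like the one above can be computed directly from a probe annotation. The sketch below is illustrative only: the probe IDs and gene symbols are hypothetical, and keeping the probe with the highest mean expression when collapsing to one value per gene is an assumed heuristic, not necessarily what the group did.

```python
from collections import Counter

def probes_per_gene_table(probe_to_symbol):
    """Tabulate how many gene symbols are covered by 1, 2, 3, ... probes.

    probe_to_symbol: dict mapping probe ID -> gene symbol (None if unannotated).
    Returns {number_of_probes: number_of_genes}.
    """
    # Count probes per annotated gene symbol, skipping unannotated probes
    per_gene = Counter(sym for sym in probe_to_symbol.values() if sym)
    # Then count how many genes share each probe count
    return dict(sorted(Counter(per_gene.values()).items()))

def collapse_to_genes(expr, probe_to_symbol):
    """Collapse probe-level expression to one value vector per gene.

    Assumed heuristic: keep the probe with the highest mean expression.
    expr: dict probe ID -> list of expression values across individuals.
    """
    best = {}
    for probe, values in expr.items():
        sym = probe_to_symbol.get(probe)
        if not sym:
            continue  # unannotated probe
        mean = sum(values) / len(values)
        if sym not in best or mean > best[sym][0]:
            best[sym] = (mean, values)
    return {sym: values for sym, (_, values) in best.items()}
```

For example, an annotation where probes p1 and p2 both map to gene A and p3 maps to gene B yields `{1: 1, 2: 1}`: one gene with a single probe and one gene with two.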
[Figure: gene structure (exons and introns) and alternative splicing]
Aims
• Understand the correlation structure of expression of 1000s of genes across individuals and pedigrees
• … and their relationship to phenotype (SBP)
Data used
• All probes; all individuals; no phenotype (Gallaugher – P3)
• WGCNA to get 14K probes; 82 individuals with SBP>75% at all 4 time points in real data (Gadaleta)
• 25% most heritable probes (4.9K, Göring et al., 2007); 5 largest pedigrees (2, 5, 6, 8, 10) (n=276); SBP visit 1, replicate 1 of simulated data (Tissier – P6)
• External data (HaemAtlas, DAVID; Gallaugher & Tissier)
• No one used genotypes or WGS (yet)
Methods
• Principal Components (Gallaugher – P3 & Tissier – P6)
• Lasso regression (Gadaleta)
• WGCNA: weighted gene co-expression network analysis (Gadaleta & Tissier)
• Meta-analysis across pedigrees (Tissier)
• Gene enrichment (Tissier)
• Linear Mixed Models (Tissier & Gallaugher)
Gallaugher
• T-, B- lymphocyte and monocyte counts vary between people, are heritable, and thus may confound genetic mapping of eQTL
• Principal component analysis to identify variation in gene expression between people
• Determine if PCs are associated with variables (age, sex, BP, HT, medication, pedigree)
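The PCA step and the test of PCs against pedigree can be sketched as follows. This is a minimal illustration, not Gallaugher's code: the one-way ANOVA F statistic with a label-permutation p-value is a swapped-in, assumption-light test, and the slide does not say which test produced the pedigree result.

```python
import numpy as np

def principal_components(expr, n_pcs=5):
    """PCA of an individuals x probes expression matrix via SVD.

    Each probe is centered to mean zero first. Returns PC scores
    (individuals x n_pcs) and the fraction of variance each PC explains.
    """
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_pcs]

def between_group_f(values, groups):
    """One-way ANOVA F statistic for a PC score across groups (e.g. pedigrees)."""
    labels = np.unique(groups)
    grand = values.mean()
    ss_between = sum(np.sum(groups == g) * (values[groups == g].mean() - grand) ** 2
                     for g in labels)
    ss_within = sum(np.sum((values[groups == g] - values[groups == g].mean()) ** 2)
                    for g in labels)
    df_between, df_within = len(labels) - 1, values.size - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)

def permutation_pvalue(values, groups, n_perm=2000, seed=0):
    """Permutation p-value for the F statistic, shuffling pedigree labels.

    Assumes individuals are exchangeable under the null of no pedigree effect.
    """
    rng = np.random.default_rng(seed)
    observed = between_group_f(values, groups)
    hits = sum(between_group_f(values, rng.permutation(groups)) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)
```

Note that with `n_perm` permutations the smallest attainable p-value is 1/(n_perm + 1), so a threshold such as p < 10^-3 needs well over 1,000 permutations.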
Peds 5, 6, and 8 significantly different for PC2 (p < 10^-3)
Estimate the proportion of cells for each individual using sorted-cell expression data (HaemAtlas)
[Figure: estimated cytotoxic T cell and helper T cell proportions per individual]
Gross outlier from ped #8 for both Tc and Th: possible acute infection
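Estimating cell proportions from sorted-cell reference profiles is a deconvolution problem. The slides do not name the method, so the sketch below assumes one standard choice: non-negative least squares (solved here by projected gradient descent to stay NumPy-only), followed by rescaling to sum to one. The reference matrix stands in for profiles such as HaemAtlas; its layout is an assumption.

```python
import numpy as np

def estimate_cell_proportions(mixture, reference, n_iter=5000):
    """Estimate cell-type proportions for one individual.

    mixture:   expression of G marker genes for one individual.
    reference: G x C matrix of mean expression per sorted cell type.
    Solves min ||reference @ w - mixture||^2 subject to w >= 0 by projected
    gradient descent, then rescales w to sum to one.
    """
    a = np.asarray(reference, float)
    b = np.asarray(mixture, float)
    # Step size 1/L, where L is the largest eigenvalue of A^T A
    # (the Lipschitz constant of the least-squares gradient)
    step = 1.0 / np.linalg.norm(a.T @ a, 2)
    w = np.full(a.shape[1], 1.0 / a.shape[1])
    for _ in range(n_iter):
        grad = a.T @ (a @ w - b)
        w = np.maximum(w - step * grad, 0.0)  # project onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w
```

An individual whose estimated Tc and Th fractions sit far from everyone else's, like the ped #8 outlier, would show up directly in these estimates.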
Conclusions: Variation in gene expression in PBMC could be incorporated into genetic analysis to improve power
WGCNA: Weighted Gene Co-expression Network Analysis (Tissier & Gadaleta)
[Figure (Tissier): network construction, grouping 5K genes into gene clusters]
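The network-construction step of WGCNA can be sketched in NumPy: a soft-thresholded correlation adjacency, then the topological overlap matrix (TOM). The power beta = 6 is an assumed value. The module step here is a deliberate simplification: WGCNA itself clusters 1 − TOM hierarchically with dynamic tree cutting, whereas this sketch takes connected components of TOM above a hypothetical cutoff.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)|^beta (diagonal set to 0).

    expr: individuals x genes matrix.
    """
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj

def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    with l_ij = sum_u a_iu a_uj and connectivity k_i = sum_u a_iu.
    """
    shared = adj @ adj  # l_ij; zero diagonal excludes u = i and u = j
    k = adj.sum(axis=1)
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom

def simple_modules(tom, cutoff=0.2):
    """Crude module labels: connected components of the graph TOM > cutoff.

    Stand-in for WGCNA's hierarchical clustering with dynamic tree cutting.
    """
    n = tom.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strongly overlapping genes
            i = stack.pop()
            for j in np.flatnonzero(tom[i] > cutoff):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Raising correlations to a power shrinks weak correlations far more than strong ones, which is why the adjacency is called "soft" thresholded.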
[Figure (Gadaleta): GAW data analysis, WGCNA on samples with SBP above 75%, with fewer probes and fewer samples at each step]
[Figure (Gadaleta), general idea: a minimization combining the response, the gene matrix, a covariance matrix (association) and a sparsity penalty]
Results / conclusion (Gadaleta)
Small number of samples vs. high number of covariates
Computational burden of LASSO too high
No significant gene networks detected in cases with SBP>75%
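For reference, the plain lasso objective behind this kind of analysis can be fit by cyclic coordinate descent with soft-thresholding. This is a generic sketch, not Gadaleta's penalized covariance model; the penalty value `lam` is an arbitrary assumption. Each full sweep costs O(n·p), which is where the computational burden grows when thousands of probes are covariates.

```python
import numpy as np

def lasso_coordinate_descent(x, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - x @ b||^2 + lam * ||b||_1.

    x: n x p predictor matrix (e.g. probe expression), standardized inside.
    y: response (e.g. SBP); centered so no intercept is needed.
    Returns coefficients on the standardized scale.
    """
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    y = y - y.mean()
    n, p = x.shape
    b = np.zeros(p)
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            # Columns have mean square 1, so the unpenalized update is
            # x_j . r / n + b_j; soft-threshold it at lam
            rho = x[:, j] @ resid / n + b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0)
            resid -= x[:, j] * (new - b[j])  # keep residual in sync
            b[j] = new
    return b
```

With n much smaller than p the penalty still yields a sparse solution, but which probes are selected can be unstable, consistent with the call for larger samples.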
Sub-group Conclusions
• Complex correlation structure of gene expression (Gallaugher)
– Different for specific pedigrees; outlier
– Biological (rare variants) or technical (mixed cells, batch effects, acute illness)
• Only 1 gene (DUSP1) was in the answers (Tissier)
– Meta-analysis across pedigrees can be more robust for filtering than correcting for family structure
• High-dimensional data needs larger sample sizes and controls (Gadaleta)
– Differential network analysis
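The slides do not specify how Tissier combined results across pedigrees. A common choice, assumed here, is a fixed-effect inverse-variance meta-analysis of per-pedigree estimates, with Cochran's Q as a heterogeneity check. Stdlib only.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-pedigree estimates.

    betas, ses: effect estimates and standard errors, one per pedigree.
    Returns (pooled beta, pooled SE, two-sided normal p-value, Cochran's Q).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p, q
```

Because each pedigree is analyzed separately, within-pedigree relatedness only affects that pedigree's own estimate, which is one reason this route can be simpler than modeling family structure jointly.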
Identifying Genetic contribution to Gene Expression
• All used pedigree genotype and expression data
• Cis-eQTL regions genetic architecture (Cantor, 3 genes with high eQTL LODs (Göring 2007), imputed genotype dosages)
• Allele Specific Binding filters potential regulatory SNPs (Peralta – P4, ENCODE, imputed genotype dosages)
• Replication of reported epistatic interactions (candidate SNPs (Hemani, 2014), GWAS)
• Haplotype-specific gene expression estimates (Blackburn – P2, RFSs identified using HIPster, GWAS data)
Gene Enumeration of Independent Signals by Software (FaST-LMM vs. SOLAR-MGA)
Rows: # SNPs conditioned on: FaST-LMM # significant SNPs, minimum p-value | SOLAR-MGA # significant SNPs, minimum p-value
TIMM10
0: 25, 2.9e-68 | 24, 1.6e-66
1: 23, 2.2e-87 | 23, 9.9e-86
2: 10, 5.0e-07 | 10, 1.9e-07
3: 2, 0.03
4: 1, 0.04
RPL14
0: 73, 1.5e-128 | 74, 3.80e-124
1: 29, 0.001 | 29, 0.0009
2: 13, 0.006 | 13, 0.003
3: 11, 0.006 | 4, 0.01
4: 1, 0.02 | 2, 0.03
5: 1, 0.04
LR8
0: 67, 3.6e-86 | 65, 9.2e-83
1: 39, 2.2e-24 | 55, 2.1e-22
2: 47, 1.1e-11
3: 46, 0.0001
4: 37, 0.0001
5: 40, 0.0003
6: 23, 0.0004
7: 14, 0.00002
8: 14, 0.003
9: 8, 0.002
Independent Associations for 3 Genes with best eQTL (LODs 37-43): alpha = 0.05
Gene (Probe_id): original LOD; bp range; # SNPs conditioned on: # significant SNPs, minimum p-value
TIMM10 (GI_6912707-S): LOD 37; bp range 12120; 0: 8, 1.6e-66; 1: 9, 9.9e-86
RPL14 (GI_16753224-S): LOD 34; bp range 14582; 0: 29, 3.8e-124
LR8 (GI_21361500-S): LOD 43; bp range 19100; 0: 29, 9.2e-83; 1: 14, 2.1e-22; 2: 1, 1.1e-11
Independent Associations SOLAR-MGA; alpha = 5e-8
Conclusions:
• Multiple independent SNPs contribute to single eQTL regions
• Number of independent cis-eQTL associations varies with the level of significance and software used
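The "# SNPs conditioned on" columns above come from a stepwise conditional analysis: test every SNP, add the top one as a covariate, and repeat until nothing passes alpha. The sketch below shows that loop with ordinary least squares and a normal approximation for p-values. Unlike FaST-LMM or SOLAR-MGA it ignores relatedness, so it assumes unrelated individuals. It also makes the second conclusion concrete: changing `alpha` from 0.05 to 5e-8 changes how many rounds run.

```python
import math
import numpy as np

def snp_pvalue(y, covariates, g):
    """Two-sided p-value for SNP g in OLS y ~ covariates + g (normal approximation)."""
    x = np.column_stack([covariates, g])
    coef, _, _, _ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ coef
    sigma2 = resid @ resid / (len(y) - x.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(x.T @ x)[-1, -1])
    return math.erfc(abs(coef[-1] / se) / math.sqrt(2.0))

def independent_signals(y, genotypes, alpha=5e-8):
    """Forward stepwise conditional analysis.

    Each round tests the remaining SNPs conditional on those already selected
    and adds the most significant one while its p-value is below alpha.
    Returns (selected SNP indices, number of significant SNPs per round).
    """
    n, m = genotypes.shape
    selected, counts = [], []
    while True:
        covariates = np.column_stack([np.ones(n)] + [genotypes[:, j] for j in selected])
        pvals = {j: snp_pvalue(y, covariates, genotypes[:, j])
                 for j in range(m) if j not in selected}
        significant = {j: p for j, p in pvals.items() if p < alpha}
        counts.append(len(significant))
        if not significant:
            return selected, counts
        selected.append(min(significant, key=significant.get))
```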
Identifying Genetic contribution to Gene Expression
• All used pedigree genotype and expression data
• Cis-eQTL regions genetic architecture (Cantor, Siegmund, 3 genes with high eQTL LODs (Göring 2007), imputed genotype dosages)
• Allele Specific Binding (ASB) filters potential regulatory SNPs (Peralta – P4, ENCODE, imputed genotype dosages)
• Replication of reported epistatic interactions (candidate SNPs (Hemani, 2014), GWAS)
• Haplotype-specific gene expression estimates (Blackburn – P2, RFSs identified using HIPster, GWAS data)
http://www.discoveryandinnovation.com/BIOL202/notes/lecture18.html
http://www.genome.duke.edu/labs/crawford/images/dnase.gif
Peralta – P4 (ENCODE)
Null model: 10k simulated phenotypes, 0.15 < h2r < 0.25, 0.01 < afreq < 0.50
10,552 ASB SNPs used to build the covariance kernel
Significant eQTL signals obtained for the 2 ASB-based covariance kernels used
Peralta – P4
• ASB is a biologically meaningful filter for the prioritization of non-coding variation
– can be used to prioritize non-coding variants based on potential regulatory function
• ASB correlates with gene expression levels
– cis-ASB accounts for 53-83% of the variation in neighboring gene expression
• Segregation of ASB in pedigrees can act as a background noise filter
– known biases in ASB prediction can be incorporated as weights into the correlation kernel to improve the signal-to-noise ratio
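One way to read "ASB SNPs used to build the covariance kernel" with per-SNP weights is a weighted genetic relationship matrix computed only from the ASB SNPs. The exact kernel Peralta used is not given on the slides; the form K = Z W Zᵀ / Σw below, with Z the standardized dosages, is an assumption for illustration, and the weights are hypothetical confidence scores.

```python
import numpy as np

def weighted_kinship_kernel(genotypes, weights=None):
    """Empirical relationship kernel from a SNP subset (e.g. ASB SNPs).

    genotypes: individuals x SNPs matrix of allele dosages (0..2).
    weights:   optional non-negative per-SNP weights (e.g. confidence in
               each ASB prediction); equal weights give a standard GRM.
    Returns K = Z W Z^T / sum(w), Z = column-standardized dosages.
    """
    g = np.asarray(genotypes, float)
    freq = g.mean(axis=0) / 2.0
    keep = (freq > 0) & (freq < 1)  # drop monomorphic SNPs
    z = (g[:, keep] - 2 * freq[keep]) / np.sqrt(2 * freq[keep] * (1 - freq[keep]))
    w = np.ones(z.shape[1]) if weights is None else np.asarray(weights, float)[keep]
    return (z * w) @ z.T / w.sum()
```

Because only relative weights matter after dividing by Σw, down-weighting SNPs with known prediction biases shifts the kernel toward the more reliable ASB signals without changing its overall scale.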
Identifying Genetic contribution to Gene Expression
• All used pedigree genotype and expression data
• Cis-eQTL regions genetic architecture (Cantor, Siegmund, 3 genes with high eQTL LODs (Göring 2007), imputed genotype dosages)
• Allele Specific Binding filters potential regulatory SNPs (Peralta – P4, ENCODE, imputed genotype dosages)
• Replication of reported epistatic interactions (Howey, candidate SNPs, GWAS data)
– Hemani et al. Detection and replication of epistasis influencing transcription in humans. Nature. 2014;508:249–253.
• Haplotype-specific gene expression estimates (Blackburn – P2, RFSs identified using HIPster, GWAS data)
Evidence for replication of epistasis (Howey)
Howey Conclusions
• SNP-SNP interactions associated with gene expression showed combined evidence of replication, p-value = 0.007
• Expression data is argued to give higher power for detecting association; this replication exercise seems to reflect that
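The replication exercise has two parts: testing each candidate SNP pair for an interaction effect on expression, then combining evidence across pairs. The slides do not say how the combined p-value was computed. The sketch assumes an OLS interaction test (normal approximation, relatedness ignored) and Fisher's method for combining, which is one standard option.

```python
import math
import numpy as np

def interaction_pvalue(y, g1, g2):
    """p-value for the g1 x g2 term in OLS y ~ 1 + g1 + g2 + g1*g2 (normal approximation)."""
    x = np.column_stack([np.ones_like(y), g1, g2, g1 * g2])
    coef, _, _, _ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ coef
    sigma2 = resid @ resid / (len(y) - x.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(x.T @ x)[3, 3])
    return math.erfc(abs(coef[3] / se) / math.sqrt(2.0))

def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(log p) ~ chi-square(2k) under H0.

    Uses the closed-form survival function for even degrees of freedom:
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total
```

Fisher's method assumes the individual tests are independent; overlapping SNP pairs would violate that and make the combined p-value anti-conservative.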
Blackburn – P2
• Aim: to estimate haplotype-specific gene expression levels and identify differences
• Methods:
– Phased genotypes / IBD structure using HIPster; identified recombination free segments (RFS)
– Haplotype-specific estimates generated using EM
– Differences between haplotypes assessed using LRT
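The LRT step can be sketched for the simplest case of two haplotypes in one RFS with phase known. This is a hypothetical model, not Blackburn's: it assumes each individual's expression is the average of its two haplotypes' levels and omits the EM step, which would be needed when haplotype assignment is uncertain.

```python
import math
import numpy as np

def haplotype_lrt(y, hap_a_counts):
    """LRT for a difference between two haplotype-specific expression levels.

    y:            expression per individual.
    hap_a_counts: copies (0, 1, 2) of haplotype A; other copies are B.
    Full model:  y_i = (c_i * mu_A + (2 - c_i) * mu_B) / 2 + e_i
    Null model:  y_i = mu + e_i   (mu_A = mu_B)
    Gaussian errors give LR = n * log(RSS0 / RSS1) ~ chi-square(1) under H0.
    Returns (mu_A, mu_B, LR, p-value).
    """
    y = np.asarray(y, float)
    c = np.asarray(hap_a_counts, float)
    design = np.column_stack([c / 2.0, (2.0 - c) / 2.0])  # rows sum to 1
    (mu_a, mu_b), _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    rss1 = np.sum((y - design @ np.array([mu_a, mu_b])) ** 2)
    rss0 = np.sum((y - y.mean()) ** 2)
    lr = len(y) * math.log(rss0 / rss1)
    # Chi-square(1) survival function: P(X > lr) = erfc(sqrt(lr / 2))
    return mu_a, mu_b, lr, math.erfc(math.sqrt(lr / 2.0))
```

Since the design rows sum to one, the full model nests the null (mu_A = mu_B), so LR is always non-negative.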
Recombination free segments (RFS) (Blackburn)
[Figure: histogram of recombination free segment lengths; x-axis: length in bases (0 to 250,000); y-axis: frequency (0 to 1,500)]
Haplotype differences (Blackburn)
• Null simulation adheres to a uniform distribution
• 542 of 8624 tests significant (q < 0.1)
[Figure: histogram of p-values for haplotype-specific cis-eQTL tests; pi0 = 0.725]
[Figure: panels of expression density for PTGS2 (x-axis: expression, −3 to 3; y-axis: density); label T2DG0800492_1]
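The pi0 and q < 0.1 summaries above are the outputs of a Storey-style false discovery rate analysis. A minimal sketch follows, assuming a single tuning value lambda = 0.5 for the pi0 estimate; the software actually used may smooth over several lambda values.

```python
import numpy as np

def storey_qvalues(pvalues, lam=0.5):
    """Storey-style q-values.

    pi0 (proportion of true nulls) = #{p > lam} / (m * (1 - lam)), capped at 1;
    q-values are pi0-scaled Benjamini-Hochberg adjusted p-values.
    """
    p = np.asarray(pvalues, float)
    m = p.size
    pi0 = min(1.0, np.sum(p > lam) / (m * (1.0 - lam)))
    order = np.argsort(p)
    ranked = pi0 * m * p[order] / np.arange(1, m + 1)
    # Enforce monotonicity: running minimum from the largest p-value down
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return pi0, q
```

A pi0 below 1 (such as 0.725) means roughly a quarter of the tests are estimated to be non-null, which makes the q-values less conservative than plain Benjamini-Hochberg.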
Methods Adjusting for Non-Independence Due to Relatedness
• Theoretical kinship matrix
– Variance component (Peralta, SOLAR)
– Eigensimplification (Blackburn & Cantor, SOLAR-MGA)
• Empirical kinship matrix
– Linear mixed model (Cantor, FaST-LMM & Howey, GEMMA)
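The idea shared by eigensimplification and FaST-LMM is that one eigendecomposition of the kinship matrix diagonalizes the phenotypic covariance, so each association test reduces to weighted least squares. The sketch assumes a heritability `h2` already estimated (its maximum-likelihood search is omitted) and a covariance proportional to h2·K + (1 − h2)·I; it applies equally to a theoretical or empirical K.

```python
import numpy as np

def rotated_gls(y, x, kinship, h2):
    """GLS fixed-effect estimates under Var(y) proportional to h2*K + (1-h2)*I.

    Uses K = U diag(lam) U^T: rotating by U^T makes the covariance
    diag(h2*lam + 1 - h2), so GLS becomes weighted least squares.
    Requires 0 <= h2 < 1.
    """
    lam, u = np.linalg.eigh(kinship)
    y_rot, x_rot = u.T @ y, u.T @ x
    w = 1.0 / (h2 * lam + (1.0 - h2))  # inverse variances after rotation
    xtwx = x_rot.T @ (w[:, None] * x_rot)
    xtwy = x_rot.T @ (w * y_rot)
    return np.linalg.solve(xtwx, xtwy)
```

The decomposition depends only on K, so it is computed once and reused for every SNP or probe, avoiding an n×n inversion per test.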
Advantages of Pedigrees
• Permit identification of recombination free segments (RFS, Blackburn)
• True allele-specific binding (ASB) signals will segregate (Peralta)
Sub-group Conclusions
• Biological information from allele-specific binding can be used to filter potentially functional regulatory SNPs
• Multiple independent signals are observed at eQTL
• Epistasis: reported SNP-SNP interactions affecting expression show evidence of replication
• Expression varies between haplotypes
• Genetic architecture of gene expression is complex (duh!)
Expression and Phenotype (SBP, DBP, HT)
• With (3 papers) or without (1 paper) use of genotype data
Aims
• Pitsillides modeled gene expression as the primary outcome
– Also looked for enrichment of GWAS results (GWAS for SBP or DBP) in SNPs associated with expression
• Three papers tried to model phenotypes as the primary outcome
– Radkowski (P5) tried to model future HT using expression
– Tong investigated whether using E+G did better than using E or G alone
– Ainsworth (P1) fitted causal models for the relationship between G, E and P
Expression data
• 2 papers used individual expression variables as predictors
• 1 paper used individual expression variables as outcomes
– All expression variables, with SNPs located in the same genetic region used as predictors
• 1 paper used both individual expression variables and a clustered summary measure (from WGCNA)
– Both as outcomes and predictors
Genetic/sample Data
• Two papers used WGS
– Tong collapsed variants (common and rare) within genes, used 142 unrelated individuals from families
– Pitsillides used common SNPs, used all individuals in families
• Ainsworth used GWAS (common SNPs), all individuals in families
• Radkowski did not use genetic data
– Used 340 family members without baseline HT or HT at first visit
• All used real SBP, DBP, HT
Pedigree relationships
• Ainsworth & Pitsillides used linear mixed models when modeling SNPs as predictors (for family data)
• Tong used unrelated individuals
• Two papers ignored family relationships
– When relating E to P (Ainsworth & Radkowski)
– Or when doing causal modeling (Ainsworth)
Methods
• Linear mixed models: lmekin and FaST-LMM
• Unrelated individuals (Tong)
– Non-parametric weighted U statistics
– Models similarities in genotype (burden), gene expression and phenotype
• Causal modeling: structural equation models (SEM) and Bayesian Unified Framework (BUF) (Ainsworth)
– Applied to a set of filtered variables for G, E, P
• Predicting future HT (Radkowski)
– Calculated slope of regression of BP on time point
– Multiple regression of slope on gene expression (with/without adjustment for medication effect)
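Radkowski's two-stage approach can be sketched as follows: a per-individual least-squares slope of BP on time point, then a regression of those slopes on an expression variable, optionally with covariates such as a medication indicator. The array layouts are assumptions for illustration; this is not the paper's code.

```python
import numpy as np

def bp_slopes(times, bp):
    """Per-individual least-squares slope of BP on time point.

    times, bp: individuals x visits arrays (one row per individual).
    """
    t = times - times.mean(axis=1, keepdims=True)
    b = bp - bp.mean(axis=1, keepdims=True)
    return (t * b).sum(axis=1) / (t ** 2).sum(axis=1)

def regress_slope_on_expression(slopes, expression, covariates=None):
    """OLS of BP slope on one expression variable, plus optional covariates
    (e.g. a medication indicator). Returns [intercept, expression, ...]."""
    cols = [np.ones_like(slopes), expression]
    if covariates is not None:
        cols.append(covariates)
    x = np.column_stack(cols)
    coef, _, _, _ = np.linalg.lstsq(x, slopes, rcond=None)
    return coef
```

Running the second stage with and without the medication covariate mirrors the with/without-adjustment comparison; repeated across thousands of probes, it is also where the multiple-testing burden noted in the results arises.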
Results
• No p-values reached statistical significance (once multiple testing was taken into account)
– Probably due to low power
– Nevertheless, all papers presented their “top findings”
• Incorporation of both G and E improved significance of association test (compared to G or E alone) (Tong)
• Adjustment for effect of medication gave a larger number of “significant” results than non-adjustment (Radkowski)
• SEM and BUF implicated very similar causal models (Ainsworth)
Table 1. Top 5 genes associated with SBP, DBP and HTN (Tong results)
Causal models (Ainsworth)
Causal modeling (Ainsworth)
• SEM always implicated either model (b) or (d)
– Model (d) was not considered by BUF; model (f) was implicated instead
• Generally good agreement between SEM and BUF
Sub-group Conclusions
• Top results show no replication of previous findings
– Different (Mexican-American) population?
– Low power?
• Lots of different ways to consider gene expression data
– Incorporate directly into analysis of G and P (e.g. to improve power)
– Use directly as outcome
– As predictor of (future) phenotype
– To infer causal relatonships
Group-wide Conclusions
• Documented complexity of gene expression
– One gene at a time vs. multiple genes simultaneously
– Multiple alleles contribute to a single eQTL region
• Power
– High for genotype -> expression (incl. epistasis)
– Low for genotype/expression -> phenotype
– Pedigrees present challenges, but can be useful