Gene expression group presentation at GAW 19

Gene expression group

• Fearless leader: Rita Cantor

General structure of our group presentations

• 3 subgroups

– Gene expression alone (Renaud Tissier)

– Genetics of gene expression (August Blackburn)

– Genetics of gene expression and phenotype (Heather Cordell)

Biological/technical background

[Figure: cell-type proportions; labels: Cytotoxic, Helper; values shown: 5%, 30%, 60%, 2%, 1%]

~11,022 genes from 20,634 probes

Number of probes per gene symbol:

Probes   # of Genes
1        10,528
2        469
3        23
4        2

[Figure: gene structure (exons and introns) and alternative splicing]

Aims

• Understand the correlation structure of expression of 1000s of genes across individuals and pedigrees

• … and their relationship to phenotype (SBP)

Data used

• All probes; all individuals; no phenotype (Gallaugher – P3)

• WGCNA to get 14K probes; 82 individuals with SBP>75% at all 4 time points in real data (Gadaleta)

• 25% most heritable probes (4.9K, Göring et al., 2007); 5 largest pedigrees (2, 5, 6, 8, 10) (n=276); SBP visit 1, rep 1 of simulated data (Tissier – P6)

• External data (HaemAtlas, DAVID; Gallaugher & Tissier)

• No one used genotypes or WGS (yet)

Methods

• Principal Components (Gallaugher –P3 & Tissier – P6)

• Lasso regression (Gadaleta)

• WGCNA: weighted gene co-expression network analysis (Gadaleta & Tissier)

• Meta-analysis across pedigrees (Tissier)

• Gene enrichment (Tissier)

• Linear Mixed Models (Tissier & Gallaugher)

Gallaugher

• T-, B-lymphocyte and monocyte counts vary between people, are heritable, and thus may confound genetic mapping of eQTL

• Principal Component analysis to identify variation in gene expression between people

• Determine if PCs are associated with variables (age, sex, BP, HT, medication, pedigree)

Peds 5, 6, and 8 significantly different for PC2 (p < 10^-3)
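One simple way to ask whether a principal component differs between pedigrees is a one-way ANOVA of the PC scores grouped by pedigree. The sketch below is an illustration only: the scores and pedigree labels are made up, the function name is hypothetical, and it ignores relatedness within pedigrees, so it is not necessarily the test Gallaugher used.

```python
from collections import defaultdict


def one_way_f(scores, groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)

    n = len(scores)
    k = len(by_group)
    grand_mean = sum(scores) / n

    # Between-group and within-group sums of squares.
    ss_between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in by_group.values()
    )
    ss_within = sum(
        (x - sum(v) / len(v)) ** 2 for v in by_group.values() for x in v
    )

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy PC2 scores (made up) for nine individuals in three hypothetical pedigrees.
pc2 = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ped = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
f_stat, df_b, df_w = one_way_f(pc2, ped)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the F(df_b, df_w) distribution suggests that at least one pedigree has a shifted PC score; a mixed model with a kinship matrix would be the more careful choice for family data.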

Estimate proportion of cells for each individual using sorted cell expression data (HaemAtlas)

[Figure panels: cytotoxic T cell proportion; helper T cell proportion]

Gross outlier from ped #8 for both Tc and Th: acute infection?

Conclusions: Variation in gene expression in PBMC could be incorporated into genetic analysis to improve power
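The idea of estimating cell proportions from sorted-cell reference expression can be sketched as a small least-squares problem: model each individual's mixed expression profile as a weighted sum of reference signatures and solve for the weights. The example below is a crude two-cell-type illustration with made-up signature values and a hypothetical function name; negative weights are clipped and the weights are renormalised, which is simpler than a proper non-negative least-squares fit and is not claimed to be the procedure used with HaemAtlas.

```python
def estimate_two_proportions(mixed, sig_a, sig_b):
    """Least-squares weights for mixed ~ wa * sig_a + wb * sig_b, scaled to sum to 1."""
    aa = sum(a * a for a in sig_a)
    bb = sum(b * b for b in sig_b)
    ab = sum(a * b for a, b in zip(sig_a, sig_b))
    am = sum(a * m for a, m in zip(sig_a, mixed))
    bm = sum(b * m for b, m in zip(sig_b, mixed))

    # Solve the 2x2 normal equations by Cramer's rule.
    det = aa * bb - ab * ab
    if det == 0:
        raise ValueError("reference signatures are collinear")
    wa = (am * bb - bm * ab) / det
    wb = (bm * aa - am * ab) / det

    # Crude non-negativity: clip, then renormalise to proportions.
    wa, wb = max(wa, 0.0), max(wb, 0.0)
    total = wa + wb
    if total == 0:
        raise ValueError("no non-negative fit")
    return wa / total, wb / total


# Made-up reference signatures over four marker genes.
cytotoxic = [10.0, 1.0, 5.0, 2.0]
helper = [1.0, 8.0, 5.0, 3.0]
# A sample that is 30% cytotoxic and 70% helper by construction.
sample = [0.3 * c + 0.7 * h for c, h in zip(cytotoxic, helper)]
p_tc, p_th = estimate_two_proportions(sample, cytotoxic, helper)
print(round(p_tc, 3), round(p_th, 3))
```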

WGCNA: Weighted Gene Co-expression Network Analysis (Tissier & Gadaleta)
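As background, the network-construction step of WGCNA turns pairwise correlations between probes into a weighted adjacency by raising the absolute correlation to a soft-threshold power. The sketch below shows that step for an unsigned network; the probe names, expression values and the power beta = 6 are illustrative assumptions, not the settings used by Tissier or Gadaleta.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(i, j)| ** beta; the diagonal is set to 1."""
    names = list(expr)
    adj = {}
    for i in names:
        for j in names:
            adj[(i, j)] = 1.0 if i == j else abs(pearson(expr[i], expr[j])) ** beta
    return adj


def connectivity(adj, names):
    """Sum of adjacencies to all other nodes, per node."""
    return {i: sum(adj[(i, j)] for j in names if j != i) for i in names}


# Hypothetical probes measured in four samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with g1
    "g3": [1.0, -1.0, 1.0, -1.0],  # weakly correlated with g1
}
adj = soft_threshold_adjacency(expr)
conn = connectivity(adj, list(expr))
print(adj[("g1", "g2")], adj[("g1", "g3")])
```

The power pushes weak correlations towards zero while keeping strong ones, so modules found by clustering the adjacency are driven by the strongly co-expressed probes.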

[Figure (Tissier): WGCNA clustering of 5K genes into gene clusters]

[Figure (Gadaleta), network construction, GAW data analysis: WGCNA reduces the probes; the SBP @75% filter reduces the samples]

[Figure (Gadaleta), general idea: a minimization with labeled terms: gene matrix, response, covariance matrix (association), penalty (sparsity)]

RESULTS / CONCLUSION (Gadaleta)

Small number of samples vs. high number of covariates

Computational burden of LASSO too high

No significant gene networks detected in cases with SBP>75%
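For readers unfamiliar with the lasso, the sketch below minimises (1/2n)·||y − Xb||² + lam·||b||₁ by cyclic coordinate descent with soft-thresholding. It is a generic textbook version with no intercept and a made-up design matrix, not Gadaleta's implementation; each sweep costs on the order of n × p operations, which hints at why the burden grows with thousands of probes.

```python
def soft_threshold(value, lam):
    """Shrink value towards zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso coefficients by cyclic coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X @ beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0:
                continue  # constant-zero column carries no information
            # Correlation of column j with the partial residual.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta != 0:
                resid = [r - v * delta for v, r in zip(col, resid)]
                beta[j] = new
    return beta


# Made-up orthogonal design: y depends strongly on column 0, weakly on column 1.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]
beta_sparse = lasso_cd(X, y, lam=0.5)
beta_null = lasso_cd(X, y, lam=5.0)
print(beta_sparse, beta_null)
```

With lam = 0.5 the weak predictor is dropped and the strong one is shrunk; with a large enough penalty every coefficient is zero.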

Sub-group Conclusions

• Complex correlation structure of gene expression (Gallaugher)

– Different for specific pedigrees; outlier

– Biological (rare variants) or technical (mixed cells, batch effects, acute illness)

• Only 1 gene (DUSP1) was in the answers (Tissier)

– Meta-analysis across pedigrees can be more robust for filtering than correcting for family structure

• High-dimensional data needs larger sample sizes and controls (Gadaleta)

– Differential network analysis

Identifying Genetic Contribution to Gene Expression

• All used pedigree genotype and expression data

• Cis-eQTL regions genetic architecture (Cantor, 3 genes with high eQTL LODs (Göring 2007), imputed genotype dosages)

• Allele Specific Binding filters potential regulatory SNPs (Peralta – P4, ENCODE, imputed genotype dosages)

• Replication of reported epistatic interactions (candidate SNPs (Hemani, 2014), GWAS)

• Haplotype-specific gene expression estimates (Blackburn – P2, RFSs identified using HIPster, GWAS data)

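Independent Associations for 3 Genes with best eQTL (LODs 37-43): alpha = 0.05

Enumeration of Independent Signals by Software

Gene     # SNPs          FaST-LMM      FaST-LMM       SOLAR-MGA     SOLAR-MGA
         conditioned on  # sig. SNPs   min. P-value   # sig. SNPs   min. P-value
TIMM10   0               25            2.9e-68        24            1.6e-66
TIMM10   1               23            2.2e-87        23            9.9e-86
TIMM10   2               10            5.0e-07        10            1.9e-07
RPL14    0               73            1.5e-128       74            3.80e-124
RPL14    1               29            0.001          29            0.0009
RPL14    2               13            0.006          13            0.003
RPL14    3               11            0.006          4             0.01
RPL14    4               1             0.02           2             0.03
LR8      0               67            3.6e-86        65            9.2e-83
LR8      1               39            2.2e-24        55            2.1e-22

Rows reporting a single (# significant SNPs, minimum P-value) pair; the software column is not recoverable from the extracted layout:

Gene     # SNPs          # sig. SNPs   min. P-value
         conditioned on
TIMM10   3               2             0.03
TIMM10   4               1             0.04
RPL14    5               1             0.04
LR8      2               47            1.1e-11
LR8      3               46            0.0001
LR8      4               37            0.0001
LR8      5               40            0.0003
LR8      6               23            0.0004
LR8      7               14            0.00002
LR8      8               14            0.003
LR8      9               8             0.002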

Independent Associations SOLAR-MGA; alpha = 5e-8

Gene Name  Probe_id       Original LOD  Bp range  # SNPs conditioned on  # sig SNPs  Min p-val
TIMM10     GI_6912707-S   37            12120     0                      8           1.6e-66
                                                  1                      9           9.9e-86
RPL14      GI_16753224-S  34            14582     0                      29          3.8e-124
LR8        GI_21361500-S  43            19100     0                      29          9.2e-83
                                                  1                      14          2.1e-22
                                                  2                      1           1.1e-11

Conclusions:

• Multiple independent SNPs contribute to single eQTL regions

• Number of independent cis-eQTL associations varies with the level of significance and software used
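The "# SNPs conditioned on" column reflects a forward conditional analysis: test all SNPs, condition on the best one, and repeat until nothing passes the threshold. The sketch below illustrates that loop with ordinary regression on residuals and a normal approximation to the t-test. It is a simplified assumption-laden illustration: it ignores relatedness (which FaST-LMM and SOLAR-MGA model), and the function names and toy genotypes are hypothetical.

```python
import math


def _center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def _remove(v, b):
    """Residual of v after projecting out the vector b."""
    coef = sum(x * y for x, y in zip(v, b)) / sum(y * y for y in b)
    return [x - coef * y for x, y in zip(v, b)]


def _assoc_p(y, g, df):
    """Two-sided p-value for the y ~ g slope (normal approximation to t)."""
    gg, yy = sum(x * x for x in g), sum(x * x for x in y)
    if gg < 1e-12 or yy < 1e-12:
        return 1.0  # nothing left to test after conditioning
    gy = sum(a * b for a, b in zip(g, y))
    r2 = min(gy * gy / (gg * yy), 1 - 1e-12)
    t = math.sqrt(r2 * df / (1 - r2))
    return math.erfc(t / math.sqrt(2))


def independent_signals(y, snps, alpha=0.05):
    """Forward selection: repeatedly condition on the most significant SNP."""
    y_res = _center(y)
    remaining = {name: _center(g) for name, g in snps.items()}
    selected = []
    while remaining:
        df = len(y) - 2 - len(selected)
        if df < 1:
            break
        pvals = {name: _assoc_p(y_res, g, df) for name, g in remaining.items()}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append((best, pvals[best]))
        b = remaining.pop(best)
        # Condition the trait and all remaining SNPs on the chosen SNP.
        y_res = _remove(y_res, b)
        remaining = {name: _remove(g, b) for name, g in remaining.items()}
    return selected


# Toy data: A and B act independently; C is in perfect LD with A.
A = [0, 0, 2, 2, 0, 0, 2, 2]
B = [0, 2, 0, 2, 0, 2, 0, 2]
noise = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]
trait = [2 * a + b + e for a, b, e in zip(A, B, noise)]
signals = independent_signals(trait, {"A": A, "B": B, "C": list(A)})
print([name for name, _ in signals])
```

In the toy data SNP C is significant marginally but disappears once A is conditioned on, so two independent signals are counted. Changing alpha changes the count, which is the pattern the two tables above show.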



• Cis-eQTL regions genetic architecture (Cantor, Siegmund, 3 genes with high eQTL LODs (Göring 2007), imputed genotype dosages)

• Allele Specific Binding (ASB) filters potential regulatory SNPs (Peralta – P4, ENCODE, imputed genotype dosages)



Peralta P4: ENCODE

[Figures; image sources:]

http://www.discoveryandinnovation.com/BIOL202/notes/lecture18.html

http://www.genome.duke.edu/labs/crawford/images/dnase.gif

Null model: 10k simulated phenotypes; 0.15 < h2r < 0.25; 0.01 < afreq < 0.50

10,552 ASB SNPs used to build the covariance kernel

Significant eQTL signals obtained for the 2 ASB based covariance kernels used


Peralta – P4

• ASB is a biologically meaningful filter for the prioritization of non-coding variation

– can be used to prioritize non-coding variants based on potential regulatory function

• ASB correlates with gene expression levels

– cis-ASB accounts for 53-83% of the variation in neighboring gene expression

• Segregation of ASB in pedigrees can act as a background noise filter

– known biases in ASB prediction can be incorporated as weights into the correlation kernel to improve signal-to-noise ratio
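The slides state that ASB SNPs were used to build a covariance kernel but not how it was computed. One common construction, shown here purely as an assumption, standardises each SNP's dosages and averages the cross-products over SNPs, giving a genetic-relationship-style matrix between individuals. The dosage values and the function name are made up.

```python
def relationship_kernel(dosages):
    """K = Z Z^T / m, where Z holds column-standardised SNP dosages.

    dosages: one list of SNP dosages (0..2) per individual.
    Monomorphic SNPs are skipped and do not count towards m.
    """
    n, m = len(dosages), len(dosages[0])
    z = [[0.0] * m for _ in range(n)]
    used = 0
    for j in range(m):
        col = [row[j] for row in dosages]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        if var == 0:
            continue  # no variation at this SNP
        sd = var ** 0.5
        for i in range(n):
            z[i][j] = (col[i] - mean) / sd
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    return [
        [sum(z[a][j] * z[b][j] for j in range(m)) / used for b in range(n)]
        for a in range(n)
    ]


# Four individuals, four SNPs; the last SNP is monomorphic.
dosages = [
    [0, 1, 2, 2],
    [1, 1, 0, 2],
    [2, 0, 1, 2],
    [1, 2, 1, 2],
]
K = relationship_kernel(dosages)
for row in K:
    print([round(v, 2) for v in row])
```

Per-SNP weights, such as the ASB prediction biases mentioned above, could be added by multiplying each standardised column by a weight before forming the cross-products.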





• Replication of reported epistatic interactions (Howey, candidate SNPs, GWAS data)

– Hemani et al. Detection and replication of epistasis influencing transcription in humans. Nature. 2014; 508:249–253.


Evidence for replication of epistasis (Howey)

Howey Conclusions

• SNP-SNP interactions associated with gene expression showed combined evidence of replication, p-value = 0.007

• Expression data is argued to give higher power for detecting association. This replication exercise seems to reflect this
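The slides do not say how the individual replication tests were combined into a single p-value. As one standard possibility, assumed here for illustration only, Fisher's method sums −2·ln(p) over independent tests and compares the total to a chi-square distribution with 2k degrees of freedom. The p-values below are invented and are not Howey's results.

```python
import math


def fisher_combined_p(pvalues):
    """Fisher's method: return (statistic, combined p) for independent p-values."""
    if not pvalues or any(p <= 0 or p > 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    half = stat / 2.0
    # Chi-square survival function with 2k degrees of freedom has the
    # closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total


# Invented p-values from two hypothetical interaction tests.
stat, p_combined = fisher_combined_p([0.1, 0.1])
print(round(stat, 3), round(p_combined, 4))
```

The method assumes independent tests; interaction tests that share SNPs or transcripts would need a correction for their correlation.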

Blackburn – P2

• Aim: To estimate haplotype-specific gene expression levels and identify differences

• Methods:

– Phased genotypes / IBD structure using HIPster; identified recombination free segments (RFS)

– Haplotype-specific estimates generated using EM

– Differences between haplotypes assessed using LRT

Recombination free segments (RFS) (Blackburn)

[Figure: histogram of recombination free segment lengths; x-axis: length in bases (0 to 250,000); y-axis: frequency (0 to 1,500)]

Haplotype differences (Blackburn)

• Null simulation adheres to uniform distribution

• 542 of 8624 tests significant (q<0.1)
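The q < 0.1 threshold and the pi0 estimate shown with the p-value histogram come from false discovery rate methodology. The sketch below estimates pi0, the proportion of true nulls, from the share of p-values above a fixed cut-off, then scales Benjamini-Hochberg adjusted p-values by it. The fixed cut-off lambda = 0.5 is a simplifying assumption (the q-value literature also describes smoothing over several cut-offs), and the p-values are invented.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Proportion of true nulls estimated from the p-values above lam."""
    m = len(pvalues)
    return min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))


def q_values(pvalues, pi0=1.0):
    """q-values: pi0-scaled Benjamini-Hochberg adjustment, made monotone."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pi0 * pvalues[i] * m / rank)
        q[i] = running
    return q


# Invented p-values from five tests.
pvals = [0.01, 0.02, 0.03, 0.6, 0.8]
pi0 = estimate_pi0(pvals)
q_bh = q_values(pvals)            # pi0 = 1 gives plain Benjamini-Hochberg
q_storey = q_values(pvals, pi0)   # less conservative when pi0 < 1
n_sig = sum(q < 0.1 for q in q_storey)
print(pi0, [round(q, 3) for q in q_storey], n_sig)
```

Under the null, p-values are uniform, which is why a null simulation that follows the uniform distribution supports using this kind of estimate.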

[Figure: histogram of haplotype-specific cis-eQTL p-values; x-axis: p (0.0 to 1.0); y-axis: frequency (0 to 500); pi0 = 0.725]

[Figure: PTGS2, multi-panel expression density plots; x-axis: expression (−3 to 3); y-axis: density (0.0 to 0.6); label: T2DG0800492_1]

Methods Adjusting for Non-Independence Due to Relatedness

• Theoretical kinship matrix

– Variance component (Peralta, SOLAR)

– Eigensimplification (Blackburn & Cantor, SOLAR-MGA)

• Empirical kinship matrix

– Linear mixed model (Cantor, FaST-LMM & Howey, GEMMA)
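A theoretical kinship matrix is computed from the pedigree alone, without genotypes. The sketch below uses the standard recursion: an individual's kinship with an earlier-listed relative is the average of the parents' kinships with that relative, and self-kinship is ½(1 + kinship between the parents). The pedigree and the function name are made up, and this is a generic illustration rather than the SOLAR implementation.

```python
def kinship_matrix(pedigree):
    """Kinship coefficients from a pedigree.

    pedigree: list of (id, father, mother) with parents listed before
    their children; founders have father and mother set to None.
    Returns a dict mapping (id_a, id_b) to the kinship coefficient.
    """
    phi = {}

    def get(a, b):
        # An unknown parent is treated as unrelated to everyone.
        if a is None or b is None:
            return 0.0
        return phi[(a, b)]

    ids = [person[0] for person in pedigree]
    for idx, (child, father, mother) in enumerate(pedigree):
        for other in ids[:idx]:
            value = 0.5 * (get(father, other) + get(mother, other))
            phi[(child, other)] = phi[(other, child)] = value
        phi[(child, child)] = 0.5 * (1.0 + get(father, mother))
    return phi


# Hypothetical pedigree: two founders, two children, and one offspring
# of the two siblings to show the effect of inbreeding.
pedigree = [
    ("F", None, None),
    ("M", None, None),
    ("C1", "F", "M"),
    ("C2", "F", "M"),
    ("G", "C1", "C2"),
]
phi = kinship_matrix(pedigree)
print(phi[("C1", "F")], phi[("C1", "C2")], phi[("G", "G")])
```

Variance component models typically use twice the kinship coefficient as the expected relationship matrix; an empirical kinship matrix replaces these expectations with values estimated from genotypes.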

Advantages of Pedigrees

• Permit identification of recombination free segments (RFS, Blackburn)

• True allele specific binding (ASB) signals will segregate (Peralta)


• Biological information from allele specific binding can be used to filter potentially functional regulatory SNPs

• Multiple independent signals are observed at eQTL

• Epistasis

• Expression varies between haplotypes

• Genetic architecture of gene expression is complex (duh!)

Expression and Phenotype (SBP, DBP, HT)

• With (3 papers) or without (1 paper) use of genotype data

Aims

• Pitsillides modeled gene expression as the primary outcome

– Also looked for enrichment of GWAS results (GWAS for SBP or DBP) in SNPs associated with expression

• Three papers tried to model phenotypes as the primary outcome

– Radkowski (P5) tried to model future HT using expression

– Tong investigated whether using E+G did better than using E or G alone

– Ainsworth (P1) fitted causal models for the relationship between G, E and P

Expression data

• 2 papers used individual expression variables as predictors

• 1 paper used individual expression variables as outcomes

– All expression variables, with SNPs located in the same genetic region used as predictors

• 1 paper used both individual expression variables and a clustered summary measure (from WGCNA)

– Both as outcomes and predictors

Genetic/sample Data

• Two papers used WGS

– Tong collapsed variants (common and rare) within genes, used 142 unrelated individuals from families

– Pitsillides used common SNPs, used all individuals in families

• Ainsworth used GWAS (common SNPs), all individuals in families

• Radkowski did not use genetic data

– Used 340 family members without baseline HT or HT at first visit

• All used real SBP, DBP, HT

Pedigree relationships

• Ainsworth & Pitsillides used linear mixed models when modeling SNPs as predictors (for family data)

• Tong used unrelated individuals

• Two papers ignored family relationships

– When relating E to P (Ainsworth & Radkowski)

– Or when doing causal modeling (Ainsworth)

Methods

• Linear mixed models: lmekin and FaST-LMM

• Unrelated individuals (Tong)

– Non-parametric weighted U statistics

– Models similarities in genotype (burden), gene expression and phenotype

• Causal modeling: structural equation models (SEM) and Bayesian Unified Framework (BUF) (Ainsworth)

– Applied to a set of filtered variables for G, E, P

• Predicting future HT (Radkowski)

– Calculated slope of regression of BP on time-point

– Multiple regression of slope on gene expression (with/without adjustment for medication effect)
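The first step of the future-HT approach, a per-individual slope of blood pressure on time-point, is an ordinary least-squares slope. The sketch below computes it while skipping missed visits; the readings, the visit numbering and the function name are invented for illustration and are not Radkowski's code.

```python
def bp_slope(time_points, readings):
    """OLS slope of BP on time-point, ignoring visits with a missing reading."""
    pairs = [(t, r) for t, r in zip(time_points, readings) if r is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two observed visits")
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pairs)
    if sxx == 0:
        raise ValueError("time-points must differ")
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in pairs)
    return sxy / sxx


# Invented SBP readings at four visits for two hypothetical individuals.
visits = [1, 2, 3, 4]
slope_full = bp_slope(visits, [120.0, 122.0, 124.0, 126.0])
slope_missing = bp_slope(visits, [118.0, None, 124.0, 127.0])
print(slope_full, round(slope_missing, 3))
```

The resulting slopes would then serve as the outcome in the second-step regression on gene expression, with or without a medication adjustment.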

Results

• No p values reached statistical significance (once multiple testing was taken into account)

– Probably due to low power

– Nevertheless all papers presented their “top findings”

• Incorporation of both G and E improved significance of association test (compared to G or E alone) (Tong)

• Adjustment for effect of medication gave a larger number of “significant” results than non-adjustment (Radkowski)

• SEM and BUF implicated very similar causal models (Ainsworth)

Tong results: Table 1. Top 5 genes associated with SBP, DBP and HTN [table contents not recovered]


[Figure: causal models (Ainsworth)]

Causal modeling (Ainsworth)

• SEM always implicated either model (b) or (d)

– Model (d) was not considered by BUF, model (f) was implicated instead

• Generally good agreement between SEM and BUF


• Top results show no replication of previous findings

– Different (Mexican-American) population?

– Low power?

• Lots of different ways to consider gene expression data

– Incorporate directly into analysis of G and P (e.g. to improve power)

– Use directly as outcome

– As predictor of (future) phenotype

– To infer causal relationships

Group-wide Conclusions

• Documented complexity of gene expression

– One gene at a time vs. multiple genes simultaneously

– Multiple alleles contribute to a single eQTL region

• Power

– High for genotype -> expression (inc. epistasis)

– Low for genotype/expression -> phenotype

– Pedigrees present challenges, but can be useful
