Genome-Wide Association Studies
Caitlin Collins, Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Imperial College London
Genetic data analysis using 30-10-2014
Outline
• Introduction to GWAS
• Study design
  o GWAS design
  o Issues and considerations in GWAS
• Testing for association
  o Univariate methods
  o Multivariate methods
    • Penalized regression methods
    • Factorial methods
Genomics & GWAS
The genomics revolution
• Sequencing technology
  o 1977 – Sanger
  o 1995 – 1st bacterial genomes
    • < 10,000 bases per day per machine
  o 2003 – 1st human genome
    • > 10,000,000,000,000 bases per day per machine
• GWAS publications
  o 2005 – 1st GWAS
  o Age-related macular degeneration
  o 2014 – 1,991 publications
  o 14,342 associations
A few GWAS discoveries…
So what is GWAS?
• Genome-Wide Association Study
  o Looking for SNPs… associated with a phenotype.
• Purpose:
  o Explain
    • Understanding
    • Mechanisms
    • Therapeutics
  o Predict
    • Intervention
    • Prevention
    • Understanding not required
Association
• Definition
  o Any relationship between two measured quantities that renders them statistically dependent.
• Heritability
  o The proportion of variance explained by genetics
  o P = G + E + G*E
    • Heritability > 0
[Figure: genotype matrix of n individuals (cases and controls) by p SNPs]
Why?
• Environment, Gene-Environment interactions
• Complex traits, small effects, rare variants
• Gene expression levels
• GWAS methodology?
Study Design
GWAS design
• Case-Control
  o Well-defined “case”
  o Known heritability
• Variations
  o Quantitative phenotypic data
    • E.g. height, biomarker concentrations
  o Explicit models
    • E.g. dominant or recessive
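The explicit genetic models mentioned on this slide are often expressed as different numeric codings of the same genotype. The sketch below is a minimal illustration, assuming genotypes are stored as counts of the minor allele (0, 1 or 2). The function name `encode_genotype` and the example genotypes are hypothetical and do not come from the slides.

```python
def encode_genotype(minor_allele_count, model="additive"):
    """Recode a genotype (0, 1 or 2 copies of the minor allele) under a genetic model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    if model == "additive":
        # Effect assumed proportional to the number of minor alleles.
        return minor_allele_count
    if model == "dominant":
        # One copy is enough to show the effect.
        return 1 if minor_allele_count >= 1 else 0
    if model == "recessive":
        # Two copies are needed to show the effect.
        return 1 if minor_allele_count == 2 else 0
    raise ValueError(f"unknown model: {model}")


genotypes = [0, 1, 2, 1, 0]  # hypothetical individuals at one SNP
dominant = [encode_genotype(g, "dominant") for g in genotypes]
recessive = [encode_genotype(g, "recessive") for g in genotypes]
print(dominant)
print(recessive)
```

The same association test can then be run on whichever coding matches the assumed model.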
Issues & Considerations
• Data quality
  o 1% rule
• Controlling for confounding
  o Sex, age, health profile
  o Correlation with other variables
• Population stratification
• Linkage disequilibrium
Population stratification
• Definition
  o “Population stratification” = population structure
  o Systematic difference in allele frequencies between sub-populations…
    • … possibly due to different ancestry
• Problem
  o Violates assumed population homogeneity, independent observations
    • Confounding, spurious associations
  o Case population more likely to be related than Control population
    • Over-estimation of significance of associations
Population stratification II
• Solutions
  o Visualise
    • Phylogenetics
    • PCA
  o Correct
    • Genomic Control
    • Regression on Principal Components of PCA
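As a rough illustration of the Genomic Control correction listed on this slide, the sketch below estimates an inflation factor from the median of per-SNP chi-squared statistics (1 degree of freedom) and rescales every statistic by it. The statistics are hypothetical, the function name `genomic_control` is made up for this example, and leaving the statistics unchanged when the factor is below 1 is a common convention assumed here rather than something stated in the slides.

```python
from statistics import median

# Median of a chi-squared distribution with 1 degree of freedom (approximately 0.4549).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (inflation factor, rescaled statistics) for 1-df chi-squared tests."""
    inflation = median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumed convention: never inflate the statistics when there is no excess.
    inflation = max(inflation, 1.0)
    return inflation, [s / inflation for s in chi2_stats]


stats = [0.2, 0.5, 0.9, 1.4, 12.0]  # hypothetical per-SNP statistics
inflation, corrected = genomic_control(stats)
print(round(inflation, 3), [round(s, 3) for s in corrected])
```

After rescaling, the median statistic matches its expectation under no association, so a strong signal such as the last value is reduced but not erased.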
Linkage disequilibrium (LD)
• Definition
  o Alleles at separate loci are NOT independent of each other
• Problem?
  o Too much LD is a problem
    • noise >> signal
  o Some (predictable) LD can be beneficial
    • enables use of “marker” SNPs
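One standard way to quantify the non-independence described on this slide is the coefficient D (observed minus expected haplotype frequency) and its normalised form r². The sketch below is a minimal example under the assumption that phased haplotypes are available and alleles are coded 0/1 at two loci. The function name `ld_r2` and the data are hypothetical.

```python
def ld_r2(haplotypes):
    """Compute D and r^2 between two biallelic loci.

    haplotypes: list of (a, b) tuples, alleles coded 0/1 at locus A and locus B.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


linked = [(1, 1), (1, 1), (0, 0), (0, 0)]       # alleles always travel together
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]     # all combinations equally frequent
print(ld_r2(linked))
print(ld_r2(unlinked))
```

An r² near 1 means one SNP can act as a "marker" for the other, while an r² near 0 means the loci behave independently.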
Testing for Association
Methods for association testing
• Standard GWAS
  o Univariate methods
• Incorporating interactions
  o Multivariate methods
    • Penalized regression methods (LASSO)
    • Factorial methods (DAPC-based FS)
Univariate methods
• Approach
  o Individual test statistics
  o Correction for multiple testing
• Variations
  o Testing
    • Fisher’s exact test, Cochran-Armitage trend test, Chi-squared test, ANOVA
    • Gold standard: Fisher’s exact test
  o Correcting
    • Bonferroni
    • Gold standard: FDR
[Figure: genotype matrix of n individuals (cases and controls) by p SNPs]
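The two-step approach on this slide (one test per SNP, then a correction for multiple testing) can be sketched as follows. This is an illustrative, standard-library-only example: a two-sided Fisher's exact test on a 2×2 table, then Bonferroni and Benjamini-Hochberg adjustments. The slides name "FDR" without specifying a procedure, so Benjamini-Hochberg is an assumption here, and all function names and p-values are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical SNP: minor-allele carriers vs non-carriers in cases and controls.
p_single = fisher_exact_two_sided(3, 0, 0, 3)
pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical p-values for four SNPs
print(p_single, bonferroni(pvals), benjamini_hochberg(pvals))
```

On the same p-values the Benjamini-Hochberg adjustment is never larger than the Bonferroni one, which is why FDR control tends to retain more candidate SNPs.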
Univariate – Strengths & weaknesses
Strengths
• Straightforward
• Computationally fast
• Conservative
• Easy to interpret
Weaknesses
• Multivariate system, univariate framework
• Effect size of individual SNPs may be too small
• Marginal effects of individual SNPs ≠ combined effects
What about interactions?
Interactions
• Epistasis
  o “Deviation from linearity under a general linear model”
  [Equations: two models compared (“vs.”)]
• With p predictors, there are $\binom{p}{k} = \frac{p!}{k!\,(p-k)!}$ k-way interactions
• p = 1,000,000 → $\binom{p}{2} \approx 5 \times 10^{11}$
  o That’s 500 BILLION possible pairwise interactions!
• Need some way to limit the number of pairwise interactions considered…
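The number of k-way interactions among p predictors is the binomial coefficient "p choose k". The short sketch below evaluates it for a few illustrative values of p, to show how quickly exhaustive testing becomes infeasible. The helper name `n_interactions` and the chosen values of p are assumptions made for this example.

```python
from math import comb


def n_interactions(p, k=2):
    """Number of distinct k-way interactions among p predictors."""
    return comb(p, k)


for p in (1_000, 1_000_000, 10_000_000):
    print(f"p = {p:>10,}: {n_interactions(p):,} pairwise interactions")
```

Moving from pairwise to three-way interactions multiplies the count by roughly p/3 again, which is why some form of pre-selection is needed.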
Multivariate methods
• LASSO penalized regression
• Ridge regression
• Neural Networks
• Penalized Regression
• Bayesian Approaches
• Factorial Methods
• Bayesian Epistasis Association Mapping
• Logic Trees
• Modified Logic Regression-Gene Expression Programming
• Genetic Programming for Association Studies
• Logic feature selection
• Monte Carlo Logic Regression
• Logic regression
• Supervised-PCA
• Sparse-PCA
• DAPC-based FS (snpzip)
• Bayesian partitioning
• The elastic net
• Bayesian Logistic Regression with Stochastic Search Variable Selection
• Odds-ratio-based MDR
• Multi-factor dimensionality reduction method
• Genetic programming optimized neural networks
• Parametric decreasing method
• Restricted partitioning method
• Combinatorial partitioning method
• Random forests
• Set association approach
• Non-parametric Methods
Multivariate methods
• Penalized regression methods
  o LASSO penalized regression
• Factorial methods
  o DAPC-based feature selection
Penalized regression methods
• Approach
  o Regression models → multivariate association
  o Shrinkage estimation → feature selection
• Variations
  o LASSO, Ridge, Elastic net, Logic regression
    • Gold standard: LASSO penalized regression
LASSO penalized regression
• Regression
  o Generalized linear model (“glm”)
• Penalization
  o L1 norm
  o Coefficients → 0
  o Feature selection!
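To make the link between the L1 penalty and feature selection concrete, here is a minimal coordinate-descent sketch. It is a simplification under stated assumptions: it fits a linear (squared-error) LASSO to a quantitative phenotype rather than the generalized linear model a case-control study would use, the toy genotype data are hypothetical, and the function names are invented for this example.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; exactly 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimise (1/2n) * ||y - X b||^2 + penalty * ||b||_1 on centred data."""
    n, p = len(X), len(X[0])
    # Centre columns and response so that no intercept term is needed.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    beta = [0.0] * p
    residual = yc[:]  # equals yc - Xc @ beta while beta is all zeros
    for _ in range(n_sweeps):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # monomorphic SNP: coefficient stays at 0
            # Covariance of SNP j with the residual after adding its own effect back.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, penalty) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                beta[j] = new_beta
    return beta


# Hypothetical data: 8 individuals, 3 SNPs coded 0/1/2; only SNP 0 drives the phenotype.
X = [[0, 1, 0], [0, 0, 1], [1, 1, 2], [1, 0, 0],
     [2, 1, 1], [2, 0, 2], [0, 1, 1], [2, 0, 1]]
y = [0, 0, 2, 2, 4, 4, 0, 4]
beta = lasso_coordinate_descent(X, y, penalty=0.3)
print(beta)
```

In this toy example the coefficients of the two irrelevant SNPs are set exactly to zero, while the coefficient of the causal SNP is shrunk below its true value of 2, illustrating both the feature selection and the shrinkage noted on the slide.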
LASSO – Strengths & weaknesses
Strengths
• Stability
• Interpretability
• Likely to accurately select the most influential predictors
• Sparsity
Weaknesses
• Multicollinearity
• Not designed for high-p
• Computationally intensive
• Calibration of penalty parameters
  o User-defined variability
• Sparsity
Factorial methods
• Approach
  o Place all variables (SNPs) in a multivariate space
  o Identify discriminant axis → best separation
  o Select variables with the highest contributions to that axis
• Variations
  o Supervised-PCA, Sparse-PCA, DA, DAPC-based FS
  o Our focus: DAPC with feature selection (snpzip)
DAPC-based feature selection
[Figure: density of individuals along the discriminant axis for healthy (“controls”) and diseased (“cases”) groups; bar chart of the contribution of alleles a–e to the discriminant axis]
DAPC-based feature selection
• Where should we draw the line?
  o Hierarchical clustering
[Figure: density of individuals along the discriminant axis; bar chart of the contribution of alleles a–e to the discriminant axis, with “?” marking the undecided cut-off]
Hierarchical clustering (FS)
[Figure: bar chart of the contribution of alleles a–e to the discriminant axis]
Hooray!
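The idea of letting clustering decide where to draw the line can be sketched in a few lines. The example below is a simplified stand-in, not the snpzip implementation: for one-dimensional contributions, single-linkage hierarchical clustering cut into two groups is equivalent to splitting at the largest gap between sorted values, and that shortcut is what the code uses. The slides do not state which linkage is used, so single linkage is an assumption, and the contribution values and the function name `select_by_largest_gap` are hypothetical.

```python
def select_by_largest_gap(contributions):
    """Split variables into two groups at the largest gap between sorted contributions.

    For one-dimensional values this matches single-linkage hierarchical
    clustering cut into two clusters. Returns the names in the high group.
    """
    if len(contributions) < 2:
        raise ValueError("need at least two variables")
    ranked = sorted(contributions.items(), key=lambda kv: kv[1])
    gaps = [ranked[i + 1][1] - ranked[i][1] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps))  # split between ranked[cut] and ranked[cut + 1]
    return sorted(name for name, _ in ranked[cut + 1:])


# Hypothetical contributions of alleles a-e to the discriminant axis.
contributions = {"a": 0.05, "b": 0.40, "c": 0.08, "d": 0.35, "e": 0.12}
selected = select_by_largest_gap(contributions)
print(selected)
```

Because the cut-off is derived from the data, the number of selected variables is not fixed in advance.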
DAPC – Strengths & weaknesses
Strengths
• More likely to catch all relevant SNPs (signal)
• Computationally quick
• Good exploratory tool
• Redundancy > sparsity
Weaknesses
• Sensitive to n.pca
• N.snps.selected varies
• No “p-value”
• Redundancy > sparsity
Conclusions
• Study design
  o GWAS design
  o Issues and considerations in GWAS
• Testing for association
  o Univariate methods
  o Multivariate methods
    • Penalized regression methods
    • Factorial methods
Thanks for listening!
Questions?