statistical methods for genetic association studies paulj/assoc_study_stats.ppt

Post on 28-Mar-2015


Statistical methods for genetic association studies

http://www.stats.gla.ac.uk/~paulj/assoc_study_stats.ppt

A tutorial on statistical methods for population association studies

David Balding

Nature Reviews Genetics (2006) 7:781-791

[Figure: Genetics and Environment, with G×E interaction, leading to Health outcome]

Recombination

[Figure: recombination in gametophytes (gamete-producing cells) producing gametes; chromosomes carry alleles A/a, B/b and X/x]

X/x: unobserved causative mutation

A/a: distant marker

B/b: linked marker

Approaches to finding disease genes

• Population-based association study
  – “unrelated” subjects

• Family-based association study
  – nuclear families

• Admixture mapping
  – recently admixed population

• Linkage mapping
  – large pedigrees

Darvasi & Shifman (2005) Nature Genetics

Types of population association study

• Candidate causative polymorphism
  – SNP (single nucleotide polymorphism), deletion, duplication

• Candidate causative gene (5-50 marker SNPs)
  – evidence from linkage study or function

• Candidate causative region (100s of marker SNPs)
  – evidence from linkage study

• Genome-wide (>300,000 marker SNPs)
  – no prior evidence required

Common disease common variant (CDCV) hypothesis

• Hardy-Weinberg equilibrium (HWE): assuming mating is random and the population is large, HWE genotype frequencies will apply

• Allele frequencies:
  P(X) = p
  P(x) = q

• HWE genotype frequencies:
  P(XX) = p^2
  P(Xx) = 2pq
  P(xx) = q^2

• Useful data quality check:
  – chi-squared or exact test
  – log QQ plot

• But discarding SNPs that deviate from HWE can discard causative mutations

        p       q
p       p^2     pq
q       pq      q^2
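The chi-squared check against HWE can be sketched as follows. This is a minimal illustration, not the method of any particular package: `hwe_chi_squared` is a hypothetical helper name, the allele frequency p is estimated from the genotype counts themselves, and the statistic is referred to a chi-squared distribution with 1 degree of freedom. For small counts an exact test would be preferable.

```python
import math

def hwe_chi_squared(n_XX, n_Xx, n_xx):
    """Pearson chi-squared test (1 df) of genotype counts against HWE expectations."""
    n = n_XX + n_Xx + n_xx
    p = (2 * n_XX + n_Xx) / (2 * n)   # estimated frequency of allele X
    q = 1.0 - p                       # estimated frequency of allele x
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic marker: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # n*p^2, n*2pq, n*q^2
    observed = (n_XX, n_Xx, n_xx)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail probability of a chi-squared variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts for genotypes XX, Xx, xx
chi2, p_value = hwe_chi_squared(30, 50, 20)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A small p-value flags a marker for inspection; it does not by itself show that the genotyping is wrong.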

Preliminary analysis: data quality

Log QQ plot
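A log QQ plot compares observed p-values (for example, from HWE tests across many SNPs) with the values expected under the null hypothesis, both on a -log10 scale. The sketch below computes only the plot coordinates. It assumes the (i - 0.5)/m plotting positions, a common convention that the slides do not specify, and `log_qq_points` is a hypothetical helper name.

```python
import math

def log_qq_points(p_values):
    """Return (expected, observed) pairs of -log10 p-values, in ascending order."""
    m = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected uniform quantiles under the null, on the same -log10 scale
    expected = sorted(-math.log10((i - 0.5) / m) for i in range(1, m + 1))
    return list(zip(expected, observed))

# Hypothetical p-values; points far above the line y = x suggest problem SNPs
for exp_val, obs_val in log_qq_points([0.8, 0.4, 0.15, 0.0002]):
    print(f"{exp_val:.2f}  {obs_val:.2f}")
```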

Preliminary analysis: dealing with missing data

• Imputation– various methods: maximum likelihood; probalistic;

‘hot-deck’; regression modelling– test for independence of ‘missingness’ and case-

control status

Choice of inheritance model

[Figure: dominant vs additive inheritance. x-axis: number of trait alleles inherited (0, 1, 2); y-axis: trait value (0%, 50%, 100%); series: Dominant, Additive]
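The two inheritance models translate into different numeric codings of the genotype. A minimal sketch, assuming the trait value is expressed as a fraction of the full effect (0.0 to 1.0, matching the 0% to 100% axis); `code_genotype` is a hypothetical helper name.

```python
def code_genotype(n_trait_alleles, model):
    """Fraction of the full trait effect for 0, 1 or 2 trait alleles under a given model."""
    if n_trait_alleles not in (0, 1, 2):
        raise ValueError("number of trait alleles must be 0, 1 or 2")
    if model == "additive":
        # Each trait allele contributes half of the full effect
        return n_trait_alleles / 2
    if model == "dominant":
        # One trait allele is enough for the full effect
        return 1.0 if n_trait_alleles >= 1 else 0.0
    raise ValueError(f"unknown model: {model}")

for model in ("dominant", "additive"):
    print(model, [code_genotype(g, model) for g in (0, 1, 2)])
```

The models differ only for heterozygotes, which is why the choice matters most when heterozygotes are common.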

Tests of association: single SNP

• Case-control
  – Treat genotype as factor with 3 levels, perform 2x3 goodness-of-fit test. Loses power if effect is additive
  – Count alleles rather than individuals, perform 2x2 goodness-of-fit test. Out of favour because
    • sensitive to deviation from HWE
    • risk estimates not interpretable

            Major allele homozygote (0)   Heterozygote (1)   Minor allele homozygote (2)
Case
Control
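Both tests on the case-control table can be sketched with a single Pearson chi-squared routine: once on the 2x3 genotype table (2 df) and once on the 2x2 allele-count table (1 df). The counts are hypothetical and the function names are illustrative. The p-value helper covers only the two degrees of freedom needed here.

```python
import math

def pearson_chi2(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

def chi2_p_value(chi2, df):
    """Upper tail probability of chi-squared for df = 1 or 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("only df = 1 or 2 supported in this sketch")

def allele_counts(genotype_counts):
    """Convert counts of genotypes 0, 1, 2 into counts of major and minor alleles."""
    n0, n1, n2 = genotype_counts
    return [2 * n0 + n1, n1 + 2 * n2]

# Hypothetical genotype counts: genotypes 0, 1, 2
cases, controls = [30, 50, 20], [50, 40, 10]

chi2_g, df_g = pearson_chi2([cases, controls])                                # 2x3 genotype test
chi2_a, df_a = pearson_chi2([allele_counts(cases), allele_counts(controls)])  # 2x2 allelic test
print(f"genotype test: chi2 = {chi2_g:.2f}, df = {df_g}, p = {chi2_p_value(chi2_g, df_g):.4f}")
print(f"allelic test:  chi2 = {chi2_a:.2f}, df = {df_a}, p = {chi2_p_value(chi2_a, df_a):.4f}")
```

The allelic test doubles the apparent sample size by treating the two alleles of one person as independent, which is the reason it is sensitive to departures from HWE.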

Tests of association: single SNP

• Case-control
  – Cochran-Armitage test
    • loses power if additivity assumption wrong

Cochran-Armitage test
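A minimal sketch of the Cochran-Armitage trend test, assuming the usual additive scores (0, 1, 2) for the three genotypes and a chi-squared reference distribution with 1 df. The function name and the counts are hypothetical.

```python
import math

def cochran_armitage(case_counts, control_counts, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic (chi-squared, 1 df) and its p-value."""
    R = sum(case_counts)       # total cases
    S = sum(control_counts)    # total controls
    N = R + S
    col = [r + s for r, s in zip(case_counts, control_counts)]  # genotype totals
    sum_wr = sum(w * r for w, r in zip(scores, case_counts))
    sum_wn = sum(w * n for w, n in zip(scores, col))
    sum_w2n = sum(w * w * n for w, n in zip(scores, col))
    # Trend statistic T = sum_i w_i (r_i S - s_i R), written in terms of column totals
    T = N * sum_wr - R * sum_wn
    # Variance of T under the null hypothesis of no association
    var_T = (R * S / N) * (N * sum_w2n - sum_wn ** 2)
    chi2 = T * T / var_T
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, 1 df
    return chi2, p_value

# Hypothetical counts for genotypes 0, 1, 2
chi2, p_value = cochran_armitage([30, 50, 20], [50, 40, 10])
print(f"trend chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

Changing `scores` to (0, 1, 1) would test a dominant rather than additive trend.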

Tests of association: single SNP

• Case-control
  – Armitage or goodness-of-fit? Depends on:
    • Prior knowledge of inheritance (additive, dominant, etc)
    • Genotype frequencies, e.g. use Armitage test when minor allele is rare, goodness-of-fit test otherwise

Tests of association: single SNP

• Case-control
  – Logistic regression
    • Easily incorporates inheritance model (additive, dominant, etc)
    • But assumes phenotype is outcome variable not genotype, so easier to justify for prospective studies
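How the inheritance model enters a logistic regression can be sketched with a two-parameter fit, logit P(case) = b0 + b1 * code, where only the genotype coding changes between models. This is a bare Newton-Raphson illustration on grouped counts, with hypothetical names and data and no covariates. In practice a statistics package would be used.

```python
import math

def fit_logistic(case_counts, control_counts, codes=(0, 1, 2), n_iter=25):
    """Fit logit P(case) = b0 + b1 * code by Newton-Raphson on grouped genotype counts."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # score vector (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix
        for x, r, s in zip(codes, case_counts, control_counts):
            n = r + s
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # fitted P(case)
            g0 += r - n * mu
            g1 += x * (r - n * mu)
            w = n * mu * (1.0 - mu)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical counts for genotypes 0, 1, 2
cases, controls = [30, 50, 20], [50, 40, 10]
_, b_add = fit_logistic(cases, controls, codes=(0, 1, 2))   # additive coding
_, b_dom = fit_logistic(cases, controls, codes=(0, 1, 1))   # dominant coding
print(f"additive: odds ratio per allele = {math.exp(b_add):.2f}")
print(f"dominant: odds ratio for carriers = {math.exp(b_dom):.2f}")
```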

Tests of association: single SNP

• Continuous outcome
  – Linear regression

• Ordered categorical outcomes
  – Multinomial regression

Problems: population stratification


Correcting for population stratification

• Genomic control
  – Genotype null SNPs and use to calculate background inflation in test statistic due to population stratification
  – Limited to simple single-SNP analyses
  – Can over- or under-correct

• Other approaches using null SNPs
  – Regression, principal components analysis, model underlying demography
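Genomic control can be sketched as estimating an inflation factor from the null SNPs and dividing each test statistic by it. The version below assumes 1-df chi-squared statistics and the common median-based estimator. Flooring the factor at 1 is a convention assumed here, and the function names and values are hypothetical.

```python
import statistics

# Median of a chi-squared distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(null_statistics):
    """Estimate the inflation factor lambda from 1-df chi-squared statistics at null SNPs."""
    lam = statistics.median(null_statistics) / CHI2_1DF_MEDIAN
    return max(1.0, lam)  # assumption: never deflate the test statistics

def genomic_control(statistic, lam):
    """Correct a single-SNP 1-df chi-squared statistic for background inflation."""
    return statistic / lam

# Hypothetical statistics at null SNPs and at one candidate SNP
lam = inflation_factor([0.3, 0.7, 1.1, 0.5, 2.4])
print(f"lambda = {lam:.2f}, corrected statistic = {genomic_control(9.6, lam):.2f}")
```

A single factor applied to every SNP is what makes the method simple, and also why it can over- or under-correct at individual markers.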

Problems: multiple testing

• Bonferroni correction
  – conservative when SNPs are linked

• Permutation
  – computationally demanding

• False discovery rate

• Bayesian approaches
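Two of these corrections can be sketched side by side. For the false discovery rate, the Benjamini-Hochberg step-up procedure is used as one standard choice. The slides do not name a specific procedure, so this is an assumption, and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject where p <= alpha / m, controlling the family-wise error rate."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure, controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank  # largest rank whose p-value is under its threshold
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

# Hypothetical single-SNP p-values
p_values = [0.001, 0.012, 0.025, 0.045, 0.5]
print("Bonferroni:        ", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```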

Tests of association: multiple SNPs

• Advantages
  – Many SNPs may be linked to a gene, but individually may not have a significant effect
  – Interactions between SNPs can be modelled
  – ‘Tag’ SNPs can reduce testing of redundant linked SNPs

• Methods
  – Linear regression, logistic regression
  – Armitage test

• Haplotype-based methods
  – Natural interpretation
  – But power reduced due to multiple alleles

Haplotypes

Nature Genetics  37, 915 - 916 (2005)

Inferring haplotype phase


Methods & software
• PHASE, FASTPHASE
• EH+
• FBAT
• HAPLOTYPER
• EM-DECODER
• PLEM
• HAP
• HAPLORE
• Haplo.stat
• SNPEM
• PEDPHASE
• SNPHAP
• TDTHAP


• Phase cases and controls separately or pooled?
  – Separating can give inflated type I error
  – Pooling can reduce power
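Why phase has to be inferred at all can be illustrated by enumerating the haplotype pairs consistent with one unphased genotype. This is only a sketch of the ambiguity, not of any phasing program. It assumes biallelic SNPs coded as minor-allele counts (0, 1, 2), and `compatible_haplotype_pairs` is a hypothetical helper name.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased multi-SNP genotype."""
    per_site = []
    for g in genotype:
        if g == 0:
            per_site.append([(0, 0)])           # homozygous major: no ambiguity
        elif g == 2:
            per_site.append([(1, 1)])           # homozygous minor: no ambiguity
        elif g == 1:
            per_site.append([(0, 1), (1, 0)])   # heterozygous: either haplotype may carry it
        else:
            raise ValueError("genotypes must be 0, 1 or 2")
    pairs = set()
    for combo in product(*per_site):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.add(tuple(sorted((h1, h2))))      # the order of the two haplotypes is irrelevant
    return sorted(pairs)

# An individual heterozygous at two SNPs has two possible phasings
for h1, h2 in compatible_haplotype_pairs((1, 1)):
    print(h1, h2)
```

An individual heterozygous at k SNPs has 2^(k-1) compatible pairs, so the ambiguity grows quickly with the number of markers.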
