Transcript
Page 1: Biostatistics-Lecture  19 Linkage Disequilibrium and SNP detection

Biostatistics, Lecture 19: Linkage Disequilibrium and SNP Detection

Ruibin Xi

Peking University, School of Mathematical Sciences

Page 2

Haplotype Frequencies

Page 3

Linkage Equilibrium

Page 4

Linkage Disequilibrium

Page 5

Disequilibrium Coefficient D_AB

Page 6

D_AB is hard to interpret

• Sign is arbitrary …
– A common convention is to set A, B to be the common allele and a, b to be the rare allele
• Range depends on allele frequencies
– Hard to compare between markers
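
Both points can be checked numerically. The sketch below assumes the usual definition D_AB = p_AB - p_A p_B; the function names and the example frequencies are hypothetical, chosen only for illustration.

```python
def disequilibrium(p_hap, p_first, p_second):
    """Return D = P(haplotype) - P(first allele) * P(second allele)."""
    return p_hap - p_first * p_second


def d_bounds(p_A, p_B):
    """Return the (min, max) values D_AB can take for given allele frequencies.

    The bounds follow from requiring all four haplotype frequencies
    (AB, Ab, aB, ab) to be non-negative.
    """
    d_min = max(-p_A * p_B, -(1 - p_A) * (1 - p_B))
    d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    return d_min, d_max


# Hypothetical frequencies: haplotype AB at 0.5, allele A at 0.6, allele B at 0.7.
p_AB, p_A, p_B = 0.5, 0.6, 0.7
d_AB = disequilibrium(p_AB, p_A, p_B)            # about +0.08

# Relabel the second marker (swap B and b): the haplotype "A with b"
# has frequency p_A - p_AB, and allele b has frequency 1 - p_B.
d_Ab = disequilibrium(p_A - p_AB, p_A, 1 - p_B)  # about -0.08, same size, opposite sign

# The attainable range changes with the allele frequencies.
print(d_AB, d_Ab)
print(d_bounds(0.6, 0.7))   # narrower range
print(d_bounds(0.5, 0.5))   # widest possible range
```

Relabeling alleles only flips the sign, and the attainable range of D_AB differs from one marker pair to the next, which is why raw values are hard to compare.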

Page 7

r² (also called Δ²)

• Ranges between 0 and 1
– 1 when the two markers provide identical information
– 0 when they are in perfect equilibrium
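
A minimal sketch of the two extremes, assuming the standard definition r² = D² / (p_A (1 - p_A) p_B (1 - p_B)) with D = p_AB - p_A p_B. The function name and the frequencies are hypothetical.

```python
def r_squared(p_AB, p_A, p_B):
    """Return r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)), with D = p_AB - p_A p_B."""
    d = p_AB - p_A * p_B
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))


# Identical information: only haplotypes AB and ab occur, so p_AB = p_A = p_B.
identical = r_squared(p_AB=0.3, p_A=0.3, p_B=0.3)

# Perfect equilibrium: the haplotype frequency is the product of allele frequencies.
equilibrium = r_squared(p_AB=0.3 * 0.6, p_A=0.3, p_B=0.6)

print(identical, equilibrium)  # close to 1 and exactly 0, up to floating-point rounding
```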

Page 8

Raw r² data from chr22

Page 9

Comparing Populations

CEPH: Utah residents with ancestry from northern and western Europe (CEU)

Page 10

Use LD for SNP imputation and detection

fastPHASE

Page 12

Model for haplotypes

• Observed n haplotypes
– Each with M markers
– b_ij = 0, 1
• Assume each haplotype originates from one of K clusters
– z_i: unknown cluster of origin of b_i
– Since clusters of origin are unknown
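
When the cluster of origin is unknown, a natural way to get the probability of a haplotype is to sum over the K clusters. The sketch below does this for a mixture in which markers are independent given the cluster, as in cluster-based haplotype models of the fastPHASE type. The names `alpha` (cluster weights) and `theta` (per-cluster frequency of allele 1 at each marker) are assumptions not taken from the slides, and the numbers are made up.

```python
from math import prod


def haplotype_probability(b, alpha, theta):
    """P(b) = sum_k alpha[k] * prod_m theta[k][m]^b[m] * (1 - theta[k][m])^(1 - b[m]).

    b     : list of 0/1 alleles, one per marker
    alpha : alpha[k] is the weight of cluster k (weights sum to 1)
    theta : theta[k][m] is the frequency of allele 1 at marker m in cluster k
    """
    total = 0.0
    for a_k, theta_k in zip(alpha, theta):
        # Given the cluster, markers are treated as independent.
        total += a_k * prod(t if allele == 1 else 1.0 - t
                            for allele, t in zip(b, theta_k))
    return total


# Hypothetical example: K = 2 clusters, M = 3 markers.
alpha = [0.5, 0.5]
theta = [[0.9, 0.9, 0.1],
         [0.1, 0.2, 0.8]]
p = haplotype_probability([1, 1, 0], alpha, theta)
print(p)  # 0.5 * 0.9 * 0.9 * 0.9 + 0.5 * 0.1 * 0.2 * 0.2, about 0.3665
```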

Page 13

Local clustering of haplotype

• Assume z_i = (z_i1, …, z_iM) forms a Markov chain on {1, …, K}
– z_im denotes the cluster of origin for b_im
– Initial probabilities
– Transition probabilities
– Conditional on the cluster of origin
– Marginal
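
The marginal probability of a haplotype under such a Markov chain can be computed with the forward algorithm. The sketch below is an illustration under stated assumptions, not the lecture's exact parameterization: `alpha[m][k]` is the probability of landing in cluster k at marker m (also used as the initial distribution at m = 0), `jump[m]` is the probability that the chain jumps between markers m-1 and m, and `theta[k][m]` is the frequency of allele 1 in cluster k. All names and numbers are hypothetical.

```python
def forward_likelihood(b, alpha, theta, jump):
    """Marginal P(b) for a haplotype under a K-state hidden Markov chain.

    Assumed transition: P(z_m = k' | z_{m-1} = k)
        = (1 - jump[m]) * [k' == k] + jump[m] * alpha[m][k']
    Assumed emission:   P(b_m = 1 | z_m = k) = theta[k][m]
    jump[0] is unused because there is no transition into the first marker.
    """
    n_markers, n_clusters = len(b), len(alpha[0])

    def emit(m, k):
        return theta[k][m] if b[m] == 1 else 1.0 - theta[k][m]

    # Initial probabilities times the emission at the first marker.
    f = [alpha[0][k] * emit(0, k) for k in range(n_clusters)]
    for m in range(1, n_markers):
        total = sum(f)
        # Either stay in the same cluster, or jump and redraw the cluster.
        f = [((1.0 - jump[m]) * f[k] + jump[m] * alpha[m][k] * total) * emit(m, k)
             for k in range(n_clusters)]
    return sum(f)


# Hypothetical example: K = 2 clusters, M = 3 markers.
alpha = [[0.5, 0.5]] * 3
theta = [[0.9, 0.9, 0.1],
         [0.1, 0.2, 0.8]]
no_jumps = forward_likelihood([1, 1, 0], alpha, theta, jump=[0.0, 0.0, 0.0])
with_jumps = forward_likelihood([1, 1, 0], alpha, theta, jump=[0.0, 0.3, 0.2])
print(no_jumps, with_jumps)
```

With all jump probabilities set to zero, the chain never leaves its starting cluster and the result reduces to a plain mixture over clusters. Nonzero jumps let the cluster membership change along the chromosome, which is the "local" part of local clustering.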

Page 14

Local clustering of genotype data

• We have genotype data
• g_im: genotype at marker m of individual i
– Takes values 0, 1, 2
• Initial probabilities (unordered clusters of origin)
• Transition probabilities

Page 15

Local clustering of genotype data

• Genotype probabilities conditional on clusters of origin

• Joint likelihood
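
A sketch of how genotype probabilities could be computed from an unordered pair of clusters, assuming the two haplotypes of an individual are drawn independently given their clusters. The parameters `theta_k1` and `theta_k2` (frequency of allele 1 at the marker in each of the two clusters) are hypothetical names, not notation from the slides.

```python
def genotype_probability(g, theta_k1, theta_k2):
    """P(g | clusters {k1, k2}) at one marker, where g counts copies of allele 1.

    theta_k1, theta_k2: frequency of allele 1 in the two clusters of origin.
    The pair is unordered, so the result is symmetric in its last two arguments.
    """
    if g == 0:
        return (1.0 - theta_k1) * (1.0 - theta_k2)
    if g == 1:
        return theta_k1 * (1.0 - theta_k2) + theta_k2 * (1.0 - theta_k1)
    if g == 2:
        return theta_k1 * theta_k2
    raise ValueError("genotype must be 0, 1, or 2")


# Hypothetical cluster allele frequencies at one marker.
probs = [genotype_probability(g, 0.9, 0.2) for g in (0, 1, 2)]
print(probs)  # the three probabilities sum to 1
```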

Page 16

Algorithms for genotype imputation

• fastPHASE

• BEAGLE

• IMPUTE

• PLINK

• MaCH

Page 17

Algorithms for genotype imputation

Picture taken from IMPUTE v2

Page 18

SNP detection with LD information

• MaCH (G: genotype, S: cluster)

Page 19

SNP detection with LD information

• For sequencing data, G is not observed
• The coverages of bases A and B are observed, so we have the HMM
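
One way to connect read coverage to the hidden state is to replace the genotype emission with a read-count likelihood summed over the unobserved genotype. The sketch below assumes a simple binomial model with a symmetric per-read error rate. This is an illustrative assumption, not necessarily the model used in the lecture or in MaCH, and the function names and the default error rate are hypothetical.

```python
from math import comb


def read_count_likelihood(n_A, n_B, g, error=0.01):
    """P(n_A, n_B | G = g) under a binomial model; g counts copies of allele B.

    Each read shows base B with probability `error` for genotype AA,
    0.5 for AB, and 1 - error for BB (assumed values).
    """
    p_B = {0: error, 1: 0.5, 2: 1.0 - error}[g]
    return comb(n_A + n_B, n_B) * p_B ** n_B * (1.0 - p_B) ** n_A


def emission_given_state(n_A, n_B, genotype_probs, error=0.01):
    """P(reads | S) = sum over g of P(reads | G = g) * P(G = g | S).

    genotype_probs[g] is P(G = g | S) for g = 0, 1, 2, supplied by the
    haplotype-cluster model for the current hidden state S.
    """
    return sum(read_count_likelihood(n_A, n_B, g, error) * genotype_probs[g]
               for g in (0, 1, 2))


# Hypothetical site: 3 reads show A, 1 read shows B.
likelihoods = [read_count_likelihood(3, 1, g) for g in (0, 1, 2)]
print(likelihoods)  # the heterozygote is by far the most likely genotype here
print(emission_given_state(3, 1, genotype_probs=[0.08, 0.74, 0.18]))
```

At low coverage the read counts alone leave the genotype uncertain. The `genotype_probs` term is where LD information from neighboring markers enters.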

Page 20

SNP detection with LD information

Nielsen et al. 2011, Nature Reviews Genetics