Biostatistics Lecture 19: Linkage Disequilibrium and SNP Detection
Ruibin Xi
Peking University
School of Mathematical Sciences
Haplotype Frequencies
Linkage Equilibrium
Linkage Disequilibrium
Disequilibrium Coefficient D_AB
D_AB is hard to interpret
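For reference, these are the standard definitions behind the haplotype-frequency, linkage-equilibrium and D_AB slides, written for two biallelic markers with alleles A/a and B/b. The symbols p_A, p_B and p_AB are our own notation and may differ from the original slide images.

```latex
% Haplotype frequencies p_{AB}, p_{Ab}, p_{aB}, p_{ab} sum to 1; allele frequencies:
p_A = p_{AB} + p_{Ab}, \qquad p_B = p_{AB} + p_{aB}
% Linkage equilibrium: each haplotype frequency is the product of its allele frequencies
p_{AB} = p_A \, p_B
% Disequilibrium coefficient (zero exactly at linkage equilibrium)
D_{AB} = p_{AB} - p_A \, p_B = p_{AB}\, p_{ab} - p_{Ab}\, p_{aB}
```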
• Sign is arbitrary …
– A common convention is to label the common alleles A and B and the rare alleles a and b
• Range depends on allele frequencies
– Makes D_AB hard to compare between pairs of markers
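A small sketch of both problems, assuming the four haplotype frequencies are already known. `disequilibrium` and `d_bound` are hypothetical helper names; `d_bound` returns the largest |D_AB| that the allele frequencies allow (the standard bound, not something stated on the slide).

```python
def disequilibrium(p_AB, p_Ab, p_aB, p_ab):
    """D_AB = p_AB - p_A * p_B, from the four haplotype frequencies (sum to 1)."""
    p_A = p_AB + p_Ab  # frequency of allele A at marker 1
    p_B = p_AB + p_aB  # frequency of allele B at marker 2
    return p_AB - p_A * p_B


def d_bound(p_A, p_B, positive=True):
    """Largest |D_AB| allowed by the allele frequencies, for the given sign of D."""
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    if positive:
        return min(p_A * p_b, p_a * p_B)
    return min(p_A * p_B, p_a * p_b)


# Haplotype frequencies in the order (AB, Ab, aB, ab)
freqs = (0.5, 0.1, 0.1, 0.3)
d = disequilibrium(*freqs)

# Swap the labels B <-> b at the second marker: same data, opposite sign of D
relabeled = (freqs[1], freqs[0], freqs[3], freqs[2])
d_relabeled = disequilibrium(*relabeled)

print(round(d, 4), round(d_relabeled, 4))                         # 0.14 -0.14
print(round(d_bound(0.6, 0.6), 4), round(d_bound(0.9, 0.9), 4))   # 0.24 0.09
```

The same value D_AB = 0.14 would be close to its maximum for two markers with allele frequencies near 0.9 but well below it for markers near 0.6, which is why raw D_AB is hard to compare across marker pairs.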
r² (also called Δ²)
• Ranges between 0 and 1
– 1 when the two markers provide identical information
– 0 when they are in perfect equilibrium
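A minimal sketch computing r² from the four haplotype frequencies with the usual formula r² = D_AB² / (p_A p_a p_B p_b); the function name is hypothetical. Because D_AB is squared and divided by the allele-frequency product, the result no longer depends on allele labels and lies between 0 and 1.

```python
def r_squared(p_AB, p_Ab, p_aB, p_ab):
    """r^2 = D_AB^2 / (p_A * p_a * p_B * p_b) from the four haplotype frequencies."""
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either marker is monomorphic")
    return d * d / denom


print(round(r_squared(0.6, 0.0, 0.0, 0.4), 4))      # only AB and ab haplotypes: 1.0
print(round(r_squared(0.36, 0.24, 0.24, 0.16), 4))  # p_AB = p_A * p_B: 0.0
print(round(r_squared(0.5, 0.1, 0.1, 0.3), 4),
      round(r_squared(0.1, 0.5, 0.3, 0.1), 4))      # B/b relabelled: same value
```

The first case is "identical information": knowing the allele at one marker determines the allele at the other.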
Raw r² data from chr22
Comparing Populations
CEPH: Utah residents with ancestry from northern and western Europe (CEU)
Use LD for SNP imputation and detection
fastPhase
Model for haplotypes
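• Observed n haplotypes
– Each with M markers
– b_ij ∈ {0, 1}: allele of haplotype i at marker j
• Assume each haplotype originates from one of K clusters
– z_i: unknown cluster of origin of haplotype b_i
– Since the clusters of origin are unknown, the likelihood of b_i is a mixture (sum) over the K possible clusters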
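Summing over the unknown cluster gives a mixture model. The sketch below assumes cluster weights `alpha[k] = P(z_i = k)` and per-cluster allele frequencies `theta[k][j] = P(b_ij = 1 | z_i = k)`, with markers independent given the cluster; these names are our assumptions, not the slide's notation.

```python
from math import prod


def cluster_likelihood(b, theta_k):
    """P(b | cluster k): markers are treated as independent given the cluster."""
    return prod(t if allele == 1 else 1.0 - t for allele, t in zip(b, theta_k))


def haplotype_likelihood(b, alpha, theta):
    """P(b) = sum_k alpha[k] * P(b | cluster k), marginalizing the unknown z_i."""
    return sum(a * cluster_likelihood(b, t) for a, t in zip(alpha, theta))


def cluster_posterior(b, alpha, theta):
    """P(z_i = k | b) by Bayes' rule."""
    joint = [a * cluster_likelihood(b, t) for a, t in zip(alpha, theta)]
    total = sum(joint)
    return [j / total for j in joint]


# Two hypothetical clusters over three markers
alpha = [0.5, 0.5]
theta = [[0.9, 0.9, 0.9], [0.1, 0.1, 0.1]]
b = [1, 1, 1]
print(haplotype_likelihood(b, alpha, theta))  # about 0.365
print(cluster_posterior(b, alpha, theta))     # almost all weight on cluster 0
```

Here each haplotype belongs to a single cluster along its whole length; the local-clustering model relaxes exactly this.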
Local clustering of haplotypes
• Assume z_i = (z_i1, …, z_iM) forms a Markov chain on {1, …, K}
– z_im denotes the cluster of origin of b_im
– Initial probabilities
– Transition probabilities
– Conditional on the cluster of origin
– Marginal
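The marginal sums over all K^M cluster paths, which the forward algorithm does in time linear in M. The sketch assumes one common parameterization: at each marker the chain either stays in its cluster or, with a hypothetical jump probability `rho[m]`, redraws a cluster from the weights `alpha`; `theta[k][m]` is the allele-1 frequency of cluster k at marker m. The slide's exact initial and transition formulas are not reproduced here.

```python
def forward_likelihood(b, alpha, theta, rho):
    """Marginal P(b) for one haplotype under a local-clustering HMM.

    b[m] in {0, 1}; alpha[k]: initial (and jump) cluster weights, summing to 1;
    theta[k][m] = P(b[m] = 1 | z_m = k); rho[m]: jump probability between
    markers m - 1 and m (rho[0] is unused).
    """
    K, M = len(alpha), len(b)

    def emit(k, m):
        return theta[k][m] if b[m] == 1 else 1.0 - theta[k][m]

    # Initialization: f[k] = P(z_1 = k) * P(b_1 | z_1 = k)
    f = [alpha[k] * emit(k, 0) for k in range(K)]
    for m in range(1, M):
        total = sum(f)
        # P(z_m = k | z_{m-1} = j) = (1 - rho[m]) * [j == k] + rho[m] * alpha[k],
        # so the sum over j only needs f[k] and the total of f.
        f = [((1.0 - rho[m]) * f[k] + rho[m] * alpha[k] * total) * emit(k, m)
             for k in range(K)]
    return sum(f)


alpha = [0.5, 0.5]
theta = [[0.9, 0.9, 0.9], [0.1, 0.1, 0.1]]
print(forward_likelihood([1, 1, 0], alpha, theta, rho=[0.0, 0.2, 0.2]))
```

Setting every `rho[m]` to 0 recovers the global mixture model, since no path can switch clusters. On real chromosomes `f` should be rescaled at each marker (or kept in log space) to avoid underflow.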
Local clustering of genotype data
• We have genotype data
• g_im: genotype at marker m of individual i
– Takes values 0, 1, 2 (number of copies of allele 1)
• Initial probabilities (unordered pairs of clusters of origin)
• Transition probabilities
Local clustering of genotype data
• Genotype probabilities conditional on the clusters of origin
• Joint likelihood
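For a genotype, the hidden state at each marker is an unordered pair of clusters, one per haplotype. A sketch of the two pieces that change relative to the haplotype model, assuming the two haplotypes' clusters are drawn independently from weights `alpha` and that `theta_m` holds each cluster's allele-1 frequency at the marker (both hypothetical names):

```python
from itertools import combinations_with_replacement


def unordered_pair_initial(alpha):
    """P(cluster pair = {k1, k2}) when each haplotype's cluster is drawn from alpha."""
    return {
        # ordered pairs (k1, k2) and (k2, k1) collapse to one state unless k1 == k2
        (k1, k2): alpha[k1] * alpha[k2] * (1 if k1 == k2 else 2)
        for k1, k2 in combinations_with_replacement(range(len(alpha)), 2)
    }


def genotype_emission(g, t1, t2):
    """P(g | pair of clusters with allele-1 frequencies t1, t2); g = copies of allele 1."""
    if g == 0:
        return (1 - t1) * (1 - t2)
    if g == 1:
        return t1 * (1 - t2) + (1 - t1) * t2
    if g == 2:
        return t1 * t2
    raise ValueError("genotype must be 0, 1 or 2")


def single_marker_probability(g, alpha, theta_m):
    """P(g) at one marker, summing over unordered cluster pairs."""
    return sum(
        p * genotype_emission(g, theta_m[k1], theta_m[k2])
        for (k1, k2), p in unordered_pair_initial(alpha).items()
    )


alpha = [0.5, 0.5]
theta_m = [0.9, 0.1]  # allele-1 frequency of each cluster at this marker
print([round(single_marker_probability(g, alpha, theta_m), 4) for g in range(3)])
# [0.25, 0.5, 0.25]
```

The joint likelihood across markers would then use the same kind of forward recursion as the haplotype model, with these K(K+1)/2 pair states in place of single clusters.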
Algorithms for genotype imputation
• fastPhase
• BEAGLE
• IMPUTE
• PLINK
• MaCH
Picture taken from IMPUTE v2
SNP detection with LD information
• MaCH (G: genotype, S: cluster)
SNP detection with LD information
• For sequencing data, G is not observed
• Instead, the coverage of bases A and B is observed, and we have the HMM
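With sequencing reads, the emission for a hidden state S becomes a sum over the unobserved genotype: P(reads | S) = Σ_G P(reads | G) P(G | S). The sketch assumes a simple binomial read model with independent reads and a single hypothetical per-read error rate `err`; real callers typically use per-base quality scores instead.

```python
from math import comb


def read_likelihood(n_a, n_b, g, err=0.01):
    """P(n_b of the n_a + n_b reads show base B | genotype g = copies of B).

    Assumes independent reads and a single per-read error rate err.
    """
    # Expected fraction of B reads: true B fraction g/2, blurred by errors
    p_b = (g / 2) * (1 - err) + (1 - g / 2) * err
    return comb(n_a + n_b, n_b) * p_b ** n_b * (1 - p_b) ** n_a


def emission_given_state(n_a, n_b, genotype_probs, err=0.01):
    """P(reads | hidden state S) = sum over g of P(reads | g) * P(g | S)."""
    return sum(read_likelihood(n_a, n_b, g, err) * genotype_probs[g] for g in range(3))


# Low coverage: one read of each base
print([round(read_likelihood(1, 1, g), 4) for g in range(3)])  # [0.0198, 0.5, 0.0198]
# Zero coverage: every genotype is equally compatible with the (absent) data
print([read_likelihood(0, 0, g) for g in range(3)])            # [1.0, 1.0, 1.0]
```

At zero or very low coverage the read term is nearly flat, so the genotype estimate is driven by P(G | S), that is, by LD with neighbouring markers through the hidden cluster states.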
SNP detection with LD information
Nielsen et al. 2011, Nature Reviews Genetics