Biostatistics Lecture 19: Linkage Disequilibrium and SNP Detection

Ruibin Xi
Peking University, School of Mathematical Sciences

Haplotype Frequencies

Linkage Equilibrium

Linkage Disequilibrium

Disequilibrium Coefficient D_AB

D_AB is hard to interpret
- Sign is arbitrary. A common convention is to set A, B to be the common alleles and a, b to be the rare alleles.
- Range depends on allele frequencies
- Hard to compare between markers

r^2 (also called Δ^2)
- Ranges between 0 and 1
- 1 when the two markers provide identical information
- 0 when they are in perfect equilibrium
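Both measures can be computed directly from the four two-locus haplotype frequencies. The following is a minimal sketch, assuming the standard definitions D_AB = p_AB - p_A p_B and r^2 = D_AB^2 / (p_A p_a p_B p_b). The function name `ld_stats` and the example frequencies are illustrative and do not come from the lecture.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D_AB, r^2) from the four two-locus haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab          # marginal frequency of allele A
    p_B = p_AB + p_aB          # marginal frequency of allele B
    D = p_AB - p_A * p_B       # disequilibrium coefficient
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a marker is monomorphic")
    return D, D * D / denom

# Made-up frequencies for illustration only.
D_eq, r2_eq = ld_stats(0.42, 0.28, 0.18, 0.12)    # p_AB = p_A * p_B, so equilibrium
D_full, r2_full = ld_stats(0.7, 0.0, 0.0, 0.3)    # the two markers carry identical information
```

Dividing D_AB^2 by the product of the four allele frequencies is what removes the dependence on allele frequencies, which is why r^2 is easier to compare between markers than D_AB.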

Raw r^2 data from chr22

Comparing Populations

CEPH: Utah residents with ancestry from northern and western Europe (CEU)

Use LD for SNP imputation and detection

fastPHASE


Model for haplotypes
- Observed n haplotypes
- Each with M markers
- b_ij = 0, 1
- Assume each haplotype originates from one of K clusters
- z_i: unknown cluster of origin of b_i

Since clusters of origin are unknown
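Because z_i is not observed, the probability of a haplotype is obtained by summing over the K possible clusters. The sketch below assumes a parameterization in the style of the fastPHASE cluster model: cluster weights `alpha[k]` and cluster-specific allele frequencies `theta[k][m]`, with markers independent given the cluster. These names are assumptions for illustration, since the slide's own formula is not preserved here.

```python
from math import prod

def haplotype_mixture_likelihood(b, alpha, theta):
    """P(b) = sum_k alpha[k] * prod_m P(b[m] | cluster k).

    b     : list of M alleles (0 or 1) for one haplotype
    alpha : alpha[k] = P(z = k), summing to 1 (assumed name)
    theta : theta[k][m] = P(allele 1 at marker m | cluster k) (assumed name)
    """
    total = 0.0
    for alpha_k, theta_k in zip(alpha, theta):
        # Probability of the whole haplotype if it came from this cluster.
        within = prod(t if allele == 1 else 1.0 - t
                      for allele, t in zip(b, theta_k))
        total += alpha_k * within
    return total

# Illustrative values: two clusters, two markers.
alpha = [0.5, 0.5]
theta = [[0.9, 0.9], [0.1, 0.1]]
lik = haplotype_mixture_likelihood([1, 1], alpha, theta)
```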

Local clustering of haplotype
- Assume z_i = (z_i1, ..., z_iM) forms a Markov chain on {1, ..., K}
- z_im denotes the cluster of origin for b_im
- Initial probabilities

Transition probabilities

Conditional on the cluster of origin
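With initial probabilities, transition probabilities, and allele probabilities conditional on the cluster of origin, the haplotype model is a hidden Markov model, and the likelihood of one haplotype can be computed with the forward algorithm. The sketch below is generic: `init`, `trans`, and `theta` are hypothetical inputs supplied by the caller, not the specific parameterization used on the slides.

```python
def haplotype_hmm_likelihood(b, init, trans, theta):
    """Forward algorithm: P(b), summed over all cluster paths z_1, ..., z_M.

    Markers are indexed from 0 in the code.
    b     : list of M alleles (0 or 1)
    init  : init[k] = P(cluster k at the first marker)
    trans : trans[m][j][k] = P(cluster k at marker m + 1 | cluster j at marker m),
            for m = 0, ..., M - 2
    theta : theta[k][m] = P(allele 1 at marker m | cluster k)
    """
    K, M = len(init), len(b)

    def emit(k, m):
        # Probability of the observed allele at marker m given cluster k.
        return theta[k][m] if b[m] == 1 else 1.0 - theta[k][m]

    # f[k] = P(b[0..m], cluster at marker m is k)
    f = [init[k] * emit(k, 0) for k in range(K)]
    for m in range(1, M):
        f = [emit(k, m) * sum(f[j] * trans[m - 1][j][k] for j in range(K))
             for k in range(K)]
    return sum(f)

# Illustrative values: two clusters, two markers.
init = [0.5, 0.5]
theta = [[0.9, 0.9], [0.1, 0.1]]
trans = [[[0.8, 0.2], [0.3, 0.7]]]
lik = haplotype_hmm_likelihood([1, 1], init, trans, theta)
```

If every transition matrix is the identity, the cluster never changes along the haplotype and the result reduces to a single mixture over K clusters. Allowing switches between clusters is what makes the clustering local.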


Local clustering of genotype data
- We have genotype data
- g_im: genotype at marker m of individual i
- Takes values 0, 1, 2
- Initial probabilities (unordered clusters of origin)

Transition probabilities

Local clustering of genotype data
- Genotype probabilities conditional on the clusters of origin
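For one marker, the genotype probability given an unordered pair of clusters can be sketched as follows. This assumes that the two haplotypes of an individual are independent given their clusters of origin, and that the genotype counts copies of allele 1. The function name and the arguments `theta_k1`, `theta_k2` (the allele-1 frequencies of the two clusters at this marker) are illustrative.

```python
def genotype_prob(g, theta_k1, theta_k2):
    """P(g | clusters {k1, k2}) at one marker; g = number of copies of allele 1."""
    if g == 0:
        return (1 - theta_k1) * (1 - theta_k2)
    if g == 1:
        # Either haplotype may carry the single copy of allele 1.
        return theta_k1 * (1 - theta_k2) + (1 - theta_k1) * theta_k2
    if g == 2:
        return theta_k1 * theta_k2
    raise ValueError("genotype must be 0, 1, or 2")

# Illustrative values only.
probs = [genotype_prob(g, 0.9, 0.1) for g in (0, 1, 2)]
```

The expression is symmetric in the two clusters, which is consistent with treating the clusters of origin as an unordered pair.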

Joint likelihood

Algorithms for genotype imputation
- fastPHASE
- MaCH

Algorithms for genotype imputation
- fastPHASE

Picture taken from IMPUTE v2

SNP detection with LD information
- MaCH: (G: genotype, S: cluster)

SNP detection with LD information
- For sequencing data, G is not observed
- The coverages of bases A and B are observed, so we have the HMM
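When G is hidden, the HMM emission can be written as a sum over the unobserved genotype: P(reads | S) = sum over g of P(reads | G = g) P(G = g | S). The sketch below shows one simple way to fill in P(reads | G), a binomial model with a per-base error rate `eps`. This read model, the function names, and the default `eps` are assumptions for illustration and are not claimed to be the model used in the lecture or in any particular tool.

```python
from math import comb

def read_count_likelihood(n_A, n_B, g, eps=0.01):
    """P(n_A, n_B | G = g), where g is the number of copies of allele B.

    Assumes each read independently shows B with probability
    eps (g = 0), 0.5 (g = 1), or 1 - eps (g = 2).
    """
    p_B = {0: eps, 1: 0.5, 2: 1.0 - eps}[g]
    n = n_A + n_B
    return comb(n, n_B) * p_B ** n_B * (1.0 - p_B) ** n_A

def emission_given_clusters(n_A, n_B, geno_probs, eps=0.01):
    """Sum over the unobserved genotype: sum_g P(reads | g) * P(g | S).

    geno_probs[g] = P(G = g | S) for g = 0, 1, 2, supplied by the LD model.
    """
    return sum(read_count_likelihood(n_A, n_B, g, eps) * geno_probs[g]
               for g in range(3))

# Illustrative counts: 5 reads supporting A and 5 supporting B.
liks = [read_count_likelihood(5, 5, g) for g in (0, 1, 2)]
```

With balanced coverage of the two bases, the heterozygous genotype has by far the largest likelihood under this read model. At low coverage the read likelihoods are much flatter, and the LD term P(G | S) contributes more to the call.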

SNP detection with LD information

Nielsen et al. 2011, Nature Reviews Genetics

