Linkage Disequilibrium

Granovsky Ilana and Berliner Yaniv

Computational Genetics

19.06.03

What is Linkage Disequilibrium?

• When the occurrence of pairs of specific alleles at different loci on the same haplotype is not independent, the deviation from independence is termed linkage disequilibrium.

• In practice, linkage disequilibrium is usually seen as an association between one specific allele at one locus and another specific allele at a second locus.

Linkage Disequilibrium Coefficient Definitions

                                 Marker 2
Marker 1                         Allele 1 (probability = p2)    Allele 2 (probability = 1-p2)
Allele 1 (probability = p1)      X1: p1*p2 + D11                X2: p1*(1-p2) - D11
Allele 2 (probability = 1-p1)    X3: (1-p1)*p2 - D11            X4: (1-p1)*(1-p2) + D11

• Xi: number of observations in cell i (X1+X2+X3+X4 = n)

• D11: coefficient of gametic linkage disequilibrium between allele 1 at locus 1 and allele 1 at locus 2

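D11 = P1*P4 - P2*P3, where Pi is the probability of cell i in the table above (written on the slide as D11 = E[X1X4 - X2X3 | n = 1])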
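The definitions above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and the illustrative values p1 = 0.5, p2 = 0.4, D11 = 0.05 are assumptions. It builds the four cell probabilities from the table and recovers D11 as the difference of the diagonal and off-diagonal products.

```python
def ld_cell_probabilities(p1, p2, d11):
    """Cell probabilities in the order X1, X2, X3, X4 of the 2x2 table."""
    cells = (
        p1 * p2 + d11,              # allele 1 at marker 1, allele 1 at marker 2
        p1 * (1 - p2) - d11,        # allele 1 at marker 1, allele 2 at marker 2
        (1 - p1) * p2 - d11,        # allele 2 at marker 1, allele 1 at marker 2
        (1 - p1) * (1 - p2) + d11,  # allele 2 at marker 1, allele 2 at marker 2
    )
    if min(cells) < 0:
        raise ValueError("d11 is outside the range allowed by p1 and p2")
    return cells


def d11_from_cells(cells):
    """Recover D11 as (cell 1 * cell 4) - (cell 2 * cell 3)."""
    c1, c2, c3, c4 = cells
    return c1 * c4 - c2 * c3


# Illustrative (hypothetical) parameter values.
cells = ld_cell_probabilities(0.5, 0.4, 0.05)
print(cells)
print(sum(cells), d11_from_cells(cells))
```

Whatever admissible values are chosen, the four cells sum to 1 and the product difference returns the D11 that was put in.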

Population-based sampling and the EH program

• We wish to test the absence of disequilibrium between allele A at locus 1 and allele B at locus 2 (D_AB = 0)

• The sample of individuals consists of genotype data, so the haplotypes of each individual cannot always be fully distinguished

Table of all possible two-locus genotypes

           Locus 1
Locus 2    AA    Aa    aa
BB         k1    k2    k3
Bb         k4    k5    k6
bb         k7    k8    k9

In cell 5 (the double heterozygote) the phase can be either AB/ab or Ab/aB

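Analysis of likelihood

• We maximize the log likelihood of the observed data:

$$\ln[L(\text{data})] = \sum_{i=1}^{a_1 a_2} k_i \ln(p_i)$$

where k_i is the observed count and p_i the probability of genotype cell i, and the sum runs over all genotype cells (nine for two biallelic loci).

• For cell 1: p1 = [P(A B)]^2
• For cell 4: p4 = 2 P(A B) P(A b)
• For cell 5: p5 = P(A B/a b) + P(A b/a B) = 2 P(A B) P(a b) + 2 P(A b) P(a B)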

Table of probabilities in each cell

           Locus 1
Locus 2    AA                  Aa                                    aa
BB         P(A B)^2            2 P(A B) P(a B)                       P(a B)^2
Bb         2 P(A B) P(A b)     2 P(A B) P(a b) + 2 P(A b) P(a B)     2 P(a B) P(a b)
bb         P(A b)^2            2 P(A b) P(a b)                       P(a b)^2
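As a rough illustration of how the cell probabilities feed into the log likelihood, the sketch below evaluates ln L for a given set of haplotype frequencies. The function names are assumptions made for this example, the counts are the k1..k9 of the worked example in these slides, and the equal haplotype frequencies are an arbitrary starting point rather than an estimate.

```python
import math


def genotype_probabilities(h_AB, h_Ab, h_aB, h_ab):
    """p1..p9 of the 3x3 table (rows BB, Bb, bb; columns AA, Aa, aa)."""
    return [
        h_AB ** 2, 2 * h_AB * h_aB, h_aB ** 2,
        2 * h_AB * h_Ab, 2 * h_AB * h_ab + 2 * h_Ab * h_aB, 2 * h_aB * h_ab,
        h_Ab ** 2, 2 * h_Ab * h_ab, h_ab ** 2,
    ]


def log_likelihood(counts, haplotype_freqs):
    """Sum of k_i * ln(p_i) over the nine genotype cells."""
    probs = genotype_probabilities(*haplotype_freqs)
    return sum(k * math.log(p) for k, p in zip(counts, probs) if k > 0)


counts = [10, 10, 3, 15, 50, 13, 5, 13, 10]   # k1..k9 from the example slide
freqs = (0.25, 0.25, 0.25, 0.25)              # arbitrary illustrative frequencies
probs = genotype_probabilities(*freqs)
print(sum(probs))                              # the nine cells sum to 1
print(log_likelihood(counts, freqs))
```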

Analysis of likelihood

• We maximize the likelihood above over the parameters that determine the haplotype frequencies: p(A), p(B) and D_AB.

• This likelihood is then compared with the maximum likelihood when D_AB is set equal to 0 (absence of linkage disequilibrium)
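The slides do not say how the maximization is carried out. The sketch below assumes a simple EM iteration (an assumption of this example, not a description of EH internals) applied to the k1..k9 counts of the worked example, then compares the maximized log likelihood with the one obtained under D_AB = 0. All function names are hypothetical.

```python
import math

# k1..k9 from the worked example (rows BB, Bb, bb; columns AA, Aa, aa).
COUNTS = [10, 10, 3, 15, 50, 13, 5, 13, 10]


def genotype_probs(h):
    """Cell probabilities p1..p9 from haplotype frequencies (AB, Ab, aB, ab)."""
    AB, Ab, aB, ab = h
    return [AB * AB, 2 * AB * aB, aB * aB,
            2 * AB * Ab, 2 * (AB * ab + Ab * aB), 2 * aB * ab,
            Ab * Ab, 2 * Ab * ab, ab * ab]


def log_lik(counts, h):
    return sum(k * math.log(p) for k, p in zip(counts, genotype_probs(h)) if k)


def em_haplotypes(counts, iterations=500):
    """EM estimate of (AB, Ab, aB, ab); only cell 5 has unknown phase."""
    k1, k2, k3, k4, k5, k6, k7, k8, k9 = counts
    total = 2 * sum(counts)
    AB = Ab = aB = ab = 0.25
    for _ in range(iterations):
        # E-step: expected share of double heterozygotes with phase AB/ab.
        x = AB * ab / (AB * ab + Ab * aB)
        # M-step: haplotype counting with k5 split according to that share.
        AB = (2 * k1 + k2 + k4 + k5 * x) / total
        Ab = (2 * k7 + k4 + k8 + k5 * (1 - x)) / total
        aB = (2 * k3 + k2 + k6 + k5 * (1 - x)) / total
        ab = (2 * k9 + k6 + k8 + k5 * x) / total
    return AB, Ab, aB, ab


h1 = em_haplotypes(COUNTS)                      # association allowed
p_A, p_B = h1[0] + h1[1], h1[0] + h1[2]
d_AB = h1[0] - p_A * p_B
h0 = (p_A * p_B, p_A * (1 - p_B),               # D_AB fixed at 0
      (1 - p_A) * p_B, (1 - p_A) * (1 - p_B))
ln_l1, ln_l0 = log_lik(COUNTS, h1), log_lik(COUNTS, h0)
chi_square = 2 * (ln_l1 - ln_l0)
print(h1, d_AB)
print(ln_l0, ln_l1, chi_square)
```

Under these assumptions the values should land close to the EH output reported in these slides (ln L of about -252.68 and -248.23).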

Example

           Locus 1
Locus 2    AA         Aa         aa
BB         k1 = 10    k2 = 10    k3 = 3
Bb         k4 = 15    k5 = 50    k6 = 13
bb         k7 = 5     k8 = 13    k9 = 10

Haplotype counts*:

     A     a
B    45    29
b    38    46

Haplotype frequencies*:

     A       a
B    0.28    0.18
b    0.24    0.29

* When k5 is censored, all the haplotypes can be uniquely determined
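The haplotype counts of this example can be reproduced by direct counting once k5 is dropped. This is a minimal sketch; the function name is an assumption, and the counting rule is simply that every remaining genotype has at least one homozygous locus, so its two haplotypes are known.

```python
def censored_haplotype_counts(k):
    """Haplotype counts from k1..k9 with the double heterozygotes (k5) ignored."""
    k1, k2, k3, k4, _k5, k6, k7, k8, k9 = k
    return {
        "AB": 2 * k1 + k2 + k4,
        "aB": 2 * k3 + k2 + k6,
        "Ab": 2 * k7 + k4 + k8,
        "ab": 2 * k9 + k6 + k8,
    }


k = [10, 10, 3, 15, 50, 13, 5, 13, 10]   # k1..k9 from the example table
hap_counts = censored_haplotype_counts(k)
total = sum(hap_counts.values())          # 2 * (129 - 50) = 158 haplotypes
hap_freqs = {h: n / total for h, n in hap_counts.items()}
print(hap_counts)   # {'AB': 45, 'aB': 29, 'Ab': 38, 'ab': 46}
print({h: round(f, 2) for h, f in hap_freqs.items()})
```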

Example cont.

• P(A) = (45 + 38)/158 = 0.525
• P(B) = (45 + 29)/158 = 0.468
• D_AB = p(A B) – p(A)p(B) = 45/158 – (83/158)(74/158) ≈ 0.0388

* Biased example due to the elimination of the 50 observations in k5.
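The arithmetic on this slide can be redone from the censored haplotype counts without intermediate rounding. The sketch below is only a check of the numbers; the variable names are assumptions.

```python
# Censored haplotype counts from the example (k5 excluded).
n_AB, n_Ab, n_aB, n_ab = 45, 38, 29, 46
total = n_AB + n_Ab + n_aB + n_ab        # 158 haplotypes

p_A = (n_AB + n_Ab) / total              # frequency of allele A
p_B = (n_AB + n_aB) / total              # frequency of allele B
d_AB = n_AB / total - p_A * p_B          # p(AB) - p(A) p(B)

print(round(p_A, 3), round(p_B, 3), round(d_AB, 4))
```

Working from the raw counts avoids the small discrepancies that appear when the two-decimal frequencies are multiplied.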

EH program input file format

• EH = estimated haplotype.
  – Input file EH.dat
    Line 1: Number of alleles at each of the two loci
    Line 2: k1 k4 k7
    Line 3: k2 k5 k8
    Line 4: k3 k6 k9
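For the worked example, the four lines described above could be assembled as follows. This is a sketch under the assumption that values on a line are separated by single spaces; the slides give only the line order, and the helper name is hypothetical.

```python
def eh_input_text(n_alleles, k):
    """Build the text of a two-locus biallelic input file in the line order above."""
    k1, k2, k3, k4, k5, k6, k7, k8, k9 = k
    lines = [
        n_alleles,        # line 1: number of alleles at each locus
        (k1, k4, k7),     # line 2
        (k2, k5, k8),     # line 3
        (k3, k6, k9),     # line 4
    ]
    return "\n".join(" ".join(str(v) for v in line) for line in lines) + "\n"


text = eh_input_text((2, 2), [10, 10, 3, 15, 50, 13, 5, 13, 10])
print(text, end="")
```

Each data line holds one locus 1 genotype (AA, Aa, aa) with its three locus 2 genotypes, i.e. one column of the genotype table.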

EH program output file

• Output: estimates of gene frequencies (including k5)

         Allele
Locus    1        2
1        0.515    0.484
2        0.480    0.519

# of typed individuals: 129

EH program output file

Allele at    Allele at    Haplotype frequency
locus 1      locus 2      Independent    w/association
1            1            0.248          0.328
1            2            0.268          0.188
2            1            0.232          0.153
2            2            0.252          0.332

Chi square test

                                   df    Ln(L)      Chi-square
H0: No association                 2     -252.68    0.00
H1: Allelic association allowed    3     -248.23    8.89

• The difference between the two chi-square values is 8.89

• The P-value associated with this chi-square (with 3 - 2 = 1 df) is 0.002873

• It is clear that k5 contributes significant information
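The test statistic can be recomputed from the two reported log likelihoods. The sketch below is an illustration only: it uses the standard identity that, for 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(x/2)), so no statistics library is needed. Because the inputs are the two-decimal values from the table, the result differs slightly from the reported 8.89 and 0.002873.

```python
import math

# Values as reported in the table above.
ln_l0, df0 = -252.68, 2   # H0: no association
ln_l1, df1 = -248.23, 3   # H1: allelic association allowed

chi_square = 2 * (ln_l1 - ln_l0)   # likelihood ratio statistic
df = df1 - df0

# Upper tail of the chi-square distribution; this closed form holds for 1 df only.
assert df == 1
p_value = math.erfc(math.sqrt(chi_square / 2))

print(round(chi_square, 2), p_value)
```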

Summary

Haplotype frequencies

             Without k5                   With k5
Haplotype    Independent    Associated    Independent    Associated
A B          0.246          0.284         0.247          0.327
A b          0.279          0.24          0.267          0.187
a B          0.222          0.183         0.232          0.152
a b          0.252          0.291         0.251          0.331

p(A)         0.525                        0.515
p(B)         0.468                        0.48
D_AB         0.038                        0.079

Multiallelic genotype information in EH program

Line 1: Number of alleles at each locus

Subsequent lines:

           Locus 2
Locus 1    1/1    1/2    2/2    1/3    2/3    3/3
1/1        a1     b1     c1     d1     e1     f1
1/2        a2     b2     c2     d2     e2     f2
2/2        a3     b3     c3     d3     e3     f3
1/3        a4     b4     c4     d4     e4     f4
2/3        a5     b5     c5     d5     e5     f5
3/3        a6     b6     c6     d6     e6     f6

Multilocus genotype data

                      Locus 3
Locus 1    Locus 2    1/1    1/2    2/2
1/1        1/1        a1     b1     c1
           1/2        a2     b2     c2
           2/2        a3     b3     c3
1/2        1/1        a4     b4     c4
           1/2        a5     b5     c5
           2/2        a6     b6     c6
2/2        1/1        a7     b7     c7
           1/2        a8     b8     c8
           2/2        a9     b9     c9

Ex. 23

• Full data solution file
• Censored data solution file

Censored data

1/1 haplotype data

           Locus 2
Locus 1    1/1    1/2    1/3    1/4    2/2    2/3    2/4    3/3    3/4    4/4
1/1        10     5      6      4      1      2      3      1      2      0
1/2        6      3      3      3      1      2      1      1      2      1
2/2        12     9      8      11     3      2      5      1      0      3
1/3        1      2      2      1      1      1      1      0      4      2
2/3        0      2      2      8      2      2      9      3      6      8
3/3        8      6      4      10     3      3      8      5      9      13

Haplotypes from censored genotype data

Haplotype counts:

                     Allele at locus 2
Allele at locus 1    1     2     3     4
1                    42    14    13    12
2                    58    25    16    31
3                    37    26    29    63

Haplotype frequencies:

                     Allele at locus 2
Allele at locus 1    1        2        3        4
1                    0.11     0.038    0.035    0.032
2                    0.158    0.068    0.044    0.085
3                    0.10     0.07     0.079    0.172
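The haplotype counts above can be reproduced from the Ex. 23 genotype table by skipping every cell that is heterozygous at both loci and counting the two haplotypes of each remaining genotype. The following is a minimal sketch; the function and variable names are assumptions of this example.

```python
from collections import Counter

# Genotype labels in the order used by the Ex. 23 table.
LOCUS1 = [(1, 1), (1, 2), (2, 2), (1, 3), (2, 3), (3, 3)]
LOCUS2 = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 2),
          (2, 3), (2, 4), (3, 3), (3, 4), (4, 4)]
TABLE = [
    [10, 5, 6, 4, 1, 2, 3, 1, 2, 0],
    [6, 3, 3, 3, 1, 2, 1, 1, 2, 1],
    [12, 9, 8, 11, 3, 2, 5, 1, 0, 3],
    [1, 2, 2, 1, 1, 1, 1, 0, 4, 2],
    [0, 2, 2, 8, 2, 2, 9, 3, 6, 8],
    [8, 6, 4, 10, 3, 3, 8, 5, 9, 13],
]


def censored_counts(rows, cols, table):
    """Count haplotypes, ignoring genotypes heterozygous at both loci."""
    counts = Counter()
    for g1, row in zip(rows, table):
        for g2, n in zip(cols, row):
            if g1[0] != g1[1] and g2[0] != g2[1]:
                continue  # phase unknown: censored
            # At least one locus is homozygous, so pairing the alleles
            # position by position yields the two haplotypes unambiguously.
            for allele1, allele2 in zip(g1, g2):
                counts[(allele1, allele2)] += n
    return counts


counts = censored_counts(LOCUS1, LOCUS2, TABLE)
total = sum(counts.values())
for a1 in (1, 2, 3):
    print(a1, [counts[(a1, a2)] for a2 in (1, 2, 3, 4)])
print(total, round(counts[(1, 1)] / total, 3))
```

Dividing each count by the total number of counted haplotypes gives the frequency table.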

Thank you very much!
