Joint Linkage and Linkage Disequilibrium Mapping
Post on 01-Jan-2016
Key Reference
Li, Q., and R. L. Wu, 2009. A multilocus model for constructing a linkage disequilibrium map in human populations. Statistical Applications in Genetics and Molecular Biology 8 (1): Article 18.

Genetic Designs for Mapping
- Controlled crosses: backcross, F2, full-sib family (linkage)
- Unrelated (random) individuals from a natural population (linkage disequilibrium)
- Cases and controls from a natural population
- Unrelated (random) families from a natural population (linkage and LD)
- Related (non-random) families from a natural population (linkage, LD, and identity by descent)
Family designs are increasingly used for genetic studies because of the large amount of information they contain.

Natural Population
- Consider two SNPs: SNP 1 (with two alleles A and a) and SNP 2 (with two alleles B and b).
- The two SNPs are linked with recombination fraction r.
- The two SNPs form four haplotypes: AB, Ab, aB, and ab.
- Prob(A) = p, Prob(B) = q, linkage disequilibrium = D. We have haplotype frequencies as

\[
p_{11} = pq + D, \quad p_{10} = p(1-q) - D, \quad p_{01} = (1-p)q - D, \quad p_{00} = (1-p)(1-q) + D
\]
Diagrammatic Presentation
Family Design: family number and size
Mating frequencies of families and offspring genotype frequencies per family
HWE assumed
Can you figure out where this assumption is needed?

Segregation of double heterozygote
Overall haplotype frequencies produced by this parent are calculated as $\frac{1}{2}\omega_1$ for AB or ab and $\frac{1}{2}\omega_2$ for Ab or aB.
A Joint Probability
- Mother genotypes (Mm)
- Father genotypes (Mf)
- Offspring genotypes (Mo)

P(Mm, Mf, Mo) = P(Mm, Mf) P(Mo | Mm, Mf) = P(Mm) P(Mf) P(Mo | Mm, Mf)

A joint two-stage log-likelihood
Let the unknown parameters be $\Omega = (p_{11}, p_{00}, p_{10}, p_{01}, r) = (\Omega_p, r)$.

Upper-stage Likelihood
EM algorithm for the haplotype frequencies
- E step
- M step

Lower-stage Likelihood
EM algorithm for r
- E step: calculate the probability with which a considered haplotype produced by a double-heterozygote parent is the recombinant type.
- E step (cont'd): calculate the probability with which a double-heterozygote offspring carries recombinant haplotypes.
- M step, where m equals the sum of the following terms:

Hypothesis tests
Linkage and linkage disequilibrium
- H0: r = 0.5 and D = 0
- H1: At least one equality does not hold
$\mathrm{LR} = -2(\log L_0 - \log L_1)$
Critical threshold: $\chi^2$ (df = 2)
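
The likelihood-ratio test above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the log-likelihood values are assumed to come from fitting the null and alternative models separately. It relies on the fact that the chi-square distribution with 2 degrees of freedom has survival function exp(-x/2), so no statistics library is needed.

```python
from math import exp, log


def lr_statistic(loglik_null, loglik_alt):
    """LR = -2 (log L0 - log L1); hypothetical helper name."""
    return -2.0 * (loglik_null - loglik_alt)


def chi2_df2_critical(alpha=0.05):
    """Critical value of chi-square with df = 2.

    For df = 2 the survival function is exp(-x / 2),
    so the critical value is -2 * ln(alpha).
    """
    return -2.0 * log(alpha)


def chi2_df2_pvalue(x):
    """Upper-tail probability of chi-square with df = 2."""
    return exp(-x / 2.0)


def reject_null(loglik_null, loglik_alt, alpha=0.05):
    """True if the LR statistic exceeds the df = 2 threshold."""
    return lr_statistic(loglik_null, loglik_alt) > chi2_df2_critical(alpha)
```

For example, `reject_null(-1250.0, -1240.0)` compares LR = 20 against the 5% threshold of about 5.99.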
Hypothesis tests
Sex-specific difference in population structure

Hypothesis test
Sex-specific difference in the recombination fraction

Simulation

Power
Conclusions
- The model can jointly estimate the linkage and linkage disequilibrium between two markers:
  - LD from parents
  - Linkage from offspring
- The model can draw an LD map to study the evolution of populations and high-resolution mapping of traits.
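Table 1: Data structure of two markers typed for a panel of full-sib families, each composed of the mother, father and offspring, sampled at random from a natural population. The last nine columns give offspring genotype counts.

| Group | Mother | Father | Family Number | AABB | AABb | AAbb | AaBB | AaBb | Aabb | aaBB | aaBb | aabb |
|-------|--------|--------|---------------|------|------|------|------|------|------|------|------|------|
| 1     | AABB   | AABB   | n11           | n111 |      |      |      |      |      |      |      |      |
| 2     | AABB   | AABb   | n12           | n121 | n122 |      |      |      |      |      |      |      |
| 3     | AABB   | AAbb   | n13           |      | n132 |      |      |      |      |      |      |      |
| 4     | AABB   | AaBB   | n14           | n141 |      |      | n144 |      |      |      |      |      |
| 5     | AABB   | AaBb   | n15           | n151 | n152 |      | n154 | n155 |      |      |      |      |
| 6     | AABB   | Aabb   | n16           |      | n162 |      |      | n165 |      |      |      |      |
| 7     | AABB   | aaBB   | n17           |      |      |      | n174 |      |      |      |      |      |
| 8     | AABB   | aaBb   | n18           |      |      |      | n184 | n185 |      |      |      |      |
| 9     | AABB   | aabb   | n19           |      |      |      |      | n195 |      |      |      |      |
| ...   |        |        |               |      |      |      |      |      |      |      |      |      |
| 41    | AaBb   | AaBb   | n55           | n551 | n552 | n553 | n554 | n555 | n556 | n557 | n558 | n559 |
| ...   |        |        |               |      |      |      |      |      |      |      |      |      |
| 81    | aabb   | aabb   | n99           |      |      |      |      |      |      |      |      | n999 |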
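respectively. Both diplotypes will produce haplotypes AB, Ab, aB, and ab, with frequencies defined as follows:

Parent AaBb                 Haplotype
Diplotype   Frequency       AB                     Ab                     aB                     ab
AB|ab       $\phi$          $\frac{1}{2}(1-r)$     $\frac{1}{2}r$         $\frac{1}{2}r$         $\frac{1}{2}(1-r)$
Ab|aB       $1-\phi$        $\frac{1}{2}r$         $\frac{1}{2}(1-r)$     $\frac{1}{2}(1-r)$     $\frac{1}{2}r$

Let

\[
\begin{aligned}
\omega_1 &= \phi(1-r) + (1-\phi)r,\\
\omega_2 &= \phi r + (1-\phi)(1-r).
\end{aligned}
\tag{6}
\]

Thus, overall haplotype frequencies produced by this parent are calculated as $\frac{1}{2}\omega_1$ for AB or ab and $\frac{1}{2}\omega_2$ for Ab or aB.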
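
As a minimal sketch of Eq. (6), the function below weights the two diplotype rows of the table by their relative frequencies and returns the gamete frequencies of a double-heterozygote parent. The names `phi`, `omega1`, `omega2`, `double_het_gametes`, and `diplotype_weight` are labels chosen here for illustration. The helper `diplotype_weight` is an added assumption, not taken from the text: it computes the relative frequency of diplotype AB|ab from haplotype frequencies under random union of gametes (HWE).

```python
def diplotype_weight(p11, p10, p01, p00):
    """Relative frequency of diplotype AB|ab among AaBb parents.

    Assumption (not from the text): under HWE, AB|ab has frequency
    2*p11*p00 and Ab|aB has frequency 2*p10*p01.
    """
    return p11 * p00 / (p11 * p00 + p10 * p01)


def double_het_gametes(phi, r):
    """Gamete frequencies of an AaBb parent following Eq. (6).

    phi : relative frequency of diplotype AB|ab
    r   : recombination fraction
    """
    omega1 = phi * (1 - r) + (1 - phi) * r
    omega2 = phi * r + (1 - phi) * (1 - r)
    return {
        "AB": omega1 / 2,
        "Ab": omega2 / 2,
        "aB": omega2 / 2,
        "ab": omega1 / 2,
    }
```

Because omega1 + omega2 = 1, the four gamete frequencies always sum to one. With r = 0.5 both quantities equal 0.5 regardless of `phi`, so the diplotype phase carries no information in the absence of linkage.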
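Based on the information about genetic segregation in each family, the lower-level likelihood is constructed as

\[
\begin{aligned}
\log L(r \mid M_m, M_f, M_o, \Omega_p) = C
&+ n_{111}\log(1) + n_{121}\log\left(\tfrac{1}{2}\right) + n_{122}\log\left(\tfrac{1}{2}\right) + n_{132}\log(1) + n_{141}\log\left(\tfrac{1}{2}\right) + n_{144}\log\left(\tfrac{1}{2}\right) \\
&+ (n_{151} + n_{155})\log\left(\tfrac{1}{2}\omega_1\right) + (n_{152} + n_{154})\log\left(\tfrac{1}{2}\omega_2\right) \\
&+ \cdots \\
&+ (n_{551} + n_{559})\log\left(\tfrac{1}{4}\omega_1^2\right) + (n_{552} + n_{554} + n_{556} + n_{558})\log\left(\tfrac{1}{2}\omega_1\omega_2\right) + (n_{553} + n_{557})\log\left(\tfrac{1}{4}\omega_2^2\right) \\
&+ n_{555}\log\left[\tfrac{1}{2}\left(\omega_1^2 + \omega_2^2\right)\right] + \cdots \\
&+ n_{999}\log(1).
\end{aligned}
\tag{7}
\]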
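
The terms of this likelihood can be generated mechanically rather than written out for all 81 family groups. The sketch below is an illustration under stated assumptions, not the authors' implementation: it builds each parent's gamete distribution, combines maternal and paternal gametes into offspring genotype probabilities, and sums the count-weighted log probabilities. It assumes the same diplotype weight `phi` and recombination fraction `r` for both parents, and all function names are hypothetical.

```python
from itertools import product
from math import log

GENOTYPES = ("AABB", "AABb", "AAbb", "AaBB", "AaBb",
             "Aabb", "aaBB", "aaBb", "aabb")


def gametes(genotype, phi, r):
    """Gamete (haplotype) distribution produced by one parent."""
    if genotype == "AaBb":
        # Double heterozygote: mix the two diplotypes as in Eq. (6).
        w1 = phi * (1 - r) + (1 - phi) * r
        w2 = 1 - w1
        return {"AB": w1 / 2, "Ab": w2 / 2, "aB": w2 / 2, "ab": w1 / 2}
    # At most one heterozygous locus: r does not affect segregation.
    dist = {}
    for a, b in product(genotype[:2], genotype[2:]):
        dist[a + b] = dist.get(a + b, 0.0) + 0.25
    return dist


def genotype_of(hap1, hap2):
    """Unordered two-locus genotype formed by two haplotypes."""
    locus_a = "".join(sorted(hap1[0] + hap2[0]))  # 'A' sorts before 'a'
    locus_b = "".join(sorted(hap1[1] + hap2[1]))
    return locus_a + locus_b


def offspring_probs(mother, father, phi, r):
    """P(offspring genotype | parental genotypes)."""
    probs = dict.fromkeys(GENOTYPES, 0.0)
    gm, gf = gametes(mother, phi, r), gametes(father, phi, r)
    for (h1, f1), (h2, f2) in product(gm.items(), gf.items()):
        probs[genotype_of(h1, h2)] += f1 * f2
    return probs


def lower_loglik(counts, phi, r):
    """Lower-level log-likelihood up to the constant C.

    counts maps (mother, father) to {offspring genotype: count}.
    """
    total = 0.0
    for (mother, father), kids in counts.items():
        probs = offspring_probs(mother, father, phi, r)
        for genotype, n in kids.items():
            if n:
                total += n * log(probs[genotype])
    return total
```

Maximizing `lower_loglik` over `r` on a grid is a simple way to cross-check an EM implementation on small data sets.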
Estimation: By maximizing the upper-level likelihood (3), we derive a closed form for
the EM algorithm to estimate haplotype frequencies. This procedure is described as follows:
in allele and haplotype frequencies). Thus, we have the following relationships:
\[
\begin{aligned}
p_{11} &= pq + D,\\
p_{10} &= p(1-q) - D,\\
p_{01} &= (1-p)q - D,\\
p_{00} &= (1-p)(1-q) + D,
\end{aligned}
\tag{1}
\]
where D is the coefficient of linkage disequilibrium between the two SNPs.
The two SNPs produce nine joint genotypes, AABB (coded as 1), AABb (coded as 2), ..., aabb (coded as 9), which are observed. Thus, each subject will bear one of these genotypes, and the parents in each family will be one of 9 × 9 = 81 possible genotype-by-genotype combinations. Depending on the parental genotype combination, the offspring in a family can carry only a certain subset of marker genotypes. At meiosis, a parental diplotype is broken down to form recombinant and nonrecombinant haplotypes for the next generation. The relative proportion of these two types of haplotypes is determined by the recombination fraction, denoted as r. Let $n_{ij}$ denote the number of families from the combination between mother genotype i and father genotype j for the two SNPs (i, j = 1, 2, ..., 9), and let $n_{ijk}$ denote the number of offspring with genotype k derived from parental genotype combination ij (k = 1, 2, ..., 9). Table 1 gives the structure of genotypic data collected from $n = \sum_{i=1}^{9}\sum_{j=1}^{9} n_{ij}$ random families, in which the distribution of genotypes in the mothers, fathers, and their offspring is shown.
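
The counts described above can be tabulated from raw family records as in the following sketch. The input layout (one tuple of mother, father, and a list of offspring genotypes per family) and the function names are assumptions made for illustration. The group number assumes families are ordered with the mother genotype varying slowest, which matches the first nine rows of Table 1.

```python
from collections import Counter

GENOTYPES = ("AABB", "AABb", "AAbb", "AaBB", "AaBb",
             "Aabb", "aaBB", "aaBb", "aabb")
# Genotype codes 1..9 as used in the text.
CODE = {g: k for k, g in enumerate(GENOTYPES, start=1)}


def family_group(mother, father):
    """Group index 1..81, assuming mother-major ordering."""
    return 9 * (CODE[mother] - 1) + CODE[father]


def tabulate(families):
    """Count families per parental combination and offspring per genotype.

    families: iterable of (mother, father, [offspring genotypes]).
    Returns (n_ij, n_ijk) keyed by (i, j) and (i, j, k).
    """
    n_ij, n_ijk = Counter(), Counter()
    for mother, father, offspring in families:
        i, j = CODE[mother], CODE[father]
        n_ij[(i, j)] += 1
        for child in offspring:
            n_ijk[(i, j, CODE[child])] += 1
    return n_ij, n_ijk
```

Using `Counter` means that combinations never observed simply return zero, so the sparse 81-row table does not need to be stored explicitly.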
Likelihood: For observed mother genotypes ($M_m$), father genotypes ($M_f$), and offspring genotypes ($M_o$), a joint probability is expressed as

\[
\begin{aligned}
P(M_m, M_f, M_o) &= P(M_m, M_f)\,P(M_o \mid M_m, M_f) \\
&= P(M_m)\,P(M_f)\,P(M_o \mid M_m, M_f),
\end{aligned}
\]

where it is assumed that there is random mating between parents for the two markers. Thus, a joint log-likelihood for the parameters, $\Omega = (p_{11}, p_{00}, p_{10}, p_{01}, r) = (\Omega_p, r)$, can be factorized into two parts:

\[
\log L(\Omega \mid M_m, M_f, M_o) = \log L(\Omega_p \mid M_m, M_f) + \log L(r \mid M_m, M_f, M_o, \Omega_p). \tag{2}
\]
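
The first term of this factorization depends only on the parental genotypes, so the haplotype frequencies can be estimated from the parents alone. The sketch below uses the standard two-locus gene-counting EM under HWE; it is an assumption that this matches the closed form derived in the paper, and the function name is hypothetical. Mothers and fathers are pooled, which presumes no sex-specific difference in haplotype frequencies.

```python
def em_haplotype_freqs(geno_counts, n_iter=500, tol=1e-12):
    """Estimate (p11, p10, p01, p00) from parental genotype counts.

    geno_counts maps genotype strings such as "AaBb" to counts.
    Only double heterozygotes have ambiguous phase; their haplotype
    counts are split according to the current frequency estimates.
    """
    def c(genotype):
        return geno_counts.get(genotype, 0)

    n = sum(geno_counts.values())
    p11 = p10 = p01 = p00 = 0.25
    for _ in range(n_iter):
        # E step: expected share of AaBb parents with diplotype AB|ab.
        denom = p11 * p00 + p10 * p01
        phi = p11 * p00 / denom if denom > 0 else 0.5
        dh = c("AaBb")
        # M step: count haplotypes and divide by 2n.
        new = (
            (2 * c("AABB") + c("AABb") + c("AaBB") + phi * dh) / (2 * n),
            (2 * c("AAbb") + c("AABb") + c("Aabb") + (1 - phi) * dh) / (2 * n),
            (2 * c("aaBB") + c("AaBB") + c("aaBb") + (1 - phi) * dh) / (2 * n),
            (2 * c("aabb") + c("Aabb") + c("aaBb") + phi * dh) / (2 * n),
        )
        delta = max(abs(a - b) for a, b in zip(new, (p11, p10, p01, p00)))
        p11, p10, p01, p00 = new
        if delta < tol:
            break
    return p11, p10, p01, p00
```

Given the estimates, the allele frequencies and D follow from Eq. (1): p = p11 + p10, q = p11 + p01, and D = p11 p00 - p10 p01.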