mapping populations
Mapping populations
• Controlled crosses between two parents
  – two alleles/locus, gene frequencies = 0.5
  – gametic phase disequilibrium is due to linkage, not other causes
• Examples
  – Backcross (BC1 or BC2)
  – F2 or F2:3
  – Recombinant inbred lines (RIL)
  – Doubled haploid (DH)
Recombinant Inbred Lines (RILs)
Generation   AA          Aa        aa
F1           0           100%      0
F2           25%         50%       25%
F3           37.5%       25%       37.5%
F4           43.75%      12.5%     43.75%
F5           46.875%     6.25%     46.875%
F6           48.4375%    3.125%    48.4375%
F10          49.9%       0.2%      49.9%
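A minimal Python sketch that regenerates the table, assuming single-seed descent by selfing from an Aa F1 with no selection. Each generation of selfing halves the heterozygote frequency, so Aa = (1/2)^(n-1) in F_n, and the lost heterozygotes split equally between AA and aa. The function name `selfing_frequencies` is hypothetical; exact fractions are used so no rounding creeps in.

```python
from fractions import Fraction


def selfing_frequencies(generation):
    """Expected (AA, Aa, aa) frequencies in F_n after repeated selfing of an
    Aa F1, assuming no selection (generation = 1 corresponds to the F1)."""
    if generation < 1:
        raise ValueError("generation must be >= 1 (F1 = 1)")
    het = Fraction(1, 2) ** (generation - 1)  # heterozygosity halves every generation
    hom = (1 - het) / 2                       # lost heterozygotes split evenly into AA and aa
    return hom, het, hom


for n in (1, 2, 3, 4, 5, 6, 10):
    AA, Aa, aa = selfing_frequencies(n)
    print(f"F{n}: AA {float(AA):.4%}  Aa {float(Aa):.4%}  aa {float(aa):.4%}")
```

The F10 row prints 49.9023% and 0.1953%, which the table rounds to 49.9% and 0.2%.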
♀ \ ♂        A (1/2)      a (1/2)
A (1/2)      AA (1/4)     Aa (1/4)
a (1/2)      aA (1/4)     aa (1/4)
Expected frequency
Class   Formula          r = 0   r = 0.5
f1      (1/2)(1 - r)     0.5     0.25
f2      (1/2)r           0.0     0.25
f3      (1/2)r           0.0     0.25
f4      (1/2)(1 - r)     0.5     0.25
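A small Python sketch for checking the r = 0 and r = 0.5 columns: f1 and f4 (the parental classes) share (1 - r)/2, and f2 and f3 (the recombinant classes) share r/2. The genotype labels for f1 to f4 did not survive in this transcript, so the hypothetical function `gamete_frequencies` keys its result by f1 to f4 only.

```python
def gamete_frequencies(r):
    """Expected frequencies of the four two-locus classes f1..f4 for
    recombination frequency r, following the formulas in the table."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination frequency must lie in [0, 0.5]")
    parental = 0.5 * (1 - r)   # f1 and f4
    recombinant = 0.5 * r      # f2 and f3
    return {"f1": parental, "f2": recombinant, "f3": recombinant, "f4": parental}


for r in (0.0, 0.5):
    print(r, gamete_frequencies(r))
```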
Recombinant Inbred Lines (RILs)
[Figure: RIL genotypes, with recombinant lines marked R]
r = k/N = Number of recombinants / Total = 4/20 = 0.2
[Figure panels: RILs; Doubled Haploids]
Expected frequency
Class   Formula          r = 0   r = 0.5
f1      (1/2)(1 - r)     0.5     0.25
f2      (1/2)r           0.0     0.25
f3      (1/2)r           0.0     0.25
f4      (1/2)(1 - r)     0.5     0.25
Doubled Haploids (DHs)
[Figure: Doubled haploids, with recombinant lines marked R]
r = k/N = Number of recombinants / Total = 10/20 = 0.5
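A short Python sketch of r = k/N applied to the RIL and DH examples. One caution that goes beyond the slides: for RILs produced by repeated selfing, the expected proportion of recombinant lines is R = 2r/(1 + 2r), so k/N estimates R rather than the per-meiosis r. The second helper, `ril_selfing_to_meiotic_r`, is hypothetical and simply inverts that relation. For DHs, which come from a single meiosis, k/N estimates r directly.

```python
def recombination_frequency(recombinants, total):
    """Proportion of recombinant lines, r = k / N, as computed on the slides."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need N > 0 and 0 <= k <= N")
    return recombinants / total


def ril_selfing_to_meiotic_r(R):
    """Hypothetical helper (not from the slides): invert R = 2r / (1 + 2r),
    the expected recombinant-line proportion in selfed RILs, giving
    r = R / (2(1 - R))."""
    if not 0 <= R <= 0.5:
        raise ValueError("R must lie in [0, 0.5]")
    return R / (2 * (1 - R))


print(recombination_frequency(4, 20))    # RIL example: 4 of 20 lines recombinant
print(recombination_frequency(10, 20))   # DH example: 10 of 20 lines recombinant
print(ril_selfing_to_meiotic_r(0.2))     # per-meiosis r implied by R = 0.2 in selfed RILs
```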
F2 Population
Expected Genotypic Frequencies for F2 Progeny when r = 0 or r = 0.5 Between Two Loci in Coupling (AB/ab) Configuration
Genotype   Expected frequency       r = 0         r = 0.5
AB/AB      p1 = 0.25(1 - r)²        1/4 = 0.25    1/16 = 0.0625
AB/aB      p2 = 0.50r(1 - r)        0.0           2/16 = 0.125
AB/Ab      p3 = 0.50r(1 - r)        0.0           2/16 = 0.125
AB/ab      p4 = 0.50(1 - r)²        1/2 = 0.5     2/16 = 0.125
Ab/aB      p5 = 0.50r²              0.0           2/16 = 0.125
Ab/Ab      p6 = 0.25r²              0.0           1/16 = 0.0625
Ab/ab      p7 = 0.50r(1 - r)        0.0           2/16 = 0.125
aB/aB      p8 = 0.25r²              0.0           1/16 = 0.0625
aB/ab      p9 = 0.50r(1 - r)        0.0           2/16 = 0.125
ab/ab      p10 = 0.25(1 - r)²       1/4 = 0.25    1/16 = 0.0625
Expected and Observed Genotypic Frequencies
Coupling (AB/ab) and Repulsion (Ab/aB) F2 Progeny
Genotype   Observed   Expected, coupling     Expected, repulsion
AB/AB      p1         p1 = 0.25(1 - r)²      p1 = 0.25r²
AB/aB      p2         p2 = 0.50r(1 - r)      p2 = 0.50r(1 - r)
AB/Ab      p3         p3 = 0.50r(1 - r)      p3 = 0.50r(1 - r)
AB/ab      p4         p4 = 0.50(1 - r)²      p4 = 0.50r²
Ab/aB      p5         p5 = 0.50r²            p5 = 0.50(1 - r)²
Ab/Ab      p6         p6 = 0.25r²            p6 = 0.25(1 - r)²
Ab/ab      p7         p7 = 0.50r(1 - r)      p7 = 0.50r(1 - r)
aB/aB      p8         p8 = 0.25r²            p8 = 0.25(1 - r)²
aB/ab      p9         p9 = 0.50r(1 - r)      p9 = 0.50r(1 - r)
ab/ab      p10        p10 = 0.25(1 - r)²     p10 = 0.25r²
• Co-dominant markers
• Double heterozygotes fully classified
• Locus A = A and a; locus B = B and b
• r = recombination frequency between loci A and B
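A Python sketch of the ten expected F2 frequencies in both phases. It relies on a pattern visible in the table: the repulsion column equals the coupling column with r replaced by 1 - r, and p2, p3, p7 and p9 contain r(1 - r), so they are identical in both phases. The function name `f2_genotype_frequencies` is hypothetical.

```python
def f2_genotype_frequencies(r, phase="coupling"):
    """Expected F2 frequencies p1..p10 for two co-dominant loci with fully
    classified double heterozygotes, in coupling or repulsion phase."""
    if phase not in ("coupling", "repulsion"):
        raise ValueError("phase must be 'coupling' or 'repulsion'")
    # Repulsion is the coupling table with r replaced by 1 - r.
    s = r if phase == "coupling" else 1 - r
    return {
        "AB/AB": 0.25 * (1 - s) ** 2,  # p1
        "AB/aB": 0.50 * s * (1 - s),   # p2
        "AB/Ab": 0.50 * s * (1 - s),   # p3
        "AB/ab": 0.50 * (1 - s) ** 2,  # p4
        "Ab/aB": 0.50 * s ** 2,        # p5
        "Ab/Ab": 0.25 * s ** 2,        # p6
        "Ab/ab": 0.50 * s * (1 - s),   # p7
        "aB/aB": 0.25 * s ** 2,        # p8
        "aB/ab": 0.50 * s * (1 - s),   # p9
        "ab/ab": 0.25 * (1 - s) ** 2,  # p10
    }


print(f2_genotype_frequencies(0.5))
print(f2_genotype_frequencies(0.1, "repulsion"))
```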
Expected and Observed Genotypic Frequencies
Coupling (AB/ab) F2 Progeny
Genotype          Observed   Expected, coupling
AB/AB             q1         q1 = 0.25(1 - r)²
AB/aB             q2         q2 = 0.50r(1 - r)
AB/Ab             q3         q3 = 0.50r(1 - r)
AB/ab + Ab/aB     q4         q4 = p4 + p5 = 0.50[(1 - r)² + r²]
Ab/Ab             q5         q5 = 0.25r²
Ab/ab             q6         q6 = 0.50r(1 - r)
aB/aB             q7         q7 = 0.25r²
aB/ab             q8         q8 = 0.50r(1 - r)
ab/ab             q9         q9 = 0.25(1 - r)²
• Co-dominant markers
• Unclassified double heterozygotes
• Locus A = A and a; locus B = B and b
• r = recombination frequency between loci A and B
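When AB/ab and Ab/aB cannot be told apart, the expected frequencies are unchanged except that p4 and p5 merge into q4. A Python sketch for the coupling case, with the hypothetical function name `f2_unclassified_frequencies`:

```python
def f2_unclassified_frequencies(r):
    """Expected coupling-phase F2 frequencies q1..q9 for two co-dominant loci
    when the two double-heterozygote phases are pooled."""
    return {
        "AB/AB": 0.25 * (1 - r) ** 2,                      # q1
        "AB/aB": 0.50 * r * (1 - r),                       # q2
        "AB/Ab": 0.50 * r * (1 - r),                       # q3
        "AB/ab + Ab/aB": 0.50 * ((1 - r) ** 2 + r ** 2),   # q4 = p4 + p5
        "Ab/Ab": 0.25 * r ** 2,                            # q5
        "Ab/ab": 0.50 * r * (1 - r),                       # q6
        "aB/aB": 0.25 * r ** 2,                            # q7
        "aB/ab": 0.50 * r * (1 - r),                       # q8
        "ab/ab": 0.25 * (1 - r) ** 2,                      # q9
    }


print(f2_unclassified_frequencies(0.5))
```

At r = 0.5 the pooled double-heterozygote class is 4/16 = 0.25; at r = 0 it is 0.5.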
Expected and Observed Genotypic Frequencies
Coupling (AB/ab) and Repulsion (Ab/aB) F2 Progeny
Genotype   Observed   Expected, coupling         Expected, repulsion
A_B_       f1         f1 = 0.25(3 - 2r + r²)     f1 = 0.25(2 + r²)
A_bb       f2         f2 = 0.25(2r - r²)         f2 = 0.25(1 - r²)
aaB_       f3         f3 = 0.25(2r - r²)         f3 = 0.25(1 - r²)
aabb       f4         f4 = 0.25(1 - r)²          f4 = 0.25r²
• Dominant markers
• Locus A = A and a; locus B = B and b
• r = recombination frequency between loci A and B
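A Python sketch of the dominant-marker phenotype frequencies, written directly from the closed forms in the table (hypothetical function name `f2_dominant_frequencies`). As a sanity check, r = 0.5 gives the 9:3:3:1 ratio of independent assortment in both phases, and r = 0 in coupling collapses to 3:1 (A_B_ : aabb).

```python
def f2_dominant_frequencies(r, phase="coupling"):
    """Expected F2 phenotype frequencies f1..f4 for two dominant markers."""
    if phase == "coupling":
        return {
            "A_B_": 0.25 * (3 - 2 * r + r ** 2),
            "A_bb": 0.25 * (2 * r - r ** 2),
            "aaB_": 0.25 * (2 * r - r ** 2),
            "aabb": 0.25 * (1 - r) ** 2,
        }
    if phase == "repulsion":
        return {
            "A_B_": 0.25 * (2 + r ** 2),
            "A_bb": 0.25 * (1 - r ** 2),
            "aaB_": 0.25 * (1 - r ** 2),
            "aabb": 0.25 * r ** 2,
        }
    raise ValueError("phase must be 'coupling' or 'repulsion'")


print(f2_dominant_frequencies(0.5))               # 9/16, 3/16, 3/16, 1/16
print(f2_dominant_frequencies(0.1, "repulsion"))
```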
Analysis
1. Single-locus analysis
2. Two-locus analysis
3. Detecting linkage and grouping
4. Ordering loci
5. Multi-point analysis
Mendelian Genetic Analysis
Phenotypic and Genotypic Distributions
• The expected segregation ratio of a gene is a function of the transmission probabilities.
• If a gene produces a discrete phenotypic distribution, then an intrinsic hypothesis can be formulated to test whether the gene produces a phenotypic distribution consistent with the expected segregation ratio of the gene.
• The heritability of a phenotypic trait that produces a Mendelian phenotypic distribution is ~1.0. Such traits are said to be fully penetrant.
• The heritability of a DNA marker is theoretically ~1.0; however, it is affected by genotyping errors.
Mendelian Genetic Analysis
Hypothesis Tests
• The expected segregation ratio (null hypothesis) is specified on the basis of the observed phenotypic or genotypic distribution.
• One-way tests are performed to test for normal (Mendelian) segregation of individual phenotypic or DNA markers.
  – If the observed segregation ratio does not fit the expected segregation ratio, then the null hypothesis is rejected. Possible explanations:
    • The expected segregation ratio is incorrect
    • Selection may have operated on the locus
    • The locus may not be fully penetrant
    • A Type I error has been committed
Mendelian Genetic Analysis
Hypothesis Tests
• Two-way tests are performed to test for independent assortment (null hypothesis: no linkage) between two phenotypic or DNA markers.
  – If two genes do not sort independently, then the null hypothesis is rejected. Possible explanations:
    • The two genes are linked (r < 0.50)
    • The expected segregation ratio is incorrect
    • A Type I error has been committed
Mendelian Genetic Analysis
Null hypothesis   Accept                               Reject
True              No error (1 - α)                     Type I error (α), false positive
False             Type II error (β), false negative    No error (1 - β)
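One-way or single-locus tests
Goodness-of-fit statistics
• χ² statistics
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
• Log-likelihood ratio statistics (G-statistics)
$$G = 2\sum_{i=1}^{k} o_i \ln\frac{o_i}{e_i}$$
i = ith genotype (or allele, or phenotype); k = number of classes; o_i and e_i = observed and expected counts in class i
$\Pr[\chi^2 > \chi^2_{df}] = \alpha$
$\Pr[G > \chi^2_{df}] = \alpha$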
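Both statistics are simple sums over classes; a Python sketch follows (the function names `chi_square_stat` and `g_stat` are hypothetical). As a check, backcross sample A on the following slides (40 aa and 82 Aa, expected 61 of each) gives G ≈ 14.76 and χ² = 882/61 ≈ 14.46.

```python
import math


def chi_square_stat(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (o - e)^2 / e over classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def g_stat(observed, expected):
    """Log-likelihood ratio statistic G = 2 * sum(o * ln(o / e)).
    Classes with o = 0 contribute 0, the limit of o * ln(o) as o -> 0."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


print(chi_square_stat([40, 82], [61, 61]), g_stat([40, 82], [61, 61]))
```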
One-way or single-locus tests
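Genotype   Sample A   Sample B   Total
aa         40         51         91
Aa         82         81         163
Total      122        132        254

Under the 1:1 null hypothesis each class has expected count e_i = n/2: 61 in sample A and 66 in sample B.
$$G_{SA} = 2\left[40\ln\frac{40}{61} + 82\ln\frac{82}{61}\right] = 2(-16.880 + 24.259) = 14.76$$
$$G_{SB} = 2\left[51\ln\frac{51}{66} + 81\ln\frac{81}{66}\right] = 2(-13.149 + 16.588) = 6.88$$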
Two backcross populations (A and B) genotyped for a co-dominant marker (Brandt and Knapp 1993)
Null hypothesis: 1:1 ratio of aa to Aa
$\Pr[G_A > \chi^2_{k-1}] = \Pr[14.8 > \chi^2_1] = 0.0001$
$\Pr[G_B > \chi^2_{k-1}] = \Pr[6.88 > \chi^2_1] = 0.0086$
Null hypothesis is rejected for both samples
Individual G-statistics for samples A and B
$$G = 2\sum_{i=1}^{k} o_i \ln\frac{o_i}{e_i}$$
i = ith genotype; k = 2 genotypic classes
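A Python sketch that reproduces the two individual tests. It assumes only the standard library: for 1 df, the chi-square upper tail equals erfc(sqrt(x/2)), so no statistics package is needed. The names `g_stat`, `chi2_upper_tail_1df` and `results` are hypothetical.

```python
import math


def g_stat(observed, expected):
    """G = 2 * sum(o * ln(o / e)); classes with o = 0 contribute nothing."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def chi2_upper_tail_1df(x):
    """Pr[chi-square with 1 df > x], via the identity P = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


# (aa, Aa) counts for the two backcross samples in the table
samples = {"A": (40, 82), "B": (51, 81)}

results = {}
for name, counts in samples.items():
    n = sum(counts)
    expected = (n / 2, n / 2)  # 1:1 ratio of aa to Aa
    g = g_stat(counts, expected)
    results[name] = (g, chi2_upper_tail_1df(g))
    print(f"Sample {name}: G = {g:.2f}, P = {results[name][1]:.4f}")
```

The G values should match the slide (14.76 and 6.88). The P for sample B may differ from 0.0086 in the last digit, since the slide appears to use G rounded to 6.9.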
One-way or single-locus tests
$$G_P = 2\left[91\ln\frac{91}{127} + 163\ln\frac{163}{127}\right] = 2(-30.333 + 40.679) = 20.69$$
Null hypothesis: 1 aa : 1 Aa ratio for pooled samples
$\Pr[G_P > \chi^2_{k-1}] = \Pr[20.7 > \chi^2_1] = 0.0000054$
Null hypothesis is rejected
Pooled G-statistic across samples
$$G_P = 2\sum_{i=1}^{k}\left(\sum_{j=1}^{p} o_{ij}\right)\ln\frac{\sum_{j=1}^{p} o_{ij}}{\sum_{j=1}^{p} e_{ij}}$$
i = ith genotype; j = jth sample; k = genotypic classes; p = No. of samples (populations)
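A Python sketch of the pooled test: genotype counts are summed over samples first (o_i. = 91 and 163), and the totals are then tested against 1:1. The name `pooled_g` is hypothetical, and the expected counts assume the two-class 1:1 ratio used here.

```python
import math

# (aa, Aa) counts per backcross sample, from the Brandt and Knapp (1993) table
samples = [(40, 82), (51, 81)]


def pooled_g(samples):
    """Pooled G: sum each genotype over samples, then test the totals (1:1, k = 2)."""
    pooled = [sum(col) for col in zip(*samples)]  # o_i. = (91, 163)
    n = sum(pooled)
    expected = [n / 2] * len(pooled)              # e_i. = 127 for each class
    return 2 * sum(o * math.log(o / e) for o, e in zip(pooled, expected) if o > 0)


g_p = pooled_g(samples)
p_value = math.erfc(math.sqrt(g_p / 2))  # chi-square upper tail, 1 df
print(f"G_P = {g_p:.2f}, P = {p_value:.1e}")
```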
One-way or single-locus tests
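Null hypothesis: samples A and B are homogeneous
The heterogeneity G-statistic is
$$G_H = 2\left[\sum_{i=1}^{k}\sum_{j=1}^{p} o_{ij}\ln o_{ij} - \sum_{i=1}^{k} o_{i.}\ln o_{i.} - \sum_{j=1}^{p} o_{.j}\ln o_{.j} + o_{..}\ln o_{..}\right]$$
where
$$\sum_{i=1}^{k}\sum_{j=1}^{p} o_{ij}\ln o_{ij} = 40\ln 40 + 82\ln 82 + 51\ln 51 + 81\ln 81 = 1065.378$$
$$\sum_{i=1}^{k} o_{i.}\ln o_{i.} = 91\ln 91 + 163\ln 163 = 1240.769$$
$$\sum_{j=1}^{p} o_{.j}\ln o_{.j} = 122\ln 122 + 132\ln 132 = 1230.621$$
$$o_{..}\ln o_{..} = 254\ln 254 = 1406.483$$
so
$$G_H = 2(1065.378 - 1240.769 - 1230.621 + 1406.483) = 0.94$$
$\Pr[G_H > \chi^2_{(k-1)(p-1)}] = \Pr[0.94 > \chi^2_1] = 0.33$ (N.S.)
i = ith genotype; j = jth sample (population); k = genotypic classes; p = No. of samples (populations); n = o.. = total No. of observations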
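A Python sketch of the heterogeneity statistic computed from the o ln o sums, which also checks numerically that G_H equals the total of the individual G-statistics minus the pooled G. The helper names are hypothetical, and the check assumes the same expected ratio (here 1:1) in every sample.

```python
import math


def xlogx(x):
    """x * ln(x), using the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def heterogeneity_g(table):
    """Heterogeneity G for a count table with one row per sample (j)
    and one column per genotypic class (i)."""
    genotype_totals = [sum(col) for col in zip(*table)]  # o_i.
    sample_totals = [sum(row) for row in table]          # o_.j
    grand_total = sum(sample_totals)                     # o..
    cells = sum(xlogx(o) for row in table for o in row)  # sum of o_ij ln o_ij
    return 2 * (cells
                - sum(xlogx(o) for o in genotype_totals)
                - sum(xlogx(o) for o in sample_totals)
                + xlogx(grand_total))


def g_equal_ratio(counts):
    """Goodness-of-fit G against equal expected counts (1:1 for two classes)."""
    e = sum(counts) / len(counts)
    return 2 * sum(o * math.log(o / e) for o in counts if o > 0)


table = [(40, 82), (51, 81)]  # rows: samples A, B; columns: aa, Aa
g_h = heterogeneity_g(table)
g_total = sum(g_equal_ratio(row) for row in table)           # G_T = G_SA + G_SB
g_pooled = g_equal_ratio([sum(col) for col in zip(*table)])  # G_P
p_h = math.erfc(math.sqrt(g_h / 2))  # upper tail for (k - 1)(p - 1) = 1 df
print(f"G_H = {g_h:.3f}, G_T - G_P = {g_total - g_pooled:.3f}, P = {p_h:.2f}")
```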
One-way or single-locus tests
$G_T = G_{SA} + G_{SB} = 14.7 + 6.9 = 21.6$
$G_T = G_P + G_H = 20.7 + 0.9 = 21.6$
$\Pr[G_T > \chi^2_{p(k-1)}] = \Pr[21.6 > \chi^2_2] = 0.00002$
Source G df Pr > G
Sample A 14.7 k-1 = 2-1 =1 0.0001
Sample B 6.9 k-1 = 2-1 =1 0.0086
Total 21.6 p(k-1) = 2(2-1) = 2 0.00002
Pooled 20.7 k-1 = 2-1 =1 0.000005
Heterogeneity 0.9 (k-1)(p-1) = (2-1)(2-1) = 1 0.33
Total 21.6 p(k-1) = 2(2-1) = 2 0.00002
Relationship between G statistics
k = genotypic classes; p = No. of samples (populations)
One-way or single-locus tests
Allelic constitution   Genotype   Observed   Expected
120bp/120bp            aa         21         23.5
120bp/124bp            Aa         44         47
124bp/124bp            AA         29         23.5
Total                             94         94

$$G = 2\left[21\ln\frac{21}{23.5} + 44\ln\frac{44}{47} + 29\ln\frac{29}{23.5}\right] = 2(-2.362 - 2.902 + 6.098) = 1.668$$
F2 progeny of Ae. cylindrica genotyped for the SSR marker barc98.
Null hypothesis: 1:2:1 ratio of aa:Aa:AA
$\Pr[G > \chi^2_{k-1}] = \Pr[1.67 > \chi^2_2] = 0.434$
Null hypothesis is not rejected
Individual G-statistic for the F2 sample
$$G = 2\sum_{i=1}^{k} o_i \ln\frac{o_i}{e_i}$$
i = ith genotype; k = 3 genotypic classes
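A Python sketch of the barc98 test. Expected counts come from the 1:2:1 ratio (94/4 = 23.5 and 94/2 = 47), and the P-value uses the closed form exp(-x/2) of the chi-square upper tail, which holds only for 2 df. The variable names are illustrative.

```python
import math

observed = {"aa": 21, "Aa": 44, "AA": 29}  # barc98 F2 counts from the table
ratio = {"aa": 1, "Aa": 2, "AA": 1}        # 1:2:1 null hypothesis

n = sum(observed.values())
weight_sum = sum(ratio.values())
expected = {geno: n * w / weight_sum for geno, w in ratio.items()}  # 23.5, 47, 23.5
g = 2 * sum(o * math.log(o / expected[geno]) for geno, o in observed.items() if o > 0)
df = len(observed) - 1                     # k - 1 = 2
p_value = math.exp(-g / 2)                 # chi-square upper tail; valid for df = 2 only
print(f"G = {g:.2f}, df = {df}, P = {p_value:.3f}")
```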
Calculating probability values for Chi-square distributions
SAS program
data pv;
  input x df;
  datalines;
3.75 2
;
data pvalue;
  set pv;
  pvalue = 1 - probchi(x, df);  /* PROBCHI is the CDF, so 1 - PROBCHI is the upper tail */
  output;
proc print data=pvalue;
run;
Output
Obs   x      df   pvalue
1     3.75   2    0.15335
Excel formula
=CHIDIST(x, deg_freedom)   (=CHISQ.DIST.RT(x, deg_freedom) in Excel 2010 and later)
=CHIDIST(3.75, 2)
Output
0.15335
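For readers working in Python, a standard-library sketch of the same upper-tail probability (1 - probchi(x, df) in SAS, CHIDIST(x, df) in Excel). It assumes a positive integer df and uses the closed-form chi-square survival function: a finite Poisson-type sum for even df, and erfc plus a finite series for odd df. The name `chi2_sf` is hypothetical; if SciPy is available, scipy.stats.chi2.sf(x, df) should give the same value.

```python
import math


def chi2_sf(x, df):
    """Pr[chi-square with df degrees of freedom > x], for a positive integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_k x^((2k-1)/2) / (1*3*...*(2k-1))
    total = math.erfc(math.sqrt(half))  # the df = 1 part
    term = math.sqrt(x)                 # k = 1 term: x^(1/2) / 1
    series = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        series += term
    return total + 2 * math.exp(-half) / math.sqrt(2 * math.pi) * series


print(round(chi2_sf(3.75, 2), 5))  # prints 0.15335, matching the SAS and Excel output
```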