Lab 12. Linkage Disequilibrium, November 28, 2012


Goals
Estimate LD in terms of D, D′, and r².

Determine the effect of random and non-random mating on LD.

Estimate LD from diploid genotype data using the EM algorithm.

LD estimation in a two-locus (A and B), two-allele (1 and 2) model

[Figure: the four possible gametes (A1B1, A1B2, A2B1, A2B2), with allele frequencies p1, p2 at locus A and q1, q2 at locus B]

Gamete   Observed gametic frequency   Expected gametic frequency under linkage equilibrium
A1B1     x11                          p1q1
A1B2     x12                          p1q2
A2B1     x21                          p2q1
A2B2     x22                          p2q2

Allele   Allele frequency
A1       p1 = x11 + x12
A2       p2 = x21 + x22
B1       q1 = x11 + x21
B2       q2 = x12 + x22

If D > 0, Dmax = min(p1q2, p2q1)

If D < 0, Dmax = min(p1q1, p2q2).
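The LD measures for this model can be collected into a short Python sketch. It assumes the standard definitions D = x11x22 − x12x21 (equivalently x11 − p1q1), r² = D²/(p1p2q1q2), and D′ = |D|/Dmax, with Dmax chosen by the sign of D as above (taking the absolute value is one common convention). It also tests H0: D = 0 with the usual statistic χ² = N·r² on 1 degree of freedom. The function name `ld_measures` is hypothetical.

```python
import math


def ld_measures(n11, n12, n21, n22):
    """Return D, D', r^2, chi-square and p-value from two-locus gamete counts.

    n11, n12, n21, n22 are the counts of gametes A1B1, A1B2, A2B1, A2B2
    (hypothetical helper, not from the original lab).
    """
    n = n11 + n12 + n21 + n22
    x11, x12, x21, x22 = (k / n for k in (n11, n12, n21, n22))
    p1, p2 = x11 + x12, x21 + x22  # allele frequencies at locus A
    q1, q2 = x11 + x21, x12 + x22  # allele frequencies at locus B
    if p1 * p2 * q1 * q2 == 0:
        raise ValueError("both loci must be polymorphic")

    d = x11 * x22 - x12 * x21  # equivalently x11 - p1 * q1

    # Dmax depends on the sign of D (the two cases listed above).
    if d > 0:
        d_max = min(p1 * q2, p2 * q1)
    else:
        d_max = min(p1 * q1, p2 * q2)
    d_prime = abs(d) / d_max  # reported as |D| / Dmax (one common convention)

    r2 = d * d / (p1 * p2 * q1 * q2)

    # Chi-square test of D = 0 with 1 df: chi2 = N * r^2.
    chi2 = n * r2
    # Upper tail of a 1-df chi-square equals P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return d, d_prime, r2, chi2, p_value
```

Calling `ld_measures(138, 88, 78, 152)` applies it to the gamete counts from Problem 1.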

Different measures of LD

Decay of LD
[Figure: allele history over time. LD created by high drift or a selective sweep (gametes A1B1 and A2B2) is broken down by recombination, which generates the recombinant gametes A1B2 and A2B1.]
Closer proximity -> less recombination -> stronger LD
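Under random mating, the standard result is that D shrinks by a factor (1 − c) each generation, so D_t = D_0(1 − c)^t. The sketch below implements that decay and counts the generations needed to fall below a threshold. It also includes an optional adjustment for partial self-fertilization, which is an assumption and not necessarily the formula from the original slide: it uses the widely cited approximation c_e = c(1 − F), where F = S/(2 − S) is the equilibrium inbreeding coefficient for a selfing rate S. All function names are hypothetical.

```python
def effective_recombination(c, s=0.0):
    """Approximate effective recombination rate for selfing rate s.

    ASSUMPTION: c_e = c * (1 - F) with F = s / (2 - s). With s = 0 this is
    simply c (random mating). It may differ from the formula used in lecture.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("selfing rate must be in [0, 1]")
    f = s / (2.0 - s)  # equilibrium inbreeding coefficient under partial selfing
    return c * (1.0 - f)


def d_after(d0, c, generations=1, s=0.0):
    """Expected D after some generations: D_t = D_0 * (1 - c_e) ** t."""
    return d0 * (1.0 - effective_recombination(c, s)) ** generations


def generations_until_below(d0, c, threshold, s=0.0):
    """Smallest number of generations t with |D_t| < threshold."""
    c_e = effective_recombination(c, s)
    if c_e <= 0:
        raise ValueError("D never decays when the effective recombination rate is 0")
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    t, d = 0, abs(d0)
    while d >= threshold:
        d *= 1.0 - c_e  # one generation of decay
        t += 1
    return t
```

For Problem 1b, `d_after(d0, 0.044)` gives the expected D one generation later under random mating, where d0 is the D estimated in part a. Passing `s=0.1`, `s=0.5`, or `s=0.9` applies the assumed selfing approximation.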

Recombination rate for self-fertilizing organisms:

Problem 1. In most conifers, gamete frequencies and the linkage phase of diploid genotypes can be determined directly because seeds contain relatively large amounts of haploid nutritional tissue (called endosperm or megagametophyte), which originates from the maternal gamete. As part of a study of the linkage relationship among allozyme loci in loblolly pine (Pinus taeda), Adams and Joly (1980) sampled 456 gametes at the loci phosphoglucose isomerase 2 (PGI2; for simplicity, call this locus A) and glutamate-oxaloacetate transaminase 1 (GOT1; call this locus B) and observed the following numbers of gametes (15 minutes):

Gamete   Count
A1B1     138
A1B2     88
A2B1     78
A2B2     152
Total    456

a.) Calculate D, D′, and r², and test the statistical significance of the gametic disequilibrium between the two loci.
b.) Because the linkage phase of each mother tree was known, Adams and Joly were able to estimate that the recombination rate between the two loci is c = 0.044. What is the expected value of D in the next generation (i.e., in the offspring of the seeds that were included in the study)? How many generations of random mating will it take for D to decay below 0.005? What is the expected value of D in the next generation if the self-fertilization rate is S = 0.1? S = 0.5? S = 0.9?
c.) Repeat the calculations from b) assuming c = 0.5 (i.e., assuming that the two loci are physically unlinked).
d.) Discuss the relative importance of rates of recombination and self-fertilization in determining the rate of LD decay.

Problem 2. Compare rates of decay of r² with physical distance in sequences from the phytochrome B2 (PHYB2) gene in European aspen (Populus tremula) and the phytochrome C (PHYC) gene in Arabidopsis thaliana.
Show scatter plots with trend lines illustrating the decay of r² with physical distance for each gene.
How do the patterns of LD differ between these two species, and why?
GRADUATE STUDENTS: Provide facts and citations supporting your biological explanation.

When we genotype, we often don't know the actual haplotypes: the genotypes are unphased.
We can use a maximum likelihood method, Expectation Maximization (EM), to obtain haplotype frequencies.

Haplotypes through EM
Initialize: guess the gamete frequencies.
Expectation step: given the current gamete frequencies, find the expected frequencies of the known-phase genotypes underlying each unphased genotype.
Maximization step: use those expected frequencies to make new gamete frequency estimates, which in turn give new expected frequencies of all unphased genotypes.
Repeat the two steps until the gamete frequency estimates converge.
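The EM steps above can be sketched for the two-locus, two-allele case. Here only the double heterozygote (A1A2 B1B2) has unknown phase: it is either A1B1/A2B2 (coupling) or A1B2/A2B1 (repulsion). This is a minimal sketch that assumes genotypes form by random union of gametes, stores genotype counts in a hypothetical 3 × 3 table indexed by the number of A1 and B1 alleles, and initializes with linkage-equilibrium gamete frequencies. The function name `em_two_locus` is hypothetical.

```python
def em_two_locus(counts, tol=1e-10, max_iter=10_000):
    """EM estimates of gamete frequencies [A1B1, A1B2, A2B1, A2B2].

    counts[i][j] = number of individuals with i copies of A1 and j copies
    of B1 (hypothetical input layout). Assumes random union of gametes.
    """
    n = sum(sum(row) for row in counts)

    # Gametes from genotypes whose phase is known (every genotype except 1,1).
    # Gamete index = 2*a + b, with a = 0 for A1, 1 for A2; b = 0 for B1, 1 for B2.
    known = [0.0] * 4
    for i in range(3):
        for j in range(3):
            if (i, j) == (1, 1):
                continue  # double heterozygote: phase unknown
            a_alleles = [0] * i + [1] * (2 - i)
            b_alleles = [0] * j + [1] * (2 - j)
            # At least one locus is homozygous, so any pairing is correct.
            for a, b in zip(a_alleles, b_alleles):
                known[2 * a + b] += counts[i][j]
    double_het = counts[1][1]

    # Initialize: linkage-equilibrium guess from the allele frequencies.
    p1 = sum(i * counts[i][j] for i in range(3) for j in range(3)) / (2 * n)
    q1 = sum(j * counts[i][j] for i in range(3) for j in range(3)) / (2 * n)
    x = [p1 * q1, p1 * (1 - q1), (1 - p1) * q1, (1 - p1) * (1 - q1)]

    for _ in range(max_iter):
        # E-step: expected share of double heterozygotes in coupling phase.
        coupling, repulsion = x[0] * x[3], x[1] * x[2]
        total = coupling + repulsion
        f_coupling = coupling / total if total > 0 else 0.5
        # M-step: count gametes, splitting double heterozygotes by phase.
        gametes = known[:]
        gametes[0] += double_het * f_coupling
        gametes[3] += double_het * f_coupling
        gametes[1] += double_het * (1 - f_coupling)
        gametes[2] += double_het * (1 - f_coupling)
        new_x = [g / (2 * n) for g in gametes]
        converged = max(abs(u - v) for u, v in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x
```

Starting from linkage equilibrium means the first E-step splits double heterozygotes evenly between the two phases. The phase-known genotypes then pull the estimates toward the observed association.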

where n = number of unphased genotypes in the sample; n1, n2, ..., n5 are the numbers of times each unphased genotype was observed in the sample; and P1, P2, ..., P5 are the expected frequencies of those unphased genotypes.
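The expression these symbols belong to did not survive extraction. Assuming it is the usual multinomial likelihood of the observed unphased genotype counts, it would read as follows, with n = n1 + ... + n5. Each EM iteration does not decrease ln L.

```latex
L = \frac{n!}{n_1!\, n_2! \cdots n_5!}\; P_1^{n_1} P_2^{n_2} \cdots P_5^{n_5},
\qquad
\ln L = \text{const} + \sum_{i=1}^{5} n_i \ln P_i
```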

Problem 3. File human_LD.arp contains data for humans from two populations (Han and Melanesian) genotyped for the same loci that you previously analyzed for departures from Hardy-Weinberg equilibrium and for population structure. The Han sample includes individuals from a broad geographic area in China, whereas the Melanesian sample includes only individuals from Bougainville Island. Use Arlequin to test for significant linkage disequilibrium among the 10 loci in each of these populations.
a.) How do you interpret the difference in the number of linked loci in the two populations?
b.) GRAD STUDENTS: How many pairs of loci are expected to show significant LD at α = 0.05 by chance?
c.) GRAD STUDENTS: Provide facts and citations supporting your biological claim.
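For part b, the number of pairwise tests and the number expected to be significant by chance alone follow from simple arithmetic. This is a minimal sketch that assumes every pair of loci is tested once and treats the tests as independent, which they are not fully in practice. The function name is hypothetical.

```python
from math import comb


def expected_false_positives(n_loci, alpha=0.05):
    """Number of locus pairs and the count expected significant by chance."""
    n_pairs = comb(n_loci, 2)  # each unordered pair of loci is tested once
    return n_pairs, alpha * n_pairs


pairs, expected = expected_false_positives(10)
print(pairs, expected)
```

With 10 loci there are 45 pairs per population, so about 2.25 pairs would be expected to reach significance at α = 0.05 even with no LD.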

http://en.wikipedia.org/wiki/Melanesia

Han