lecture 10: linkage analysis iii


Upload: amato

Post on 12-Jan-2016


Page 1: Lecture 10: Linkage Analysis III

Lecture 10: Linkage Analysis III

Date: 9/26/02

Revisit segregation ratio distortion.

Haplotype coding

Three point analysis

Multipoint analysis

Page 2: Lecture 10: Linkage Analysis III

Additive Segregation Ratio Distortion

Systematic genotype classification error occurs.

Power and estimates of recombination fraction are unaffected by additive distortion in the backcross configuration.

Estimates of recombination fraction are not affected for F2, but the false positive rate increases.

Page 3: Lecture 10: Linkage Analysis III

Additive Segregation - Backcross

Suppose the frequency of genotype Aa is increased because a fraction u of aa genotypes are misclassified.

Similarly, assume the frequency of genotype Bb is independently increased by fraction v.

We need to recalculate the expected frequencies under the new model with additional parameters u and v.

Page 4: Lecture 10: Linkage Analysis III

Additive Segregation – Backcross (contd)

Genotype     Expected Frequency    Expected Frequency with Distortion

AaBb         0.5(1-θ)              0.5(1-θ) + u/2 + v/2
Aabb         0.5θ                  0.5θ + u/2 - v/2
aaBb         0.5θ                  0.5θ - u/2 + v/2
aabb         0.5(1-θ)              0.5(1-θ) - u/2 - v/2

Total: Aa    0.5                   0.5 + u
Total: aa    0.5                   0.5 - u
Total: Bb    0.5                   0.5 + v
Total: bb    0.5                   0.5 - v

Page 5: Lecture 10: Linkage Analysis III

Additive Segregation – Backcross (contd)

The number of unknown parameters equals the number of degrees of freedom.

Use Bailey's method to find the MLEs of the parameters (θ, u, v).

Page 6: Lecture 10: Linkage Analysis III

Bailey’s Method

Set the expected frequencies equal to the observed proportions and solve the system of equations for the unknown parameters. These are the MLEs.

Example: Suppose you observe 5 successes from a Binomial(10, p) distribution. Then

$$\hat{p}_{\text{MLE}} = 5/10$$

Page 7: Lecture 10: Linkage Analysis III

Additive Segregation – Backcross (contd)

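With f11, f12, f21, f22 the observed counts of AaBb, Aabb, aaBb, aabb and N their total, setting expected frequencies equal to observed proportions gives

$$\hat{\theta} = \frac{f_{12} + f_{21}}{N}, \qquad \hat{u} = \frac{f_{11} + f_{12} - f_{21} - f_{22}}{2N}, \qquad \hat{v} = \frac{f_{11} + f_{21} - f_{12} - f_{22}}{2N}$$

What do you notice about the MLE for the recombination fraction?

Is the MLE for the recombination fraction biased?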
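A minimal sketch of Bailey's method for the additive backcross model, assuming the expected frequencies tabulated on the earlier slide (totals 0.5 + u and 0.5 + v). The function names and the parameter values in the round trip are illustrative only.

```python
def additive_backcross_freqs(theta, u, v):
    """Expected frequencies of (AaBb, Aabb, aaBb, aabb) with additive distortion."""
    return (
        0.5 * (1 - theta) + u / 2 + v / 2,
        0.5 * theta + u / 2 - v / 2,
        0.5 * theta - u / 2 + v / 2,
        0.5 * (1 - theta) - u / 2 - v / 2,
    )


def bailey_estimates(f11, f12, f21, f22):
    """Solve 'expected frequency = observed proportion' for (theta, u, v)."""
    n = f11 + f12 + f21 + f22
    theta_hat = (f12 + f21) / n                   # recombinant classes
    u_hat = (f11 + f12 - f21 - f22) / (2 * n)     # excess of Aa over aa
    v_hat = (f11 + f21 - f12 - f22) / (2 * n)     # excess of Bb over bb
    return theta_hat, u_hat, v_hat


# Round trip with hypothetical values: expected counts in, parameters out.
expected = additive_backcross_freqs(theta=0.2, u=0.05, v=-0.03)
theta_hat, u_hat, v_hat = bailey_estimates(*(1000 * p for p in expected))
print(theta_hat, u_hat, v_hat)
```

The recombinant classes carry +u/2 - v/2 and -u/2 + v/2, so the distortion terms cancel in their sum and the estimate of θ does not involve u or v.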

Page 8: Lecture 10: Linkage Analysis III

Additive Segregation – F2-CC

Genotype    Expected Frequency    Additive Distortion

AABB        0.25(1-θ)^2           u/3 + v/3
AABb        0.5θ(1-θ)             u/3 - v/3
AAbb        0.25θ^2               u/3
AaBB        0.5θ(1-θ)             -u/3 + v/3
AaBb        0.5(1-2θ+2θ^2)        -u/3 - v/3
Aabb        0.5θ(1-θ)             -u/3
aaBB        0.25θ^2               v/3
aaBb        0.5θ(1-θ)             -v/3
aabb        0.25(1-θ)^2           0

Page 9: Lecture 10: Linkage Analysis III

Penetrance Distortion - Backcross

Selection, penetrance, and linkage to selected markers can all result in penetrance distortion, so it is quite common.

Suppose (100 × u)% of genotype aa is misclassified as Aa. Similarly, and independently, assume that (100 × v)% of genotype bb is misclassified as Bb.

Page 10: Lecture 10: Linkage Analysis III

Penetrance Distortion - Backcross

Genotype    Expected Frequency

AaBb    P(AaBb) + P(scored as Aa|aaBb)P(aaBb) + P(scored as Bb|Aabb)P(Aabb) + P(scored as AaBb|aabb)P(aabb)

= 0.5(1-θ) + 0.5uθ + 0.5vθ + 0.5uv(1-θ)

= 0.5[θ(u+v) + (1-θ)(1+uv)]

Aabb

aaBb

aabb
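The remaining rows follow by the same argument. A hedged sketch that enumerates the scoring outcomes, assuming only the model stated above (aa scored as Aa with probability u, bb scored as Bb with probability v, independently, and heterozygotes never misclassified); the function name and parameter values are illustrative.

```python
def penetrance_backcross_freqs(theta, u, v):
    """Expected frequencies of the scored genotypes under penetrance distortion."""
    true_freqs = {
        "AaBb": 0.5 * (1 - theta),
        "Aabb": 0.5 * theta,
        "aaBb": 0.5 * theta,
        "aabb": 0.5 * (1 - theta),
    }
    scored = dict.fromkeys(true_freqs, 0.0)
    for genotype, prob in true_freqs.items():
        # Probability that each locus is scored as heterozygous.
        p_Aa = 1.0 if genotype[:2] == "Aa" else u
        p_Bb = 1.0 if genotype[2:] == "Bb" else v
        scored["AaBb"] += prob * p_Aa * p_Bb
        scored["Aabb"] += prob * p_Aa * (1 - p_Bb)
        scored["aaBb"] += prob * (1 - p_Aa) * p_Bb
        scored["aabb"] += prob * (1 - p_Aa) * (1 - p_Bb)
    return scored


freqs = penetrance_backcross_freqs(theta=0.2, u=0.1, v=0.05)
for genotype, value in freqs.items():
    print(genotype, round(value, 5))
```

Under these assumptions the closed forms are Aabb = 0.5(1-v)[θ + u(1-θ)], aaBb = 0.5(1-u)[θ + v(1-θ)], and aabb = 0.5(1-θ)(1-u)(1-v).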

Page 11: Lecture 10: Linkage Analysis III

Penetrance Distortion - Backcross

Is the estimate for recombination fraction biased?

The power to detect linkage is decreased.

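With f11, f12, f21, f22 the observed counts of AaBb, Aabb, aaBb, aabb and N their total:

$$\hat{u} = \frac{f_{11} + f_{12} - f_{21} - f_{22}}{N}, \qquad \hat{v} = \frac{f_{11} + f_{21} - f_{12} - f_{22}}{N}, \qquad \hat{\theta} = 1 - \frac{2 f_{22}}{N(1 - \hat{u})(1 - \hat{v})}$$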

Page 12: Lecture 10: Linkage Analysis III

Cost of Assuming Non-Distortion Model

The estimate for recombination fraction is biased. By how much?

$$\text{Bias} = E(\hat{\theta}) - \theta$$
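A rough sketch of the size of this bias, assuming the penetrance distortion model from the backcross slides and the naive estimator (f12 + f21)/N that ignores distortion. The closed forms for the scored Aabb and aaBb classes are derived from that model, not copied from the slides, and the output is not claimed to reproduce the lecture's figure.

```python
def naive_theta_expectation(theta, u, v):
    """E[(f12 + f21)/N] when aa -> Aa (prob. u) and bb -> Bb (prob. v) are ignored."""
    scored_Aabb = 0.5 * (1 - v) * (theta + u * (1 - theta))
    scored_aaBb = 0.5 * (1 - u) * (theta + v * (1 - theta))
    return scored_Aabb + scored_aaBb


def bias(theta, u, v):
    """Bias of the naive estimator: E(theta_hat) - theta."""
    return naive_theta_expectation(theta, u, v) - theta


# Hypothetical grid of equal distortions u = v.
for theta in (0.1, 0.2, 0.3):
    print(theta, [round(bias(theta, d, d), 4) for d in (0.0, 0.1, 0.2)])
```

With u = v the expression reduces to u(1-u) - uθ(2-u), which is zero when there is no distortion.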

Page 13: Lecture 10: Linkage Analysis III

Overall Impact of Segregation Distortion

[Figure: bias in the estimated recombination fraction (vertical axis, -0.3 to 0.5) plotted against distortion u = v (horizontal axis, -0.4 to 0.4), with one curve each for recombination fractions 0.1, 0.2, and 0.3.]

Page 14: Lecture 10: Linkage Analysis III

First Project

This slide marks the end of the material that will be needed to complete the first project.

Page 15: Lecture 10: Linkage Analysis III

Linkage Analysis for Multiple Loci

The haplotype is the sequence of alleles along one of the chromosomes in an individual.

In multipoint linkage analysis we are not concerned with the alleles at each locus, but rather with their parental origin.

Page 16: Lecture 10: Linkage Analysis III

Recoding Haplotypes

Suppose there are k loci.

Recode each haplotype as a string of k-1 0's and 1's.

If the ith position is 0, it indicates the (i+1)th locus is not recombinant with respect to the first locus (same parental origin).

If the ith position is 1, it indicates the (i+1)th locus is recombinant with respect to the first locus.

Page 17: Lecture 10: Linkage Analysis III

Recoding Haplotypes (contd)

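Haplotype    Recombinant on interval:    Picture
             AB     AC     BC

00           no     no     no            A-B-C
01           no     yes    yes           A-BxC
10           yes    no     yes           AxBxC
11           yes    yes    no            AxB-C

(In the pictures, "-" marks an interval without recombination and "x" an interval with recombination.)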
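The recoding can be sketched in a few lines. This assumes the convention used in the table above (each digit records whether that locus is recombinant with respect to the first locus, i.e. the AB and AC columns); `recode` and `recombinant_intervals` are hypothetical helper names.

```python
def recode(origins):
    """Recode a haplotype given the parental origin (0 or 1) of each locus in map order.

    Returns a string of k-1 digits; digit i is 1 if locus i+1 has a different
    parental origin than the first locus.
    """
    first = origins[0]
    return "".join("1" if origin != first else "0" for origin in origins[1:])


def recombinant_intervals(code):
    """Recombination status of each adjacent interval (AB, BC, ...) implied by a code."""
    digits = [0] + [int(c) for c in code]  # the first locus is the reference
    return [a != b for a, b in zip(digits, digits[1:])]


print(recode([0, 0, 1]), recombinant_intervals("01"))  # crossover in BC only
print(recode([0, 1, 0]), recombinant_intervals("10"))  # crossovers in AB and BC
print(recode([0, 1, 1]), recombinant_intervals("11"))  # crossover in AB only
```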

Page 18: Lecture 10: Linkage Analysis III

Recoding Haplotypes (contd)

Haplotype    Code

AxBxCxD      101
A-B-C-D      000
AxB-CxD      110

Page 19: Lecture 10: Linkage Analysis III

Recoded Haplotypes and Recombination Fractions

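With p_ij the probability of haplotype class ij:

$$
\begin{aligned}
\theta_{AB} &= p_{10} + p_{11} \\
\theta_{BC} &= p_{10} + p_{01} \\
\theta_{AC} &= p_{01} + p_{11} \\
1 &= p_{00} + p_{01} + p_{10} + p_{11}
\end{aligned}
$$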

Page 20: Lecture 10: Linkage Analysis III

Sample Problem

Calculate the probabilities of the four haplotype classes (i.e. 00, 10, 01, 11) when θ_AB = 0.1, θ_BC = 0.2, and θ_AC is unknown. Assume the Sturt map function with L = 1.

Page 21: Lecture 10: Linkage Analysis III

Plan of Attack

1. Transform recombination fractions to genetic map units using the inverse map function.

2. Sum the genetic map units to obtain length of AC interval.

3. Calculate the recombination fraction between AC using the map function.

4. Solve the set of simultaneous equations for the haplotype frequencies.

Page 22: Lecture 10: Linkage Analysis III

Step 1

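The Sturt map function is

$$\theta = \frac{1}{2}\left[1 - \left(1 - \frac{m}{L}\right) e^{-m(2L-1)/L}\right]$$

Inverting it with L = 1 gives

$$m_{AB} = 0.108, \qquad m_{BC} = 0.238$$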

Page 23: Lecture 10: Linkage Analysis III

Step 2

$$m_{AC} = m_{AB} + m_{BC} = 0.108 + 0.238 = 0.346$$

Page 24: Lecture 10: Linkage Analysis III

Step 3

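$$\theta_{AC} = \frac{1}{2}\left[1 - \left(1 - \frac{m_{AC}}{L}\right) e^{-m_{AC}(2L-1)/L}\right] = \frac{1}{2}\left[1 - (1 - 0.346)\, e^{-0.346}\right] = 0.269$$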

Page 25: Lecture 10: Linkage Analysis III

Step 4

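$$
\begin{aligned}
p_{00} + p_{01} + p_{10} + p_{11} &= 1 \\
p_{10} + p_{11} &= 0.1 \\
p_{10} + p_{01} &= 0.2 \\
p_{01} + p_{11} &= 0.269
\end{aligned}
$$

Solution:

$$p_{00} = 0.7155, \qquad p_{10} = 0.0155, \qquad p_{01} = 0.1845, \qquad p_{11} = 0.0845$$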
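The four steps can be checked numerically. This is a sketch only: the bisection inverse and all function names are illustrative choices, and the haplotype classes use the coding in which 10 is the double recombinant (recombinant in AB and BC but not AC).

```python
import math


def sturt(m, L=1.0):
    """Sturt map function: map distance m (Morgans, m <= L) -> recombination fraction."""
    return 0.5 * (1.0 - (1.0 - m / L) * math.exp(-m * (2.0 * L - 1.0) / L))


def sturt_inverse(theta, L=1.0, tol=1e-12):
    """Invert the Sturt map function by bisection on [0, L], where it is increasing."""
    lo, hi = 0.0, L
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sturt(mid, L) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta_ab, theta_bc = 0.1, 0.2
m_ab = sturt_inverse(theta_ab)              # Step 1
m_bc = sturt_inverse(theta_bc)
m_ac = m_ab + m_bc                          # Step 2
theta_ac = sturt(m_ac)                      # Step 3

# Step 4: solve the four linear equations for the haplotype class probabilities.
p10 = (theta_ab + theta_bc - theta_ac) / 2  # recombinant in both AB and BC
p11 = theta_ab - p10                        # recombinant in AB only
p01 = theta_bc - p10                        # recombinant in BC only
p00 = 1.0 - p10 - p11 - p01

print(round(m_ab, 3), round(m_bc, 3), round(theta_ac, 3))
print(round(p00, 4), round(p10, 4), round(p01, 4), round(p11, 4))
```

Because the script carries full precision rather than the rounded map distances, the last digit of each probability can differ slightly from the hand calculation.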

Page 26: Lecture 10: Linkage Analysis III

Phase Known Three Point Analysis

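When all gametes in the sample are fully informative, the likelihood is simple:

$$l = \sum_{i=1}^{4} f_i \log p_i$$

where f_i is the observed count and p_i the probability of haplotype class i.

The log-likelihood can be reparameterized in terms of c:

$$l(\theta_{AB}, \theta_{BC}, \theta_{AC}) \rightarrow l(\theta_{AB}, \theta_{BC}, c), \qquad c = \frac{\theta_{AB} + \theta_{BC} - \theta_{AC}}{2\,\theta_{AB}\,\theta_{BC}}$$

How would you test for interference?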
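One possible answer is a likelihood ratio test of c = 1 (no interference) against the unrestricted three-point model. The sketch below assumes phase-known, fully informative gametes and the class coding in which 10 is the double recombinant; the counts are invented for illustration.

```python
import math


def interference_lrt(f00, f01, f10, f11):
    """Likelihood ratio statistic for H0: c = 1 from three-point haplotype counts."""
    counts = {"00": f00, "01": f01, "10": f10, "11": f11}
    n = sum(counts.values())

    # Unrestricted model: class probabilities estimated by observed proportions.
    l_full = sum(f * math.log(f / n) for f in counts.values() if f > 0)

    # Null model: crossovers in AB and BC occur independently.
    theta_ab = (f10 + f11) / n
    theta_bc = (f10 + f01) / n
    p_null = {
        "00": (1 - theta_ab) * (1 - theta_bc),
        "01": (1 - theta_ab) * theta_bc,   # crossover in BC only
        "11": theta_ab * (1 - theta_bc),   # crossover in AB only
        "10": theta_ab * theta_bc,         # double crossover
    }
    l_null = sum(f * math.log(p_null[k]) for k, f in counts.items() if f > 0)
    return 2.0 * (l_full - l_null)


# Hypothetical counts with fewer double recombinants than independence predicts.
statistic = interference_lrt(f00=705, f01=195, f10=5, f11=95)
print(round(statistic, 2))  # compare with a chi-square on 1 df (3.84 at the 5% level)
```

The statistic has one degree of freedom because the null model fixes the single interference parameter c.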

Page 27: Lecture 10: Linkage Analysis III

Multipoint Analysis – A Difficulty

Suppose there are k loci.

How many haplotypes are possible?

How many recombination fractions are there?

Page 28: Lecture 10: Linkage Analysis III

Recombination Value

Definition: The recombination value of a set of intervals is the probability of an odd number of crossovers occurring in the intervals.

How many sets of intervals are there?
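A small counting sketch for these questions, assuming haplotypes are recoded as strings of k-1 binary digits and that only non-empty sets of the k-1 adjacent intervals are counted. The function name is illustrative.

```python
from itertools import combinations


def multipoint_counts(k):
    """Return (haplotype classes, pairwise recombination fractions, interval sets) for k loci."""
    intervals = k - 1
    haplotype_classes = 2 ** intervals
    pairwise_fractions = k * (k - 1) // 2
    # Enumerate every non-empty subset of the adjacent intervals.
    interval_sets = sum(
        1
        for size in range(1, intervals + 1)
        for _ in combinations(range(intervals), size)
    )
    return haplotype_classes, pairwise_fractions, interval_sets


for k in (3, 4, 10):
    print(k, multipoint_counts(k))
```

For k > 3 there are more interval sets than pairs of loci, so the pairwise recombination fractions alone no longer determine the haplotype probabilities.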

Page 29: Lecture 10: Linkage Analysis III

Sample Problem – Four Point Analysis

Suppose loci A, B, C, and D are in syntenic order and θ_AB = 0.1, θ_BC = 0.2, and θ_CD = 0.3.

What are the probabilities of the haplotype classes given the Kosambi map function?

$$\theta = \frac{1}{2}\,\frac{e^{4m} - 1}{e^{4m} + 1}$$

Page 30: Lecture 10: Linkage Analysis III

The Linear Equations

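With p_ijk the probability of haplotype class ijk, and θ_{AB,CD} the recombination value of the interval set {AB, CD}:

$$
\begin{aligned}
1 &= p_{000} + p_{001} + p_{010} + p_{100} + p_{110} + p_{011} + p_{101} + p_{111} \\
\theta_{AD} &= p_{001} + p_{011} + p_{101} + p_{111} \\
\theta_{AC} &= p_{010} + p_{011} + p_{110} + p_{111} \\
\theta_{AB,CD} &= p_{001} + p_{010} + p_{100} + p_{111} \\
\theta_{AB} &= p_{100} + p_{101} + p_{110} + p_{111} \\
\theta_{BD} &= p_{001} + p_{011} + p_{100} + p_{110} \\
\theta_{BC} &= p_{010} + p_{011} + p_{100} + p_{101} \\
\theta_{CD} &= p_{001} + p_{010} + p_{101} + p_{110}
\end{aligned}
$$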
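A sketch of the four-point calculation. It assumes that the recombination value of any set of intervals is the Kosambi map function applied to the summed map length of those intervals, and that digit i of a haplotype code records whether locus i+1 is recombinant with respect to locus A. Both the function names and the inversion by signed sums are illustrative choices; the inversion is equivalent to solving the linear equations above.

```python
import math
from itertools import product


def kosambi(m):
    """Kosambi map function: map distance m (Morgans) -> recombination fraction."""
    return 0.5 * math.tanh(2.0 * m)


def kosambi_inverse(theta):
    """Inverse Kosambi map function: recombination fraction -> map distance."""
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


def haplotype_probabilities(adjacent_thetas):
    lengths = [kosambi_inverse(t) for t in adjacent_thetas]
    interval_sets = list(product((0, 1), repeat=len(lengths)))
    # ASSUMPTION: recombination value of a set = map function of its total length.
    value = {
        s: kosambi(sum(length for length, used in zip(lengths, s) if used))
        for s in interval_sets
    }
    probs = {}
    for crossovers in interval_sets:  # 1 = crossover in that adjacent interval
        total = 0.0
        for s in interval_sets:
            parity = sum(c * used for c, used in zip(crossovers, s)) % 2
            total += (-1) ** parity * (1.0 - 2.0 * value[s])
        # Convert the crossover pattern to the code relative to locus A.
        code, state = "", 0
        for c in crossovers:
            state ^= c
            code += str(state)
        probs[code] = total / len(interval_sets)
    return probs


probs = haplotype_probabilities([0.1, 0.2, 0.3])
for code in sorted(probs):
    print(code, round(probs[code], 4))
```

Under these assumptions the class 101 (a crossover in every interval) comes out slightly negative for this example, which illustrates what it means for a map function to be multilocus-infeasible.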

Page 31: Lecture 10: Linkage Analysis III

Multipoint Likelihood

Can be written in terms of the 2^(k-1) - 1 recombination values or haplotype frequencies.

Can be reparameterized as k-1 recombination fractions and 2^(k-1) - k interference parameters.

Then tests for interference are possible.

An alternative is to assume a map function, with possibly unknown parameters, which constrains the gamete probabilities as functions of the k-1 recombination fractions.

Page 32: Lecture 10: Linkage Analysis III

Multilocus-Infeasible Map Functions

Kosambi, Carter-Falconer, and Felsenstein map functions are multilocus-infeasible because they can produce negative gametic frequencies.

The Morgan, Haldane, Sturt and generalized map functions are multilocus-feasible.

Haldane is most often used for its simplicity except when linkage is tight, e.g. m << 0.5.

Page 33: Lecture 10: Linkage Analysis III

Map Building

How many possible orders are there for k loci?

10 loci can be ordered in over 1 million ways.

The solution is to generate a small number of probable orders and then analyze these few in depth.

Page 34: Lecture 10: Linkage Analysis III

Stepwise Approximate Ordering

Use likelihood analysis to order a few markers, say l.

Add each additional marker one at a time by considering all l+1 positions for it (the l-1 gaps between adjacent markers plus the two ends). Choose the location that results in the highest likelihood.

Number of likelihood evaluations, starting from two ordered markers: 3+4+5+...+k = (k-2)(k+3)/2.
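A minimal sketch of the stepwise insertion loop. The scoring function here is a toy stand-in (negative total length of the ordered map, computed from made-up marker positions), not a multipoint likelihood; in practice `score` would return the log-likelihood of the trial order.

```python
def stepwise_order(markers, score):
    """Greedy ordering: start with two markers, insert each remaining one at its best position."""
    order = list(markers[:2])
    evaluations = 0
    for marker in markers[2:]:
        candidates = []
        for pos in range(len(order) + 1):  # every gap plus both ends
            trial = order[:pos] + [marker] + order[pos:]
            candidates.append((score(trial), trial))
            evaluations += 1
        order = max(candidates, key=lambda c: c[0])[1]
    return order, evaluations


# Hypothetical map positions (Morgans), used only to build a toy score.
positions = {"M1": 0.00, "M2": 0.35, "M3": 0.10, "M4": 0.50, "M5": 0.22, "M6": 0.41}


def toy_score(order):
    """Higher is better: negative sum of distances between adjacent markers."""
    return -sum(abs(positions[a] - positions[b]) for a, b in zip(order, order[1:]))


order, evaluations = stepwise_order(list(positions), toy_score)
print(order, evaluations)
```

For k = 6 markers the loop scores 3 + 4 + 5 + 6 = 18 trial orders, in agreement with (k-2)(k+3)/2.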

Page 35: Lecture 10: Linkage Analysis III

Pairwise Approximate Ordering

Perform two-point linkage analysis on all pairs of loci to obtain pairwise recombination fraction estimates.

Use multidimensional scaling (a multivariate exploratory analysis) to find approximate orders.

Page 36: Lecture 10: Linkage Analysis III

Final Step – Perfecting Order

Test the likelihood of various reorderings of neighboring groups of loci.

If a tested order has a higher likelihood, keep it.

etc...

Page 37: Lecture 10: Linkage Analysis III

Disease Mapping

Condition on an ordering of all markers except the disease locus.

Calculate a multilocus log-likelihood for each possible position x of the disease locus; call this l_x.

Calculate the location score 2(l_x - l) at point x, where l is the log-likelihood with the disease locus unlinked to the other markers.

Page 38: Lecture 10: Linkage Analysis III

Disease Mapping

Can also calculate multipoint LOD scores by dividing location scores by 2ln(10).

Plot location score or multipoint LOD score by position x. The peak is the likely position of the disease locus, and if the peak exceeds some cut-off criterion, linkage to that region is significant.
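The conversion and the peak search can be sketched as follows, assuming natural-log likelihoods. The log-likelihood profile is invented for illustration, and the function names are hypothetical.

```python
import math


def location_score(loglik_at_x, loglik_unlinked):
    """Location score 2(l_x - l) from natural-log likelihoods."""
    return 2.0 * (loglik_at_x - loglik_unlinked)


def multipoint_lod(score):
    """Convert a location score to a multipoint LOD score."""
    return score / (2.0 * math.log(10))


# Hypothetical profile: candidate positions (cM) and log-likelihoods l_x.
loglik_unlinked = -250.0
profile = {0: -249.1, 5: -246.3, 10: -242.8, 15: -244.0, 20: -248.7}

lods = {x: multipoint_lod(location_score(l_x, loglik_unlinked)) for x, l_x in profile.items()}
peak = max(lods, key=lods.get)
print(peak, round(lods[peak], 2))
```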

Page 39: Lecture 10: Linkage Analysis III

Multipoint vs. Single Point Disease Mapping

Multipoint analysis uses information from every sampled individual, even those who may be homozygous at a single marker.

A single marker can only provide information about crossovers on one side of the disease gene.

The more markers, the sharper the peak.

The disease gene is ultimately mapped to the smallest interval where there is no observed crossover between marker and disease gene in the entire sample.

Page 40: Lecture 10: Linkage Analysis III

Sample Size

Assuming no interference, crossovers occur as a Poisson process with a mean of 1 per Morgan, so the distances between them are exponentially distributed.

Sample n individuals and the mean rate is n per Morgan.

Therefore, the expected distance to the nearest crossover on either side of the disease locus is 1/n.

The interval containing the disease gene has length distributed as a gamma distribution with mean 2/n.

Example: You want to localize the disease gene to 1 cM = 1/100 M. Therefore, you need n ≥ 200.
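A quick simulation check of the 2/n result, assuming one informative meiosis per sampled individual and exponential distances (in Morgans) to the nearest crossover on each side. The replicate count and seed are arbitrary choices.

```python
import random


def simulate_interval_length(n, replicates=2000, seed=1):
    """Average length (Morgans) of the crossover-free interval around the disease locus."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        # Nearest crossover among n meioses, separately on each side of the locus.
        left = min(rng.expovariate(1.0) for _ in range(n))
        right = min(rng.expovariate(1.0) for _ in range(n))
        total += left + right
    return total / replicates


mean_length = simulate_interval_length(n=200)
print(mean_length)  # expected to be close to 2/200 = 0.01 M, i.e. about 1 cM
```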

Page 41: Lecture 10: Linkage Analysis III

Summary

Modeling of segregation distortion and the impact on linkage analysis.

Haplotype coding.

The use of map functions.

Overview of likelihood formulation for multipoint analysis.