Gene Frequency and LINKAGE Gregory Kovriga & Alex Ratt


Post on 10-Jan-2016



Page 1: Gene Frequency and LINKAGE

Gene Frequency and LINKAGE

Gregory Kovriga & Alex Ratt

Page 2: Gene Frequency and LINKAGE

Outline:

● What are gene frequencies for?

● Consequences of incorrect frequencies

● Estimation techniques

● Example

● Estimations with ILINK

● Exercise

Page 3: Gene Frequency and LINKAGE

What gene frequencies are for?

● Consider a pedigree whose founders have unknown genotypes or phenotypes...

● This is especially important for disease alleles, where the correspondence between genotype and phenotype is rarely 1:1

● To evaluate the likelihood function, values must be provided for the allele frequencies...

[Figure: pedigree with an untyped individual (?) and typed genotypes a/b, b/a, a/c]
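As a rough illustration of why the likelihood needs allele frequencies: for an untyped founder, the prior probability of each possible genotype is usually derived from the allele frequencies. The Python sketch below assumes Hardy-Weinberg proportions, and the three frequencies for alleles a, b and c are hypothetical values chosen only for this example; neither assumption comes from the slides.

```python
from itertools import combinations_with_replacement


def founder_genotype_priors(freqs):
    """Prior genotype probabilities for an untyped founder.

    Assumes Hardy-Weinberg proportions: p*p for a homozygote,
    2*p*q for a heterozygote.
    """
    priors = {}
    for a1, a2 in combinations_with_replacement(sorted(freqs), 2):
        p, q = freqs[a1], freqs[a2]
        priors[(a1, a2)] = p * p if a1 == a2 else 2 * p * q
    return priors


# Hypothetical allele frequencies (not taken from the slides).
allele_freqs = {"a": 0.5, "b": 0.3, "c": 0.2}
priors = founder_genotype_priors(allele_freqs)
for genotype, prob in priors.items():
    print("/".join(genotype), round(prob, 4))
```

Changing the assumed frequencies changes every prior, which is how a poor choice of frequencies can propagate into the likelihood.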

Page 4: Gene Frequency and LINKAGE

Example of analysis with different gene frequencies

● We will demonstrate this on an example...

Given a pedigree with a recessive disease and only a single affected individual (θ = 0)

Freq. Marker      Freq. Disease Allele
Allele (rows)     0.90000  0.50000  0.10000  0.01000  0.00001
0.90000           0.00012  0.00338  0.01945  0.03859  0.04274
0.50000           0.00102  0.02802  0.14323  0.25348  0.27468
0.10000           0.00622  0.14860  0.53034  0.76720  0.80614
0.01000           0.01473  0.29571  0.82723  1.10035  1.14338
0.00001           0.01706  0.32902  0.88315  1.16052  1.20401

Page 5: Gene Frequency and LINKAGE

Example ...

The table shows that when the disease gene frequency is less than 90%, the frequency of the marker allele has more effect on the analysis than the frequency of the disease allele...

(because the penetrances at the disease locus tell us more about the untyped individuals' disease-locus genotypes than we know about their marker-locus genotypes)

In practice: if the results of your analysis change drastically with the gene frequencies, the significance of those results should be seriously questioned...
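The claim about the table can be checked numerically. The sketch below copies the values from the table in this example and compares how much they move along each axis. It assumes that rows are indexed by marker allele frequency and columns by disease allele frequency, which is how the explanation above reads but is not stated explicitly on the slide.

```python
# Values copied from the slide's table.
# Assumption: rows = marker allele frequency, columns = disease allele frequency.
freqs = [0.9, 0.5, 0.1, 0.01, 0.00001]
values = [
    [0.00012, 0.00338, 0.01945, 0.03859, 0.04274],
    [0.00102, 0.02802, 0.14323, 0.25348, 0.27468],
    [0.00622, 0.14860, 0.53034, 0.76720, 0.80614],
    [0.01473, 0.29571, 0.82723, 1.10035, 1.14338],
    [0.01706, 0.32902, 0.88315, 1.16052, 1.20401],
]


def spread(xs):
    """Range (max minus min) of a list of values."""
    return max(xs) - min(xs)


# Only columns with disease allele frequency below 0.9 (indices 1..4).
cols = range(1, len(freqs))

# Effect of the marker frequency: vary the row, hold the column fixed.
marker_effect = [spread([row[j] for row in values]) for j in cols]
# Effect of the disease frequency: vary the column, hold the row fixed.
disease_effect = [spread([row[j] for j in cols]) for row in values]

mean_marker = sum(marker_effect) / len(marker_effect)
mean_disease = sum(disease_effect) / len(disease_effect)
print(f"mean spread over marker frequency:  {mean_marker:.3f}")
print(f"mean spread over disease frequency: {mean_disease:.3f}")
```

Under that reading of the table, the average spread along the marker axis is the larger of the two.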

Page 6: Gene Frequency and LINKAGE

Wrong frequencies?

● It is difficult to choose correct frequencies, whether for the population or for a pedigree. One common shortcut is to assume equal allele frequencies...

Q: What are the effects of using wrong gene frequencies then?

A: In general, using equal gene frequencies has been shown to lead to a systematic bias in favor of linkage; in other words, it tends to give false positives in linkage analysis.

Page 7: Gene Frequency and LINKAGE

Estimation techniques

● There are published frequencies for many markers based on random samples. But those frequencies may differ strongly between different populations...

● In large pedigrees: treat unrelated individuals as a sample and apply counting methods

● ILINK (from the LINKAGE package) is another powerful approach...

Page 8: Gene Frequency and LINKAGE

Estimation techniques (cont.)

● Unlike a simple counting method, ILINK can extract additional information from the pedigree structure about the untyped individuals...

Example:

Freq.:        1       2       3       4
Counting      0.3333  0.1667  0.4167  0.0833
ILINK est.    0.3333  0.1999  0.3999  0.0667

● The estimation step can be repeated to get even more refined results (EM)

● The significance of the approach depends on the number of untyped pedigree members

Page 9: Gene Frequency and LINKAGE

Estimation techniques (cont)

● Take into consideration that in such an estimation the recombination fraction is an active parameter in determining the gene frequencies...

● Though the difference in allele frequencies might not be significant, the effect on the lod score might be notable in some situations...

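● A way to balance the computations: compute the frequencies separately for θ = 0.5 (giving p''_i) and for θ = θ' (giving p'_i), then form the lod score from the two likelihoods:

$$Z(\theta') = \log_{10} \frac{L(\theta', p'_i)}{L(0.5, p''_i)}$$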
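In code, the lod score above is a difference of two log-likelihoods converted to base 10. The sketch assumes natural-log likelihoods are available from two separate runs (one at θ', one at θ = 0.5, each with its own frequency estimates); the function name and the numbers are hypothetical and serve only as an illustration.

```python
import math


def lod_score(loglik_linked, loglik_unlinked):
    """Z = log10(L(theta', p') / L(0.5, p'')) from natural-log likelihoods."""
    return (loglik_linked - loglik_unlinked) / math.log(10)


# Hypothetical log-likelihoods, for illustration only.
ln_l_linked = -59.6    # at theta', with frequencies p' estimated at theta'
ln_l_unlinked = -63.8  # at theta = 0.5, with frequencies p'' estimated at 0.5
print(round(lod_score(ln_l_linked, ln_l_unlinked), 3))
```

Because each likelihood uses the frequencies that suit its own hypothesis best, neither side of the ratio is handicapped by frequencies estimated under the other.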

Page 10: Gene Frequency and LINKAGE

Gene Frequencies Estimation

● Published estimates for the gene frequencies may be used as a first approximation.

● But it is advisable to estimate marker allele frequencies on your own from unrelated individuals taken from the same genetic population as your disease pedigrees.

● Another approach is to use the ILINK program to estimate the allele frequencies from the pedigree data.

Page 11: Gene Frequency and LINKAGE

Gene Frequencies With ILINK

● In our pedigree there are eight founders, two of whom are untyped.

● Directly estimating the allele frequencies based on the six typed founders produces:

– 4 copies of the 1 allele

– 2 copies of the 2 allele

– 5 copies of the 3 allele

– 1 copy of the 4 allele

● Gene frequency estimates:

– 1 (0.3333) 2 (0.1667) 3 (0.4167) 4 (0.0833)

Example
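The counting estimates on this slide are the allele counts among the six typed founders divided by the total of 12 alleles. A minimal Python sketch of that arithmetic, using only the counts given above:

```python
# Allele counts among the six typed founders (12 alleles), from the slide.
counts = {1: 4, 2: 2, 3: 5, 4: 1}

total = sum(counts.values())
estimates = {allele: n / total for allele, n in counts.items()}

for allele, p in estimates.items():
    print(allele, f"{p:.4f}")
```

The two untyped founders contribute nothing here, which is the information ILINK tries to recover from the pedigree structure.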

Page 12: Gene Frequency and LINKAGE

Gene Frequencies With ILINK

● However, there is some information in the pedigree about the genotypes of the two untyped founders.

● To take advantage of it, we use the ILINK program.

● Prepare the parameter file for the example.

Example (cont.)

Page 13: Gene Frequency and LINKAGE

Parameter File For The Example

Assumptions

● Disease locus is fully penetrant.

● Disease locus is autosomal dominant.

● Gene frequency for the disease allele equal to 0.00001

● Estimated values used as starting values for gene frequencies.

Page 14: Gene Frequency and LINKAGE

Parameter File For The Example

Datafile.dat

2 0 0 3 << No. of loci, risk locus, sex linked, program
0 0.0 0.0 0 << Mut locus, mut male, mut fem, hap freq.
1 2 << Affection, No. of alleles
9.99990E-01 1.00000E-05 << Gene Frequencies
1 << No. of liability classes
0 1.0000 1.0000 << Penetrances

Page 15: Gene Frequency and LINKAGE

Parameter File For The Example

Datafile.dat (cont.)

3 4 << Allele numbers, No. of alleles
0.3333 0.1667 0.4167 0.0833 << Gene Frequencies
0 0 << Sex difference, interference (if 1 or 2)
0.079 << Recombination values
2 << This locus may have iterated pars
0 1 1 1 << Estimate 3 free gene frequencies

Page 16: Gene Frequency and LINKAGE

Running ILINK Program

Final.dat

CHROMOSOME ORDER OF LOCI : 1 2
****************** FINAL VALUES ********************
PROVIDED FOR LOCUS 2 (CHROMOSOME ORDER)
*****************************************************
GENE FREQUENCIES : 0.333411 0.199931 0.400035 0.066623
*****************************************************
THETAS: 0.079
-2 LN(LIKE) = 1.19255504700605990e+02
LOD SCORE = 1.82101572034502788e+00
NUMBER OF ITERATIONS = 6
NUMBER OF FUNCTION EVALUATIONS = 37
PTG = -2.19142732875567302e-06
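When several ILINK runs are compared, it can help to pull the key numbers out of the report automatically. The sketch below parses a few lines copied from the output shown on this slide; `parse_final` is a hypothetical helper, and it assumes the report lines look exactly as they do here, which may not hold for every ILINK version.

```python
# Lines copied from the slide's final.dat listing.
final_text = """\
GENE FREQUENCIES : 0.333411 0.199931 0.400035 0.066623
THETAS: 0.079
-2 LN(LIKE) = 1.19255504700605990e+02
LOD SCORE = 1.82101572034502788e+00
"""


def parse_final(text):
    """Extract gene frequencies, theta and lod score from the report text.

    Hypothetical helper: assumes the line layout shown on the slide.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("GENE FREQUENCIES"):
            result["freqs"] = [float(x) for x in line.split(":", 1)[1].split()]
        elif line.startswith("THETAS"):
            result["theta"] = float(line.split(":", 1)[1])
        elif line.startswith("LOD SCORE"):
            result["lod"] = float(line.split("=", 1)[1])
    return result


parsed = parse_final(final_text)
print(parsed["freqs"], parsed["theta"], round(parsed["lod"], 4))
# Sanity check: estimated allele frequencies should sum to (about) 1.
print(abs(sum(parsed["freqs"]) - 1.0) < 1e-6)
```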

Page 17: Gene Frequency and LINKAGE

Gene Frequencies With ILINK

● We did the estimation conditional on there being linkage between marker and disease.

● What happens to the estimates if we assume that the recombination between disease and marker is 50%?

● This involves estimating marker allele frequencies ignoring all information about linkage.

Estimation 2

Page 18: Gene Frequency and LINKAGE

Gene Frequencies With ILINK

● Now we set recombination values to 0.5 and run the ILINK program again.

● The estimates change slightly to the following numbers:

– 1 (0.366830) 2 (0.200045) 3 (0.366430) 4 (0.066695)

Estimation 2 (cont.)

Page 19: Gene Frequency and LINKAGE

Gene Frequencies With ILINK

● Another thing we may think of is jointly estimating recombination fraction with the gene frequencies.

● This can be done by setting the bottom line of the parameter file to 1 1 1 1, so that all 4 parameters are estimated.

● ILINK results: θ = 0.078

– 1 (0.333419) 2 (0.200082) 3 (0.399933) 4 (0.066669)

Estimation 3

Page 20: Gene Frequency and LINKAGE

Gene Frequencies With ILINK

Gene Frequency Estimates Under Different Hypotheses

(θ = θ': recombination fraction estimated jointly with the gene frequencies)

      θ = 0.079   θ = 0.500   θ = θ'      Counting
p1    0.333383    0.366830    0.333666    0.333333
p2    0.199991    0.200045    0.200032    0.166667
p3    0.399935    0.366430    0.399933    0.416667
p4    0.066691    0.066695    0.066669    0.083333
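To see how far the hypotheses actually disagree, the estimates in this table can be compared column by column. The sketch below uses the table's values as given and takes the θ = 0.079 column as the reference; it assumes the third column is the run in which θ was estimated jointly with the frequencies.

```python
# Estimates copied from the slide's table (p1..p4 per hypothesis).
# Assumption: the third column is the joint estimate of theta and frequencies.
estimates = {
    "theta=0.079": [0.333383, 0.199991, 0.399935, 0.066691],
    "theta=0.5": [0.366830, 0.200045, 0.366430, 0.066695],
    "theta joint": [0.333666, 0.200032, 0.399933, 0.066669],
    "counting": [0.333333, 0.166667, 0.416667, 0.083333],
}


def max_abs_diff(a, b):
    """Largest absolute difference between two frequency vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


reference = estimates["theta=0.079"]
for name, freqs in estimates.items():
    diff = max_abs_diff(reference, freqs)
    print(f"{name:12s} max |diff| vs theta=0.079: {diff:.6f}")
```

The joint estimate stays within about 0.0003 of the reference, while the θ = 0.5 run and the counting estimates each differ by roughly 0.03 in at least one allele.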

Page 21: Gene Frequency and LINKAGE

Gene Frequencies With ILINK

● Estimating gene frequencies using different hypotheses leads to slightly different estimates.

● Fortunately, the difference is not huge, though it may have a significant influence on the lod scores in some situations.

● Because most pedigree members were typed in this example, the gene frequencies are not crucial here, whereas in other examples the results may vary dramatically.

Conclusion

Page 22: Gene Frequency and LINKAGE

The Exercise

1. Go back to Exercise 8 and estimate gene frequencies for the ABO blood group in this same pedigree.

2. Does the lod score change when these frequencies are estimated instead of using population gene frequency estimates?

3. Consider the incomplete penetrance model on this same family.

4. Does incorporating this reduced penetrance affect your estimates of marker allele frequencies?

5. How does the gene frequency information affect the lod score between ABO and the disease?

Page 23: Gene Frequency and LINKAGE

ABO Blood Group

[Figure: pedigree diagram with members labeled by ABO blood type (A, B, AB, O)]

Page 24: Gene Frequency and LINKAGE

The Exercise - Solution

Estimating Allele Frequencies

Estimation 1:

We set the recombination fraction between disease and ABO to 0.5 and estimate allele frequencies.

Results: A (0.288) B (0.343) O (0.369)

Estimation 2:

We estimate allele frequencies jointly with the recombination fraction.

Results: A (0.277) B (0.341) O (0.382) θ (0.001)

Page 25: Gene Frequency and LINKAGE

The Exercise - Solution

Computing Lod Score

1. Allele frequencies estimated jointly with recombination fraction: Z(θ=0) = 3.459960

2. Allele frequencies estimated when disease considered to be unlinked to the marker: Z(θ=0) = 3.454484

3. Treat gene frequency estimates as nuisance parameters. Z(θ=0) = 3.457298

In our case the lod scores are not greatly affected by the changes in gene frequency estimates at ABO.

Page 26: Gene Frequency and LINKAGE

The Exercise - Solution

Incomplete penetrance model

● Define penetrance for each age class.

● For individuals younger than 10, the penetrance is 0.1

● For individuals older than 60, the penetrance is 0.9

● For individuals in the middle use formula for the line connecting the points (10,0.1) and (60,0.9)

● Estimating allele frequencies based on this model.

Results:

θ=0.5 A (0.288) B (0.343) O (0.369)

θ=θ’ A (0.277) B (0.341) O (0.382)
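The age-dependent penetrance described on this slide can be written as a small function. This is a sketch of the rule as stated (0.1 below age 10, 0.9 above age 60, a straight line in between); the function name is hypothetical, and in a LINKAGE analysis the curve would presumably be discretized into liability classes rather than used as a continuous function.

```python
def penetrance(age):
    """Age-dependent penetrance: 0.1 below 10, 0.9 above 60, linear between."""
    if age < 10:
        return 0.1
    if age > 60:
        return 0.9
    # Line through the points (10, 0.1) and (60, 0.9).
    return 0.1 + (0.9 - 0.1) * (age - 10) / (60 - 10)


for age in (5, 10, 35, 60, 75):
    print(age, round(penetrance(age), 3))
```

The slope of the line is 0.8 / 50 = 0.016 per year, so the penetrance reaches 0.5 at age 35.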

Page 27: Gene Frequency and LINKAGE

The Exercise - Solution

Incomplete penetrance model (cont.)

● The estimates are the same as in the full penetrance model.

● This is true because the estimation of allele frequencies is done independently of the disease phenotypes in the pedigree.

● Another reason is that there is little ambiguity as to the disease locus genotypes of the founders.

Page 28: Gene Frequency and LINKAGE

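The Exercise - Solution

Computing Lod Score

● The lod scores are now as follows:

Z(θ=0) = 2.172223 (allele frequencies estimated with θ = θ’)

Z(θ=0) = 2.166747 (allele frequencies estimated with θ = 0.5)

Z(θ=0) = 2.169561 (gene frequencies treated as nuisance parameters)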

● The last lod score is again right between the two lod scores computed with fixed gene frequency estimates.
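The ordering of the three lod scores can be checked directly from the numbers reported in the solution, for both the full and the reduced penetrance models. The sketch assumes the third reduced-penetrance value is the nuisance-parameter lod score, by analogy with the full-penetrance slide.

```python
# Lod scores from the slides, in the order:
# (frequencies estimated jointly with theta, at theta = 0.5, nuisance-parameter)
# Assumption: the third reduced-penetrance value is the nuisance-parameter score.
full_penetrance = (3.459960, 3.454484, 3.457298)
reduced_penetrance = (2.172223, 2.166747, 2.169561)

for label, (joint, unlinked, nuisance) in (
    ("full penetrance", full_penetrance),
    ("reduced penetrance", reduced_penetrance),
):
    lies_between = unlinked < nuisance < joint
    print(f"{label}: third lod score between the other two? {lies_between}")

# Drop in lod score when moving from full to reduced penetrance.
drops = [round(f - r, 6) for f, r in zip(full_penetrance, reduced_penetrance)]
print(drops)
```

In both models the third score falls between the other two, and the reduced penetrance model lowers all three scores by the same amount.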

Page 29: Gene Frequency and LINKAGE