666.09 -- haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “guesstimate” haplotype...
TRANSCRIPT
![Page 1: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/1.jpg)
Haplotyping
Biostatistics 666Lecture 9
![Page 2: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/2.jpg)
Last Lecture
Introduction to the E-M algorithm
Approach for likelihood optimization
Examples related to gene counting• Allele frequencies estimation• Haplotype frequency estimation
![Page 3: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/3.jpg)
Today:
Other approaches for haplotyping• Clark’s greedy algorithm• Stephens et al’s coalescent based algorithm
Using haplotypes in association studies
![Page 4: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/4.jpg)
Useful Roles for HaplotypesLinkage disequilibrium studies• Summarize genetic variation
Selecting markers to genotype• Identify haplotype tag SNPs
Candidate gene association studies• Help interpret single marker associations• May capture effect of ungenotyped alleles
![Page 5: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/5.jpg)
The problem…
Haplotypes are hard to measure directly• X-chromosome in males• Sperm typing• Hybrid cell lines• Other molecular techniques
Often, statistical reconstruction required
![Page 6: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/6.jpg)
Typical Genotype Data
Two alleles for each individual• Chromosome origin for each allele
is unknown
Multiple haplotype pairs can fit observed genotype
C G Marker1T C Marker2G A Marker3
Observation
C G C GT C C TG A G A
C G C GC T T CA G A G
Possible States
![Page 7: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/7.jpg)
Use Information on RelativesFamily information can help determine phase at many markers
Still, many ambiguities might not be resolved• Problem more serious with
larger numbers of markers
Can you propose examples?
Proportion of Phase Known Haplotypes
0
33
67
100
1 8 15Markers
Unrelateds w/Parents
Parents + Sibling
![Page 8: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/8.jpg)
What if there are no relatives?
Rely on linkage disequilibrium
Assume that population consists of small number of distinct haplotypes
Haplotypes tend to be similar
![Page 9: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/9.jpg)
Clark’s Haplotyping Algorithm
Clark (1990) Mol Biol Evol 7:111-122
One of the first haplotyping algorithms• Computationally efficient• Very fast and widely used in 1990’s• More accurate methods are now available
![Page 10: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/10.jpg)
Clark’s Haplotyping AlgorithmFind unambiguous individuals• What kinds of genotypes will these have?• Initialize a list of known haplotypes
Resolve ambiguous individuals• If possible, use two haplotypes from list• Otherwise, use one known haplotype and augment list
If unphased individuals remain• Assign phase randomly to one individual• Augment haplotype list and continue from previous step
![Page 11: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/11.jpg)
Chain of Inference (Clark, 1990)
![Page 12: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/12.jpg)
Can The Algorithm Get Started?
What kinds of genotypes do we need to get started?
What kinds of haplotype pairs do we need to get started?
What is the probability of these occurring?
![Page 13: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/13.jpg)
Probability of Failing To Start
![Page 14: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/14.jpg)
Distribution of Orphaned Alleles
![Page 15: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/15.jpg)
Distribution of Anomalous Matches
![Page 16: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/16.jpg)
Notes …
Clark’s Algorithm is extremely fast
More likely to start with large sample
Orphaned alleles and anomalous matches may occur• Solution with the least orphaned alleles is usually
the one with the fewest anomalous matches
![Page 17: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/17.jpg)
The E-M Haplotyping Algorithm
Excoffier and Slatkin (1995) • Mol Biol Evol 12:921-927• Provide a clear outline of how the algorithm
can be applied to genetic data
Combination of two strategies• E-M statistical algorithm for missing data• Counting algorithm for allele frequencies
![Page 18: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/18.jpg)
E-M Algorithm For Haplotyping
1. “Guesstimate” haplotype frequencies2. Use current frequency estimates to replace
ambiguous genotypes with fractional counts of phased genotypes
3. Estimate frequency of each haplotype by counting
4. Repeat steps 2 and 3 until frequencies are stable
![Page 19: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/19.jpg)
Expected Haplotype Counts
∑ ∑≠
+=
==
=
=
ij
ji
ji
jiiiihh
hhGH
hhhhGhhGh
G
jiji
j
pHP
ppnn)(n
GHn
hhhhG
jh
),(~
),(),( )ˆ|(
ˆˆ22E
G with compatible pairs Haplotype~G typeof genotypes ofNumber
, toingcorrespond genotype Unphased),(
haplotype
![Page 20: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/20.jpg)
Computational Cost (for SNPs)
Consider sets of m unphased genotypes• Markers 1..m
If markers are bi-allelic• 2m possible haplotypes• 2m-1 (2m + 1) possible haplotype pairs• 3m distinct observed genotypes• 2n-1 reconstructions for n heterozygous loci
For example, if m = 10
= 1024= 524,800
= 59,049= 512
![Page 21: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/21.jpg)
E-M Algorithm for HaplotypingCost grows rapidly with number of markers
Typically appropriate for < 25 SNPs• Fewer microsatellites
More accurate than Clark’s method
Fully or partially phased individuals contribute most of the information
![Page 22: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/22.jpg)
Enhancements to E-M
List only haplotypes present in sample
Gradually expand subset of markers under consideration, eliminating haplotypes with low estimated frequency from consideration at each stage• SNPHAP [Clayton (2001)]• HAPLOTYPER [Qin et al. (2002)]
![Page 23: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/23.jpg)
Divide-And-Conquer Approximation
Number of potential haplotypes increases exponentially• Number of observed haplotypes
does notApproximation• Successively divide marker set• Run E-M on each segment• Prune haplotype list as segments
are ligated
Computation order: ~ m log m• Exact E-M is order ~ 2m
0
1250
2500
1 3 5 7 9 11
markers
time
(ms)
Exact E-M Approximation
![Page 24: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/24.jpg)
Other Recent Developments …
Newer methods try to further improve haplotype estimation by favoring sets of similar haplotypes
Stephens et al. (2001) • Am J Hum Genet 68:978-89
Genealogical approach…
![Page 25: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/25.jpg)
What the Genealogy Implies…
Haplotypes are similar to each other…Known Haplotypes
225442254422544
3333433334
23233
14234
Individual 1
Genotype:3234423534
Individual 2:
Genotype:3244423434
33334
22544
33434
22444
![Page 26: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/26.jpg)
Chromosome Genealogies
Present
Past
*
*
*
*
*
*
*
*
* *
*
*
![Page 27: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/27.jpg)
Method based on Gibbs sampler
MCMC method• Stochastic, random procedure• Improves solution gradually
Given initial set of haplotypesSample haplotypes for one individual at a time, assuming other haplotypes are trueRepeat a few million times…
![Page 28: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/28.jpg)
Update Procedure I
Pick individual U to update at random
Calculate haplotype frequencies F in all other individuals• Since everyone is “phased”, this is done by counting
Sample new haplotypes for U from conditional distribution of U’s haplotypes given F
![Page 29: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/29.jpg)
Update Procedure I
This procedure would produce an estimate of haplotype frequencies that equivalent to the E-M algorithm…
Stephens et al (2001) suggested an alternative estimate of F…
![Page 30: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/30.jpg)
Update Procedure II
Estimate F from the other individuals
Construct F* to include haplotypes in F and also other similar (possibly differing at a few sites)
Update U’s haplotypes conditional on F*
![Page 31: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/31.jpg)
Stephens' Formula …
Pr(h|H) is the probability of observing haplotype h given previous set H
( ) hS
S
S
Pn
nnn
nHh αα
α
θθθ∑∑ +
⎟⎠⎞
⎜⎝⎛
+=)|Pr(
Sum over haplotypes
Sum over number of mutations
S mutations before coalescenceCoalescence
Mutation Matrix
![Page 32: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/32.jpg)
Further Refinements
This naïve strategy becomes impractical for very long haplotypes• List of haplotypes for each individual could
become too long
Instead, we can proceed by selecting a short segment of the haplotype to update at random
![Page 33: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/33.jpg)
Hypothesis Testing
Often, haplotype frequencies are not final outcome.
For example, we may wish to compare two groups of individuals…• Are haplotypes similar in two populations?• Are haplotypes similar in patients and healthy
controls?
![Page 34: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/34.jpg)
Simplistic approach…
Calculate haplotype frequencies in each group
Find most likely haplotype for each individual
Compare haplotype reconstructions in the two groups
![Page 35: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/35.jpg)
Simplistic approach…
Calculate haplotype frequencies in each group
Find most likely haplotype for each individual
Compare haplotype reconstructions in the two groups
NOT RECOMMENDED!!!
![Page 36: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/36.jpg)
Observed Case Genotypes
1 2 3 4 5 6
The phase reconstruction in the five ambiguous individuals will be driven by the haplotypes observed in individual 1 …
![Page 37: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/37.jpg)
Inferred Case Haplotypes
1 2 3 4 5 6
This kind of phenomenon will occur with nearly all population based haplotyping methods!
![Page 38: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/38.jpg)
Observed Control Genotypes
1 2 3 4 5 6
Note these are identical, except for the single homozygous individual …
![Page 39: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/39.jpg)
Inferred Control Haplotypes
1 2 3 4 5 6
Ooops… The difference in a single genotype in the original data has been greatly amplified by estimating haplotypes…
![Page 40: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/40.jpg)
Hypothesis Testing II
Never impute haplotypes in two samples separately
Instead, consider both samples jointly…• Schaid et al (2002) Am J Hum Genet 70:425-34• Zaytkin et al (2002) Hum Hered. 53:79-91
Another alternative is to use maximum likelihood
![Page 41: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/41.jpg)
Hypothesis Testing III
Estimated haplotype frequencies, imply a likelihood for the observed genotypes
∏∑=i G~H i
)(HPL
![Page 42: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/42.jpg)
Hypothesis Testing III
Estimated haplotype frequencies, imply a likelihood for the observed genotypes
∏∑=i G~H i
)(HPL
individuals
possible haplotype pairs, conditional on genotype
haplotype pair frequency
![Page 43: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/43.jpg)
Hypothesis Testing III
Calculate 3 likelihoods:• Maximum likelihood for combined sample, LA
• Maximum likelihood for control sample, LB
• Maximum likelihood for case sample, LC
2~ln2 dfA
CB
LLL χ⎟⎟
⎠
⎞⎜⎜⎝
⎛
df corresponds to number of non-zero haplotype frequencies in large samples
![Page 44: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/44.jpg)
Significance in Small Samples
In realistic sample sizes, it is hard to estimate the number of df accurately
Instead, permute case and control labels randomly
![Page 45: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/45.jpg)
Final thoughts…
Compare alternative reconstructions• Change input order• Change random seeds• Change starting values
When analyzing case-control studies• Randomize case-control labels
![Page 46: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/46.jpg)
Summary
Describe principles underlying haplotype estimation in unrelated individuals
• Heuristic algorithms• The E-M algorithm• Genealogical approach
![Page 47: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/47.jpg)
Notation used inExcoffier and Slatkin paper
n'combinatio genotype observed' with phenotype'' replacing try clarity,For
Likelihood)(
phenotype ofy probabilit)()pair haplotype(
phenotypefor compatible pairs haplotype phenotypefor markers usheterozygo of no.
phenotypes individualfor index ...1phenotypes observed of no.
1 1
11
∏ ∑
∑∑
= =
==
⎟⎟⎠
⎞⎜⎜⎝
⎛=
==
=
m
j
nc
iilik
c
iilik
c
ij
j
j
jj
jj
hhPL
jhhPjPP
jcjs
mjm
![Page 48: 666.09 -- Haplotypingcsg.sph.umich.edu/abecasis/class/666.09.pdf · “Guesstimate” haplotype frequencies 2. Use current frequency estimates to replace ambiguous genotypes with](https://reader034.vdocuments.net/reader034/viewer/2022052015/602c2fadf1d1e214007d2ed3/html5/thumbnails/48.jpg)
Notation used inExcoffier and Slatkin paper
( )
n'combinatio genotype observed' with phenotype'' replacing try clarity,For
iterationnext for sfrequencie allele)(21
genotypefor proportion fitted)(
)(
phenotype ofy probabilit)(
shomozygote and tesheterozygofor genotype ofy probabilit
)(
roundat haplotype offrequency ...,
1 1
)()1(
)(
)()(
1
)()(
2)(
)()()(
)()(2
)(1
∑∑
∑
= =
+
=
=
=
=
⎩⎨⎧
=
m
j
c
ilk
gjit
gt
lkgj
lkg
jjlk
g
c
iilik
gj
gj
lkg
k
gl
gk
lkg
j
gh
gg
j
j
hhPp
hhP
hhPnn
hhP
jhhPP
hhp
pphhP
ghppp
δ