Introduction to Haplotype Estimation Stat/Biostat 550

Upload: trixie

Post on 11-Jan-2016

Page 1: Introduction to Haplotype Estimation

Introduction to Haplotype Estimation

Stat/Biostat 550

Page 2: Introduction to Haplotype Estimation

The Haplotype Problem

• Suppose we genotype individuals at a number of tightly linked SNPs.

A C G C C T T T G C G C

G A A C C C C C A G G C

Page 5: Introduction to Haplotype Estimation

The Haplotype Problem

• What do the types on the two chromosomes look like?
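As a rough illustration of why this question is hard, the sketch below enumerates every haplotype pair consistent with a single genotype. The 0/1/2 genotype coding (number of copies of allele 1 at each SNP) and the function name `compatible_pairs` are assumptions made for this example; they do not come from the slides.

```python
from itertools import product

def compatible_pairs(genotype):
    """Return all unordered haplotype pairs consistent with a genotype.

    genotype: sequence of 0/1/2 = copies of allele 1 at each SNP (assumed coding).
    Haplotypes are tuples of 0/1.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed (0 -> 0, 2 -> 1); het sites get filled in below.
    base = [g // 2 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het_sites, bits):
            h1[site], h2[site] = b, 1 - b
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

if __name__ == "__main__":
    for pair in compatible_pairs((1, 1, 0, 2)):
        print(pair)
```

With k ≥ 1 heterozygous sites there are 2^(k-1) compatible pairs, so the genotype alone determines the haplotypes only when at most one site is heterozygous.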

Page 10: Introduction to Haplotype Estimation

Haplotypes: who cares?

Many people, for many different reasons…

• LD mapping: increase power?

• LD mapping: decrease genotyping?

• Evolutionary studies: selection, recombination, gene conversion, population structure, …

Page 11: Introduction to Haplotype Estimation

The Haplotype Problem – potential solutions

• Molecular methods

• Collect family data

• Statistical methods for population data

Page 12: Introduction to Haplotype Estimation

The Simplest Case

• What do the types on the two chromosomes look like?

Page 13: Introduction to Haplotype Estimation

The Next Simplest Case

• What do the types on the two chromosomes look like?

Page 15: Introduction to Haplotype Estimation

The first difficult case…

• What do the types on the two chromosomes look like?

Page 17: Introduction to Haplotype Estimation

Clark’s Method (1990)

• Idea: use information obtained from other individuals in the population to determine the most probable haplotype pair.

Page 18: Introduction to Haplotype Estimation

Is it this configuration?

[Figure: rows labelled 1, 2 and 3]

Page 19: Introduction to Haplotype Estimation

…or this one?

[Figure: rows labelled 1, 2 and 3]

Page 20: Introduction to Haplotype Estimation

This one is more probable.

[Figure: rows labelled 1, 2 and 3]

Page 21: Introduction to Haplotype Estimation

Clark’s Method (Clark, 1990)

• Identify the unambiguous individuals.

• Make a list of “known” haplotypes.

• Go through list, and see whether ambiguous individuals can be made up from a “known” haplotype plus another “complementary” haplotype. If so, add the complementary haplotype to the list of “known” haplotypes.
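The three steps above can be sketched roughly as follows. This is a minimal illustration, not Clark's original implementation: the 0/1/2 genotype coding, the helper names `complement` and `clark`, and the choice to scan individuals and known haplotypes in input order are all assumptions made for the example.

```python
def complement(genotype, hap):
    """Return the haplotype that pairs with `hap` to give `genotype`, or None."""
    comp = []
    for g, a in zip(genotype, hap):
        c = g - a
        if c not in (0, 1):  # hap is incompatible with the genotype at this site
            return None
        comp.append(c)
    return tuple(comp)

def clark(genotypes):
    """Clark-style resolution. genotypes: tuples of 0/1/2 (assumed coding)."""
    known = []   # ordered list of "known" haplotypes
    phased = {}  # individual index -> (haplotype, haplotype)

    # Steps 1 and 2: individuals with at most one heterozygous site are unambiguous.
    for i, g in enumerate(genotypes):
        if sum(1 for x in g if x == 1) <= 1:
            h1 = tuple(x // 2 for x in g)
            h2 = complement(g, h1)
            phased[i] = (h1, h2)
            for h in (h1, h2):
                if h not in known:
                    known.append(h)

    # Step 3: repeatedly try to explain ambiguous individuals with a known haplotype.
    changed = True
    while changed:
        changed = False
        for i, g in enumerate(genotypes):
            if i in phased:
                continue
            for h in known:
                comp = complement(g, h)
                if comp is not None:
                    phased[i] = (h, comp)  # the first exact match wins
                    if comp not in known:
                        known.append(comp)
                    changed = True
                    break
    return phased, known

if __name__ == "__main__":
    phased, known = clark([(0, 0, 0), (1, 1, 0), (2, 1, 1)])
    print(phased)
    print(known)
```

Because the first compatible known haplotype is accepted, the result can change if the individuals or the list are visited in a different order, and individuals with no exact match are simply left out of `phased`.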

Page 22: Introduction to Haplotype Estimation

Clark’s Method

[Figure: list of known haplotypes; rows labelled 1, 2 and 3]

Page 24: Introduction to Haplotype Estimation

Clark’s Method: Problem 1

[Figure: rows labelled 3, 1 and 2]

Page 25: Introduction to Haplotype Estimation

Clark’s Method: Problem 1

[Figure: list of known haplotypes; rows labelled 1, 2 and 3]

Page 29: Introduction to Haplotype Estimation

Clark’s Method: Problem 1

[Figure: list of known haplotypes; rows labelled 1, 2 and 3]

The answer depends on the order in which the list is considered…

… and frequency information is ignored.

Page 30: Introduction to Haplotype Estimation

Clark’s Method: Problem 2

[Figure: rows labelled 3, 1 and 2]

Page 31: Introduction to Haplotype Estimation

Clark’s Method: Problem 2

[Figure: list of known haplotypes; rows labelled 3, 1 and 2]

The algorithm can fail to resolve all haplotypes…

… because it looks only for exact matches.

Page 32: Introduction to Haplotype Estimation

Clark’s Algorithm: Summary

• Results may depend on the order in which individuals are considered.

• Frequency information is ignored.

• May fail to resolve all haplotypes.

• Fails to assess uncertainty.

• Looks only for exact matches.

• Fast and intuitive(?).

Page 33: Introduction to Haplotype Estimation

Maximum Likelihood (EM Algorithm)

• Idea: find haplotype frequencies (f1,…fN) to maximise probability of observed genotype data (g1,…,gn).

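$$\Pr(g_i \mid f_1,\dots,f_N) = \sum_{\{h_1,h_2 \,:\, h_1 \oplus h_2 = g_i\}} f_{h_1} f_{h_2}$$

$$\Pr(g_1,\dots,g_n \mid f_1,\dots,f_N) = \prod_{i=1}^{n} \Pr(g_i \mid f_1,\dots,f_N)$$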
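A minimal EM sketch for this likelihood is given below, assuming random mating so that the probability of a haplotype pair is the product of its two frequencies. The 0/1/2 genotype coding, the function names, the uniform starting point and the fixed iteration count are assumptions made for the example, not details from the slides.

```python
from collections import defaultdict
from itertools import product

def ordered_pairs(genotype):
    """All ordered haplotype pairs (h1, h2) consistent with a 0/1/2 genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]
    out = []
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het, bits):
            h1[site], h2[site] = b, 1 - b
        out.append((tuple(h1), tuple(h2)))
    return out

def em_haplotype_freqs(genotypes, n_iter=100):
    """Estimate haplotype frequencies f by EM, starting from uniform frequencies."""
    pairs = [ordered_pairs(g) for g in genotypes]
    haps = sorted({h for ps in pairs for pair in ps for h in pair})
    f = {h: 1.0 / len(haps) for h in haps}
    for _ in range(n_iter):
        counts = defaultdict(float)
        for ps in pairs:
            # E-step: posterior weight of each compatible pair under current f.
            weights = [f[h1] * f[h2] for h1, h2 in ps]
            total = sum(weights)
            for (h1, h2), w in zip(ps, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        f = {h: counts[h] / (2 * len(genotypes)) for h in haps}
    return f

if __name__ == "__main__":
    freqs = em_haplotype_freqs([(0, 0), (0, 0), (2, 2), (1, 1)])
    for hap, freq in freqs.items():
        print(hap, round(freq, 4))
```

In this toy data set the double heterozygote is pulled towards the resolution built from the two haplotypes already seen in the homozygous individuals, which is the sense in which EM uses frequency information that Clark's method ignores.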

Page 34: Introduction to Haplotype Estimation

Bayesian version

Modify Clark’s algorithm:

• Replace the single pass through the data with an iterative scheme.

• Allow for uncertainty in resolution.

• Use frequency information.

The resulting “naïve Gibbs sampler” produces results similar to EM (Stephens, Smith and Donnelly 2001).
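One way to picture such an iterative scheme is the sketch below. It is an illustration under stated assumptions rather than the published sampler: it assumes a symmetric Dirichlet prior with parameter `alpha` on haplotype frequencies, a 0/1/2 genotype coding, and no burn-in, and all function names are made up for the example.

```python
import random
from collections import Counter
from itertools import product

def ordered_pairs(genotype):
    """All ordered haplotype pairs (h1, h2) consistent with a 0/1/2 genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]
    out = []
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het, bits):
            h1[site], h2[site] = b, 1 - b
        out.append((tuple(h1), tuple(h2)))
    return out

def naive_gibbs(genotypes, n_sweeps=200, alpha=0.1, seed=0):
    """Return, per individual, a Counter of how often each pair was sampled."""
    rng = random.Random(seed)
    options = [ordered_pairs(g) for g in genotypes]
    current = [rng.choice(ps) for ps in options]
    counts = Counter(h for pair in current for h in pair)
    tallies = [Counter() for _ in genotypes]
    for _ in range(n_sweeps):
        for i, ps in enumerate(options):
            # Remove individual i's haplotypes, then resample them given the rest.
            for h in current[i]:
                counts[h] -= 1
            weights = []
            for h1, h2 in ps:
                w1 = counts[h1] + alpha
                # The second draw sees the first one (Polya-urn predictive).
                w2 = counts[h2] + alpha + (1 if h2 == h1 else 0)
                weights.append(w1 * w2)
            current[i] = rng.choices(ps, weights=weights)[0]
            for h in current[i]:
                counts[h] += 1
            tallies[i][tuple(sorted(current[i]))] += 1
    return tallies

if __name__ == "__main__":
    data = [(0, 0)] * 5 + [(2, 2)] * 5 + [(1, 1)]
    print(naive_gibbs(data)[-1].most_common())
```

The per-individual tallies give a crude measure of uncertainty in each resolution, which a single pass of Clark's method does not provide.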

Page 35: Introduction to Haplotype Estimation

Example

[Figure: list of known haplotypes; rows labelled 1, 2 and 3]

One haplotype matches 1 known; the other does not match any.

Assigned moderate probability

Page 36: Introduction to Haplotype Estimation

Example

[Figure: list of known haplotypes; rows labelled 1, 2 and 3]

One haplotype matches 3 known; the other does not match any.

Assigned higher probability

Page 37: Introduction to Haplotype Estimation

Example

[Figure: list of known haplotypes; rows labelled 1, 2 and 3]

Neither haplotype matches any known haplotype.

Assigned low probability

Page 38: Introduction to Haplotype Estimation

Problems with EM/naïve Gibbs

• Potentially (very) large number of parameters to estimate, leading to inaccurate estimates.

• Can be time-consuming for large problems.

• Can “converge” to poor local optima (alleviated by multiple runs).

Page 39: Introduction to Haplotype Estimation

Further modification

• Take into account “near misses”, as well as exact matches.

(PHASE v1.0: Stephens, Smith and Donnelly 2001)

Page 40: Introduction to Haplotype Estimation

Example

[Figure: list of known haplotypes; rows labelled 1, 2 and 3]

One haplotype matches 1 known; the other differs by 2 from 3 known.

Page 41: Introduction to Haplotype Estimation

Example

[Figure: list of known haplotypes; rows labelled 1, 2 and 3]

One haplotype matches 3 known; the other differs by 2 from 1 known.

Page 42: Introduction to Haplotype Estimation

Example

[Figure: list of known haplotypes; rows labelled 1, 2 and 3]

One haplotype differs by 1 from 3 known; the other differs by 1 from 1 known.

How to balance these possibilities?

Page 43: Introduction to Haplotype Estimation

The key question

• What is the conditional distribution of the next haplotype, given a set of known haplotypes?

Page 44: Introduction to Haplotype Estimation

Example

[Figure: haplotypes labelled 1 and 2]

Given the above haplotypes, what would you expect the next haplotype to look like?

Page 45: Introduction to Haplotype Estimation

Qualitative answer

• The next haplotype will likely differ by a small number of mutations (possibly 0 mutations) from a (randomly-chosen) existing haplotype.

• Use theory (Ewens sampling formula; coalescent theory) to roughly quantify the distribution of the “small number”.
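The qualitative answer can be mimicked with a small sampler: copy a randomly chosen existing haplotype, then apply a geometrically distributed number of mutations. This is only a sketch. The geometric form with continuation probability theta / (r + theta), where r is the number of known haplotypes, is one common coalescent-motivated choice, and the mutation model (flip one uniformly chosen biallelic site per mutation), the parameter `theta` and the function name are assumptions made for the example.

```python
import random

def sample_next_haplotype(known, theta=1.0, rng=None):
    """Draw a plausible "next" haplotype given a list of known haplotypes.

    known: list of haplotypes as tuples of 0/1; repeated entries act as counts.
    theta: assumed scaled mutation rate; larger values give more mutations.
    """
    rng = rng or random.Random()
    r = len(known)
    # Copy an existing haplotype, chosen with probability proportional to its count.
    hap = list(rng.choice(known))
    # Number of mutations is geometric: keep mutating with prob theta / (r + theta).
    while rng.random() < theta / (r + theta):
        site = rng.randrange(len(hap))
        hap[site] = 1 - hap[site]  # assumed model: flip one random site
    return tuple(hap)

if __name__ == "__main__":
    known = [(0, 0, 0, 0), (0, 0, 0, 0), (1, 1, 0, 1)]
    rng = random.Random(42)
    for _ in range(5):
        print(sample_next_haplotype(known, theta=1.0, rng=rng))
```

As the list of known haplotypes grows, theta / (r + theta) shrinks, so the next haplotype is increasingly likely to be an exact copy or a near miss of one already seen.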

Page 46: Introduction to Haplotype Estimation

Comparisons on simulated data

Page 47: Introduction to Haplotype Estimation
Page 48: Introduction to Haplotype Estimation

Problems

• Time-consuming for large problems.

• Can “converge” to poor local optima.

• Ignores recombination (decay of LD with distance).

• How should uncertainty in haplotype estimates be treated?

Page 49: Introduction to Haplotype Estimation

… to be continued.