lambda-coalescents and population genetic inference...neutral mutation models computing likelihoods...
TRANSCRIPT
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Lambda-coalescents and population geneticinference
Matthias {Birkner1, Steinrucken2}1University of Munich and 2TU Berlin
Joint project with
Jochen Blath2
Eindhoven, 25th/27th March 2009
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Outline
Introduction
Beta(2 − α,α)-coalescents
Dawson-Watanabe and Fleming-Viot processes, time-changerelation
Mutation models: Infinitely-many-alleles, infinitely-many-sites
Computing likelihoods under Λ-coalescents
Importance sampling methods
Summary & Outlook
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Genetic variability at the mitochondrial cyt b locus in
Atlantic cod
468 481 487 488 490 496 508 523 562 601 631 643 649 685 691
66 t a a c a a t g a t g a c c g
17 - - - - - - c - - - - - - - -
14 - - - - - - - a - - - - - t -
8 - - - - - - - - - - - - - - t
1 - - - - - - - - - - - - - t -
2 - - - t - - - - - - - - - - -
1 - - - - - - - a - - - g - t -
1 - - - - - - - - - - - - t - -
1 - - - - g - c - - - - - - - -
1 - - - - - g - - - - - - - - -
1 - - g - - - - - - - a - - - t
1 - - - - - - c - g - - - - - -
1 g - - - - - - - - - - - - - -
1 - - - - - - - - - c - - - - -
1 - c - - - - c - - - - - - - -
(a random subsample of the sample described in Arnason, Genetics 2004)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
The Great Obsession of population geneticists (J. Gillespie)
What evolutionary forces could have lead to such
divergence between individuals of the same species?
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
The Great Obsession of population geneticists (J. Gillespie)
What evolutionary forces could have lead to such
divergence between individuals of the same species?
In this talk, we will focus on neutral genetic variation, and thus theinterplay of mutation and genetic drift.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Wright-Fisher model: The fundamental model for ‘genetic drift’
A (haploid) population of N individuals per generation,each individual in the present generation picks a ‘parent’ atrandom from the previous generation,genetic types are inherited (possibly with a small probability ofmutation).
past
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Wright-Fisher model: The fundamental model for ‘genetic drift’
A (haploid) population of N individuals per generation,each individual in the present generation picks a ‘parent’ atrandom from the previous generation,genetic types are inherited (possibly with a small probability ofmutation).
past
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Wright-Fisher model: The fundamental model for ‘genetic drift’
A (haploid) population of N individuals per generation,each individual in the present generation picks a ‘parent’ atrandom from the previous generation,genetic types are inherited (possibly with a small probability ofmutation).
past
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Wright-Fisher model: The fundamental model for ‘genetic drift’
A (haploid) population of N individuals per generation,each individual in the present generation picks a ‘parent’ atrandom from the previous generation,genetic types are inherited (possibly with a small probability ofmutation).
past
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Genealogical point of view
Sample n (≪ N) individuals from the ‘present generation’
past
present
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Kingman’s coalescent
Theorem (Kingman (& Hudson, Griffiths, ...), 1982)
In the limit N → ∞, the genealogy of an n-sample, measured in units of N generations, isdescribed by a continuous-time Markov chainwhere each pair of lineages merges at rate 1.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Kingman’s coalescent
Theorem (Kingman (& Hudson, Griffiths, ...), 1982)
In the limit N → ∞, the genealogy of an n-sample, measured in units of N generations, isdescribed by a continuous-time Markov chainwhere each pair of lineages merges at rate 1.
Robustness. The same limit appears for any exchangeable offspringvectors
(ν1, . . . , νN), (independent over generations),
if time is measured in units ofN
σ2generations, where
σ2 = limN→∞
Var(ν1)
(under a third moment condition on ν1).
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Modeling neutral variation: Superimposing types on the
coalescent
Assume that the considered genetic typesdo not affect their bearer’s reproductive succes.
If as population size N → ∞,N
σ2× mutation prob. per ind. per generation → r ,
the type configuration in the sample can be described by puttingmutations with rate r along the genealogy.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Modeling neutral variation: Superimposing types on the
coalescent
Assume that the considered genetic typesdo not affect their bearer’s reproductive succes.
If as population size N → ∞,N
σ2× mutation prob. per ind. per generation → r ,
the type configuration in the sample can be described by puttingmutations with rate r along the genealogy.
Kingman’s coalescent is the standard model of mathematicalpopulation genetics.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Question
What if the variability of offspring numbers across individuals is solarge that reasonably
σ2 ≈ ∞ ?
This might happen e.g. in marine species (so-called reproduction
sweepstakes).
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Coalescents with multiple collisions, aka ‘Λ-coalescents’
While n lineages, any k coalesce at rate
λn,k =
∫
[0,1]xk−2(1 − x)n−k Λ(dx), where Λ is a finite measure on
[0, 1]. (Sagitov, 1999; Pitman, 1999).
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Coalescents with multiple collisions, aka ‘Λ-coalescents’
While n lineages, any k coalesce at rate
λn,k =
∫
[0,1]xk−2(1 − x)n−k Λ(dx), where Λ is a finite measure on
[0, 1]. (Sagitov, 1999; Pitman, 1999).
Interpretation:re-write λn,k =
∫[0,1] x
k(1 − x)n−k 1x2 Λ(dx) to see:
at rate 1x2 Λ([x , x + dx ]), an ‘x-resampling event’ occurs.
Thinking forwards in time, this corresponds to an event in whichthe fraction x of the total population is replaced by the offspring ofa single individual.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Coalescents with multiple collisions, aka ‘Λ-coalescents’
While n lineages, any k coalesce at rate
λn,k =
∫
[0,1]xk−2(1 − x)n−k Λ(dx), where Λ is a finite measure on
[0, 1]. (Sagitov, 1999; Pitman, 1999).
Interpretation:re-write λn,k =
∫[0,1] x
k(1 − x)n−k 1x2 Λ(dx) to see:
at rate 1x2 Λ([x , x + dx ]), an ‘x-resampling event’ occurs.
Thinking forwards in time, this corresponds to an event in whichthe fraction x of the total population is replaced by the offspring ofa single individual.
Form of rates stems from λn,k = λn+1,k + λn+1,k+1 (consistencycondition).
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Coalescents with multiple collisions, aka ‘Λ-coalescents’
While n lineages, any k coalesce at rate
λn,k =
∫
[0,1]xk−2(1 − x)n−k Λ(dx), where Λ is a finite measure on
[0, 1]. (Sagitov, 1999; Pitman, 1999).
Interpretation:re-write λn,k =
∫[0,1] x
k(1 − x)n−k 1x2 Λ(dx) to see:
at rate 1x2 Λ([x , x + dx ]), an ‘x-resampling event’ occurs.
Thinking forwards in time, this corresponds to an event in whichthe fraction x of the total population is replaced by the offspring ofa single individual.
Form of rates stems from λn,k = λn+1,k + λn+1,k+1 (consistencycondition).Note: Λ = δ0 corresponds to Kingman’s coalescent.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Cannings’ models in the
‘domain of attraction of a Λ-coalescent’
Fixed population size N, exchangeable offspring numbers in onegeneration (
ν1, ν2, . . . , νN
).
Sagitov (1999), Mohle & Sagitov (2001) clarify under whichconditions the genealogies of a sequence of exchangeable finitepopulation models are described by a Λ-coalescent:
cN := pair coalescence probability over one generation → 0
( cN = 1N−1E[ν1(ν1 − 1)] )
two double mergers asymptotically negligible compared to onetriple merger
NcN Pr(a given family has size ≥ Nx
)∼
∫ 1x
y−2Λ(dy)
Time is measured in 1/cN generations (in general 6= 1/pop. size)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Note: There are many Λ-coalescents.
Maybe a natural “first candidate”:
Λ = wδ0 + (1 − w)δψ with w , ψ ∈ (0, 1)
(as considered by Eldon & Wakeley, Genetics 2006)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
A ‘heavy-tailed’ Cannings model and Beta-coalescents
Haploid population of size N. Individual i has Xi potential
offspring, X1,X2, . . . ,XN are i.i.d. with mean m := E[X1
]> 1,
Pr(X1 ≥ k
)∼ Const. × k−α with α ∈ (1, 2).
Note: infinite variance.
Sample N without replacement from all potential offspring to formthe next generation.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
A ‘heavy-tailed’ Cannings model and Beta-coalescents
Haploid population of size N. Individual i has Xi potential
offspring, X1,X2, . . . ,XN are i.i.d. with mean m := E[X1
]> 1,
Pr(X1 ≥ k
)∼ Const. × k−α with α ∈ (1, 2).
Note: infinite variance.
Sample N without replacement from all potential offspring to formthe next generation.
Theorem (Schweinsberg, 2003)Let cN = prob. of pair coalescence one generation back in N-thmodel.cN ∼ const. N1−α, measured in units of 1/cN generations, thegenealogy of a sample from the N-th model is approximatelydescribed by a Λ-coalescent with Λ = Beta(2 − α,α).(
Beta(2 − α,α)(dx) = 1[0,1](x) 1Γ(2−α)Γ(α) x
1−α(1 − x)α−1 dx)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Why Λ = Beta(2 − α, α)?
Heuristic argument:
Probability that first individual’s offspring provides
more than fraction y of the next generation,
given that the family is substantial (i.e. given X1 ≥ εN, for y > ε)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
Why Λ = Beta(2 − α, α)?
Heuristic argument:
Probability that first individual’s offspring provides
more than fraction y of the next generation,
given that the family is substantial (i.e. given X1 ≥ εN, for y > ε)
≈ P
( X1
X1 + (N − 1)m≥ y
∣∣∣X1 ≥ εN)
= P
(X1 ≥ (N − 1)m
y
1 − y
∣∣∣X1 ≥ εN)
∼ const.(1 − y)α
yα= const.’ Beta(2 − α,α)([y , 1]).
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
The Great ObsessionWright-Fisher model and Kingman’s coalescentMultiple merger coalescentsThe Beta(2 − α, α)-class
The family Beta(2 − α, α), α ∈ (1, 2]
0.0 0.2 0.4 0.6 0.8 1.0
01
23
45
6
1.91.51.1
Kingman’s coalescent includedas boundary case:Beta(2 − α,α) → δ0 weakly as α→ 2.
Smaller α means tendency towardsmore extreme resampling events.
For α ≤ 1, corresponding coalescentsdo not come down from infinity.
Beta(2 − α,α)-coalescents appear as genealogies of α-stablecontinuous mass branching process (via a time-change).
Scaling relation of mutation rate per generation relative topopulation size depends on α!
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Free branching: Galton-Watson processes
individuals have a random number of offspring, independently,according to a fixed probability distribution
the totality of offspring forms the next generation
da capo ...
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Galton-Watson forests (with ordered individuals)tim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Galton-Watson forests (with ordered individuals)tim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Galton-Watson forests (with ordered individuals)tim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Galton-Watson forests (with ordered individuals)tim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Galton-Watson forests (with ordered individuals)tim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Galton-Watson forests (with ordered individuals)tim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Galton-Watson forests (with ordered individuals)tim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Galton-Watson forests (with ordered individuals)tim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Galton-Watson forests (with ordered individuals)tim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Galton-Watson forests (with ordered individuals)tim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Galton-Watson forests (with ordered individuals)tim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Galton-Watson forests (with ordered individuals)tim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Galton-Watson forests (with ordered individuals)tim
e
ξt(n) := # descendants of founders 1, . . . , n alive in generation t .For fixed n, ξt(n), t ∈ Z+ is a Galton-Watson process, ξ0(n) = n.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Free branching & neutral typestim
e
Example: there are 2 neutral types, red and blue.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Fixed population size: Cannings models (reprise)
fixed population size N
individuals have a random number of offspring,the offspring numbers (ν1, . . . , νN) are exchangeable with∑N
i=1 νi = N
the totality of offspring forms the next generation
da capo ...
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Genealogies in a Cannings modeltim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Genealogies in a Cannings modeltim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Genealogies in a Cannings modeltim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Genealogies in a Cannings modeltim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Genealogies in a Cannings modeltim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Genealogies in a Cannings modeltim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Genealogies in a Cannings modeltim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Genealogies in a Cannings modeltim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Genealogies in a Cannings modeltim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Genealogies in a Cannings modeltim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Genealogies in a Cannings modeltim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Genealogies in a Cannings modeltim
e
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Genealogies in a Cannings model: Adding typestim
e
We can again think of neutral types (for the moment, withoutmutation), e.g. red and blue.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Question
Relation between
and ?
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Relations?
A first answer
Condition a Galton-Watson process to have constant
population size N, obtain a model from Cannings’ class.
Easy, but is there more?
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Relations?
In a Cannings model, we think ofrelative frequencies of types.
We can do the same in a free branching model by normalising withthe current total population size (at least before extinction), i.e. byconsidering
ξt(n) :=ξt(n)
ξt(N).
/ ≈ ?
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Relations?
Yes, in a suitable limit of population size N → ∞.
20 40 60 80 100 120
20
40
60
80
100
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Continuous state branching processes (Jirina, Silverstein, Lamperti, ... )
Z (N) = (Z(N)k )k=0,1,2,..., N ∈ N a sequence of Galton-Watson
processes (possibly with offspring distributions depending on N),
mN → 0 mass rescaling,
Z(N)0 = [m−1
N ] ( + conditions ... ).
(mNZ
(N)[Nt]
)t≥0
=⇒ X ,
where X is a continuous state branching process, i.e. an R+-valuedMarkov process that enjoys the branching property:
X ,X ′,X ′′ independent copies, X0 = x ,X ′0 = x ′,X ′′
0 = x ′′, x = x ′ + x ′′
=⇒ Xlaw= X ′ + X ′′.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Continuous state branching processes: Favourite example
Assume the N-th offspring distribution has
mean = 1 + µ/N + o(1/N),
variance = σ2 + o(1)
(and, say, uniformly bounded third moment).
Z(N)0 = N, then
(N−1Z
(N)[Nt]
)t≥0
⇒ Feller’s branching diffusion,
generator
L(2)f (z) =1
2σ2zf ′′(z) + µzf ′(z).
(i.e. Var[Zt+∆t −Zt|Zt ] ≈ σ2Zt∆t, E[Zt+∆t −Zt|Zt ] ≈ µZt∆t.)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Continuous state branching processes: stable case, α < 2
If the approximating offspring distributions are in the domain ofattraction of a stable law of index α ∈ (0, 2), i.e.
P(more than n children) ∼ Const. × n−α
(note: in particular, no variance),the limit process Z will have discontinuous paths. Generator
L(α)f (z) = cαzf ′(z)+z
∫
(0,∞)
{f (z+h)−f (z)−h1(0,1](h)f ′(z)
}h−1−αdh.
Interpretation:if the present mass is z , a new litter of mass h is produced at ratez × h−1−α.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Stable CSBPes, some properties
Why the name?
Lamperti’s construction connects them with stable Levy
processes: Xt = Y(∫ t
0 Xs ds), where Y is a stable process
without negative jumps, stopped upon hitting 0.
Scaling properties
Equation for Laplace transforms
Existence of mean:α > 1 ⇒ Ex Xt <∞,α < 1 ⇒ Ex Xt = ∞ for all t > 0.
Extinction/Explosion:α < 1 ⇒ X has growing paths, explodes in finite time,α > 1 ⇒ X becomes extinct in finite time.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Dawson-Watanabe superprocesses
For our purposes here: the continuous mass & timeanalogue of the “ragged boundary” picture:
Z a given CSBP. The corresponding Dawson-Watanabe
superprocess∗ is a process X with values in the measures on [0, 1]such that
for B ⊂ [0, 1]: (Xt(B))t≥0law= Z , started from Z0 = |B |,
X·(B1),X·(B2), . . . ,X·(Bn) are independent if B1, . . .Bn aredisjoint.
Interpretation: Xt(B) = mass of descendants alive at time t
whose ancestors where ∈ B∗Note: this is a “toy version”.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Λ-Fleming-Viot processes (Donnelly & Kurtz, Bertoin & Le Gall)
For our purposes here: the continuous mass & timeanalogue of the “straight boundary” picture:
(ρt)t≥0 a Markov process with values in the probability measureson [0, 1]. Interpretation (if ρ0 = uniform measure):
ρt(B) =fraction of mass alive at time t whose ancestors attime 0 where in B ⊂ [0, 1].
On test functions of the formF (ρ) =
∫. . .
∫ρ(dx1) . . . ρ(dxp)f (x1, . . . , xp), the generator acts as
LFV ,ΛF (ρ) =∑
J⊂{1,...,p},
|J|≥2
λp,|J|
∫. . .
∫ρ(dx1) . . . ρ(dxp)
(f (aJ
1 , . . . , aJp) − f (a1, . . . , ap)
).
↑aJi = amin J if i ∈ J
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Λ-FV processes: Interpretation of “non-Kingman” part
N a Poisson point process on [0,∞) × (0, 1] with intensitymeasure dt ⊗ y−2Λ(dy).
0 20 40 60 80 100
0.00.2
0.40.6
0.81.0
· · ·· ··
·
·· ··· ·· ···· · ·
··
··
·· ·· ·· ·
·
·
·
··
··
·· · · ·····
··
··· ···
·
·
··
·· ···
·
· · ·
·
· ·· ·· ··
··· ··
· ·· ·· ·· · ··· ·· ·· ···
· ·
·
·· · ·· ···· ··· ·· ·
·
···· ·· ··· ··· ··· ···
··
· ··
· ·
· · ·
·
· · ·
·
·
· ·· · ··
·· ·
·
··
·
··
· · ·· ·
·· ·· ·
·
· · ···
·
·
· · ···
··
· · · ······ · ··
·
·
·
··· · ·
··
· ··· ·· ···
·
·· ·· · · ·
·
·· · ··· ··· ··
·
· ·· ·· ·
·
··
·
· · ·
··
· ··
·
··· · ··
·
·
· ·
· ···
· ·· · ·····
· ···· · ··· ··
·
· ··
·
·
·· · ·
·
· ·
·
·
·
···· ·· ·
·
·
·
·
· ·· ··· · ··· ··
· ·· ·
·· ·
· ··· · ·· ··
·
·· ··· ··
· ···
·· ··
·
· · ·· ··
··· · ·· · ·
·
·· ·· ··· ··
·
··· · ····· ···
·· ··
·
· ·
·
· ···· ··
···
·
·
··· · ··· ··
··
· ···
·
· · ···· ·· · ·
·
·· ·· ·
·
·
·
··
·
···
·
·
···
·
· ·· ·
·
···
·
··
·
··· ··
·
· ···· ·· · ···
·
· ·· ··
··
·
·· ·
·
···
·
·
·
·
· ···
·· · ·· ·· ·
·
·
·
··· ·· ···
··
·
··
· · ··
·
···· · ··
·
· ·· ·
·· ·
·
·
·
·
·
··
·
··· ··
·
···
·
·
·
· ·· ·· · ·
·
··
·
······
·
· ··
·
·· ·· · · ·· ····
··· · ·
·
·
·
··
···
·
·
··
···
····
·· · ·· ·
··
·
·
·
·
·
·· ·
··
· · ··· ··· · ··
·
··· ·
·
···
· ···· ·
·
···
·
· ··· ·· · ··
·
·
·
· ··· · ·· ·· ·
·
·· · ·
·
· ·
·
· ·
·
· ····
· ··
· ·
·
·
·
·
· ···· ·· ·· ·
·
·
·
·· ·
·
·· ·· ·· ·· · · · ·· · ·
·
··
··· ·
·
·
·
·
·
··
·· ··
·
··· · ·· ··
·
· ·
·
·· ·
·
·· · ··· ··
·· ···
·
···
· ···· ·
···
··
·· ·· ·· ·
·
·
·
··
·
·· ·· · ·
·
·· ·
·
·
· ·
· ·
· ··
·
· ··
···
·
·
·
···
·
·· · ···
·
· · ··
·
· ·· · ·· · ··
·
··· ·· ·
·
·· · ·· ·· ·
·
· · ····
· ···
·
· ··· ··
·
···
·
·· ·· · ··· ·· ··
·
··· ··· ·· ·
··
· · ·· ···
·
· ·· ··· ·· · · · ·
·
·
·
···
·
·· ·
y
time
If (t, y) is a point of N , an individual X is chosen according to thecurrent ρ(t−), and ρ→ yδX + (1 − y)ρ.
Note: alternative form of generator
LFV ,λG(ρ) =∫
y−2Λ(dy)∫ρ(dx)
(G(yδx + (1 − y)ρ) − G(ρ)
)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Duality between coalescent and FV
For a partition γ of {1, . . . , p} with |γ| classes and a probabilitymeasure ρ on [0, 1] consider test functions of the type
Φf (γ, ρ) :=
∫ρ(dx1) . . . ρ(dx|γ|)f
(~y(γ; x1, . . . , x|γ|)
)
↑yj = xi if j ∈ i -th class of γ
(“assign an independent ρ-sample to each class of γ”), wheref : [0, 1]p → R.
Then we have with(Γ
(p)t ) . . . . . . . . restriction of Λ-coalescent to {1, . . . , p},
(ρt) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Λ-FV process
E
[Φf (Γ
(p)0 , ρt)
]= E
[Φf (Γ
(p)t , ρ0)
]for all t ≥ 0.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Sample interpretation of the duality
E
[Φf (Γ
(p)0 , ρt)
]= E
[Φf (Γ
(p)t , ρ0)
]ge
nerat
ions
20 40 60 80 100
20
40
60
80
100
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Galton-Watson processes and Cannings’ modelsContinuous state branching processesΛ-Fleming-Viot processesDuality
Pathwise duality via Donnelly & Kurtz’ (modified)
lookdown construction
new particleat level 3
new particleat level 6
post−birth types
pre−birth types
a
b
a
g
b
c
e
f
d
g
b
c
d
b
e
f
7
6
5
4
3
2
1
pre−birth labels
post−birth labels
9
8
7
6
5
4
3
2
1
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
TheoremHeuristics (discont. case)
Let (Xt) be a Dawson-Watanabe process (for simplicity, type space[0, 1], no mutation), Zt := Xt([0, 1]) its total mass process,Rt := Xt/Zt .
Theorem (B., Blath, Capaldo, Etheridge, Mohle, Schweinsberg,Wakolbinger (2005), Hiraba (2000), Perkins (1991))The ratio process (Rt)0≤t<τ can be time-changed with an additivefunctional of the total mass process (Zt) to obtain a Markovprocess if and only if
Z is a CSBP of some index α ∈ (0, 2].
If α = 2, Tt =∫ t
0 σ2Z−1
s ds and T−1(t) = inf{s : Ts > t}. Theprocess (RT−1(t))t≥0 is the classical (non-spatial) Fleming-Viotprocess, dual to Kingman’s coalescent.If α ∈ (0, 2), Tt = const ·
∫ t
0 Z 1−αs ds and (RT−1(t))t≥0 is the
Beta(2 − α,α)-Fleming-Viot process.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
TheoremHeuristics (discont. case)
Let X and X ′ be two independent CSBP’s with the samecharacteristics (ν), St := Xt + X ′
t , Rt := Xt/St .
At rate St−ν(dh)dt, a new family of size h > 0 is created , therelative mass of the newborns is y := h/(St− + h), so
∆Rt =
{y(1 − R) with probability Rt− and−yR with probability (1 − Rt−) .
To eliminate the dependence of the relative jump size y on thecurrent total population size St−, ν must satisfy
sν({h : h/(h + s) > y}) = sν({h : h > sy/(1 − y)}) = g(s)f (y)
for all s, y > 0 for some functions f , g .
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
TheoremHeuristics (discont. case)
‘Meta-mathematic’ associations
Brownian motion ↔ Kingman’s coalescent∩ ∩
Stable processes ↔ Beta(2 − α,α)-coalescents∩ ∩
General Levy processes ↔ General Λ-coalescents
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
A computer experimentInfinitely many allelesInfinitely many sites(Λ-) Ethier-Griffiths’ urn
Playing god with simulated “full trees”
1.0
1.0
true α
αM
L,fulltree
1.2
1.2
1.4
1.4
1.6
1.6
1.8
1.8
2.0
2.0
ML estimates of α for simulted datasets with sample size n = 100,estimate based on full genealogical tree
(400 replicates for each value of α).
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
A computer experimentInfinitely many allelesInfinitely many sites(Λ-) Ethier-Griffiths’ urn
Consider n-Λ-coalescent with mutation rate r per line (and infinitealleles mutation model). n = (n1, . . . , nℓ), possible type
configuration
Theorem (Mohle) The probability p(n) of observing a typeconfiguration n = (n1, . . . , nℓ) satisfies the recursion given byp(1) = 1 and
p(n) =nr∑n
k=2
(nk
)λn,k + nr
ℓ∑
j=1nj =1
1
ℓp(n(j))
+1∑n
k=2
(nk
)λn,k + nr
n∑
k=2
ℓ∑
j=1nj≥k
(n
k
)λn,k
nj − k + 1
n − k + 1p(n − (k − 1)ej ).
(n(j) = (n1, . . . , nj−1, nj+1, . . . , nk) )
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
A computer experimentInfinitely many allelesInfinitely many sites(Λ-) Ethier-Griffiths’ urn
Watterson’s infinitely many sites model
Model genetic locus as infinite sequence of completely linked sites,mutations always hit a new site.
Mathematical abstraction:
a gene is [0, 1]
a type is a configuration of points on [0, 1]
Ethier & Griffiths (1987) parametrisation:
type space E = [0, 1]N
mutation operator
Bf((x1, x2, . . . )
)= r
∫ 1
0f((u, x1, x2, . . . )
)−f
((x1, x2, . . . )
)du
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
A computer experimentInfinitely many allelesInfinitely many sites(Λ-) Ethier-Griffiths’ urn
Asymptotics of the frequency spectrum
Consider an n-Beta(2 − α,α)-coalescent, mutations at rate r
according to the infinitely-many-sites model (assuming knownancestral types). Let
M(n) := #total number of mutations in the sample,
Mk(n) := #number of mutations affecting exaktly k samples,
k = 1, 2, . . . , n − 1.Theorem (Berestycki, Berestycki & Schweinsberg)
M(n)
n2−α→ r
α(α− 1)Γ(α)
2 − α,
Mk(n)
n2−α→ rα(α− 1)2
Γ(k + α− 2)
k!
in probability as n → ∞.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
A computer experimentInfinitely many allelesInfinitely many sites(Λ-) Ethier-Griffiths’ urn
Asymptotics of the frequency spectrum
Consider an n-Beta(2 − α,α)-coalescent, mutations at rate r
according to the infinitely-many-sites model (assuming knownancestral types). Let
M(n) := #total number of mutations in the sample,
Mk(n) := #number of mutations affecting exaktly k samples,
k = 1, 2, . . . , n − 1.Theorem (Berestycki, Berestycki & Schweinsberg)
M(n)
n2−α→ r
α(α− 1)Γ(α)
2 − α,
Mk(n)
n2−α→ rα(α− 1)2
Γ(k + α− 2)
k!
in probability as n → ∞.
Thus M1(n)/M(n) ≈ 2 − α for n large, which suggests
αBBS := 2 −M1(n)
M(n)as an estimator for α.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
A computer experimentInfinitely many allelesInfinitely many sites(Λ-) Ethier-Griffiths’ urn
Infinitely-many-sites model
Model genetic locus as infinite sequence of completely linked sites,mutations always hit a new site
Example: segr. siteSeq. 1 2 3 41 1 0 0 02 1 1 0 03 0 0 1 14 0 0 1 15 0 0 1 0
(0=wild type, 1=mutant
assume known ancestral types)
Obs. fit IMS ⇐⇒ no sub-matrix1 01 10 1
aa bb cc(nor row permutation).
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
A computer experimentInfinitely many allelesInfinitely many sites(Λ-) Ethier-Griffiths’ urn
Infinitely-many-sites model, II
If the infinitely-many-sites model applies, the observationscorrespond to a unique rooted perfect phylogeny (or ‘genetree’).
Sequences, Genetree, obs. types
segr. siteSeq. 1 2 3 41 1 0 0 02 1 1 0 03 0 0 1 14 0 0 1 15 0 0 1 0
1
2
3
4
1 2 3,4 5
type multiplicity(1, 0) 1(2, 1, 0) 1(4, 3, 0) 2(3, 0) 1
Construct e.g. using Gusfield’s (1991) algorithm.
Note: purely combinatorial, does not depend on a probabilisticmodel for the observations.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
A computer experimentInfinitely many allelesInfinitely many sites(Λ-) Ethier-Griffiths’ urn
Simulating samples under the IMS model
The Ethier-Griffiths urn (1987) can be used to generate arandom sample of size n under Kingman’s coalescent(with mutation rate r per line):
Start with 2 leaves.
When there are k leaves:
Add a mutation to a leaf w. prob. 2r2r+(k−1) ,
split one leaf w. prob. k−12r+(k−1)
(leaf picked uniformly among the k).
Stop when n + 1 leaves, delete last leaf.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
A computer experimentInfinitely many allelesInfinitely many sites(Λ-) Ethier-Griffiths’ urn
Simulating samples under the IMS model: Λ-case
(Y(n)t )≥0 block counting process of Λ-coalescent starting from n
blocks:
Jump from i to j ∈ {1, 2, . . . , i − 1} at rateqij :=
(i
i−j+1
)λi ,i−j+1.
τ1 := inf{t ≥ 0 : Y(n)t = 1}.
Y(n)t := Y
(n)(τ1−t)− time-reversed block counting process
(Y(n)t = ∂ for t ≥ τ1).
Jump rates q(n)ji = gniqi j
gnj, q
(n)n∂ = −qnn =
∑n−1j=1 qnj ,
P(Y(n)0 = k) = P(Y
(n)τ1−
= k) = gnkqk1.
gni := E∫ ∞0 1(Y
(n)t = i) dt is the Green function (in general,
not known explicitly, but easy recursion).
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
A computer experimentInfinitely many allelesInfinitely many sites(Λ-) Ethier-Griffiths’ urn
Simulating samples under the IMS model: Λ-case, cont.
The n-Λ-“Ethier-Griffiths urn” (mutation rate r).
Begin with K leaves, P(K = k) = P(Y(n)0 = k).
While there are k leaves:
Add a mutation to a leaf w. prob. r
kr−eq(n)kk
,
split one leaf into ℓ w. prob.eq(n)
k,k+ℓ−1
eq(n)kk
,
if k = n goto stop w. prob. −eq(n)nn
kr−eq(n)nn
(leaf picked uniformly among the k).
Stop.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Recursion for tree probabilities
Can calculate P(r ,Λ)
(observed sequence data (t,n)
)=: p(t,n) via
1 2
34
0
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Recursion for tree probabilities
Can calculate P(r ,Λ)
(observed sequence data (t,n)
)=: p(t,n) via
1 2
34
0
p(t,n) =1
rn + λn
∑
i :ni≥2
ni∑
k=2
(n
k
)λn,k
ni − k + 1
n − k + 1p(t,n− (k − 1)ei
)
+r
rn + λn
∑
i :ni =1,xi0unique,
s(xi )6=xj∀j
p(si (t),n
)
+r
rn + λn
1
d
∑
i :ni =1,
xi0unique
∑
j:s(xi )=xj
(nj + 1)p(ri (t), ri (n + ej)
).
Extends Ethier & Griffiths (1987) to Λ-coalescents andMohle’s recursion (2005) to IMS model.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Compute probabilities
Use exact recursions for moderate sample complexities.Approach more complex samples by version of Griffiths &Tavare’s (1994)
Monte Carlo method
p(t,n) = E(t,n)
[ τ−1∏
i=0
f(r ,Λ)(Xi)]
For suitable Markov chain Xi on sample configurations.
Estimate expectation via empirical mean of independent runs.
extension to Λ-coalescents by B. & Blath (2008)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Artifical sample of size 12 analysed with r = 1 and α = 1.5:
5.0 5.5 6.0 6.5 7.0
−2.
5−
2.0
−1.
5−
1.0
−0.
50.
0
Griffiths & Tavaré
log10(
b sdest
imate
)
log10(#runs)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
HIStory
Interpret genealogy as sequency of historical states:H =
(H−τ = ((1), (0)),Hτ−1 , . . . ,H−1,H0 = (t,n)
)
1
2
3
0
(((3, 1, 0), (1, 0), (2, 1, 0), (0)
), (1, 2, 1, 1)
)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
HIStory
Interpret genealogy as sequency of historical states:H =
(H−τ = ((1), (0)),Hτ−1 , . . . ,H−1,H0 = (t,n)
)
1
2
3
0
H0
H−1
H−2
H−3
H−4
H−5
H−6
(((3, 1, 0), (1, 0), (2, 1, 0), (0)
), (1, 2, 1, 1)
)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
HIStory
Interpret genealogy as sequency of historical states:H =
(H−τ = ((1), (0)),Hτ−1 , . . . ,H−1,H0 = (t,n)
)
1
2
3
0
1
2
3
0
(((3, 1, 0), (1, 0), (2, 1, 0), (0)
), (1, 2, 1, 1)
)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
HIStory
Interpret genealogy as sequency of historical states:H =
(H−τ = ((1), (0)),Hτ−1 , . . . ,H−1,H0 = (t,n)
)
1
2
3
0
1
2
3
0
1
2
3
0
(((3, 1, 0), (1, 0), (2, 1, 0), (0)
), (1, 2, 1, 1)
)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
HIStory
Interpret genealogy as sequency of historical states:H =
(H−τ = ((1), (0)),Hτ−1 , . . . ,H−1,H0 = (t,n)
)
1
2
3
0
1
2
3
0
1
2
3
0
different histories can lead to same sample(((3, 1, 0), (1, 0), (2, 1, 0), (0)
), (1, 2, 1, 1)
)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Importance sampling
We have
p(t,n) = P(r ,Λ)
(H0 = (t,n)
)=
∑
H:H0=(t,n)
P(r ,Λ)
(H
)
=∑
H:H0=(t,n)
P(r ,Λ)
(H
)
Q(H
)︸ ︷︷ ︸
=:w(H)importance weight
Q(H),
for any law Q on histories s.th. P(r ,Λ)
∣∣∣{H0=(t,n)}
≪ Q.
Thus,
p(t,n) ≈1
R
R∑
i=1
w(H(i)),
where H(1), . . . ,H(R) are independent samples from Q
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Simply the best
p(t,n) =∑
H:H0=(t,n)
P(r ,Λ)
(H
)
Q(H
) Q(H
)≈
1
R
R∑
i=1
w(H(i))
Qopt(·) := P(r ,Λ)
(·∣∣H0 = (t,n)
)is optimal (Stephens &
Donnelly 2000):
Variance of estimator is zero since w(H(i)) ≡ p(t,n).
Finding Qopt is as hard as the original problem.
H0,H−1, . . . is Markov chain under Qopt.
Remark: Transistion probabilities qGT(Hi |Hi+1) ∝ P(r ,Λ)(Hi+1|Hi )gives (Λ-)Griffiths-Tavare method.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Stephens and Donnelly’s (2000) IMS candidate
Kingman case: Choose individual uniformly:
If type is unique in sample, remove “outmost” mutation,
if at least two individuals with this type, merge two lines.
(optimal proposal distribution for infinitely-many-alleles model,HUW (2008))
Heuristic extension to Λ case:
(t,n) →
(si (t),n
)w.p. ∝ 1 if ni = 1, xi0 unique, si (xi ) 6= xj∀j(
ri (t, ri (n + ej )) w.p. ∝ 1 if ni = 1, xi0 unique, si (xi ) 6= xj(t,n − (k − 1)ei
)w.p. ∝ ni qni
(k) if 2 ≤ k ≤ ni ,
where qni(k) =
qn,n−k+1Pnil=2 qn,n−l+1
, jump probabilities of block countingprocess.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Artifical sample of size 12 analysed with r = 1 and α = 1.5:
5.0 5.5 6.0 6.5 7.0
−2.
5−
2.0
−1.
5−
1.0
−0.
50.
0
Griffiths & TavaréStephens & Donnelly
log10(
b sdest
imate
)
log10(#runs)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Hobolth, Uyenoyama & Wiuf’s (2008) idea
Sample of size n where exactly one mutation is visible (in d copies).
0 n-d
1 d
p(1)(r ,Λ)(n, d) = P(r ,Λ)
{most recent event involves in-dividual bearing mutation
}
Probability can be computed
Kingman case: explicit formula (HUW (2008))
Λ case: numerically, using recursion
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Hobolth, Uyenoyama & Wiuf’s (2008) idea contd.
For a general sample (t,n)0 0
1 2
4 2 5 4
2 1 3 3
iput
ui ,m =
{
where mutation m is present in dm individuals.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Hobolth, Uyenoyama & Wiuf’s (2008) idea contd.
For a general sample (t,n)0 0
1 2
4 2 5 4
2 1 3 3
i
carrying:
0 4
1 = d11 8
put
ui ,m =
{p
(1)(r ,Λ)(n, dm) · ni
dmif i bears m
where mutation m is present in dm individuals.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Hobolth, Uyenoyama & Wiuf’s (2008) idea contd.
For a general sample (t,n)0 0
1 2
4 2 5 4
2 1 3 3
i
carrying:
0 4
1 8
0 8
5 4
put
ui ,m =
{p
(1)(r ,Λ)(n, dm) · ni
dmif i bears m
where mutation m is present in dm individuals.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Hobolth, Uyenoyama & Wiuf’s (2008) idea contd.
For a general sample (t,n)0 0
1 2
4 2 5 4
2 1 3 3
i
carrying:
0 4
1 8
0 8
5 4
not carrying:
0 11
2 1
put
ui ,m =
{p
(1)(r ,Λ)(n, dm) · ni
dmif i bears m
(1 − p
(1)(r ,Λ)(n, dm)
)· ni
n−dmotherwise,
where mutation m is present in dm individuals.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Hobolth, Uyenoyama & Wiuf’s (2008) idea contd.
For a general sample (t,n)0 0
1 2
4 2 5 4
2 1 3 3
i
carrying:
0 4
1 8
0 8
5 4
not carrying:
0 11
2 1
0 9
3 3
0 10
4 2
put
ui ,m =
{p
(1)(r ,Λ)(n, dm) · ni
dmif i bears m
(1 − p
(1)(r ,Λ)(n, dm)
)· ni
n−dmotherwise,
where mutation m is present in dm individuals.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Hobolth, Uyenoyama & Wiuf’s (2008) idea contd.
For a general sample (t,n)0 0
1 2
4 2 5 4
2 1 3 3
i
carrying:
0 4
1 8
0 8
5 4
not carrying:
0 11
2 1
0 9
3 3
0 10
4 2
put
ui ,m =
{p
(1)(r ,Λ)(n, dm) · ni
dmif i bears m
(1 − p
(1)(r ,Λ)(n, dm)
)· ni
n−dmotherwise,
where mutation m is present in dm individuals. Propose type i
according to
qΛ-HUW
(i |(t,n)
)∝
{∑m ui ,m if i is allowed to act
0 otherwise.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Hobolth, Uyenoyama & Wiuf’s (2008) idea contd.
If proposed type i
is singleton: remove “outmost” mutation,
has ni ≥ 2: merger inside type i .
Kingman case: merge two lines
Λ-case: propose ℓ+ 1-merger w.p. ∝ Pr,Λ
{0 n
1 do − l
∣∣∣∣0 n
1 do
}
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Artifical sample of size 12 analysed with r = 1 and α = 1.5:
5.0 5.5 6.0 6.5 7.0
−2.
5−
2.0
−1.
5−
1.0
−0.
50.
0
Griffiths & TavaréStephens & DonnellyHobolth, Uyenoyama & Wiuf
log10(
b sdest
imate
)
log10(#runs)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
Tree probabilitiesImportance samplingPerformance
Performance
Simulated 50 samples of size 15 with r = 2 and α = 1.5. Analysedwith r = 1 and α = 1.5. Time needed to get relative error below0.01:
05
1015
20
3.9 4.5 5.1 5.7 6.3 6.9 7.5 8.1 8.7
(a) measured in log10(# runs of MC)
05
1015
-0.7 -0.2 0.3 0.8 1.3 1.8 2.3 2.8 3.3
(b) measured in log10(seconds)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
IllustrationOutlook
Dataset from Ward et al, Extensive Mitochondrial
Diversity Within a Single Amerindian Tribe, PNAS 1991
Analysis with Beta-Coalescent:
1
2
3
4
5
1.2 1.4 1.6 1.8 2.0
1e−2
5
1e−25
1e−2
4
1e−241e−23
1e−2
3
1e−22
1e−2
2
1e−211e−21
1e−202e−20
4e−206e−20
8e−209e−20
α
r
Mitochondrial control region from 55 female Nuu-Chah-Nulth,αML = 1.9, rML = 2.2(Sample as edited in Griffiths & Tavare, Stat. Sci., 1994)
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
IllustrationOutlook
Genetic variation at the mitochondrial cyt b locus of Atlantic cod
Carr & Marshall 1991 Pepin & Carr 1993 Arnason & Palsson 1996
0.5
1.0
1.5
2.0
2.5
3.0
1.2 1.4 1.6 1.8 2.0
−15.0
−12.
0
−12.0−11.0
−11.0
−10.0
−10.0
−9.0
−8.7 −8.5
−15
−14
−13
−12
−11
−10
−9
−8
α
r
0.5
1.0
1.5
2.0
2.5
3.0
1.2 1.4 1.6 1.8 2.0
−15.0
−13.0
−13.0−12.0
−12.0
−11.5
−11.5
−11.0 −10.5
−10.0 −9.8
−20
−18
−16
−14
−12
−10
α
r0.5
1.0
1.5
2.0
2.5
3.0
1.2 1.4 1.6 1.8 2.0
−20.0
−18.0
−15.0
−15.0−14.0−14.0
−13.5
−13.0−12.7
−22
−21
−20
−19
−18
−17
−16
−15
−14
−13
α
r
bαML = 1.4,brML = 0.8 bαML = 1.4,brML = 0.6 bαML = 1.65,brML = 0.7
Arnason et al 1998 Arnason et al 2000 Sigurgıslason & Arnason 2003
0.5
1.0
1.5
2.0
2.5
3.0
1.2 1.4 1.6 1.8 2.0
−18.0
−18.0−15.0
−15.0
−14.5
−14.5
−14.0−13.7
−22
−21
−20
−19
−18
−17
−16
−15
−14
α
r
0.5
1.0
1.5
2.0
2.5
3.0
1.2 1.4 1.6 1.8 2.0
−15.0
−12.
0
−12.0−11.0
−11.0
−10.0
−10.0
−9.0
−8.7 −8.5
−20
−18
−16
−14
−12
−10
−8
α
r
0.5
1.0
1.5
2.0
2.5
3.0
1.2 1.4 1.6 1.8 2.0
−15.0
−12.
0
−12.0−11.0
−11.0
−10.0
−10.0
−9.0
−8.8
−8.5
−20
−18
−16
−14
−12
−10
−8
α
r
bαML = 1.55,brML = 0.6 bαML = 1.4,brML = 0.8 bαML = 1.4,brML = 0.8
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
IllustrationOutlook
Summary & Outlook
Eldon & Wakeley, Genetics 2006, wrote
For many species, the coalescent with multiple mergers might
be a better null model than Kingman’s coalescent.
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
IllustrationOutlook
Summary & Outlook
Eldon & Wakeley, Genetics 2006, wrote
For many species, the coalescent with multiple mergers might
be a better null model than Kingman’s coalescent.
For panmictic fixed-size discrete generations populations,haploid neutral one-locus theory is “mathematicallycomplete”.
Tools for estimation exist, results point towards“non-Kingman-ness” in certain cases.
Statistical properties of estimators?
speed-up of computer-intensive methods?combinations between IS-methods possibleDouble-HUW: ask all pairs of mutations what to do
A good class of alternative models?
IntroductionCSBPes and Fleming-Viot processes
Time change relationNeutral mutation models
Computing likelihoodsIllustration and outlook
IllustrationOutlook
Thank you for your attention!