PolyChord: Next Generation Nested Sampling
Sampling, Parameter Estimation and Bayesian Model Comparison
Will Handley ([email protected])
Supervisors: Anthony Lasenby & Mike Hobson
Astrophysics Department, Cavendish Laboratory
University of Cambridge
December 11, 2015
Parameter estimation & model comparison
Metropolis-Hastings
Nested Sampling
PolyChord
Applications
Notation
- Data: D
- Model: M
- Parameters: Θ
- Likelihood: P(D|Θ,M) = L(Θ)
- Posterior: P(Θ|D,M) = P(Θ)
- Prior: P(Θ|M) = π(Θ)
- Evidence: P(D|M) = Z
Bayes' theorem: Parameter estimation

What does the data tell us about the parameters Θ of our model M?

Objective: Update our prior information π(Θ) in light of data D:

  π(Θ) = P(Θ|M)  --(D)-->  P(Θ|D,M) = P(Θ)

Solution: Use the likelihood L via Bayes' theorem:

  P(Θ|D,M) = P(D|Θ,M) P(Θ|M) / P(D|M)

  Posterior = (Likelihood × Prior) / Evidence
Bayes' theorem: Model comparison

What does the data tell us about our model Mi in relation to other models {M1, M2, ...}?

  P(Mi)  --(D)-->  P(Mi|D)

  P(Mi|D) = P(D|Mi) P(Mi) / P(D)

  P(D|Mi) = Zi = Evidence of Mi
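A step the slides leave implicit, written out here for completeness: taking the ratio of Bayes' theorem for two models shows how the evidences Zi set the posterior odds, since P(D) cancels.

```latex
\frac{P(M_i\,|\,D)}{P(M_j\,|\,D)}
  \;=\; \underbrace{\frac{Z_i}{Z_j}}_{\text{Bayes factor}}
        \times \frac{P(M_i)}{P(M_j)}
```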
Parameter estimation & model comparison: The challenge

Parameter estimation: what does the data tell us about a model? (Computing posteriors)

Model comparison: what does the data tell us about all models? (Computing evidences)

Both of these are challenging things to compute.
- Markov-Chain Monte-Carlo (MCMC) can solve the first of these (kind of).
- Nested sampling (NS) promises to solve both simultaneously.
Parameter estimation & model comparison: Why is it difficult?

1. In high dimensions, the posterior P occupies a vanishingly small region of the prior π.
2. Worse, you don't know where this region is.

- Describing an N-dimensional posterior fully is impossible.
- Project/marginalise into 2 or 3 dimensions at best.
- Sampling the posterior is an excellent compression scheme.
Markov-Chain Monte-Carlo (MCMC): Metropolis-Hastings, Gibbs, Hamiltonian, ...

- Turn the N-dimensional problem into a one-dimensional one.
- Explore the space via a biased random walk (a sketch follows below):
  1. Pick a random direction.
  2. Choose a step length.
  3. If uphill, make the step...
  4. ...otherwise, sometimes make the step.
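As a concrete illustration of these four steps, here is a minimal random-walk Metropolis sketch; the toy Gaussian likelihood, the proposal scale `step`, and the implicit flat prior are assumptions made only for this example, not part of the talk.

```python
import numpy as np

def log_likelihood(theta):
    # Toy example: standard Gaussian likelihood (assumption for illustration).
    return -0.5 * np.dot(theta, theta)

def metropolis(theta0, n_steps=10000, step=0.5, rng=np.random.default_rng(0)):
    """Random-walk Metropolis: always accept uphill moves, sometimes accept downhill ones."""
    theta = np.array(theta0, dtype=float)
    logL = log_likelihood(theta)
    chain = []
    for _ in range(n_steps):
        # Steps 1-2: pick a random direction and step length (isotropic Gaussian proposal).
        proposal = theta + step * rng.standard_normal(theta.size)
        logL_new = log_likelihood(proposal)
        # Steps 3-4: accept uphill moves; accept downhill with probability L_new / L_old.
        if np.log(rng.uniform()) < logL_new - logL:
            theta, logL = proposal, logL_new
        chain.append(theta.copy())
    return np.array(chain)

samples = metropolis(np.zeros(2))
```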
MCMC in action

[Figure: chain samples in the (µ, A) parameter plane, and the corresponding signal against position.]
When MCMC fails: Burn in

[Figure: the chain during burn-in, shown in the (µ, A) plane and as signal against position.]
When MCMC fails: Tuning the proposal distribution

[Figure: a chain with a poorly tuned proposal, shown in the (µ, A) plane and as signal against position.]
When MCMC fails: Multimodality

[Figure: a multimodal posterior in the (µ, A) plane, with the corresponding signal against position.]
When MCMC fails: Phase transitions

[Figure: likelihood L against prior volume X for a likelihood exhibiting a phase transition.]
When MCMC fails: The real reason...

- MCMC does not give you evidences!

  Z = P(D|M) = ∫ P(D|Θ,M) P(Θ|M) dΘ = ∫ L(Θ) π(Θ) dΘ = ⟨L⟩_π

- MCMC fundamentally explores the posterior, and cannot average over the prior.
Nested Sampling: John Skilling's alternative to MCMC!

New procedure: maintain a set S of n samples (live points), which are sequentially updated:

  S_0: Generate n samples from the prior π.
  S_{i+1}: Delete the lowest-likelihood sample in S_i, and replace it with a new sample of higher likelihood.

Requires one to be able to sample from the prior, subject to a hard likelihood constraint.
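A minimal sketch of this procedure, under strong simplifying assumptions: a toy Gaussian likelihood on the unit cube, naive rejection sampling to satisfy the hard likelihood constraint (only workable in low dimensions), and a fixed iteration count instead of a proper termination test.

```python
import numpy as np

rng = np.random.default_rng(0)
ndims, nlive, niter = 2, 100, 500

def loglike(theta):
    # Toy Gaussian likelihood centred on 0.5 in the unit cube (assumption).
    return -0.5 * np.sum((theta - 0.5) ** 2) / 0.1 ** 2

# S_0: n samples from the prior (uniform on the unit cube).
live = rng.uniform(size=(nlive, ndims))
live_logL = np.array([loglike(t) for t in live])

dead, dead_logL = [], []
for _ in range(niter):
    # Delete the lowest-likelihood live point...
    worst = int(np.argmin(live_logL))
    dead.append(live[worst].copy())
    dead_logL.append(live_logL[worst])
    # ...and replace it with a new prior sample of higher likelihood
    # (naive rejection sampling; real implementations do this cleverly).
    while True:
        trial = rng.uniform(size=ndims)
        if loglike(trial) > dead_logL[-1]:
            break
    live[worst], live_logL[worst] = trial, loglike(trial)
```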
Nested Sampling: Graphical aid

[Figure sequence: live points in parameter space; the lowest-likelihood point is repeatedly deleted and replaced at higher likelihood, so the set contracts onto the peak of the likelihood.]
Nested Sampling: Why bother?

- At each iteration, the likelihood contour shrinks in volume by a fraction of ≈ 1/n.
- Nested sampling zooms in to the peak of the posterior exponentially.
- Nested sampling can be used to get evidences!
Calculating evidences

- Transform to a one-dimensional integral via π(θ) dθ = dX:

  Z = ∫ L(θ) π(θ) dθ = ∫ L(X) dX

- X is the prior volume:

  X(L) = ∫_{L(θ) > L} π(θ) dθ

- i.e. the fraction of the prior which the iso-likelihood contour L encloses.
Nested Sampling: Calculating evidences

[Figure: likelihood L against prior volume X, with successive dead points at likelihoods L1 < L2 < L3 < L4 < L5 enclosing prior volumes X1 > X2 > X3 > X4 > X5.]

  ∫ L(X) dX ≈ Σ_i L_i (X_{i−1} − X_i) = Σ_i L_i w_i
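Continuing the toy loop above, a sketch of this quadrature using the expected prior volumes X_i ≈ e^(−i/n); that choice of volume estimate is an assumption of this simple estimator, not something fixed by the talk.

```python
import numpy as np

def log_evidence(dead_logL, nlive):
    """Estimate log Z = log( sum_i L_i * w_i ) with X_i = exp(-i / nlive)."""
    dead_logL = np.asarray(dead_logL, dtype=float)
    i = np.arange(len(dead_logL) + 1)
    X = np.exp(-i / nlive)                 # expected prior volumes, X_0 = 1
    w = X[:-1] - X[1:]                     # w_i = X_{i-1} - X_i
    terms = dead_logL + np.log(w)          # log(L_i * w_i)
    tmax = terms.max()                     # log-sum-exp for numerical stability
    return tmax + np.log(np.sum(np.exp(terms - tmax)))

# e.g. with dead_logL and nlive from the nested-sampling sketch above:
# print(log_evidence(dead_logL, nlive))
```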
Estimating evidences: Evidence error

[Figure sequence: L against X, then against log X; the posterior mass X·L peaks at a compression of roughly log X ≈ −H.]

  Z = ∫ L dX = ∫ X L d(log X)

  H = ∫ log(dP/dX) dP
Estimating evidences: Evidence error

- Approximate compression per step:
    ∆ log X ∼ −1/n ± 1/n,   so   log X_i ∼ −i/n ± √i/n
- Number of steps to get to H:
    i_H ∼ nH
- Estimate of the volume at H:
    log X_H ≈ −H ± √(H/n)
- Estimate of the evidence error:
    log Z ≈ log( Σ_i w_i L_i ) ± √(H/n)
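One way to see where these error bars come from, as a sketch: the per-iteration shrinkage ratio t_i = X_i/X_{i−1} is the maximum of n uniform variates (a Beta(n, 1) variable), so log t_i has mean −1/n and standard deviation 1/n, and simulating it reproduces the √i/n spread quoted above. The particular numbers below are toy choices.

```python
import numpy as np

rng = np.random.default_rng(1)
nlive, niter, nrepeat = 500, 5000, 200

# Shrinkage t_i = X_i / X_{i-1} ~ Beta(nlive, 1): E[log t] = -1/n, Std[log t] = 1/n.
t = rng.beta(nlive, 1, size=(nrepeat, niter))
logX = np.cumsum(np.log(t), axis=1)        # log X_i for each independent run

print("predicted:", -niter / nlive, "+/-", np.sqrt(niter) / nlive)
print("simulated:", logX[:, -1].mean(), "+/-", logX[:, -1].std())
```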
Nested sampling: Parameter estimation

- NS can also be used to sample the posterior.
- The set of dead points are posterior samples with an appropriate weighting factor (a sketch follows below).
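A sketch of that weighting factor, again using the X_i ≈ e^(−i/n) approximation: each dead point receives an importance weight proportional to L_i w_i, which can then be used for weighted posterior averages. The function name is mine, for illustration only.

```python
import numpy as np

def posterior_weights(dead_logL, nlive):
    """Normalised posterior weights p_i ∝ L_i * w_i for the dead points."""
    dead_logL = np.asarray(dead_logL, dtype=float)
    i = np.arange(len(dead_logL) + 1)
    X = np.exp(-i / nlive)                    # expected prior volumes
    logp = dead_logL + np.log(X[:-1] - X[1:]) # log(L_i * w_i)
    p = np.exp(logp - logp.max())
    return p / p.sum()

# e.g. posterior mean of the parameters from the sketch above:
# mean = np.average(np.array(dead), axis=0, weights=posterior_weights(dead_logL, nlive))
```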
When NS succeeds

[Figures: nested-sampling results for the toy problems: posterior samples in the (µ, A) plane and the corresponding signal against position, for both a unimodal and a multimodal case.]
Sampling from a hard likelihood constraint

  "It is not the purpose of this introductory paper to develop the technology of navigation within such a volume. We merely note that exploring a hard-edged likelihood-constrained domain should prove to be neither more nor less demanding than exploring a likelihood-weighted space."
  — John Skilling
Sampling within an iso-likelihood contour: Previous attempts

Rejection sampling (MultiNest; F. Feroz & M. Hobson, 2009):
  - Suffers in high dimensions.

Hamiltonian sampling (F. Feroz & J. Skilling, 2013):
  - Requires gradients and tuning.

Diffusive nested sampling (B. Brewer et al., 2009):
  - Very promising.
  - Too many tuning parameters.
PolyChord: "Hit and run" slice sampling

[Figure sequence: slice sampling within the iso-likelihood contour L0.]

  1) From an initial starting point x0, choose a random direction.
  2) Place a random initial bound of width w along that direction.
  3) Extend the bounds by the "stepping out" procedure.
  4) Sample uniformly along this chord to obtain x1.
     If x1 is drawn outside of L0, contract the bounds and draw again until x1 is within L0.
  5) Repeat this procedure, starting from x1, to generate x2, x3, x4, ..., xN (a sketch follows below).
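A sketch of a single such update in Python, with the hard constraint log L(x) > log L0 standing in for the contour L0; the width w, the function names, and the toy interface are assumptions made for illustration, not PolyChord's actual implementation.

```python
import numpy as np

def slice_step(x0, loglike, logL0, direction, w=1.0, rng=np.random.default_rng()):
    """One 'hit and run' slice-sampling step within the region loglike(x) > logL0."""
    # 2) Random initial bound of width w around x0, along the chosen direction.
    r = rng.uniform()
    left, right = -r * w, (1.0 - r) * w
    # 3) Step out until both chord ends lie outside the contour.
    while loglike(x0 + left * direction) > logL0:
        left -= w
    while loglike(x0 + right * direction) > logL0:
        right += w
    # 4) Sample uniformly along the chord, contracting the bounds on rejection.
    while True:
        s = rng.uniform(left, right)
        x1 = x0 + s * direction
        if loglike(x1) > logL0:
            return x1
        if s < 0:
            left = s
        else:
            right = s

# 5) Repeat with fresh random directions, e.g.:
# for _ in range(N):
#     d = rng.standard_normal(x.size); d /= np.linalg.norm(d)
#     x = slice_step(x, loglike, logL0, d)
```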
PolyChord: Key points

- This procedure satisfies detailed balance.
- Works even if the L0 contour is disjoint.
- Need N reasonably large, ∼ O(ndims), so that xN is de-correlated from x0.
Issues with slice sampling

1. Does not deal well with correlated distributions.
2. Need to "tune" the w parameter.
PolyChord's solutions: Correlated distributions

[Figure: a contour from a correlated distribution in x is mapped by the affine transformation y = Lx to a contour with dimensions ∼ O(1).]
PolyChord's solutions: Correlated distributions

- We make an affine transformation to remove degeneracies and "whiten" the space.
- Samples remain uniformly sampled.
- We use the covariance matrix of the live points and all inter-chain points.
- The Cholesky decomposition is the required skew transformation (a sketch follows below).
- w = 1 in this transformed space.
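A sketch of that whitening step; for simplicity it uses only the live points, whereas the talk also folds in inter-chain points.

```python
import numpy as np

def whitening_transform(live_points):
    """Cholesky-based affine map taking the live-point cloud to roughly unit scale."""
    mean = live_points.mean(axis=0)
    cov = np.cov(live_points, rowvar=False)   # empirical covariance of the live points
    L = np.linalg.cholesky(cov)               # cov = L @ L.T
    Linv = np.linalg.inv(L)
    to_white = lambda x: Linv @ (x - mean)    # whitened coordinates, scale ~ O(1)
    from_white = lambda y: L @ y + mean       # map back to the original space
    return to_white, from_white

# Slice sampling with w = 1 is then run in the whitened coordinates,
# and accepted points are mapped back with from_white.
```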
PolyChord's additions

- Parallelised up to the number of live points with OpenMPI.
- Novel method for identifying and evolving modes separately.
- Implemented in CosmoMC, as "CosmoChord", with fast-slow parameters.
PolyChord in action: Primordial power spectrum P_R(k) reconstruction

[Figure sequence: log P_R(k) against log k. Starting from the standard power law A_s (k/k_*)^(n_s − 1), the spectrum is generalised to a reconstruction specified by movable knots (k1, P1), (k2, P2), ..., (k_Nknots, P_Nknots).]
Planck data: Primordial power spectrum P_R(k) reconstruction

- Temperature data TT+lowP.
- Foreground (14) & cosmological (4 + 2·N_knots − 2) parameters.
- Marginalised plots of P_R(k):

  P(P_R | k, N_knots) = ∫ δ(P_R − f(k; θ)) P(θ) dθ
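One simple way to build such a marginalised plot from equally weighted posterior samples, as a sketch: evaluate the reconstruction f(k; θ) for every sample and take quantiles at each k. The function and the equal-weighting assumption are mine; nested-sampling output would first need to be re-weighted or resampled.

```python
import numpy as np

def marginalised_band(k_grid, samples, f, quantiles=(0.025, 0.16, 0.5, 0.84, 0.975)):
    """Summarise P(P_R | k) by quantiles of f(k; theta) over equally weighted samples."""
    # One reconstructed curve per posterior sample, shape (nsamples, nk).
    curves = np.array([f(k_grid, theta) for theta in samples])
    # Quantile bands at each k, e.g. for shading 1- and 2-sigma regions.
    return {q: np.quantile(curves, q, axis=0) for q in quantiles}
```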
0-8 internal knots: Primordial power spectrum P_R(k) reconstruction

[Figure sequence, one panel per number of internal knots (0 through 8): the reconstructed log(10^10 P_R) against wavenumber k (10^-4 to 10^-1), with an upper axis in multipole ℓ (10 to 1000) and confidence shading from 0σ to 3σ.]
Bayes factors: Primordial power spectrum P_R(k) reconstruction

[Figure: Bayes factor B_{0,N} with respect to ΛCDM against the number of internal knots N (0 to 8); the vertical axis spans roughly −4 to 0.5.]
Marginalised plot: Primordial power spectrum P_R(k) reconstruction

[Figure: the knot-marginalised reconstruction of log(10^10 P_R) against wavenumber k (10^-4 to 10^-1), with an upper axis in multipole ℓ (10 to 1000) and 0σ-3σ confidence shading.]
Object detection: Toy problem
Object detection: Evidences

- log Z ratio: −251 : −156 : −114 : −117 : −136
- odds ratio: 10^−60 : 10^−19 : 1 : 0.04 : 10^−10
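These odds follow directly from the evidences, assuming equal prior odds between the candidate models: subtract the largest log Z and exponentiate. A quick check:

```python
import numpy as np

logZ = np.array([-251.0, -156.0, -114.0, -117.0, -136.0])
odds = np.exp(logZ - logZ.max())   # odds relative to the best model, equal priors assumed
print(odds)
# Roughly reproduces the ratios quoted above; small differences come from
# the rounding of the log Z values on the slide.
```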
PolyChord vs. MultiNest: Gaussian likelihood

[Figures: number of likelihood evaluations N_L (10^2 to 10^10, log scale) against number of dimensions D (1 to 256, then 1 to 512), comparing PolyChord with MultiNest in the first panel and PolyChord 2.0, PolyChord 1.0, and MultiNest in the second.]
Conclusions: The future of nested sampling

- We are at the beginning of a new era of sampling algorithms.
- Plenty more work to be done in exploring new versions of nested sampling.
- Nested sampling is just the beginning.
- arXiv:1506.00171
- http://ccpforge.cse.rl.ac.uk/gf/project/polychord/