Approximate Bayesian Computation using
Indirect Inference
Chris Drovandi ([email protected])
Acknowledgement: Prof Tony Pettitt and Prof Malcolm Faddy
School of Mathematical Sciences, Queensland University of Technology
June 26, 2012
Approximate Bayesian Computation (ABC)
Bayesian statistics involves inference based on the posterior distribution
π(θ|y) ∝ f(y|θ)π(θ).
What happens when the likelihood f(y|θ) is unavailable?
ABC instead uses model simulations and compares simulated with observed data
Naive Algorithm - Rejection Sampling
1. Sample θ ∼ π(·)
2. Simulate x ∼ f(·|θ)
3. Accept θ if ρ(y, x) ≤ ǫ
Repeat the above until we have N draws, θ1, . . . , θN
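To make the rejection sampler concrete, here is a minimal Python sketch on a hypothetical Gaussian toy model; the prior, simulator, summaries and tolerance below are illustrative assumptions, not choices from this talk.

```python
# Minimal ABC rejection sampling sketch on a toy Gaussian model.
import numpy as np

rng = np.random.default_rng(1)
y_obs = rng.normal(loc=2.0, scale=1.0, size=100)   # stand-in "observed" data

def prior_sample():                  # theta ~ pi(.)
    return rng.uniform(-5.0, 5.0)

def simulate(theta):                 # x ~ f(.|theta)
    return rng.normal(loc=theta, scale=1.0, size=100)

def summary(data):                   # S(.) = (S1(.), ..., Sp(.))
    return np.array([data.mean(), data.std()])

def rho(y, x):                       # rho(y, x) = ||S(y) - S(x)||
    return np.linalg.norm(summary(y) - summary(x))

def abc_rejection(N, eps):
    draws = []
    while len(draws) < N:
        theta = prior_sample()
        x = simulate(theta)
        if rho(y_obs, x) <= eps:     # accept theta if simulation is close
            draws.append(theta)
    return np.array(draws)

posterior_draws = abc_rejection(N=1000, eps=0.2)
```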
The Approximate Posterior
Involves a joint ‘approximate’ posterior distribution
π(θ, x|y, ǫ) ∝ g(y|x, ǫ)f(x|θ)π(θ)
g(y|x, ǫ) is a weighting function (Reeves and Pettitt, 2005).
How do we compare y and x? Directly? Usually too high dimensional.
Typically use statistics to summarise the data, S(·) = (S1(·), . . . , Sp(·)) (low dimensional), and set
ρ(y, x) = ‖S(y) − S(x)‖
One popular choice is to set
g(y|x, ǫ) = 1(ρ(y, x) ≤ ǫ)
Choice of ǫ is a trade-off between accuracy and efficiency
The Approximation and Summary Statistic Choice
Errors arise from insufficient summaries and from ǫ > 0 (see Fearnhead and Prangle (2012))
An efficient algorithm permits a low tolerance
Thus summary statistic choice is vital for good approximation
Approaches for Summary Statistics
Full data (Barthelme and Chopin (2011), White et al (2012))
Data reduction techniques (Blum et al (2012))
Choose subset out of larger set (Nunes and Balding (2010), Barnes et al (2012))
Indirect inference (motivated by application in this talk)
Advantage of indirect inference approach: knowledge that summary statistics are ‘good’ can be obtained prior to ABC analysis
ABC Algorithms
Rejection Sampling (above) (Pritchard et al, 1999 and Beaumont et al, 2002)
Advantage: simplicity, independent draws
Disadvantage: acceptance rate can be prohibitively low
MCMC ABC (Marjoram et al, 2003)
Carefully chosen proposal q(θ∗, x∗|θ, x) = q(θ∗|θ)f(x∗|θ∗), so the intractable likelihoods cancel in the acceptance ratio (a sketch follows after this list)
Advantage: efficient relative to rejection sampling
Disadvantage: still quite inefficient, can get stuck, struggles with multi-modal targets
SMC ABC (Sisson et al (2007), Toni et al (2009), Drovandiand Pettitt (2011), Beaumont et al (2009), Del Moral et al(2012))
SMC Advantages: Very efficient, adaptive
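For illustration, a minimal sketch of an ABC-MCMC kernel in the spirit of Marjoram et al (2003); the Gaussian random-walk proposal and the user-supplied prior_logpdf, simulate and dist functions are assumptions of the sketch, not details from the talk.

```python
# ABC-MCMC sketch: accept only if the simulation matches within eps,
# so the intractable likelihoods cancel from the MH ratio.
import numpy as np

def mcmc_abc(theta0, n_iter, eps, prior_logpdf, simulate, dist,
             step=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    chain = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.size)
        # symmetric proposal: MH ratio reduces to the prior ratio,
        # gated by the ABC matching condition dist(x) <= eps
        if dist(simulate(prop)) <= eps and \
           np.log(rng.uniform()) < prior_logpdf(prop) - prior_logpdf(theta):
            theta = prop
        chain[i] = theta                 # duplicate theta on rejection
    return chain
```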
Motivating Application - Macroparasite Immunity
Estimate parameters of a Markov process model explaining macroparasite population development with host immunity
212 hosts (cats), i = 1, . . . , 212. Each cat injected with li juvenile Brugia pahangi larvae (approximately 100 or 200).
At time ti the host is sacrificed and the number of mature parasites is recorded
Host assumed to develop an immunity
Three variable problem: M(t) matures, L(t) juveniles, I(t) immunity.
Only L(0) and M(ti) are observed for each host
[Figure: observed proportion of matures versus sacrifice time for each host]
Trivariate Markov Process of Riley et al (2003)
[Diagram: compartments for mature parasites M(t), juvenile parasites L(t) and host immunity I(t), all three invisible (unobserved) over time. Transitions: maturation at rate γL(t); natural death of juveniles at rate µLL(t); death of juveniles due to immunity at rate βI(t)L(t); gain of immunity at rate νL(t); loss of immunity at rate µII(t); natural death of matures at rate µMM(t).]
The Model and Intractable Likelihood
Deterministic form of the model:
dL/dt = −µLL − βIL − γL,
dM/dt = γL − µMM,
dI/dt = νL − µII,
γ and µM fixed; ν, µL, µI, β require estimation
Likelihood-based inference appears intractable.
Could form complete likelihood with missing larvae and immunity (too much missing data)
Matrix exponential form of the likelihood (matrix too large)
Simulation from the model is straightforward using Gillespie’s algorithm (Gillespie, 1977)
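As a concrete illustration, a sketch of Gillespie's algorithm for this trivariate process, with event rates taken from the diagram above; the parameter values in the usage comment are placeholders, not estimates.

```python
# Gillespie simulation of one host: events and rates follow the diagram
# (maturation gamma*L; juvenile deaths mu_L*L and beta*I*L; immunity
# gain nu*L and loss mu_I*I; mature death mu_M*M).
import numpy as np

def simulate_host(L0, t_end, nu, mu_L, mu_I, beta,
                  gamma=0.04, mu_M=0.0015, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    L, M, I = L0, 0, 0
    t = 0.0
    while t < t_end:
        rates = np.array([
            gamma * L,        # maturation:        L -> L-1, M -> M+1
            mu_L * L,         # natural death:     L -> L-1
            beta * I * L,     # death by immunity: L -> L-1
            nu * L,           # gain of immunity:  I -> I+1
            mu_I * I,         # loss of immunity:  I -> I-1
            mu_M * M,         # natural death:     M -> M-1
        ])
        total = rates.sum()
        if total == 0.0:
            break                          # absorbed: no event possible
        t += rng.exponential(1.0 / total)  # time to next event
        if t >= t_end:
            break
        event = rng.choice(6, p=rates / total)
        if event == 0: L -= 1; M += 1
        elif event in (1, 2): L -= 1
        elif event == 3: I += 1
        elif event == 4: I -= 1
        else: M -= 1
    return M                               # only M(t_i) is observed

# e.g. one host injected with 100 larvae, sacrificed at t = 300
# (placeholder parameter values):
# m = simulate_host(100, 300.0, nu=1e-3, mu_L=0.01, mu_I=1e-3, beta=1e-3)
```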
Developing Summary Statistics
Need to develop summary statistics that efficiently summarise the data!
Covariates: number of juveniles injected and sacrifice time.
An approach based on indirect inference (Gourieroux et al., 1993)
Propose an auxiliary model pa(y|θa) whose parameter θa is easily estimable
Auxiliary model is flexible enough to provide a good description of the data
Simulate xθ from the target intractable likelihood p(·|θ) and find θa(xθ)
Estimate θ using the θa(xθ) closest to θa(y)
Estimates of the parameters of the auxiliary model fitted to the data become the summary statistics in ABC (see the sketch below).
Compare and contrast models based on the Beta-Binomial (BB) and Binomial mixture (BM).
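A small sketch of this pipeline for generating ABC summaries; aux_loglike is a stand-in for an auxiliary log-likelihood such as the Beta-Binomial or Binomial mixture models defined below.

```python
# Indirect inference summaries: fit the auxiliary model by maximum
# likelihood and use the fitted parameters as the ABC summary statistics.
import numpy as np
from scipy.optimize import minimize

def aux_mle(data, theta_a0, aux_loglike):
    """Auxiliary-model MLE theta_a(data), used as the summary statistic."""
    res = minimize(lambda th: -aux_loglike(th, data), np.asarray(theta_a0),
                   method="Nelder-Mead")
    return res.x

# In ABC: S(y) = aux_mle(y, start, aux_loglike), and each simulated
# dataset x gets S(x) = aux_mle(x, start, aux_loglike).
```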
Summary statistics
For each model, compare the auxiliary estimates for simulated data x, θa(x), and observations y, θa(y), with the Mahalanobis distance
ρ(y, x) = ρ(θa(y), θa(x)) = √{(θa(y) − θa(x))ᵀ S⁻¹ (θa(y) − θa(x))},
where S is the covariance matrix of the MLE
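A direct translation of this distance into code; here S is assumed to be an estimate of the MLE covariance, e.g. the inverse observed information at θa(y).

```python
# Mahalanobis distance between auxiliary MLEs.
import numpy as np

def mahalanobis(theta_a_y, theta_a_x, S):
    """rho(y, x), with S the covariance matrix of the auxiliary MLE."""
    d = np.asarray(theta_a_y) - np.asarray(theta_a_x)
    return float(np.sqrt(d @ np.linalg.solve(S, d)))
```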
Auxiliary Beta-Binomial model
The data show too much variation for a Binomial model
A Beta-Binomial model has an extra parameter to capture the overdispersion
p(mi | αi, βi) = (li choose mi) B(mi + αi, li − mi + βi) / B(αi, βi),
Useful reparameterisation: pi = αi/(αi + βi) and θi = 1/(αi + βi)
Relate the proportion and overdispersion parameters to the time, ti, and initial larvae, li, covariates:
logit(pi) = β0 + β1 log(ti) + β2(log(ti))²,
log(θi) = η100 if li ≈ 100; η200 if li ≈ 200
Five parameters: (β0, β1, β2, η100, η200)
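A sketch of this auxiliary Beta-Binomial log-likelihood under the reparameterisation above, with αi = pi/θi and βi = (1 − pi)/θi; the threshold l < 150 used to split the two dose groups is an assumption for illustration.

```python
# Auxiliary Beta-Binomial log-likelihood with covariate regression.
import numpy as np
from scipy.special import betaln, gammaln

def log_choose(n, k):                      # log binomial coefficient
    return gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)

def bb_loglike(params, m, l, t):
    """params = (b0, b1, b2, eta100, eta200); m, l, t are data arrays."""
    b0, b1, b2, eta100, eta200 = params
    lt = np.log(t)
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * lt + b2 * lt**2)))     # logit link
    theta = np.where(l < 150, np.exp(eta100), np.exp(eta200))  # dose group (assumed split)
    a, b = p / theta, (1.0 - p) / theta
    return np.sum(log_choose(l, m) + betaln(m + a, l - m + b) - betaln(a, b))
```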
Auxiliary Binomial Mixture model
An auxiliary model based on a three-component Binomial mixture.
The ith observation has the distribution
p(mi | Θ) = (li choose mi) Σ_{k=1}^{3} wk (θi(k))^{mi} (1 − θi(k))^{li − mi},
where w3 = 1 − w1 − w2.
Reparameterise the θi(k): logit(θi(k)) = γ0(k) + γ1 log(ti), so that each component has the same slope but a different intercept.
Six parameters, Θ = (w1, w2, γ0(1), γ0(2), γ0(3), γ1).
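A matching sketch of the Binomial mixture log-likelihood, with the common slope γ1 and component-specific intercepts as above.

```python
# Auxiliary three-component Binomial mixture log-likelihood.
import numpy as np
from scipy.special import gammaln, logsumexp

def bm_loglike(params, m, l, t):
    """params = (w1, w2, g01, g02, g03, g1); w3 = 1 - w1 - w2."""
    w1, w2, g01, g02, g03, g1 = params
    w = np.array([w1, w2, 1.0 - w1 - w2])
    lc = gammaln(l + 1) - gammaln(m + 1) - gammaln(l - m + 1)
    comp = []
    for g0 in (g01, g02, g03):       # same slope g1, different intercepts
        theta = 1.0 / (1.0 + np.exp(-(g0 + g1 * np.log(t))))
        comp.append(m * np.log(theta) + (l - m) * np.log1p(-theta))
    comp = np.stack(comp)            # shape (3, n): component log-pmfs
    return np.sum(lc + logsumexp(comp + np.log(w)[:, None], axis=0))
```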
ABC Summary Statistics from auxiliary models
How to choose between different auxiliary models?
Use data analytic tools on the original data set
Try each model/summary statistic (Nunes and Balding, 2010)
The former is far less computer intensive, but does it give the best ABC approximation?
Fit to data
Relative measure: Beta-Binomial better than Binomial mixture by 170 on the log-likelihood scale, with one fewer parameter
Absolute measure: Generalized Pearson statistics
Beta-Binomial: 213.2 on 207 d.o.f.
Binomial mixture: 278.9 on 206 d.o.f.
Plots suggest the Binomial mixture does not capture the variability of the data
Residual analysis also favours the Beta-Binomial
SMC Algorithm Settings and Posterior Results
Algorithm Settings
Take N = 1000 particles, start with 60% acceptance, discard half the particles each iteration, finish with 3% acceptance for the MCMC kernel.
Repeat the MCMC kernel so that roughly 99% of particles are moved at each iteration
Fixed values γ = 0.04, µM = 0.0015
Prior choices:
ν: U(0,1)
µL: U(0,1)
µI: U(0,2)
β: U(0,2)
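A condensed sketch of the replenishment-style SMC ABC that these settings describe, in the spirit of Drovandi and Pettitt (2011); prior_sample, prior_logpdf, simulate and dist are user-supplied placeholders, and a single MCMC pass stands in for the adaptively repeated kernel mentioned above.

```python
# Replenishment SMC ABC sketch: drop the worst half of the particles
# each iteration, lower the tolerance, and repair the sample with an
# ABC-MCMC move kernel. dist(x) = rho(y_obs, x).
import numpy as np

def smc_abc(N, eps_target, prior_sample, prior_logpdf, simulate, dist,
            rng=None):
    rng = np.random.default_rng() if rng is None else rng
    thetas = np.array([prior_sample() for _ in range(N)])   # (N, d)
    dists = np.array([dist(simulate(th)) for th in thetas])
    while True:
        order = np.argsort(dists)
        thetas, dists = thetas[order], dists[order]
        keep = N // 2
        eps = dists[keep - 1]          # new tolerance = worst kept distance
        if eps <= eps_target:
            return thetas[:keep], eps
        # replenish: resample the dropped half from the survivors
        idx = rng.integers(0, keep, size=N - keep)
        thetas[keep:], dists[keep:] = thetas[idx], dists[idx]
        # diversify duplicates with an ABC-MCMC kernel tuned to particles
        cov = np.cov(thetas[:keep], rowvar=False)
        for i in range(keep, N):
            prop = rng.multivariate_normal(thetas[i], cov)
            d = dist(simulate(prop))
            log_mh = prior_logpdf(prop) - prior_logpdf(thetas[i])
            if d <= eps and np.log(rng.uniform()) < log_mh:
                thetas[i], dists[i] = prop, d
        # (the talk repeats this kernel until each particle moves with
        #  probability ~0.99; a single pass is shown here for brevity)
```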
Posterior Densities
Posteriors Beta Binomial vs Binomial mixture
[Figure: marginal posterior densities of ν, µI, µL and β under the two auxiliary models]
Bivariate Posteriors
Posteriors Beta Binomial vs Binomial mixture
[Figure: bivariate posteriors of (ν, µI), (µL, µI), (µL, β) and (µI, β); green is BM, red is BB]
Posterior Density Differences
For ν the BB posterior is more concentrated than BM, with similar modes
For µL, similar concentration but very different modes: 0.0055 vs 0.0244.
Using the Nunes-Balding criterion for summary statistics would prefer BB to BM.
But what are the predictions from each ABC fitted model?
ABC Fit to data
95 percent prediction intervals from the stochastic model using ABC modal estimates: with the Beta-Binomial (left) and the Binomial mixture (right) auxiliary models.
[Figure: 95% prediction intervals for the proportion of matures versus autopsy time; Beta-Binomial auxiliary model (left), Binomial mixture auxiliary model (right)]
The ABC model fit based on the Beta-Binomial auxiliary model captures the variability of the data
Summary of Method
Advantages
Can be applied in non-iid settings
Appropriateness of the statistics can be assessed via data analytic techniques before the ABC analysis
Statistics can be complicated functions of the data
Disadvantages
Requires likelihood maximisation of the auxiliary model for each simulated dataset
References
Drovandi et al (2011). JRSS C 60, 317-337.
Drovandi and Pettitt (2011). Biometrics 67, 225-233.
Riley et al (2003). J. Theoret. Biol. 225, 419-430.
Heggland and Frigessi (2004). JRSS B 66, 447-462.
Nunes and Balding (2010). Stat. Appl. Genet. Mol. Biol. 9(1).