introduction to sequential monte carlo methods

55
Bayesian Scientific Computing, Spring 2013 (N. Zabaras) Introduction to Sequential Monte Carlo Methods SMC for Static Problems Online Bayesian Parameter Estimation Prof. Nicholas Zabaras Materials Process Design and Control Laboratory Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Cornell University Ithaca, NY 14853-3801 Email: [email protected] URL: http://mpdc.mae.cornell.edu/ April 27, 2013 1

Upload: others

Post on 29-Nov-2021

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Introduction to Sequential Monte Carlo Methods

SMC for Static Problems Online Bayesian Parameter Estimation

Prof Nicholas Zabaras Materials Process Design and Control Laboratory

Sibley School of Mechanical and Aerospace Engineering 101 Frank H T Rhodes Hall

Cornell University Ithaca NY 14853-3801

Email nzabarasgmailcom

URL httpmpdcmaecornelledu

April 27 2013

1

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Contents References and SMC Resources

SMC FOR STATIC PROBLEMS Important Application Problems Algorithm Choice of Reversed Kernel

ONLINE PARAMETER ESTIMATION Using MCMC Recursive MLE Parameter Estimation Importance Sampling Estimation of Sensitivity Degeneracy of the SMC Algorithm Marginal Importance Sampling for Sensitivity Estimation Examples Gibbs Sampling Strategy Marginal Metropolis Hastings Algorithm

OPEN PROBLEMS

2

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References CP Robert amp G Casella Monte Carlo Statistical Methods Chapter 11

JS Liu Monte Carlo Strategies in Scientific Computing Chapter 3 Springer-Verlag New York

A Doucet N De Freitas amp N Gordon (eds) Sequential Monte Carlo in Practice Springer-Verlag

2001

A Doucet N De Freitas NJ Gordon An introduction to Sequential Monte Carlo in SMC in Practice 2001

D Wilkison Stochastic Modelling for Systems Biology Second Edition 2006

E Ionides Inference for Nonlinear Dynamical Systems PNAS 2006

JS Liu and R Chen Sequential Monte Carlo methods for dynamic systems JASA 1998

A Doucet Sequential Monte Carlo Methods Short Course at SAMSI

A Doucet Sequential Monte Carlo Methods amp Particle Filters Resources

Pierre Del Moral Feynman-Kac models and interacting particle systems (SMC resources)

A Doucet Sequential Monte Carlo Methods Video Lectures 2007

N de Freitas and A Doucet Sequential MC Methods N de Freitas and A Doucet Video Lectures

2010

3

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References MK Pitt and N Shephard Filtering via Simulation Auxiliary Particle Filter JASA 1999

A Doucet SJ Godsill and C Andrieu On Sequential Monte Carlo sampling methods for Bayesian

filtering Stat Comp 2000

J Carpenter P Clifford and P Fearnhead An Improved Particle Filter for Non-linear Problems IEE 1999

A Kong JS Liu amp WH Wong Sequential Imputations and Bayesian Missing Data Problems JASA 1994

O Cappe E Moulines amp T Ryden Inference in Hidden Markov Models Springer-Verlag 2005

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

G Poyadjis A Doucet and SS Singh Maximum Likelihood Parameter Estimation using Particle Methods Joint Statistical Meeting 2005

N Gordon D J Salmond AFM Smith Novel Approach to nonlinear non Gaussian Bayesian state estimation IEE 1993

Particle Filters S Godsill 2009 (Video Lectures)

R Chen and JS Liu Predictive Updating Methods with Application to Bayesian Classification JRSS B 1996

4

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References C Andrieu and A Doucet Particle Filtering for Partially Observed Gaussian State-Space Models

JRSS B 2002

R Chen and J Liu Mixture Kalman Filters JRSSB 2000

A Doucet S J Godsill C Andrieu On SMC sampling methods for Bayesian Filtering Stat Comp 2000

N Kantas AD SS Singh and JM Maciejowski An overview of sequential Monte Carlo methods for parameter estimation in general state-space models in Proceedings IFAC System Identification (SySid) Meeting 2009

C Andrieu ADoucet amp R Holenstein Particle Markov chain Monte Carlo methods JRSS B 2010

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999

P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002

G Storvik Particle filters for state-space models with the presence of unknown static parameters IEEE Trans Signal Processing 2002

5

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References C Andrieu A Doucet and VB Tadic Online EM for parameter estimation in nonlinear-non

Gaussian state-space models Proc IEEE CDC 2005

G Poyadjis A Doucet and SS Singh Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation Biometrika 2011

C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for

Genome Wide Transcript Analysis Technical report 2008

R Martinez-Cantin J Castellanos and N de Freitas Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm Marginal-SLAM International Conference on Robotics and Automation

C Andrieu AD amp R Holenstein Particle Markov chain Monte Carlo methods (with discussion) JRSS B 2010

A Doucet Sequential Monte Carlo Methods and Particle Filters List of Papers Codes and Viedo lectures on SMC and particle filters

Pierre Del Moral Feynman-Kac models and interacting particle systems

6

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References P Del Moral A Doucet and A Jasra Sequential Monte Carlo samplers JRSSB 2006

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian

Statistics 2006

P Del Moral A Doucet amp SS Singh Forward Smoothing using Sequential Monte Carlo technical report Cambridge University 2009

A Doucet Short Courses Lecture Notes (A B C) P Del Moral Feynman-Kac Formulae Springer-Verlag 2004

Sequential MC Methods M Davy 2007 A Doucet A Johansen Particle Filtering and Smoothing Fifteen years later in Handbook of

Nonlinear Filtering (edts D Crisan and B Rozovsky) Oxford Univ Press 2011

A Johansen and A Doucet A Note on Auxiliary Particle Filters Stat Proba Letters 2008

A Doucet et al Efficient Block Sampling Strategies for Sequential Monte Carlo (with M Briers amp S Senecal) JCGS 2006

C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis Stat Comput 2011

7

Bayesian Scientific Computing Spring 2013 (N Zabaras) 8

SEQUENTIAL MONTE CARLO METHODS

FOR STATIC PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Classical Bayesian Inference

Global Optimization

It is not clear how SMC can be related to this problem since Xn=X instead of Xn=Xn

1( ) ( | )n nx p xπ = y( ) ( )n x xπ π=

[ ]( ) ( ) n

n nx x γπ π γprop rarr infin

9

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Global Optimization

Sampling from a fixed distribution where micro(x) is an easy to sample from distribution Use a sequence

1( ) ( | )n nx p xπ = y

[ ]( ) ( ) n

n nx x ηπ π ηprop rarr infin

10

( ) ( )1( ) ( ) ( )n n

n x x xη ηπ micro π minusprop

1 2 11 0 ( ) ( ) ( ) ( )Final Finali e x x x xη η η π micro π π= gt gt gt = prop prop

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Rare Event Simulation

The required probability is then

The Boolean Satisfiability Problem

Computing Matrix Permanents

Computing volumes in high dimensions

11

1

1 2

( ) 1 ( ) ( )1 ( )

nn n E

Final

A x x x Normalizing Factor Z known

Simulate with sequence E E E A

π π πltlt prop =

= sup sup sup =X

( )FinalZ Aπ=

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

Apply SMC to an augmented sequence of distributions For example

1

nn

nonπ

geX

( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x

( ) ( ) ( )1

1 1 1 1 |

n

n nn n n n n n

Use any conditionaldistribution on

x x xπ π π

minus

minus minus=

x x

X

12

bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible

Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Let be a sequence of probability distributions defined on X

st each is known up to a normalizing constant

We are interested to sample from and compute Zn sequentially

This is not the same as the standard SMC discussed earlier which was defined on Xn

13

( ) 1n nxπ

ge

( )n xπ

( )

( )

1n n nUnknown Known

x Z xπ γ=

( )n xπ

( ) ( )1 1 1|n n n npπ =x x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 2: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Contents References and SMC Resources

SMC FOR STATIC PROBLEMS Important Application Problems Algorithm Choice of Reversed Kernel

ONLINE PARAMETER ESTIMATION Using MCMC Recursive MLE Parameter Estimation Importance Sampling Estimation of Sensitivity Degeneracy of the SMC Algorithm Marginal Importance Sampling for Sensitivity Estimation Examples Gibbs Sampling Strategy Marginal Metropolis Hastings Algorithm

OPEN PROBLEMS

2

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References CP Robert amp G Casella Monte Carlo Statistical Methods Chapter 11

JS Liu Monte Carlo Strategies in Scientific Computing Chapter 3 Springer-Verlag New York

A Doucet N De Freitas amp N Gordon (eds) Sequential Monte Carlo in Practice Springer-Verlag

2001

A Doucet N De Freitas NJ Gordon An introduction to Sequential Monte Carlo in SMC in Practice 2001

D Wilkison Stochastic Modelling for Systems Biology Second Edition 2006

E Ionides Inference for Nonlinear Dynamical Systems PNAS 2006

JS Liu and R Chen Sequential Monte Carlo methods for dynamic systems JASA 1998

A Doucet Sequential Monte Carlo Methods Short Course at SAMSI

A Doucet Sequential Monte Carlo Methods amp Particle Filters Resources

Pierre Del Moral Feynman-Kac models and interacting particle systems (SMC resources)

A Doucet Sequential Monte Carlo Methods Video Lectures 2007

N de Freitas and A Doucet Sequential MC Methods N de Freitas and A Doucet Video Lectures

2010

3

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References MK Pitt and N Shephard Filtering via Simulation Auxiliary Particle Filter JASA 1999

A Doucet SJ Godsill and C Andrieu On Sequential Monte Carlo sampling methods for Bayesian

filtering Stat Comp 2000

J Carpenter P Clifford and P Fearnhead An Improved Particle Filter for Non-linear Problems IEE 1999

A Kong JS Liu amp WH Wong Sequential Imputations and Bayesian Missing Data Problems JASA 1994

O Cappe E Moulines amp T Ryden Inference in Hidden Markov Models Springer-Verlag 2005

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

G Poyadjis A Doucet and SS Singh Maximum Likelihood Parameter Estimation using Particle Methods Joint Statistical Meeting 2005

N Gordon D J Salmond AFM Smith Novel Approach to nonlinear non Gaussian Bayesian state estimation IEE 1993

Particle Filters S Godsill 2009 (Video Lectures)

R Chen and JS Liu Predictive Updating Methods with Application to Bayesian Classification JRSS B 1996

4

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References C Andrieu and A Doucet Particle Filtering for Partially Observed Gaussian State-Space Models

JRSS B 2002

R Chen and J Liu Mixture Kalman Filters JRSSB 2000

A Doucet S J Godsill C Andrieu On SMC sampling methods for Bayesian Filtering Stat Comp 2000

N Kantas AD SS Singh and JM Maciejowski An overview of sequential Monte Carlo methods for parameter estimation in general state-space models in Proceedings IFAC System Identification (SySid) Meeting 2009

C Andrieu ADoucet amp R Holenstein Particle Markov chain Monte Carlo methods JRSS B 2010

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999

P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002

G Storvik Particle filters for state-space models with the presence of unknown static parameters IEEE Trans Signal Processing 2002

5

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References C Andrieu A Doucet and VB Tadic Online EM for parameter estimation in nonlinear-non

Gaussian state-space models Proc IEEE CDC 2005

G Poyadjis A Doucet and SS Singh Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation Biometrika 2011

C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for

Genome Wide Transcript Analysis Technical report 2008

R Martinez-Cantin J Castellanos and N de Freitas Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm Marginal-SLAM International Conference on Robotics and Automation

C Andrieu AD amp R Holenstein Particle Markov chain Monte Carlo methods (with discussion) JRSS B 2010

A Doucet Sequential Monte Carlo Methods and Particle Filters List of Papers Codes and Viedo lectures on SMC and particle filters

Pierre Del Moral Feynman-Kac models and interacting particle systems

6

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References P Del Moral A Doucet and A Jasra Sequential Monte Carlo samplers JRSSB 2006

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian

Statistics 2006

P Del Moral A Doucet amp SS Singh Forward Smoothing using Sequential Monte Carlo technical report Cambridge University 2009

A Doucet Short Courses Lecture Notes (A B C) P Del Moral Feynman-Kac Formulae Springer-Verlag 2004

Sequential MC Methods M Davy 2007 A Doucet A Johansen Particle Filtering and Smoothing Fifteen years later in Handbook of

Nonlinear Filtering (edts D Crisan and B Rozovsky) Oxford Univ Press 2011

A Johansen and A Doucet A Note on Auxiliary Particle Filters Stat Proba Letters 2008

A Doucet et al Efficient Block Sampling Strategies for Sequential Monte Carlo (with M Briers amp S Senecal) JCGS 2006

C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis Stat Comput 2011

7

Bayesian Scientific Computing Spring 2013 (N Zabaras) 8

SEQUENTIAL MONTE CARLO METHODS

FOR STATIC PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Classical Bayesian Inference

Global Optimization

It is not clear how SMC can be related to this problem since Xn=X instead of Xn=Xn

1( ) ( | )n nx p xπ = y( ) ( )n x xπ π=

[ ]( ) ( ) n

n nx x γπ π γprop rarr infin

9

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Global Optimization

Sampling from a fixed distribution where micro(x) is an easy to sample from distribution Use a sequence

1( ) ( | )n nx p xπ = y

[ ]( ) ( ) n

n nx x ηπ π ηprop rarr infin

10

( ) ( )1( ) ( ) ( )n n

n x x xη ηπ micro π minusprop

1 2 11 0 ( ) ( ) ( ) ( )Final Finali e x x x xη η η π micro π π= gt gt gt = prop prop

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Rare Event Simulation

The required probability is then

The Boolean Satisfiability Problem

Computing Matrix Permanents

Computing volumes in high dimensions

11

1

1 2

( ) 1 ( ) ( )1 ( )

nn n E

Final

A x x x Normalizing Factor Z known

Simulate with sequence E E E A

π π πltlt prop =

= sup sup sup =X

( )FinalZ Aπ=

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

Apply SMC to an augmented sequence of distributions For example

1

nn

nonπ

geX

( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x

( ) ( ) ( )1

1 1 1 1 |

n

n nn n n n n n

Use any conditionaldistribution on

x x xπ π π

minus

minus minus=

x x

X

12

bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible

Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Let be a sequence of probability distributions defined on X

st each is known up to a normalizing constant

We are interested to sample from and compute Zn sequentially

This is not the same as the standard SMC discussed earlier which was defined on Xn

13

( ) 1n nxπ

ge

( )n xπ

( )

( )

1n n nUnknown Known

x Z xπ γ=

( )n xπ

( ) ( )1 1 1|n n n npπ =x x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 3: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References CP Robert amp G Casella Monte Carlo Statistical Methods Chapter 11

JS Liu Monte Carlo Strategies in Scientific Computing Chapter 3 Springer-Verlag New York

A Doucet N De Freitas amp N Gordon (eds) Sequential Monte Carlo in Practice Springer-Verlag

2001

A Doucet N De Freitas NJ Gordon An introduction to Sequential Monte Carlo in SMC in Practice 2001

D Wilkison Stochastic Modelling for Systems Biology Second Edition 2006

E Ionides Inference for Nonlinear Dynamical Systems PNAS 2006

JS Liu and R Chen Sequential Monte Carlo methods for dynamic systems JASA 1998

A Doucet Sequential Monte Carlo Methods Short Course at SAMSI

A Doucet Sequential Monte Carlo Methods amp Particle Filters Resources

Pierre Del Moral Feynman-Kac models and interacting particle systems (SMC resources)

A Doucet Sequential Monte Carlo Methods Video Lectures 2007

N de Freitas and A Doucet Sequential MC Methods N de Freitas and A Doucet Video Lectures

2010

3

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References MK Pitt and N Shephard Filtering via Simulation Auxiliary Particle Filter JASA 1999

A Doucet SJ Godsill and C Andrieu On Sequential Monte Carlo sampling methods for Bayesian

filtering Stat Comp 2000

J Carpenter P Clifford and P Fearnhead An Improved Particle Filter for Non-linear Problems IEE 1999

A Kong JS Liu amp WH Wong Sequential Imputations and Bayesian Missing Data Problems JASA 1994

O Cappe E Moulines amp T Ryden Inference in Hidden Markov Models Springer-Verlag 2005

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

G Poyadjis A Doucet and SS Singh Maximum Likelihood Parameter Estimation using Particle Methods Joint Statistical Meeting 2005

N Gordon D J Salmond AFM Smith Novel Approach to nonlinear non Gaussian Bayesian state estimation IEE 1993

Particle Filters S Godsill 2009 (Video Lectures)

R Chen and JS Liu Predictive Updating Methods with Application to Bayesian Classification JRSS B 1996

4

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References C Andrieu and A Doucet Particle Filtering for Partially Observed Gaussian State-Space Models

JRSS B 2002

R Chen and J Liu Mixture Kalman Filters JRSSB 2000

A Doucet S J Godsill C Andrieu On SMC sampling methods for Bayesian Filtering Stat Comp 2000

N Kantas AD SS Singh and JM Maciejowski An overview of sequential Monte Carlo methods for parameter estimation in general state-space models in Proceedings IFAC System Identification (SySid) Meeting 2009

C Andrieu ADoucet amp R Holenstein Particle Markov chain Monte Carlo methods JRSS B 2010

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999

P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002

G Storvik Particle filters for state-space models with the presence of unknown static parameters IEEE Trans Signal Processing 2002

5

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References C Andrieu A Doucet and VB Tadic Online EM for parameter estimation in nonlinear-non

Gaussian state-space models Proc IEEE CDC 2005

G Poyadjis A Doucet and SS Singh Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation Biometrika 2011

C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for

Genome Wide Transcript Analysis Technical report 2008

R Martinez-Cantin J Castellanos and N de Freitas Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm Marginal-SLAM International Conference on Robotics and Automation

C Andrieu AD amp R Holenstein Particle Markov chain Monte Carlo methods (with discussion) JRSS B 2010

A Doucet Sequential Monte Carlo Methods and Particle Filters List of Papers Codes and Viedo lectures on SMC and particle filters

Pierre Del Moral Feynman-Kac models and interacting particle systems

6

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References P Del Moral A Doucet and A Jasra Sequential Monte Carlo samplers JRSSB 2006

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian

Statistics 2006

P Del Moral A Doucet amp SS Singh Forward Smoothing using Sequential Monte Carlo technical report Cambridge University 2009

A Doucet Short Courses Lecture Notes (A B C) P Del Moral Feynman-Kac Formulae Springer-Verlag 2004

Sequential MC Methods M Davy 2007 A Doucet A Johansen Particle Filtering and Smoothing Fifteen years later in Handbook of

Nonlinear Filtering (edts D Crisan and B Rozovsky) Oxford Univ Press 2011

A Johansen and A Doucet A Note on Auxiliary Particle Filters Stat Proba Letters 2008

A Doucet et al Efficient Block Sampling Strategies for Sequential Monte Carlo (with M Briers amp S Senecal) JCGS 2006

C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis Stat Comput 2011

7

Bayesian Scientific Computing Spring 2013 (N Zabaras) 8

SEQUENTIAL MONTE CARLO METHODS

FOR STATIC PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Classical Bayesian Inference

Global Optimization

It is not clear how SMC can be related to this problem since Xn=X instead of Xn=Xn

1( ) ( | )n nx p xπ = y( ) ( )n x xπ π=

[ ]( ) ( ) n

n nx x γπ π γprop rarr infin

9

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Global Optimization

Sampling from a fixed distribution where micro(x) is an easy to sample from distribution Use a sequence

1( ) ( | )n nx p xπ = y

[ ]( ) ( ) n

n nx x ηπ π ηprop rarr infin

10

( ) ( )1( ) ( ) ( )n n

n x x xη ηπ micro π minusprop

1 2 11 0 ( ) ( ) ( ) ( )Final Finali e x x x xη η η π micro π π= gt gt gt = prop prop

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Rare Event Simulation

The required probability is then

The Boolean Satisfiability Problem

Computing Matrix Permanents

Computing volumes in high dimensions

11

1

1 2

( ) 1 ( ) ( )1 ( )

nn n E

Final

A x x x Normalizing Factor Z known

Simulate with sequence E E E A

π π πltlt prop =

= sup sup sup =X

( )FinalZ Aπ=

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

Apply SMC to an augmented sequence of distributions For example

1

nn

nonπ

geX

( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x

( ) ( ) ( )1

1 1 1 1 |

n

n nn n n n n n

Use any conditionaldistribution on

x x xπ π π

minus

minus minus=

x x

X

12

bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible

Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Let be a sequence of probability distributions defined on X

st each is known up to a normalizing constant

We are interested to sample from and compute Zn sequentially

This is not the same as the standard SMC discussed earlier which was defined on Xn

13

( ) 1n nxπ

ge

( )n xπ

( )

( )

1n n nUnknown Known

x Z xπ γ=

( )n xπ

( ) ( )1 1 1|n n n npπ =x x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 4: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References MK Pitt and N Shephard Filtering via Simulation Auxiliary Particle Filter JASA 1999

A Doucet SJ Godsill and C Andrieu On Sequential Monte Carlo sampling methods for Bayesian

filtering Stat Comp 2000

J Carpenter P Clifford and P Fearnhead An Improved Particle Filter for Non-linear Problems IEE 1999

A Kong JS Liu amp WH Wong Sequential Imputations and Bayesian Missing Data Problems JASA 1994

O Cappe E Moulines amp T Ryden Inference in Hidden Markov Models Springer-Verlag 2005

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

G Poyadjis A Doucet and SS Singh Maximum Likelihood Parameter Estimation using Particle Methods Joint Statistical Meeting 2005

N Gordon D J Salmond AFM Smith Novel Approach to nonlinear non Gaussian Bayesian state estimation IEE 1993

Particle Filters S Godsill 2009 (Video Lectures)

R Chen and JS Liu Predictive Updating Methods with Application to Bayesian Classification JRSS B 1996

4

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References C Andrieu and A Doucet Particle Filtering for Partially Observed Gaussian State-Space Models

JRSS B 2002

R Chen and J Liu Mixture Kalman Filters JRSSB 2000

A Doucet S J Godsill C Andrieu On SMC sampling methods for Bayesian Filtering Stat Comp 2000

N Kantas AD SS Singh and JM Maciejowski An overview of sequential Monte Carlo methods for parameter estimation in general state-space models in Proceedings IFAC System Identification (SySid) Meeting 2009

C Andrieu ADoucet amp R Holenstein Particle Markov chain Monte Carlo methods JRSS B 2010

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999

P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002

G Storvik Particle filters for state-space models with the presence of unknown static parameters IEEE Trans Signal Processing 2002

5

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References C Andrieu A Doucet and VB Tadic Online EM for parameter estimation in nonlinear-non

Gaussian state-space models Proc IEEE CDC 2005

G Poyadjis A Doucet and SS Singh Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation Biometrika 2011

C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for

Genome Wide Transcript Analysis Technical report 2008

R Martinez-Cantin J Castellanos and N de Freitas Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm Marginal-SLAM International Conference on Robotics and Automation

C Andrieu AD amp R Holenstein Particle Markov chain Monte Carlo methods (with discussion) JRSS B 2010

A Doucet Sequential Monte Carlo Methods and Particle Filters List of Papers Codes and Viedo lectures on SMC and particle filters

Pierre Del Moral Feynman-Kac models and interacting particle systems

6

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References P Del Moral A Doucet and A Jasra Sequential Monte Carlo samplers JRSSB 2006

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian

Statistics 2006

P Del Moral A Doucet amp SS Singh Forward Smoothing using Sequential Monte Carlo technical report Cambridge University 2009

A Doucet Short Courses Lecture Notes (A B C) P Del Moral Feynman-Kac Formulae Springer-Verlag 2004

Sequential MC Methods M Davy 2007 A Doucet A Johansen Particle Filtering and Smoothing Fifteen years later in Handbook of

Nonlinear Filtering (edts D Crisan and B Rozovsky) Oxford Univ Press 2011

A Johansen and A Doucet A Note on Auxiliary Particle Filters Stat Proba Letters 2008

A Doucet et al Efficient Block Sampling Strategies for Sequential Monte Carlo (with M Briers amp S Senecal) JCGS 2006

C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis Stat Comput 2011

7

Bayesian Scientific Computing Spring 2013 (N Zabaras) 8

SEQUENTIAL MONTE CARLO METHODS

FOR STATIC PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Classical Bayesian Inference

Global Optimization

It is not clear how SMC can be related to this problem since Xn=X instead of Xn=Xn

1( ) ( | )n nx p xπ = y( ) ( )n x xπ π=

[ ]( ) ( ) n

n nx x γπ π γprop rarr infin

9

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Global Optimization

Sampling from a fixed distribution where micro(x) is an easy to sample from distribution Use a sequence

1( ) ( | )n nx p xπ = y

[ ]( ) ( ) n

n nx x ηπ π ηprop rarr infin

10

( ) ( )1( ) ( ) ( )n n

n x x xη ηπ micro π minusprop

1 2 11 0 ( ) ( ) ( ) ( )Final Finali e x x x xη η η π micro π π= gt gt gt = prop prop

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Rare Event Simulation

The required probability is then

The Boolean Satisfiability Problem

Computing Matrix Permanents

Computing volumes in high dimensions

11

1

1 2

( ) 1 ( ) ( )1 ( )

nn n E

Final

A x x x Normalizing Factor Z known

Simulate with sequence E E E A

π π πltlt prop =

= sup sup sup =X

( )FinalZ Aπ=

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

Apply SMC to an augmented sequence of distributions For example

1

nn

nonπ

geX

( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x

( ) ( ) ( )1

1 1 1 1 |

n

n nn n n n n n

Use any conditionaldistribution on

x x xπ π π

minus

minus minus=

x x

X

12

bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible

Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Let be a sequence of probability distributions defined on X

st each is known up to a normalizing constant

We are interested to sample from and compute Zn sequentially

This is not the same as the standard SMC discussed earlier which was defined on Xn

13

( ) 1n nxπ

ge

( )n xπ

( )

( )

1n n nUnknown Known

x Z xπ γ=

( )n xπ

( ) ( )1 1 1|n n n npπ =x x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 5: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References C Andrieu and A Doucet Particle Filtering for Partially Observed Gaussian State-Space Models

JRSS B 2002

R Chen and J Liu Mixture Kalman Filters JRSSB 2000

A Doucet S J Godsill C Andrieu On SMC sampling methods for Bayesian Filtering Stat Comp 2000

N Kantas AD SS Singh and JM Maciejowski An overview of sequential Monte Carlo methods for parameter estimation in general state-space models in Proceedings IFAC System Identification (SySid) Meeting 2009

C Andrieu ADoucet amp R Holenstein Particle Markov chain Monte Carlo methods JRSS B 2010

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999

P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002

G Storvik Particle filters for state-space models with the presence of unknown static parameters IEEE Trans Signal Processing 2002

5

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References C Andrieu A Doucet and VB Tadic Online EM for parameter estimation in nonlinear-non

Gaussian state-space models Proc IEEE CDC 2005

G Poyadjis A Doucet and SS Singh Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation Biometrika 2011

C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for

Genome Wide Transcript Analysis Technical report 2008

R Martinez-Cantin J Castellanos and N de Freitas Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm Marginal-SLAM International Conference on Robotics and Automation

C Andrieu AD amp R Holenstein Particle Markov chain Monte Carlo methods (with discussion) JRSS B 2010

A Doucet Sequential Monte Carlo Methods and Particle Filters List of Papers Codes and Viedo lectures on SMC and particle filters

Pierre Del Moral Feynman-Kac models and interacting particle systems

6

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References P Del Moral A Doucet and A Jasra Sequential Monte Carlo samplers JRSSB 2006

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian

Statistics 2006

P Del Moral A Doucet amp SS Singh Forward Smoothing using Sequential Monte Carlo technical report Cambridge University 2009

A Doucet Short Courses Lecture Notes (A B C) P Del Moral Feynman-Kac Formulae Springer-Verlag 2004

Sequential MC Methods M Davy 2007 A Doucet A Johansen Particle Filtering and Smoothing Fifteen years later in Handbook of

Nonlinear Filtering (edts D Crisan and B Rozovsky) Oxford Univ Press 2011

A Johansen and A Doucet A Note on Auxiliary Particle Filters Stat Proba Letters 2008

A Doucet et al Efficient Block Sampling Strategies for Sequential Monte Carlo (with M Briers amp S Senecal) JCGS 2006

C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis Stat Comput 2011

7

Bayesian Scientific Computing Spring 2013 (N Zabaras) 8

SEQUENTIAL MONTE CARLO METHODS

FOR STATIC PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Classical Bayesian Inference

Global Optimization

It is not clear how SMC can be related to this problem since Xn=X instead of Xn=Xn

1( ) ( | )n nx p xπ = y( ) ( )n x xπ π=

[ ]( ) ( ) n

n nx x γπ π γprop rarr infin

9

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Global Optimization

Sampling from a fixed distribution where micro(x) is an easy to sample from distribution Use a sequence

1( ) ( | )n nx p xπ = y

[ ]( ) ( ) n

n nx x ηπ π ηprop rarr infin

10

( ) ( )1( ) ( ) ( )n n

n x x xη ηπ micro π minusprop

1 2 11 0 ( ) ( ) ( ) ( )Final Finali e x x x xη η η π micro π π= gt gt gt = prop prop

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Rare Event Simulation

The required probability is then

The Boolean Satisfiability Problem

Computing Matrix Permanents

Computing volumes in high dimensions

11

1

1 2

( ) 1 ( ) ( )1 ( )

nn n E

Final

A x x x Normalizing Factor Z known

Simulate with sequence E E E A

π π πltlt prop =

= sup sup sup =X

( )FinalZ Aπ=

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

Apply SMC to an augmented sequence of distributions For example

1

nn

nonπ

geX

( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x

( ) ( ) ( )1

1 1 1 1 |

n

n nn n n n n n

Use any conditionaldistribution on

x x xπ π π

minus

minus minus=

x x

X

12

bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible

Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Let be a sequence of probability distributions defined on X

st each is known up to a normalizing constant

We are interested to sample from and compute Zn sequentially

This is not the same as the standard SMC discussed earlier which was defined on Xn

13

( ) 1n nxπ

ge

( )n xπ

( )

( )

1n n nUnknown Known

x Z xπ γ=

( )n xπ

( ) ( )1 1 1|n n n npπ =x x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 6: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References C Andrieu A Doucet and VB Tadic Online EM for parameter estimation in nonlinear-non

Gaussian state-space models Proc IEEE CDC 2005

G Poyadjis A Doucet and SS Singh Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation Biometrika 2011

C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for

Genome Wide Transcript Analysis Technical report 2008

R Martinez-Cantin J Castellanos and N de Freitas Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm Marginal-SLAM International Conference on Robotics and Automation

C Andrieu AD amp R Holenstein Particle Markov chain Monte Carlo methods (with discussion) JRSS B 2010

A Doucet Sequential Monte Carlo Methods and Particle Filters List of Papers Codes and Viedo lectures on SMC and particle filters

Pierre Del Moral Feynman-Kac models and interacting particle systems

6

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References P Del Moral A Doucet and A Jasra Sequential Monte Carlo samplers JRSSB 2006

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian

Statistics 2006

P Del Moral A Doucet amp SS Singh Forward Smoothing using Sequential Monte Carlo technical report Cambridge University 2009

A Doucet Short Courses Lecture Notes (A B C) P Del Moral Feynman-Kac Formulae Springer-Verlag 2004

Sequential MC Methods M Davy 2007 A Doucet A Johansen Particle Filtering and Smoothing Fifteen years later in Handbook of

Nonlinear Filtering (edts D Crisan and B Rozovsky) Oxford Univ Press 2011

A Johansen and A Doucet A Note on Auxiliary Particle Filters Stat Proba Letters 2008

A Doucet et al Efficient Block Sampling Strategies for Sequential Monte Carlo (with M Briers amp S Senecal) JCGS 2006

C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis Stat Comput 2011

7

Bayesian Scientific Computing Spring 2013 (N Zabaras) 8

SEQUENTIAL MONTE CARLO METHODS

FOR STATIC PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Classical Bayesian Inference

Global Optimization

It is not clear how SMC can be related to this problem since Xn=X instead of Xn=Xn

1( ) ( | )n nx p xπ = y( ) ( )n x xπ π=

[ ]( ) ( ) n

n nx x γπ π γprop rarr infin

9

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Global Optimization

Sampling from a fixed distribution where micro(x) is an easy to sample from distribution Use a sequence

1( ) ( | )n nx p xπ = y

[ ]( ) ( ) n

n nx x ηπ π ηprop rarr infin

10

( ) ( )1( ) ( ) ( )n n

n x x xη ηπ micro π minusprop

1 2 11 0 ( ) ( ) ( ) ( )Final Finali e x x x xη η η π micro π π= gt gt gt = prop prop

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Rare Event Simulation

The required probability is then

The Boolean Satisfiability Problem

Computing Matrix Permanents

Computing volumes in high dimensions

11

1

1 2

( ) 1 ( ) ( )1 ( )

nn n E

Final

A x x x Normalizing Factor Z known

Simulate with sequence E E E A

π π πltlt prop =

= sup sup sup =X

( )FinalZ Aπ=

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

Apply SMC to an augmented sequence of distributions For example

1

nn

nonπ

geX

( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x

( ) ( ) ( )1

1 1 1 1 |

n

n nn n n n n n

Use any conditionaldistribution on

x x xπ π π

minus

minus minus=

x x

X

12

bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible

Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Let be a sequence of probability distributions defined on X

st each is known up to a normalizing constant

We are interested to sample from and compute Zn sequentially

This is not the same as the standard SMC discussed earlier which was defined on Xn

13

( ) 1n nxπ

ge

( )n xπ

( )

( )

1n n nUnknown Known

x Z xπ γ=

( )n xπ

( ) ( )1 1 1|n n n npπ =x x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 7: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

References P Del Moral A Doucet and A Jasra Sequential Monte Carlo samplers JRSSB 2006

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian

Statistics 2006

P Del Moral A Doucet amp SS Singh Forward Smoothing using Sequential Monte Carlo technical report Cambridge University 2009

A Doucet Short Courses Lecture Notes (A B C) P Del Moral Feynman-Kac Formulae Springer-Verlag 2004

Sequential MC Methods M Davy 2007 A Doucet A Johansen Particle Filtering and Smoothing Fifteen years later in Handbook of

Nonlinear Filtering (edts D Crisan and B Rozovsky) Oxford Univ Press 2011

A Johansen and A Doucet A Note on Auxiliary Particle Filters Stat Proba Letters 2008

A Doucet et al Efficient Block Sampling Strategies for Sequential Monte Carlo (with M Briers amp S Senecal) JCGS 2006

C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis Stat Comput 2011

7

Bayesian Scientific Computing Spring 2013 (N Zabaras) 8

SEQUENTIAL MONTE CARLO METHODS

FOR STATIC PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Classical Bayesian Inference

Global Optimization

It is not clear how SMC can be related to this problem since Xn=X instead of Xn=Xn

1( ) ( | )n nx p xπ = y( ) ( )n x xπ π=

[ ]( ) ( ) n

n nx x γπ π γprop rarr infin

9

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Global Optimization

Sampling from a fixed distribution where micro(x) is an easy to sample from distribution Use a sequence

1( ) ( | )n nx p xπ = y

[ ]( ) ( ) n

n nx x ηπ π ηprop rarr infin

10

( ) ( )1( ) ( ) ( )n n

n x x xη ηπ micro π minusprop

1 2 11 0 ( ) ( ) ( ) ( )Final Finali e x x x xη η η π micro π π= gt gt gt = prop prop

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Rare Event Simulation

The required probability is then

The Boolean Satisfiability Problem

Computing Matrix Permanents

Computing volumes in high dimensions

11

1

1 2

( ) 1 ( ) ( )1 ( )

nn n E

Final

A x x x Normalizing Factor Z known

Simulate with sequence E E E A

π π πltlt prop =

= sup sup sup =X

( )FinalZ Aπ=

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

Apply SMC to an augmented sequence of distributions For example

1

nn

nonπ

geX

( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x

( ) ( ) ( )1

1 1 1 1 |

n

n nn n n n n n

Use any conditionaldistribution on

x x xπ π π

minus

minus minus=

x x

X

12

bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible

Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Let be a sequence of probability distributions defined on X

st each is known up to a normalizing constant

We are interested to sample from and compute Zn sequentially

This is not the same as the standard SMC discussed earlier which was defined on Xn

13

( ) 1n nxπ

ge

( )n xπ

( )

( )

1n n nUnknown Known

x Z xπ γ=

( )n xπ

( ) ( )1 1 1|n n n npπ =x x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 8: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras) 8

SEQUENTIAL MONTE CARLO METHODS

FOR STATIC PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Classical Bayesian Inference

Global Optimization

It is not clear how SMC can be related to this problem since Xn=X instead of Xn=Xn

1( ) ( | )n nx p xπ = y( ) ( )n x xπ π=

[ ]( ) ( ) n

n nx x γπ π γprop rarr infin

9

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Global Optimization

Sampling from a fixed distribution where micro(x) is an easy to sample from distribution Use a sequence

1( ) ( | )n nx p xπ = y

[ ]( ) ( ) n

n nx x ηπ π ηprop rarr infin

10

( ) ( )1( ) ( ) ( )n n

n x x xη ηπ micro π minusprop

1 2 11 0 ( ) ( ) ( ) ( )Final Finali e x x x xη η η π micro π π= gt gt gt = prop prop

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Rare Event Simulation

The required probability is then

The Boolean Satisfiability Problem

Computing Matrix Permanents

Computing volumes in high dimensions

11

1

1 2

( ) 1 ( ) ( )1 ( )

nn n E

Final

A x x x Normalizing Factor Z known

Simulate with sequence E E E A

π π πltlt prop =

= sup sup sup =X

( )FinalZ Aπ=

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

Apply SMC to an augmented sequence of distributions For example

1

nn

nonπ

geX

( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x

( ) ( ) ( )1

1 1 1 1 |

n

n nn n n n n n

Use any conditionaldistribution on

x x xπ π π

minus

minus minus=

x x

X

12

bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible

Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Let be a sequence of probability distributions defined on X

st each is known up to a normalizing constant

We are interested to sample from and compute Zn sequentially

This is not the same as the standard SMC discussed earlier which was defined on Xn

13

( ) 1n nxπ

ge

( )n xπ

( )

( )

1n n nUnknown Known

x Z xπ γ=

( )n xπ

( ) ( )1 1 1|n n n npπ =x x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 9: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Classical Bayesian Inference

Global Optimization

It is not clear how SMC can be related to this problem since Xn=X instead of Xn=Xn

1( ) ( | )n nx p xπ = y( ) ( )n x xπ π=

[ ]( ) ( ) n

n nx x γπ π γprop rarr infin

9

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Global Optimization

Sampling from a fixed distribution where micro(x) is an easy to sample from distribution Use a sequence

1( ) ( | )n nx p xπ = y

[ ]( ) ( ) n

n nx x ηπ π ηprop rarr infin

10

( ) ( )1( ) ( ) ( )n n

n x x xη ηπ micro π minusprop

1 2 11 0 ( ) ( ) ( ) ( )Final Finali e x x x xη η η π micro π π= gt gt gt = prop prop

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Rare Event Simulation

The required probability is then

The Boolean Satisfiability Problem

Computing Matrix Permanents

Computing volumes in high dimensions

11

1

1 2

( ) 1 ( ) ( )1 ( )

nn n E

Final

A x x x Normalizing Factor Z known

Simulate with sequence E E E A

π π πltlt prop =

= sup sup sup =X

( )FinalZ Aπ=

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

Apply SMC to an augmented sequence of distributions For example

1

nn

nonπ

geX

( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x

( ) ( ) ( )1

1 1 1 1 |

n

n nn n n n n n

Use any conditionaldistribution on

x x xπ π π

minus

minus minus=

x x

X

12

bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible

Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Let be a sequence of probability distributions defined on X

st each is known up to a normalizing constant

We are interested to sample from and compute Zn sequentially

This is not the same as the standard SMC discussed earlier which was defined on Xn

13

( ) 1n nxπ

ge

( )n xπ

( )

( )

1n n nUnknown Known

x Z xπ γ=

( )n xπ

( ) ( )1 1 1|n n n npπ =x x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 10: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Sequential Bayesian Estimation

Global Optimization

Sampling from a fixed distribution where micro(x) is an easy to sample from distribution Use a sequence

1( ) ( | )n nx p xπ = y

[ ]( ) ( ) n

n nx x ηπ π ηprop rarr infin

10

( ) ( )1( ) ( ) ( )n n

n x x xη ηπ micro π minusprop

1 2 11 0 ( ) ( ) ( ) ( )Final Finali e x x x xη η η π micro π π= gt gt gt = prop prop

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Rare Event Simulation

The required probability is then

The Boolean Satisfiability Problem

Computing Matrix Permanents

Computing volumes in high dimensions

11

1

1 2

( ) 1 ( ) ( )1 ( )

nn n E

Final

A x x x Normalizing Factor Z known

Simulate with sequence E E E A

π π πltlt prop =

= sup sup sup =X

( )FinalZ Aπ=

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

Apply SMC to an augmented sequence of distributions For example

1

nn

nonπ

geX

( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x

( ) ( ) ( )1

1 1 1 1 |

n

n nn n n n n n

Use any conditionaldistribution on

x x xπ π π

minus

minus minus=

x x

X

12

bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible

Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Let be a sequence of probability distributions defined on X

st each is known up to a normalizing constant

We are interested to sample from and compute Zn sequentially

This is not the same as the standard SMC discussed earlier which was defined on Xn

13

( ) 1n nxπ

ge

( )n xπ

( )

( )

1n n nUnknown Known

x Z xπ γ=

( )n xπ

( ) ( )1 1 1|n n n npπ =x x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 11: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

This case can appear often for example

Rare Event Simulation

The required probability is then

The Boolean Satisfiability Problem

Computing Matrix Permanents

Computing volumes in high dimensions

11

1

1 2

( ) 1 ( ) ( )1 ( )

nn n E

Final

A x x x Normalizing Factor Z known

Simulate with sequence E E E A

π π πltlt prop =

= sup sup sup =X

( )FinalZ Aπ=

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

Apply SMC to an augmented sequence of distributions For example

1

nn

nonπ

geX

( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x

( ) ( ) ( )1

1 1 1 1 |

n

n nn n n n n n

Use any conditionaldistribution on

x x xπ π π

minus

minus minus=

x x

X

12

bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible

Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Let be a sequence of probability distributions defined on X

st each is known up to a normalizing constant

We are interested to sample from and compute Zn sequentially

This is not the same as the standard SMC discussed earlier which was defined on Xn

13

( ) 1n nxπ

ge

( )n xπ

( )

( )

1n n nUnknown Known

x Z xπ γ=

( )n xπ

( ) ( )1 1 1|n n n npπ =x x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 12: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

What about if all distributions πn nge 1 are defined on X

Apply SMC to an augmented sequence of distributions For example

1

nn

nonπ

geX

( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x

( ) ( ) ( )1

1 1 1 1 |

n

n nn n n n n n

Use any conditionaldistribution on

x x xπ π π

minus

minus minus=

x x

X

12

bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible

Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Let be a sequence of probability distributions defined on X

st each is known up to a normalizing constant

We are interested to sample from and compute Zn sequentially

This is not the same as the standard SMC discussed earlier which was defined on Xn

13

( ) 1n nxπ

ge

( )n xπ

( )

( )

1n n nUnknown Known

x Z xπ γ=

( )n xπ

( ) ( )1 1 1|n n n npπ =x x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 13: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Let be a sequence of probability distributions defined on X

st each is known up to a normalizing constant

We are interested to sample from and compute Zn sequentially

This is not the same as the standard SMC discussed earlier which was defined on Xn

13

( ) 1n nxπ

ge

( )n xπ

( )

( )

1n n nUnknown Known

x Z xπ γ=

( )n xπ

( ) ( )1 1 1|n n n npπ =x x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 14: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models We will construct an artificial distribution that is the product of the

target distribution from which we want to sample and a backward kernel L as follows

such that The importance weights now become

14

( ) ( )

( ) ( ) ( )

11 1

1

1 11

arg

|

n n n n n

n

n n n n k k kk

T etBackwardTransitions

Z where

x L x x

π γ

γ γ

minus

minus

+=

=

= prod

x x

x

( ) ( )1 1 1nn n n nx dπ π minus= int x x

( )( )

( ) ( )( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( )( ) ( )

1

11 1 1 1 1 1

1 1 21 1 1 1 1

1 1 1 11

1 11

1 1 1

|

| |

||

n

n n k k kn n n n n n k

n n n nn n n n n n

n n k k k n n nk

n n n n nn

n n n n n

x L x xqW W W

q q x L x x q x x

x L x xW

x q x x

γγ γγ γ

γγ

minus

+minus minus =

minus minus minusminus minus

minus minus + minus=

minus minusminus

minus minus minus

= = =

=

prod

prod

x x xx x x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 15: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models

Any MCMC kernel can be used for the proposal q(|)

Since our interest is in computing only

there is no degeneracy problem

15

( ) ( )( ) ( )

1 11

1 1 1

||

n n n n nn n

n n n n n

x L x xW W

x q x xγ

γminus minus

minusminus minus minus

=

( ) ( )1 1 11 ( )nn n n n n nx d xZ

π π γminus= =int x x

P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 16: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2

Sample and augment

Compute the importance weights

Then the weighted approximation is We finally resample from to obtain

16

( )( ) ( )1~ |

i in n n nX q x X minus

( )( ) ( )( )1 1~

i iin n nnX Xminus minusX

( )

( )( )( )

1i

n

Ni

n n n nXi

x W xπ δ=

= sum

( ) ( )( ) ( )

( ) ( ) ( )11

( ) ( )1 ( ) ( ) ( )

1 11

|

|

i i in n nn n

i in n i i i

n n nn n

X L X XW W

X q X X

γ

γ

minusminus

minusminus minusminus

=

( ) ( )( )

1

1i

n

N

n n nXi

x xN

π δ=

= sum

( )( ) ~inn nX xπ

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 17: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then

the corresponding reversed kernel Ln-1

Using this easy choice we can simplify the expression for the weights

This is known as annealed importance sampling The particular choice has been used in physics and statistics

17

( ) ( ) ( )( )

1 11 1

|| n n n n n

n n nn n

x q x xL x x

πminus minus

minus minus =

( ) ( )( ) ( )

( )( ) ( )

( ) ( )( )

1 1 1 11 1

1 1 1 1 1 1

| || |

n n n n n n n n n n n nn n n

n n n n n n n n n n n n

x L x x x x q x xW W W

x q x x x q x x xγ γ π

γ γ πminus minus minus minus

minus minusminus minus minus minus minus minus

= = rArr

( )( )

( )1

1 ( )1 1

in n

n n in n

XW W

X

γ

γminus

minusminus minus

=

W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 18: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras) 18

ONLINE PARAMETER ESTIMATION

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 19: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static

parameter θ with some prior p(θ)

Given data y1n inference now is based on

We can use standard SMC but on the extended space Zn=(Xnθn)

Note that θ is a static parameter ndashdoes not involve with n

19

( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=

( ) ( )| ~ |n n n n nY X x g y xθ=

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

prop

x y y x y

y y

( ) ( ) ( ) ( ) ( )11 1| | | |

nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 20: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates

In a Bayesian context the problem is even more severe as

Exponential stability assumption cannot hold as θn = θ1

To mitigate this problem introduce MCMC steps on θ

20

( ) ( ) ( )1 1| n np p pθθ θpropy y

C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B

2001

1log ( )nCnVar pNθ

le y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 21: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Online Bayesian Parameter Estimation When

This becomes an elegant algorithm that however still has the degeneracy problem since it uses

As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors

21

( ) ( )1 1 1 1| | n n n n n

FixedDimensional

p p sθ θ

=

y x x y

( )1 1|n np x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 22: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a

way that does not modify their distributions ie if at time n

then we would like a transition kernel such that if Then

In Markov chain language we want to be invariant

22

( )( )1|i

n npθ θ~ y

( )iθ

( )( ) ( ) ( )| i i in n n nMθ θ θ~

( )( )1|i

n npθ θ~ y

( ) nM θ θ ( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 23: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Using MCMC There is a whole literature on the design of such kernels known as

Markov chain Monte Carlo eg the Metropolis-Hastings algorithm

We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown

However we can use a very simple Gibbs update

Indeed note that if then if we have

Indeed note that

23

( )1| np θ y( ) ( )1 1|n np pθθ equivy y

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y ( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( )( ) ( )1 1 1 ~ |i i

n n n npθ θX x y

( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 24: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given an approximation at time n-1

Sample set and then approximate

Resample then sample to obtain

24

( )( )1

( ) ( )1~ |i

n

i in n nX f x X

θ minusminus

( )( ) ( )( )1 1 1~ i iin nn XminusX X

( ) ( ) ( )( ) ( )1 1 1

1 1 1 1 1 11

1| i in n

N

n n ni

pN θ

θ δ θminus minus

minus minus minus=

= sum X x y x

( )( )1 1 1~ |i

n n npX x y

( )( ) ( ) ( )( )( )( )

11 1

( )( ) ( )1 1 1

1| |iii

nn n

N ii inn n n n n n

ip W W g y

θθθ δ

minusminus=

= propsum X x y x X

( )( ) ( )1 1| i i

n n npθ θ~ y X

( ) ( ) ( )( )( )1

1 1 11

1| iin n

N

n n ni

pN θ

θ δ θ=

= sum X x y x

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 25: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Consider the following model

We set the prior on θ as

In this case

25

( )( )

( )

1 1

21 0

~ 01

~ 01

~ 0

n n v n n

n n w n n

X X V V

Y X W W

X

θ σ

σ

σ

+ += +

= +

N

N

N

~ ( 11)θ minusU

( ) ( ) ( )21 1 ( 11)

12 2 2

12 2

|

n n n n

n n

n n k k n kk k

p m

m x x x

θ θ σ θ

σ σ

minus

minus

minus= =

prop

= = sum sum

x y N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 26: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains

SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value

26

True Value

SMCMCMC

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 27: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ

according to we are still relying implicitly on the approximation of the joint distribution

As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution

27

( )1 1n np θ | x y( )1 1|n np x y

( )1 1|n np x y

Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ

21

1

1 |n

k nk

Xn =

sum y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 28: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Recursive MLE Parameter Estimation The log likelihood can be written as

Here we compute

Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution

28

( )1 1 11

l ( ) log ( ) log |n

n n k kk

p p Yθ θθ minus=

= = sumY Y

( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y

( ) 1 1 |n n n nX Y p xθ minusY

l ( )lim l( ) log ( | ) ( ) ( )n

ng y x dx dy d

n θ θ θθ θ micro λ micro

rarrinfin= = int int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 29: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro

algorithm

where

We thus need to approximate

29

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

( ) ( )( ) ( )

1 1 1 1

1 1

( ) | |

| |

n n n n n n n

n n n n n

p Y g Y x p x dx

g Y x p x dx

θ θ θ

θ θ

minus minus

minus

nabla = nabla +

nabla

intint

|Y Y

Y

( ) 1 1|n np xθ minusnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 30: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity

We thus can use a SMC approximation of the form

This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of

30

1 1 11 1 1 1 1 1 1

1 1 1

1 1 1 1 1 1 1 1

( )( ) ( )( )

log ( ) ( )

n nn n n n n

n n

n n n n n

pp Y p dp

p p d

θθ θ

θ

θ θ

minusminus minus minus

minus

minus minus minus

nablanabla =

= nabla

int

int

x |Y|Y x |Y xx |Y

x |Y x |Y x

( )( )

1 1 11

( ) ( )1 1 1

1( ) ( )

log ( )

in

Ni

n n n nXi

i in n n

p xN

p

θ

θ

α δ

α

minus=

minus

nabla =

= nabla

sum|Y x

X |Y

1 1 1( )n npθ minusx |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 31: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC

approximation (red)

31

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 32: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact

(red) vs SMC (bluegreen) Positive and negative particles are mixed

32

1( )n np xθnabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 33: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following

It suggests using an SMC approximation of the form

Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as

33

1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y

( )( )

1 1 11

( )( ) ( ) 1 1

1 1 ( )1 1

1( ) ( )

( )log ( )( )

in

Ni

n n n nXi

ii i n n

n n n in n

p xN

p Xp Xp X

θ

θθ

θ

β δ

β

minus=

minusminus

minus

nabla =

nabla= nabla =

sum|Y x

|Y|Y|Y

( ) ( ) ( ) ( )1 1 1 1 1 1 1

1

1( ) ( ) ( ) ( )N

i i i jn n n n n n n n n

jp X f X x p x dx f X X

Nθ θθ θminus minus minus minus minus=

prop = sumint|Y | |Y |

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 34: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

The derivatives satisfy the following

This way we obtain a simple recursion of as a function of and

34

( ) ( )

11

1

1 1 1 1 1 1

( )( )( )

( ) | | ( )

n nn n

n n n

n n n n n n n n n

xp xx dx

x g Y x f x x p x dx

θθ

θ

θ θ θ θ

ξξ

ξ minus minus minus minus

=

=

intint

|Y|Y|Y

|Y |Y

111 1

1 1

( )( )( ) ( )( ) ( )

n n nn nn n n n

n n n n n n

x dxxp x p xx dx x dx

θθθ θ

θ θ

ξξξ ξ

nablanablanabla = minus int

int int|Y|Y|Y |Y

|Y Y

1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 35: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC Approximation of the Sensitivity Sample

Compute

We have

35

( )

( )( ) ( )

( ) ( )1 1( ) ( )

( )( ) ( )

1( ) ( ) ( ) ( )

( ) ( ) ( )

1 1 1

( ) ( )| |

i ii in n n n

n ni in n n n

Nj

ni iji i i in n

n n n nN N Nj j j

n n nj j j

X Xq X Y q X Y

W W W

θ θ

θ θ

ξ ξα ρ

ρα ρβ

α α α

=

= = =

nabla= =

= = minussum

sum sum sum

Y Y

( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1

1~ | | |

Ni j j j j j

n n n n n n n n nj

X q Y q Y X W p Y Xθ θη ηminus minus minus=

= propsum

( )

( )

( )1

1

( ) ( )1

1

( ) ( )

( ) ( )

in

in

Ni

n n n nXi

Ni i

n n n n nXi

p x W x

p x W x

θ

θ

δ

β δ

=

=

=

nabla =

sum

sum

|Y

|Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 36: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Consider the following model

We have

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)

36

1n n v n

n n w n

X X VY X W

φ σσminus= +

= +

v wθ φ σ σ=

1log ( )n np xθnabla |Y

1( )npθnabla Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 37: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the

marginal (bottom)

37

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 38: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Consider the following model

We have We use SMC with N=1000 for batch and on-line estimation

For Batch Estimation

For online estimation

38

1

exp2

n n v n

nn n

X X VXY W

φ σ

β

minus= +

=

vθ φ σ β=

11 1log ( )nn n n npθθ θ γ

minusminus= + nabla Y

1 11 1 1log ( )nn n n n np Yθθ θ γ

minusminus minus= + nabla |Y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 39: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar

81-85

39

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 40: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates

converge towards the true values

40

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 41: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

SMC with MCMC for Parameter Estimation Given data y1n inference relies on

We have seen that SMC are rather inefficient to sample from

so we look here at an MCMC approach For a given parameter value θ SMC can estimate both

It will be useful if we can use SMC within MCMC to sample from

41

( ) ( ) ( )

( ) ( ) ( )

1 1 1 1 1

1 1

| | |

|

n n n n n

n n

p p pwherep p p

θ

θ

θ θ

θ θ

=

=

x y y x y

y y

( ) ( )1 1 1|n n np and pθ θx y y

( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R

Statist SocB (2010) 72 Part 3 pp 269ndash342

( )1| np θ y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 42: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from

and

However it is impossible to sample from

We can sample from instead but convergence will be slow

Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability

42

( )1 1| n np θ x y( )1 1| n np θx y

( )1 1| n np θx y

( )1 1 1 1| k n k k np x θ minus +y x x

( )1 1| n np θx y ( )1 1 1~ |n n nqX x y

( ) ( )( ) ( )

1 1 1 1

1 1 1 1

| |

| m n

|i 1 n n n n

n n n n

p q

p p

θ

θ

X y X y

X y X y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 43: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution

We know that the SMC approximation degrades an

n increases but we also have under mixing assumptions

Thus things degrade linearly rather than exponentially

These results are useful for small C

43

( )1 1| n np θx y

( )1 1| n np θx y

( )( )1 1 1( ) | i

n n n TV

Cnaw pN

θminus leX x yL

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 44: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method

(CBMC) in Molecular simulation There are however significant differences

CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on

44

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 45: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

MCMC We use as proposal distribution and store

the estimate

It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables

The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate

Finally accept with probability Otherwise stay where you are

45

D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076

( )1 1 1~ | n n nNp θX x y

( )1 |nNp θy

1nX

( )1 |nNp θy1nX ( ) ( )

1

1

m|

in 1|nN

nN

pp

θθ

yy

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 46: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example Consider the following model

We take the same prior proposal distribution for both CBMC and

SMC Acceptance rates for CBMC (left) and SMC (right) are shown

46

( )

( ) ( )

11 2

12

1

1 25 8cos(12 ) ~ 0152 1

~ 0001 ~ 052

kk k k k

k

kn k k

XX X k V VX

XY W W X

minusminus

minus

= + + ++

= +

N

N N

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 47: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example We now consider the case when both variances are

unknown and given inverse Gamma (conditionally conjugate) vague priors

We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as

importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step

of invariance distribution with the same proposal

In both cases we update the parameters according to Both algorithms are run with the same computational effort

47

2 2v wσ σ

( )11000 11000 |p θx y

( )1 1 k k k kp x x x yminus +|

( )1500 1500| p θ x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 48: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Example MH one at a time missed the mode and estimates

If X1n and θ are very correlated the MCMC algorithm is very

inefficient We will next discuss the Marginal Metropolis Hastings

21500

21500

2

| 137 ( )

| 99 ( )

10

v

v

v

MH

PF MCMC

σ

σ

σ

asymp asymp minus =

y

y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 49: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with

kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)

It can be easily shown that and

under weak assumptions

Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)

1 Given 119909119905minus1 generate 2 Compute

3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1

( 1) )~ ( tx xq x minus

( ) ( )( 1)

( 1)( 1) 1

( ) (min 1

))

(

tt

t tx

xx q x xx

x xqπρ

π

minusminus

minus minus

=

( ) ~ ( )iX x as iπ rarr infin

( ) ( ) ( )Transition Kernel

x x K x x dxπ π= int

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 50: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Consider the target

We use the following proposal distribution

Then the acceptance probability becomes

We will use SMC approximations to compute

( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y

( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p

θθ θ θ θ=x x x y

( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )

1 1 1 1

1 1 1 1

1

1

| min 1

|

min 1

|

|

|

|

n n n n

n n n n

n

n

p q

p q

p p q

p p qθ

θ

θ

θ

θ θ

θ θ

θ

θ θ θ

θ θ

=

x y x x

x y x x

y

y

( ) ( )1 1 1 |n n np and pθ θy x y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 51: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm Step 1

Step 2

Step 3 With probability

( ) ( )

( ) ( )

1 1( 1)

1 1 1

( 1) ( 1)

~ | ( 1)

|

n ni

n n n

Given i i p

Sample q i

Run an SMC to obtain p p

θ

θθ

θ

θ θ θ

minusminus minus

minus

X y

y x y

( )1 1 1 ~ |n n nSample pθX x y

( ) ( ) ( ) ( ) ( ) ( )

1

1( 1)

( 1) |

( 1) | ( 1min 1

)n

ni

p p q i

p p i q iθ

θ

θ θ θ

θ θ θminus

minus minus minus

y

y

( ) ( ) ( ) ( )

1 1 1 1( )

1 1 1 1( ) ( 1)

( ) ( )

( ) ( ) ( 1) ( 1)

n n n ni

n n n ni i

Set i X i p X p

Otherwise i X i p i X i p

θ θ

θ θ

θ θ

θ θ minus

=

= minus minus

y y

y y

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 52: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an

approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)

When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists

The higher N the better the performance of the algorithm N scales roughly linearly with n

Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)

Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model

with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal

Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical

systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 53: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras) 53

OPEN PROBLEMS

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 54: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Open Problems Thus SMC can be successfully used within MCMC

The major advantage of SMC is that it builds automatically efficient

very high dimensional proposal distributions based only on low-dimensional proposals

This is computationally expensive but it is very helpful in challenging problems

Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter

estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results

54

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics
Page 55: Introduction to Sequential Monte Carlo Methods

Bayesian Scientific Computing Spring 2013 (N Zabaras)

Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow

ourselves to modify the past

It is possible to modify the algorithm to take this into account

55

bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19

  • Slide Number 1
  • Contents
  • References
  • References
  • References
  • References
  • References
  • Slide Number 8
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • What about if all distributions pn nge1 are defined on X
  • SMC For Static Models
  • SMC For Static Models
  • SMC For Static Models
  • Algorithm SMC For Static Problems
  • SMC For Static Models Choice of L
  • Slide Number 18
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Online Bayesian Parameter Estimation
  • Artificial Dynamics for q
  • Using MCMC
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • SMC with MCMC for Parameter Estimation
  • Recursive MLE Parameter Estimation
  • Robbins-Monro Algorithm for MLE
  • Importance Sampling Estimation of Sensitivity
  • Degeneracy of the SMC Algorithm
  • Degeneracy of the SMC Algorithm
  • Marginal Importance Sampling for Sensitivity Estimation
  • Marginal Importance Sampling for Sensitivity Estimation
  • SMC Approximation of the Sensitivity
  • Example Linear Gaussian Model
  • Example Linear Gaussian Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • Example Stochastic Volatility Model
  • SMC with MCMC for Parameter Estimation
  • Gibbs Sampling Strategy
  • Gibbs Sampling Strategy
  • Configurational Biased Monte Carlo
  • MCMC
  • Example
  • Example
  • Example
  • Review of Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Marginal Metropolis Hastings Algorithm
  • Slide Number 53
  • Open Problems
  • Other Topics