Introduction to Sequential Monte Carlo Methods
TRANSCRIPT
Bayesian Scientific Computing, Spring 2013 (N. Zabaras)
SMC for Static Problems; Online Bayesian Parameter Estimation
Prof. Nicholas Zabaras, Materials Process Design and Control Laboratory
Sibley School of Mechanical and Aerospace Engineering, 101 Frank H. T. Rhodes Hall
Cornell University, Ithaca, NY 14853-3801
Email: nzabaras@gmail.com
URL: http://mpdc.mae.cornell.edu
April 27, 2013
Contents
- References and SMC Resources
- SMC FOR STATIC PROBLEMS: Important Application Problems; Algorithm; Choice of Reversed Kernel
- ONLINE PARAMETER ESTIMATION: Using MCMC; Recursive MLE Parameter Estimation; Importance Sampling Estimation of Sensitivity; Degeneracy of the SMC Algorithm; Marginal Importance Sampling for Sensitivity Estimation; Examples; Gibbs Sampling Strategy; Marginal Metropolis-Hastings Algorithm
- OPEN PROBLEMS
References
- C. P. Robert & G. Casella, Monte Carlo Statistical Methods, Chapter 11.
- J. S. Liu, Monte Carlo Strategies in Scientific Computing, Chapter 3, Springer-Verlag, New York.
- A. Doucet, N. de Freitas & N. Gordon (eds), Sequential Monte Carlo in Practice, Springer-Verlag, 2001.
- A. Doucet, N. de Freitas & N. J. Gordon, An introduction to Sequential Monte Carlo, in SMC in Practice, 2001.
- D. Wilkinson, Stochastic Modelling for Systems Biology, Second Edition, 2006.
- E. Ionides, Inference for Nonlinear Dynamical Systems, PNAS, 2006.
- J. S. Liu and R. Chen, Sequential Monte Carlo methods for dynamic systems, JASA, 1998.
- A. Doucet, Sequential Monte Carlo Methods, Short Course at SAMSI.
- A. Doucet, Sequential Monte Carlo Methods & Particle Filters Resources.
- Pierre Del Moral, Feynman-Kac models and interacting particle systems (SMC resources).
- A. Doucet, Sequential Monte Carlo Methods, Video Lectures, 2007.
- N. de Freitas and A. Doucet, Sequential MC Methods, Video Lectures, 2010.
References (continued)
- M. K. Pitt and N. Shephard, Filtering via Simulation: Auxiliary Particle Filter, JASA, 1999.
- A. Doucet, S. J. Godsill and C. Andrieu, On Sequential Monte Carlo sampling methods for Bayesian filtering, Stat. Comp., 2000.
- J. Carpenter, P. Clifford and P. Fearnhead, An Improved Particle Filter for Non-linear Problems, IEE, 1999.
- A. Kong, J. S. Liu & W. H. Wong, Sequential Imputations and Bayesian Missing Data Problems, JASA, 1994.
- O. Cappe, E. Moulines & T. Ryden, Inference in Hidden Markov Models, Springer-Verlag, 2005.
- W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
- G. Poyiadjis, A. Doucet and S. S. Singh, Maximum Likelihood Parameter Estimation using Particle Methods, Joint Statistical Meeting, 2005.
- N. Gordon, D. J. Salmond, A. F. M. Smith, Novel Approach to nonlinear/non-Gaussian Bayesian state estimation, IEE, 1993.
- S. Godsill, Particle Filters, 2009 (Video Lectures).
- R. Chen and J. S. Liu, Predictive Updating Methods with Application to Bayesian Classification, JRSS B, 1996.
References (continued)
- C. Andrieu and A. Doucet, Particle Filtering for Partially Observed Gaussian State-Space Models, JRSS B, 2002.
- R. Chen and J. Liu, Mixture Kalman Filters, JRSS B, 2000.
- A. Doucet, S. J. Godsill, C. Andrieu, On SMC sampling methods for Bayesian Filtering, Stat. Comp., 2000.
- N. Kantas, A. Doucet, S. S. Singh and J. M. Maciejowski, An overview of sequential Monte Carlo methods for parameter estimation in general state-space models, in Proceedings IFAC System Identification (SySid) Meeting, 2009.
- C. Andrieu, A. Doucet & R. Holenstein, Particle Markov chain Monte Carlo methods, JRSS B, 2010.
- C. Andrieu, N. de Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
- P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
- G. Storvik, Particle filters for state-space models with the presence of unknown static parameters, IEEE Trans. Signal Processing, 2002.
References (continued)
- C. Andrieu, A. Doucet and V. B. Tadic, Online EM for parameter estimation in nonlinear/non-Gaussian state-space models, Proc. IEEE CDC, 2005.
- G. Poyiadjis, A. Doucet and S. S. Singh, Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation, Biometrika, 2011.
- F. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Technical report, 2008.
- R. Martinez-Cantin, J. Castellanos and N. de Freitas, Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm: Marginal-SLAM, International Conference on Robotics and Automation.
- C. Andrieu, A. Doucet & R. Holenstein, Particle Markov chain Monte Carlo methods (with discussion), JRSS B, 2010.
- A. Doucet, Sequential Monte Carlo Methods and Particle Filters: List of Papers, Codes and Video lectures on SMC and particle filters.
- Pierre Del Moral, Feynman-Kac models and interacting particle systems.
References (continued)
- P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo samplers, JRSS B, 2006.
- P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
- P. Del Moral, A. Doucet & S. S. Singh, Forward Smoothing using Sequential Monte Carlo, technical report, Cambridge University, 2009.
- A. Doucet, Short Courses Lecture Notes (A, B, C).
- P. Del Moral, Feynman-Kac Formulae, Springer-Verlag, 2004.
- M. Davy, Sequential MC Methods, 2007.
- A. Doucet, A. Johansen, Particle Filtering and Smoothing: Fifteen years later, in Handbook of Nonlinear Filtering (eds D. Crisan and B. Rozovsky), Oxford Univ. Press, 2011.
- A. Johansen and A. Doucet, A Note on Auxiliary Particle Filters, Stat. Proba. Letters, 2008.
- A. Doucet, M. Briers & S. Senecal, Efficient Block Sampling Strategies for Sequential Monte Carlo, JCGS, 2006.
- F. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Stat. Comput., 2011.
SEQUENTIAL MONTE CARLO METHODS
FOR STATIC PROBLEMS
What if all the distributions $\pi_n$, $n \ge 1$, are defined on a common space $X$?

This case appears often, for example:
- Sequential Bayesian Estimation: $\pi_n(x) = p(x \mid y_{1:n})$
- Classical Bayesian Inference: $\pi_n(x) = \pi(x)$
- Global Optimization: $\pi_n(x) \propto [\pi(x)]^{\gamma_n}$, with $\gamma_n \to \infty$

It is not clear how SMC relates to this problem, since here $X_n = X$ instead of $X_n = X^n$.
What if all the distributions $\pi_n$ are defined on $X$? (continued)

- Sequential Bayesian Estimation: $\pi_n(x) = p(x \mid y_{1:n})$
- Global Optimization: $\pi_n(x) \propto [\pi(x)]^{\eta_n}$, with $\eta_n \to \infty$
- Sampling from a fixed distribution, where $\mu(x)$ is an easy-to-sample distribution. Use the tempered sequence
  $$\pi_n(x) \propto \mu(x)^{\eta_n}\, \pi(x)^{1-\eta_n},$$
  with $1 = \eta_1 > \eta_2 > \dots > \eta_{Final} = 0$, i.e. $\pi_1(x) \propto \mu(x)$ and $\pi_{Final}(x) \propto \pi(x)$.
What if all the distributions $\pi_n$ are defined on $X$? (continued)

- Rare Event Simulation, with $\pi(A) \ll 1$: simulate with a sequence of nested sets $E_1 = X \supset E_2 \supset \dots \supset E_{Final} = A$ and targets
  $$\pi_n(x) \propto \pi(x)\, 1_{E_n}(x),$$
  where the normalizing factor $Z_1$ is known. The required probability is then $Z_{Final} = \pi(A)$.
- The Boolean Satisfiability Problem
- Computing Matrix Permanents
- Computing volumes in high dimensions
What if all the distributions $\pi_n$ are defined on $X$? (continued)

Apply SMC to an augmented sequence of distributions $\tilde\pi_n$ on $X^n$. For example,
$$\tilde\pi_n(x_{1:n}) = \pi_n(x_n)\, \pi_n(x_{1:n-1} \mid x_n), \qquad \int \tilde\pi_n(x_{1:n-1}, x_n)\, dx_{1:n-1} = \pi_n(x_n),$$
where we may use any conditional distribution $\pi_n(x_{1:n-1} \mid x_n)$ on $X^{n-1}$.

- C. Jarzynski, Nonequilibrium Equality for Free Energy Differences, Phys. Rev. Lett. 78, 2690-2693 (1997).
- G. E. Crooks, Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible Markovian Systems, Journal of Statistical Physics, March 1998, Volume 90, Issue 5-6, pp 1481-1487.
- W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
- R. M. Neal, Annealed importance sampling, Statistics and Computing, vol. 11, pp 125-139, 2001.
- N. Chopin, A sequential particle filter method for static problems, Biometrika, 2002, 89, 3, pp 539-551.
- P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
SMC For Static Models

Let $\{\pi_n(x)\}_{n \ge 1}$ be a sequence of probability distributions defined on $X$, such that each $\pi_n(x)$ is known up to a normalizing constant:
$$\pi_n(x) = Z_n^{-1}\, \gamma_n(x),$$
with $\gamma_n$ known and $Z_n$ unknown.

We are interested in sampling from $\pi_n(x)$ and computing $Z_n$ sequentially.

This is not the same as the standard SMC discussed earlier, which was defined on $X^n$: $\pi_n(x_{1:n}) = p(x_{1:n} \mid y_{1:n})$.
SMC For Static Models

We construct an artificial distribution that is the product of the target distribution from which we want to sample and a sequence of backward kernels $L_k$ (the target backward transitions), as follows:
$$\tilde\pi_n(x_{1:n}) = \frac{\tilde\gamma_n(x_{1:n})}{Z_n}, \qquad \tilde\gamma_n(x_{1:n}) = \gamma_n(x_n) \prod_{k=1}^{n-1} L_k(x_{k+1}, x_k),$$
such that
$$\pi_n(x_n) = \int \tilde\pi_n(x_{1:n-1}, x_n)\, dx_{1:n-1}.$$

The importance weights now become
$$W_n \propto \frac{\tilde\gamma_n(x_{1:n})}{q_n(x_{1:n})} = \frac{\gamma_n(x_n) \prod_{k=1}^{n-1} L_k(x_{k+1}, x_k)}{q_{n-1}(x_{1:n-1})\, q_n(x_n \mid x_{n-1})} = W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \mid x_{n-1})}.$$
SMC For Static Models

Any MCMC kernel can be used for the proposal $q(\cdot \mid \cdot)$.

Since our interest is in computing only
$$\pi_n(x_n) = \int \tilde\pi_n(x_{1:n-1}, x_n)\, dx_{1:n-1} = Z_n^{-1} \gamma_n(x_n),$$
there is no degeneracy problem. The weights are
$$W_n = W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \mid x_{n-1})}.$$

P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
Algorithm: SMC For Static Problems

(1) Initialize at time n = 1.
(2) At time n >= 2:
- Sample $X_n^{(i)} \sim q_n(\cdot \mid X_{n-1}^{(i)})$ and augment $X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)})$.
- Compute the importance weights
  $$W_n^{(i)} \propto W_{n-1}^{(i)}\, \frac{\gamma_n(X_n^{(i)})\, L_{n-1}(X_n^{(i)}, X_{n-1}^{(i)})}{\gamma_{n-1}(X_{n-1}^{(i)})\, q_n(X_n^{(i)} \mid X_{n-1}^{(i)})}.$$
- The weighted approximation is then
  $$\hat\pi_n(x) = \sum_{i=1}^N W_n^{(i)}\, \delta_{X_n^{(i)}}(x).$$
- Finally, resample $\bar X_n^{(i)} \sim \hat\pi_n(x)$ to obtain
  $$\hat\pi_n(x) = \frac{1}{N} \sum_{i=1}^N \delta_{\bar X_n^{(i)}}(x).$$
SMC For Static Models: Choice of L

A default choice is to first use a $\pi_n$-invariant MCMC kernel $q_n$, and then take the corresponding reversed kernel
$$L_{n-1}(x_n, x_{n-1}) = \frac{\pi_n(x_{n-1})\, q_n(x_{n-1}, x_n)}{\pi_n(x_n)}.$$

Using this easy choice, the expression for the weights simplifies:
$$W_n = W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \mid x_{n-1})} = W_{n-1}\, \frac{\gamma_n(X_{n-1}^{(i)})}{\gamma_{n-1}(X_{n-1}^{(i)})}.$$

This is known as annealed importance sampling. This particular choice has been used in physics and statistics.

W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
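As an illustration of the annealed-importance-sampling weight update above, here is a minimal runnable sketch. The target and initial distributions (a 1-D Gaussian N(3, 0.5^2) tempered from N(0,1)), the 21-level ladder, and the random-walk Metropolis kernel are all assumptions of this demo, not from the lecture; the tempering convention here runs eta from 0 (initial) to 1 (target).

```python
import numpy as np

rng = np.random.default_rng(0)

def log_gamma(x):           # log of unnormalized target: pi = N(3, 0.5^2)
    return -0.5 * ((x - 3.0) / 0.5) ** 2

def log_mu(x):              # log of easy initial density: mu = N(0, 1)
    return -0.5 * x ** 2

def log_gamma_n(x, eta):    # tempered path: gamma_n = mu^(1-eta) * gamma^eta
    return (1.0 - eta) * log_mu(x) + eta * log_gamma(x)

N = 5000
etas = np.linspace(0.0, 1.0, 21)       # annealing ladder, mu -> pi

x = rng.standard_normal(N)             # X^(i) ~ mu
logW = np.zeros(N)
for eta_prev, eta in zip(etas[:-1], etas[1:]):
    # annealed IS weight update: W_n = W_{n-1} * gamma_n(x) / gamma_{n-1}(x)
    logW += log_gamma_n(x, eta) - log_gamma_n(x, eta_prev)
    # move with a pi_n-invariant MCMC kernel (random-walk Metropolis)
    prop = x + 0.5 * rng.standard_normal(N)
    accept = np.log(rng.uniform(size=N)) < log_gamma_n(prop, eta) - log_gamma_n(x, eta)
    x = np.where(accept, prop, x)

W = np.exp(logW - logW.max())
W /= W.sum()
est_mean = np.sum(W * x)               # self-normalized estimate of E_pi[X]
print(est_mean)                        # close to 3
```

No resampling step is included; plain annealed importance sampling only reweights and moves, which is exactly the simplified weight recursion of this slide.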
ONLINE PARAMETER ESTIMATION
Online Bayesian Parameter Estimation

Assume that our state model is defined with some unknown static parameter $\theta$ with prior $p(\theta)$:
$$X_1 \sim \mu_\theta(\cdot), \qquad X_n \mid (X_{n-1} = x_{n-1}) \sim f_\theta(\cdot \mid x_{n-1}),$$
$$Y_n \mid (X_n = x_n) \sim g_\theta(\cdot \mid x_n).$$

Given data $y_{1:n}$, inference now is based on
$$p(\theta, x_{1:n} \mid y_{1:n}) = p_\theta(x_{1:n} \mid y_{1:n})\, p(\theta \mid y_{1:n}), \qquad \text{where} \quad p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).$$

We can use standard SMC, but on the extended space $Z_n = (X_n, \theta_n)$:
$$f(z_n \mid z_{n-1}) = \delta_{\theta_{n-1}}(\theta_n)\, f_{\theta_n}(x_n \mid x_{n-1}), \qquad g(y_n \mid z_n) = g_{\theta_n}(y_n \mid x_n).$$

Note that $\theta$ is a static parameter: it does not evolve with n.
Online Bayesian Parameter Estimation

For fixed $\theta$, using our earlier error estimates,
$$\operatorname{Var}\left[\log \hat p_\theta(y_{1:n})\right] \le \frac{Cn}{N}.$$

In a Bayesian context the problem is even more severe, as
$$p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).$$

The exponential stability assumption cannot hold, since $\theta_n = \theta_1$.

To mitigate this problem, introduce MCMC steps on $\theta$.

- C. Andrieu, N. de Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
- P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
- W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
Online Bayesian Parameter Estimation

When
$$p(\theta \mid y_{1:n}, x_{1:n}) = p(\theta \mid s_n(x_{1:n}, y_{1:n}))$$
for a fixed-dimensional sufficient statistic $s_n$, this becomes an elegant algorithm that, however, still has the degeneracy problem, since it relies on $p(x_{1:n} \mid y_{1:n})$.

As $\dim(Z_n) = \dim(X_n) + \dim(\theta)$, such methods are not recommended for high-dimensional $\theta$, especially with vague priors.
Artificial Dynamics for $\theta$

A solution consists of perturbing the locations of the particles in a way that does not modify their distribution, i.e. if at time n
$$\theta^{(i)} \sim p(\theta \mid y_{1:n}),$$
then we would like a transition kernel $M_n$ such that if
$$\theta'^{(i)} \sim M_n(\cdot \mid \theta^{(i)}),$$
then
$$\theta'^{(i)} \sim p(\theta \mid y_{1:n}).$$

In Markov chain language, we want $M_n$ to leave $p(\theta \mid y_{1:n})$ invariant.
Using MCMC

There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as $p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$ would need to be known up to a normalizing constant, but $p_\theta(y_{1:n})$ is unknown.

However, we can use a very simple Gibbs update. Indeed, note that if
$$X_{1:n}^{(i)} \sim p(x_{1:n} \mid y_{1:n}),$$
then sampling
$$\theta^{(i)} \sim p(\theta \mid y_{1:n}, X_{1:n}^{(i)})$$
gives $\theta^{(i)} \sim p(\theta \mid y_{1:n})$. Indeed,
$$p(\theta \mid y_{1:n}) = \int p(\theta \mid y_{1:n}, x_{1:n})\, p(x_{1:n} \mid y_{1:n})\, dx_{1:n}.$$
SMC with MCMC for Parameter Estimation

Given an approximation at time n-1,
$$\hat p(x_{1:n-1}, \theta \mid y_{1:n-1}) = \frac{1}{N} \sum_{i=1}^N \delta_{(X_{1:n-1}^{(i)},\, \theta_{n-1}^{(i)})}(x_{1:n-1}, \theta):$$

- Sample $X_n^{(i)} \sim f_{\theta^{(i)}}(\cdot \mid X_{n-1}^{(i)})$, set $X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)})$, and then approximate
$$\hat p(x_{1:n}, \theta \mid y_{1:n}) = \sum_{i=1}^N W_n^{(i)}\, \delta_{(X_{1:n}^{(i)},\, \theta^{(i)})}(x_{1:n}, \theta), \qquad W_n^{(i)} \propto g_{\theta^{(i)}}(y_n \mid X_n^{(i)}).$$
- Resample $\bar X_{1:n}^{(i)} \sim \hat p(x_{1:n} \mid y_{1:n})$, then sample $\theta_n^{(i)} \sim p(\theta \mid y_{1:n}, \bar X_{1:n}^{(i)})$ to obtain
$$\hat p(x_{1:n}, \theta \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^N \delta_{(\bar X_{1:n}^{(i)},\, \theta_n^{(i)})}(x_{1:n}, \theta).$$
SMC with MCMC for Parameter Estimation

Consider the following model:
$$X_{n+1} = \theta X_n + \sigma_v V_{n+1}, \qquad V_n \overset{iid}{\sim} \mathcal N(0, 1),$$
$$Y_n = X_n + \sigma_w W_n, \qquad W_n \overset{iid}{\sim} \mathcal N(0, 1),$$
$$X_1 \sim \mathcal N(0, \sigma_0^2).$$

We set the prior on $\theta$ as $\theta \sim \mathcal U(-1, 1)$.

In this case,
$$p(\theta \mid x_{1:n}, y_{1:n}) \propto \mathcal N(\theta;\, m_n, \sigma_n^2)\, 1_{(-1,1)}(\theta),$$
where
$$\sigma_n^2 = \frac{\sigma_v^2}{\sum_{k=1}^{n-1} x_k^2}, \qquad m_n = \frac{\sum_{k=1}^{n-1} x_k x_{k+1}}{\sum_{k=1}^{n-1} x_k^2}.$$
SMC with MCMC for Parameter Estimation

We use SMC with the Gibbs step. The degeneracy problem remains.

(Figure: the SMC estimate of $E[\theta \mid y_{1:n}]$ as n increases, from A. Doucet's lecture notes; legend shows the true value vs. the SMC/MCMC estimate. The parameter converges to the wrong value.)
SMC with MCMC for Parameter Estimation

The problem with this approach is that, although we move $\theta$ according to $p(\theta \mid x_{1:n}, y_{1:n})$, we are still relying implicitly on the SMC approximation of the joint distribution $p(x_{1:n} \mid y_{1:n})$.

As n increases, the SMC approximation of $p(x_{1:n} \mid y_{1:n})$ deteriorates, and the algorithm cannot converge towards the right solution.

(Figure: the sufficient statistic $\frac{1}{n} \sum_{k=1}^n E[X_k^2 \mid y_{1:n}]$ computed exactly through the Kalman filter (blue) vs. its SMC approximation (red), for a fixed value of $\theta$.)
Recursive MLE Parameter Estimation

The log-likelihood can be written as
$$\ell_n(\theta) = \log p_\theta(y_{1:n}) = \sum_{k=1}^n \log p_\theta(y_k \mid y_{1:k-1}).$$

Here we compute
$$p_\theta(y_k \mid y_{1:k-1}) = \int g_\theta(y_k \mid x_k)\, p_\theta(x_k \mid y_{1:k-1})\, dx_k.$$

Under regularity assumptions, $\{X_n, Y_n, p_\theta(x_n \mid y_{1:n-1})\}$ is an inhomogeneous Markov chain which converges towards its invariant distribution $\mu_\theta$, so that the averaged log-likelihood converges to a limit,
$$\ell(\theta) = \lim_{n \to \infty} \frac{\ell_n(\theta)}{n},$$
expressed as an integral of $\log g_\theta(y \mid x)$ against $\mu_\theta$.
Robbins-Monro Algorithm for MLE

We can maximize $\ell(\theta)$ by using the gradient and the Robbins-Monro algorithm:
$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla_\theta \log p_{\theta_{n-1}}(y_n \mid y_{1:n-1}),$$
where
$$\nabla p_\theta(y_n \mid y_{1:n-1}) = \int \nabla g_\theta(y_n \mid x_n)\, p_\theta(x_n \mid y_{1:n-1})\, dx_n + \int g_\theta(y_n \mid x_n)\, \nabla p_\theta(x_n \mid y_{1:n-1})\, dx_n.$$

We thus need to approximate $\nabla p_\theta(x_n \mid y_{1:n-1})$.
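To see the Robbins-Monro recursion in isolation, before worrying about the SMC approximation of the gradient, here is a toy sketch where the score is available in closed form: estimating the mean of a Gaussian from streaming data. The model and the step-size schedule are assumptions of this demo.

```python
import numpy as np

rng = np.random.default_rng(2)

# Robbins-Monro sketch: maximize the average log-likelihood of Y_n ~ N(theta, 1).
# Here the per-observation score is exact: grad_theta log p_theta(y_n) = y_n - theta.
theta_true = 2.5
theta = 0.0
for n in range(1, 20001):
    y_n = theta_true + rng.standard_normal()
    gamma_n = 1.0 / n ** 0.6           # step sizes: sum(gamma) = inf, sum(gamma^2) < inf
    theta += gamma_n * (y_n - theta)   # theta_n = theta_{n-1} + gamma_n * score
print(theta)                           # converges towards theta_true = 2.5
```

In the state-space setting of this slide, the exact score $y_n - \theta$ is replaced by an SMC estimate of $\nabla_\theta \log p_{\theta_{n-1}}(y_n \mid y_{1:n-1})$, which is what the next slides construct.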
Importance Sampling Estimation of Sensitivity

The various proposed approximations are based on the identity
$$\nabla p_\theta(x_{1:n} \mid y_{1:n}) = \frac{\nabla p_\theta(x_{1:n} \mid y_{1:n})}{p_\theta(x_{1:n} \mid y_{1:n})}\, p_\theta(x_{1:n} \mid y_{1:n}) = \left[\nabla \log p_\theta(x_{1:n} \mid y_{1:n})\right] p_\theta(x_{1:n} \mid y_{1:n}).$$

We can thus use an SMC approximation of the form
$$\widehat{\nabla p}_\theta(x_{1:n} \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^N \alpha_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad \alpha_n^{(i)} = \nabla \log p_\theta(X_{1:n}^{(i)} \mid y_{1:n}).$$

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of $p_\theta(x_{1:n} \mid y_{1:n})$.
Degeneracy of the SMC Algorithm

(Figure: the score $\nabla \log p_\theta(y_{1:n})$ for a linear Gaussian model: exact (blue) vs. SMC approximation (red).)
Degeneracy of the SMC Algorithm

(Figure: the signed measure $\nabla p_\theta(x_n \mid y_{1:n})$ for a linear Gaussian model: exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.)
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:
$$\nabla p_\theta(x_n \mid y_{1:n}) = \left[\nabla \log p_\theta(x_n \mid y_{1:n})\right] p_\theta(x_n \mid y_{1:n}).$$

It suggests using an SMC approximation of the form
$$\widehat{\nabla p}_\theta(x_n \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^N \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \beta_n^{(i)} = \nabla \log p_\theta(X_n^{(i)} \mid y_{1:n}) = \frac{\nabla p_\theta(X_n^{(i)} \mid y_{1:n})}{p_\theta(X_n^{(i)} \mid y_{1:n})}.$$

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is $O(N^2)$, as
$$p_\theta(X_n^{(i)} \mid y_{1:n-1}) = \int f_\theta(X_n^{(i)} \mid x_{n-1})\, p_\theta(x_{n-1} \mid y_{1:n-1})\, dx_{n-1} \approx \frac{1}{N} \sum_{j=1}^N f_\theta(X_n^{(i)} \mid X_{n-1}^{(j)}).$$
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies
$$p_\theta(x_n \mid y_{1:n}) = \frac{\xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x_n)\, dx_n}, \qquad \xi_{\theta,n}(x_n) = g_\theta(y_n \mid x_n) \int f_\theta(x_n \mid x_{n-1})\, p_\theta(x_{n-1} \mid y_{1:n-1})\, dx_{n-1}.$$

The derivatives satisfy
$$\nabla p_\theta(x_n \mid y_{1:n}) = \frac{\nabla \xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x_n)\, dx_n} - p_\theta(x_n \mid y_{1:n})\, \frac{\int \nabla \xi_{\theta,n}(x_n)\, dx_n}{\int \xi_{\theta,n}(x_n)\, dx_n}.$$

This way we obtain a simple recursion for $\nabla p_\theta(x_n \mid y_{1:n})$ as a function of $\nabla p_\theta(x_{n-1} \mid y_{1:n-1})$ and $p_\theta(x_{n-1} \mid y_{1:n-1})$.
SMC Approximation of the Sensitivity

- Sample
$$X_n^{(i)} \sim \eta_n(\cdot), \qquad \eta_n(x_n) = \sum_{j=1}^N W_{n-1}^{(j)}\, q_\theta(x_n \mid y_n, X_{n-1}^{(j)}).$$
- Compute
$$\alpha_n^{(i)} = \frac{\xi_{\theta,n}(X_n^{(i)})}{\eta_n(X_n^{(i)})}, \qquad \rho_n^{(i)} = \frac{\nabla \xi_{\theta,n}(X_n^{(i)})}{\eta_n(X_n^{(i)})},$$
$$W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j} \alpha_n^{(j)}}, \qquad \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j} \alpha_n^{(j)}} - W_n^{(i)}\, \frac{\sum_{j} \rho_n^{(j)}}{\sum_{j} \alpha_n^{(j)}}.$$
- We then have
$$\hat p_\theta(x_n \mid y_{1:n}) = \sum_{i=1}^N W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \widehat{\nabla p}_\theta(x_n \mid y_{1:n}) = \sum_{i=1}^N \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n).$$
Example: Linear Gaussian Model

Consider the following model:
$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = X_n + \sigma_w W_n,$$
with $\theta = (\phi, \sigma_v, \sigma_w)$.

We compare next the SMC estimates of $\nabla \log p_\theta(x_n \mid y_{1:n})$ and $\nabla p_\theta(y_{1:n})$ for N = 1000 to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).
Example: Linear Gaussian Model

(Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).)
Example: Stochastic Volatility Model

Consider the following model:
$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = \beta \exp(X_n / 2)\, W_n,$$
with $\theta = (\phi, \sigma_v, \beta)$.

We use SMC with N = 1000 for batch and on-line estimation.

For batch estimation:
$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta_{n-1}}(y_{1:n}).$$

For online estimation:
$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta_{n-1}}(y_n \mid y_{1:n-1}).$$
Example: Stochastic Volatility Model

Batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.
Example: Stochastic Volatility Model

Recursive parameter estimation for the SV model: the estimates converge towards the true values.
SMC with MCMC for Parameter Estimation

Given data $y_{1:n}$, inference relies on
$$p(\theta, x_{1:n} \mid y_{1:n}) = p_\theta(x_{1:n} \mid y_{1:n})\, p(\theta \mid y_{1:n}), \qquad \text{where} \quad p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).$$

We have seen that SMC is rather inefficient for sampling from $p(\theta \mid y_{1:n})$, so we look here at an MCMC approach.

For a given parameter value $\theta$, SMC can estimate both $p_\theta(x_{1:n} \mid y_{1:n})$ and $p_\theta(y_{1:n})$.

It will be useful if we can use SMC within MCMC to sample from $p(\theta, x_{1:n} \mid y_{1:n})$.

C. Andrieu, R. Holenstein and G. O. Roberts, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010), 72, Part 3, pp 269-342.
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from $p(\theta \mid y_{1:n}, x_{1:n})$ and $p(x_{1:n} \mid y_{1:n}, \theta)$.

However, it is impossible to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$ directly. We can sample from $p(x_k \mid y_{1:n}, x_{1:k-1}, x_{k+1:n}, \theta)$ one component at a time instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$, i.e. sample $X_{1:n}^* \sim q(x_{1:n} \mid y_{1:n}, \theta)$ and accept with probability
$$\min\left\{1,\; \frac{p(X_{1:n}^* \mid y_{1:n}, \theta)\, q(X_{1:n} \mid y_{1:n}, \theta)}{p(X_{1:n} \mid y_{1:n}, \theta)\, q(X_{1:n}^* \mid y_{1:n}, \theta)}\right\}.$$
Gibbs Sampling Strategy

We will use the output of an SMC method approximating $p(x_{1:n} \mid y_{1:n}, \theta)$ as a proposal distribution.

We know that the SMC approximation degrades as n increases, but under mixing assumptions we also have
$$\left\| \mathcal L\!\left(X_{1:n}^{(i)}\right) - p(\,\cdot \mid y_{1:n}, \theta) \right\|_{TV} \le \frac{Cn}{N}.$$

Thus things degrade linearly rather than exponentially. These results are useful for small C.
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Biased Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences.

CBMC looks like an SMC method, but at each step only one particle is selected, which gives N offspring. Thus, if you have made the wrong selection, you cannot recover later on.

D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC

We use the SMC approximation $\hat p_\theta(x_{1:n} \mid y_{1:n})$ as proposal distribution, sampling $X_{1:n}^* \sim \hat p_\theta(\cdot \mid y_{1:n})$, and store the associated estimate $\hat p_\theta(y_{1:n})$.

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out $(N-1)n$ variables.

The solution, inspired by CBMC, is as follows. For the current state $X_{1:n}$ of the Markov chain, run a virtual SMC method with only N-1 free particles. Ensure that at each time step k the current state $X_k$ is deterministically selected, and compute the associated estimate $\hat p_\theta^{\,old}(y_{1:n})$.

Finally, accept $X_{1:n}^*$ with probability
$$\min\left\{1,\; \frac{\hat p_\theta(y_{1:n})}{\hat p_\theta^{\,old}(y_{1:n})}\right\}.$$
Otherwise, stay where you are.
Example

Consider the following model:
$$X_k = \frac{X_{k-1}}{2} + 25\, \frac{X_{k-1}}{1 + X_{k-1}^2} + 8 \cos(1.2 k) + V_k, \qquad V_k \sim \mathcal N(0, 10),$$
$$Y_k = \frac{X_k^2}{20} + W_k, \qquad W_k \sim \mathcal N(0, 1),$$
with $X_1 \sim \mathcal N(0, 5)$.

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
Example

We now consider the case when both variances $\sigma_v^2, \sigma_w^2$ are unknown and given inverse Gamma (conditionally conjugate) vague priors.

We sample from $p(\theta, x_{1:1000} \mid y_{1:1000})$ using two strategies:
- The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution; average acceptance rate 0.43.
- An algorithm updating the components one at a time, using an MH step of invariant distribution $p(x_k \mid x_{k-1}, x_{k+1}, y_k)$ with the same proposal.

In both cases we update the parameters according to $p(\theta \mid x_{1:500}, y_{1:500})$. Both algorithms are run with the same computational effort.
Example

MH one-at-a-time misses the mode and gives
$$E[\sigma_v^2 \mid y_{1:1500}] \approx 13.7 \;(\text{MH one-at-a-time}), \qquad E[\sigma_v^2 \mid y_{1:1500}] \approx 9.9 \;(\text{PF-MCMC}),$$
with true value $\sigma_v^2 = 10$.

If $X_{1:n}$ and $\theta$ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the marginal Metropolis-Hastings algorithm.
Review of the Metropolis-Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density $q(x, y) = q(y \mid x)$ and target distribution $\pi(x)$.

It can be easily shown that $\pi$ is invariant under the transition kernel,
$$\pi(x') = \int \pi(x)\, K(x, x')\, dx,$$
and that $X^{(i)} \sim \pi(x)$ as $i \to \infty$, under weak assumptions.

Algorithm: Generic Metropolis-Hastings Sampler
- Initialization: choose an arbitrary starting value $x^{(0)}$.
- Iteration t (t >= 1):
  1. Given $x^{(t-1)}$, generate $\tilde x \sim q(\tilde x \mid x^{(t-1)})$.
  2. Compute
     $$\rho(x^{(t-1)}, \tilde x) = \min\left\{1,\; \frac{\pi(\tilde x)\, q(x^{(t-1)} \mid \tilde x)}{\pi(x^{(t-1)})\, q(\tilde x \mid x^{(t-1)})}\right\}.$$
  3. With probability $\rho(x^{(t-1)}, \tilde x)$, accept and set $x^{(t)} = \tilde x$; otherwise reject and set $x^{(t)} = x^{(t-1)}$.
Marginal Metropolis-Hastings Algorithm

Consider the target
$$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}).$$

We use the following proposal distribution:
$$q\!\left((\theta^*, x_{1:n}^*) \mid (\theta, x_{1:n})\right) = q(\theta^* \mid \theta)\, p_{\theta^*}(x_{1:n}^* \mid y_{1:n}).$$

Then the acceptance probability becomes
$$\min\left\{1,\; \frac{p(\theta^*, x_{1:n}^* \mid y_{1:n})\, q\!\left((\theta, x_{1:n}) \mid (\theta^*, x_{1:n}^*)\right)}{p(\theta, x_{1:n} \mid y_{1:n})\, q\!\left((\theta^*, x_{1:n}^*) \mid (\theta, x_{1:n})\right)}\right\} = \min\left\{1,\; \frac{p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta \mid \theta^*)}{p_\theta(y_{1:n})\, p(\theta)\, q(\theta^* \mid \theta)}\right\}.$$

We will use SMC approximations to compute $p_\theta(y_{1:n})$ and $p_\theta(x_{1:n} \mid y_{1:n})$.
Marginal Metropolis-Hastings Algorithm

- Step 1: Given $\theta(i-1)$, $X_{1:n}(i-1)$ and $\hat p_{\theta(i-1)}(y_{1:n})$, sample $\theta^* \sim q(\cdot \mid \theta(i-1))$ and run an SMC method to obtain $\hat p_{\theta^*}(x_{1:n} \mid y_{1:n})$ and $\hat p_{\theta^*}(y_{1:n})$.
- Step 2: Sample $X_{1:n}^* \sim \hat p_{\theta^*}(x_{1:n} \mid y_{1:n})$.
- Step 3: With probability
  $$\min\left\{1,\; \frac{\hat p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta(i-1) \mid \theta^*)}{\hat p_{\theta(i-1)}(y_{1:n})\, p(\theta(i-1))\, q(\theta^* \mid \theta(i-1))}\right\},$$
  set $\theta(i) = \theta^*$, $X_{1:n}(i) = X_{1:n}^*$ and $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta^*}(y_{1:n})$. Otherwise, set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$ and $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta(i-1)}(y_{1:n})$.
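A minimal sketch of these three steps, assuming a linear Gaussian model $X_n = \theta X_{n-1} + V_n$, $Y_n = X_n + W_n$ with unit noise variances, a uniform prior on $(-1, 1)$, a symmetric random-walk $q(\theta^* \mid \theta)$, and a bootstrap particle filter for the likelihood estimate. All model choices are assumptions of this demo, not the lecture's exact example. (The sampling of $X_{1:n}^*$ is omitted, as in the approximate variant discussed on the next slide.)

```python
import numpy as np

rng = np.random.default_rng(4)

def pf_loglik(theta, ys, N=200):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n})."""
    x = rng.standard_normal(N)
    ll = 0.0
    for y in ys:
        x = theta * x + rng.standard_normal(N)            # propagate
        logw = -0.5 * (y - x) ** 2 - 0.5 * np.log(2 * np.pi)
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())                        # log p_hat(y_n | y_{1:n-1})
        x = x[rng.choice(N, size=N, p=w / w.sum())]       # resample
    return ll

# simulate data with theta_true = 0.7
theta_true, x, ys = 0.7, 0.0, []
for _ in range(100):
    x = theta_true * x + rng.standard_normal()
    ys.append(x + rng.standard_normal())

theta, ll = 0.0, pf_loglik(0.0, ys)
draws = []
for i in range(400):
    theta_star = theta + 0.1 * rng.standard_normal()      # step 1: theta* ~ q(.|theta)
    if abs(theta_star) < 1.0:                             # prior support U(-1,1)
        ll_star = pf_loglik(theta_star, ys)               # run an SMC for theta*
        # step 3: flat prior and symmetric q, so the ratio is p_hat_{theta*}/p_hat_theta
        if np.log(rng.uniform()) < ll_star - ll:
            theta, ll = theta_star, ll_star               # accept and store estimate
    draws.append(theta)
print(np.mean(draws[100:]))                               # near theta_true
```

Note that the stored likelihood estimate `ll` is reused until a proposal is accepted; refreshing it at the current point would break the exact-invariance property of the algorithm.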
Marginal Metropolis-Hastings Algorithm

This algorithm (without sampling $X_{1:n}$) was proposed as an approximate MCMC algorithm to sample from $p(\theta \mid y_{1:n})$ in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

When $N \ge 1$, the algorithm admits exactly $p(\theta, x_{1:n} \mid y_{1:n})$ as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n.

The method is useful when $X_n$ is moderate-dimensional and $\theta$ is high-dimensional. It admits the plug-and-play property (Ionides et al., 2006).

- Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553, 2007.
- C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
- E. L. Ionides, C. Breto and A. A. King, Inference for nonlinear dynamical systems, Proceedings of the Nat. Acad. of Sciences, 103, 18438-18443, 2006.
OPEN PROBLEMS
Open Problems

Thus, SMC can be successfully used within MCMC.

The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals.

This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
- Developing efficient SMC methods when, say, $E = \mathbb R^{10000}$.
- Developing a proper SMC method for recursive Bayesian parameter estimation.
- Developing methods that avoid the $O(N^2)$ cost for smoothing and sensitivity estimation.
- Developing methods for solving optimal control problems.
- Establishing weaker assumptions ensuring uniform convergence results.
Other Topics

In standard SMC we only sample $X_n$ at time n and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

- A. Doucet, M. Briers and S. Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
References (continued)

M.K. Pitt and N. Shephard, Filtering via Simulation: Auxiliary Particle Filter, JASA, 1999.
A. Doucet, S.J. Godsill and C. Andrieu, On Sequential Monte Carlo sampling methods for Bayesian filtering, Stat. Comp., 2000.
J. Carpenter, P. Clifford and P. Fearnhead, An Improved Particle Filter for Non-linear Problems, IEE, 1999.
A. Kong, J.S. Liu and W.H. Wong, Sequential Imputations and Bayesian Missing Data Problems, JASA, 1994.
O. Cappe, E. Moulines and T. Ryden, Inference in Hidden Markov Models, Springer-Verlag, 2005.
W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
G. Poyadjis, A. Doucet and S.S. Singh, Maximum Likelihood Parameter Estimation using Particle Methods, Joint Statistical Meeting, 2005.
N. Gordon, D.J. Salmond and A.F.M. Smith, Novel Approach to nonlinear/non-Gaussian Bayesian state estimation, IEE, 1993.
S. Godsill, Particle Filters, 2009 (Video Lectures).
R. Chen and J.S. Liu, Predictive Updating Methods with Application to Bayesian Classification, JRSS B, 1996.
C. Andrieu and A. Doucet, Particle Filtering for Partially Observed Gaussian State-Space Models, JRSS B, 2002.
R. Chen and J.S. Liu, Mixture Kalman Filters, JRSS B, 2000.
N. Kantas, A. Doucet, S.S. Singh and J.M. Maciejowski, An overview of sequential Monte Carlo methods for parameter estimation in general state-space models, Proceedings IFAC System Identification (SysId) Meeting, 2009.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods (with discussion), JRSS B, 2010.
C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
G. Storvik, Particle filters for state-space models with the presence of unknown static parameters, IEEE Trans. Signal Processing, 2002.
C. Andrieu, A. Doucet and V.B. Tadic, Online EM for parameter estimation in nonlinear/non-Gaussian state-space models, Proc. IEEE CDC, 2005.
G. Poyadjis, A. Doucet and S.S. Singh, Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation, Biometrika, 2011.
C. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Technical report, 2008 (also Stat. Comput., 2011).
R. Martinez-Cantin, J. Castellanos and N. de Freitas, Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm: Marginal-SLAM, International Conference on Robotics and Automation.
A. Doucet, Sequential Monte Carlo Methods and Particle Filters: list of papers, codes and video lectures on SMC and particle filters.
Pierre Del Moral, Feynman-Kac models and interacting particle systems.
P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo samplers, JRSS B, 2006.
P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
P. Del Moral, A. Doucet and S.S. Singh, Forward Smoothing using Sequential Monte Carlo, technical report, Cambridge University, 2009.
A. Doucet, Short Courses Lecture Notes (A, B, C).
P. Del Moral, Feynman-Kac Formulae, Springer-Verlag, 2004.
M. Davy, Sequential MC Methods, 2007.
A. Doucet and A. Johansen, Particle Filtering and Smoothing: Fifteen years later, in Handbook of Nonlinear Filtering (eds. D. Crisan and B. Rozovsky), Oxford Univ. Press, 2011.
A. Johansen and A. Doucet, A Note on Auxiliary Particle Filters, Stat. Proba. Letters, 2008.
A. Doucet, M. Briers and S. Senecal, Efficient Block Sampling Strategies for Sequential Monte Carlo, JCGS, 2006.

SEQUENTIAL MONTE CARLO METHODS FOR STATIC PROBLEMS
What about if all distributions π_n, n ≥ 1, are defined on a common space X?

This case appears often, for example:

• Sequential Bayesian estimation: π_n(x) = p(x | y_{1:n}).

• Classical Bayesian inference: π_n(x) = π(x).

• Global optimization: π_n(x) ∝ [π(x)]^{γ_n} with γ_n → ∞.

It is not clear how SMC can be related to this problem, since here X_n = X instead of X_n = X^n.
What about if all distributions π_n, n ≥ 1, are defined on X?

• Sampling from a fixed distribution π, where μ(x) is an easy-to-sample distribution: use the tempered sequence

  π_n(x) ∝ [μ(x)]^{1-η_n} [π(x)]^{η_n},

with 0 = η_1 < η_2 < ... < η_Final = 1, i.e. π_1(x) = μ(x) and π_Final(x) ∝ π(x).
What about if all distributions π_n, n ≥ 1, are defined on X?

• Rare event simulation: with π(x) known up to a normalizing factor (Z_1 known) and a rare set A of interest, take

  π_n(x) ∝ 1_{E_n}(x) π(x),

and simulate with a decreasing sequence of sets E = E_1 ⊃ E_2 ⊃ ... ⊃ E_Final = A. The required probability is then Z_Final = π(A).

Other applications of the same idea: the Boolean satisfiability problem, computing matrix permanents, and computing volumes in high dimensions.
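The nested-sets idea above can be sketched numerically. The following is a minimal, illustrative multilevel-splitting estimator (the target N(0,1), the event A = {x > 3}, the levels, the particle number and the MH step size are all assumptions for the demo, not values from the slides): each factor π(E_{k+1})/π(E_k) is estimated by the fraction of particles surviving the next level, survivors are resampled, and a few MH moves targeting π restricted to the current level restore diversity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Rare event A = {x > 3} under pi = N(0,1); true probability ~ 1.35e-3.
levels = [0.0, 1.0, 2.0, 3.0]     # nested sets E_k = {x > l_k}, illustrative choice
N = 5000
x = rng.normal(size=N)            # start with N draws from pi
log_p = 0.0

for level in levels:
    alive = x > level
    frac = alive.mean()           # estimates pi(E_k | E_{k-1})
    log_p += np.log(frac)
    # Resample the survivors uniformly back to N particles.
    x = rng.choice(x[alive], size=N, replace=True)
    # MH moves targeting N(0,1) restricted to {x > level}, to restore diversity.
    for _ in range(10):
        prop = x + 0.5 * rng.normal(size=N)
        log_acc = -0.5 * (prop**2 - x**2)
        accept = (prop > level) & (np.log(rng.uniform(size=N)) < log_acc)
        x = np.where(accept, prop, x)

p_hat = np.exp(log_p)             # product of conditional survival fractions
```

With these settings `p_hat` should land near P(X > 3) ≈ 1.35e-3, which a single direct Monte Carlo run of the same size would estimate poorly.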
What about if all distributions π_n, n ≥ 1, are defined on X?

Apply SMC to an augmented sequence of distributions π̃_n on X^n. For example,

  π̃_n(x_{1:n}) = π_n(x_n) π_n(x_{1:n-1} | x_n),

where π_n(x_{1:n-1} | x_n) can be any conditional distribution on X^{n-1}, so that

  ∫ π̃_n(x_{1:n}) dx_{1:n-1} = π_n(x_n).

• C. Jarzynski, Nonequilibrium Equality for Free Energy Differences, Phys. Rev. Lett., 78, 2690-2693, 1997.
• Gavin E. Crooks, Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible Markovian Systems, Journal of Statistical Physics, Volume 90, Issue 5-6, pp. 1481-1487, March 1998.
• W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
• R.M. Neal, Annealed importance sampling, Statistics and Computing, vol. 11, pp. 125-139, 2001.
• N. Chopin, A sequential particle filter method for static problems, Biometrika, 89, 3, pp. 539-551, 2002.
• P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
SMC For Static Models

Let {π_n(x)}_{n≥1} be a sequence of probability distributions defined on X, such that each π_n(x) is known up to a normalizing constant:

  π_n(x) = Z_n^{-1} γ_n(x),   γ_n known, Z_n unknown.

We are interested in sampling from π_n(x) and computing Z_n sequentially.

This is not the same as the standard SMC discussed earlier, which was defined on X^n, e.g. π_n(x_{1:n}) = p(x_{1:n} | y_{1:n}).
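Before moving to the sequential construction, the γ/Z notation can be checked on a toy case with plain importance sampling (the choice of γ, the proposal μ and the sample size are illustrative assumptions, not from the slides): since Z = ∫ γ(x) dx = E_μ[γ(X)/μ(X)], an average of weights under any easy-to-sample μ estimates Z.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target gamma(x) = exp(-x^2/2); the true Z is sqrt(2*pi).
def gamma_fn(x):
    return np.exp(-0.5 * x**2)

# Easy-to-sample proposal mu = N(0, 2^2), deliberately wider than the target.
sigma = 2.0
x = rng.normal(0.0, sigma, size=100_000)
mu_pdf = np.exp(-0.5 * (x / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

# Z = integral of gamma = E_mu[gamma(X)/mu(X)], estimated by the weight average.
Z_hat = np.mean(gamma_fn(x) / mu_pdf)
```

The SMC samplers below do the same job for a whole sequence of γ_n, reusing the particles from one target as the starting point for the next.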
SMC For Static Models

We construct an artificial distribution on X^n that is the product of the target distribution from which we want to sample and a sequence of backward transition kernels L_k:

  π̃_n(x_{1:n}) = Z_n^{-1} γ̃_n(x_{1:n}),   γ̃_n(x_{1:n}) = γ_n(x_n) ∏_{k=1}^{n-1} L_k(x_{k+1}, x_k),

such that

  ∫ π̃_n(x_{1:n}) dx_{1:n-1} = π_n(x_n).

The importance weights now become

  W_n ∝ W_{n-1} γ̃_n(x_{1:n}) / [γ̃_{n-1}(x_{1:n-1}) q_n(x_{n-1}, x_n)]
      = W_{n-1} [γ_n(x_n) ∏_{k=1}^{n-1} L_k(x_{k+1}, x_k)] / [γ_{n-1}(x_{n-1}) ∏_{k=1}^{n-2} L_k(x_{k+1}, x_k) q_n(x_{n-1}, x_n)]
      = W_{n-1} γ_n(x_n) L_{n-1}(x_n, x_{n-1}) / [γ_{n-1}(x_{n-1}) q_n(x_{n-1}, x_n)].
SMC For Static Models

Any MCMC kernel can be used for the proposal q_n(·|·).

The weight update is

  W_n ∝ W_{n-1} γ_n(x_n) L_{n-1}(x_n, x_{n-1}) / [γ_{n-1}(x_{n-1}) q_n(x_{n-1}, x_n)].

Since our interest is in computing only

  π_n(x_n) = Z_n^{-1} γ_n(x_n) = ∫ π̃_n(x_{1:n}) dx_{1:n-1},

there is no degeneracy problem.

P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
Algorithm: SMC For Static Problems

(1) Initialize at time n = 1.
(2) At time n ≥ 2:

Sample X_n^(i) ~ q_n(· | X_{n-1}^(i)) and augment X_{1:n}^(i) = (X_{1:n-1}^(i), X_n^(i)).

Compute the importance weights

  W_n^(i) ∝ W_{n-1}^(i) γ_n(X_n^(i)) L_{n-1}(X_n^(i), X_{n-1}^(i)) / [γ_{n-1}(X_{n-1}^(i)) q_n(X_{n-1}^(i), X_n^(i))].

The weighted approximation is then

  π̂_n(x) = Σ_{i=1}^N W_n^(i) δ_{X_n^(i)}(x).

We finally resample X_n^(i) ~ π̂_n(x) to obtain

  π̂_n(x) = (1/N) Σ_{i=1}^N δ_{X_n^(i)}(x).
SMC For Static Models: Choice of L

A default choice is to first use a π_n-invariant MCMC kernel q_n, and then take L_{n-1} as the corresponding reversed kernel:

  L_{n-1}(x_n, x_{n-1}) = π_n(x_{n-1}) q_n(x_{n-1}, x_n) / π_n(x_n).

Using this easy choice, the expression for the weights simplifies:

  W_n ∝ W_{n-1} γ_n(x_n) L_{n-1}(x_n, x_{n-1}) / [γ_{n-1}(x_{n-1}) q_n(x_{n-1}, x_n)]
      = W_{n-1} γ_n(x_{n-1}) / γ_{n-1}(x_{n-1}).

This is known as annealed importance sampling. This particular choice has been used in physics and statistics.

W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
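The annealed-importance-sampling weight γ_n(x_{n-1})/γ_{n-1}(x_{n-1}) can be combined with resampling and π_n-invariant moves in a few lines. A minimal sketch, with an illustrative Gaussian target, tempering schedule, particle number and random-walk step (all assumptions for the demo): γ_n ∝ μ^{1-η_n} γ^{η_n}, so the incremental weight is (γ/μ)^{η_n - η_{n-1}}, and the running product of weight averages estimates Z_Final = ∫ γ(x) dx.

```python
import numpy as np

rng = np.random.default_rng(2)

# Unnormalized final target gamma(x) = exp(-(x-3)^2/2), so Z_Final = sqrt(2*pi).
def log_gamma_t(x):
    return -0.5 * (x - 3.0)**2

# Normalized, easy-to-sample start mu = N(0, 2^2), so Z_1 = 1.
s0 = 2.0
def log_mu(x):
    return -0.5 * (x / s0)**2 - np.log(s0 * np.sqrt(2 * np.pi))

N = 2000
etas = np.linspace(0.0, 1.0, 21)          # schedule eta_1 = 0, ..., eta_Final = 1
x = rng.normal(0.0, s0, size=N)
log_Z = 0.0

for eta_prev, eta in zip(etas[:-1], etas[1:]):
    # Incremental AIS weight (gamma/mu)^(eta_n - eta_{n-1}), at the old particles.
    log_w = (eta - eta_prev) * (log_gamma_t(x) - log_mu(x))
    log_Z += np.log(np.mean(np.exp(log_w)))
    # Resample according to the normalized weights.
    w = np.exp(log_w - log_w.max())
    x = rng.choice(x, size=N, replace=True, p=w / w.sum())
    # Random-walk MH kernel invariant for pi_n ∝ mu^(1-eta) gamma^eta.
    def log_pi(z):
        return (1 - eta) * log_mu(z) + eta * log_gamma_t(z)
    for _ in range(5):
        prop = x + 0.7 * rng.normal(size=N)
        accept = np.log(rng.uniform(size=N)) < log_pi(prop) - log_pi(x)
        x = np.where(accept, prop, x)

Z_hat = np.exp(log_Z)
```

The gentle schedule keeps every incremental weight close to constant, which is why the normalizing-constant estimate stays stable even though the start μ and the target barely overlap.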
ONLINE PARAMETER ESTIMATION
Online Bayesian Parameter Estimation

Assume that our state model is defined with some unknown static parameter θ with some prior p(θ):

  X_1 ~ μ_θ(·),   X_n | (X_{n-1} = x_{n-1}) ~ f_θ(· | x_{n-1}),
  Y_n | (X_n = x_n) ~ g_θ(· | x_n).

Given data y_{1:n}, inference is now based on

  p(θ, x_{1:n} | y_{1:n}) = p(θ | y_{1:n}) p(x_{1:n} | y_{1:n}, θ),   where   p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ).

We can use standard SMC, but on the extended space Z_n = (X_n, θ_n). Note that θ is a static parameter: it does not evolve with n,

  f(z_n | z_{n-1}) = δ_{θ_{n-1}}(θ_n) f_{θ_{n-1}}(x_n | x_{n-1}),   g(y_n | z_n) = g_{θ_n}(y_n | x_n).
Online Bayesian Parameter Estimation

For fixed θ, using our earlier error estimates,

  Var[ log p̂_θ(y_{1:n}) ] ≤ C n / N.

In a Bayesian context the problem is even more severe, as p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ).

The exponential stability assumption cannot hold, since θ_n = θ_1 for all n.

To mitigate this problem, introduce MCMC steps on θ.

C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
Online Bayesian Parameter Estimation

When

  p(θ | y_{1:n}, x_{1:n}) = p(θ | s_n(x_{1:n}, y_{1:n}))

for a fixed-dimensional sufficient statistic s_n, this becomes an elegant algorithm that, however, still has the degeneracy problem, since it relies on p(x_{1:n} | y_{1:n}).

As dim(Z_n) = dim(X_n) + dim(θ), such methods are not recommended for high-dimensional θ, especially with vague priors.
Artificial Dynamics for θ

A solution consists of perturbing the location of the particles in a way that does not modify their distribution, i.e. if at time n

  θ^(i) ~ p(θ | y_{1:n}),

then we would like a transition kernel M_n such that if θ'^(i) ~ M_n(· | θ^(i)), then θ'^(i) ~ p(θ | y_{1:n}).

In Markov chain language, we want M_n(θ, θ') to be p(θ | y_{1:n})-invariant.
Using MCMC

There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ) would need to be known up to a normalizing constant, but p_θ(y_{1:n}) is unknown.

However, we can use a very simple Gibbs update: if (θ^(i), X_{1:n}^(i)) ~ p(θ, x_{1:n} | y_{1:n}) and we sample θ'^(i) ~ p(θ | X_{1:n}^(i), y_{1:n}), then (θ'^(i), X_{1:n}^(i)) ~ p(θ, x_{1:n} | y_{1:n}).

Indeed, note that

  p(θ | y_{1:n}) = ∫ p(θ | x_{1:n}, y_{1:n}) p(x_{1:n} | y_{1:n}) dx_{1:n}.
SMC with MCMC for Parameter Estimation

Given an approximation at time n-1,

  p̂(θ, x_{1:n-1} | y_{1:n-1}) = (1/N) Σ_{i=1}^N δ_{(θ^(i), X_{1:n-1}^(i))}(θ, x_{1:n-1}),

sample X_n^(i) ~ f_{θ^(i)}(· | X_{n-1}^(i)), set X̄_{1:n}^(i) = (X_{1:n-1}^(i), X_n^(i)), and then approximate

  p̂(θ, x_{1:n} | y_{1:n}) = Σ_{i=1}^N W_n^(i) δ_{(θ^(i), X̄_{1:n}^(i))}(θ, x_{1:n}),   W_n^(i) ∝ g_{θ^(i)}(y_n | X_n^(i)).

Resample X_{1:n}^(i) ~ p̂(x_{1:n} | y_{1:n}), then sample θ^(i) ~ p(θ | X_{1:n}^(i), y_{1:n}) to obtain

  p̂(θ, x_{1:n} | y_{1:n}) = (1/N) Σ_{i=1}^N δ_{(θ^(i), X_{1:n}^(i))}(θ, x_{1:n}).
SMC with MCMC for Parameter Estimation

Consider the following model:

  X_{n+1} = θ X_n + σ_v V_{n+1},   V_n ~ N(0,1) i.i.d.,
  Y_n = X_n + σ_w W_n,   W_n ~ N(0,1) i.i.d.,
  X_0 ~ N(0,1).

We set the prior on θ as θ ~ U(-1,1). In this case

  p(θ | x_{0:n}, y_{1:n}) ∝ N(θ; m_n, σ_n²) 1_{(-1,1)}(θ),

where

  σ_n² = σ_v² / Σ_{k=1}^n x_{k-1}²,   m_n = Σ_{k=1}^n x_{k-1} x_k / Σ_{k=1}^n x_{k-1}².
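The Gibbs update for θ in this model amounts to drawing from a Gaussian defined by two sufficient statistics and truncating to (-1, 1). A minimal sketch, taking σ_v = 1 and illustrative values for the true θ and the sample sizes (rejection sampling is used for the truncation; all names are local to the demo):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate the AR(1) state sequence X_{n+1} = theta* X_n + V_n, V_n ~ N(0,1).
theta_true = 0.5
n = 500
x = np.zeros(n + 1)
for k in range(n):
    x[k + 1] = theta_true * x[k] + rng.normal()

# Sufficient statistics of p(theta | x_{0:n}): N(m, s2) truncated to (-1, 1).
s2 = 1.0 / np.sum(x[:-1]**2)            # sigma_n^2 with sigma_v = 1
m = s2 * np.sum(x[:-1] * x[1:])         # m_n

def sample_theta(size):
    """Rejection sampling from N(m, s2) restricted to (-1, 1)."""
    out = []
    while len(out) < size:
        draw = rng.normal(m, np.sqrt(s2), size=size)
        out.extend(draw[(np.abs(draw) < 1.0)])
    return np.array(out[:size])

thetas = sample_theta(2000)
```

With the posterior mass concentrated well inside (-1, 1), the rejection step almost never rejects; near the boundary a dedicated truncated-normal sampler would be preferable.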
SMC with MCMC for Parameter Estimation

We use SMC with the Gibbs step. The degeneracy problem remains.

[Figure: SMC estimate of E[θ | y_{1:n}] as n increases (from A. Doucet's lecture notes); the SMC/MCMC estimate drifts away from the true value, i.e. the parameter converges to the wrong value.]
SMC with MCMC for Parameter Estimation

The problem with this approach is that, although we move θ according to p(θ | x_{1:n}, y_{1:n}), we are still relying implicitly on the SMC approximation of the joint distribution p(x_{1:n} | y_{1:n}).

As n increases, the SMC approximation of p(x_{1:n} | y_{1:n}) deteriorates, and the algorithm cannot converge towards the right solution.

[Figure: the sufficient statistic (1/n) Σ_{k=1}^n E[X_k² | y_{1:n}] computed exactly through the Kalman filter (blue) vs. its SMC approximation (red), for a fixed value of θ.]
Recursive MLE Parameter Estimation

The log-likelihood can be written as

  ℓ_n(θ) = log p_θ(y_{1:n}) = Σ_{k=1}^n log p_θ(y_k | y_{1:k-1}).

Here we compute

  p_θ(y_k | y_{1:k-1}) = ∫ g_θ(y_k | x_k) p_θ(x_k | y_{1:k-1}) dx_k.

Under regularity assumptions, {X_n, Y_n, p_θ(x_n | Y_{1:n-1})} is an inhomogeneous Markov chain which converges towards its invariant distribution μ_θ, and the average log-likelihood converges to a limit,

  ℓ(θ) = lim_{n→∞} ℓ_n(θ)/n,

given by an ergodic average of log p_θ(y | ·) with respect to μ_θ.
Robbins-Monro Algorithm for MLE

We can maximize ℓ(θ) by using the gradient and the Robbins-Monro algorithm,

  θ_n = θ_{n-1} + γ_n ∇_θ log p_θ(y_n | y_{1:n-1}) |_{θ=θ_{n-1}},

where

  ∇ p_θ(y_n | y_{1:n-1}) = ∫ ∇g_θ(y_n | x_n) p_θ(x_n | y_{1:n-1}) dx_n + ∫ g_θ(y_n | x_n) ∇p_θ(x_n | y_{1:n-1}) dx_n.

We thus need to approximate ∇ p_θ(x_n | y_{1:n-1}).
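The Robbins-Monro recursion itself is easy to exercise on a model where the score is available in closed form. A minimal sketch, with an illustrative model Y_n ~ N(θ*, 1), for which the score is ∇_θ log p_θ(y_n) = y_n − θ (the true value, step sizes and run length are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(4)

# Streaming observations Y_n ~ N(theta*, 1); score: d/dtheta log N(y; theta, 1) = y - theta.
theta_star = 2.0
theta = 0.0                         # arbitrary starting point
for n in range(1, 5001):
    y = rng.normal(theta_star, 1.0)
    gamma = 1.0 / n                 # step sizes: sum gamma = inf, sum gamma^2 < inf
    theta = theta + gamma * (y - theta)   # Robbins-Monro ascent on the noisy score
```

With γ_n = 1/n this recursion reproduces the running sample mean, so it converges to θ* at the Monte Carlo rate; in the state-space setting the same recursion is driven by the SMC-approximated score discussed next.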
Importance Sampling Estimation of Sensitivity

The various proposed approximations are based on the identity

  ∇ p_θ(x_{1:n} | y_{1:n}) = p_θ(x_{1:n} | y_{1:n}) · ∇ p_θ(x_{1:n} | y_{1:n}) / p_θ(x_{1:n} | y_{1:n})
                           = [∇ log p_θ(x_{1:n} | y_{1:n})] p_θ(x_{1:n} | y_{1:n}).

We thus can use an SMC approximation of the form

  ∇p̂_θ(x_{1:n} | y_{1:n}) = (1/N) Σ_{i=1}^N α_n^(i) δ_{X_{1:n}^(i)}(x_{1:n}),   α_n^(i) = ∇ log p_θ(X_{1:n}^(i) | y_{1:n}).

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension, and in addition it relies on being able to obtain a good approximation of p_θ(x_{1:n} | y_{1:n}).
Degeneracy of the SMC Algorithm

[Figure: score ∇ p_θ(y_{1:n}) for linear Gaussian models: exact (blue) vs. SMC approximation (red).]

[Figure: signed measure ∇ p_θ(x_n | y_{1:n}) for linear Gaussian models: exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.]
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:

  ∇ p_θ(x_n | y_{1:n}) = [∇ log p_θ(x_n | y_{1:n})] p_θ(x_n | y_{1:n}).

It suggests using an SMC approximation of the form

  ∇p̂_θ(x_n | y_{1:n}) = (1/N) Σ_{i=1}^N β_n^(i) δ_{X_n^(i)}(x_n),
  β_n^(i) = ∇ log p_θ(X_n^(i) | y_{1:n}) = ∇ p_θ(X_n^(i) | y_{1:n}) / p_θ(X_n^(i) | y_{1:n}).

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N²), as

  p_θ(X_n^(i) | y_{1:n-1}) ∝ ∫ f_θ(X_n^(i) | x_{n-1}) p_θ(x_{n-1} | y_{1:n-1}) dx_{n-1} ≈ (1/N) Σ_{j=1}^N f_θ(X_n^(i) | X_{n-1}^(j)).
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

  p_θ(x_n | y_{1:n}) = ξ_θ(x_n) / ∫ ξ_θ(x_n) dx_n,
  ξ_θ(x_n) = g_θ(y_n | x_n) ∫ f_θ(x_n | x_{n-1}) p_θ(x_{n-1} | y_{1:n-1}) dx_{n-1}.

The derivatives satisfy

  ∇ p_θ(x_n | y_{1:n}) = ∇ξ_θ(x_n) / ∫ ξ_θ(x_n) dx_n − p_θ(x_n | y_{1:n}) · ∫ ∇ξ_θ(x_n) dx_n / ∫ ξ_θ(x_n) dx_n.

This way we obtain a simple recursion for ∇ p_θ(x_n | y_{1:n}) as a function of ∇ p_θ(x_{n-1} | y_{1:n-1}) and p_θ(x_{n-1} | y_{1:n-1}).
SMC Approximation of the Sensitivity

Sample from a mixture proposal,

  X_n^(i) ~ Σ_{j=1}^N η_{n-1}^(j) q_θ(· | Y_n, X_{n-1}^(j)),   η_{n-1}^(j) ∝ W_{n-1}^(j) p_θ(Y_n | X_{n-1}^(j)).

Compute

  α_n^(i) = ξ_θ(X_n^(i)) / q_θ(X_n^(i) | Y_n),   ρ_n^(i) = ∇ξ_θ(X_n^(i)) / q_θ(X_n^(i) | Y_n),
  W_n^(i) = α_n^(i) / Σ_j α_n^(j),   β_n^(i) = ρ_n^(i) / Σ_j α_n^(j) − W_n^(i) · Σ_j ρ_n^(j) / Σ_j α_n^(j).

We then have

  p̂_θ(x_n | y_{1:n}) = Σ_{i=1}^N W_n^(i) δ_{X_n^(i)}(x_n),
  ∇p̂_θ(x_n | y_{1:n}) = Σ_{i=1}^N β_n^(i) δ_{X_n^(i)}(x_n).
Example: Linear Gaussian Model

Consider the following model:

  X_n = φ X_{n-1} + σ_v V_n,
  Y_n = X_n + σ_w W_n,

with θ = (φ, σ_v, σ_w).

We compare next the SMC estimates of ∇ log p_θ(x_n | y_{1:n}) and ∇ p_θ(y_{1:n}) for N = 1000 to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).
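For this linear Gaussian model the exact benchmark is available from the Kalman filter: its prediction errors give log p_θ(y_{1:n}) in closed form, and the score can be checked by differentiating it. A minimal sketch, with illustrative parameter values and a finite-difference derivative standing in for the exact Kalman derivative used in the slides:

```python
import numpy as np

rng = np.random.default_rng(5)

# Linear Gaussian model: X_n = phi X_{n-1} + sigma_v V_n, Y_n = X_n + sigma_w W_n.
phi, sv, sw = 0.8, 1.0, 1.0
T = 500
x = 0.0
ys = np.empty(T)
for t in range(T):
    x = phi * x + sv * rng.normal()
    ys[t] = x + sw * rng.normal()

def kalman_loglik(phi, sv=1.0, sw=1.0, ys=ys):
    """Exact log p_theta(y_{1:T}) from the Kalman filter prediction errors."""
    m, P = 0.0, sv**2 / (1.0 - phi**2)   # stationary prior (valid for |phi| < 1)
    ll = 0.0
    for y in ys:
        m, P = phi * m, phi**2 * P + sv**2          # predict
        S = P + sw**2                                # innovation variance
        ll += -0.5 * (np.log(2 * np.pi * S) + (y - m)**2 / S)
        K = P / S                                    # Kalman gain
        m, P = m + K * (y - m), (1 - K) * P          # update
    return ll

# Score in phi by central finite differences (a stand-in for the exact derivative).
eps = 1e-5
score_at_true = (kalman_loglik(phi + eps) - kalman_loglik(phi - eps)) / (2 * eps)
```

Running the O(N²) particle approximation of the score against `kalman_loglik`'s derivative is exactly the comparison the figure on this slide reports.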
Example: Linear Gaussian Model

[Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and the Hahn-Jordan decomposition of the marginal (bottom).]
Example: Stochastic Volatility Model

Consider the following model:

  X_n = φ X_{n-1} + σ_v V_n,
  Y_n = β exp(X_n / 2) W_n,

with θ = (φ, σ_v, β). We use SMC with N = 1000 for batch and on-line estimation.

For batch estimation:

  θ_{k+1} = θ_k + γ_{k+1} ∇ log p_{θ_k}(y_{1:n}).

For online estimation:

  θ_n = θ_{n-1} + γ_n ∇ log p_{θ_{n-1}}(y_n | y_{1:n-1}).
Example: Stochastic Volatility Model

[Figure: batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.]

[Figure: recursive parameter estimation for the SV model; the estimates converge towards the true values.]
SMC with MCMC for Parameter Estimation

Given data y_{1:n}, inference relies on

  p(θ, x_{1:n} | y_{1:n}) = p(θ | y_{1:n}) p(x_{1:n} | y_{1:n}, θ),   where   p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ).

We have seen that SMC is rather inefficient for sampling from p(θ | y_{1:n}), so we look here at an MCMC approach.

For a given parameter value θ, SMC can estimate both p_θ(x_{1:n} | y_{1:n}) and p_θ(y_{1:n}).

It will be useful if we can use SMC within MCMC to sample from p(θ, x_{1:n} | y_{1:n}).

C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010), 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from p(θ | x_{1:n}, y_{1:n}) and p(x_{1:n} | θ, y_{1:n}).

However, it is impossible to sample from p(x_{1:n} | θ, y_{1:n}) exactly. We can sample one component at a time from p(x_k | x_{k-1}, x_{k+1}, y_k, θ) instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from p(x_{1:n} | θ, y_{1:n}), i.e. sample X*_{1:n} ~ q(x_{1:n} | y_{1:n}) and accept with probability

  min { 1, [p_θ(X*_{1:n} | y_{1:n}) q(X_{1:n} | y_{1:n})] / [p_θ(X_{1:n} | y_{1:n}) q(X*_{1:n} | y_{1:n})] }.
Gibbs Sampling Strategy

We will use the output of an SMC method approximating p_θ(x_{1:n} | y_{1:n}) as a proposal distribution.

We know that the SMC approximation degrades as n increases, but we also have, under mixing assumptions,

  || Law(X_{1:n}^(i)) − p_θ(· | y_{1:n}) ||_TV ≤ C n / N.

Thus things degrade linearly rather than exponentially. These results are useful for small C.
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Biased Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences.

CBMC looks like an SMC method, but at each step only one particle is selected, and it gives N offspring. Thus, if you have made the wrong selection, you cannot recover later on.

D. Frenkel, G.C.A.M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter, 3 (1991), 3053-3076.
MCMC

We use the SMC approximation as a proposal distribution, i.e. X*_{1:n} ~ p̂_θ(x_{1:n} | y_{1:n}), and store the associated estimate p̂*_θ(y_{1:n}).

It is impossible to compute analytically the unconditional law of a particle, as this would require integrating out (N-1)n variables.

The solution, inspired by CBMC, is as follows: for the current state X_{1:n} of the Markov chain, run a virtual SMC method with only N-1 free particles. Ensure that at each time step k the current state X_k is deterministically selected, and compute the associated estimate p̂_θ(y_{1:n}).

Finally, accept X*_{1:n} with probability

  min { 1, p̂*_θ(y_{1:n}) / p̂_θ(y_{1:n}) }.

Otherwise stay where you are.

D. Frenkel, G.C.A.M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter, 3 (1991), 3053-3076.
Example

Consider the following model:

  X_k = X_{k-1}/2 + 25 X_{k-1}/(1 + X_{k-1}²) + 8 cos(1.2 k) + V_k,   V_k ~ N(0, σ_v²),
  Y_k = X_k²/20 + W_k,   W_k ~ N(0, σ_w²),

with σ_v² = 10, σ_w² = 1 and X_1 ~ N(0, 5).

We take the same prior proposal distribution for both CBMC and SMC.

[Figure: acceptance rates for CBMC (left) and SMC (right).]
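A bootstrap particle filter for this benchmark model also produces the likelihood estimate p̂_θ(y_{1:n}) that the MCMC schemes above rely on. A minimal sketch (data length, particle number, and the noise values σ_v² = 10, σ_w² = 1 are taken as illustrative assumptions): propose from the prior dynamics, weight by the observation density, resample, and accumulate the incremental log-likelihood terms.

```python
import numpy as np

rng = np.random.default_rng(6)

def f_step(x, k):
    # Deterministic part of the benchmark dynamics.
    return x / 2.0 + 25.0 * x / (1.0 + x**2) + 8.0 * np.cos(1.2 * k)

# Simulate data from the model (sigma_v^2 = 10, sigma_w^2 = 1 assumed).
T, sv2, sw2 = 100, 10.0, 1.0
x = rng.normal(0.0, np.sqrt(5.0))
ys = np.empty(T)
for k in range(1, T + 1):
    x = f_step(x, k) + rng.normal(0.0, np.sqrt(sv2))
    ys[k - 1] = x**2 / 20.0 + rng.normal(0.0, np.sqrt(sw2))

# Bootstrap particle filter: propose from the dynamics, weight by the likelihood.
N = 1000
xp = rng.normal(0.0, np.sqrt(5.0), size=N)
loglik = 0.0
for k in range(1, T + 1):
    xp = f_step(xp, k) + rng.normal(0.0, np.sqrt(sv2), size=N)
    logw = -0.5 * np.log(2 * np.pi * sw2) - 0.5 * (ys[k - 1] - xp**2 / 20.0)**2 / sw2
    c = logw.max()
    w = np.exp(logw - c)
    loglik += c + np.log(w.mean())    # estimate of log p(y_k | y_{1:k-1})
    xp = rng.choice(xp, size=N, replace=True, p=w / w.sum())   # multinomial resampling
```

The running sum `loglik` is the log of the unbiased likelihood estimate; it is exactly the quantity plugged into the acceptance ratios of the CBMC-style and marginal MH algorithms.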
Example

We now consider the case where both variances σ_v², σ_w² are unknown and are given inverse-Gamma (conditionally conjugate) vague priors.

We sample from p(θ, x_{1:500} | y_{1:500}) using two strategies:

• The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution; average acceptance rate 0.43.

• An algorithm updating the components one at a time, using an MH step of invariant distribution p(x_k | x_{k-1}, x_{k+1}, y_k) with the same proposal.

In both cases we update the parameters according to p(θ | x_{1:500}, y_{1:500}). Both algorithms are run with the same computational effort.
Example

MH one at a time missed the mode, and the estimates are

  E[σ_v² | y_{1:500}] ≈ 13.7 (MH one-at-a-time),
  E[σ_v² | y_{1:500}] ≈ 9.9 (PF-MCMC),
  true value σ_v² = 10.

If X_{1:n} and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the marginal Metropolis-Hastings.
Review of Metropolis Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density q(x, y) = q(y | x), and the target distribution is π(x).

Algorithm (Generic Metropolis-Hastings Sampler):
Initialization: choose an arbitrary starting value x⁰.
Iteration t (t ≥ 1):
1. Given x^{t-1}, generate x* ~ q(x^{t-1}, ·).
2. Compute

  ρ(x^{t-1}, x*) = min { 1, [π(x*) q(x*, x^{t-1})] / [π(x^{t-1}) q(x^{t-1}, x*)] }.

3. With probability ρ(x^{t-1}, x*), accept x* and set x^t = x*; otherwise reject x* and set x^t = x^{t-1}.

It can be easily shown that π is invariant for the transition kernel K,

  π(x') = ∫ π(x) K(x, x') dx,

and, under weak assumptions, X^(i) ~ π(x) as i → ∞.
Marginal Metropolis Hastings Algorithm

Consider the target

  p(θ, x_{1:n} | y_{1:n}) = p(θ | y_{1:n}) p_θ(x_{1:n} | y_{1:n}).

We use the following proposal distribution:

  q((θ*, x*_{1:n}) | (θ, x_{1:n})) = q(θ* | θ) p_{θ*}(x*_{1:n} | y_{1:n}).

Then the acceptance probability becomes

  min { 1, [p(θ*, x*_{1:n} | y_{1:n}) q((θ, x_{1:n}) | (θ*, x*_{1:n}))] / [p(θ, x_{1:n} | y_{1:n}) q((θ*, x*_{1:n}) | (θ, x_{1:n}))] }
  = min { 1, [p_{θ*}(y_{1:n}) p(θ*) q(θ | θ*)] / [p_θ(y_{1:n}) p(θ) q(θ* | θ)] }.

We will use SMC approximations to compute p_θ(y_{1:n}) and p_θ(x_{1:n} | y_{1:n}).
Marginal Metropolis Hastings Algorithm

Step 1: Given θ(i-1), X_{1:n}(i-1) and p̂_{θ(i-1)}(y_{1:n}), sample

  θ* ~ q(· | θ(i-1)),

and run an SMC algorithm to obtain p̂_{θ*}(x_{1:n} | y_{1:n}) and p̂_{θ*}(y_{1:n}).

Step 2: Sample X*_{1:n} ~ p̂_{θ*}(· | y_{1:n}).

Step 3: With probability

  min { 1, [p̂_{θ*}(y_{1:n}) p(θ*) q(θ(i-1) | θ*)] / [p̂_{θ(i-1)}(y_{1:n}) p(θ(i-1)) q(θ* | θ(i-1))] },

set θ(i) = θ*, X_{1:n}(i) = X*_{1:n} and p̂_{θ(i)}(y_{1:n}) = p̂_{θ*}(y_{1:n}); otherwise set θ(i) = θ(i-1), X_{1:n}(i) = X_{1:n}(i-1) and p̂_{θ(i)}(y_{1:n}) = p̂_{θ(i-1)}(y_{1:n}).
Marginal Metropolis Hastings Algorithm

This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(θ | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

For any N ≥ 1, the algorithm admits exactly p(θ, x_{1:n} | y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n.

It is useful when X_n is moderate-dimensional and θ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

Jesus Fernandez-Villaverde and Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pp. 549-553, 2007.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B, Volume 72, Issue 3, pp. 269-342, June 2010.
E.L. Ionides, C. Breto and A.A. King, Inference for nonlinear dynamical systems, Proceedings of the Nat. Acad. of Sciences, 103, 18438-18443, 2006.
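The three steps above can be sketched end to end by plugging a particle-filter likelihood estimate into an MH loop over θ. A minimal, illustrative particle marginal MH run on a toy linear Gaussian model (the model, θ* = 0.7, the U(-1,1) prior, the random-walk step, and all run lengths are demo assumptions; a Kalman filter could compute this likelihood exactly, the PF is used only to mirror the algorithm):

```python
import numpy as np

rng = np.random.default_rng(8)

# Data from X_t = theta X_{t-1} + V_t, Y_t = X_t + W_t, V, W ~ N(0,1), theta* = 0.7.
T, theta_true = 200, 0.7
x = 0.0
ys = np.empty(T)
for t in range(T):
    x = theta_true * x + rng.normal()
    ys[t] = x + rng.normal()

def pf_loglik(theta, N=200):
    """Bootstrap-PF estimate of log p_theta(y_{1:T}) (N(0,1) start, a simplification)."""
    xp = rng.normal(size=N)
    ll = 0.0
    for t in range(T):
        xp = theta * xp + rng.normal(size=N)
        logw = -0.5 * np.log(2 * np.pi) - 0.5 * (ys[t] - xp)**2
        c = logw.max()
        w = np.exp(logw - c)
        ll += c + np.log(w.mean())
        xp = rng.choice(xp, size=N, replace=True, p=w / w.sum())
    return ll

# Marginal MH over theta with the PF estimate plugged into the acceptance ratio.
theta, ll = 0.5, pf_loglik(0.5)
chain = []
for i in range(400):
    prop = theta + 0.1 * rng.normal()
    if abs(prop) < 1.0:                          # U(-1,1) prior support
        ll_prop = pf_loglik(prop)                # fresh SMC run for each proposal
        if np.log(rng.uniform()) < ll_prop - ll: # flat prior, symmetric proposal
            theta, ll = prop, ll_prop
    chain.append(theta)

theta_hat = np.mean(chain[150:])
```

Storing and reusing `ll` for the current state, rather than re-estimating it, is what makes the chain exact for any N, as stated on the previous slide.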
OPEN PROBLEMS
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Other Topics

In standard SMC we only sample X_n at time n, and we don't allow ourselves to modify the past.

It is possible to modify the algorithm to take this into account.

• A. Doucet, M. Briers and S. Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, pp. 1-19.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
References
M.K. Pitt and N. Shephard, Filtering via Simulation: Auxiliary Particle Filter, JASA, 1999.
A. Doucet, S.J. Godsill and C. Andrieu, On Sequential Monte Carlo sampling methods for Bayesian filtering, Stat. Comp., 2000.
J. Carpenter, P. Clifford and P. Fearnhead, An Improved Particle Filter for Non-linear Problems, IEE, 1999.
A. Kong, J.S. Liu & W.H. Wong, Sequential Imputations and Bayesian Missing Data Problems, JASA, 1994.
O. Cappe, E. Moulines & T. Ryden, Inference in Hidden Markov Models, Springer-Verlag, 2005.
W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
G. Poyadjis, A. Doucet and S.S. Singh, Maximum Likelihood Parameter Estimation using Particle Methods, Joint Statistical Meeting, 2005.
N. Gordon, D.J. Salmond, A.F.M. Smith, Novel Approach to nonlinear/non-Gaussian Bayesian state estimation, IEE, 1993.
S. Godsill, Particle Filters, 2009 (Video Lectures).
R. Chen and J.S. Liu, Predictive Updating Methods with Application to Bayesian Classification, JRSS B, 1996.
4
Bayesian Scientific Computing Spring 2013 (N Zabaras)
References
C. Andrieu and A. Doucet, Particle Filtering for Partially Observed Gaussian State-Space Models, JRSS B, 2002.
R. Chen and J. Liu, Mixture Kalman Filters, JRSS B, 2000.
A. Doucet, S.J. Godsill, C. Andrieu, On SMC sampling methods for Bayesian Filtering, Stat. Comp., 2000.
N. Kantas, A. Doucet, S.S. Singh and J.M. Maciejowski, An overview of sequential Monte Carlo methods for parameter estimation in general state-space models, in Proceedings IFAC System Identification (SySid) Meeting, 2009.
C. Andrieu, A. Doucet & R. Holenstein, Particle Markov chain Monte Carlo methods, JRSS B, 2010.
C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
G. Storvik, Particle filters for state-space models with the presence of unknown static parameters, IEEE Trans. Signal Processing, 2002.
5
Bayesian Scientific Computing Spring 2013 (N Zabaras)
References
C. Andrieu, A. Doucet and V.B. Tadic, Online EM for parameter estimation in nonlinear/non-Gaussian state-space models, Proc. IEEE CDC, 2005.
G. Poyadjis, A. Doucet and S.S. Singh, Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation, Biometrika, 2011.
F. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Technical report, 2008.
R. Martinez-Cantin, J. Castellanos and N. de Freitas, Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm: Marginal-SLAM, International Conference on Robotics and Automation.
C. Andrieu, A. Doucet & R. Holenstein, Particle Markov chain Monte Carlo methods (with discussion), JRSS B, 2010.
A. Doucet, Sequential Monte Carlo Methods and Particle Filters: List of Papers, Codes and Video lectures on SMC and particle filters.
P. Del Moral, Feynman-Kac models and interacting particle systems.
6
Bayesian Scientific Computing Spring 2013 (N Zabaras)
References
P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo samplers, JRSS B, 2006.
P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
P. Del Moral, A. Doucet & S.S. Singh, Forward Smoothing using Sequential Monte Carlo, technical report, Cambridge University, 2009.
A. Doucet, Short Courses Lecture Notes (A, B, C).
P. Del Moral, Feynman-Kac Formulae, Springer-Verlag, 2004.
M. Davy, Sequential MC Methods, 2007.
A. Doucet, A. Johansen, Particle Filtering and Smoothing: Fifteen years later, in Handbook of Nonlinear Filtering (eds. D. Crisan and B. Rozovsky), Oxford Univ. Press, 2011.
A. Johansen and A. Doucet, A Note on Auxiliary Particle Filters, Stat. Proba. Letters, 2008.
A. Doucet, M. Briers & S. Senecal, Efficient Block Sampling Strategies for Sequential Monte Carlo, JCGS, 2006.
F. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Stat. Comput., 2011.
7
Bayesian Scientific Computing Spring 2013 (N Zabaras) 8
SEQUENTIAL MONTE CARLO METHODS
FOR STATIC PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions $\pi_n$, $n \ge 1$, are defined on $X$?

This case appears often, for example:

Sequential Bayesian Estimation: $\pi_n(x) = p(x \mid y_{1:n})$

Classical Bayesian Inference: $\pi_n(x) = \pi(x)$

Global Optimization: $\pi_n(x) \propto [\pi(x)]^{\gamma_n}$, $\gamma_n \to \infty$

It is not clear how SMC can be related to this problem, since here each $\pi_n$ is defined on $X$ instead of $X^n$.
9
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions $\pi_n$, $n \ge 1$, are defined on $X$?

This case appears often, for example:

Sequential Bayesian Estimation: $\pi_n(x) = p(x \mid y_{1:n})$

Global Optimization: $\pi_n(x) \propto [\pi(x)]^{\eta_n}$, $\eta_n \to \infty$

Sampling from a fixed distribution $\pi(x)$, where $\mu(x)$ is an easy-to-sample distribution. Use the sequence
$$\pi_n(x) \propto \mu(x)^{1-\eta_n}\, \pi(x)^{\eta_n},$$
with $0 = \eta_1 < \eta_2 < \cdots < \eta_{Final} = 1$, i.e. $\pi_1(x) = \mu(x)$ and $\pi_{Final}(x) = \pi(x)$.
10
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions $\pi_n$, $n \ge 1$, are defined on $X$?

This case appears often, for example:

Rare Event Simulation: with $\pi(A) \ll 1$, simulate with the sequence
$$\pi_n(x) \propto \pi(x)\, 1_{E_n}(x), \qquad X = E_1 \supset E_2 \supset \cdots \supset E_{Final} = A,$$
with normalizing factors $Z_n$. The required probability is then $Z_{Final} = \pi(A)$.

Other examples: the Boolean Satisfiability Problem, computing matrix permanents, computing volumes in high dimensions.
11
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions $\pi_n$, $n \ge 1$, are defined on $X$?

Apply SMC to an augmented sequence of distributions $\tilde\pi_n$ on $X^n$ such that
$$\int \tilde\pi_n(x_{1:n})\, dx_{1:n-1} = \pi_n(x_n).$$
For example,
$$\tilde\pi_n(x_{1:n}) = \pi_n(x_n)\, \pi_n(x_{1:n-1} \mid x_n),$$
where we may use any conditional distribution $\pi_n(x_{1:n-1} \mid x_n)$ on $X^{n-1}$.
12

• C. Jarzynski, Nonequilibrium Equality for Free Energy Differences, Phys. Rev. Lett. 78, 2690-2693 (1997).
• G.E. Crooks, Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible Markovian Systems, Journal of Statistical Physics, Volume 90, Issue 5-6, pp 1481-1487, March 1998.
• W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
• R.M. Neal, Annealed importance sampling, Statistics and Computing, vol. 11, pp 125-139, 2001.
• N. Chopin, A sequential particle filter method for static problems, Biometrika, 2002, 89, 3, pp 539-551.
• P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models

Let $\{\pi_n(x)\}_{n \ge 1}$ be a sequence of probability distributions defined on $X$ s.t. each $\pi_n(x)$ is known up to a normalizing constant:
$$\pi_n(x) = \frac{\gamma_n(x)}{Z_n},$$
where $\gamma_n(x)$ is known pointwise and $Z_n$ is unknown.

We are interested in sampling from $\pi_n(x)$ and computing $Z_n$ sequentially.

This is not the same as the standard SMC discussed earlier, which was defined on $X^n$ with $\pi_n(x_{1:n}) = p(x_{1:n} \mid y_{1:n})$.
13
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models

We will construct an artificial distribution that is the product of the target distribution from which we want to sample and a backward kernel $L$ as follows:
$$\tilde\pi_n(x_{1:n}) = \frac{\tilde\gamma_n(x_{1:n})}{Z_n}, \qquad \tilde\gamma_n(x_{1:n}) = \underbrace{\gamma_n(x_n)}_{Target}\ \underbrace{\prod_{k=1}^{n-1} L_k(x_{k+1}, x_k)}_{Backward\ Transitions},$$
such that $\int \tilde\pi_n(x_{1:n})\, dx_{1:n-1} = \pi_n(x_n)$. The importance weights now become
$$W_n \propto W_{n-1}\, \frac{\tilde\gamma_n(x_{1:n})}{\tilde\gamma_{n-1}(x_{1:n-1})\, q_n(x_n \mid x_{n-1})} = W_{n-1}\, \frac{\gamma_n(x_n) \prod_{k=1}^{n-1} L_k(x_{k+1}, x_k)}{\gamma_{n-1}(x_{n-1}) \prod_{k=1}^{n-2} L_k(x_{k+1}, x_k)\, q_n(x_n \mid x_{n-1})} = W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \mid x_{n-1})}.$$
14
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models

Any MCMC kernel can be used for the proposal $q(\cdot \mid \cdot)$.

Since our interest is in computing only
$$\pi_n(x_n) = \frac{\gamma_n(x_n)}{Z_n} = \int \tilde\pi_n(x_{1:n})\, dx_{1:n-1},$$
there is no degeneracy problem. The weights are
$$W_n = W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \mid x_{n-1})}.$$
15

P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Algorithm: SMC For Static Problems

(1) Initialize at time $n = 1$.

(2) At time $n \ge 2$:

Sample $X_n^{(i)} \sim q_n(\cdot \mid X_{n-1}^{(i)})$ and augment $X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)})$.

Compute the importance weights
$$W_n^{(i)} \propto W_{n-1}^{(i)}\, \frac{\gamma_n(X_n^{(i)})\, L_{n-1}(X_n^{(i)}, X_{n-1}^{(i)})}{\gamma_{n-1}(X_{n-1}^{(i)})\, q_n(X_n^{(i)} \mid X_{n-1}^{(i)})}.$$

Then the weighted approximation is
$$\hat\pi_n(x) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_n^{(i)}}(x).$$
We finally resample $X_n^{(i)} \sim \hat\pi_n(x)$ to obtain
$$\bar\pi_n(x) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_n^{(i)}}(x).$$
16
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models: Choice of L

A default choice is first using a $\pi_n$-invariant MCMC kernel $q_n$ and then the corresponding reversed kernel
$$L_{n-1}(x_n, x_{n-1}) = \frac{\pi_n(x_{n-1})\, q_n(x_n \mid x_{n-1})}{\pi_n(x_n)}.$$

Using this easy choice we can simplify the expression for the weights:
$$W_n \propto W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \mid x_{n-1})} \;\Rightarrow\; W_n^{(i)} = W_{n-1}^{(i)}\, \frac{\gamma_n(X_{n-1}^{(i)})}{\gamma_{n-1}(X_{n-1}^{(i)})}.$$

This is known as annealed importance sampling. This particular choice has been used in physics and statistics.
17

W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
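To make the weight update above concrete, here is a minimal sketch of an SMC sampler with the annealed-importance-sampling weights for a toy bimodal target; the tempering schedule, Gaussian reference $\mu$, and random-walk Metropolis move are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # unnormalized bimodal target pi: mixture of N(-3,1) and N(3,1)
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

def log_ref(x):
    # easy-to-sample reference mu = N(0, 5^2)
    return -0.5 * x ** 2 / 25.0

def log_gamma(x, eta):
    # tempered path: gamma_n(x) = mu(x)^(1-eta_n) * pi(x)^eta_n
    return (1.0 - eta) * log_ref(x) + eta * log_target(x)

N = 1000
etas = np.linspace(0.0, 1.0, 51)          # eta_1 = 0, ..., eta_Final = 1
x = rng.normal(0.0, 5.0, N)               # X ~ mu at eta = 0
logw = np.zeros(N)
for eta_prev, eta in zip(etas[:-1], etas[1:]):
    # AIS weight update: W_n = W_{n-1} * gamma_n(X_{n-1}) / gamma_{n-1}(X_{n-1})
    logw += log_gamma(x, eta) - log_gamma(x, eta_prev)
    w = np.exp(logw - logw.max()); w /= w.sum()
    if 1.0 / np.sum(w ** 2) < N / 2:      # resample when ESS drops below N/2
        x = x[rng.choice(N, size=N, p=w)]
        logw[:] = 0.0
    # pi_n-invariant move: one random-walk Metropolis step per particle
    prop = x + rng.normal(0.0, 1.0, N)
    accept = np.log(rng.uniform(size=N)) < log_gamma(prop, eta) - log_gamma(x, eta)
    x = np.where(accept, prop, x)

w = np.exp(logw - logw.max()); w /= w.sum()
print(np.sum(w * x))                      # weighted mean; near 0 by symmetry
```

Because the MCMC kernel is only asked to be $\pi_n$-invariant at each fixed temperature, the particles can cross between well-separated modes while the temperatures are still low, which a single MCMC chain on $\pi$ alone finds difficult.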
Bayesian Scientific Computing Spring 2013 (N Zabaras) 18
ONLINE PARAMETER ESTIMATION
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation

Assume that our state model is defined with some unknown static parameter $\theta$ with some prior $p(\theta)$:
$$X_1 \sim \mu_\theta(\cdot), \qquad X_n \mid (X_{n-1} = x_{n-1}) \sim f_\theta(\cdot \mid x_{n-1}),$$
$$Y_n \mid (X_n = x_n) \sim g_\theta(\cdot \mid x_n).$$

Given data $y_{1:n}$, inference now is based on
$$p(\theta, x_{1:n} \mid y_{1:n}) = p_\theta(x_{1:n} \mid y_{1:n})\, p(\theta \mid y_{1:n}), \qquad \text{where } p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).$$

We can use standard SMC, but on the extended space $Z_n = (X_n, \theta_n)$:
$$f\big((x_{n-1}, \theta_{n-1}), (x_n, \theta_n)\big) = f_{\theta_{n-1}}(x_n \mid x_{n-1})\, \delta_{\theta_{n-1}}(\theta_n), \qquad g(y_n \mid z_n) = g_{\theta_n}(y_n \mid x_n).$$

Note that $\theta$ is a static parameter: it does not evolve with $n$.
19
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation

For fixed $\theta$, using our earlier error estimates,
$$\operatorname{Var}\left[\log \hat p_\theta(y_{1:n})\right] \le \frac{C n}{N}.$$

In a Bayesian context the problem is even more severe, as
$$p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).$$

The exponential stability assumption cannot hold, since $\theta_n = \theta_1$ for all $n$.

To mitigate this problem, introduce MCMC steps on $\theta$.
20

C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation

When the conditional posterior of $\theta$ depends on the path only through a fixed-dimensional sufficient statistic,
$$p(\theta \mid y_{1:n}, x_{1:n}) = p\big(\theta \mid \underbrace{s_n(x_{1:n}, y_{1:n})}_{Fixed\ Dimensional}\big),$$
this becomes an elegant algorithm. It still has the degeneracy problem, however, since it relies on the SMC approximation of $p(x_{1:n} \mid y_{1:n})$.

As $\dim(Z_n) = \dim(X_n) + \dim(\theta)$, such methods are not recommended for high-dimensional $\theta$, especially with vague priors.
21
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Artificial Dynamics for θ

A solution consists of perturbing the location of the particles in a way that does not modify their distribution, i.e. if at time $n$
$$\theta^{(i)} \sim p(\theta \mid y_{1:n}),$$
then we would like a transition kernel $M_n$ such that if $\bar\theta^{(i)} \sim M_n(\theta^{(i)}, \cdot)$, then $\bar\theta^{(i)} \sim p(\theta \mid y_{1:n})$.

In Markov chain language, we want $M_n(\theta, \bar\theta)$ to be $p(\theta \mid y_{1:n})$-invariant.
22
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Using MCMC

There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as $p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$ would need to be known up to a normalizing constant, but $p_\theta(y_{1:n})$ is unknown.

However, we can use a very simple Gibbs update. Indeed, note that if $(\theta^{(i)}, X_{1:n}^{(i)}) \sim p(\theta, x_{1:n} \mid y_{1:n})$ and we sample $\bar\theta^{(i)} \sim p(\theta \mid y_{1:n}, X_{1:n}^{(i)})$, then $(\bar\theta^{(i)}, X_{1:n}^{(i)}) \sim p(\theta, x_{1:n} \mid y_{1:n})$.

Indeed, note that
$$p(\theta \mid y_{1:n}) = \int p(\theta \mid y_{1:n}, x_{1:n})\, p(x_{1:n} \mid y_{1:n})\, dx_{1:n}.$$
23
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation

Given an approximation at time $n-1$,
$$\hat p(\theta, x_{1:n-1} \mid y_{1:n-1}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(\theta^{(i)},\, X_{1:n-1}^{(i)})}(\theta, x_{1:n-1}).$$

Sample $X_n^{(i)} \sim f_{\theta^{(i)}}(x_n \mid X_{n-1}^{(i)})$, set $X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)})$, and then approximate
$$\hat p(\theta, x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{(\theta^{(i)},\, X_{1:n}^{(i)})}(\theta, x_{1:n}), \qquad W_n^{(i)} \propto g_{\theta^{(i)}}(y_n \mid X_n^{(i)}).$$

Resample $X_{1:n}^{(i)} \sim \hat p(x_{1:n} \mid y_{1:n})$, then sample $\theta^{(i)} \sim p(\theta \mid y_{1:n}, X_{1:n}^{(i)})$ to obtain
$$\bar p(\theta, x_{1:n} \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(\theta^{(i)},\, X_{1:n}^{(i)})}(\theta, x_{1:n}).$$
24
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation

Consider the following model:
$$X_{n+1} = \theta X_n + \sigma_v V_{n+1}, \qquad V_n \sim \mathcal N(0, 1),$$
$$Y_n = X_n + \sigma_w W_n, \qquad W_n \sim \mathcal N(0, 1),$$
$$X_1 \sim \mathcal N(0, \sigma_0^2).$$

We set the prior on $\theta$ as $\theta \sim \mathcal U(-1, 1)$.

In this case
$$p(\theta \mid x_{1:n}, y_{1:n}) \propto \mathcal N(\theta;\, m_n, \sigma_n^2)\, 1_{(-1,1)}(\theta),$$
with
$$\sigma_n^2 = \frac{\sigma_v^2}{\sum_{k=1}^{n-1} x_k^2}, \qquad m_n = \frac{\sum_{k=1}^{n-1} x_k x_{k+1}}{\sum_{k=1}^{n-1} x_k^2}.$$
25
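A minimal sketch of the algorithm of the previous slide for this model, with the Gibbs step on $\theta$ implemented by rejection sampling of the truncated normal (the uniform fallback for nearly flat conditionals, and all numerical values, are illustrative conveniences, not from the slides). Note the slides show that estimates from this scheme can converge to the wrong value as $n$ grows; the sketch is only meant to make the mechanics concrete.

```python
import numpy as np

rng = np.random.default_rng(1)
sv, sw, theta_true, T, N = 1.0, 1.0, 0.7, 200, 2000

# simulate data from X_{n+1} = theta X_n + sv V, Y_n = X_n + sw W
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = theta_true * x_true[t - 1] + sv * rng.normal()
y = x_true + sw * rng.normal(size=T)

x = rng.normal(0.0, 1.0, N)      # particles for X_n
th = rng.uniform(-1.0, 1.0, N)   # particles for theta, prior U(-1,1)
s_xx = np.zeros(N)               # sufficient statistic: sum of x_k^2
s_xy = np.zeros(N)               # sufficient statistic: sum of x_k x_{k+1}
for t in range(1, T):
    xp = th * x + sv * rng.normal(size=N)          # propagate with each theta^(i)
    logw = -0.5 * (y[t] - xp) ** 2 / sw ** 2       # weight by g_theta(y_t | x_t)
    w = np.exp(logw - logw.max()); w /= w.sum()
    idx = rng.choice(N, size=N, p=w)               # resample paths and statistics
    s_xx = s_xx[idx] + x[idx] ** 2
    s_xy = s_xy[idx] + x[idx] * xp[idx]
    x = xp[idx]
    # Gibbs step: theta | x_{1:t} ~ N(m_n, sigma_n^2) truncated to (-1, 1)
    s2 = sv ** 2 / s_xx
    m = s_xy / s_xx
    th = m + np.sqrt(s2) * rng.normal(size=N)
    bad = (th <= -1.0) | (th >= 1.0)
    tries = 0
    while bad.any() and tries < 50:                # rejection for the truncation
        th[bad] = m[bad] + np.sqrt(s2[bad]) * rng.normal(size=bad.sum())
        bad = (th <= -1.0) | (th >= 1.0)
        tries += 1
    th[bad] = rng.uniform(-1.0, 1.0, bad.sum())    # near-flat conditional fallback
print(th.mean())    # particle estimate of E[theta | y_{1:T}]
```

The Gibbs draw is exact only given the stored particle paths; as the next slides explain, the path degeneracy of those stored sufficient statistics is what biases the estimate for long series.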
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation

We use SMC with the Gibbs step. The degeneracy problem remains.

The SMC estimate of $E[\theta \mid y_{1:n}]$ is shown as $n$ increases (from A. Doucet's lecture notes). The parameter estimate converges to the wrong value.

[Figure legend: true value vs. SMC/MCMC estimate.]
26
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation

The problem with this approach is that, although we move $\theta$ according to $p(\theta \mid x_{1:n}, y_{1:n})$, we are still relying implicitly on the SMC approximation of the joint distribution $p(x_{1:n} \mid y_{1:n})$.

As $n$ increases, the SMC approximation of $p(x_{1:n} \mid y_{1:n})$ deteriorates and the algorithm cannot converge towards the right solution.

[Figure: the sufficient statistic $\frac{1}{n}\sum_{k=1}^{n} E[X_k^2 \mid y_{1:n}]$ computed exactly through the Kalman filter (blue) vs. the SMC approximation (red), for a fixed value of $\theta$.]
27
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Recursive MLE Parameter Estimation

The log-likelihood can be written as
$$\ell_n(\theta) = \log p_\theta(y_{1:n}) = \sum_{k=1}^{n} \log p_\theta(y_k \mid y_{1:k-1}).$$

Here we compute
$$p_\theta(y_k \mid y_{1:k-1}) = \int g_\theta(y_k \mid x_k)\, p_\theta(x_k \mid y_{1:k-1})\, dx_k.$$

Under regularity assumptions, $\{X_n, Y_n, p_\theta(x_n \mid Y_{1:n-1})\}$ is an inhomogeneous Markov chain which converges towards its invariant distribution $\mu_\theta$, and
$$\ell(\theta) = \lim_{n \to \infty} \frac{\ell_n(\theta)}{n} = \int \log g_\theta(y \mid x)\, \mu_\theta(dx, dy).$$
28
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Robbins-Monro Algorithm for MLE

We can maximize $\ell(\theta)$ by using the gradient and the Robbins-Monro algorithm:
$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta_{n-1}}(y_n \mid y_{1:n-1}),$$
where
$$\nabla p_\theta(y_n \mid y_{1:n-1}) = \int \nabla g_\theta(y_n \mid x_n)\, p_\theta(x_n \mid y_{1:n-1})\, dx_n + \int g_\theta(y_n \mid x_n)\, \nabla p_\theta(x_n \mid y_{1:n-1})\, dx_n.$$

We thus need to approximate $\nabla p_\theta(x_n \mid y_{1:n-1})$.
29
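The Robbins-Monro recursion itself can be illustrated on a toy case where the score is available in closed form: i.i.d. observations $y_n \sim \mathcal N(\mu, 1)$, for which $\nabla_\theta \log p_\theta(y_n) = y_n - \theta$. In the state-space setting this gradient must itself be approximated by SMC, as discussed next; the step-size exponent below is an illustrative choice satisfying the usual conditions.

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true = 2.0    # unknown parameter to be estimated online
theta = 0.0      # initial guess
for n in range(1, 20001):
    y_n = mu_true + rng.normal()      # new observation y_n ~ N(mu_true, 1)
    gamma_n = n ** -0.6               # steps: sum infinite, sum of squares finite
    score = y_n - theta               # exact gradient of log N(y_n; theta, 1)
    theta += gamma_n * score          # Robbins-Monro update
print(theta)                          # close to mu_true = 2.0
```

The decreasing steps average out the noise in the per-observation score, so the iterates converge to the maximizer of the limiting average log-likelihood.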
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Importance Sampling Estimation of Sensitivity

The various proposed approximations are based on the identity
$$\nabla p_\theta(x_{1:n} \mid y_{1:n}) = p_\theta(x_{1:n} \mid y_{1:n})\, \nabla \log p_\theta(x_{1:n} \mid y_{1:n}),$$
so that
$$\nabla \log p_\theta(y_{1:n}) = \int \nabla \log p_\theta(x_{1:n}, y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n})\, dx_{1:n}.$$

We thus can use an SMC approximation of the form
$$\widehat{\nabla p_\theta}(x_{1:n} \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \alpha_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad \alpha_n^{(i)} = \nabla \log p_\theta(X_{1:n}^{(i)} \mid y_{1:n}).$$

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension, and in addition it relies on being able to obtain a good approximation of $p_\theta(x_{1:n} \mid y_{1:n})$.
30
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Degeneracy of the SMC Algorithm

[Figure: the score $\nabla p_\theta(y_{1:n})$ for linear Gaussian models: exact (blue) vs. SMC approximation (red).]
31
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Degeneracy of the SMC Algorithm

[Figure: the signed measure $\nabla p_\theta(x_n \mid y_{1:n})$ for linear Gaussian models: exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.]
32
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:
$$\nabla p_\theta(x_n \mid y_{1:n}) = p_\theta(x_n \mid y_{1:n})\, \nabla \log p_\theta(x_n \mid y_{1:n}).$$

It suggests using an SMC approximation of the form
$$\widehat{\nabla p_\theta}(x_n \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \beta_n^{(i)} = \nabla \log \hat p_\theta(X_n^{(i)} \mid y_{1:n}) = \frac{\nabla \hat p_\theta(X_n^{(i)} \mid y_{1:n})}{\hat p_\theta(X_n^{(i)} \mid y_{1:n})}.$$

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is $O(N^2)$, as
$$\hat p_\theta(X_n^{(i)} \mid y_{1:n-1}) \propto \int f_\theta(X_n^{(i)} \mid x_{n-1})\, \hat p_\theta(x_{n-1} \mid y_{1:n-1})\, dx_{n-1} = \frac{1}{N} \sum_{j=1}^{N} f_\theta(X_n^{(i)} \mid X_{n-1}^{(j)}).$$
33
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies
$$p_\theta(x_n \mid y_{1:n}) = \frac{\xi_n^\theta(x_n)}{\int \xi_n^\theta(x_n)\, dx_n}, \qquad \xi_n^\theta(x_n) = g_\theta(y_n \mid x_n) \int f_\theta(x_n \mid x_{n-1})\, p_\theta(x_{n-1} \mid y_{1:n-1})\, dx_{n-1}.$$

The derivatives satisfy the following:
$$\nabla p_\theta(x_n \mid y_{1:n}) = \frac{\nabla \xi_n^\theta(x_n)}{\int \xi_n^\theta(x_n)\, dx_n} - p_\theta(x_n \mid y_{1:n})\, \frac{\int \nabla \xi_n^\theta(x_n)\, dx_n}{\int \xi_n^\theta(x_n)\, dx_n}.$$

This way we obtain a simple recursion for $\nabla p_\theta(x_n \mid y_{1:n})$ as a function of $p_\theta(x_{n-1} \mid y_{1:n-1})$ and $\nabla p_\theta(x_{n-1} \mid y_{1:n-1})$.
34
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC Approximation of the Sensitivity

Sample
$$X_n^{(i)} \sim q(\cdot \mid y_n, X_{n-1}^{(1:N)}), \qquad \text{e.g. } q(x_n \mid y_n, X_{n-1}^{(1:N)}) = \sum_{j=1}^{N} W_{n-1}^{(j)}\, q(x_n \mid y_n, X_{n-1}^{(j)}).$$

Compute
$$\alpha_n^{(i)} = \frac{\hat\xi_n^\theta(X_n^{(i)})}{q(X_n^{(i)} \mid y_n, X_{n-1}^{(1:N)})}, \qquad \rho_n^{(i)} = \frac{\nabla \hat\xi_n^\theta(X_n^{(i)})}{q(X_n^{(i)} \mid y_n, X_{n-1}^{(1:N)})},$$
$$W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j} \alpha_n^{(j)}}, \qquad \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j} \alpha_n^{(j)}} - W_n^{(i)}\, \frac{\sum_{j} \rho_n^{(j)}}{\sum_{j} \alpha_n^{(j)}}.$$

We have
$$\hat p_\theta(x_n \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \widehat{\nabla p_\theta}(x_n \mid y_{1:n}) = \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n).$$
35
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example: Linear Gaussian Model

Consider the following model:
$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = X_n + \sigma_w W_n,$$
with $\theta = (\phi, \sigma_v, \sigma_w)$.

We compare next the SMC estimates of $\nabla \log p_\theta(x_n \mid y_{1:n})$ and $\nabla p_\theta(y_{1:n})$ for $N = 1000$ to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).
36
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example: Linear Gaussian Model

[Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and the Hahn-Jordan decomposition of the marginal (bottom).]
37
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example: Stochastic Volatility Model

Consider the following model:
$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = \beta \exp\left(\frac{X_n}{2}\right) W_n,$$
with $\theta = (\phi, \sigma_v, \beta)$. We use SMC with $N = 1000$ for batch and on-line estimation.

For batch estimation:
$$\theta_k = \theta_{k-1} + \gamma_k\, \nabla \log p_{\theta_{k-1}}(y_{1:n}).$$

For online estimation:
$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta_{n-1}}(y_n \mid y_{1:n-1}).$$
38
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example: Stochastic Volatility Model

Batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.
39
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example: Stochastic Volatility Model

Recursive parameter estimation for the SV model. The estimates converge towards the true values.
40
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation

Given data $y_{1:n}$, inference relies on
$$p(\theta, x_{1:n} \mid y_{1:n}) = p_\theta(x_{1:n} \mid y_{1:n})\, p(\theta \mid y_{1:n}), \qquad \text{where } p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).$$

We have seen that SMC is rather inefficient at sampling from $p(\theta \mid y_{1:n})$, so we look here at an MCMC approach. For a given parameter value $\theta$, SMC can estimate both $p_\theta(x_{1:n} \mid y_{1:n})$ and $p_\theta(y_{1:n})$.

It will be useful if we can use SMC within MCMC to sample from $p(\theta, x_{1:n} \mid y_{1:n})$.
41

C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010), 72, Part 3, pp. 269-342.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from $p(\theta \mid y_{1:n}, x_{1:n})$ and $p(x_{1:n} \mid y_{1:n}, \theta)$.

However, it is impossible to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$ exactly. We can sample one component at a time from $p(x_k \mid y_{1:n}, x_{1:k-1}, x_{k+1:n}, \theta)$ instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$, i.e. sample $X_{1:n}^* \sim q(x_{1:n} \mid y_{1:n})$ and accept with probability
$$\min\left\{1,\ \frac{p_\theta(X_{1:n}^* \mid y_{1:n})\, q(X_{1:n} \mid y_{1:n})}{p_\theta(X_{1:n} \mid y_{1:n})\, q(X_{1:n}^* \mid y_{1:n})}\right\}.$$
42
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy

We will use the output of an SMC method approximating $p_\theta(x_{1:n} \mid y_{1:n})$ as a proposal distribution.

We know that the SMC approximation degrades as $n$ increases, but under mixing assumptions we also have
$$\left\| \mathrm{Law}\big(X_{1:n}^{(i)}\big) - p_\theta(x_{1:n} \mid y_{1:n}) \right\|_{TV} \le \frac{C n}{N}.$$

Thus things degrade linearly rather than exponentially. These results are useful for small $C$.
43
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Biased Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences.

CBMC looks like an SMC method, but at each step only one particle is selected, and it gives rise to $N$ offspring. Thus, if you have made the wrong selection, you cannot recover later on.
44

D. Frenkel, G.C.A.M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
MCMC

We use $\hat p_\theta^N(x_{1:n} \mid y_{1:n})$ as proposal distribution, sampling $X_{1:n}^* \sim \hat p_\theta^N(x_{1:n} \mid y_{1:n})$, and store the associated estimate $\hat p_\theta^N(y_{1:n})^*$.

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out $(N-1) n$ variables.

The solution, inspired by CBMC, is as follows. For the current state of the Markov chain $X_{1:n}$, run a virtual SMC method with only $N-1$ free particles: ensure that at each time step $k$ the current state $X_k$ is deterministically selected, and compute the associated estimate $\hat p_\theta^N(y_{1:n})$.

Finally, accept $X_{1:n}^*$ with probability
$$\min\left\{1,\ \frac{\hat p_\theta^N(y_{1:n})^*}{\hat p_\theta^N(y_{1:n})}\right\}.$$
Otherwise stay where you are.
45

D. Frenkel, G.C.A.M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example

Consider the following model:
$$X_k = \frac{X_{k-1}}{2} + 25\, \frac{X_{k-1}}{1 + X_{k-1}^2} + 8 \cos(1.2 k) + V_k, \qquad V_k \sim \mathcal N(0, \sigma_v^2),$$
$$Y_k = \frac{X_k^2}{20} + W_k, \qquad W_k \sim \mathcal N(0, \sigma_w^2), \qquad X_1 \sim \mathcal N(0, \sigma_1^2).$$

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
46
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example

We now consider the case where both variances $\sigma_v^2, \sigma_w^2$ are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from $p(\theta, x_{1:1000} \mid y_{1:1000})$ using two strategies:

The MCMC algorithm with $N = 1000$ and the local EKF proposal as importance distribution; average acceptance rate 0.43.

An algorithm updating the components one at a time using an MH step of invariant distribution $p(x_k \mid y_k, x_{k-1}, x_{k+1})$ with the same proposal.

In both cases we update the parameters according to $p(\theta \mid x_{1:500}, y_{1:500})$. Both algorithms are run with the same computational effort.
47
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example

MH one-at-a-time missed the mode, and the estimates are
$$E\left[\sigma_v^2 \mid y_{1:500}\right] \approx 13.7 \ \text{(MH one-at-a-time)}, \qquad E\left[\sigma_v^2 \mid y_{1:500}\right] \approx 9.9 \ \text{(PF-MCMC)}, \qquad \sigma_v^2 = 10 \ \text{(true value)}.$$

If $X_{1:n}$ and $\theta$ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
48
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis-Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density $q(x, y) = q(y \mid x)$, and the target distribution is $\pi(x)$.

Algorithm (Generic Metropolis-Hastings Sampler):
Initialization: choose an arbitrary starting value $x^{0}$.
Iteration $t$ ($t \ge 1$):
1. Given $x^{t-1}$, generate $\tilde x \sim q(x^{t-1}, \cdot)$.
2. Compute
$$\rho(x^{t-1}, \tilde x) = \min\left\{1,\ \frac{\pi(\tilde x)\, q(\tilde x, x^{t-1})}{\pi(x^{t-1})\, q(x^{t-1}, \tilde x)}\right\}.$$
3. With probability $\rho(x^{t-1}, \tilde x)$, accept $\tilde x$ and set $x^{t} = \tilde x$; otherwise reject $\tilde x$ and set $x^{t} = x^{t-1}$.

It can easily be shown that $\pi$ is invariant for the transition kernel $K$ of the chain,
$$\pi(x') = \int \pi(x)\, K(x, x')\, dx,$$
and under weak assumptions $X^{(i)} \sim \pi(x)$ as $i \to \infty$.
49
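A minimal sketch of the generic sampler above for a standard Gaussian target; with the symmetric random-walk proposal used here the $q$ ratio cancels in $\rho$ (for an asymmetric kernel it must be kept). The proposal scale and chain length are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def log_pi(x):
    # unnormalized log target: standard Gaussian
    return -0.5 * x ** 2

x = 0.0
chain = []
for t in range(20000):
    prop = x + rng.normal(0.0, 1.0)        # generate x~ from q(x^{t-1}, .)
    log_rho = log_pi(prop) - log_pi(x)     # q is symmetric, so its ratio cancels
    if np.log(rng.uniform()) < log_rho:    # accept with probability rho
        x = prop
    chain.append(x)                        # on rejection, x^t = x^{t-1}

samples = np.array(chain[5000:])           # discard burn-in
print(samples.mean(), samples.var())       # approximately 0 and 1
```

Working with log densities avoids underflow, which matters when the target below is replaced by an SMC likelihood estimate.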
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis-Hastings Algorithm

Consider the target
$$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}).$$

We use the following proposal distribution:
$$q\big((\theta^*, x_{1:n}^*) \mid (\theta, x_{1:n})\big) = q(\theta^* \mid \theta)\, p_{\theta^*}(x_{1:n}^* \mid y_{1:n}).$$

Then the acceptance probability becomes
$$\min\left\{1,\ \frac{p(\theta^*, x_{1:n}^* \mid y_{1:n})\, q\big((\theta, x_{1:n}) \mid (\theta^*, x_{1:n}^*)\big)}{p(\theta, x_{1:n} \mid y_{1:n})\, q\big((\theta^*, x_{1:n}^*) \mid (\theta, x_{1:n})\big)}\right\} = \min\left\{1,\ \frac{p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta \mid \theta^*)}{p_{\theta}(y_{1:n})\, p(\theta)\, q(\theta^* \mid \theta)}\right\}.$$

We will use SMC approximations to compute $p_\theta(y_{1:n})$ and $p_\theta(x_{1:n} \mid y_{1:n})$.
50
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis-Hastings Algorithm

Step 1: Given $\theta(i-1)$, $X_{1:n}(i-1)$ and $\hat p_{\theta(i-1)}(y_{1:n})$, sample $\theta^* \sim q(\cdot \mid \theta(i-1))$ and run an SMC algorithm to obtain $\hat p_{\theta^*}(x_{1:n} \mid y_{1:n})$ and $\hat p_{\theta^*}(y_{1:n})$.

Step 2: Sample $X_{1:n}^* \sim \hat p_{\theta^*}(x_{1:n} \mid y_{1:n})$.

Step 3: With probability
$$\min\left\{1,\ \frac{\hat p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta(i-1) \mid \theta^*)}{\hat p_{\theta(i-1)}(y_{1:n})\, p(\theta(i-1))\, q(\theta^* \mid \theta(i-1))}\right\},$$
set $\theta(i) = \theta^*$, $X_{1:n}(i) = X_{1:n}^*$, $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta^*}(y_{1:n})$; otherwise set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$, $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta(i-1)}(y_{1:n})$.
51
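A minimal sketch of the three steps above for a toy linear Gaussian model, using a bootstrap particle filter as the SMC step. With the uniform prior and symmetric random walk used here, the prior and $q$ ratios cancel, so the acceptance ratio reduces to the ratio of likelihood estimates; Step 2 (sampling $X_{1:n}^*$) is omitted since only $\theta$ is retained. $N$, $T$, and the proposal scale are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
sv, sw, phi_true, T, N = 1.0, 1.0, 0.8, 100, 200

# simulate data from X_n = phi X_{n-1} + sv V_n, Y_n = X_n + sw W_n
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi_true * x[t - 1] + sv * rng.normal()
y = x + sw * rng.normal(size=T)

def log_lik_hat(phi):
    """Bootstrap particle filter estimate of log p_phi(y_{1:T})."""
    xs = rng.normal(0.0, 1.0, N)
    ll = 0.0
    for t in range(T):
        if t > 0:
            xs = phi * xs + sv * rng.normal(size=N)
        logw = -0.5 * (y[t] - xs) ** 2 / sw ** 2 - 0.5 * np.log(2.0 * np.pi * sw ** 2)
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())                      # log of (1/N) sum of weights
        xs = xs[rng.choice(N, size=N, p=w / w.sum())]   # multinomial resampling
    return ll

# marginal MH on theta = phi, prior U(-1,1), random-walk proposal
phi, ll = 0.0, log_lik_hat(0.0)
kept = []
for i in range(500):
    prop = phi + 0.1 * rng.normal()      # Step 1: theta* ~ q(. | theta(i-1))
    if -1.0 < prop < 1.0:                # prior and q ratios cancel here
        ll_prop = log_lik_hat(prop)      # Step 1: fresh SMC estimate for theta*
        if np.log(rng.uniform()) < ll_prop - ll:   # Step 3: accept/reject
            phi, ll = prop, ll_prop      # store the estimate with the state
    kept.append(phi)
print(np.mean(kept[100:]))               # posterior mean, near phi_true
```

Storing the likelihood estimate alongside the current state, rather than re-running the filter for the current point, is what makes the chain exact for any $N$, as stated on the next slide.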
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis-Hastings Algorithm

This algorithm (without sampling $X_{1:n}$) was proposed as an approximate MCMC algorithm to sample from $p(\theta \mid y_{1:n})$ in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

When $N \ge 1$, the algorithm admits exactly $p(\theta, x_{1:n} \mid y_{1:n})$ as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher $N$, the better the performance of the algorithm; $N$ scales roughly linearly with $n$.

It is useful when $X_n$ is moderate-dimensional and $\theta$ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553, 2007.
C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
E.L. Ionides, C. Breto and A.A. King, Inference for nonlinear dynamical systems, Proceedings of the Nat. Acad. of Sciences, 103, 18438-18443, 2006.
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems

Thus SMC can be successfully used within MCMC.

The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
Developing efficient SMC methods when $E = \mathbb{R}^{10000}$.
Developing a proper SMC method for recursive Bayesian parameter estimation.
Developing methods to avoid the $O(N^2)$ cost for smoothing and sensitivity.
Developing methods for solving optimal control problems.
Establishing weaker assumptions ensuring uniform convergence results.
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics

In standard SMC we only sample $X_n$ at time $n$, and we don't allow ourselves to modify the past.

It is possible to modify the algorithm to take this into account.
55

• A. Doucet, M. Briers and S. Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
References
- M.K. Pitt and N. Shephard, Filtering via simulation: auxiliary particle filter, JASA, 1999.
- A. Doucet, S.J. Godsill and C. Andrieu, On sequential Monte Carlo sampling methods for Bayesian filtering, Stat. Comp., 2000.
- J. Carpenter, P. Clifford and P. Fearnhead, An improved particle filter for non-linear problems, IEE, 1999.
- A. Kong, J.S. Liu & W.H. Wong, Sequential imputations and Bayesian missing data problems, JASA, 1994.
- O. Cappe, E. Moulines & T. Ryden, Inference in Hidden Markov Models, Springer-Verlag, 2005.
- W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001.
- G. Poyadjis, A. Doucet and S.S. Singh, Maximum likelihood parameter estimation using particle methods, Joint Statistical Meeting, 2005.
- N. Gordon, D.J. Salmond, A.F.M. Smith, Novel approach to nonlinear/non-Gaussian Bayesian state estimation, IEE, 1993.
- S. Godsill, Particle Filters, 2009 (video lectures).
- R. Chen and J.S. Liu, Predictive updating methods with application to Bayesian classification, JRSS B, 1996.
4
Bayesian Scientific Computing Spring 2013 (N Zabaras)
References
- C. Andrieu and A. Doucet, Particle filtering for partially observed Gaussian state-space models, JRSS B, 2002.
- R. Chen and J. Liu, Mixture Kalman filters, JRSS B, 2000.
- A. Doucet, S.J. Godsill, C. Andrieu, On SMC sampling methods for Bayesian filtering, Stat. Comp., 2000.
- N. Kantas, A. Doucet, S.S. Singh and J.M. Maciejowski, An overview of sequential Monte Carlo methods for parameter estimation in general state-space models, in Proceedings IFAC System Identification (SySid) Meeting, 2009.
- C. Andrieu, A. Doucet & R. Holenstein, Particle Markov chain Monte Carlo methods, JRSS B, 2010.
- C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian model selection, Proc. IEEE Workshop HOS, 1999.
- P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
- G. Storvik, Particle filters for state-space models with the presence of unknown static parameters, IEEE Trans. Signal Processing, 2002.
5
Bayesian Scientific Computing Spring 2013 (N Zabaras)
References
- C. Andrieu, A. Doucet and V.B. Tadic, Online EM for parameter estimation in nonlinear/non-Gaussian state-space models, Proc. IEEE CDC, 2005.
- G. Poyadjis, A. Doucet and S.S. Singh, Particle approximations of the score and observed information matrix in state-space models with application to parameter estimation, Biometrika, 2011.
- C. Caron, R. Gottardo and A. Doucet, On-line changepoint detection and parameter estimation for genome wide transcript analysis, technical report, 2008.
- R. Martinez-Cantin, J. Castellanos and N. de Freitas, Analysis of particle methods for simultaneous robot localization and mapping and a new algorithm: Marginal-SLAM, International Conference on Robotics and Automation.
- C. Andrieu, A. Doucet & R. Holenstein, Particle Markov chain Monte Carlo methods (with discussion), JRSS B, 2010.
- A. Doucet, Sequential Monte Carlo Methods and Particle Filters: list of papers, codes and video lectures on SMC and particle filters.
- Pierre Del Moral, Feynman-Kac models and interacting particle systems.
6
Bayesian Scientific Computing Spring 2013 (N Zabaras)
References
- P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo samplers, JRSS B, 2006.
- P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian computation, Bayesian Statistics, 2006.
- P. Del Moral, A. Doucet & S.S. Singh, Forward smoothing using sequential Monte Carlo, technical report, Cambridge University, 2009.
- A. Doucet, Short courses lecture notes (A, B, C).
- P. Del Moral, Feynman-Kac Formulae, Springer-Verlag, 2004.
- M. Davy, Sequential MC Methods, 2007.
- A. Doucet, A. Johansen, Particle filtering and smoothing: fifteen years later, in Handbook of Nonlinear Filtering (eds. D. Crisan and B. Rozovsky), Oxford Univ. Press, 2011.
- A. Johansen and A. Doucet, A note on auxiliary particle filters, Stat. Proba. Letters, 2008.
- A. Doucet, M. Briers & S. Senecal, Efficient block sampling strategies for sequential Monte Carlo, JCGS, 2006.
- C. Caron, R. Gottardo and A. Doucet, On-line changepoint detection and parameter estimation for genome wide transcript analysis, Stat. Comput., 2011.
7
Bayesian Scientific Computing Spring 2013 (N Zabaras) 8
SEQUENTIAL MONTE CARLO METHODS
FOR STATIC PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions $\pi_n$, $n \ge 1$, are defined on X?

This case appears often, for example:

Sequential Bayesian estimation: $\pi_n(x) = p(x\,|\,y_{1:n})$

Classical Bayesian inference: $\pi_n(x) = \pi(x)$

Global optimization: $\pi_n(x) \propto [\pi(x)]^{\gamma_n}$, $\gamma_n \to \infty$

It is not clear how SMC relates to this problem, since here the spaces satisfy $E_n = X$ instead of $E_n = X^n$.

9
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions $\pi_n$, $n \ge 1$, are defined on X? (continued)

Sequential Bayesian estimation: $\pi_n(x) = p(x\,|\,y_{1:n})$

Global optimization: $\pi_n(x) \propto [\pi(x)]^{\eta_n}$, $\eta_n \to \infty$

Sampling from a fixed distribution $\pi$, where $\mu(x)$ is an easy-to-sample-from distribution: use the sequence

$\pi_n(x) \propto [\mu(x)]^{1-\eta_n}\,[\pi(x)]^{\eta_n}$,

with $0 = \eta_1 < \eta_2 < \dots < \eta_{Final} = 1$, i.e. $\pi_1(x) = \mu(x)$ and $\pi_{Final}(x) \propto \pi(x)$.

10
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions $\pi_n$, $n \ge 1$, are defined on X? (continued)

Rare event simulation: take

$\pi_n(x) \propto \mathbb{1}_{E_n}(x)\,\pi(x)$,

simulated with a sequence of nested sets $X = E_1 \supset E_2 \supset \dots \supset E_{Final} = A$, with the normalizing factor $Z_1$ known. The required probability is then $\pi(A) = Z_{Final}$.

Other applications: the Boolean satisfiability problem, computing matrix permanents, computing volumes in high dimensions.

11
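The nested-sets construction above can be sketched numerically. The example below is an illustrative sketch (not from the lecture): it estimates the Gaussian tail probability $\pi(A) = P(X > 4)$ for $X \sim \mathcal N(0,1)$ using levels $E_n = \{x > c_n\}$; the thresholds, particle number, and random-walk Metropolis move are arbitrary choices.

```python
import numpy as np

def smc_rare_event(levels, n_particles=5000, mh_steps=5, rng=None):
    """SMC estimate of P(X > levels[-1]) for X ~ N(0,1), using
    pi_n(x) prop. to 1{x > c_n} N(x; 0, 1) and constrained Metropolis moves."""
    rng = np.random.default_rng(rng)
    x = rng.standard_normal(n_particles)   # exact samples from pi_1 (E_1 = R)
    log_Z = 0.0
    for c in levels:
        alive = x > c                      # incremental weight is the indicator 1{x > c}
        log_Z += np.log(alive.mean())      # Z_n / Z_{n-1} ~ fraction of survivors
        x = rng.choice(x[alive], size=n_particles, replace=True)  # resample
        # Metropolis moves invariant w.r.t. N(0,1) truncated to (c, infinity)
        for _ in range(mh_steps):
            prop = x + rng.standard_normal(n_particles)
            log_acc = -0.5 * (prop**2 - x**2)
            accept = (prop > c) & (np.log(rng.random(n_particles)) < log_acc)
            x = np.where(accept, prop, x)
    return np.exp(log_Z)

p_hat = smc_rare_event(levels=[1.0, 2.0, 3.0, 4.0], rng=0)
```

The exact value is $P(X > 4) \approx 3.17 \times 10^{-5}$; a plain Monte Carlo estimate with the same number of samples would typically see no hits at all.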
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions $\pi_n$, $n \ge 1$, are defined on X? (continued)

Apply SMC to an augmented sequence of distributions $\tilde\pi_n$ on $X^n$. For example, choose

$\tilde\pi_n(x_{1:n}) = \pi_n(x_n)\,\pi_n(x_{1:n-1}\,|\,x_n)$,

where $\pi_n(x_{1:n-1}|x_n)$ is any conditional distribution on $X^{n-1}$, so that

$\int \tilde\pi_n(x_{1:n-1}, x_n)\, dx_{1:n-1} = \pi_n(x_n)$.

12

• C. Jarzynski, Nonequilibrium equality for free energy differences, Phys. Rev. Lett. 78, 2690-2693 (1997).
• G.E. Crooks, Nonequilibrium measurements of free energy differences for microscopically reversible Markovian systems, J. Stat. Phys. 90(5-6), 1481-1487 (1998).
• W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001.
• R.M. Neal, Annealed importance sampling, Statistics and Computing 11, 125-139 (2001).
• N. Chopin, A sequential particle filter method for static problems, Biometrika 89(3), 539-551 (2002).
• P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian computation, Bayesian Statistics, 2006.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models

Let $(\pi_n(x))_{n \ge 1}$ be a sequence of probability distributions defined on X such that each $\pi_n(x)$ is known up to a normalizing constant:

$\pi_n(x) = Z_n^{-1}\,\gamma_n(x)$, with $\gamma_n$ known and $Z_n$ unknown.

We are interested in sampling from $\pi_n(x)$ and computing $Z_n$ sequentially.

This is not the same as the standard SMC discussed earlier, which was defined on $X^n$ with target $\pi_n(x_{1:n}) = p(x_{1:n}|y_{1:n})$.

13
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models

We will construct an artificial distribution that is the product of the target distribution from which we want to sample and backward kernels $L_k$ (the target backward transitions):

$\tilde\pi_n(x_{1:n}) = Z_n^{-1}\,\tilde\gamma_n(x_{1:n})$, where $\tilde\gamma_n(x_{1:n}) = \gamma_n(x_n)\prod_{k=1}^{n-1} L_k(x_{k+1}, x_k)$,

such that $\int \tilde\pi_n(x_{1:n-1}, x_n)\, dx_{1:n-1} = \pi_n(x_n)$.

The importance weights now become

$W_n \propto W_{n-1}\,\dfrac{\tilde\gamma_n(x_{1:n})}{\tilde\gamma_{n-1}(x_{1:n-1})\,q_n(x_{n-1}, x_n)} = W_{n-1}\,\dfrac{\gamma_n(x_n)\prod_{k=1}^{n-1} L_k(x_{k+1}, x_k)}{\gamma_{n-1}(x_{n-1})\prod_{k=1}^{n-2} L_k(x_{k+1}, x_k)\,q_n(x_{n-1}, x_n)} = W_{n-1}\,\dfrac{\gamma_n(x_n)\,L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\,q_n(x_{n-1}, x_n)}$.

14
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models

Any MCMC kernel can be used for the proposal $q(\cdot|\cdot)$.

Since our interest is in computing only the marginal

$\pi_n(x_n) = Z_n^{-1}\,\gamma_n(x_n) = \int \tilde\pi_n(x_{1:n-1}, x_n)\, dx_{1:n-1}$,

there is no degeneracy problem. The weights are

$W_n \propto W_{n-1}\,\dfrac{\gamma_n(x_n)\,L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\,q_n(x_{n-1}, x_n)}$.

15
P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Algorithm: SMC For Static Problems

(1) Initialize at time n = 1.
(2) At time n ≥ 2:

Sample $X_n^{(i)} \sim q_n(\cdot\,|\,X_{n-1}^{(i)})$ and augment $X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)})$.

Compute the importance weights

$W_n^{(i)} \propto W_{n-1}^{(i)}\,\dfrac{\gamma_n(X_n^{(i)})\,L_{n-1}(X_n^{(i)}, X_{n-1}^{(i)})}{\gamma_{n-1}(X_{n-1}^{(i)})\,q_n(X_n^{(i)}\,|\,X_{n-1}^{(i)})}$.

Then the weighted approximation is $\hat\pi_n(x) = \sum_{i=1}^N W_n^{(i)}\,\delta_{X_n^{(i)}}(x)$.

We finally resample $\bar X_n^{(i)} \sim \hat\pi_n(x)$ to obtain $\bar\pi_n(x) = \frac{1}{N}\sum_{i=1}^N \delta_{\bar X_n^{(i)}}(x)$.

16
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models: Choice of L

A default choice is first using a $\pi_n$-invariant MCMC kernel $q_n$, and then the corresponding reversed kernel $L_{n-1}$:

$L_{n-1}(x_n, x_{n-1}) = \dfrac{\pi_n(x_{n-1})\,q_n(x_{n-1}, x_n)}{\pi_n(x_n)}$.

Using this easy choice, we can simplify the expression for the weights:

$W_n \propto W_{n-1}\,\dfrac{\gamma_n(x_n)\,L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\,q_n(x_{n-1}, x_n)} = W_{n-1}\,\dfrac{\gamma_n(x_n)\,\pi_n(x_{n-1})}{\gamma_{n-1}(x_{n-1})\,\pi_n(x_n)} \;\Rightarrow\; W_n \propto W_{n-1}\,\dfrac{\gamma_n(X_{n-1}^{(i)})}{\gamma_{n-1}(X_{n-1}^{(i)})}$.

This is known as annealed importance sampling. This particular choice has been used in physics and statistics.

17
W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001
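With this choice of L, the method reduces to annealed importance sampling. Below is a minimal illustrative sketch (the Gaussian example is my own choice): we temper from $\mu = \mathcal N(0,1)$ to the unnormalized target $\gamma(x) = \exp(-(x-3)^2/2)$, whose true normalizing constant is $\sqrt{2\pi}$, using random-walk Metropolis moves that leave each $\pi_n$ invariant.

```python
import numpy as np

def ais(n_particles=5000, betas=np.linspace(0.0, 1.0, 31), seed=0):
    """Annealed importance sampling with gamma_n(x) = mu(x)^(1-b_n) gamma(x)^b_n,
    mu = N(0,1) density (normalized, Z_1 = 1), gamma unnormalized."""
    rng = np.random.default_rng(seed)
    log_mu = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
    log_gamma = lambda x: -0.5 * (x - 3.0) ** 2
    x = rng.standard_normal(n_particles)      # exact samples from pi_1 = mu
    log_w = np.zeros(n_particles)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # AIS weight update: gamma_n(x)/gamma_{n-1}(x) at the current particles
        log_w += (b - b_prev) * (log_gamma(x) - log_mu(x))
        # pi_n-invariant random-walk Metropolis move
        log_target = lambda z: (1 - b) * log_mu(z) + b * log_gamma(z)
        for _ in range(2):
            prop = x + rng.standard_normal(n_particles)
            accept = np.log(rng.random(n_particles)) < log_target(prop) - log_target(x)
            x = np.where(accept, prop, x)
    # E[w] = Z_Final / Z_1 = integral of gamma = sqrt(2*pi)
    return np.exp(log_w).mean()

Z_hat = ais()
```

Since no resampling is used here, this is AIS in its original form; adding the resampling step of the algorithm on the previous slide turns it into an SMC sampler.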
Bayesian Scientific Computing Spring 2013 (N Zabaras) 18
ONLINE PARAMETER ESTIMATION
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation

Assume that our state model is defined with some unknown static parameter θ with some prior p(θ):

$X_1 \sim \mu_\theta(\cdot)$ and $X_n \,|\, (X_{n-1} = x_{n-1}) \sim f_\theta(x_n\,|\,x_{n-1})$,
$Y_n \,|\, (X_n = x_n) \sim g_\theta(y_n\,|\,x_n)$.

Given data $y_{1:n}$, inference now is based on

$p(\theta, x_{1:n}\,|\,y_{1:n}) = p(x_{1:n}\,|\,y_{1:n}, \theta)\,p(\theta\,|\,y_{1:n})$, where $p(\theta\,|\,y_{1:n}) \propto p_\theta(y_{1:n})\,p(\theta)$.

We can use standard SMC, but on the extended space $Z_n = (X_n, \theta_n)$:

$\tilde f\big(z_n\,|\,z_{n-1}\big) = f_{\theta_{n-1}}(x_n\,|\,x_{n-1})\,\delta_{\theta_{n-1}}(\theta_n), \qquad \tilde g(y_n\,|\,z_n) = g_{\theta_n}(y_n\,|\,x_n)$.

Note that θ is a static parameter: it does not evolve with n.

19
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation

For fixed θ, using our earlier error estimates,

$\operatorname{Var}\big[\log \hat p_\theta(y_{1:n})\big] \le \dfrac{Cn}{N}$.

In a Bayesian context the problem is even more severe, as $p(\theta\,|\,y_{1:n}) \propto p_\theta(y_{1:n})\,p(\theta)$.

The exponential stability assumption cannot hold, since $\theta_n = \theta_1$ for all n.

To mitigate this problem, introduce MCMC steps on θ.

20

C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian model selection, Proc. IEEE Workshop HOS, 1999. P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002. W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation

When $p(\theta\,|\,y_{1:n}, x_{1:n}) = p\big(\theta\,|\,s_n(x_{1:n}, y_{1:n})\big)$ for a fixed-dimensional sufficient statistic $s_n$, this becomes an elegant algorithm that, however, still has the degeneracy problem, since it uses $p(x_{1:n}\,|\,y_{1:n})$.

As dim(Z_n) = dim(X_n) + dim(θ), such methods are not recommended for high-dimensional θ, especially with vague priors.

21
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Artificial Dynamics for θ

A solution consists of perturbing the location of the particles in a way that does not modify their distributions, i.e. if at time n

$\theta^{(i)} \sim p(\theta\,|\,y_{1:n})$,

then we would like a transition kernel $M_n$ such that if $\theta'^{(i)} \sim M_n(\theta^{(i)}, \cdot)$, then $\theta'^{(i)} \sim p(\theta\,|\,y_{1:n})$.

In Markov chain language, we want $p(\theta\,|\,y_{1:n})$ to be invariant for $M_n(\theta, \theta')$.

22
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Using MCMC

There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as $p(\theta\,|\,y_{1:n}) \propto p_\theta(y_{1:n})\,p(\theta)$ would need to be known up to a normalizing constant, but $p_\theta(y_{1:n})$ is unknown.

However, we can use a very simple Gibbs update. Indeed, note that if $(\theta^{(i)}, X_{1:n}^{(i)}) \sim p(\theta, x_{1:n}\,|\,y_{1:n})$, then drawing $\theta'^{(i)} \sim p(\theta\,|\,y_{1:n}, X_{1:n}^{(i)})$ gives $(\theta'^{(i)}, X_{1:n}^{(i)}) \sim p(\theta, x_{1:n}\,|\,y_{1:n})$. Indeed,

$p(\theta\,|\,y_{1:n}) = \int p(\theta\,|\,y_{1:n}, x_{1:n})\,p(x_{1:n}\,|\,y_{1:n})\,dx_{1:n}$.

23
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation

Given an approximation at time n−1,

$\hat p(\theta, x_{1:n-1}\,|\,y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^N \delta_{(\theta^{(i)},\, X_{1:n-1}^{(i)})}(\theta, x_{1:n-1})$.

Sample $X_n^{(i)} \sim f_{\theta^{(i)}}(\cdot\,|\,X_{n-1}^{(i)})$, set $\bar X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)})$, and then approximate

$\hat p(\theta, x_{1:n}\,|\,y_{1:n}) = \sum_{i=1}^N W_n^{(i)}\,\delta_{(\theta^{(i)},\, \bar X_{1:n}^{(i)})}(\theta, x_{1:n}), \qquad W_n^{(i)} \propto g_{\theta^{(i)}}(y_n\,|\,X_n^{(i)})$.

Resample $X_{1:n}^{(i)} \sim \hat p(x_{1:n}\,|\,y_{1:n})$, then sample $\theta^{(i)} \sim p(\theta\,|\,y_{1:n}, X_{1:n}^{(i)})$ to obtain

$\hat p(\theta, x_{1:n}\,|\,y_{1:n}) = \frac{1}{N}\sum_{i=1}^N \delta_{(\theta^{(i)},\, X_{1:n}^{(i)})}(\theta, x_{1:n})$.

24
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation

Consider the following model:

$X_{n+1} = \theta X_n + \sigma_v V_{n+1}, \quad V_n \overset{iid}{\sim} \mathcal N(0, 1)$,
$Y_n = X_n + \sigma_w W_n, \quad W_n \overset{iid}{\sim} \mathcal N(0, 1)$,
$X_0 \sim \mathcal N(0, \sigma_0^2)$.

We set the prior on θ as $\theta \sim \mathcal U(-1, 1)$.

In this case

$p(\theta\,|\,x_{1:n}, y_{1:n}) \propto \mathcal N(\theta;\, m_n, \sigma_n^2)\,\mathbb 1_{(-1,1)}(\theta)$,

with

$m_n = \dfrac{\sum_{k=2}^{n} x_{k-1} x_k}{\sum_{k=2}^{n} x_{k-1}^2}, \qquad \sigma_n^2 = \dfrac{\sigma_v^2}{\sum_{k=2}^{n} x_{k-1}^2}$.

25
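The conditionally conjugate update above can be checked numerically. The sketch below is illustrative (the value θ = 0.5 and the rejection sampler for the truncated normal are my own choices): it computes $m_n$ and $\sigma_n^2$ from a simulated path and draws θ from $\mathcal N(m_n, \sigma_n^2)$ restricted to (−1, 1), as in the Gibbs step.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, sigma_v, n = 0.5, 1.0, 2000

# simulate x_{1:n} from the AR(1) state equation
x = np.zeros(n)
for k in range(1, n):
    x[k] = theta_true * x[k - 1] + sigma_v * rng.standard_normal()

# posterior parameters of p(theta | x_{1:n}) under the U(-1,1) prior
s_xx = np.sum(x[:-1] ** 2)
m_n = np.sum(x[:-1] * x[1:]) / s_xx
sigma_n2 = sigma_v ** 2 / s_xx

# Gibbs draw: N(m_n, sigma_n2) truncated to (-1, 1), by simple rejection
while True:
    theta = rng.normal(m_n, np.sqrt(sigma_n2))
    if -1.0 < theta < 1.0:
        break
```

With a long path, $m_n$ concentrates near the true coefficient, so the rejection step almost never loops more than once.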
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation

We use SMC with the Gibbs step. The degeneracy problem remains.

[Figure: SMC estimate of $\mathbb E[\theta\,|\,y_{1:n}]$ as n increases (from A. Doucet, lecture notes), with the true value shown for comparison. The parameter converges to the wrong value.]

26
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation

The problem with this approach is that although we move θ according to $p(\theta\,|\,x_{1:n}, y_{1:n})$, we are still relying implicitly on the SMC approximation of the joint distribution $p(x_{1:n}\,|\,y_{1:n})$.

As n increases, the SMC approximation of $p(x_{1:n}\,|\,y_{1:n})$ deteriorates and the algorithm cannot converge towards the right solution.

[Figure: the sufficient statistic $\frac{1}{n}\sum_{k=1}^{n} \mathbb E[X_k^2\,|\,y_{1:n}]$ computed exactly through the Kalman filter (blue) vs. its SMC approximation (red), for a fixed value of θ.]

27
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Recursive MLE Parameter Estimation

The log-likelihood can be written as

$\ell_n(\theta) = \log p_\theta(Y_{1:n}) = \sum_{k=1}^{n} \log p_\theta(Y_k\,|\,Y_{1:k-1})$.

Here we compute

$p_\theta(Y_k\,|\,Y_{1:k-1}) = \int g_\theta(Y_k\,|\,x_k)\,p_\theta(x_k\,|\,Y_{1:k-1})\,dx_k$.

Under regularity assumptions, $\{X_n, Y_n, p_\theta(x_n\,|\,Y_{1:n-1})\}$ is an inhomogeneous Markov chain which converges towards its invariant distribution $\mu_\theta$, and

$\lim_{n\to\infty} \frac{\ell_n(\theta)}{n} = \ell(\theta)$, an ergodic average involving $\log g_\theta(y\,|\,x)$ under $\mu_\theta$.

28
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Robbins-Monro Algorithm for MLE

We can maximize ℓ(θ) by using the gradient and the Robbins-Monro algorithm:

$\theta_n = \theta_{n-1} + \gamma_n\,\nabla_\theta \log p_{\theta_{n-1}}(Y_n\,|\,Y_{1:n-1})$,

where

$\nabla p_\theta(Y_n\,|\,Y_{1:n-1}) = \int \nabla g_\theta(Y_n\,|\,x_n)\,p_\theta(x_n\,|\,Y_{1:n-1})\,dx_n + \int g_\theta(Y_n\,|\,x_n)\,\nabla p_\theta(x_n\,|\,Y_{1:n-1})\,dx_n$.

We thus need to approximate $\nabla p_\theta(x_n\,|\,Y_{1:n-1})$.

29
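As a toy illustration of the Robbins-Monro recursion (my own example, not from the lecture): for i.i.d. $Y_n \sim \mathcal N(\theta^*, 1)$ the score of a single observation is $\nabla_\theta \log p_\theta(Y_n) = Y_n - \theta$, so the stochastic-approximation update recovers the MLE without ever forming the full likelihood.

```python
import numpy as np

rng = np.random.default_rng(2)
theta_star = 2.0
theta = 0.0                                   # arbitrary starting point
for n in range(1, 5001):
    y = theta_star + rng.standard_normal()    # new observation Y_n
    gamma = 1.0 / n                           # step sizes: sum gamma_n = inf, sum gamma_n^2 < inf
    theta += gamma * (y - theta)              # theta_n = theta_{n-1} + gamma_n * score(Y_n)
```

With $\gamma_n = 1/n$ this recursion is exactly the running sample mean, which makes the convergence of the scheme easy to see; in the filtering setting the score term $Y_n - \theta$ is replaced by the SMC approximation of $\nabla \log p_{\theta_{n-1}}(Y_n|Y_{1:n-1})$.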
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Importance Sampling Estimation of Sensitivity

The various proposed approximations are based on the identity

$\nabla p_\theta(x_{1:n}\,|\,y_{1:n}) = p_\theta(x_{1:n}\,|\,y_{1:n})\,\nabla \log p_\theta(x_{1:n}\,|\,y_{1:n})$,

so that the score $\nabla \log p_\theta(y_{1:n})$ is obtained by integrating against $p_\theta(x_{1:n}\,|\,y_{1:n})$.

We thus can use an SMC approximation of the form

$\widehat{\nabla p}_\theta(x_{1:n}\,|\,y_{1:n}) = \frac{1}{N}\sum_{i=1}^N \alpha_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad \alpha_n^{(i)} = \nabla \log p_\theta\big(X_{1:n}^{(i)}\,\big|\,y_{1:n}\big)$.

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of $p_\theta(x_{1:n}\,|\,y_{1:n})$.

30
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Degeneracy of the SMC Algorithm

[Figure: score $\nabla p_\theta(Y_{1:n})$ for a linear Gaussian model: exact (blue) vs. SMC approximation (red).]

31
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Degeneracy of the SMC Algorithm

[Figure: signed measure $\nabla p_\theta(x_n\,|\,Y_{1:n})$ for a linear Gaussian model: exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.]

32
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:

$\nabla p_\theta(x_n\,|\,Y_{1:n}) = p_\theta(x_n\,|\,Y_{1:n})\,\nabla \log p_\theta(x_n\,|\,Y_{1:n})$.

It suggests using an SMC approximation of the form

$\widehat{\nabla p}_\theta(x_n\,|\,Y_{1:n}) = \frac{1}{N}\sum_{i=1}^N \beta_n^{(i)}\,\delta_{X_n^{(i)}}(x_n), \qquad \beta_n^{(i)} = \nabla \log \hat p_\theta\big(X_n^{(i)}\,\big|\,Y_{1:n}\big) = \dfrac{\nabla \hat p_\theta(X_n^{(i)}\,|\,Y_{1:n})}{\hat p_\theta(X_n^{(i)}\,|\,Y_{1:n})}$.

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N²), as

$\hat p_\theta\big(X_n^{(i)}\,\big|\,Y_{1:n-1}\big) \propto \int f_\theta\big(X_n^{(i)}\,\big|\,x_{n-1}\big)\,\hat p_\theta(x_{n-1}\,|\,Y_{1:n-1})\,dx_{n-1} = \frac{1}{N}\sum_{j=1}^N f_\theta\big(X_n^{(i)}\,\big|\,X_{n-1}^{(j)}\big)$.

33
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

$p_\theta(x_n\,|\,Y_{1:n}) = \dfrac{\xi_\theta^n(x_n)}{\int \xi_\theta^n(x_n)\,dx_n}$, where $\xi_\theta^n(x_n) = g_\theta(Y_n\,|\,x_n)\int f_\theta(x_n\,|\,x_{n-1})\,p_\theta(x_{n-1}\,|\,Y_{1:n-1})\,dx_{n-1}$.

The derivatives satisfy the following:

$\nabla p_\theta(x_n\,|\,Y_{1:n}) = \dfrac{\nabla \xi_\theta^n(x_n)}{\int \xi_\theta^n(x_n)\,dx_n} - p_\theta(x_n\,|\,Y_{1:n})\,\dfrac{\int \nabla \xi_\theta^n(x_n)\,dx_n}{\int \xi_\theta^n(x_n)\,dx_n}$.

This way we obtain a simple recursion for $\nabla p_\theta(x_n\,|\,Y_{1:n})$ as a function of $\nabla p_\theta(x_{n-1}\,|\,Y_{1:n-1})$ and $p_\theta(x_{n-1}\,|\,Y_{1:n-1})$.

34
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC Approximation of the Sensitivity

Sample from the mixture proposal

$X_n^{(i)} \sim \eta_\theta^n(\cdot) = \sum_{j=1}^N W_{n-1}^{(j)}\,q_\theta\big(\cdot\,\big|\,Y_n, X_{n-1}^{(j)}\big)$.

Compute

$\alpha_n^{(i)} = \dfrac{\hat\xi_\theta^n(X_n^{(i)})}{\eta_\theta^n(X_n^{(i)})}, \qquad \rho_n^{(i)} = \dfrac{\nabla \hat\xi_\theta^n(X_n^{(i)})}{\eta_\theta^n(X_n^{(i)})}$,

$W_n^{(i)} = \dfrac{\alpha_n^{(i)}}{\sum_j \alpha_n^{(j)}}, \qquad \beta_n^{(i)} = \dfrac{\rho_n^{(i)}}{\sum_j \alpha_n^{(j)}} - W_n^{(i)}\,\dfrac{\sum_j \rho_n^{(j)}}{\sum_j \alpha_n^{(j)}}$.

We have

$\hat p_\theta(x_n\,|\,Y_{1:n}) = \sum_{i=1}^N W_n^{(i)}\,\delta_{X_n^{(i)}}(x_n), \qquad \widehat{\nabla p}_\theta(x_n\,|\,Y_{1:n}) = \sum_{i=1}^N \beta_n^{(i)}\,\delta_{X_n^{(i)}}(x_n)$.

35
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example: Linear Gaussian Model

Consider the following model:

$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = X_n + \sigma_w W_n$,

with $\theta = (\phi, \sigma_v, \sigma_w)$.

We compare next the SMC estimates of $\nabla \log p_\theta(x_n\,|\,Y_{1:n})$ and $\nabla p_\theta(Y_{1:n})$ for N = 1000 to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).

36
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example: Linear Gaussian Model

[Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).]

37
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example: Stochastic Volatility Model

Consider the following model:

$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = \beta \exp\!\big(\tfrac{X_n}{2}\big)\,W_n$,

with $\theta = (\phi, \sigma_v, \beta)$. We use SMC with N = 1000 for batch and online estimation.

For batch estimation:

$\theta_k = \theta_{k-1} + \gamma_k\,\nabla \log p_{\theta_{k-1}}(Y_{1:n})$.

For online estimation:

$\theta_n = \theta_{n-1} + \gamma_n\,\nabla \log p_{\theta_{n-1}}(Y_n\,|\,Y_{1:n-1})$.

38
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example: Stochastic Volatility Model

[Figure: batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.]

39
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example: Stochastic Volatility Model

[Figure: recursive parameter estimation for the SV model. The estimates converge towards the true values.]

40
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation

Given data $y_{1:n}$, inference relies on

$p(\theta, x_{1:n}\,|\,y_{1:n}) = p(x_{1:n}\,|\,y_{1:n}, \theta)\,p(\theta\,|\,y_{1:n})$, where $p(\theta\,|\,y_{1:n}) \propto p_\theta(y_{1:n})\,p(\theta)$.

We have seen that SMC is rather inefficient for sampling from $p(\theta\,|\,y_{1:n})$, so we look here at an MCMC approach.

For a given parameter value θ, SMC can estimate both $p_\theta(x_{1:n}\,|\,y_{1:n})$ and $p_\theta(y_{1:n})$.

It would be useful if we could use SMC within MCMC to sample from $p(\theta, x_{1:n}\,|\,y_{1:n})$.

41

• C. Andrieu, R. Holenstein and G.O. Roberts, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010), 72, Part 3, pp. 269-342.
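The estimate $\hat p_\theta(y_{1:n})$ referred to above is produced by a standard particle filter. The sketch below is illustrative (a bootstrap filter on a linear Gaussian model, chosen because the Kalman filter then gives the exact likelihood for comparison; model and parameter values are my own choices):

```python
import numpy as np

def simulate(phi, sv, sw, T, rng):
    x = np.zeros(T); y = np.zeros(T)
    x[0] = rng.standard_normal()
    for t in range(1, T):
        x[t] = phi * x[t - 1] + sv * rng.standard_normal()
    return x + sw * rng.standard_normal(T)

def pf_loglik(y, phi, sv, sw, N, rng):
    """Bootstrap particle filter estimate of log p_theta(y_{1:T})."""
    x = rng.standard_normal(N)                        # X_1 ~ N(0, 1)
    ll = 0.0
    for t, yt in enumerate(y):
        if t > 0:
            x = phi * x + sv * rng.standard_normal(N)  # propagate via f_theta
        logw = -0.5 * ((yt - x) / sw) ** 2 - np.log(sw) - 0.5 * np.log(2 * np.pi)
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())                     # log p_hat(y_t | y_{1:t-1})
        x = rng.choice(x, size=N, p=w / w.sum())       # multinomial resampling
    return ll

def kalman_loglik(y, phi, sv, sw):
    """Exact log-likelihood for the same model (X_1 ~ N(0, 1))."""
    m, P, ll = 0.0, 1.0, 0.0
    for t, yt in enumerate(y):
        if t > 0:
            m, P = phi * m, phi ** 2 * P + sv ** 2
        S = P + sw ** 2
        ll += -0.5 * (np.log(2 * np.pi * S) + (yt - m) ** 2 / S)
        K = P / S
        m, P = m + K * (yt - m), (1 - K) * P
    return ll

rng = np.random.default_rng(3)
y = simulate(0.8, 1.0, 1.0, 50, rng)
ll_pf = pf_loglik(y, 0.8, 1.0, 1.0, N=2000, rng=rng)
ll_kf = kalman_loglik(y, 0.8, 1.0, 1.0)
```

The PF estimate is unbiased for $p_\theta(y_{1:n})$ (not for its log), which is exactly the property the particle MCMC algorithms below exploit.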
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from $p(\theta\,|\,x_{1:n}, y_{1:n})$ and $p(x_{1:n}\,|\,\theta, y_{1:n})$.

However, it is impossible to sample from $p(x_{1:n}\,|\,\theta, y_{1:n})$ directly.

We can sample from $p(x_k\,|\,\theta, x_{k-1}, x_{k+1}, y_k)$ instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p(x_{1:n}\,|\,\theta, y_{1:n})$, i.e. sample $X'_{1:n} \sim q(x_{1:n}\,|\,y_{1:n})$ and accept with probability

$\min\left\{1,\; \dfrac{p_\theta(X'_{1:n}\,|\,y_{1:n})\,q(X_{1:n}\,|\,y_{1:n})}{p_\theta(X_{1:n}\,|\,y_{1:n})\,q(X'_{1:n}\,|\,y_{1:n})}\right\}$.

42
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy

We will use the output of an SMC method approximating $p_\theta(x_{1:n}\,|\,y_{1:n})$ as a proposal distribution.

We know that the SMC approximation degrades as n increases, but we also have, under mixing assumptions,

$\big\| \mathrm{Law}\big(X_{1:n}^{(i)}\big) - p_\theta(\cdot\,|\,y_{1:n}) \big\|_{TV} \le \dfrac{Cn}{N}$.

Thus things degrade linearly rather than exponentially.

These results are useful for small C.

43
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences.

CBMC looks like an SMC method, but at each step only one particle is selected, which gives N offspring. Thus if you have made the wrong selection, you cannot recover later on.

44

D. Frenkel, G.C.A.M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991), 3053-3076.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
MCMC

We use $\hat p_\theta^N(x_{1:n}\,|\,y_{1:n})$ as the proposal distribution, sampling $X'_{1:n} \sim \hat p_\theta^N(x_{1:n}\,|\,y_{1:n})$, and store the estimate $\hat p_\theta^N(y_{1:n})$.

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N−1)n variables.

The solution, inspired by CBMC, is as follows. For the current state $X_{1:n}$ of the Markov chain, run a virtual SMC method with only N−1 free particles; ensure that at each time step k the current state $X_k$ is deterministically selected, and compute the associated estimate $\tilde p_\theta^N(y_{1:n})$.

Finally, accept $X'_{1:n}$ with probability

$\min\left\{1,\; \dfrac{\hat p_\theta^N(y_{1:n})}{\tilde p_\theta^N(y_{1:n})}\right\}$.

Otherwise stay where you are.

45
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example

Consider the following model:

$X_k = \dfrac{X_{k-1}}{2} + \dfrac{25\,X_{k-1}}{1 + X_{k-1}^2} + 8\cos(1.2k) + V_k, \quad V_k \sim \mathcal N(0, \sigma_v^2)$,

$Y_k = \dfrac{X_k^2}{20} + W_k, \quad W_k \sim \mathcal N(0, \sigma_w^2), \qquad X_1 \sim \mathcal N(0, 5)$.

We take the same prior proposal distribution for both CBMC and SMC.

[Figure: acceptance rates for CBMC (left) and SMC (right).]

46
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example

We now consider the case when both variances $\sigma_v^2, \sigma_w^2$ are unknown and are given inverse-Gamma (conditionally conjugate) vague priors.

We sample from $p(\theta, x_{1:1000}\,|\,y_{1:1000})$ using two strategies:

The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution; average acceptance rate 0.43.

An algorithm updating the components one at a time using an MH step of invariant distribution $p(x_k\,|\,x_{k-1}, x_{k+1}, y_k)$ with the same proposal.

In both cases we update the parameters according to $p(\theta\,|\,x_{1:500}, y_{1:500})$. Both algorithms are run with the same computational effort.

47
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example

MH one-at-a-time missed the mode and estimates:

$\mathbb E[\sigma_v^2\,|\,y_{1:500}] \approx 13.7$ (MH one-at-a-time), $\mathbb E[\sigma_v^2\,|\,y_{1:500}] \approx 9.9$ (PF-MCMC), true value $\sigma_v^2 = 10$.

If $X_{1:n}$ and θ are very correlated, the MCMC algorithm is very inefficient.

We will next discuss the Marginal Metropolis-Hastings algorithm.

48
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density $q(x, y) = q(y\,|\,x)$, and the target distribution is $\pi(x)$.

It can easily be shown that π is invariant under the transition kernel,

$\pi(x') = \int \pi(x)\,K(x, x')\,dx$,

and that $X^{(i)} \sim \pi(x)$ as $i \to \infty$ under weak assumptions.

Algorithm: Generic Metropolis Hastings Sampler
Initialization: choose an arbitrary starting value $x_0$.
Iteration t (t ≥ 1):
1. Given $x_{t-1}$, generate $\tilde x \sim q(x\,|\,x_{t-1})$.
2. Compute
$\rho(x_{t-1}, \tilde x) = \min\left\{1,\; \dfrac{\pi(\tilde x)\,q(x_{t-1}\,|\,\tilde x)}{\pi(x_{t-1})\,q(\tilde x\,|\,x_{t-1})}\right\}$.
3. With probability $\rho(x_{t-1}, \tilde x)$, accept $\tilde x$ and set $x_t = \tilde x$; otherwise reject $\tilde x$ and set $x_t = x_{t-1}$.

49
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm

Consider the target

$p(\theta, x_{1:n}\,|\,y_{1:n}) = p(\theta\,|\,y_{1:n})\,p(x_{1:n}\,|\,y_{1:n}, \theta)$.

We use the following proposal distribution:

$q\big((\theta^*, x^*_{1:n})\,\big|\,(\theta, x_{1:n})\big) = q(\theta^*\,|\,\theta)\,p(x^*_{1:n}\,|\,y_{1:n}, \theta^*)$.

Then the acceptance probability becomes

$\min\left\{1,\; \dfrac{p(\theta^*, x^*_{1:n}\,|\,y_{1:n})\,q\big((\theta, x_{1:n})\,|\,(\theta^*, x^*_{1:n})\big)}{p(\theta, x_{1:n}\,|\,y_{1:n})\,q\big((\theta^*, x^*_{1:n})\,|\,(\theta, x_{1:n})\big)}\right\} = \min\left\{1,\; \dfrac{p_{\theta^*}(y_{1:n})\,p(\theta^*)\,q(\theta\,|\,\theta^*)}{p_\theta(y_{1:n})\,p(\theta)\,q(\theta^*\,|\,\theta)}\right\}$.

We will use SMC approximations to compute $p_\theta(y_{1:n})$ and $p(x_{1:n}\,|\,y_{1:n}, \theta)$.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm

Step 1: Given $\theta(i-1)$, $\hat p_{\theta(i-1)}(y_{1:n})$ and $X_{1:n}(i-1)$:
sample $\theta^* \sim q(\theta\,|\,\theta(i-1))$, and
run an SMC algorithm to obtain $\hat p_{\theta^*}(x_{1:n}\,|\,y_{1:n})$ and $\hat p_{\theta^*}(y_{1:n})$.

Step 2: Sample $X^*_{1:n} \sim \hat p_{\theta^*}(x_{1:n}\,|\,y_{1:n})$.

Step 3: With probability

$\min\left\{1,\; \dfrac{\hat p_{\theta^*}(y_{1:n})\,p(\theta^*)\,q(\theta(i-1)\,|\,\theta^*)}{\hat p_{\theta(i-1)}(y_{1:n})\,p(\theta(i-1))\,q(\theta^*\,|\,\theta(i-1))}\right\}$,

set $\theta(i) = \theta^*$, $X_{1:n}(i) = X^*_{1:n}$, $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta^*}(y_{1:n})$;
otherwise set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$, $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta(i-1)}(y_{1:n})$.
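The three steps above can be sketched as particle marginal MH (PMMH). This illustrative implementation (the model, prior, and proposal scale are my own choices) runs a bootstrap particle filter for $\hat p_\theta(y_{1:n})$ inside an MH loop over $\theta = \phi$, for the linear Gaussian model $X_n = \phi X_{n-1} + V_n$, $Y_n = X_n + W_n$ with a $\mathcal U(-1,1)$ prior on φ:

```python
import numpy as np

def pf_loglik(y, phi, N, rng):
    """Bootstrap PF estimate of log p_phi(y_{1:T}), V_n, W_n ~ N(0,1), X_1 ~ N(0,1)."""
    x = rng.standard_normal(N)
    ll = 0.0
    for t, yt in enumerate(y):
        if t > 0:
            x = phi * x + rng.standard_normal(N)
        logw = -0.5 * (yt - x) ** 2 - 0.5 * np.log(2 * np.pi)
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())
        x = rng.choice(x, size=N, p=w / w.sum())
    return ll

rng = np.random.default_rng(5)
# simulate data with true phi = 0.7
T, phi_true = 100, 0.7
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi_true * x[t - 1] + rng.standard_normal()
y = x + rng.standard_normal(T)

# PMMH over theta = phi with flat prior on (-1, 1), random-walk proposal
phi = 0.0
ll = pf_loglik(y, phi, 500, rng)
chain, n_accept = [], 0
for i in range(300):
    phi_prop = phi + 0.1 * rng.standard_normal()
    if -1.0 < phi_prop < 1.0:                  # flat prior: reject outside support
        ll_prop = pf_loglik(y, phi_prop, 500, rng)
        # prior and symmetric proposal cancel; ratio is p_hat(y|phi*)/p_hat(y|phi)
        if np.log(rng.random()) < ll_prop - ll:
            phi, ll, n_accept = phi_prop, ll_prop, n_accept + 1
    chain.append(phi)
chain = np.array(chain)
```

Because the PF estimate of $p_\theta(y_{1:n})$ is unbiased, the chain targets the exact posterior despite using a noisy likelihood (Andrieu, Doucet & Holenstein, 2010).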
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm

This algorithm (without sampling $X_{1:n}$) was proposed as an approximate MCMC algorithm to sample from $p(\theta\,|\,y_{1:n})$ in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

For any N ≥ 1, the algorithm admits exactly $p(\theta, x_{1:n}\,|\,y_{1:n})$ as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n.

It is useful when $X_n$ is moderate-dimensional and θ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

- Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), 549-553, 2007.
- C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 72(3), 269-342, June 2010.
- E.L. Ionides, C. Breto and A.A. King, Inference for nonlinear dynamical systems, Proceedings of the Nat. Acad. of Sciences, 103, 18438-18443, 2006.
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems

Thus SMC can be successfully used within MCMC.

The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals.

This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
- Developing efficient SMC methods when E = R^10000.
- Developing a proper SMC method for recursive Bayesian parameter estimation.
- Developing methods to avoid the O(N²) cost for smoothing and sensitivity estimation.
- Developing methods for solving optimal control problems.
- Establishing weaker assumptions ensuring uniform convergence results.

54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics

In standard SMC we only sample Xn at time n, and we don't allow ourselves to modify the past.

It is possible to modify the algorithm to take this into account.

55

• Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, 15(3), 1-19.
References

C. Andrieu and A. Doucet, Particle Filtering for Partially Observed Gaussian State-Space Models, JRSS B, 2002.
R. Chen and J. Liu, Mixture Kalman Filters, JRSS B, 2000.
A. Doucet, S.J. Godsill and C. Andrieu, On SMC sampling methods for Bayesian filtering, Stat. Comp., 2000.
N. Kantas, A. Doucet, S.S. Singh and J.M. Maciejowski, An overview of sequential Monte Carlo methods for parameter estimation in general state-space models, Proceedings IFAC System Identification (SySid) Meeting, 2009.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods (with discussion), JRSS B, 2010.
C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
G. Storvik, Particle filters for state-space models with the presence of unknown static parameters, IEEE Trans. Signal Processing, 2002.
C. Andrieu, A. Doucet and V.B. Tadic, Online EM for parameter estimation in nonlinear/non-Gaussian state-space models, Proc. IEEE CDC, 2005.
G. Poyadjis, A. Doucet and S.S. Singh, Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation, Biometrika, 2011.
C. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Technical report, 2008.
R. Martinez-Cantin, J. Castellanos and N. de Freitas, Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm: Marginal-SLAM, International Conference on Robotics and Automation.
A. Doucet, Sequential Monte Carlo Methods and Particle Filters: list of papers, codes and video lectures on SMC and particle filters.
P. Del Moral, Feynman-Kac models and interacting particle systems.
P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo samplers, JRSS B, 2006.
P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
P. Del Moral, A. Doucet and S.S. Singh, Forward Smoothing using Sequential Monte Carlo, technical report, Cambridge University, 2009.
A. Doucet, Short Courses Lecture Notes (A, B, C).
P. Del Moral, Feynman-Kac Formulae, Springer-Verlag, 2004.
M. Davy, Sequential MC Methods, 2007.
A. Doucet and A. Johansen, Particle Filtering and Smoothing: Fifteen years later, in Handbook of Nonlinear Filtering (eds. D. Crisan and B. Rozovsky), Oxford Univ. Press, 2011.
A. Johansen and A. Doucet, A Note on Auxiliary Particle Filters, Stat. Proba. Letters, 2008.
A. Doucet, M. Briers and S. Senecal, Efficient Block Sampling Strategies for Sequential Monte Carlo, JCGS, 2006.
C. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Stat. Comput., 2011.
SEQUENTIAL MONTE CARLO METHODS FOR STATIC PROBLEMS
What about if all distributions $\pi_n$, $n \ge 1$, are defined on X?

This case appears often, for example:

Sequential Bayesian Estimation: $\pi_n(x) = p(x \,|\, y_{1:n})$

Classical Bayesian Inference: $\pi_n(x) = \pi(x)$

Global Optimization: $\pi_n(x) \propto [\pi(x)]^{\gamma_n}$, $\gamma_n \to \infty$

It is not clear how SMC can be related to this problem, since here $X_n = X$ instead of $X_n = X^n$.
What about if all distributions $\pi_n$, $n \ge 1$, are defined on X?

This case appears often, for example:

Sequential Bayesian Estimation: $\pi_n(x) = p(x \,|\, y_{1:n})$

Global Optimization: $\pi_n(x) \propto [\pi(x)]^{\eta_n}$, $\eta_n \to \infty$

Sampling from a fixed distribution, where $\mu(x)$ is an easy-to-sample-from distribution: use the sequence

$\pi_n(x) \propto \mu(x)^{\eta_n}\, \pi(x)^{1-\eta_n}$

with $\eta_1 = 1 > \eta_2 > \cdots > \eta_{Final} = 0$, i.e. $\pi_1(x) = \mu(x)$ and $\pi_{Final}(x) \propto \pi(x)$.
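The geometric bridge above is available in closed form when both $\mu$ and $\pi$ are Gaussian, which makes it a convenient sanity check. A minimal sketch (the two Gaussians' parameters are illustrative assumptions, not values from the lecture) computes the mean and variance of each intermediate $\pi_n$:

```python
import numpy as np

# Tempered path between mu = N(0, s0^2) (easy) and pi = N(m, s1^2) (target):
#   pi_n(x) ∝ mu(x)^eta_n * pi(x)^(1 - eta_n).
# For Gaussians every intermediate distribution is again Gaussian.
s0, m, s1 = 2.0, 3.0, 0.5          # illustrative values (assumption)

def tempered_params(eta):
    """Mean and variance of pi_n; eta = 1 recovers mu, eta = 0 recovers pi."""
    prec = eta / s0**2 + (1.0 - eta) / s1**2     # precisions combine linearly
    mean = ((1.0 - eta) * m / s1**2) / prec      # precision-weighted mean
    return mean, 1.0 / prec

etas = np.linspace(1.0, 0.0, 6)    # eta_1 = 1 > ... > eta_Final = 0
path = [tempered_params(e) for e in etas]
```

The endpoints confirm the sequence interpolates from $\mu$ to $\pi$ as the slide states.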
What about if all distributions $\pi_n$, $n \ge 1$, are defined on X?

This case appears often, for example:

Rare Event Simulation: simulate with a sequence of nested sets $X = E_1 \supset E_2 \supset \cdots \supset E_{Final} = A$, using

$\pi_n(x) \propto \pi(x)\, 1_{E_n}(x)$, with $\pi(A) \ll 1$ and the normalizing factor $Z_1$ known.

The required probability is then $Z_{Final} = \pi(A)$.

Other applications: the Boolean Satisfiability Problem, computing matrix permanents, computing volumes in high dimensions.

Apply SMC to an augmented sequence of distributions $\tilde\pi_n$ on $X^n$ such that

$\int \tilde\pi_n(\mathbf{x}_{1:n})\, d\mathbf{x}_{1:n-1} = \pi_n(x_n)$.

For example,

$\tilde\pi_n(\mathbf{x}_{1:n}) = \pi_n(x_n)\, \pi_n(\mathbf{x}_{1:n-1} \,|\, x_n)$, using any conditional distribution on $X^{n-1}$.

• C. Jarzynski, Nonequilibrium Equality for Free Energy Differences, Phys. Rev. Lett. 78, 2690-2693 (1997).
• G.E. Crooks, Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible Markovian Systems, Journal of Statistical Physics, Vol. 90, Issue 5-6 (1998), pp. 1481-1487.
• W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
• R.M. Neal, Annealed importance sampling, Statistics and Computing, vol. 11 (2001), pp. 125-139.
• N. Chopin, A sequential particle filter method for static problems, Biometrika, 2002, 89, 3, pp. 539-551.
• P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
SMC For Static Models

Let $\{\pi_n(x)\}_{n \ge 1}$ be a sequence of probability distributions defined on X, s.t. each $\pi_n(x)$ is known up to a normalizing constant:

$\pi_n(x) = Z_n^{-1}\, \gamma_n(x)$, with $Z_n$ unknown and $\gamma_n$ known.

We are interested in sampling from $\pi_n(x)$ and computing $Z_n$ sequentially.

This is not the same as the standard SMC discussed earlier, which was defined on $X^n$: $\pi_n(\mathbf{x}_{1:n}) = p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})$.
SMC For Static Models

We will construct an artificial distribution that is the product of the target distribution from which we want to sample and a backward kernel L, as follows:

$\tilde\pi_n(\mathbf{x}_{1:n}) = \dfrac{\tilde\gamma_n(\mathbf{x}_{1:n})}{Z_n}$, where $\tilde\gamma_n(\mathbf{x}_{1:n}) = \gamma_n(x_n) \underbrace{\prod_{k=1}^{n-1} L_k(x_{k+1}, x_k)}_{\text{target backward transitions}}$,

such that $\pi_n(x_n) = \int \tilde\pi_n(\mathbf{x}_{1:n})\, d\mathbf{x}_{1:n-1}$.

The importance weights now become

$W_n = \dfrac{\tilde\gamma_n(\mathbf{x}_{1:n})}{q_n(\mathbf{x}_{1:n})} = \dfrac{\gamma_n(x_n) \prod_{k=1}^{n-1} L_k(x_{k+1}, x_k)}{q_{n-1}(\mathbf{x}_{1:n-1})\, q_n(x_n \,|\, x_{n-1})} = W_{n-1}\, \dfrac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \,|\, x_{n-1})}$.

Any MCMC kernel can be used for the proposal q(·|·).

Since our interest is in computing only $\pi_n(x_n) = \int \tilde\pi_n(\mathbf{x}_{1:n})\, d\mathbf{x}_{1:n-1} = \dfrac{1}{Z_n}\gamma_n(x_n)$, there is no degeneracy problem.

P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
Algorithm: SMC For Static Problems

(1) Initialize at time n = 1.

(2) At time n ≥ 2:

Sample $X_n^{(i)} \sim q_n(\cdot \,|\, X_{n-1}^{(i)})$ and augment $\mathbf{X}_{1:n}^{(i)} = (\mathbf{X}_{1:n-1}^{(i)}, X_n^{(i)})$.

Compute the importance weights

$W_n^{(i)} \propto W_{n-1}^{(i)}\, \dfrac{\gamma_n(X_n^{(i)})\, L_{n-1}(X_n^{(i)}, X_{n-1}^{(i)})}{\gamma_{n-1}(X_{n-1}^{(i)})\, q_n(X_n^{(i)} \,|\, X_{n-1}^{(i)})}$.

Then the weighted approximation is $\hat\pi_n(x) = \sum_{i=1}^N W_n^{(i)}\, \delta_{X_n^{(i)}}(x)$.

We finally resample $\bar X_n^{(i)} \sim \hat\pi_n(x)$ to obtain $\bar\pi_n(x) = \frac{1}{N}\sum_{i=1}^N \delta_{\bar X_n^{(i)}}(x)$.
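The algorithm above can be sketched in a few lines. The following is a minimal illustration, not the lecture's code: it tempers from $\mu = N(0,1)$ to $\pi = N(3,1)$, uses $\pi_n$-invariant random-walk Metropolis moves as the proposal kernel, and takes the simplified annealed-importance-sampling weight $\gamma_n(x_{n-1})/\gamma_{n-1}(x_{n-1})$ discussed on the next slide. All tuning constants (number of temperatures, move counts, step size) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_mu(x):                       # easy reference density: N(0, 1)
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_pi(x):                       # target density: N(3, 1)
    return -0.5 * (x - 3.0)**2 - 0.5 * np.log(2 * np.pi)

def log_gamma(x, beta):              # tempered path: mu^(1-beta) * pi^beta
    return (1.0 - beta) * log_mu(x) + beta * log_pi(x)

def smc_static(N=2000, betas=np.linspace(0.0, 1.0, 21), n_mcmc=5, step=1.0):
    x = rng.normal(0.0, 1.0, size=N)         # exact draws from pi_1 = mu
    log_Z = 0.0                              # running log of Z_Final / Z_1
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # annealed-IS incremental weight gamma_n(x_{n-1}) / gamma_{n-1}(x_{n-1})
        logw = log_gamma(x, b) - log_gamma(x, b_prev)
        log_Z += np.log(np.mean(np.exp(logw)))
        w = np.exp(logw - logw.max())
        w /= w.sum()
        x = rng.choice(x, size=N, p=w)       # multinomial resampling
        for _ in range(n_mcmc):              # pi_n-invariant random-walk Metropolis moves
            prop = x + step * rng.normal(size=N)
            accept = np.log(rng.uniform(size=N)) < log_gamma(prop, b) - log_gamma(x, b)
            x = np.where(accept, prop, x)
    return x, np.exp(log_Z)

particles, Z_hat = smc_static()
```

Since both endpoint densities are normalized here, the estimated ratio $Z_{Final}/Z_1$ should be close to 1, and the final particles should look like draws from $\pi$.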
SMC For Static Models: Choice of L

A default choice is first using a $\pi_n$-invariant MCMC kernel $q_n$, and then the corresponding reversed kernel

$L_{n-1}(x_n, x_{n-1}) = \dfrac{\pi_n(x_{n-1})\, q_n(x_n \,|\, x_{n-1})}{\pi_n(x_n)}$.

Using this easy choice, we can simplify the expression for the weights:

$W_n \propto W_{n-1}\, \dfrac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \,|\, x_{n-1})} = W_{n-1}\, \dfrac{\gamma_n(x_n)\, \pi_n(x_{n-1})}{\gamma_{n-1}(x_{n-1})\, \pi_n(x_n)} \;\Rightarrow\; W_n \propto W_{n-1}\, \dfrac{\gamma_n(x_{n-1})}{\gamma_{n-1}(x_{n-1})}$.

This is known as annealed importance sampling. This particular choice has been used in physics and statistics.

W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
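The weight simplification can also be checked numerically: substituting the reversed kernel into the general weight makes the proposal density and the normalizing constants cancel exactly. A small sketch, where the two unnormalized targets and the Gaussian kernel are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

# Two unnormalized targets gamma_{n-1}, gamma_n (Gaussians with arbitrary scales)
def gamma_prev(x): return 7.0 * np.exp(-0.5 * x**2)
def gamma_curr(x): return 3.0 * np.exp(-0.5 * (x - 1.0)**2)

Z_curr = 3.0 * np.sqrt(2 * np.pi)              # normalizer of gamma_curr
def pi_curr(x): return gamma_curr(x) / Z_curr  # normalized pi_n

def q(x_new, x_old, step=0.7):                 # Gaussian transition kernel density
    return np.exp(-0.5 * ((x_new - x_old) / step)**2) / (step * np.sqrt(2 * np.pi))

x_prev = rng.normal(size=1000)
x_new = x_prev + 0.7 * rng.normal(size=1000)

# reversed kernel L_{n-1}(x_n, x_{n-1}) = pi_n(x_{n-1}) q(x_n | x_{n-1}) / pi_n(x_n)
L = pi_curr(x_prev) * q(x_new, x_prev) / pi_curr(x_new)

# general weight vs the simplified annealed-IS weight
w_general = gamma_curr(x_new) * L / (gamma_prev(x_prev) * q(x_new, x_prev))
w_ais = gamma_curr(x_prev) / gamma_prev(x_prev)
```

The two weight vectors agree to machine precision, since the cancellation is algebraic, not statistical.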
ONLINE PARAMETER ESTIMATION
Online Bayesian Parameter Estimation

Assume that our state model is defined with some unknown static parameter $\theta$ with some prior $p(\theta)$:

$X_1 \sim \mu_\theta(\cdot)$ and $X_n \,|\, (X_{n-1} = x_{n-1}) \sim f_\theta(\cdot \,|\, x_{n-1})$,
$Y_n \,|\, (X_n = x_n) \sim g_\theta(\cdot \,|\, x_n)$.

Given data $y_{1:n}$, inference now is based on

$p(\theta, \mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}) = p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}, \theta)\, p(\theta \,|\, \mathbf{y}_{1:n})$, where $p(\theta \,|\, \mathbf{y}_{1:n}) \propto p_\theta(\mathbf{y}_{1:n})\, p(\theta)$.

We can use standard SMC, but on the extended space $Z_n = (X_n, \theta_n)$. Note that $\theta$ is a static parameter; it does not evolve with $n$:

$\tilde f\big((x_{n-1}, \theta_{n-1}), (x_n, \theta_n)\big) = f_{\theta_{n-1}}(x_n \,|\, x_{n-1})\, \delta_{\theta_{n-1}}(\theta_n)$, $\quad \tilde g\big(y_n \,|\, (x_n, \theta_n)\big) = g_{\theta_n}(y_n \,|\, x_n)$.

For fixed $\theta$, using our earlier error estimates,

$\mathrm{Var}\left[\log \hat p_\theta(\mathbf{y}_{1:n})\right] \le \dfrac{C\, n}{N}$.

In a Bayesian context the problem is even more severe, as $p(\theta \,|\, \mathbf{y}_{1:n}) \propto p_\theta(\mathbf{y}_{1:n})\, p(\theta)$.

The exponential stability assumption cannot hold, as $\theta_n = \theta_1$.

To mitigate this problem, introduce MCMC steps on $\theta$.

When $p(\theta \,|\, \mathbf{y}_{1:n}, \mathbf{x}_{1:n}) = p(\theta \,|\, s_n(\mathbf{x}_{1:n}, \mathbf{y}_{1:n}))$ for a fixed-dimensional sufficient statistic $s_n$, this becomes an elegant algorithm that, however, still has the degeneracy problem, since it uses $p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})$.

As $\dim(Z_n) = \dim(X_n) + \dim(\theta)$, such methods are not recommended for high-dimensional $\theta$, especially with vague priors.

C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999. P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002. W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
Artificial Dynamics for θ

A solution consists of perturbing the location of the particles in a way that does not modify their distribution; i.e., if at time n

$\theta^{(i)} \sim p(\theta \,|\, \mathbf{y}_{1:n})$,

then we would like a transition kernel $M_n$ such that if $\bar\theta^{(i)} \sim M_n(\theta^{(i)}, \cdot)$ then $\bar\theta^{(i)} \sim p(\theta \,|\, \mathbf{y}_{1:n})$.

In Markov chain language, we want $M_n(\theta, \cdot)$ to leave $p(\theta \,|\, \mathbf{y}_{1:n})$ invariant.

Using MCMC

There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as $p(\theta \,|\, \mathbf{y}_{1:n}) \propto p_\theta(\mathbf{y}_{1:n})\, p(\theta)$ would need to be known up to a normalizing constant, but $p_\theta(\mathbf{y}_{1:n})$ is unknown.

However, we can use a very simple Gibbs update: if $(\theta^{(i)}, \mathbf{X}_{1:n}^{(i)}) \sim p(\theta, \mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})$ and we sample $\bar\theta^{(i)} \sim p(\theta \,|\, \mathbf{y}_{1:n}, \mathbf{X}_{1:n}^{(i)})$, then $\bar\theta^{(i)} \sim p(\theta \,|\, \mathbf{y}_{1:n})$.

Indeed, note that

$p(\theta \,|\, \mathbf{y}_{1:n}) = \int p(\theta \,|\, \mathbf{y}_{1:n}, \mathbf{x}_{1:n})\, p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})\, d\mathbf{x}_{1:n}$.
SMC with MCMC for Parameter Estimation

Given an approximation at time n-1,

$\bar p(\theta, \mathbf{x}_{1:n-1} \,|\, \mathbf{y}_{1:n-1}) = \frac{1}{N} \sum_{i=1}^N \delta_{(\theta_{n-1}^{(i)}, \mathbf{X}_{1:n-1}^{(i)})}(\theta, \mathbf{x}_{1:n-1})$:

Sample $X_n^{(i)} \sim f_{\theta_{n-1}^{(i)}}(\cdot \,|\, X_{n-1}^{(i)})$, set $\mathbf{X}_{1:n}^{(i)} = (\mathbf{X}_{1:n-1}^{(i)}, X_n^{(i)})$, and then approximate

$\hat p(\theta, \mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}) = \sum_{i=1}^N W_n^{(i)}\, \delta_{(\theta_{n-1}^{(i)}, \mathbf{X}_{1:n}^{(i)})}(\theta, \mathbf{x}_{1:n})$, with $W_n^{(i)} \propto g_{\theta_{n-1}^{(i)}}(y_n \,|\, X_n^{(i)})$.

Resample $\mathbf{X}_{1:n}^{(i)} \sim \hat p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})$, then sample $\theta_n^{(i)} \sim p(\theta \,|\, \mathbf{y}_{1:n}, \mathbf{X}_{1:n}^{(i)})$ to obtain

$\bar p(\theta, \mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}) = \frac{1}{N} \sum_{i=1}^N \delta_{(\theta_n^{(i)}, \mathbf{X}_{1:n}^{(i)})}(\theta, \mathbf{x}_{1:n})$.
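One time-step of the scheme above can be sketched generically, with the model and the Gibbs update passed in as functions. This is a minimal illustration, not the lecture's code; the AR(1) demo model assumes unit noise variances, and the helper names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def smc_mcmc_step(theta, x_hist, y_n, f_sample, g_logpdf, gibbs_theta):
    """One time-step of SMC with a Gibbs move on the static parameter theta.

    theta : (N,) parameter particles;  x_hist : (N, n-1) state trajectories.
    """
    N = theta.shape[0]
    x_new = f_sample(theta, x_hist[:, -1])           # propagate X_n ~ f_theta
    x_hist = np.hstack([x_hist, x_new[:, None]])
    logw = g_logpdf(theta, x_new, y_n)               # weight by g_theta(y_n | X_n)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(N, size=N, p=w)                 # resample whole trajectories
    theta, x_hist = theta[idx], x_hist[idx]
    theta = gibbs_theta(x_hist)                      # refresh theta | y_{1:n}, X_{1:n}
    return theta, x_hist

# Demo: AR(1) model X_n = theta X_{n-1} + V_n, Y_n = X_n + W_n (unit variances; assumption)
f_sample = lambda th, x: th * x + rng.normal(size=x.shape)
g_logpdf = lambda th, x, y: -0.5 * (y - x) ** 2

def gibbs_ar(x_hist):
    # conjugate update: theta | x_{1:n} ~ N(sum x_k x_{k+1} / sum x_k^2, 1 / sum x_k^2)
    num = (x_hist[:, :-1] * x_hist[:, 1:]).sum(axis=1)
    den = (x_hist[:, :-1] ** 2).sum(axis=1) + 1e-12  # guard against tiny denominators
    return rng.normal(num / den, 1.0 / np.sqrt(den))

N = 500
theta = rng.uniform(-1, 1, size=N)
x_hist = rng.normal(size=(N, 1))
for y_n in [0.3, -0.1, 0.8, 0.5, -0.4]:              # a short synthetic record
    theta, x_hist = smc_mcmc_step(theta, x_hist, y_n, f_sample, g_logpdf, gibbs_ar)
```

Note that the Gibbs update here conditions only on the trajectories, since in this model $\theta$ enters only the state equation.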
SMC with MCMC for Parameter Estimation

Consider the following model:

$X_{n+1} = \theta X_n + \sigma_v V_{n+1}$, $\quad V_n \overset{iid}{\sim} \mathcal N(0, 1)$,
$Y_n = X_n + \sigma_w W_n$, $\quad W_n \overset{iid}{\sim} \mathcal N(0, 1)$,
$X_0 \sim \mathcal N(0, \sigma_0^2)$.

We set the prior on $\theta$ as $\theta \sim \mathcal U(-1, 1)$.

In this case

$p(\theta \,|\, \mathbf{x}_{1:n}, \mathbf{y}_{1:n}) \propto \mathcal N(\theta;\, m_n, \sigma_n^2)\, 1_{(-1,1)}(\theta)$, where $\sigma_n^2 = \dfrac{\sigma_v^2}{\sum_{k=1}^{n-1} x_k^2}$ and $m_n = \dfrac{\sum_{k=1}^{n-1} x_k x_{k+1}}{\sum_{k=1}^{n-1} x_k^2}$.
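For this model the Gibbs update of $\theta$ is a draw from a normal density restricted to (-1, 1), which can be done by rejection. A sketch, assuming $\sigma_v = 1$ and a true value $\theta = 0.5$ for the simulated trajectory (both illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

sigma_v = 1.0                         # illustrative noise scale (assumption)
theta_true = 0.5

# simulate a state trajectory x_{0:n-1} from the AR(1) model
n = 500
x = np.zeros(n)
for k in range(1, n):
    x[k] = theta_true * x[k - 1] + sigma_v * rng.normal()

# conjugate quantities: p(theta | x) ∝ N(theta; m_n, s_n^2) 1_(-1,1)(theta)
den = np.sum(x[:-1] ** 2)
m_n = np.sum(x[:-1] * x[1:]) / den
s_n = sigma_v / np.sqrt(den)

def sample_theta(size):
    """Rejection sampler for the truncated-normal Gibbs update of theta."""
    out = np.empty(0)
    while out.size < size:
        cand = rng.normal(m_n, s_n, size=size)
        out = np.concatenate([out, cand[(cand > -1.0) & (cand < 1.0)]])
    return out[:size]

draws = sample_theta(1000)
```

With a long trajectory the conditional posterior is tightly concentrated near the true coefficient, so the rejection step almost never fires.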
SMC with MCMC for Parameter Estimation

We use SMC with a Gibbs step. The degeneracy problem remains.

[Figure: SMC estimate of E[θ | y_{1:n}] as n increases (from A. Doucet's lecture notes); the estimate drifts away from the true value, and the parameter converges to the wrong value.]

The problem with this approach is that, although we move θ according to $p(\theta \,|\, \mathbf{x}_{1:n}, \mathbf{y}_{1:n})$, we are still relying implicitly on the SMC approximation of the joint distribution $p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})$.

As n increases, the SMC approximation of $p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})$ deteriorates and the algorithm cannot converge towards the right solution.

[Figure: the sufficient statistic $\frac{1}{n}\sum_{k=1}^n E[X_k^2 \,|\, \mathbf{y}_{1:n}]$ computed exactly through the Kalman filter (blue) vs the SMC approximation (red), for a fixed value of θ.]
Recursive MLE Parameter Estimation

The log-likelihood can be written as

$\ell_n(\theta) = \log p_\theta(\mathbf{y}_{1:n}) = \sum_{k=1}^n \log p_\theta(y_k \,|\, \mathbf{y}_{1:k-1})$.

Here we compute

$p_\theta(y_k \,|\, \mathbf{y}_{1:k-1}) = \int g_\theta(y_k \,|\, x_k)\, p_\theta(x_k \,|\, \mathbf{y}_{1:k-1})\, dx_k$.

Under regularity assumptions, $\{X_n, Y_n, p_\theta(x_n \,|\, \mathbf{y}_{1:n-1})\}$ is an inhomogeneous Markov chain which converges towards its invariant distribution $\mu_\theta$, and the limit

$\ell(\theta) = \lim_{n \to \infty} \dfrac{\ell_n(\theta)}{n}$

exists, given by an average of $\log g_\theta(y \,|\, x)$ with respect to this invariant distribution.

Robbins-Monro Algorithm for MLE

We can maximize $\ell(\theta)$ by using the gradient and the Robbins-Monro algorithm:

$\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta_{n-1}}(y_n \,|\, \mathbf{y}_{1:n-1})$,

where

$\nabla p_\theta(y_n \,|\, \mathbf{y}_{1:n-1}) = \int \nabla g_\theta(y_n \,|\, x_n)\, p_\theta(x_n \,|\, \mathbf{y}_{1:n-1})\, dx_n + \int g_\theta(y_n \,|\, x_n)\, \nabla p_\theta(x_n \,|\, \mathbf{y}_{1:n-1})\, dx_n$.

We thus need to approximate $\nabla p_\theta(x_n \,|\, \mathbf{y}_{1:n-1})$.
Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity
We thus can use a SMC approximation of the form
This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of
30
1 1 11 1 1 1 1 1 1
1 1 1
1 1 1 1 1 1 1 1
( )( ) ( )( )
log ( ) ( )
n nn n n n n
n n
n n n n n
pp Y p dp
p p d
θθ θ
θ
θ θ
minusminus minus minus
minus
minus minus minus
nablanabla =
= nabla
int
int
x |Y|Y x |Y xx |Y
x |Y x |Y x
( )( )
1 1 11
( ) ( )1 1 1
1( ) ( )
log ( )
in
Ni
n n n nXi
i in n n
p xN
p
θ
θ
α δ
α
minus=
minus
nabla =
= nabla
sum|Y x
X |Y
1 1 1( )n npθ minusx |Y
Degeneracy of the SMC Algorithm

[Figure: the score $\nabla p_\theta(\mathbf{y}_{1:n})$ for linear Gaussian models, exact (blue) vs SMC approximation (red).]

[Figure: the signed measure $\nabla p_\theta(x_n \,|\, \mathbf{y}_{1:n})$ for linear Gaussian models, exact (red) vs SMC (blue/green); positive and negative particles are mixed.]
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:

$\nabla p_\theta(x_n \,|\, \mathbf{y}_{1:n}) = p_\theta(x_n \,|\, \mathbf{y}_{1:n})\, \nabla \log p_\theta(x_n \,|\, \mathbf{y}_{1:n})$.

It suggests using an SMC approximation of the form

$\widehat{\nabla p}_\theta(x_n \,|\, \mathbf{y}_{1:n}) = \dfrac{1}{N} \sum_{i=1}^N \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n)$, with $\beta_n^{(i)} = \nabla \log \hat p_\theta(X_n^{(i)} \,|\, \mathbf{y}_{1:n}) = \dfrac{\nabla \hat p_\theta(X_n^{(i)} \,|\, \mathbf{y}_{1:n})}{\hat p_\theta(X_n^{(i)} \,|\, \mathbf{y}_{1:n})}$.

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is $O(N^2)$, as

$\hat p_\theta(X_n^{(i)} \,|\, \mathbf{y}_{1:n-1}) \propto \int f_\theta(X_n^{(i)} \,|\, x_{n-1})\, \hat p_\theta(x_{n-1} \,|\, \mathbf{y}_{1:n-1})\, dx_{n-1} = \dfrac{1}{N} \sum_{j=1}^N f_\theta(X_n^{(i)} \,|\, X_{n-1}^{(j)})$.

The optimal filter satisfies

$p_\theta(x_n \,|\, \mathbf{y}_{1:n}) = \dfrac{\xi_\theta(x_n)}{\int \xi_\theta(x_n)\, dx_n}$, where $\xi_\theta(x_n) = g_\theta(y_n \,|\, x_n) \int f_\theta(x_n \,|\, x_{n-1})\, p_\theta(x_{n-1} \,|\, \mathbf{y}_{1:n-1})\, dx_{n-1}$.

The derivatives satisfy

$\nabla p_\theta(x_n \,|\, \mathbf{y}_{1:n}) = \dfrac{\nabla \xi_\theta(x_n)}{\int \xi_\theta(x_n)\, dx_n} - p_\theta(x_n \,|\, \mathbf{y}_{1:n})\, \dfrac{\int \nabla \xi_\theta(x_n)\, dx_n}{\int \xi_\theta(x_n)\, dx_n}$.

This way we obtain a simple recursion for $\nabla p_\theta(x_n \,|\, \mathbf{y}_{1:n})$ as a function of $p_\theta(x_{n-1} \,|\, \mathbf{y}_{1:n-1})$ and $\nabla p_\theta(x_{n-1} \,|\, \mathbf{y}_{1:n-1})$.
SMC Approximation of the Sensitivity

Sample from a mixture proposal built on the previous particles,

$X_n^{(i)} \sim \sum_{j=1}^N \eta_n^{(j)}\, q_\theta(\cdot \,|\, y_n, X_{n-1}^{(j)})$, with $\eta_n^{(j)} \propto W_{n-1}^{(j)}\, \hat p_\theta(y_n \,|\, X_{n-1}^{(j)})$.

Compute

$\alpha_n^{(i)} = \dfrac{\hat\xi_\theta(X_n^{(i)})}{q_\theta(X_n^{(i)} \,|\, y_n)}$, $\quad \rho_n^{(i)} = \dfrac{\nabla \hat\xi_\theta(X_n^{(i)})}{q_\theta(X_n^{(i)} \,|\, y_n)}$,

$W_n^{(i)} = \dfrac{\alpha_n^{(i)}}{\sum_{j=1}^N \alpha_n^{(j)}}$, $\quad \beta_n^{(i)} = \dfrac{\rho_n^{(i)}}{\sum_{j=1}^N \alpha_n^{(j)}} - W_n^{(i)}\, \dfrac{\sum_{j=1}^N \rho_n^{(j)}}{\sum_{j=1}^N \alpha_n^{(j)}}$.

We have

$\hat p_\theta(x_n \,|\, \mathbf{y}_{1:n}) = \sum_{i=1}^N W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n)$, $\quad \widehat{\nabla p}_\theta(x_n \,|\, \mathbf{y}_{1:n}) = \sum_{i=1}^N \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n)$.
Example: Linear Gaussian Model

Consider the following model:

$X_n = \phi X_{n-1} + \sigma_v V_n$, $\quad Y_n = X_n + \sigma_w W_n$, with $\theta = (\phi, \sigma_v, \sigma_w)$.

We compare next the SMC estimates of $\nabla \log p_\theta(x_n \,|\, \mathbf{y}_{1:n})$ and $\nabla p_\theta(\mathbf{y}_{1:n})$ for N = 1000 to their exact values given by the Kalman filter derivative (exact in cyan vs SMC in black).

[Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).]
Example: Stochastic Volatility Model

Consider the following model:

$X_n = \phi X_{n-1} + \sigma_v V_n$, $\quad Y_n = \beta \exp(X_n / 2)\, W_n$, with $\theta = (\phi, \sigma_v, \beta)$.

We use SMC with N = 1000 for batch and on-line estimation.

For batch estimation: $\theta_k = \theta_{k-1} + \gamma_k\, \nabla \log p_{\theta_{k-1}}(\mathbf{y}_{1:n})$.

For online estimation: $\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta_{n-1}}(y_n \,|\, \mathbf{y}_{1:n-1})$.

[Figure: batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.]

[Figure: recursive parameter estimation for the SV model; the estimates converge towards the true values.]
SMC with MCMC for Parameter Estimation

Given data $\mathbf{y}_{1:n}$, inference relies on

$p(\theta, \mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}) = p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}, \theta)\, p(\theta \,|\, \mathbf{y}_{1:n})$, where $p(\theta \,|\, \mathbf{y}_{1:n}) \propto p_\theta(\mathbf{y}_{1:n})\, p(\theta)$.

We have seen that SMC is rather inefficient at sampling from $p(\theta \,|\, \mathbf{y}_{1:n})$, so we look here at an MCMC approach.

For a given parameter value $\theta$, SMC can estimate both $p_\theta(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})$ and $p_\theta(\mathbf{y}_{1:n})$.

It would be useful if we could use SMC within MCMC to sample from $p(\theta, \mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})$.

C. Andrieu, R. Holenstein and G.O. Roberts, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010), 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from $p(\theta \,|\, \mathbf{y}_{1:n}, \mathbf{x}_{1:n})$ and $p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}, \theta)$.

However, it is impossible to sample from $p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}, \theta)$ directly.

We can sample from $p(x_k \,|\, \mathbf{y}_{1:n}, \theta, \mathbf{x}_{1:k-1}, \mathbf{x}_{k+1:n})$ instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}, \theta)$, i.e. sample $\mathbf{X}_{1:n}^* \sim q(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})$ and accept with probability

$\min\left(1,\; \dfrac{p(\mathbf{X}_{1:n}^* \,|\, \mathbf{y}_{1:n}, \theta)\, q(\mathbf{X}_{1:n} \,|\, \mathbf{y}_{1:n})}{p(\mathbf{X}_{1:n} \,|\, \mathbf{y}_{1:n}, \theta)\, q(\mathbf{X}_{1:n}^* \,|\, \mathbf{y}_{1:n})}\right)$.

We will use the output of an SMC method approximating $p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}, \theta)$ as the proposal distribution.

We know that the SMC approximation degrades as n increases, but we also have, under mixing assumptions,

$\left\| \mathrm{Law}\big(\mathbf{X}_{1:n}^{(i)}\big) - p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}, \theta) \right\|_{TV} \le \dfrac{C\, n}{N}$.

Thus things degrade linearly rather than exponentially. These results are useful for small C.
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation; there are, however, significant differences.

CBMC looks like an SMC method, but at each step only one particle is selected, which gives N offspring. Thus if you have made the wrong selection, you cannot recover later on.

MCMC

We use $\hat p_N(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}, \theta)$ as the proposal distribution: sample $\mathbf{X}_{1:n}^* \sim \hat p_N(\cdot \,|\, \mathbf{y}_{1:n}, \theta)$ and store the associated estimate $\hat p_N^*(\mathbf{y}_{1:n} \,|\, \theta)$.

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N-1)n variables.

The solution, inspired by CBMC, is as follows: for the current state $\mathbf{X}_{1:n}$ of the Markov chain, run a virtual SMC method with only N-1 free particles. Ensure that at each time step k the current state $X_k$ is deterministically selected, and compute the associated estimate $\hat p_N(\mathbf{y}_{1:n} \,|\, \theta)$.

Finally, accept $\mathbf{X}_{1:n}^*$ with probability

$\min\left(1,\; \dfrac{\hat p_N^*(\mathbf{y}_{1:n} \,|\, \theta)}{\hat p_N(\mathbf{y}_{1:n} \,|\, \theta)}\right)$;

otherwise stay where you are.

D. Frenkel, G.C.A.M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991), 3053-3076.
Example

Consider the following model:

$X_k = \dfrac{X_{k-1}}{2} + 25\, \dfrac{X_{k-1}}{1 + X_{k-1}^2} + 8 \cos(1.2 k) + V_k$, $\qquad Y_k = \dfrac{X_k^2}{20} + W_k$,

with Gaussian noises $V_k$, $W_k$ and a Gaussian initial condition.

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.

We now consider the case where both variances $\sigma_v^2, \sigma_w^2$ are unknown and given inverse Gamma (conditionally conjugate) vague priors.

We sample from $p(\theta, \mathbf{x}_{1:1000} \,|\, \mathbf{y}_{1:1000})$ using two strategies:
The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution (average acceptance rate 0.43).
An algorithm updating the components one at a time using an MH step of invariant distribution $p(x_k \,|\, x_{k-1}, x_{k+1}, y_k)$, with the same proposal.

In both cases we update the parameters according to $p(\theta \,|\, \mathbf{x}_{1:500}, \mathbf{y}_{1:500})$. Both algorithms are run with the same computational effort.

MH one-at-a-time missed the mode, and the estimates are

$E[\sigma_v^2 \,|\, \mathbf{y}_{1:500}] \approx 13.7$ (MH one-at-a-time) vs $E[\sigma_v^2 \,|\, \mathbf{y}_{1:500}] \approx 9.9$ (PF-MCMC), the true value being $\sigma_v^2 = 10$.

If $\mathbf{X}_{1:n}$ and $\theta$ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings.
Review of Metropolis-Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density $q(x, y) = q(y \,|\, x)$, and the target distribution is $\pi(x)$.

Algorithm (Generic Metropolis-Hastings Sampler):
Initialization: choose an arbitrary starting value $x^{0}$.
Iteration $t$ ($t \ge 1$):
1. Given $x^{t-1}$, generate $\tilde x \sim q(x^{t-1}, \cdot)$.
2. Compute
$\rho(x^{t-1}, \tilde x) = \min\left(1,\; \dfrac{\pi(\tilde x)\, q(\tilde x, x^{t-1})}{\pi(x^{t-1})\, q(x^{t-1}, \tilde x)}\right)$.
3. With probability $\rho(x^{t-1}, \tilde x)$, accept $\tilde x$ and set $x^{t} = \tilde x$; otherwise reject and set $x^{t} = x^{t-1}$.

It can be easily shown that $\pi$ is invariant for the transition kernel K, i.e. $\pi(x') = \int \pi(x)\, K(x, x')\, dx$, and that $X^{(i)} \sim \pi(x)$ as $i \to \infty$ under weak assumptions.
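The generic sampler above, specialized to a symmetric random-walk proposal so the q-ratio cancels, fits in a few lines; the target and tuning constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def log_target(x):                   # unnormalized standard normal target
    return -0.5 * x**2

def metropolis_hastings(n_iter=20000, step=1.0, x0=0.0):
    x = x0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = x + step * rng.normal()              # symmetric RW proposal q(x, .)
        log_rho = log_target(prop) - log_target(x)  # q-ratio cancels by symmetry
        if np.log(rng.uniform()) < log_rho:
            x = prop                                # accept
        chain[t] = x                                # on rejection, stay put
    return chain

chain = metropolis_hastings()
```

The empirical moments of the chain approach those of N(0, 1), as the invariance argument on the slide predicts.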
Marginal Metropolis-Hastings Algorithm

Consider the target

$p(\theta, \mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}) = p(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n}, \theta)\, p(\theta \,|\, \mathbf{y}_{1:n})$.

We use the following proposal distribution:

$q\big((\theta^*, \mathbf{x}_{1:n}^*) \,|\, (\theta, \mathbf{x}_{1:n})\big) = q(\theta^* \,|\, \theta)\, p(\mathbf{x}_{1:n}^* \,|\, \mathbf{y}_{1:n}, \theta^*)$.

Then the acceptance probability becomes

$\min\left(1,\; \dfrac{p(\theta^*, \mathbf{x}_{1:n}^* \,|\, \mathbf{y}_{1:n})\, q\big((\theta, \mathbf{x}_{1:n}) \,|\, (\theta^*, \mathbf{x}_{1:n}^*)\big)}{p(\theta, \mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})\, q\big((\theta^*, \mathbf{x}_{1:n}^*) \,|\, (\theta, \mathbf{x}_{1:n})\big)}\right) = \min\left(1,\; \dfrac{p_{\theta^*}(\mathbf{y}_{1:n})\, p(\theta^*)\, q(\theta \,|\, \theta^*)}{p_\theta(\mathbf{y}_{1:n})\, p(\theta)\, q(\theta^* \,|\, \theta)}\right)$.

We will use SMC approximations to compute $p_\theta(\mathbf{y}_{1:n})$ and $p_\theta(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})$.
Marginal Metropolis-Hastings Algorithm

Step 1: Given $\theta(i-1)$, $\mathbf{X}_{1:n}(i-1)$ and $\hat p_{\theta(i-1)}(\mathbf{y}_{1:n})$, sample $\theta^* \sim q(\cdot \,|\, \theta(i-1))$.

Step 2: Run an SMC algorithm at $\theta^*$ to obtain $\hat p_{\theta^*}(\mathbf{y}_{1:n})$ and $\hat p_{\theta^*}(\mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})$, and sample $\mathbf{X}_{1:n}^* \sim \hat p_{\theta^*}(\cdot \,|\, \mathbf{y}_{1:n})$.

Step 3: With probability

$\min\left(1,\; \dfrac{\hat p_{\theta^*}(\mathbf{y}_{1:n})\, p(\theta^*)\, q(\theta(i-1) \,|\, \theta^*)}{\hat p_{\theta(i-1)}(\mathbf{y}_{1:n})\, p(\theta(i-1))\, q(\theta^* \,|\, \theta(i-1))}\right)$

set $\theta(i) = \theta^*$, $\mathbf{X}_{1:n}(i) = \mathbf{X}_{1:n}^*$ and $\hat p_{\theta(i)}(\mathbf{y}_{1:n}) = \hat p_{\theta^*}(\mathbf{y}_{1:n})$; otherwise set $\theta(i) = \theta(i-1)$, $\mathbf{X}_{1:n}(i) = \mathbf{X}_{1:n}(i-1)$ and $\hat p_{\theta(i)}(\mathbf{y}_{1:n}) = \hat p_{\theta(i-1)}(\mathbf{y}_{1:n})$.
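Steps 1-3 can be sketched as a particle marginal MH run on a toy linear Gaussian model, with a bootstrap particle filter supplying the likelihood estimate $\hat p_\theta(\mathbf{y}_{1:n})$. Everything below (the model, the U(-1, 1) prior on φ, the random-walk proposal, the particle and iteration counts) is an illustrative assumption, and for brevity $\mathbf{X}_{1:n}$ is not stored:

```python
import numpy as np

rng = np.random.default_rng(5)

# synthetic data from X_k = phi X_{k-1} + V_k, Y_k = X_k + W_k (unit variances; assumption)
phi_true, T = 0.8, 50
x = 0.0
y = np.empty(T)
for k in range(T):
    x = phi_true * x + rng.normal()
    y[k] = x + rng.normal()

def pf_loglik(phi, N=100):
    """Bootstrap particle filter estimate of log p_phi(y_{1:T})."""
    xp = rng.normal(size=N)
    ll = 0.0
    for k in range(T):
        xp = phi * xp + rng.normal(size=N)                  # propagate
        logw = -0.5 * (y[k] - xp) ** 2                      # N(y; x, 1) up to a constant
        ll += np.log(np.mean(np.exp(logw))) - 0.5 * np.log(2 * np.pi)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        xp = rng.choice(xp, size=N, p=w)                    # multinomial resampling
    return ll

# PMMH over theta = phi: the exact likelihood in the MH ratio is replaced by the PF estimate
phi, ll = 0.5, pf_loglik(0.5)
chain = []
for it in range(200):
    prop = phi + 0.1 * rng.normal()                         # symmetric RW proposal
    if -1.0 < prop < 1.0:                                   # uniform prior support
        ll_prop = pf_loglik(prop)
        if np.log(rng.uniform()) < ll_prop - ll:            # prior and q-ratio cancel
            phi, ll = prop, ll_prop
    chain.append(phi)
chain = np.array(chain)
```

Note the standard PMMH bookkeeping: the stored likelihood estimate for the current state is reused in every ratio and only refreshed on acceptance, which is what makes the chain exact for any N ≥ 1.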
Marginal Metropolis-Hastings Algorithm

This algorithm (without sampling $\mathbf{X}_{1:n}$) was proposed as an approximate MCMC algorithm to sample from $p(\theta \,|\, \mathbf{y}_{1:n})$ in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

For any N ≥ 1 the algorithm admits exactly $p(\theta, \mathbf{x}_{1:n} \,|\, \mathbf{y}_{1:n})$ as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n.

It is useful when $X_n$ is moderate-dimensional and $\theta$ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

J. Fernandez-Villaverde and J.F. Rubio-Ramirez, 2007, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pp. 549-553.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B, Volume 72, Issue 3, pp. 269-342, June 2010.
E.L. Ionides, C. Breto and A.A. King (2006), Inference for nonlinear dynamical systems, Proceedings of the Nat. Acad. of Sciences, 103, 18438-18443.
OPEN PROBLEMS
Open Problems

Thus SMC can be successfully used within MCMC.

The major advantage of SMC is that it automatically builds efficient very high-dimensional proposal distributions based only on low-dimensional proposals.

This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
Developing efficient SMC methods when E = R^{10000}.
Developing a proper SMC method for recursive Bayesian parameter estimation.
Developing methods that avoid the O(N²) cost for smoothing and sensitivity estimation.
Developing methods for solving optimal control problems.
Establishing weaker assumptions ensuring uniform convergence results.
Other Topics

In standard SMC we only sample $X_n$ at time n, and we don't allow ourselves to modify the past.

It is possible to modify the algorithm to take this into account.

• A. Doucet, M. Briers and S. Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, pp. 1-19.
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
References C Andrieu A Doucet and VB Tadic Online EM for parameter estimation in nonlinear-non
Gaussian state-space models Proc IEEE CDC 2005
G Poyadjis A Doucet and SS Singh Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation Biometrika 2011
C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for
Genome Wide Transcript Analysis Technical report 2008
R Martinez-Cantin J Castellanos and N de Freitas Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm Marginal-SLAM International Conference on Robotics and Automation
C Andrieu AD amp R Holenstein Particle Markov chain Monte Carlo methods (with discussion) JRSS B 2010
A Doucet Sequential Monte Carlo Methods and Particle Filters List of Papers Codes and Viedo lectures on SMC and particle filters
Pierre Del Moral Feynman-Kac models and interacting particle systems
6
Bayesian Scientific Computing Spring 2013 (N Zabaras)
References P Del Moral A Doucet and A Jasra Sequential Monte Carlo samplers JRSSB 2006
P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian
Statistics 2006
P Del Moral A Doucet amp SS Singh Forward Smoothing using Sequential Monte Carlo technical report Cambridge University 2009
A Doucet Short Courses Lecture Notes (A B C) P Del Moral Feynman-Kac Formulae Springer-Verlag 2004
Sequential MC Methods M Davy 2007 A Doucet A Johansen Particle Filtering and Smoothing Fifteen years later in Handbook of
Nonlinear Filtering (edts D Crisan and B Rozovsky) Oxford Univ Press 2011
A Johansen and A Doucet A Note on Auxiliary Particle Filters Stat Proba Letters 2008
A Doucet et al Efficient Block Sampling Strategies for Sequential Monte Carlo (with M Briers amp S Senecal) JCGS 2006
C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis Stat Comput 2011
7
Bayesian Scientific Computing Spring 2013 (N Zabaras) 8
SEQUENTIAL MONTE CARLO METHODS
FOR STATIC PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions πn nge 1 are defined on X
This case can appear often for example
Sequential Bayesian Estimation
Classical Bayesian Inference
Global Optimization
It is not clear how SMC can be related to this problem since Xn=X instead of Xn=Xn
1( ) ( | )n nx p xπ = y( ) ( )n x xπ π=
[ ]( ) ( ) n
n nx x γπ π γprop rarr infin
9
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions πn nge 1 are defined on X
This case can appear often for example
Sequential Bayesian Estimation
Global Optimization
Sampling from a fixed distribution where micro(x) is an easy to sample from distribution Use a sequence
1( ) ( | )n nx p xπ = y
[ ]( ) ( ) n
n nx x ηπ π ηprop rarr infin
10
( ) ( )1( ) ( ) ( )n n
n x x xη ηπ micro π minusprop
1 2 11 0 ( ) ( ) ( ) ( )Final Finali e x x x xη η η π micro π π= gt gt gt = prop prop
What about if all distributions πn, n ≥ 1, are defined on X?
This case appears often, for example:
 Rare event simulation: to compute π(A) for a rare set A, simulate with the sequence of nested sets
  π_n(x) ∝ 1_{E_n}(x) π(x),  E_1 ⊃ E_2 ⊃ ... ⊃ E_Final = A,
 where the normalizing factor Z_1 is known. The required probability is then Z_Final = π(A).
 The Boolean satisfiability problem
 Computing matrix permanents
 Computing volumes in high dimensions
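The nested-sets construction above can be sketched on a toy Gaussian tail probability. This is a minimal multilevel-splitting sketch, not the slides' algorithm verbatim; the thresholds, particle count, and random-walk step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rare_event_smc(levels, n_particles=2000, n_mcmc=20, step=1.0):
    """Estimate pi(A) for pi = N(0,1) and A = {x > levels[-1]} via nested sets E_k = {x > levels[k]}."""
    x = rng.standard_normal(n_particles)            # exact draws from pi (no constraint yet)
    log_p = 0.0                                     # running log-estimate of pi(A)
    for a in levels:
        alive = x > a
        log_p += np.log(alive.mean())               # estimates pi(E_k) / pi(E_{k-1})
        x = rng.choice(x[alive], size=n_particles)  # resample particles inside E_k
        for _ in range(n_mcmc):                     # MH moves invariant w.r.t. N(0,1) restricted to E_k
            prop = x + step * rng.standard_normal(n_particles)
            accept = (prop > a) & (np.log(rng.random(n_particles)) < 0.5 * (x**2 - prop**2))
            x = np.where(accept, prop, x)
    return np.exp(log_p)

p_hat = rare_event_smc(levels=[1.0, 2.0, 3.0, 4.0])  # estimates P(X > 4), true value ≈ 3.17e-5
```

Each level contributes a conditional probability that is far from rare, which is exactly why the product estimator works where direct Monte Carlo would need billions of samples.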
What about if all distributions πn, n ≥ 1, are defined on X?
Apply SMC to an augmented sequence of distributions π̃_n on X^n. For example,
 π̃_n(x_{1:n}) = π_n(x_n) π_n(x_{1:n-1} | x_n),
where any conditional distribution on X^{n-1} can be used, so that
 ∫ π̃_n(x_{1:n-1}, x_n) dx_{1:n-1} = π_n(x_n).
• C. Jarzynski, Nonequilibrium equality for free energy differences, Phys. Rev. Lett. 78, 2690–2693 (1997)
• G. E. Crooks, Nonequilibrium measurements of free energy differences for microscopically reversible Markovian systems, J. Stat. Phys. 90(5-6), 1481–1487 (1998)
• W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001
• R. M. Neal, Annealed importance sampling, Statistics and Computing 11, 125–139 (2001)
• N. Chopin, A sequential particle filter method for static problems, Biometrika 89(3), 539–551 (2002)
• P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006
SMC For Static Models
Let {π_n(x)}_{n ≥ 1} be a sequence of probability distributions defined on X s.t. each π_n(x) is known up to a normalizing constant:
 π_n(x) = Z_n^{-1} γ_n(x),  with γ_n known and Z_n unknown.
We are interested in sampling from π_n(x) and computing Z_n sequentially.
This is not the same as the standard SMC discussed earlier, which was defined on X^n with π_n(x_{1:n}) = p(x_{1:n} | y_{1:n}).
SMC For Static Models
We will construct an artificial distribution that is the product of the target distribution from which we want to sample and a backward kernel L as follows:
 π̃_n(x_{1:n}) = γ̃_n(x_{1:n}) / Z_n,  where  γ̃_n(x_{1:n}) = γ_n(x_n) ∏_{k=1}^{n-1} L_k(x_k | x_{k+1})
(the L_k are the backward transitions), such that
 ∫ π̃_n(x_{1:n}) dx_{1:n-1} = π_n(x_n).
The importance weights now become
 W_n ∝ γ̃_n(x_{1:n}) / [γ̃_{n-1}(x_{1:n-1}) q_n(x_n | x_{n-1})]
  = W_{n-1} γ_n(x_n) L_{n-1}(x_{n-1} | x_n) / [γ_{n-1}(x_{n-1}) q_n(x_n | x_{n-1})].
SMC For Static Models
Any MCMC kernel can be used for the proposal q(·|·).
The weights are
 W_n ∝ W_{n-1} γ_n(x_n) L_{n-1}(x_{n-1} | x_n) / [γ_{n-1}(x_{n-1}) q_n(x_n | x_{n-1})].
Since our interest is in computing only
 π_n(x_n) = ∫ π̃_n(x_{1:n}) dx_{1:n-1} = Z_n^{-1} γ_n(x_n),
there is no degeneracy problem.
P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006
Algorithm: SMC For Static Problems
(1) Initialize at time n = 1.
(2) At time n ≥ 2:
 Sample X_n^(i) ~ q_n(x_n | X_{n-1}^(i)) and augment X_{1:n}^(i) = (X_{1:n-1}^(i), X_n^(i)).
 Compute the importance weights
  W_n^(i) ∝ W_{n-1}^(i) γ_n(X_n^(i)) L_{n-1}(X_{n-1}^(i) | X_n^(i)) / [γ_{n-1}(X_{n-1}^(i)) q_n(X_n^(i) | X_{n-1}^(i))].
 Then the weighted approximation is
  π̂_n(x) = Σ_{i=1}^N W_n^(i) δ_{X_n^(i)}(x).
 We finally resample X̄_n^(i) ~ π̂_n(x) to obtain
  π̄_n(x) = N^{-1} Σ_{i=1}^N δ_{X̄_n^(i)}(x).
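The final resampling step above can be implemented in several ways. A common low-variance choice is systematic resampling; the slides do not specify a scheme, so this is an illustrative sketch.

```python
import numpy as np

def systematic_resample(weights, rng):
    """Return N indices such that index j appears roughly N * weights[j] times."""
    N = len(weights)
    # one uniform draw, stratified across N equally spaced positions
    positions = (rng.random() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(weights), positions)

rng = np.random.default_rng(1)
w = np.array([0.5, 0.3, 0.2])
idx = systematic_resample(w / w.sum(), rng)   # e.g. offspring counts close to N*w
```

Systematic resampling guarantees each particle's offspring count is within one of N·W^(i), which reduces resampling noise compared with multinomial resampling.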
SMC For Static Models: Choice of L
A default choice is first using a πn-invariant MCMC kernel qn and then the corresponding reversed kernel
 L_{n-1}(x_{n-1} | x_n) = π_n(x_{n-1}) q_n(x_n | x_{n-1}) / π_n(x_n).
Using this easy choice we can simplify the expression for the weights:
 W_n ∝ W_{n-1} γ_n(x_n) L_{n-1}(x_{n-1} | x_n) / [γ_{n-1}(x_{n-1}) q_n(x_n | x_{n-1})]
  = W_{n-1} π_n(x_{n-1}) γ_n(x_n) / [γ_{n-1}(x_{n-1}) π_n(x_n)]
 ⇒ W_n = W_{n-1} γ_n(X_{n-1}) / γ_{n-1}(X_{n-1}).
This is known as annealed importance sampling. This particular choice has been used in physics and statistics.
W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001
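The simplified weight update W_n = W_{n-1} γ_n(X_{n-1}) / γ_{n-1}(X_{n-1}) can be sketched in a few lines of annealed importance sampling. The target (a Gaussian bridged geometrically from N(0,1)), the temperature schedule, and the step size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_mu(x):    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)   # N(0,1), normalized (Z_1 = 1)
def log_gamma(x): return -0.5 * ((x - 3.0) / 0.5) ** 2           # unnormalized N(3, 0.5^2)

def ais(n_particles=2000, n_steps=100, step=0.5):
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.standard_normal(n_particles)          # exact draws from pi_1 = mu
    log_w = np.zeros(n_particles)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # weight update: log gamma_n(x) - log gamma_{n-1}(x) at the *current* particles
        log_w += (b - b_prev) * (log_gamma(x) - log_mu(x))
        # pi_n-invariant random-walk Metropolis move
        target = lambda y: (1 - b) * log_mu(y) + b * log_gamma(y)
        prop = x + step * rng.standard_normal(n_particles)
        accept = np.log(rng.random(n_particles)) < target(prop) - target(x)
        x = np.where(accept, prop, x)
    return np.exp(log_w).mean()                   # unbiased estimate of Z = ∫ gamma(x) dx

Z_hat = ais()   # true Z = sqrt(2*pi) * 0.5 ≈ 1.2533
```

Note the order of operations: the weight is updated at x_{n-1} before the MCMC move, exactly as in the simplified formula above.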
ONLINE PARAMETER ESTIMATION
Online Bayesian Parameter Estimation
Assume that our state model is defined with some unknown static parameter θ with some prior p(θ):
 X_1 ~ μ_θ(·) and X_n | (X_{n-1} = x_{n-1}) ~ f_θ(· | x_{n-1}),
 Y_n | (X_n = x_n) ~ g_θ(· | x_n).
Given data y_{1:n}, inference now is based on
 p(θ, x_{1:n} | y_{1:n}) = p_θ(x_{1:n} | y_{1:n}) p(θ | y_{1:n}),  where  p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ).
We can use standard SMC, but on the extended space Z_n = (X_n, θ_n):
 f̃((x_n, θ_n) | (x_{n-1}, θ_{n-1})) = f_{θ_{n-1}}(x_n | x_{n-1}) δ_{θ_{n-1}}(θ_n),  g̃(y_n | (x_n, θ_n)) = g_{θ_n}(y_n | x_n).
Note that θ is a static parameter: it does not evolve with n.
Online Bayesian Parameter Estimation
For fixed θ, using our earlier error estimates,
 Var[log p̂_θ(y_{1:n})] ≤ C n / N.
In a Bayesian context the problem is even more severe, as
 p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ).
The exponential stability assumption cannot hold, since θ_n = θ_1 for all n.
To mitigate this problem, introduce MCMC steps on θ.
C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999
P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002
W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001
Online Bayesian Parameter Estimation
When
 p(θ | y_{1:n}, x_{1:n}) = p(θ | s_n(x_{1:n}, y_{1:n}))
for a fixed-dimensional sufficient statistic s_n, this becomes an elegant algorithm. However, it still suffers from the degeneracy problem, since it relies on the SMC approximation of p(x_{1:n} | y_{1:n}).
As dim(Z_n) = dim(X_n) + dim(θ), such methods are not recommended for high-dimensional θ, especially with vague priors.
Artificial Dynamics for θ
A solution consists of perturbing the location of the particles in a way that does not modify their distributions, i.e. if at time n
 θ^(i) ~ p(θ | y_{1:n}),
then we would like a transition kernel M_n such that if
 θ'^(i) ~ M_n(θ' | θ^(i)),
then θ'^(i) ~ p(θ | y_{1:n}).
In Markov chain language, we want M_n to leave p(θ | y_{1:n}) invariant.
Using MCMC
There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.
We cannot use these algorithms directly, as p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ) would need to be known up to a normalizing constant, but p_θ(y_{1:n}) is unknown.
However, we can use a very simple Gibbs update. Indeed, note that if (θ^(i), X_{1:n}^(i)) ~ p(θ, x_{1:n} | y_{1:n}), then sampling θ'^(i) ~ p(θ | y_{1:n}, X_{1:n}^(i)) yields θ'^(i) ~ p(θ | y_{1:n}), because
 p(θ | y_{1:n}) = ∫ p(θ | y_{1:n}, x_{1:n}) p(x_{1:n} | y_{1:n}) dx_{1:n}.
SMC with MCMC for Parameter Estimation
Given an approximation at time n-1,
 p̂(θ, x_{1:n-1} | y_{1:n-1}) = N^{-1} Σ_{i=1}^N δ_{(θ^(i)_{n-1}, X^(i)_{1:n-1})}(θ, x_{1:n-1}),
sample X_n^(i) ~ f_{θ^(i)_{n-1}}(x_n | X^(i)_{n-1}), set X^(i)_{1:n} = (X^(i)_{1:n-1}, X^(i)_n), and then approximate
 p̂(θ, x_{1:n} | y_{1:n}) = Σ_{i=1}^N W_n^(i) δ_{(θ^(i)_{n-1}, X^(i)_{1:n})}(θ, x_{1:n}),  W_n^(i) ∝ g_{θ^(i)_{n-1}}(y_n | X^(i)_n).
Resample X̄^(i)_{1:n} ~ p̂(x_{1:n} | y_{1:n}), then sample θ^(i)_n ~ p(θ | y_{1:n}, X̄^(i)_{1:n}) to obtain
 p̄(θ, x_{1:n} | y_{1:n}) = N^{-1} Σ_{i=1}^N δ_{(θ^(i)_n, X̄^(i)_{1:n})}(θ, x_{1:n}).
SMC with MCMC for Parameter Estimation
Consider the following model:
 X_{n+1} = θ X_n + σ_v V_{n+1},  V_n ~ i.i.d. N(0, 1),
 Y_n = X_n + σ_w W_n,  W_n ~ i.i.d. N(0, 1),
 X_0 ~ N(0, 1).
We set the prior on θ as θ ~ U(-1, 1).
In this case
 p(θ | x_{1:n}, y_{1:n}) ∝ N(θ; m_n, σ_n²) 1_{(-1,1)}(θ),
 with σ_n² = σ_v² / Σ_{k=1}^{n-1} x_k²  and  m_n = Σ_{k=1}^{n-1} x_k x_{k+1} / Σ_{k=1}^{n-1} x_k².
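The Gibbs-within-SMC scheme for this AR(1) model can be sketched as below. The true parameter values, series length, and particle count are illustrative assumptions; the conjugate update uses the sufficient statistics Σ x_k² and Σ x_k x_{k+1} from the slide, and the known degeneracy of the approach is expected to bias the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate data (true values are illustrative assumptions)
T, theta_true, sv, sw = 200, 0.8, 1.0, 1.0
x = np.zeros(T)
for n in range(1, T):
    x[n] = theta_true * x[n - 1] + sv * rng.standard_normal()
y = x + sw * rng.standard_normal(T)

N = 1000
theta = rng.uniform(-1, 1, N)            # theta-particles from the U(-1,1) prior
X = rng.standard_normal(N)               # x-particles
s_xx = np.full(N, 1e-10)                 # sufficient statistic: sum_k x_k^2 (tiny init avoids 0/0)
s_xy = np.zeros(N)                       # sufficient statistic: sum_k x_k x_{k+1}

for n in range(1, T):
    Xp = theta * X + sv * rng.standard_normal(N)      # propagate with each particle's theta
    logw = -0.5 * ((y[n] - Xp) / sw) ** 2             # weight by g(y_n | x_n)
    w = np.exp(logw - logw.max()); w /= w.sum()
    idx = rng.choice(N, size=N, p=w)                  # multinomial resampling
    s_xx, s_xy = s_xx[idx] + X[idx] ** 2, s_xy[idx] + X[idx] * Xp[idx]
    X = Xp[idx]
    # Gibbs step: theta ~ N(m_n, sigma_n^2) truncated to (-1,1), drawn by capped rejection
    m, s = s_xy / s_xx, sv / np.sqrt(s_xx)
    theta = m + s * rng.standard_normal(N)
    for _ in range(100):
        bad = np.abs(theta) >= 1
        if not bad.any():
            break
        theta[bad] = m[bad] + s[bad] * rng.standard_normal(bad.sum())
    theta = np.clip(theta, -0.999, 0.999)             # pragmatic safeguard for stragglers

theta_hat = theta.mean()                 # SMC estimate of E[theta | y_{1:T}]
```

As the slides warn, the sufficient statistics are functions of the whole path x_{1:n}, so path degeneracy makes this estimate unreliable for large n.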
SMC with MCMC for Parameter Estimation
We use SMC with a Gibbs step; the degeneracy problem remains.
The SMC estimate of E[θ | y_{1:n}] is shown as n increases (from A. Doucet's lecture notes). The parameter converges to the wrong value.
[Figure: SMC/MCMC estimate vs. true value]
SMC with MCMC for Parameter Estimation
The problem with this approach is that, although we move θ according to p(θ | x_{1:n}, y_{1:n}), we are still relying implicitly on the SMC approximation of the joint distribution p(x_{1:n} | y_{1:n}).
As n increases, the SMC approximation of p(x_{1:n} | y_{1:n}) deteriorates and the algorithm cannot converge towards the right solution.
[Figure: the sufficient statistic n^{-1} Σ_{k=1}^n E[X_k² | y_{1:n}] computed exactly through the Kalman filter (blue) vs. the SMC approximation (red) for a fixed value of θ]
Recursive MLE Parameter Estimation
The log-likelihood can be written as
 ℓ_n(θ) = log p_θ(y_{1:n}) = Σ_{k=1}^n log p_θ(y_k | y_{1:k-1}).
Here we compute
 p_θ(y_k | y_{1:k-1}) = ∫ g_θ(y_k | x_k) p_θ(x_k | y_{1:k-1}) dx_k.
Under regularity assumptions, {X_n, Y_n, p_θ(x_n | Y_{1:n-1})} is an inhomogeneous Markov chain which converges towards its invariant distribution, so that
 lim_{n→∞} n^{-1} ℓ_n(θ) = ℓ(θ),
an average of the predictive log-density log ∫ g_θ(y | x) μ(dx) under that invariant distribution.
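The decomposition of ℓ_n(θ) into one-step predictive terms is exactly what a bootstrap particle filter estimates: each term log p_θ(y_k | y_{1:k-1}) is approximated by the log of the average unnormalized weight. A minimal sketch for the linear Gaussian model X_k = θ X_{k-1} + σ_v V_k, Y_k = X_k + σ_w W_k (particle count and initialization are illustrative assumptions):

```python
import numpy as np

def pf_loglik(y, theta, sv, sw, N=2000, seed=0):
    """Bootstrap PF estimate of l_n(theta) = sum_k log p_theta(y_k | y_{1:k-1})."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(N)                        # x_1 ~ N(0,1), assumed initial law
    ll = 0.0
    for yk in y:
        logw = -0.5 * ((yk - X) / sw) ** 2 - 0.5 * np.log(2 * np.pi * sw**2)
        ll += np.log(np.mean(np.exp(logw)))           # ≈ log p_theta(y_k | y_{1:k-1})
        w = np.exp(logw - logw.max()); w /= w.sum()
        X = X[rng.choice(N, size=N, p=w)]             # resample from the filter
        X = theta * X + sv * rng.standard_normal(N)   # propagate: p_theta(x_{k+1} | y_{1:k})
    return ll
```

Before weighting the k-th observation, the particles approximate the predictive p_θ(x_k | y_{1:k-1}), which is what makes each term of the sum correct.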
Robbins-Monro Algorithm for MLE
We can maximize ℓ(θ) by using the gradient and the Robbins-Monro algorithm:
 θ_{n+1} = θ_n + γ_{n+1} ∇_θ log p_θ(y_{n+1} | y_{1:n}) |_{θ = θ_n},
where
 ∇ p_θ(y_n | y_{1:n-1}) = ∫ [∇ g_θ(y_n | x_n)] p_θ(x_n | y_{1:n-1}) dx_n + ∫ g_θ(y_n | x_n) ∇ p_θ(x_n | y_{1:n-1}) dx_n.
We thus need to approximate ∇ p_θ(x_n | y_{1:n-1}).
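The Robbins-Monro recursion itself is easy to illustrate on a toy problem where the noisy gradient is available in closed form; the target below (maximizing E[log N(Y; θ, 1)] for Y ~ N(2,1), whose per-observation gradient is y_n - θ_n) is an illustrative assumption, not the filtering problem of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 0.0
for n in range(1, 20001):
    y_n = 2.0 + rng.standard_normal()   # stream of observations
    gamma = 1.0 / n                     # steps with sum = inf, sum of squares < inf
    theta += gamma * (y_n - theta)      # theta_{n+1} = theta_n + gamma_{n+1} * noisy gradient
# theta converges to the maximizer, here 2
```

The same recursion is used in the slides with the noisy gradient replaced by an SMC estimate of ∇ log p_θ(y_{n+1} | y_{1:n}).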
Importance Sampling Estimation of Sensitivity
The various proposed approximations are based on the identity
 ∇ p_θ(x_{1:n} | y_{1:n}) = p_θ(x_{1:n} | y_{1:n}) ∇ log p_θ(x_{1:n} | y_{1:n}).
We thus can use an SMC approximation of the form
 ∇p̂_θ(x_{1:n} | y_{1:n}) = N^{-1} Σ_{i=1}^N α_n^(i) δ_{X^(i)_{1:n}}(x_{1:n}),  α_n^(i) = ∇ log p_θ(X^(i)_{1:n} | y_{1:n}).
This is simple but inefficient, as it is based on IS on spaces of increasing dimension, and in addition it relies on being able to obtain a good approximation of p_θ(x_{1:n} | y_{1:n}).
Degeneracy of the SMC Algorithm
[Figure: score ∇ p_θ(y_{1:n}) for linear Gaussian models, exact (blue) vs. SMC approximation (red)]
Degeneracy of the SMC Algorithm
[Figure: signed measure ∇ p_θ(x_n | y_{1:n}) for linear Gaussian models, exact (red) vs. SMC (blue/green); positive and negative particles are mixed]
Marginal Importance Sampling for Sensitivity Estimation
An alternative identity is the following:
 ∇ p_θ(x_n | y_{1:n}) = p_θ(x_n | y_{1:n}) ∇ log p_θ(x_n | y_{1:n}).
It suggests using an SMC approximation of the form
 ∇p̂_θ(x_n | y_{1:n}) = N^{-1} Σ_{i=1}^N β_n^(i) δ_{X^(i)_n}(x_n),  β_n^(i) = ∇ log p_θ(X^(i)_n | y_{1:n}) = ∇ p_θ(X^(i)_n | y_{1:n}) / p_θ(X^(i)_n | y_{1:n}).
Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N²), as
 p_θ(X^(i)_n | y_{1:n-1}) ∝ ∫ f_θ(X^(i)_n | x_{n-1}) p_θ(x_{n-1} | y_{1:n-1}) dx_{n-1} ≈ N^{-1} Σ_{j=1}^N f_θ(X^(i)_n | X^(j)_{n-1}).
Marginal Importance Sampling for Sensitivity Estimation
The optimal filter satisfies
 p_θ(x_n | y_{1:n}) = ξ_n^θ(x_n) / ∫ ξ_n^θ(x_n) dx_n,  where
 ξ_n^θ(x_n) = g_θ(y_n | x_n) ∫ f_θ(x_n | x_{n-1}) p_θ(x_{n-1} | y_{1:n-1}) dx_{n-1}.
The derivatives satisfy
 ∇ p_θ(x_n | y_{1:n}) = ∇ ξ_n^θ(x_n) / ∫ ξ_n^θ(x_n) dx_n − p_θ(x_n | y_{1:n}) [∫ ∇ ξ_n^θ(x_n) dx_n / ∫ ξ_n^θ(x_n) dx_n].
This way we obtain a simple recursion for ∇ p_θ(x_n | y_{1:n}) as a function of p_θ(x_{n-1} | y_{1:n-1}) and ∇ p_θ(x_{n-1} | y_{1:n-1}).
SMC Approximation of the Sensitivity
Sample
 X_n^(i) ~ q(· | Y_n),  q(x_n | Y_n) = Σ_{j=1}^N η^(j) q(x_n | Y_n, X_{n-1}^(j)),  η^(j) ∝ W_{n-1}^(j) p_θ(Y_n | X_{n-1}^(j)).
Compute
 α_n^(i) = ξ_n^θ(X_n^(i)) / q(X_n^(i) | Y_n),  ρ_n^(i) = ∇ ξ_n^θ(X_n^(i)) / q(X_n^(i) | Y_n),
 W_n^(i) = α_n^(i) / Σ_j α_n^(j),  β_n^(i) = ρ_n^(i) / Σ_j α_n^(j) − W_n^(i) [Σ_j ρ_n^(j) / Σ_j α_n^(j)].
We have
 p̂_θ(x_n | y_{1:n}) = Σ_{i=1}^N W_n^(i) δ_{X_n^(i)}(x_n),
 ∇p̂_θ(x_n | y_{1:n}) = Σ_{i=1}^N β_n^(i) δ_{X_n^(i)}(x_n).
Example: Linear Gaussian Model
Consider the following model:
 X_n = φ X_{n-1} + σ_v V_n,
 Y_n = X_n + σ_w W_n,
with θ = (φ, σ_v, σ_w).
We compare next the SMC estimates of ∇ log p_θ(x_n | y_{1:n}) and of the score ∇ p_θ(y_{1:n}) for N = 1000 to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).
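The exact benchmark mentioned above comes from the Kalman filter, which gives p_θ(y_{1:n}) in closed form for this model. A minimal sketch, assuming the initial law X_1 ~ N(m0, P0) (the slides do not state it):

```python
import numpy as np

def kalman_loglik(y, phi, sv, sw, m0=0.0, P0=1.0):
    """Exact log p_theta(y_{1:n}) for X_n = phi X_{n-1} + sv V_n, Y_n = X_n + sw W_n."""
    m, P, ll = m0, P0, 0.0
    for yk in y:
        S = P + sw**2                        # predictive variance of y_k
        ll += -0.5 * (np.log(2 * np.pi * S) + (yk - m) ** 2 / S)
        K = P / S                            # Kalman gain
        m, P = m + K * (yk - m), (1 - K) * P  # measurement update
        m, P = phi * m, phi**2 * P + sv**2    # time update
    return ll
```

The exact score ∇ p_θ(y_{1:n}) used in the comparison can then be obtained, e.g., by differentiating this recursion or by central finite differences in φ, σ_v, σ_w.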
Example: Linear Gaussian Model
[Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom)]
Example: Stochastic Volatility Model
Consider the following model:
 X_n = φ X_{n-1} + σ_v V_n,
 Y_n = β exp(X_n / 2) W_n,
with θ = (φ, σ_v, β).
We use SMC with N = 1000 for batch and online estimation.
For batch estimation:
 θ_{k+1} = θ_k + γ_{k+1} ∇ log p_{θ_k}(y_{1:n}).
For online estimation:
 θ_{n+1} = θ_n + γ_{n+1} ∇ log p_{θ_{1:n}}(y_n | y_{1:n-1}).
Example: Stochastic Volatility Model
[Figure: batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85]
Example: Stochastic Volatility Model
Recursive parameter estimation for the SV model. The estimates converge towards the true values.
SMC with MCMC for Parameter Estimation
Given data y_{1:n}, inference relies on
 p(θ, x_{1:n} | y_{1:n}) = p_θ(x_{1:n} | y_{1:n}) p(θ | y_{1:n}),  where  p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ).
We have seen that SMC is rather inefficient for sampling from p(θ | y_{1:n}), so we look here at an MCMC approach.
For a given parameter value θ, SMC can estimate both p_θ(x_{1:n} | y_{1:n}) and p_θ(y_{1:n}).
It will be useful if we can use SMC within MCMC to sample from p(θ | y_{1:n}).
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010), 72, Part 3, pp. 269–342
Gibbs Sampling Strategy
Using Gibbs sampling, we can sample iteratively from p(θ | y_{1:n}, x_{1:n}) and p(x_{1:n} | y_{1:n}, θ).
However, it is impossible to sample from p(x_{1:n} | y_{1:n}, θ) directly.
We can sample from p(x_k | y_{1:n}, θ, x_{1:k-1}, x_{k+1:n}) one component at a time instead, but convergence will be slow.
Alternatively, we would like to use a Metropolis-Hastings step to sample from p(x_{1:n} | y_{1:n}, θ), i.e. sample X*_{1:n} ~ q(x_{1:n} | y_{1:n}) and accept with probability
 min { 1, [p_θ(X*_{1:n} | y_{1:n}) q(X_{1:n} | y_{1:n})] / [p_θ(X_{1:n} | y_{1:n}) q(X*_{1:n} | y_{1:n})] }.
Gibbs Sampling Strategy
We will use the output of an SMC method approximating p_θ(x_{1:n} | y_{1:n}) as a proposal distribution.
We know that the SMC approximation degrades as n increases, but we also have, under mixing assumptions,
 || Law(X^(i)_{1:n}) − p_θ(· | y_{1:n}) ||_TV ≤ C n / N.
Thus things degrade linearly rather than exponentially.
These results are useful for small C.
Configurational Biased Monte Carlo
The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences.
CBMC looks like an SMC method, but at each step only one particle is selected, which gives N offspring. Thus if you have made the wrong selection, you cannot recover later on.
D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076
MCMC
We use p̂_θ(x_{1:n} | y_{1:n}) as proposal distribution, i.e. X*_{1:n} ~ p̂_θ(· | y_{1:n}), and store the associated estimate p̂*_θ(y_{1:n}).
It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N−1)n variables.
The solution, inspired by CBMC, is as follows. For the current state X_{1:n} of the Markov chain, run a virtual SMC method with only N−1 free particles; ensure that at each time step k the current state X_k is deterministically selected, and compute the associated estimate p̂_θ(y_{1:n}).
Finally, accept X*_{1:n} with probability min{1, p̂*_θ(y_{1:n}) / p̂_θ(y_{1:n})}. Otherwise stay where you are.
D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076
Example
Consider the following model:
 X_k = X_{k-1}/2 + 25 X_{k-1}/(1 + X_{k-1}²) + 8 cos(1.2 k) + V_k,  V_k ~ N(0, σ_v²),
 Y_k = X_k²/20 + W_k,  W_k ~ N(0, σ_w²),
 X_1 ~ N(0, 5).
We take the same prior proposal distribution for both CBMC and SMC.
[Figure: acceptance rates for CBMC (left) and SMC (right)]
Example
We now consider the case where both variances σ_v², σ_w² are unknown and given inverse-Gamma (conditionally conjugate) vague priors.
We sample from p(θ, x_{1:1000} | y_{1:1000}) using two strategies:
 The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution; average acceptance rate 0.43.
 An algorithm updating the components one at a time using an MH step of invariant distribution p(x_k | x_{k-1}, x_{k+1}, y_k), with the same proposal.
In both cases we update the parameters according to p(θ | x_{1:n}, y_{1:n}). Both algorithms are run with the same computational effort.
Example
MH one-at-a-time misses the mode and estimates
 E[σ_v² | y_{1:500}] ≈ 13.7 (MH one-at-a-time)  vs.  E[σ_v² | y_{1:500}] ≈ 9.9 (PF-MCMC),  true value σ_v² = 10.
If X_{1:n} and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings.
Review of Metropolis-Hastings Algorithm
As a review of MH: the proposal distribution is a Markov chain with kernel density q(x, y) = q(y | x), and the target distribution is π(x).
Algorithm (Generic Metropolis-Hastings Sampler):
 Initialization: choose an arbitrary starting value x_0.
 Iteration t (t ≥ 1):
 1. Given x_{t-1}, generate x* ~ q(x | x_{t-1}).
 2. Compute
  ρ(x_{t-1}, x*) = min { 1, [π(x*) q(x_{t-1} | x*)] / [π(x_{t-1}) q(x* | x_{t-1})] }.
 3. With probability ρ(x_{t-1}, x*), accept x* and set x_t = x*; otherwise reject x* and set x_t = x_{t-1}.
It can be easily shown that π is invariant for the transition kernel, π(x') = ∫ π(x) K(x, x') dx, and that X^(t) ~ π(x) as t → ∞ under weak assumptions.
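The generic sampler above reduces to a few lines for a symmetric random-walk proposal, where the q-ratio cancels. A minimal sketch on an illustrative one-dimensional Gaussian target:

```python
import numpy as np

rng = np.random.default_rng(0)

def rw_metropolis(log_target, x0, n_iter=20000, step=1.0):
    """Random-walk Metropolis: q is symmetric, so rho = min(1, pi(x*)/pi(x_{t-1}))."""
    x, lp = x0, log_target(x0)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = x + step * rng.standard_normal()       # step 1: generate x*
        lp_prop = log_target(prop)
        if np.log(rng.random()) < lp_prop - lp:       # steps 2-3: accept with prob rho
            x, lp = prop, lp_prop
        chain[t] = x                                  # on rejection, x_t = x_{t-1}
    return chain

chain = rw_metropolis(lambda x: -0.5 * (x - 1.0) ** 2, x0=0.0)  # target N(1,1), unnormalized
```

Note the target is only needed up to its normalizing constant, which is why the unknown p_θ(y_{1:n}) blocks a direct application to p(θ | y_{1:n}).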
Marginal Metropolis-Hastings Algorithm
Consider the target
 p(θ, x_{1:n} | y_{1:n}) = p(θ | y_{1:n}) p_θ(x_{1:n} | y_{1:n}).
We use the following proposal distribution:
 q((θ*, x*_{1:n}) | (θ, x_{1:n})) = q(θ* | θ) p_{θ*}(x*_{1:n} | y_{1:n}).
Then the acceptance probability becomes
 min { 1, [p(θ*, x*_{1:n} | y_{1:n}) q((θ, x_{1:n}) | (θ*, x*_{1:n}))] / [p(θ, x_{1:n} | y_{1:n}) q((θ*, x*_{1:n}) | (θ, x_{1:n}))] }
 = min { 1, [p_{θ*}(y_{1:n}) p(θ*) q(θ | θ*)] / [p_θ(y_{1:n}) p(θ) q(θ* | θ)] }.
We will use SMC approximations to compute p_θ(y_{1:n}) and p_θ(x_{1:n} | y_{1:n}).
Marginal Metropolis-Hastings Algorithm
Step 1: Given θ(i−1), X_{1:n}(i−1) and p̂_{θ(i−1)}(y_{1:n}):
 sample θ* ~ q(θ | θ(i−1)), and
 run an SMC algorithm to obtain p̂_{θ*}(x_{1:n} | y_{1:n}) and p̂_{θ*}(y_{1:n}).
Step 2: Sample X*_{1:n} ~ p̂_{θ*}(x_{1:n} | y_{1:n}).
Step 3: With probability
 min { 1, [p̂_{θ*}(y_{1:n}) p(θ*) q(θ(i−1) | θ*)] / [p̂_{θ(i−1)}(y_{1:n}) p(θ(i−1)) q(θ* | θ(i−1))] },
set θ(i) = θ*, X_{1:n}(i) = X*_{1:n}, p̂_{θ(i)}(y_{1:n}) = p̂_{θ*}(y_{1:n});
otherwise set θ(i) = θ(i−1), X_{1:n}(i) = X_{1:n}(i−1), p̂_{θ(i)}(y_{1:n}) = p̂_{θ(i−1)}(y_{1:n}).
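The three steps above (without storing the path X_{1:n}) can be sketched as a particle marginal MH run on a linear Gaussian model; the true parameter, series length, particle count, prior, and proposal step are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def pf_loglik(y, phi, sv=1.0, sw=1.0, N=500):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n}) at theta = phi."""
    X = rng.standard_normal(N)
    ll = 0.0
    for yk in y:
        logw = -0.5 * ((yk - X) / sw) ** 2 - 0.5 * np.log(2 * np.pi * sw**2)
        ll += np.log(np.mean(np.exp(logw)))
        w = np.exp(logw - logw.max()); w /= w.sum()
        X = phi * X[rng.choice(N, size=N, p=w)] + sv * rng.standard_normal(N)
    return ll

# simulate data (true value and sizes are illustrative)
T, phi_true = 150, 0.7
x = np.zeros(T)
for n in range(1, T):
    x[n] = phi_true * x[n - 1] + rng.standard_normal()
y = x + rng.standard_normal(T)

# marginal MH on phi with a U(-1,1) prior and a random-walk proposal q(phi* | phi)
phi, ll = 0.0, pf_loglik(y, 0.0)
chain = np.empty(400)
for i in range(400):
    prop = phi + 0.1 * rng.standard_normal()
    if abs(prop) < 1:                                 # outside the prior support: reject
        ll_prop = pf_loglik(y, prop)                  # Step 1: re-run SMC at theta*
        if np.log(rng.random()) < ll_prop - ll:       # Step 3: accept using estimated likelihoods
            phi, ll = prop, ll_prop
    chain[i] = phi
phi_hat = chain[200:].mean()
```

Storing the accepted likelihood estimate ll, rather than re-estimating it at the current point, is essential: it is exactly the stored p̂_{θ(i−1)}(y_{1:n}) of Step 1.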
Marginal Metropolis-Hastings Algorithm
This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(θ | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).
Whenever N ≥ 1, the algorithm admits exactly p(θ, x_{1:n} | y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.
The higher N, the better the performance of the algorithm; N scales roughly linearly with n.
Useful when X_n is moderate-dimensional and θ is high-dimensional. Admits the plug-and-play property (Ionides et al., 2006).
Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553, 2007
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society: Series B (Statistical Methodology), Volume 72, Issue 3, pages 269–342, June 2010
E. L. Ionides, C. Breto and A. A. King, Inference for nonlinear dynamical systems, Proceedings of the Nat. Acad. of Sciences, 103, 18438-18443, 2006
OPEN PROBLEMS
Open Problems
Thus SMC can be successfully used within MCMC.
The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals.
This is computationally expensive, but it is very helpful in challenging problems.
Several open problems remain:
 Developing efficient SMC methods when E = R^10000
 Developing a proper SMC method for recursive Bayesian parameter estimation
 Developing methods to avoid the O(N²) cost for smoothing and sensitivity
 Developing methods for solving optimal control problems
 Establishing weaker assumptions ensuring uniform convergence results
Other Topics
In standard SMC we only sample X_n at time n and we don't allow ourselves to modify the past.
It is possible to modify the algorithm to take this into account.
• A. Doucet, M. Briers and S. Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1–19
Bayesian Scientific Computing Spring 2013 (N Zabaras)
References P Del Moral A Doucet and A Jasra Sequential Monte Carlo samplers JRSSB 2006
P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian
Statistics 2006
P Del Moral A Doucet amp SS Singh Forward Smoothing using Sequential Monte Carlo technical report Cambridge University 2009
A Doucet Short Courses Lecture Notes (A B C) P Del Moral Feynman-Kac Formulae Springer-Verlag 2004
Sequential MC Methods M Davy 2007 A Doucet A Johansen Particle Filtering and Smoothing Fifteen years later in Handbook of
Nonlinear Filtering (edts D Crisan and B Rozovsky) Oxford Univ Press 2011
A Johansen and A Doucet A Note on Auxiliary Particle Filters Stat Proba Letters 2008
A Doucet et al Efficient Block Sampling Strategies for Sequential Monte Carlo (with M Briers amp S Senecal) JCGS 2006
C Caron R Gottardo and A Doucet On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis Stat Comput 2011
7
Bayesian Scientific Computing Spring 2013 (N Zabaras) 8
SEQUENTIAL MONTE CARLO METHODS
FOR STATIC PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions πn nge 1 are defined on X
This case can appear often for example
Sequential Bayesian Estimation
Classical Bayesian Inference
Global Optimization
It is not clear how SMC can be related to this problem since Xn=X instead of Xn=Xn
1( ) ( | )n nx p xπ = y( ) ( )n x xπ π=
[ ]( ) ( ) n
n nx x γπ π γprop rarr infin
9
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions πn nge 1 are defined on X
This case can appear often for example
Sequential Bayesian Estimation
Global Optimization
Sampling from a fixed distribution where micro(x) is an easy to sample from distribution Use a sequence
1( ) ( | )n nx p xπ = y
[ ]( ) ( ) n
n nx x ηπ π ηprop rarr infin
10
( ) ( )1( ) ( ) ( )n n
n x x xη ηπ micro π minusprop
1 2 11 0 ( ) ( ) ( ) ( )Final Finali e x x x xη η η π micro π π= gt gt gt = prop prop
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions πn nge 1 are defined on X
This case can appear often for example
Rare Event Simulation
The required probability is then
The Boolean Satisfiability Problem
Computing Matrix Permanents
Computing volumes in high dimensions
11
1
1 2
( ) 1 ( ) ( )1 ( )
nn n E
Final
A x x x Normalizing Factor Z known
Simulate with sequence E E E A
π π πltlt prop =
= sup sup sup =X
( )FinalZ Aπ=
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions πn nge 1 are defined on X
Apply SMC to an augmented sequence of distributions For example
1
nn
nonπ
geX
( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x
( ) ( ) ( )1
1 1 1 1 |
n
n nn n n n n n
Use any conditionaldistribution on
x x xπ π π
minus
minus minus=
x x
X
12
bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible
Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models Let be a sequence of probability distributions defined on X
st each is known up to a normalizing constant
We are interested to sample from and compute Zn sequentially
This is not the same as the standard SMC discussed earlier which was defined on Xn
13
( ) 1n nxπ
ge
( )n xπ
( )
( )
1n n nUnknown Known
x Z xπ γ=
( )n xπ
( ) ( )1 1 1|n n n npπ =x x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models We will construct an artificial distribution that is the product of the
target distribution from which we want to sample and a backward kernel L as follows
such that The importance weights now become
14
( ) ( )
( ) ( ) ( )
11 1
1
1 11
arg
|
n n n n n
n
n n n n k k kk
T etBackwardTransitions
Z where
x L x x
π γ
γ γ
minus
minus
+=
=
= prod
x x
x
( ) ( )1 1 1nn n n nx dπ π minus= int x x
( )( )
( ) ( )( ) ( )
( ) ( )
( ) ( ) ( )
( ) ( )( ) ( )
1
11 1 1 1 1 1
1 1 21 1 1 1 1
1 1 1 11
1 11
1 1 1
|
| |
||
n
n n k k kn n n n n n k
n n n nn n n n n n
n n k k k n n nk
n n n n nn
n n n n n
x L x xqW W W
q q x L x x q x x
x L x xW
x q x x
γγ γγ γ
γγ
minus
+minus minus =
minus minus minusminus minus
minus minus + minus=
minus minusminus
minus minus minus
= = =
=
prod
prod
x x xx x x
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models
Any MCMC kernel can be used for the proposal q(|)
Since our interest is in computing only
there is no degeneracy problem
15
( ) ( )( ) ( )
1 11
1 1 1
||
n n n n nn n
n n n n n
x L x xW W
x q x xγ
γminus minus
minusminus minus minus
=
( ) ( )1 1 11 ( )nn n n n n nx d xZ
π π γminus= =int x x
P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2
Sample and augment
Compute the importance weights
Then the weighted approximation is We finally resample from to obtain
16
( )( ) ( )1~ |
i in n n nX q x X minus
( )( ) ( )( )1 1~
i iin n nnX Xminus minusX
( )
( )( )( )
1i
n
Ni
n n n nXi
x W xπ δ=
= sum
( ) ( )( ) ( )
( ) ( ) ( )11
( ) ( )1 ( ) ( ) ( )
1 11
|
|
i i in n nn n
i in n i i i
n n nn n
X L X XW W
X q X X
γ
γ
minusminus
minusminus minusminus
=
( ) ( )( )
1
1i
n
N
n n nXi
x xN
π δ=
= sum
( )( ) ~inn nX xπ
SMC For Static Models: Choice of L

A default choice is to first use a $\pi_n$-invariant MCMC kernel $q_n$, and then the corresponding reversed kernel

$L_{n-1}(x_n, x_{n-1}) = \frac{\pi_n(x_{n-1})\, q_n(x_{n-1}, x_n)}{\pi_n(x_n)}.$

Using this easy choice, we can simplify the expression for the weights:

$W_n = W_{n-1}\,\frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_{n-1}, x_n)} = W_{n-1}\,\frac{\gamma_n(x_n)\,\pi_n(x_{n-1})}{\gamma_{n-1}(x_{n-1})\,\pi_n(x_n)} \;\Rightarrow\; W_n^{(i)} = W_{n-1}^{(i)}\,\frac{\gamma_n(X_{n-1}^{(i)})}{\gamma_{n-1}(X_{n-1}^{(i)})}.$

This is known as annealed importance sampling. This particular choice has been used in physics and statistics.
W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001
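As a concrete sketch of annealed importance sampling (illustrative, not from the slides): temper from an easy $\mu = \mathcal{N}(0, 10^2)$ to an unnormalized target $\gamma(x) = e^{-(x-3)^2/2}$ along a geometric path, reweighting with $W_n = W_{n-1}\,\gamma_n(x_{n-1})/\gamma_{n-1}(x_{n-1})$ and moving with a $\pi_n$-invariant random-walk Metropolis kernel. The mean of the final weights estimates $Z_{Final}/Z_1 = \sqrt{2\pi}$.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_mu(x):                       # normalized starting density N(0, 10^2)
    return -0.5 * (x / 10.0) ** 2 - np.log(10.0 * np.sqrt(2.0 * np.pi))

def log_gamma_target(x):             # unnormalized target, Z = sqrt(2*pi)
    return -0.5 * (x - 3.0) ** 2

def log_gamma_n(x, b):               # geometric path: mu^(1-b) * gamma^b
    return (1.0 - b) * log_mu(x) + b * log_gamma_target(x)

N = 2000
betas = np.linspace(0.0, 1.0, 51)
x = 10.0 * rng.standard_normal(N)    # exact draws from pi_1 = mu
logW = np.zeros(N)
for b_prev, b in zip(betas[:-1], betas[1:]):
    # AIS incremental weight, evaluated at the *previous* particles
    logW += log_gamma_n(x, b) - log_gamma_n(x, b_prev)
    # pi_n-invariant random-walk Metropolis move (the kernel q_n)
    prop = x + rng.standard_normal(N)
    accept = np.log(rng.random(N)) < log_gamma_n(prop, b) - log_gamma_n(x, b)
    x = np.where(accept, prop, x)

Z_hat = np.exp(logW).mean()          # estimates Z_Final / Z_1 = sqrt(2*pi)
```

Note that the weight update only evaluates the tempered densities at the current particles, exactly as in the simplified expression above; the MCMC move never enters the weights.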
ONLINE PARAMETER ESTIMATION
Online Bayesian Parameter Estimation

Assume that our state model is defined with some unknown static parameter θ with prior p(θ):

$X_1 \sim \mu_\theta(\cdot) \quad\text{and}\quad X_n \mid (X_{n-1} = x_{n-1}) \sim f_\theta(x_n \mid x_{n-1}),$
$Y_n \mid (X_n = x_n) \sim g_\theta(y_n \mid x_n).$

Given data $y_{1:n}$, inference now is based on

$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}), \quad\text{where}\quad p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).$

We can use standard SMC, but on the extended space $Z_n = (X_n, \theta_n)$:

$f(z_n \mid z_{n-1}) = f_{\theta_{n-1}}(x_n \mid x_{n-1})\,\delta_{\theta_{n-1}}(\theta_n), \qquad g(y_n \mid z_n) = g_{\theta_n}(y_n \mid x_n).$

Note that θ is a static parameter: it does not evolve with n.
Online Bayesian Parameter Estimation

For fixed θ, using our earlier error estimates,

$\operatorname{Var}\left[\log \hat p_\theta(y_{1:n})\right] \le \frac{C\, n}{N}.$

In a Bayesian context the problem is even more severe, as $p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$.

The exponential stability assumption cannot hold, as $\theta_n = \theta_1$ for all n.

To mitigate this problem, introduce MCMC steps on θ.

- C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
- P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
- W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
Online Bayesian Parameter Estimation

When

$p(\theta \mid y_{1:n}, x_{1:n}) = p\big(\theta \mid s_n(x_{1:n}, y_{1:n})\big)$

for a fixed-dimensional sufficient statistic $s_n$, this becomes an elegant algorithm that, however, still has the degeneracy problem, since it uses $p(x_{1:n} \mid y_{1:n})$.

As dim(Z_n) = dim(X_n) + dim(θ), such methods are not recommended for high-dimensional θ, especially with vague priors.
Artificial Dynamics for θ

A solution consists of perturbing the location of the particles in a way that does not modify their distribution, i.e. if at time n

$\theta^{(i)} \sim p(\theta \mid y_{1:n}),$

then we would like a transition kernel $M_n$ such that if $\theta'^{(i)} \sim M_n(\theta^{(i)}, \cdot)$ then $\theta'^{(i)} \sim p(\theta \mid y_{1:n})$.

In Markov chain language, we want $p(\theta \mid y_{1:n})$ to be invariant for $M_n(\theta, \theta')$.
Using MCMC

There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly: $p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$ would need to be known up to a normalizing constant, but $p_\theta(y_{1:n})$ is unknown.

However, we can use a very simple Gibbs update: if $X_{1:n}^{(i)} \sim p(x_{1:n} \mid y_{1:n})$ and $\theta^{(i)} \sim p(\theta \mid y_{1:n}, X_{1:n}^{(i)})$, then $\theta^{(i)} \sim p(\theta \mid y_{1:n})$.

Indeed, note that

$\int p(\theta \mid y_{1:n}, x_{1:n})\, p(x_{1:n} \mid y_{1:n})\, dx_{1:n} = p(\theta \mid y_{1:n}).$
SMC with MCMC for Parameter Estimation

Given an approximation at time n-1,

$\hat p(\theta, x_{1:n-1} \mid y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^N \delta_{\left(\theta_{n-1}^{(i)},\, X_{1:n-1}^{(i)}\right)}(\theta, x_{1:n-1}).$

Sample $X_n^{(i)} \sim f_{\theta_{n-1}^{(i)}}(x_n \mid X_{n-1}^{(i)})$, set $X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)})$, and then approximate

$\hat p(\theta, x_{1:n} \mid y_{1:n}) = \sum_{i=1}^N W_n^{(i)}\,\delta_{\left(\theta_{n-1}^{(i)},\, X_{1:n}^{(i)}\right)}(\theta, x_{1:n}), \qquad W_n^{(i)} \propto g_{\theta_{n-1}^{(i)}}(y_n \mid X_n^{(i)}).$

Resample $X_{1:n}^{(i)} \sim \hat p(x_{1:n} \mid y_{1:n})$, then sample $\theta_n^{(i)} \sim p(\theta \mid y_{1:n}, X_{1:n}^{(i)})$ to obtain

$\bar p(\theta, x_{1:n} \mid y_{1:n}) = \frac{1}{N}\sum_{i=1}^N \delta_{\left(\theta_n^{(i)},\, X_{1:n}^{(i)}\right)}(\theta, x_{1:n}).$
SMC with MCMC for Parameter Estimation

Consider the following model:

$X_{n+1} = \theta X_n + \sigma_v V_{n+1}, \qquad V_n \overset{iid}{\sim} \mathcal{N}(0, 1),$
$Y_n = X_n + \sigma_w W_n, \qquad W_n \overset{iid}{\sim} \mathcal{N}(0, 1),$
$X_0 \sim \mathcal{N}(0, \sigma^2).$

We set the prior on θ as $\theta \sim \mathcal{U}(-1, 1)$.

In this case

$p(\theta \mid x_{0:n}, y_{1:n}) \propto \mathcal{N}(\theta;\, m_n, \sigma_n^2)\,\mathbf{1}_{(-1,1)}(\theta),$

where

$\sigma_n^2 = \frac{\sigma_v^2}{\sum_{k=1}^n x_{k-1}^2}, \qquad m_n = \frac{\sum_{k=1}^n x_{k-1}\, x_k}{\sum_{k=1}^n x_{k-1}^2}.$
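For this model, the Gibbs update of θ depends on the path only through the two sufficient statistics $\sum_k x_{k-1} x_k$ and $\sum_k x_{k-1}^2$; a minimal sketch (function names illustrative):

```python
import numpy as np

def theta_conditional(x, sigma_v):
    """Mean m_n and std sigma_n of the Gaussian factor of p(theta | x_{0:n})
    for X_k = theta X_{k-1} + sigma_v V_k; the U(-1,1) prior truncates it."""
    s_xy = np.sum(x[:-1] * x[1:])        # sum_k x_{k-1} x_k
    s_xx = np.sum(x[:-1] ** 2)           # sum_k x_{k-1}^2
    return s_xy / s_xx, sigma_v / np.sqrt(s_xx)

def sample_theta(x, sigma_v, rng):
    """Draw theta ~ N(m_n, sigma_n^2) truncated to (-1, 1), by rejection
    (adequate here since sigma_n is small for long series)."""
    m, s = theta_conditional(x, sigma_v)
    while True:
        theta = rng.normal(m, s)
        if -1.0 < theta < 1.0:
            return theta

# Quick check on a simulated path with theta = 0.5, sigma_v = 1:
rng = np.random.default_rng(0)
x = np.zeros(5000)
for k in range(1, 5000):
    x[k] = 0.5 * x[k - 1] + rng.standard_normal()
m_n, s_n = theta_conditional(x, 1.0)
```

Because only two scalars are needed, the Gibbs step costs O(1) per time step when the statistics are updated recursively.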
SMC with MCMC for Parameter Estimation

We use SMC with the Gibbs step. The degeneracy problem remains.

The SMC estimate of $\mathbb{E}[\theta \mid y_{1:n}]$ is shown as n increases (from A. Doucet's lecture notes): the parameter converges to the wrong value.

[Figure: SMC/MCMC estimate vs the true value.]
SMC with MCMC for Parameter Estimation

The problem with this approach is that, although we move θ according to $p(\theta \mid x_{1:n}, y_{1:n})$, we are still relying implicitly on the SMC approximation of the joint distribution $p(x_{1:n} \mid y_{1:n})$.

As n increases, the SMC approximation of $p(x_{1:n} \mid y_{1:n})$ deteriorates and the algorithm cannot converge towards the right solution.

[Figure: the sufficient statistic $\frac{1}{n}\sum_{k=1}^n \mathbb{E}\left[X_k^2 \mid y_{1:n}\right]$ computed exactly through the Kalman filter (blue) vs the SMC approximation (red), for a fixed value of θ.]
Recursive MLE Parameter Estimation

The log-likelihood can be written as

$\ell_n(\theta) = \log p_\theta(y_{1:n}) = \sum_{k=1}^n \log p_\theta(y_k \mid y_{1:k-1}).$

Here we compute

$p_\theta(y_k \mid y_{1:k-1}) = \int g_\theta(y_k \mid x_k)\, p_\theta(x_k \mid y_{1:k-1})\, dx_k.$

Under regularity assumptions, $\left(X_n, Y_n, p_\theta(x_n \mid y_{1:n-1})\right)$ is an inhomogeneous Markov chain which converges towards its invariant distribution $\mu_\theta$, and

$\lim_{n\to\infty} \frac{\ell_n(\theta)}{n} = \ell(\theta) = \iint \log g_\theta(y \mid x)\;\mu_\theta(dx, dy).$
Robbins-Monro Algorithm for MLE

We can maximize ℓ(θ) by using the gradient and the Robbins-Monro algorithm

$\theta_n = \theta_{n-1} + \gamma_n\,\nabla \log p_{\theta_{n-1}}(Y_n \mid Y_{1:n-1}),$

where

$\nabla p_\theta(Y_n \mid Y_{1:n-1}) = \int \nabla g_\theta(Y_n \mid x_n)\, p_\theta(x_n \mid Y_{1:n-1})\, dx_n + \int g_\theta(Y_n \mid x_n)\,\nabla p_\theta(x_n \mid Y_{1:n-1})\, dx_n.$

We thus need to approximate $\nabla p_\theta(x_n \mid Y_{1:n-1})$.
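The shape of the recursion can be seen on a toy problem (illustrative, not the state-space case): maximizing $\theta \mapsto \mathbb{E}[\log \mathcal{N}(Y; \theta, 1)]$ from a stream of observations, using the noisy gradient $y_n - \theta_{n-1}$ and steps $\gamma_n = 1/n$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0
theta = 0.0                                  # arbitrary starting point
for n in range(1, 20001):
    y = theta_true + rng.standard_normal()   # Y_n ~ N(theta_true, 1)
    grad = y - theta                         # noisy gradient of log N(y; theta, 1)
    theta += (1.0 / n) * grad                # Robbins-Monro update
```

The step sizes satisfy the usual conditions $\sum_n \gamma_n = \infty$ and $\sum_n \gamma_n^2 < \infty$, so the iterates converge to the maximizer; in the state-space setting the same recursion is driven by the SMC gradient approximation instead.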
Importance Sampling Estimation of Sensitivity

The various proposed approximations are based on the identity

$\nabla p_\theta(x_{1:n} \mid Y_{1:n-1}) = \nabla \log p_\theta(x_{1:n} \mid Y_{1:n-1})\; p_\theta(x_{1:n} \mid Y_{1:n-1}).$

We thus can use an SMC approximation of the form

$\widehat{\nabla p}_\theta(x_{1:n} \mid Y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^N \alpha_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad \alpha_n^{(i)} = \nabla \log p_\theta(X_{1:n}^{(i)} \mid Y_{1:n-1}).$

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of $p_\theta(x_{1:n-1} \mid Y_{1:n-1})$.
Degeneracy of the SMC Algorithm

[Figure: score $\nabla p_\theta(y_{1:n})$ for linear Gaussian models: exact (blue) vs SMC approximation (red).]
Degeneracy of the SMC Algorithm

[Figure: signed measure $\nabla p_\theta(x_n \mid y_{1:n})$ for linear Gaussian models: exact (red) vs SMC (blue/green). Positive and negative particles are mixed.]
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:

$\nabla p_\theta(x_n \mid y_{1:n-1}) = \nabla \log p_\theta(x_n \mid y_{1:n-1})\; p_\theta(x_n \mid y_{1:n-1}).$

It suggests using an SMC approximation of the form

$\widehat{\nabla p}_\theta(x_n \mid y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^N \beta_n^{(i)}\,\delta_{X_n^{(i)}}(x_n), \qquad \beta_n^{(i)} = \nabla \log p_\theta(X_n^{(i)} \mid y_{1:n-1}) = \frac{\nabla p_\theta(X_n^{(i)} \mid y_{1:n-1})}{p_\theta(X_n^{(i)} \mid y_{1:n-1})}.$

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is $O(N^2)$, as

$p_\theta(X_n^{(i)} \mid y_{1:n-1}) \propto \int f_\theta(X_n^{(i)} \mid x_{n-1})\, p_\theta(x_{n-1} \mid y_{1:n-1})\, dx_{n-1} \approx \frac{1}{N}\sum_{j=1}^N f_\theta(X_n^{(i)} \mid X_{n-1}^{(j)}).$
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

$p_\theta(x_n \mid y_{1:n}) = \frac{\xi_\theta(x_n)}{\int \xi_\theta(x_n)\, dx_n}, \qquad \xi_\theta(x_n) = g_\theta(y_n \mid x_n) \int f_\theta(x_n \mid x_{n-1})\, p_\theta(x_{n-1} \mid y_{1:n-1})\, dx_{n-1}.$

The derivatives satisfy

$\nabla p_\theta(x_n \mid y_{1:n}) = \frac{\nabla \xi_\theta(x_n)}{\int \xi_\theta(x_n)\, dx_n} - p_\theta(x_n \mid y_{1:n})\,\frac{\int \nabla \xi_\theta(x_n)\, dx_n}{\int \xi_\theta(x_n)\, dx_n}.$

This way we obtain a simple recursion for $\nabla p_\theta(x_n \mid y_{1:n})$ as a function of $\nabla p_\theta(x_{n-1} \mid y_{1:n-1})$ and $p_\theta(x_{n-1} \mid y_{1:n-1})$.
SMC Approximation of the Sensitivity

Sample

$X_n^{(i)} \sim \eta_n(\cdot), \qquad \eta_n(x_n) = \sum_{j=1}^N W_{n-1}^{(j)}\, q_\theta(x_n \mid y_n, X_{n-1}^{(j)}).$

Compute

$\alpha_n^{(i)} = \frac{\xi_\theta(X_n^{(i)})}{\eta_n(X_n^{(i)})}, \qquad \rho_n^{(i)} = \frac{\nabla \xi_\theta(X_n^{(i)})}{\eta_n(X_n^{(i)})},$

$W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j=1}^N \alpha_n^{(j)}}, \qquad \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j=1}^N \alpha_n^{(j)}} - W_n^{(i)}\,\frac{\sum_{j=1}^N \rho_n^{(j)}}{\sum_{j=1}^N \alpha_n^{(j)}}.$

We have

$\hat p_\theta(x_n \mid y_{1:n}) = \sum_{i=1}^N W_n^{(i)}\,\delta_{X_n^{(i)}}(x_n), \qquad \widehat{\nabla p}_\theta(x_n \mid y_{1:n}) = \sum_{i=1}^N \beta_n^{(i)}\,\delta_{X_n^{(i)}}(x_n).$
Example: Linear Gaussian Model

Consider the following model:

$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = X_n + \sigma_w W_n.$

We have $\theta = (\phi, \sigma_v, \sigma_w)$.

We compare next the SMC estimates, for N = 1000, of $\nabla \log p_\theta(x_n \mid y_{1:n})$ and $\nabla p_\theta(y_{1:n})$ to their exact values given by the Kalman filter derivative (exact in cyan vs SMC in black).
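For the linear Gaussian model the exact quantities are available in closed form; the sketch below (interface illustrative) computes $\log p_\theta(y_{1:n})$ with a Kalman filter and obtains the score in $\phi$ by central finite differences, the kind of exact baseline against which the SMC estimate is compared:

```python
import numpy as np

def kalman_loglik(y, phi, sv, sw, m0=0.0, P0=1.0):
    """Exact log p_theta(y_{1:n}) for X_n = phi X_{n-1} + sv V_n,
    Y_n = X_n + sw W_n, with X_0 ~ N(m0, P0)."""
    m, P, ll = m0, P0, 0.0
    for yk in y:
        m_pred = phi * m                       # one-step predictive mean
        P_pred = phi * phi * P + sv ** 2       # one-step predictive variance
        S = P_pred + sw ** 2                   # Var(y_k | y_{1:k-1})
        ll += -0.5 * (np.log(2.0 * np.pi * S) + (yk - m_pred) ** 2 / S)
        K = P_pred / S                         # Kalman gain
        m = m_pred + K * (yk - m_pred)         # filtered mean
        P = (1.0 - K) * P_pred                 # filtered variance
    return ll

# Score with respect to phi by central finite differences:
rng = np.random.default_rng(0)
x, ys = 0.0, []
for _ in range(200):
    x = 0.8 * x + 0.5 * rng.standard_normal()
    ys.append(x + 0.3 * rng.standard_normal())
eps = 1e-5
score_phi = (kalman_loglik(ys, 0.8 + eps, 0.5, 0.3)
             - kalman_loglik(ys, 0.8 - eps, 0.5, 0.3)) / (2.0 * eps)
```

The same filter also yields the exact filtering means and variances that the degeneracy plots on the preceding slides use as ground truth.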
Example: Linear Gaussian Model

[Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and the Hahn-Jordan decomposition of the marginal (bottom).]
Example: Stochastic Volatility Model

Consider the following model:

$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = \beta \exp\left(\tfrac{X_n}{2}\right) W_n.$

We have $\theta = (\phi, \sigma_v, \beta)$. We use SMC with N = 1000 for batch and online estimation.

For batch estimation:

$\theta_n = \theta_{n-1} + \gamma_n\,\nabla \log p_{\theta_{n-1}}(y_{1:n}).$

For online estimation:

$\theta_n = \theta_{n-1} + \gamma_n\,\nabla \log p_{\theta_{n-1}}(y_n \mid y_{1:n-1}).$
Example: Stochastic Volatility Model

[Figure: batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.]
Example: Stochastic Volatility Model

[Figure: recursive parameter estimation for the SV model. The estimates converge towards the true values.]
SMC with MCMC for Parameter Estimation

Given data $y_{1:n}$, inference relies on

$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}), \quad\text{where}\quad p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).$

We have seen that SMC is rather inefficient at sampling from $p(\theta \mid y_{1:n})$, so we look here at an MCMC approach. For a given parameter value θ, SMC can estimate both $p_\theta(x_{1:n} \mid y_{1:n})$ and $p_\theta(y_{1:n})$.

It will be useful if we can use SMC within MCMC to sample from $p(\theta, x_{1:n} \mid y_{1:n})$.

- C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269-342.
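A minimal bootstrap particle filter returning the SMC estimate of $\log p_\theta(y_{1:n})$ is sketched below for a linear Gaussian model, so the estimate can be checked against a Kalman filter; the function name and test model are illustrative, not from the slides:

```python
import numpy as np

def pf_loglik(y, phi, sv, sw, N=1000, rng=None):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n}) for
    X_k = phi X_{k-1} + sv V_k, Y_k = X_k + sw W_k, X_0 ~ N(0, 1)."""
    rng = rng or np.random.default_rng()
    x = rng.standard_normal(N)
    ll = 0.0
    for yk in y:
        x = phi * x + sv * rng.standard_normal(N)        # propagate with f_theta
        logw = -0.5 * ((yk - x) / sw) ** 2 - np.log(sw * np.sqrt(2.0 * np.pi))
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())                       # log p_hat(y_k | y_{1:k-1})
        x = x[rng.choice(N, size=N, p=w / w.sum())]      # multinomial resampling
    return ll

# Simulated record from the model with phi = 0.8, sv = 0.5, sw = 0.3:
rng = np.random.default_rng(0)
xs, ys = 0.0, []
for _ in range(100):
    xs = 0.8 * xs + 0.5 * rng.standard_normal()
    ys.append(xs + 0.3 * rng.standard_normal())
ll_hat = pf_loglik(ys, 0.8, 0.5, 0.3, rng=np.random.default_rng(1))
```

The same run also provides an approximation of $p_\theta(x_{1:n} \mid y_{1:n})$ by tracing the ancestral lineages of the final particles, which is the proposal used in the following slides.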
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from $p(\theta \mid y_{1:n}, x_{1:n})$ and $p(x_{1:n} \mid y_{1:n}, \theta)$.

However, it is impossible to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$ directly. We can sample from $p(x_k \mid y_{1:n}, \theta, x_{1:k-1}, x_{k+1:n})$ instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$, i.e. sample $X^*_{1:n} \sim q(x_{1:n} \mid y_{1:n})$ and accept with probability

$\min\left\{1,\; \frac{p_\theta(X^*_{1:n} \mid y_{1:n})\, q(X_{1:n} \mid y_{1:n})}{p_\theta(X_{1:n} \mid y_{1:n})\, q(X^*_{1:n} \mid y_{1:n})}\right\}.$
Gibbs Sampling Strategy

We will use the output of an SMC method approximating $p_\theta(x_{1:n} \mid y_{1:n})$ as a proposal distribution.

We know that the SMC approximation degrades as n increases, but we also have, under mixing assumptions,

$\left\| \text{Law}\left(X_{1:n}^{(i)}\right) - p_\theta(x_{1:n} \mid y_{1:n}) \right\|_{TV} \le \frac{C\, n}{N}.$

Thus things degrade linearly rather than exponentially. These results are useful for small C.
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences.

CBMC looks like an SMC method, but at each step only one particle is selected, and it gives N offspring. Thus, if you have made the wrong selection, you cannot recover later on.

- D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter, 3 (1991) 3053-3076.
MCMC

We use $\hat p_\theta^N(x_{1:n} \mid y_{1:n})$ as the proposal distribution, sampling $X^*_{1:n} \sim \hat p_\theta^N(x_{1:n} \mid y_{1:n})$, and store the estimate $\hat p_\theta^N(y_{1:n})$.

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N-1)n variables.

The solution, inspired by CBMC, is as follows. For the current state $X_{1:n}$ of the Markov chain, run a virtual SMC method with only N-1 free particles; ensure that at each time step k the current state $X_k$ is deterministically selected, and compute the associated estimate $\tilde p_\theta^N(y_{1:n})$.

Finally, accept $X^*_{1:n}$ with probability

$\min\left\{1,\; \frac{\hat p_\theta^N(y_{1:n})}{\tilde p_\theta^N(y_{1:n})}\right\}.$

Otherwise stay where you are.
Example

Consider the following model:

$X_k = \frac{X_{k-1}}{2} + 25\,\frac{X_{k-1}}{1+X_{k-1}^2} + 8\cos(1.2 k) + V_k, \qquad V_k \sim \mathcal{N}(0, \sigma_v^2),$
$Y_k = \frac{X_k^2}{20} + W_k, \qquad W_k \sim \mathcal{N}(0, \sigma_w^2), \qquad X_1 \sim \mathcal{N}(0, 5).$

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
Example

We now consider the case when both variances $\sigma_v^2, \sigma_w^2$ are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from $p(x_{1:1000} \mid y_{1:1000}, \theta)$ using two strategies:

- The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution. Average acceptance rate 0.43.
- An algorithm updating the components one at a time, using an MH step of invariant distribution $p(x_k \mid y_k, x_{k-1}, x_{k+1})$ with the same proposal.

In both cases we update the parameters according to $p(\theta \mid x_{1:500}, y_{1:500})$. Both algorithms are run with the same computational effort.
Example

MH one-at-a-time missed the mode, and estimates:

$\mathbb{E}\left[\sigma_v^2 \mid y_{1:500}\right] \approx 13.7 \ (\text{MH one-at-a-time}), \qquad \mathbb{E}\left[\sigma_v^2 \mid y_{1:500}\right] \approx 9.9 \ (\text{PF-MCMC}), \qquad \sigma_v^2 = 10 \ (\text{true value}).$

If $X_{1:n}$ and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings.
Review of Metropolis Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density $q(x, y) = q(y \mid x)$, and the target distribution is $\pi(x)$.

It can be easily shown that $\pi$ is invariant for the transition kernel,

$\pi(x') = \int \pi(x)\, K(x, x')\, dx,$

and that $X^{(i)} \sim \pi(x)$ as $i \to \infty$ under weak assumptions.

Algorithm: Generic Metropolis-Hastings Sampler

Initialization: choose an arbitrary starting value $x^{(0)}$.

Iteration t (t ≥ 1):
1. Given $x^{(t-1)}$, generate $\tilde x \sim q(x^{(t-1)}, \cdot)$.
2. Compute

$\rho(x^{(t-1)}, \tilde x) = \min\left\{1,\; \frac{\pi(\tilde x)\, q(\tilde x, x^{(t-1)})}{\pi(x^{(t-1)})\, q(x^{(t-1)}, \tilde x)}\right\}.$

3. With probability $\rho(x^{(t-1)}, \tilde x)$, accept $\tilde x$ and set $x^{(t)} = \tilde x$; otherwise reject $\tilde x$ and set $x^{(t)} = x^{(t-1)}$.
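The generic sampler above can be sketched for a random-walk proposal (symmetric, so the q-ratio cancels) and a target known only up to a constant, here $\gamma(x) = e^{-x^2/2}$ (an illustrative choice, not from the slides):

```python
import numpy as np

def metropolis_hastings(log_gamma, x0, n_iter, step, rng):
    """Random-walk Metropolis: q(x, .) = N(x, step^2) is symmetric,
    so rho = min(1, gamma(x_tilde) / gamma(x))."""
    x = x0
    lg = log_gamma(x)
    out = np.empty(n_iter)
    for t in range(n_iter):
        x_tilde = x + step * rng.standard_normal()
        lg_tilde = log_gamma(x_tilde)
        if np.log(rng.random()) < lg_tilde - lg:   # accept with probability rho
            x, lg = x_tilde, lg_tilde
        out[t] = x                                 # on rejection, repeat x^(t-1)
    return out

rng = np.random.default_rng(0)
chain = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 50000, 2.4, rng)
# the chain targets N(0, 1); ergodic averages converge as t grows
```

Only ratios of $\gamma$ appear in the acceptance probability, which is exactly why the unknown normalizing constant of the target causes no difficulty here, and why the unknown $p_\theta(y_{1:n})$ does on the next slides.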
Marginal Metropolis Hastings Algorithm

Consider the target

$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}).$

We use the following proposal distribution:

$q\big((\theta, x_{1:n}) \to (\theta^*, x^*_{1:n})\big) = q(\theta^* \mid \theta)\, p_{\theta^*}(x^*_{1:n} \mid y_{1:n}).$

Then the acceptance probability becomes

$\min\left\{1,\; \frac{p(\theta^*, x^*_{1:n} \mid y_{1:n})\, q\big((\theta^*, x^*_{1:n}) \to (\theta, x_{1:n})\big)}{p(\theta, x_{1:n} \mid y_{1:n})\, q\big((\theta, x_{1:n}) \to (\theta^*, x^*_{1:n})\big)}\right\} = \min\left\{1,\; \frac{p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta \mid \theta^*)}{p_\theta(y_{1:n})\, p(\theta)\, q(\theta^* \mid \theta)}\right\}.$

We will use SMC approximations to compute $p_\theta(y_{1:n})$ and $p_\theta(x_{1:n} \mid y_{1:n})$.
Marginal Metropolis Hastings Algorithm

Step 1: Given $\left(\theta(i-1),\, X_{1:n}(i-1),\, \hat p_{\theta(i-1)}(y_{1:n})\right)$, sample $\theta^* \sim q(\theta \mid \theta(i-1))$ and run an SMC algorithm to obtain $\hat p_{\theta^*}(y_{1:n})$ and $\hat p_{\theta^*}(x_{1:n} \mid y_{1:n})$.

Step 2: Sample $X^*_{1:n} \sim \hat p_{\theta^*}(x_{1:n} \mid y_{1:n})$.

Step 3: With probability

$\min\left\{1,\; \frac{\hat p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta(i-1) \mid \theta^*)}{\hat p_{\theta(i-1)}(y_{1:n})\, p(\theta(i-1))\, q(\theta^* \mid \theta(i-1))}\right\},$

set $\theta(i) = \theta^*$, $X_{1:n}(i) = X^*_{1:n}$ and $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta^*}(y_{1:n})$; otherwise set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$ and $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta(i-1)}(y_{1:n})$.
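Steps 1-3 can be sketched end to end for a toy linear Gaussian model with unknown φ, a U(-1, 1) prior, and a random-walk $q(\theta^* \mid \theta)$; the bootstrap filter's $\hat p_\theta(y_{1:n})$ stands in for the intractable likelihood (all names and settings here are illustrative):

```python
import numpy as np

def pf_loglik(y, phi, N, rng):
    """Bootstrap PF estimate of log p_phi(y_{1:n}) for
    X_k = phi X_{k-1} + V_k, Y_k = X_k + W_k, V_k, W_k ~ N(0, 1)."""
    x = rng.standard_normal(N)
    ll = 0.0
    for yk in y:
        x = phi * x + rng.standard_normal(N)
        logw = -0.5 * (yk - x) ** 2
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean()) - 0.5 * np.log(2.0 * np.pi)
        x = x[rng.choice(N, size=N, p=w / w.sum())]   # multinomial resampling
    return ll

def pmmh(y, n_iter, N, step, rng):
    """Particle marginal MH for phi: the PF estimate replaces p_phi(y_{1:n})
    in the MH ratio and is stored alongside the accepted state (Step 3)."""
    phi = 0.0
    ll = pf_loglik(y, phi, N, rng)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = phi + step * rng.standard_normal()     # q(phi* | phi), symmetric
        if -1.0 < prop < 1.0:                         # flat prior inside (-1, 1)
            ll_prop = pf_loglik(y, prop, N, rng)      # Step 1: fresh SMC run
            if np.log(rng.random()) < ll_prop - ll:   # Step 3: accept/reject
                phi, ll = prop, ll_prop
        chain[i] = phi
    return chain

# Data simulated with phi = 0.7:
rng = np.random.default_rng(3)
xs, ys = 0.0, np.empty(100)
for k in range(100):
    xs = 0.7 * xs + rng.standard_normal()
    ys[k] = xs + rng.standard_normal()
chain = pmmh(ys, n_iter=300, N=200, step=0.15, rng=np.random.default_rng(4))
phi_hat = chain[100:].mean()
```

Keeping the stored estimate attached to the current state, rather than re-estimating it at every iteration, is what makes the chain exact despite the noisy likelihood.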
Marginal Metropolis Hastings Algorithm

This algorithm (without sampling $X_{1:n}$) was proposed as an approximate MCMC algorithm to sample from $p(\theta \mid y_{1:n})$ in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

When N ≥ 1, the algorithm admits exactly $p(\theta, x_{1:n} \mid y_{1:n})$ as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n.

It is useful when $X_n$ is moderate-dimensional and θ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

- Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pp. 549-553, 2007.
- C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B, 72(3), pp. 269-342, June 2010.
- Ionides, E. L., Breto, C. and King, A. A. (2006), Inference for nonlinear dynamical systems, Proc. Nat. Acad. Sci., 103, 18438-18443.
OPEN PROBLEMS
Open Problems

Thus SMC can be successfully used within MCMC.

The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
- Developing efficient SMC methods when $E = \mathbb{R}^{10000}$.
- Developing a proper SMC method for recursive Bayesian parameter estimation.
- Developing methods to avoid the $O(N^2)$ cost for smoothing and sensitivity.
- Developing methods for solving optimal control problems.
- Establishing weaker assumptions ensuring uniform convergence results.
Other Topics

In standard SMC we only sample $X_n$ at time n, and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

- Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, 15(3), pp. 1-19.
SEQUENTIAL MONTE CARLO METHODS FOR STATIC PROBLEMS
What about if all distributions πn, n ≥ 1, are defined on X?

This case can appear often, for example:

- Sequential Bayesian estimation: $\pi_n(x) = p(x \mid y_{1:n}).$
- Classical Bayesian inference: $\pi_n(x) = \pi(x).$
- Global optimization: $\pi_n(x) \propto [\pi(x)]^{\gamma_n}, \quad \gamma_n \to \infty.$

It is not clear how SMC can be related to this problem, since $X_n = X$ instead of $X_n = X^n$.
What about if all distributions πn, n ≥ 1, are defined on X?

This case can appear often, for example:

- Sequential Bayesian estimation: $\pi_n(x) = p(x \mid y_{1:n}).$
- Global optimization: $\pi_n(x) \propto [\pi(x)]^{\eta_n}, \quad \eta_n \to \infty.$
- Sampling from a fixed distribution π, where μ(x) is an easy-to-sample distribution: use the sequence $\pi_n(x) \propto \mu(x)^{\eta_n}\, \pi(x)^{1-\eta_n}$ with $\eta_1 = 1 > \eta_2 > \dots > \eta_{Final} = 0$, i.e. $\pi_1(x) = \mu(x)$ and $\pi_{Final}(x) \propto \pi(x)$.
What about if all distributions πn, n ≥ 1, are defined on X?

This case can appear often, for example:

- Rare event simulation: $\pi_n(x) \propto \mathbf{1}_{E_n}(x)\, \pi(x)$ (normalizing factor $Z_n$ not known); simulate with a sequence $E_1 \supset E_2 \supset \dots \supset E_{Final} = A$. The required probability is then $Z_{Final} = \pi(A)$.
- The Boolean satisfiability problem.
- Computing matrix permanents.
- Computing volumes in high dimensions.
What about if all distributions πn, n ≥ 1, are defined on X?

Apply SMC to an augmented sequence of distributions $\tilde\pi_n$ on $X^n$ with

$\pi_n(x_n) = \int \tilde\pi_n(x_{1:n})\, dx_{1:n-1}.$

For example,

$\tilde\pi_n(x_{1:n}) = \pi_n(x_n)\,\pi_n(x_{1:n-1} \mid x_n),$

using any conditional distribution $\pi_n(x_{1:n-1} \mid x_n)$ on $X^{n-1}$.

- C. Jarzynski, Nonequilibrium Equality for Free Energy Differences, Phys. Rev. Lett. 78, 2690-2693 (1997).
- Gavin E. Crooks, Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible Markovian Systems, Journal of Statistical Physics, 90(5-6), pp. 1481-1487 (1998).
- W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
- R. M. Neal, Annealed importance sampling, Statistics and Computing, 11, pp. 125-139 (2001).
- N. Chopin, A sequential particle filter method for static models, Biometrika, 89(3), pp. 539-551 (2002).
- P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
SMC For Static Models

Let $\{\pi_n(x)\}_{n \ge 1}$ be a sequence of probability distributions defined on X such that each $\pi_n(x)$ is known up to a normalizing constant:

$\pi_n(x) = \underbrace{Z_n^{-1}}_{\text{unknown}}\,\underbrace{\gamma_n(x)}_{\text{known}}.$

We are interested in sampling from $\pi_n(x)$ and computing $Z_n$ sequentially.

This is not the same as the standard SMC discussed earlier, which was defined on $X^n$: $\pi_n(x_{1:n}) = p(x_{1:n} \mid y_{1:n})$.
SMC For Static Models

We will construct an artificial distribution that is the product of the target distribution from which we want to sample and a backward kernel L, as follows:

$\tilde\pi_n(x_{1:n}) = \frac{\tilde\gamma_n(x_{1:n})}{Z_n}, \quad\text{where}\quad \tilde\gamma_n(x_{1:n}) = \gamma_n(x_n)\,\underbrace{\prod_{k=1}^{n-1} L_k(x_{k+1}, x_k)}_{\text{backward transitions}},$

such that $\pi_n(x_n) = \int \tilde\pi_n(x_{1:n})\, dx_{1:n-1}$. The importance weights now become

$W_n = \frac{\tilde\gamma_n(x_{1:n})}{q_n(x_{1:n})} = \frac{\gamma_n(x_n)\prod_{k=1}^{n-1} L_k(x_{k+1}, x_k)}{q_{n-1}(x_{1:n-1})\, q_n(x_n \mid x_{n-1})} = W_{n-1}\,\frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \mid x_{n-1})}.$
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models
Any MCMC kernel can be used for the proposal q(|)
Since our interest is in computing only
there is no degeneracy problem
15
( ) ( )( ) ( )
1 11
1 1 1
||
n n n n nn n
n n n n n
x L x xW W
x q x xγ
γminus minus
minusminus minus minus
=
( ) ( )1 1 11 ( )nn n n n n nx d xZ
π π γminus= =int x x
P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2
Sample and augment
Compute the importance weights
Then the weighted approximation is We finally resample from to obtain
16
( )( ) ( )1~ |
i in n n nX q x X minus
( )( ) ( )( )1 1~
i iin n nnX Xminus minusX
( )
( )( )( )
1i
n
Ni
n n n nXi
x W xπ δ=
= sum
( ) ( )( ) ( )
( ) ( ) ( )11
( ) ( )1 ( ) ( ) ( )
1 11
|
|
i i in n nn n
i in n i i i
n n nn n
X L X XW W
X q X X
γ
γ
minusminus
minusminus minusminus
=
( ) ( )( )
1
1i
n
N
n n nXi
x xN
π δ=
= sum
( )( ) ~inn nX xπ
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then
the corresponding reversed kernel Ln-1
Using this easy choice we can simplify the expression for the weights
This is known as annealed importance sampling The particular choice has been used in physics and statistics
17
( ) ( ) ( )( )
1 11 1
|| n n n n n
n n nn n
x q x xL x x
xπ
πminus minus
minus minus =
( ) ( )( ) ( )
( )( ) ( )
( ) ( )( )
1 1 1 11 1
1 1 1 1 1 1
| || |
n n n n n n n n n n n nn n n
n n n n n n n n n n n n
x L x x x x q x xW W W
x q x x x q x x xγ γ π
γ γ πminus minus minus minus
minus minusminus minus minus minus minus minus
= = rArr
( )( )
( )1
1 ( )1 1
in n
n n in n
XW W
X
γ
γminus
minusminus minus
=
W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001
Bayesian Scientific Computing Spring 2013 (N Zabaras) 18
ONLINE PARAMETER ESTIMATION
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static
parameter θ with some prior p(θ)
Given data y1n inference now is based on
We can use standard SMC but on the extended space Zn=(Xnθn)
Note that θ is a static parameter ndashdoes not involve with n
19
( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=
( ) ( )| ~ |n n n n nY X x g y xθ=
( ) ( ) ( )
( ) ( ) ( )
1 1 1 1 1
1 1
| | |
|
n n n n n
n n
p p pwherep p p
θ
θ
θ θ
θ θ
=
prop
x y y x y
y y
( ) ( ) ( ) ( ) ( )11 1| | | |
nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates
In a Bayesian context the problem is even more severe as
Exponential stability assumption cannot hold as θn = θ1
To mitigate this problem introduce MCMC steps on θ
20
( ) ( ) ( )1 1| n np p pθθ θpropy y
C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B
2001
1log ( )nCnVar pNθ
le y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation When
This becomes an elegant algorithm that however still has the degeneracy problem since it uses
As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors
21
( ) ( )1 1 1 1| | n n n n n
FixedDimensional
p p sθ θ
=
y x x y
( )1 1|n np x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a
way that does not modify their distributions ie if at time n
then we would like a transition kernel such that if Then
In Markov chain language we want to be invariant
22
( )( )1|i
n npθ θ~ y
( )iθ
( )( ) ( ) ( )| i i in n n nMθ θ θ~
( )( )1|i
n npθ θ~ y
( ) nM θ θ ( )1| np θ y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Using MCMC There is a whole literature on the design of such kernels known as
Markov chain Monte Carlo eg the Metropolis-Hastings algorithm
We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown
However we can use a very simple Gibbs update
Indeed note that if then if we have
Indeed note that
23
( )1| np θ y( ) ( )1 1|n np pθθ equivy y
( )( ) ( )1 1| i i
n n npθ θ~ y X
( ) ( )( ) ( )1 1 1 ~ |i i
n n n npθ θX x y ( )( ) ( )1 1| i i
n n npθ θ~ y X
( ) ( )( ) ( )1 1 1 ~ |i i
n n n npθ θX x y
( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Given an approximation at time n-1
Sample set and then approximate
Resample then sample to obtain
24
( )( )1
( ) ( )1~ |i
n
i in n nX f x X
θ minusminus
( )( ) ( )( )1 1 1~ i iin nn XminusX X
( ) ( ) ( )( ) ( )1 1 1
1 1 1 1 1 11
1| i in n
N
n n ni
pN θ
θ δ θminus minus
minus minus minus=
= sum X x y x
( )( )1 1 1~ |i
n n npX x y
( )( ) ( ) ( )( )( )( )
11 1
( )( ) ( )1 1 1
1| |iii
nn n
N ii inn n n n n n
ip W W g y
θθθ δ
minusminus=
= propsum X x y x X
( )( ) ( )1 1| i i
n n npθ θ~ y X
( ) ( ) ( )( )( )1
1 1 11
1| iin n
N
n n ni
pN θ
θ δ θ=
= sum X x y x
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Consider the following model
We set the prior on θ as
In this case
25
( )( )
( )
1 1
21 0
~ 01
~ 01
~ 0
n n v n n
n n w n n
X X V V
Y X W W
X
θ σ
σ
σ
+ += +
= +
N
N
N
~ ( 11)θ minusU
( ) ( ) ( )21 1 ( 11)
12 2 2
12 2
|
n n n n
n n
n n k k n kk k
p m
m x x x
θ θ σ θ
σ σ
minus
minus
minus= =
prop
= = sum sum
x y N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains
SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value
26
True Value
SMCMCMC
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ
according to we are still relying implicitly on the approximation of the joint distribution
As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution
27
( )1 1n np θ | x y( )1 1|n np x y
( )1 1|n np x y
Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ
21
1
1 |n
k nk
Xn =
sum y
Recursive MLE Parameter Estimation  The log-likelihood can be written as

$$\ell_n(\theta) = \log p_\theta(Y_{1:n}) = \sum_{k=1}^{n} \log p_\theta(Y_k \mid Y_{1:k-1}).$$

Here we compute

$$p_\theta(Y_k \mid Y_{1:k-1}) = \int g_\theta(Y_k \mid x_k)\, p_\theta(x_k \mid Y_{1:k-1})\, dx_k.$$

Under regularity assumptions, $\{X_n, Y_n, p_\theta(x_n \mid Y_{1:n-1})\}$ is an inhomogeneous Markov chain which converges towards its invariant distribution $\mu_\theta$, and

$$\lim_{n\to\infty} \frac{\ell_n(\theta)}{n} = \ell(\theta) = \int \log\!\left(\int g_\theta(y \mid x)\, \mu_\theta(dx)\right) \lambda(dy).$$
Robbins-Monro Algorithm for MLE  We can maximize ℓ(θ) by using the gradient and the Robbins-Monro algorithm

$$\theta_{n} = \theta_{n-1} + \gamma_n \nabla \log p_{\theta_{n-1}}(Y_n \mid Y_{1:n-1}),$$

where

$$\nabla p_\theta(Y_n \mid Y_{1:n-1}) = \int \nabla g_\theta(Y_n \mid x_n)\, p_\theta(x_n \mid Y_{1:n-1})\, dx_n + \int g_\theta(Y_n \mid x_n)\, \nabla p_\theta(x_n \mid Y_{1:n-1})\, dx_n.$$

We thus need to approximate $\nabla p_\theta(x_n \mid Y_{1:n-1})$.
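The Robbins-Monro recursion itself can be illustrated on a toy problem where the gradient is observed in noise (the quadratic objective below is an illustrative stand-in for the log-likelihood; the step sizes $\gamma_n = a/n$ satisfy the usual conditions $\sum_n \gamma_n = \infty$, $\sum_n \gamma_n^2 < \infty$):

```python
import numpy as np

def robbins_monro(noisy_grad, theta0, n_iter=5000, a=1.0, rng=None):
    """Stochastic gradient ascent with Robbins-Monro steps gamma_n = a/n."""
    rng = np.random.default_rng() if rng is None else rng
    theta = theta0
    for n in range(1, n_iter + 1):
        theta = theta + (a / n) * noisy_grad(theta, rng)
    return theta

# Toy objective l(theta) = -(theta - 2)^2, gradient observed with N(0,1) noise.
noisy_grad = lambda th, rng: -2.0 * (th - 2.0) + rng.normal()
```

The iterates average out the gradient noise and converge to the maximizer θ = 2.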
Importance Sampling Estimation of Sensitivity  The various proposed approximations are based on the identity

$$\nabla \log p_\theta(Y_{1:n}) = \int \frac{\nabla p_\theta(x_{1:n}, Y_{1:n})}{p_\theta(x_{1:n}, Y_{1:n})}\, p_\theta(x_{1:n} \mid Y_{1:n})\, dx_{1:n} = \int \nabla \log p_\theta(x_{1:n}, Y_{1:n})\, p_\theta(x_{1:n} \mid Y_{1:n})\, dx_{1:n}.$$

We thus can use an SMC approximation of the form

$$\nabla p_\theta(x_{1:n} \mid Y_{1:n}) \approx \frac{1}{N}\sum_{i=1}^{N} \alpha_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad \alpha_n^{(i)} = \nabla \log p_\theta\!\left(X_{1:n}^{(i)} \mid Y_{1:n}\right).$$

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition it relies on being able to obtain a good approximation of $p_\theta(x_{1:n} \mid Y_{1:n})$.
Degeneracy of the SMC Algorithm  Score $\nabla p_\theta(Y_{1:n})$ for linear Gaussian models: exact (blue) vs. SMC approximation (red).
Degeneracy of the SMC Algorithm  Signed measure $\nabla p_\theta(x_n \mid Y_{1:n})$ for linear Gaussian models: exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.
Marginal Importance Sampling for Sensitivity Estimation  An alternative identity is the following:

$$\nabla p_\theta(x_n \mid Y_{1:n}) = \nabla \log p_\theta(x_n \mid Y_{1:n}) \; p_\theta(x_n \mid Y_{1:n}).$$

It suggests using an SMC approximation of the form

$$\nabla p_\theta(x_n \mid Y_{1:n}) \approx \frac{1}{N}\sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \beta_n^{(i)} = \nabla \log p_\theta\!\left(X_n^{(i)} \mid Y_{1:n}\right) = \frac{\nabla p_\theta\!\left(X_n^{(i)} \mid Y_{1:n}\right)}{p_\theta\!\left(X_n^{(i)} \mid Y_{1:n}\right)}.$$

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N²), as

$$p_\theta\!\left(X_n^{(i)} \mid Y_{1:n-1}\right) = \int f_\theta\!\left(X_n^{(i)} \mid x_{n-1}\right) p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1} \approx \frac{1}{N}\sum_{j=1}^{N} f_\theta\!\left(X_n^{(i)} \mid X_{n-1}^{(j)}\right).$$
Marginal Importance Sampling for Sensitivity Estimation  The optimal filter satisfies

$$p_\theta(x_n \mid Y_{1:n}) = \frac{\xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x_n)\, dx_n}, \qquad \xi_{\theta,n}(x_n) = g_\theta(Y_n \mid x_n) \int f_\theta(x_n \mid x_{n-1})\, p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1}.$$

The derivatives satisfy the following:

$$\nabla p_\theta(x_n \mid Y_{1:n}) = \frac{\nabla \xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x_n)\, dx_n} - p_\theta(x_n \mid Y_{1:n})\, \frac{\int \nabla \xi_{\theta,n}(x_n)\, dx_n}{\int \xi_{\theta,n}(x_n)\, dx_n}.$$

This way we obtain a simple recursion for $\nabla p_\theta(x_n \mid Y_{1:n})$ as a function of $\nabla p_\theta(x_{n-1} \mid Y_{1:n-1})$ and $p_\theta(x_{n-1} \mid Y_{1:n-1})$.
SMC Approximation of the Sensitivity  Sample

$$X_n^{(i)} \sim q_\theta(\cdot \mid Y_n, \eta_{n-1}), \qquad q_\theta(x_n \mid Y_n, \eta_{n-1}) = \sum_{j=1}^{N} W_{n-1}^{(j)}\, q_\theta\!\left(x_n \mid Y_n, X_{n-1}^{(j)}\right).$$

Compute

$$\alpha_n^{(i)} = \frac{\xi_{\theta,n}\!\left(X_n^{(i)}\right)}{q_\theta\!\left(X_n^{(i)} \mid Y_n, \eta_{n-1}\right)}, \qquad \rho_n^{(i)} = \frac{\nabla \xi_{\theta,n}\!\left(X_n^{(i)}\right)}{q_\theta\!\left(X_n^{(i)} \mid Y_n, \eta_{n-1}\right)},$$

$$W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j=1}^{N} \alpha_n^{(j)}}, \qquad \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j=1}^{N} \alpha_n^{(j)}} - W_n^{(i)}\, \frac{\sum_{j=1}^{N} \rho_n^{(j)}}{\sum_{j=1}^{N} \alpha_n^{(j)}}.$$

We have

$$\hat{p}_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \nabla \hat{p}_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n).$$
Example: Linear Gaussian Model  Consider the following model:

$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = X_n + \sigma_w W_n, \qquad \theta = (\phi, \sigma_v, \sigma_w).$$

We compare next the SMC estimates of $\nabla \log p_\theta(x_n \mid Y_{1:n})$ and $\nabla p_\theta(Y_{1:n})$ for N = 1000 to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).
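The "exact" reference curves for this model come from the Kalman filter. A sketch of the scalar filter's log-likelihood, with the score in φ obtained by central finite differences (function names and the initialization $X_0 \sim \mathcal{N}(0,1)$ are illustrative assumptions, not from the slides):

```python
import numpy as np

def kalman_loglik(y, phi, sv, sw, m0=0.0, P0=1.0):
    """Exact log p_theta(y_{1:n}) for X_n = phi X_{n-1} + sv V_n,
    Y_n = X_n + sw W_n (scalar Kalman filter, prediction-error decomposition)."""
    m, P, ll = m0, P0, 0.0
    for yk in y:
        mp = phi * m                      # predict
        Pp = phi**2 * P + sv**2
        S = Pp + sw**2                    # innovation variance
        ll += -0.5 * (np.log(2 * np.pi * S) + (yk - mp)**2 / S)
        K = Pp / S                        # update
        m = mp + K * (yk - mp)
        P = (1 - K) * Pp
    return ll

def score_fd(y, phi, sv, sw, h=1e-5):
    """Score in phi by central finite differences (a sketch of a reference value)."""
    return (kalman_loglik(y, phi + h, sv, sw)
            - kalman_loglik(y, phi - h, sv, sw)) / (2 * h)
```

This gives the exact quantities against which the N = 1000 SMC estimates can be checked.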
Example: Linear Gaussian Model  Marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).
Example: Stochastic Volatility Model  Consider the following model:

$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = \beta \exp(X_n/2)\, W_n, \qquad \theta = (\phi, \sigma_v, \beta).$$

We use SMC with N = 1000 for batch and on-line estimation.

For batch estimation: $\theta_{k} = \theta_{k-1} + \gamma_k \nabla \log p_{\theta_{k-1}}(Y_{1:n}).$

For online estimation: $\theta_{n} = \theta_{n-1} + \gamma_n \nabla \log p_{\theta_{n-1}}(Y_n \mid Y_{1:n-1}).$
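Synthetic data from this model are easy to generate for testing either estimation scheme; a minimal simulator (parameter values here are illustrative defaults, not the slide's estimates):

```python
import numpy as np

def simulate_sv(n, phi=0.91, sigma_v=0.3, beta=0.5, rng=None):
    """Simulate X_n = phi X_{n-1} + sigma_v V_n, Y_n = beta exp(X_n/2) W_n,
    starting from the stationary distribution of the AR(1) log-volatility."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n)
    y = np.empty(n)
    xp = rng.normal(0.0, sigma_v / np.sqrt(1 - phi**2))  # stationary start
    for k in range(n):
        xp = phi * xp + sigma_v * rng.normal()
        x[k] = xp
        y[k] = beta * np.exp(xp / 2.0) * rng.normal()
    return x, y
```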
Example: Stochastic Volatility Model  Batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.
Example: Stochastic Volatility Model  Recursive parameter estimation for the SV model. The estimates converge towards the true values.
SMC with MCMC for Parameter Estimation  Given data y_{1:n}, inference relies on

$$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}), \qquad \text{where} \quad p(\theta \mid y_{1:n}) = \frac{p_\theta(y_{1:n})\, p(\theta)}{p(y_{1:n})}.$$

We have seen that SMC is rather inefficient for sampling from $p(\theta, x_{1:n} \mid y_{1:n})$, so we look here at an MCMC approach. For a given parameter value θ, SMC can estimate both $p_\theta(x_{1:n} \mid y_{1:n})$ and $p_\theta(y_{1:n})$.

It would be useful if we could use SMC within MCMC to sample from $p(\theta \mid y_{1:n})$ and $p(\theta, x_{1:n} \mid y_{1:n})$.

- C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010), 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy  Using Gibbs sampling, we can sample iteratively from $p(\theta \mid y_{1:n}, x_{1:n})$ and $p(x_{1:n} \mid y_{1:n}, \theta)$.

However, it is impossible to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$ directly. We can sample from $p(x_k \mid y_{1:n}, \theta, x_{k-1}, x_{k+1})$ instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$; i.e., sample $X^*_{1:n} \sim q(\cdot \mid y_{1:n})$ and accept with probability

$$\min\left\{1,\; \frac{p_\theta(X^*_{1:n} \mid y_{1:n})\, q(X_{1:n} \mid y_{1:n})}{p_\theta(X_{1:n} \mid y_{1:n})\, q(X^*_{1:n} \mid y_{1:n})}\right\}.$$
Gibbs Sampling Strategy  We will use the output of an SMC method approximating $p(x_{1:n} \mid y_{1:n}, \theta)$ as a proposal distribution.

We know that the SMC approximation degrades as n increases, but under mixing assumptions we also have

$$\left\| \mathcal{L}\!\left(X_{1:n}^{(i)}\right) - p_\theta(x_{1:n} \mid y_{1:n}) \right\|_{TV} \le \frac{Cn}{N}.$$

Thus things degrade linearly rather than exponentially. These results are useful for small C.
Configurational Biased Monte Carlo  The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences: CBMC looks like an SMC method, but at each step only one particle is selected, which gives N offspring. Thus if you have made the wrong selection, you cannot recover later on.

- D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC  We use $\hat{p}_{\theta,N}(x_{1:n} \mid y_{1:n})$ as the proposal distribution, i.e. $X^*_{1:n} \sim \hat{p}_{\theta,N}(\cdot \mid y_{1:n})$, and store the associated likelihood estimate $\hat{p}^*_{\theta,N}(y_{1:n})$.

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N-1)n variables.

The solution, inspired by CBMC, is as follows. For the current state $X_{1:n}$ of the Markov chain, run a virtual SMC method with only N-1 free particles; ensure that at each time step k the current state $X_k$ is deterministically selected, and compute the associated estimate $\hat{p}_{\theta,N}(y_{1:n})$.

Finally, accept $X^*_{1:n}$ with probability

$$\min\left\{1,\; \frac{\hat{p}^*_{\theta,N}(y_{1:n})}{\hat{p}_{\theta,N}(y_{1:n})}\right\};$$

otherwise stay where you are.
Example  Consider the following model:

$$X_k = \frac{X_{k-1}}{2} + \frac{25 X_{k-1}}{1 + X_{k-1}^2} + 8\cos(1.2k) + V_k, \qquad Y_k = \frac{X_k^2}{20} + W_k,$$

with $V_k \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma_v^2)$, $W_k \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma_w^2)$ and $X_1 \sim \mathcal{N}(0, 5^2)$.

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
Example  We now consider the case where both variances $\sigma_v^2, \sigma_w^2$ are unknown and are given inverse-Gamma (conditionally conjugate) vague priors.

We sample from $p(\theta, x_{1:1000} \mid y_{1:1000})$ using two strategies:
- The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution; average acceptance rate 0.43.
- An algorithm updating the components one at a time using an MH step of invariant distribution $p(x_k \mid x_{k-1}, x_{k+1}, y_k)$ with the same proposal.

In both cases we update the parameters according to $p(\theta \mid x_{1:500}, y_{1:500})$. Both algorithms are run with the same computational effort.
Example  MH one-at-a-time misses the mode and estimates

$$\mathbb{E}\left[\sigma_v^2 \mid y_{1:500}\right] \approx 13.7 \ \text{(MH one-at-a-time)}, \qquad \mathbb{E}\left[\sigma_v^2 \mid y_{1:500}\right] \approx 9.9 \ \text{(PF-MCMC)}, \qquad \text{true value } \sigma_v^2 = 10.$$

If $X_{1:n}$ and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the marginal Metropolis-Hastings algorithm.
Review of Metropolis-Hastings Algorithm  As a review of MH: the proposal distribution is a Markov chain with kernel density q(x, y) = q(y | x), and the target distribution is π(x).

Algorithm (Generic Metropolis-Hastings Sampler):
Initialization: choose an arbitrary starting value $x^{(0)}$.
Iteration t (t ≥ 1):
1. Given $x^{(t-1)}$, generate $\tilde{x} \sim q(x^{(t-1)}, \cdot)$.
2. Compute
$$\rho(x^{(t-1)}, \tilde{x}) = \min\left\{1,\; \frac{\pi(\tilde{x})\, q(\tilde{x}, x^{(t-1)})}{\pi(x^{(t-1)})\, q(x^{(t-1)}, \tilde{x})}\right\}.$$
3. With probability $\rho(x^{(t-1)}, \tilde{x})$, accept $\tilde{x}$ and set $x^{(t)} = \tilde{x}$; otherwise reject $\tilde{x}$ and set $x^{(t)} = x^{(t-1)}$.

It can easily be shown that π is invariant for the transition kernel, $\pi(x') = \int \pi(x)\, K(x, x')\, dx$, and that $X^{(i)} \sim \pi(x)$ as $i \to \infty$ under weak assumptions.
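The sampler above, specialized to a symmetric random-walk proposal (so the q-ratio cancels in step 2), can be sketched as follows; the target here is a standard normal chosen purely for illustration:

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_iter, step=1.0, rng=None):
    """Random-walk Metropolis-Hastings: symmetric Gaussian proposal,
    acceptance probability min(1, pi(x*)/pi(x)) computed in log space."""
    rng = np.random.default_rng() if rng is None else rng
    x, lp = x0, log_target(x0)
    out = np.empty(n_iter)
    for t in range(n_iter):
        xs = x + step * rng.normal()           # propose x* ~ q(. | x)
        lps = log_target(xs)
        if np.log(rng.uniform()) < lps - lp:   # accept with prob min(1, ratio)
            x, lp = xs, lps
        out[t] = x
    return out
```

For example, `metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)` produces a chain whose empirical moments match the N(0,1) target.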
Marginal Metropolis-Hastings Algorithm  Consider the target

$$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}).$$

We use the following proposal distribution:

$$q\big((\theta^*, x^*_{1:n}) \mid (\theta, x_{1:n})\big) = q(\theta^* \mid \theta)\, p_{\theta^*}(x^*_{1:n} \mid y_{1:n}).$$

Then the acceptance probability becomes

$$\min\left\{1,\; \frac{p(\theta^*, x^*_{1:n} \mid y_{1:n})\, q\big((\theta, x_{1:n}) \mid (\theta^*, x^*_{1:n})\big)}{p(\theta, x_{1:n} \mid y_{1:n})\, q\big((\theta^*, x^*_{1:n}) \mid (\theta, x_{1:n})\big)}\right\} = \min\left\{1,\; \frac{p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta \mid \theta^*)}{p_\theta(y_{1:n})\, p(\theta)\, q(\theta^* \mid \theta)}\right\}.$$

We will use SMC approximations to compute $p_\theta(y_{1:n})$ and $p_\theta(x_{1:n} \mid y_{1:n})$.
Marginal Metropolis-Hastings Algorithm

Step 1: Given $\theta(i-1)$, $X_{1:n}(i-1)$ and $\hat{p}_{\theta(i-1)}(y_{1:n})$, sample $\theta^* \sim q(\cdot \mid \theta(i-1))$ and run an SMC algorithm to obtain $\hat{p}_{\theta^*}(x_{1:n} \mid y_{1:n})$ and $\hat{p}_{\theta^*}(y_{1:n})$.

Step 2: Sample $X^*_{1:n} \sim \hat{p}_{\theta^*}(x_{1:n} \mid y_{1:n})$.

Step 3: With probability

$$\min\left\{1,\; \frac{\hat{p}_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta(i-1) \mid \theta^*)}{\hat{p}_{\theta(i-1)}(y_{1:n})\, p(\theta(i-1))\, q(\theta^* \mid \theta(i-1))}\right\},$$

set $\theta(i) = \theta^*$, $X_{1:n}(i) = X^*_{1:n}$, $\hat{p}_{\theta(i)}(y_{1:n}) = \hat{p}_{\theta^*}(y_{1:n})$; otherwise set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$, $\hat{p}_{\theta(i)}(y_{1:n}) = \hat{p}_{\theta(i-1)}(y_{1:n})$.
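The three steps above can be sketched for a scalar linear-Gaussian model, with a bootstrap particle filter supplying the likelihood estimate; everything below (model, flat prior on φ over (-1,1), parameter values) is an illustrative assumption, not the lecture's example:

```python
import numpy as np

def pf_loglik(y, phi, sv, sw, N=200, rng=None):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n})
    for X_n = phi X_{n-1} + sv V_n, Y_n = X_n + sw W_n."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(0.0, 1.0, N)
    ll = 0.0
    for yk in y:
        x = phi * x + sv * rng.normal(size=N)                 # propagate
        logw = -0.5 * np.log(2 * np.pi * sw**2) - 0.5 * (yk - x)**2 / sw**2
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())                            # likelihood factor
        x = x[rng.choice(N, N, p=w / w.sum())]                # resample
    return ll

def pmmh(y, phi0, n_iter, step=0.1, sv=1.0, sw=0.5, rng=None):
    """Particle marginal MH for phi: random-walk proposal, flat prior on (-1,1),
    acceptance ratio of the stored particle-filter likelihood estimates."""
    rng = np.random.default_rng() if rng is None else rng
    phi, ll = phi0, pf_loglik(y, phi0, sv, sw, rng=rng)
    chain = []
    for _ in range(n_iter):
        phis = phi + step * rng.normal()
        if -1.0 < phis < 1.0:                                 # prior support
            lls = pf_loglik(y, phis, sv, sw, rng=rng)
            if np.log(rng.uniform()) < lls - ll:
                phi, ll = phis, lls                           # accept
        chain.append(phi)
    return np.array(chain)
```

Note that the current likelihood estimate is stored and reused, exactly as in Step 3 above.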
Marginal Metropolis-Hastings Algorithm  This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(θ | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

When N ≥ 1, the algorithm admits exactly p(θ, x_{1:n} | y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n.

The method is useful when X_n is moderate-dimensional and θ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

- Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553, 2007.
- C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
- Ionides, E. L., Breto, C. and King, A. A. (2006), Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443.
OPEN PROBLEMS
Open Problems  SMC can thus be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
- Developing efficient SMC methods when E = ℝ^10000.
- Developing a proper SMC method for recursive Bayesian parameter estimation.
- Developing methods that avoid the O(N²) cost for smoothing and sensitivity estimation.
- Developing methods for solving optimal control problems.
- Establishing weaker assumptions ensuring uniform convergence results.
Other Topics  In standard SMC we only sample X_n at time n, and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

- Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
What about if all distributions π_n, n ≥ 1, are defined on X?  This case appears often, for example:
- Sequential Bayesian estimation: $\pi_n(x) = p(x \mid y_{1:n})$.
- Classical Bayesian inference: $\pi_n(x) = \pi(x)$.
- Global optimization: $\pi_n(x) \propto [\pi(x)]^{\gamma_n}$ with $\gamma_n \to \infty$.

It is not clear how SMC can be related to this problem, since here X_n = X instead of X_n = X^n.
What about if all distributions π_n, n ≥ 1, are defined on X?  This case appears often, for example:
- Sequential Bayesian estimation: $\pi_n(x) = p(x \mid y_{1:n})$.
- Global optimization: $\pi_n(x) \propto [\pi(x)]^{\eta_n}$ with $\eta_n \to \infty$.
- Sampling from a fixed distribution π, where μ(x) is an easy-to-sample distribution. Use the sequence

$$\pi_n(x) \propto \mu(x)^{\eta_n}\, \pi(x)^{1-\eta_n}, \qquad 1 = \eta_1 > \eta_2 > \cdots > \eta_{Final} = 0,$$

so that $\pi_1(x) = \mu(x)$ and $\pi_{Final}(x) \propto \pi(x)$.
What about if all distributions π_n, n ≥ 1, are defined on X?  This case appears often, for example:
- Rare event simulation: for $A \subset E$, simulate with the sequence of nested sets $E = E_1 \supset E_2 \supset \cdots \supset E_{Final} = A$, using $\pi_n(x) \propto \pi(x)\,\mathbf{1}_{E_n}(x)$ with normalizing factor $Z_n$ unknown. The required probability is then $Z_{Final} = \pi(A)$.
- The Boolean satisfiability problem.
- Computing matrix permanents.
- Computing volumes in high dimensions.
What about if all distributions π_n, n ≥ 1, are defined on X?  Apply SMC to an augmented sequence of distributions $\tilde{\pi}_n$ on $X^n$. For example, require

$$\int \tilde{\pi}_n(x_{1:n})\, dx_{1:n-1} = \pi_n(x_n),$$

which is satisfied by

$$\tilde{\pi}_n(x_{1:n}) = \pi_n(x_n)\, \pi_n(x_{1:n-1} \mid x_n),$$

using any conditional distribution on $X^{n-1}$.

- C. Jarzynski, Nonequilibrium equality for free energy differences, Phys. Rev. Lett. 78, 2690-2693 (1997).
- Gavin E. Crooks, Nonequilibrium measurements of free energy differences for microscopically reversible Markovian systems, Journal of Statistical Physics, March 1998, Volume 90, Issue 5-6, pp. 1481-1487.
- W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001.
- Neal, R. M. (2001), Annealed importance sampling, Statistics and Computing, vol. 11, pp. 125-139.
- N. Chopin, A sequential particle filter method for static problems, Biometrika, 2002, 89, 3, pp. 539-551.
- P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian computation, Bayesian Statistics, 2006.
SMC For Static Models  Let $\{\pi_n(x)\}_{n \ge 1}$ be a sequence of probability distributions defined on X such that each $\pi_n(x)$ is known up to a normalizing constant:

$$\pi_n(x) = \frac{\gamma_n(x)}{Z_n}, \qquad \gamma_n \text{ known}, \quad Z_n \text{ unknown}.$$

We are interested in sampling from $\pi_n(x)$ and computing $Z_n$ sequentially.

This is not the same as the standard SMC discussed earlier, which was defined on $X^n$: $\pi_n(x_{1:n}) = p(x_{1:n} \mid y_{1:n})$.
SMC For Static Models  We construct an artificial distribution that is the product of the target distribution from which we want to sample and backward kernels L as follows:

$$\tilde{\pi}_n(x_{1:n}) = \frac{\tilde{\gamma}_n(x_{1:n})}{Z_n}, \qquad \tilde{\gamma}_n(x_{1:n}) = \gamma_n(x_n) \underbrace{\prod_{k=1}^{n-1} L_k(x_{k+1}, x_k)}_{\text{backward transitions}},$$

such that $\int \tilde{\pi}_n(x_{1:n})\, dx_{1:n-1} = \pi_n(x_n)$. The importance weights now become

$$W_n \propto W_{n-1}\, \frac{\tilde{\gamma}_n(x_{1:n})}{\tilde{\gamma}_{n-1}(x_{1:n-1})\, q_n(x_n \mid x_{n-1})} = W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \mid x_{n-1})}.$$
SMC For Static Models  Any MCMC kernel can be used for the proposal q(·|·). Since our interest is only in computing $\pi_n(x_n) = \gamma_n(x_n)/Z_n$, with weights

$$W_n \propto W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \mid x_{n-1})},$$

there is no degeneracy problem.

- P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian computation, Bayesian Statistics, 2006.
Algorithm: SMC For Static Problems
(1) Initialize at time n = 1.
(2) At time n ≥ 2:
- Sample $X_n^{(i)} \sim q_n(\cdot \mid X_{n-1}^{(i)})$ and augment $X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)})$.
- Compute the importance weights

$$W_n^{(i)} \propto W_{n-1}^{(i)}\, \frac{\gamma_n\!\left(X_n^{(i)}\right) L_{n-1}\!\left(X_n^{(i)}, X_{n-1}^{(i)}\right)}{\gamma_{n-1}\!\left(X_{n-1}^{(i)}\right) q_n\!\left(X_n^{(i)} \mid X_{n-1}^{(i)}\right)}.$$

Then the weighted approximation is $\hat{\pi}_n(x) = \sum_{i=1}^{N} W_n^{(i)} \delta_{X_n^{(i)}}(x)$. We finally resample $X_n^{(i)} \sim \hat{\pi}_n(x)$ to obtain $\tilde{\pi}_n(x) = \frac{1}{N}\sum_{i=1}^{N} \delta_{X_n^{(i)}}(x)$.
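The resampling step at the end of the algorithm is usually implemented with a low-variance scheme; a sketch of systematic resampling (one common choice among several, assumed here for illustration):

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Systematic resampling: returns N ancestor indices so that index i
    appears roughly N*W_i times (lower variance than multinomial sampling)."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    N = len(w)
    # one uniform offset, then a regular grid of N points in [0, 1)
    positions = (rng.uniform() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)
```

Each particle is then replaced by its ancestor, and all weights are reset to 1/N.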
SMC For Static Models: Choice of L  A default choice is to first use a π_n-invariant MCMC kernel q_n, and then the corresponding reversed kernel L_{n-1}:

$$L_{n-1}(x_n, x_{n-1}) = \frac{\pi_n(x_{n-1})\, q_n(x_n \mid x_{n-1})}{\pi_n(x_n)}.$$

Using this easy choice, we can simplify the expression for the weights:

$$W_n \propto W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \mid x_{n-1})} \;\Rightarrow\; W_n \propto W_{n-1}\, \frac{\gamma_n(x_{n-1})}{\gamma_{n-1}(x_{n-1})}.$$

This is known as annealed importance sampling. This particular choice has been used in physics and statistics.

- W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001.
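A compact instance of annealed importance sampling with exactly this weight update, estimating the normalizing constant of an unnormalized target; the initial distribution N(0, 3²), the geometric tempering path, and the single MH move per level are illustrative choices:

```python
import numpy as np

def ais_logZ(log_gamma, n_temps=50, N=1000, step=1.0, rng=None):
    """Annealed importance sampling from mu = N(0, 3^2) (normalized) to the
    unnormalized target gamma, with the weight update
    W_n = W_{n-1} * gamma_n(x_{n-1}) / gamma_{n-1}(x_{n-1}) in log space."""
    rng = np.random.default_rng() if rng is None else rng
    s0 = 3.0
    log_mu = lambda x: -0.5 * x**2 / s0**2 - 0.5 * np.log(2 * np.pi * s0**2)
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    log_pi = lambda x, b: (1 - b) * log_mu(x) + b * log_gamma(x)
    x = rng.normal(0.0, s0, N)
    logw = np.zeros(N)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        logw += log_pi(x, b) - log_pi(x, b_prev)     # incremental weight
        xs = x + step * rng.normal(size=N)           # one MH move per level:
        acc = np.log(rng.uniform(size=N)) < log_pi(xs, b) - log_pi(x, b)
        x = np.where(acc, xs, x)                     # a pi_b-invariant kernel
    m = logw.max()
    return m + np.log(np.mean(np.exp(logw - m)))     # log-mean-exp of weights
```

For $\gamma(x) = e^{-x^2/2}$ the true value is $\log Z = \tfrac{1}{2}\log(2\pi) \approx 0.919$, which the estimate recovers closely.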
ONLINE PARAMETER ESTIMATION
Online Bayesian Parameter Estimation  Assume that our state model is defined with some unknown static parameter θ with prior p(θ):

$$X_1 \sim \mu_\theta(\cdot), \qquad X_n \mid (X_{n-1} = x_{n-1}) \sim f_\theta(\cdot \mid x_{n-1}), \qquad Y_n \mid (X_n = x_n) \sim g_\theta(\cdot \mid x_n).$$

Given data y_{1:n}, inference now is based on

$$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}), \qquad p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).$$

We can use standard SMC, but on the extended space Z_n = (X_n, θ_n):

$$\tilde{f}(z_n \mid z_{n-1}) = f_{\theta_{n-1}}(x_n \mid x_{n-1})\, \delta_{\theta_{n-1}}(\theta_n), \qquad \tilde{g}(y_n \mid z_n) = g_{\theta_n}(y_n \mid x_n).$$

Note that θ is a static parameter: it does not evolve with n.
Online Bayesian Parameter Estimation  For fixed θ, using our earlier error estimates,

$$\mathrm{Var}\left[\log \hat{p}_\theta(y_{1:n})\right] \le \frac{Cn}{N}.$$

In a Bayesian context the problem is even more severe, as $p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$.

The exponential stability assumption cannot hold, since θ_n = θ_1. To mitigate this problem, introduce MCMC steps on θ.

- C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian model selection, Proc. IEEE Workshop HOS, 1999.
- P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
- W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001.
Online Bayesian Parameter Estimation  When

$$p(\theta \mid y_{1:n}, x_{1:n}) = p\big(\theta \mid s_n(x_{1:n}, y_{1:n})\big)$$

for a fixed-dimensional sufficient statistic $s_n$, this becomes an elegant algorithm, which however still suffers from the degeneracy problem, since it uses $p(x_{1:n} \mid y_{1:n})$.

As dim(Z_n) = dim(X_n) + dim(θ), such methods are not recommended for high-dimensional θ, especially with vague priors.
Artificial Dynamics for θ  A solution consists of perturbing the location of the particles in a way that does not modify their distribution; i.e., if at time n

$$\theta^{(i)} \sim p(\theta \mid y_{1:n}),$$

then we would like a transition kernel $M_n$ such that if $\bar{\theta}^{(i)} \sim M_n(\cdot \mid \theta^{(i)})$, then $\bar{\theta}^{(i)} \sim p(\theta \mid y_{1:n})$.

In Markov chain language, we want $M_n$ to be $p(\theta \mid y_{1:n})$-invariant.
Using MCMC  There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as $p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$ would need to be known up to a normalizing constant, but $p_\theta(y_{1:n})$ is unknown.

However, we can use a very simple Gibbs update. Indeed, if $(\theta^{(i)}, X_{1:n}^{(i)}) \sim p(\theta, x_{1:n} \mid y_{1:n})$ and we sample $\bar{\theta}^{(i)} \sim p(\theta \mid y_{1:n}, X_{1:n}^{(i)})$, then $(\bar{\theta}^{(i)}, X_{1:n}^{(i)}) \sim p(\theta, x_{1:n} \mid y_{1:n})$; in particular $\bar{\theta}^{(i)} \sim p(\theta \mid y_{1:n})$, since

$$\int p(\theta \mid y_{1:n}, x_{1:n})\, p(x_{1:n} \mid y_{1:n})\, dx_{1:n} = p(\theta \mid y_{1:n}).$$
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Given an approximation at time n-1
Sample set and then approximate
Resample then sample to obtain
24
( )( )1
( ) ( )1~ |i
n
i in n nX f x X
θ minusminus
( )( ) ( )( )1 1 1~ i iin nn XminusX X
( ) ( ) ( )( ) ( )1 1 1
1 1 1 1 1 11
1| i in n
N
n n ni
pN θ
θ δ θminus minus
minus minus minus=
= sum X x y x
( )( )1 1 1~ |i
n n npX x y
( )( ) ( ) ( )( )( )( )
11 1
( )( ) ( )1 1 1
1| |iii
nn n
N ii inn n n n n n
ip W W g y
θθθ δ
minusminus=
= propsum X x y x X
( )( ) ( )1 1| i i
n n npθ θ~ y X
( ) ( ) ( )( )( )1
1 1 11
1| iin n
N
n n ni
pN θ
θ δ θ=
= sum X x y x
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Consider the following model
We set the prior on θ as
In this case
25
( )( )
( )
1 1
21 0
~ 01
~ 01
~ 0
n n v n n
n n w n n
X X V V
Y X W W
X
θ σ
σ
σ
+ += +
= +
N
N
N
~ ( 11)θ minusU
( ) ( ) ( )21 1 ( 11)
12 2 2
12 2
|
n n n n
n n
n n k k n kk k
p m
m x x x
θ θ σ θ
σ σ
minus
minus
minus= =
prop
= = sum sum
x y N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains
SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value
26
True Value
SMCMCMC
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ
according to we are still relying implicitly on the approximation of the joint distribution
As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution
27
( )1 1n np θ | x y( )1 1|n np x y
( )1 1|n np x y
Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ
21
1
1 |n
k nk
Xn =
sum y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Recursive MLE Parameter Estimation The log likelihood can be written as
Here we compute
Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution
28
( )1 1 11
l ( ) log ( ) log |n
n n k kk
p p Yθ θθ minus=
= = sumY Y
( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y
( ) 1 1 |n n n nX Y p xθ minusY
l ( )lim l( ) log ( | ) ( ) ( )n
ng y x dx dy d
n θ θ θθ θ micro λ micro
rarrinfin= = int int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro
algorithm
where
We thus need to approximate
29
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
( ) ( )( ) ( )
1 1 1 1
1 1
( ) | |
| |
n n n n n n n
n n n n n
p Y g Y x p x dx
g Y x p x dx
θ θ θ
θ θ
minus minus
minus
nabla = nabla +
nabla
intint
|Y Y
Y
( ) 1 1|n np xθ minusnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity
We thus can use a SMC approximation of the form
This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of
30
1 1 11 1 1 1 1 1 1
1 1 1
1 1 1 1 1 1 1 1
( )( ) ( )( )
log ( ) ( )
n nn n n n n
n n
n n n n n
pp Y p dp
p p d
θθ θ
θ
θ θ
minusminus minus minus
minus
minus minus minus
nablanabla =
= nabla
int
int
x |Y|Y x |Y xx |Y
x |Y x |Y x
( )( )
1 1 11
( ) ( )1 1 1
1( ) ( )
log ( )
in
Ni
n n n nXi
i in n n
p xN
p
θ
θ
α δ
α
minus=
minus
nabla =
= nabla
sum|Y x
X |Y
1 1 1( )n npθ minusx |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC
approximation (red)
31
1( )npθnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact
(red) vs SMC (bluegreen) Positive and negative particles are mixed
32
1( )n np xθnabla |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation
An alternative identity is the following
It suggests using an SMC approximation of the form
Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as
33
1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y
( )( )
1 1 11
( )( ) ( ) 1 1
1 1 ( )1 1
1( ) ( )
( )log ( )( )
in
Ni
n n n nXi
ii i n n
n n n in n
p xN
p Xp Xp X
θ
θθ
θ
β δ
β
minus=
minusminus
minus
nabla =
nabla= nabla =
sum|Y x
|Y|Y|Y
( ) ( ) ( ) ( )1 1 1 1 1 1 1
1
1( ) ( ) ( ) ( )N
i i i jn n n n n n n n n
jp X f X x p x dx f X X
Nθ θθ θminus minus minus minus minus=
prop = sumint|Y | |Y |
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation
The optimal filter satisfies
The derivatives satisfy the following
This way we obtain a simple recursion of as a function of and
34
( ) ( )
11
1
1 1 1 1 1 1
( )( )( )
( ) | | ( )
n nn n
n n n
n n n n n n n n n
xp xx dx
x g Y x f x x p x dx
θθ
θ
θ θ θ θ
ξξ
ξ minus minus minus minus
=
=
intint
|Y|Y|Y
|Y |Y
111 1
1 1
( )( )( ) ( )( ) ( )
n n nn nn n n n
n n n n n n
x dxxp x p xx dx x dx
θθθ θ
θ θ
ξξξ ξ
nablanablanabla = minus int
int int|Y|Y|Y |Y
|Y Y
1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC Approximation of the Sensitivity Sample
Compute
We have
35
( )
( )( ) ( )
( ) ( )1 1( ) ( )
( )( ) ( )
1( ) ( ) ( ) ( )
( ) ( ) ( )
1 1 1
( ) ( )| |
i ii in n n n
n ni in n n n
Nj
ni iji i i in n
n n n nN N Nj j j
n n nj j j
X Xq X Y q X Y
W W W
θ θ
θ θ
ξ ξα ρ
ρα ρβ
α α α
=
= = =
nabla= =
= = minussum
sum sum sum
Y Y
( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1
1~ | | |
Ni j j j j j
n n n n n n n n nj
X q Y q Y X W p Y Xθ θη ηminus minus minus=
= propsum
( )
( )
( )1
1
( ) ( )1
1
( ) ( )
( ) ( )
in
in
Ni
n n n nXi
Ni i
n n n n nXi
p x W x
p x W x
θ
θ
δ
β δ
=
=
=
nabla =
sum
sum
|Y
|Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Consider the following model
We have
We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)
36
1n n v n
n n w n
X X VY X W
φ σσminus= +
= +
v wθ φ σ σ=
1log ( )n np xθnabla |Y
1( )npθnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the
marginal (bottom)
37
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Consider the following model
We have We use SMC with N=1000 for batch and on-line estimation
For Batch Estimation
For online estimation
38
1
exp2
n n v n
nn n
X X VXY W
φ σ
β
minus= +
=
vθ φ σ β=
11 1log ( )nn n n npθθ θ γ
minusminus= + nabla Y
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar
81-85
39
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates
converge towards the true values
40
SMC with MCMC for Parameter Estimation
Given data y1:n, inference relies on

$p(\theta, x_{1:n}\,|\,y_{1:n}) = p_\theta(x_{1:n}\,|\,y_{1:n})\, p(\theta\,|\,y_{1:n}), \quad \text{where} \quad p(\theta\,|\,y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$

We have seen that SMC is rather inefficient at sampling from $p(\theta\,|\,y_{1:n})$, so we look here at an MCMC approach.

For a given parameter value θ, SMC can estimate both $p_\theta(x_{1:n}\,|\,y_{1:n})$ and $p_\theta(y_{1:n})$.

It will be useful if we can use SMC within MCMC to sample from $p(\theta, x_{1:n}\,|\,y_{1:n})$.
41

C. Andrieu, R. Holenstein and G.O. Roberts, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy
Using Gibbs sampling, we can sample iteratively from $p(\theta\,|\,y_{1:n}, x_{1:n})$ and $p(x_{1:n}\,|\,y_{1:n}, \theta)$.

However, it is impossible to sample from $p(x_{1:n}\,|\,y_{1:n}, \theta)$ exactly.

We can sample from $p(x_k\,|\,y_{1:n}, \theta, x_{1:k-1}, x_{k+1:n})$ instead (one component at a time), but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p(x_{1:n}\,|\,y_{1:n}, \theta)$; i.e., sample $X^*_{1:n} \sim q(\cdot\,|\,y_{1:n})$ and accept with probability

$\min\left\{1,\; \frac{p(X^*_{1:n}\,|\,y_{1:n}, \theta)\; q(X_{1:n}\,|\,y_{1:n})}{p(X_{1:n}\,|\,y_{1:n}, \theta)\; q(X^*_{1:n}\,|\,y_{1:n})}\right\}$
42
Gibbs Sampling Strategy
We will use the output of an SMC method approximating $p(x_{1:n}\,|\,y_{1:n}, \theta)$ as a proposal distribution.

We know that the SMC approximation degrades as n increases, but under mixing assumptions we also have

$\left\| \mathcal{L}\left(X_{1:n}^{(i)}\right) - p(\cdot\,|\,y_{1:n}, \theta) \right\|_{TV} \le \frac{C\, n}{N}$

Thus things degrade linearly rather than exponentially in n. These results are useful for small C.
43
Configurational Biased Monte Carlo
The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences.

CBMC looks like an SMC method, but at each step only one particle is selected, and it gives N offspring. Thus, if you have made the wrong selection, you cannot recover later on.
44

D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC
We use $\hat p^N(x_{1:n}\,|\,y_{1:n}, \theta)$ as proposal distribution: sample $X^*_{1:n} \sim \hat p^N(\cdot\,|\,y_{1:n}, \theta)$ and store the associated likelihood estimate $\hat p^N(y_{1:n}\,|\,\theta)$.

It is impossible to compute analytically the unconditional law of a particle $X^*_{1:n}$, as it would require integrating out (N-1)n variables.

The solution, inspired by CBMC, is as follows. For the current state $X_{1:n}$ of the Markov chain, run a virtual SMC method with only N-1 free particles; ensure that at each time step k the current state $X_k$ is deterministically selected, and compute the associated estimate $\tilde p^N(y_{1:n}\,|\,\theta)$.

Finally, accept $X^*_{1:n}$ with probability

$\min\left\{1,\; \frac{\hat p^N(y_{1:n}\,|\,\theta)}{\tilde p^N(y_{1:n}\,|\,\theta)}\right\}$

Otherwise stay where you are.
45
Example
Consider the following model:

$X_k = \frac{X_{k-1}}{2} + 25\,\frac{X_{k-1}}{1 + X_{k-1}^2} + 8\cos(1.2 k) + V_k, \qquad V_k \sim \mathcal{N}(0, 15)$

$Y_k = \frac{X_k^2}{20} + W_k, \qquad W_k \sim \mathcal{N}(0, 0.01), \qquad X_1 \sim \mathcal{N}(0, 5)$

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
46
Example
We now consider the case where both variances $\sigma_v^2, \sigma_w^2$ are unknown and are given inverse-Gamma (conditionally conjugate) vague priors.

We sample from $p(\theta, x_{1:1000}\,|\,y_{1:1000})$ using two strategies:
- The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution; average acceptance rate 0.43.
- An algorithm updating the components one at a time using an MH step of invariant distribution $p(x_k\,|\,y_k, x_{k-1}, x_{k+1})$ with the same proposal.

In both cases we update the parameters according to $p(\theta\,|\,x_{1:n}, y_{1:n})$. Both algorithms are run with the same computational effort.
47
Example
MH one-at-a-time missed the mode and gives the estimates

$\mathbb{E}\left[\sigma_v^2\,|\,y_{1:500}\right] \approx 13.7 \ (\text{MH one-at-a-time}), \qquad \mathbb{E}\left[\sigma_v^2\,|\,y_{1:500}\right] \approx 9.9 \ (\text{PF-MCMC}), \qquad \sigma_v^2 = 10 \ (\text{true value})$

If $X_{1:n}$ and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
Review of the Metropolis-Hastings Algorithm
As a review of MH: the proposal distribution is a Markov chain with kernel density q(x, y) = q(y|x), and the target distribution is π(x).

It can be easily shown that π is invariant under the transition kernel,

$\pi(x') = \int \pi(x)\, K(x, x')\, dx$

and that $X^{(i)} \sim \pi(x)$ as $i \to \infty$, under weak assumptions.

Algorithm: Generic Metropolis-Hastings Sampler
Initialization: choose an arbitrary starting value $x^{(0)}$.
Iteration t (t ≥ 1):
1. Given $x^{(t-1)}$, generate $\tilde{x} \sim q(x^{(t-1)}, \cdot)$.
2. Compute

$\rho(x^{(t-1)}, \tilde{x}) = \min\left\{1,\; \frac{\pi(\tilde{x})\, q(\tilde{x}, x^{(t-1)})}{\pi(x^{(t-1)})\, q(x^{(t-1)}, \tilde{x})}\right\}$

3. With probability $\rho(x^{(t-1)}, \tilde{x})$, accept $\tilde{x}$ and set $x^{(t)} = \tilde{x}$. Otherwise reject $\tilde{x}$ and set $x^{(t)} = x^{(t-1)}$.
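The generic sampler above can be sketched in a few lines. The standard-normal target and symmetric Gaussian random-walk proposal below are illustrative choices for the sketch, not taken from the slides:

```python
import math, random

def metropolis_hastings(log_target, propose, log_q, x0, n_iter, rng):
    """Generic MH: propose y ~ q(x, .), accept with prob min{1, pi(y)q(y,x) / (pi(x)q(x,y))}."""
    x, chain = x0, []
    for _ in range(n_iter):
        y = propose(x, rng)
        log_ratio = log_target(y) + log_q(y, x) - log_target(x) - log_q(x, y)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y                          # accept; otherwise keep the current state
        chain.append(x)
    return chain

# Illustration: target pi = N(0, 1), symmetric Gaussian random-walk proposal
rng = random.Random(0)
chain = metropolis_hastings(
    log_target=lambda x: -0.5 * x * x,
    propose=lambda x, r: x + r.gauss(0.0, 1.0),
    log_q=lambda x, y: 0.0,                # symmetric proposal: q terms cancel in the ratio
    x0=0.0, n_iter=20000, rng=rng)
mean = sum(chain) / len(chain)
var = sum(c * c for c in chain) / len(chain)
```

Because the proposal is symmetric, the `log_q` terms cancel and the acceptance ratio reduces to the target ratio, as on the slide.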
Marginal Metropolis-Hastings Algorithm
Consider the target

$p(\theta, x_{1:n}\,|\,y_{1:n}) = p(\theta\,|\,y_{1:n})\; p_\theta(x_{1:n}\,|\,y_{1:n})$

We use the following proposal distribution:

$q\left((\theta^*, x^*_{1:n})\,;\,(\theta, x_{1:n})\right) = q(\theta^*\,|\,\theta)\; p_{\theta^*}(x^*_{1:n}\,|\,y_{1:n})$

Then the acceptance probability becomes

$\min\left\{1,\; \frac{p(\theta^*, x^*_{1:n}\,|\,y_{1:n})\; q\left((\theta, x_{1:n});(\theta^*, x^*_{1:n})\right)}{p(\theta, x_{1:n}\,|\,y_{1:n})\; q\left((\theta^*, x^*_{1:n});(\theta, x_{1:n})\right)}\right\} = \min\left\{1,\; \frac{p_{\theta^*}(y_{1:n})\; p(\theta^*)\; q(\theta\,|\,\theta^*)}{p_\theta(y_{1:n})\; p(\theta)\; q(\theta^*\,|\,\theta)}\right\}$

We will use SMC approximations to compute $p_\theta(y_{1:n})$ and $p_\theta(x_{1:n}\,|\,y_{1:n})$.
Marginal Metropolis-Hastings Algorithm
Step 1: Given $\theta(i-1)$, $\hat p_{\theta(i-1)}(y_{1:n})$ and $X_{1:n}(i-1)$:
Sample $\theta^* \sim q(\cdot\,|\,\theta(i-1))$.
Run an SMC algorithm to obtain $\hat p_{\theta^*}(x_{1:n}\,|\,y_{1:n})$ and $\hat p_{\theta^*}(y_{1:n})$.

Step 2: Sample $X^*_{1:n} \sim \hat p_{\theta^*}(\cdot\,|\,y_{1:n})$.

Step 3: With probability

$\min\left\{1,\; \frac{\hat p_{\theta^*}(y_{1:n})\; p(\theta^*)\; q(\theta(i-1)\,|\,\theta^*)}{\hat p_{\theta(i-1)}(y_{1:n})\; p(\theta(i-1))\; q(\theta^*\,|\,\theta(i-1))}\right\}$

set $\theta(i) = \theta^*$, $X_{1:n}(i) = X^*_{1:n}$, $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta^*}(y_{1:n})$. Otherwise set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$, $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta(i-1)}(y_{1:n})$.
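The three steps above can be sketched with a bootstrap particle filter supplying the likelihood estimate. The AR(1)-plus-noise model, the U(-1,1) prior on φ, and all tuning constants below are illustrative assumptions for the sketch:

```python
import math, random

def pf_loglik(y, phi, N, rng):
    """Bootstrap particle filter estimate of log p_phi(y_{1:n}) for the illustrative model
    X_k = phi X_{k-1} + V_k, Y_k = X_k + W_k, with V_k, W_k ~ N(0, 1)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]
    ll, c = 0.0, -0.5 * math.log(2.0 * math.pi)
    for yk in y:
        x = [phi * xi + rng.gauss(0.0, 1.0) for xi in x]      # propagate
        logw = [c - 0.5 * (yk - xi) ** 2 for xi in x]         # g(y_k | x_k)
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        ll += m + math.log(sum(w) / N)                        # log p_hat(y_k | y_{1:k-1})
        x = rng.choices(x, weights=w, k=N)                    # multinomial resampling
    return ll

def pmmh(y, n_iter, N, rng):
    """Marginal MH over phi with a U(-1,1) prior and a Gaussian random-walk proposal."""
    phi = 0.0
    ll = pf_loglik(y, phi, N, rng)
    chain, accepted = [], 0
    for _ in range(n_iter):
        phi_new = phi + rng.gauss(0.0, 0.15)
        if abs(phi_new) < 1.0:                                # stay in the prior support
            ll_new = pf_loglik(y, phi_new, N, rng)
            # flat prior, symmetric proposal: the ratio reduces to the p_hat ratio
            if rng.random() < math.exp(min(0.0, ll_new - ll)):
                phi, ll = phi_new, ll_new
                accepted += 1
        chain.append(phi)
    return chain, accepted / n_iter

# Simulate a short series from phi = 0.8 and run a short chain
rng = random.Random(1)
ys, x = [], 0.0
for k in range(50):
    x = 0.8 * x + rng.gauss(0.0, 1.0)
    ys.append(x + rng.gauss(0.0, 1.0))
chain, rate = pmmh(ys, 100, 100, rng)
```

Note how the stored log-likelihood estimate plays the role of $\hat p_{\theta(i-1)}(y_{1:n})$ in Step 3: it is carried along and only replaced on acceptance.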
Marginal Metropolis-Hastings Algorithm
This algorithm (without sampling X1:n) was proposed as an approximate MCMC algorithm to sample from p(θ|y1:n) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

For any N ≥ 1, the algorithm admits exactly p(θ, x1:n|y1:n) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n.

Useful when Xn is moderate-dimensional and θ is high-dimensional. Admits the plug-and-play property (Ionides et al., 2006).

Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, 2007. On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553.
C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society: Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
Ionides, E. L., Breto, C. and King, A. A. (2006). Inference for nonlinear dynamical systems, Proceedings of the Nat. Acad. of Sciences, 103, 18438-18443.
53

OPEN PROBLEMS

Open Problems
Thus SMC can be successfully used within MCMC.

The major advantage of SMC is that it builds automatically efficient, very high-dimensional proposal distributions based only on low-dimensional proposals.

This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
- Developing efficient SMC methods when you have E = R^10000.
- Developing a proper SMC method for recursive Bayesian parameter estimation.
- Developing methods to avoid O(N^2) cost for smoothing and sensitivity estimation.
- Developing methods for solving optimal control problems.
- Establishing weaker assumptions ensuring uniform convergence results.
54
Other Topics
In standard SMC we only sample Xn at time n and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.
55

Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
What about if all distributions πn, n ≥ 1, are defined on X?
This case can appear often, for example:

Sequential Bayesian estimation: $\pi_n(x) = p(x\,|\,y_{1:n})$

Global optimization: $\pi_n(x) \propto \left[\pi(x)\right]^{\eta_n}, \quad \eta_n \to \infty$

Sampling from a fixed distribution: use a sequence $\pi_n(x) \propto \mu(x)^{\eta_n}\, \pi(x)^{1-\eta_n}$, where μ(x) is an easy-to-sample-from distribution, with $1 = \eta_1 > \eta_2 > \cdots > \eta_{Final} = 0$, i.e. $\pi_1(x) = \mu(x)$ and $\pi_{Final}(x) \propto \pi(x)$.
10
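The tempering construction above can be written down directly as an interpolation of unnormalized log densities. The two Gaussians below are illustrative stand-ins for μ and π, not from the slides:

```python
import math

def log_mu(x):
    """Easy-to-sample reference mu = N(0, 1), up to an additive constant."""
    return -0.5 * x * x

def log_pi(x):
    """Target pi = N(3, 0.5^2), up to an additive constant."""
    return -0.5 * ((x - 3.0) / 0.5) ** 2

def log_gamma(x, eta):
    """Unnormalized log density of the bridging target mu(x)^eta * pi(x)^(1-eta)."""
    return eta * log_mu(x) + (1.0 - eta) * log_pi(x)

# A decreasing schedule 1 = eta_1 > eta_2 > ... > eta_Final = 0
schedule = [1.0 - k / 10.0 for k in range(11)]
```

At η = 1 the bridging density is exactly μ (easy to sample), and at η = 0 it is exactly π (the target), matching the endpoints on the slide.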
What about if all distributions πn, n ≥ 1, are defined on X?
This case can appear often, for example:

Rare event simulation: $\pi_n(x) \propto \pi(x)\, 1_{E_n}(x)$ (normalizing factor $Z_n$ unknown), simulated with a decreasing sequence of sets $\mathbb{X} = E_1 \supset E_2 \supset \cdots \supset E_{Final} = A$.

The required probability is then $Z_{Final} = \pi(A)$.

Other examples: the Boolean satisfiability problem, computing matrix permanents, computing volumes in high dimensions.
11
What about if all distributions πn, n ≥ 1, are defined on X?
Apply SMC to an augmented sequence of distributions $\tilde\pi_n$ on $\mathbb{X}^n$ such that

$\int \tilde\pi_n(x_{1:n})\, dx_{1:n-1} = \pi_n(x_n)$

For example,

$\tilde\pi_n(x_{1:n}) = \pi_n(x_n)\; \underbrace{\pi_n(x_{1:n-1}\,|\,x_n)}_{\text{use any conditional distribution on } \mathbb{X}^{n-1}}$
12

C. Jarzynski, Nonequilibrium equality for free energy differences, Phys. Rev. Lett. 78, 2690-2693 (1997).
Gavin E. Crooks, Nonequilibrium measurements of free energy differences for microscopically reversible Markovian systems, Journal of Statistical Physics, March 1998, Volume 90, Issue 5-6, pp 1481-1487.
W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001.
Neal, R. M. (2001). Annealed importance sampling, Statistics and Computing, vol. 11, pp. 125-139.
N. Chopin, A sequential particle filter method for static models, Biometrika, 2002, 89, 3, pp. 539-551.
P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
SMC For Static Models
Let $\{\pi_n(x)\}_{n \ge 1}$ be a sequence of probability distributions defined on X such that each $\pi_n(x)$ is known up to a normalizing constant:

$\pi_n(x) = \underbrace{Z_n^{-1}}_{\text{unknown}}\; \underbrace{\gamma_n(x)}_{\text{known}}$

We are interested in sampling from $\pi_n(x)$ and computing $Z_n$ sequentially.

This is not the same as the standard SMC discussed earlier, which was defined on $\mathbb{X}^n$ with $\pi_n(x_{1:n}) = p(x_{1:n}\,|\,y_{1:n})$.
13
SMC For Static Models
We will construct an artificial distribution that is the product of the target distribution from which we want to sample and a backward kernel L as follows:

$\tilde\pi_n(x_{1:n}) = \frac{\tilde\gamma_n(x_{1:n})}{Z_n}, \quad \text{where} \quad \tilde\gamma_n(x_{1:n}) = \gamma_n(x_n)\; \underbrace{\prod_{k=1}^{n-1} L_k(x_{k+1}, x_k)}_{\text{backward transitions}}$

such that $\pi_n(x_n) = \int \tilde\pi_n(x_{1:n})\, dx_{1:n-1}$. The importance weights now become

$W_n = \frac{\tilde\gamma_n(x_{1:n})}{q_n(x_{1:n})} = \frac{\gamma_n(x_n) \prod_{k=1}^{n-1} L_k(x_{k+1}, x_k)}{q_{n-1}(x_{1:n-1})\, q_n(x_n\,|\,x_{n-1})} = W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n\,|\,x_{n-1})}$
14
SMC For Static Models

$W_n = W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n\,|\,x_{n-1})}$

Any MCMC kernel can be used for the proposal $q_n(\cdot\,|\,\cdot)$.

Since our interest is in computing only

$\pi_n(x_n) = \int \tilde\pi_n(x_{1:n})\, dx_{1:n-1} = \frac{\gamma_n(x_n)}{Z_n}$

there is no degeneracy problem.
15

P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
Algorithm: SMC For Static Problems
(1) Initialize at time n = 1.
(2) At time n ≥ 2:

Sample $X_n^{(i)} \sim q_n(\cdot\,|\,X_{n-1}^{(i)})$ and augment $X_{1:n}^{(i)} = \left(X_{1:n-1}^{(i)}, X_n^{(i)}\right)$.

Compute the importance weights

$W_n^{(i)} \propto W_{n-1}^{(i)}\, \frac{\gamma_n\left(X_n^{(i)}\right)\, L_{n-1}\left(X_n^{(i)}, X_{n-1}^{(i)}\right)}{\gamma_{n-1}\left(X_{n-1}^{(i)}\right)\, q_n\left(X_n^{(i)}\,|\,X_{n-1}^{(i)}\right)}$

Then the weighted approximation is

$\hat\pi_n(x) = \sum_{i=1}^N W_n^{(i)}\, \delta_{X_n^{(i)}}(x)$

We finally resample from $\hat\pi_n(x)$ to obtain

$\bar\pi_n(x) = \frac{1}{N} \sum_{i=1}^N \delta_{\bar X_n^{(i)}}(x), \qquad \bar X_n^{(i)} \sim \hat\pi_n(x)$
16
SMC For Static Models: Choice of L
A default choice is first using a πn-invariant MCMC kernel qn and then the corresponding reversed kernel Ln-1:

$L_{n-1}(x_n, x_{n-1}) = \frac{\pi_n(x_{n-1})\, q_n(x_n\,|\,x_{n-1})}{\pi_n(x_n)}$

Using this easy choice, we can simplify the expression for the weights:

$W_n = W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_n\,|\,x_{n-1})} = W_{n-1}\, \frac{\gamma_n(x_n)\, \pi_n(x_{n-1})}{\gamma_{n-1}(x_{n-1})\, \pi_n(x_n)} \;\Rightarrow\; W_n = W_{n-1}\, \frac{\gamma_n(x_{n-1})}{\gamma_{n-1}(x_{n-1})}$

This is known as annealed importance sampling. This particular choice has been used in physics and statistics.
17

W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001.
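With this choice of reversed kernel the whole sampler needs only pointwise evaluation of the γn. A minimal sketch of the resulting annealed SMC sampler follows; the Gaussian tempering targets and all tuning constants are illustrative assumptions, not from the slides:

```python
import math, random

def log_gamma(x, n, steps=10):
    """Bridging targets gamma_n = mu^eta_n * pi^(1-eta_n), mu = N(0,1), pi = N(3, 0.5^2)."""
    eta = 1.0 - n / steps
    return eta * (-0.5 * x * x) + (1.0 - eta) * (-0.5 * ((x - 3.0) / 0.5) ** 2)

def smc_sampler(N, steps, rng, mh_moves=5, step_sd=0.5):
    xs = [rng.gauss(0.0, 1.0) for _ in range(N)]          # exact draws from pi_1 = mu
    for n in range(1, steps + 1):
        # annealed-IS incremental weight: gamma_n(x_{n-1}) / gamma_{n-1}(x_{n-1})
        logw = [log_gamma(x, n) - log_gamma(x, n - 1) for x in xs]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        xs = rng.choices(xs, weights=w, k=N)              # multinomial resampling
        for _ in range(mh_moves):                         # pi_n-invariant MH moves
            for i in range(N):
                prop = xs[i] + rng.gauss(0.0, step_sd)
                if rng.random() < math.exp(min(0.0, log_gamma(prop, n) - log_gamma(xs[i], n))):
                    xs[i] = prop
    return xs

rng = random.Random(0)
xs = smc_sampler(N=500, steps=10, rng=rng)
mean = sum(xs) / len(xs)        # should end up near the target mean, 3.0
```

The weight line is exactly the simplified AIS weight from the slide; the MH sweep plays the role of the πn-invariant kernel qn.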
18
ONLINE PARAMETER ESTIMATION
Online Bayesian Parameter Estimation
Assume that our state model is defined with some unknown static parameter θ with some prior p(θ):

$X_1 \sim \mu_\theta(\cdot) \quad \text{and} \quad X_n\,|\,(X_{n-1} = x_{n-1}) \sim f_\theta(\cdot\,|\,x_{n-1})$

$Y_n\,|\,(X_n = x_n) \sim g_\theta(\cdot\,|\,x_n)$

Given data y1:n, inference now is based on

$p(\theta, x_{1:n}\,|\,y_{1:n}) = p_\theta(x_{1:n}\,|\,y_{1:n})\, p(\theta\,|\,y_{1:n}), \quad \text{where} \quad p(\theta\,|\,y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$

We can use standard SMC, but on the extended space Zn = (Xn, θn):

$f\left((x_n, \theta_n)\,|\,(x_{n-1}, \theta_{n-1})\right) = f_{\theta_{n-1}}(x_n\,|\,x_{n-1})\, \delta_{\theta_{n-1}}(\theta_n), \qquad g\left(y_n\,|\,(x_n, \theta_n)\right) = g_{\theta_n}(y_n\,|\,x_n)$

Note that θ is a static parameter; it does not evolve with n.
19
Online Bayesian Parameter Estimation
For fixed θ, using our earlier error estimates,

$\mathrm{Var}\left[\log \hat p_\theta(y_{1:n})\right] \le \frac{C\, n}{N}$

In a Bayesian context the problem is even more severe, as

$p(\theta\,|\,y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$

The exponential stability assumption cannot hold, since θn = θ1 for all n.

To mitigate this problem, introduce MCMC steps on θ.
20

C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001.
Online Bayesian Parameter Estimation
When

$p(\theta\,|\,y_{1:n}, x_{1:n}) = p\left(\theta\,|\,\underbrace{s_n(x_{1:n}, y_{1:n})}_{\text{fixed-dimensional}}\right)$

this becomes an elegant algorithm that, however, still has the degeneracy problem, since it uses the SMC approximation of $p(x_{1:n}\,|\,y_{1:n})$.

As dim(Zn) = dim(Xn) + dim(θ), such methods are not recommended for high-dimensional θ, especially with vague priors.
21
Artificial Dynamics for θ
A solution consists of perturbing the location of the particles in a way that does not modify their distributions; i.e., if at time n

$\theta^{(i)} \sim p(\theta\,|\,y_{1:n})$

then we would like a transition kernel $M_n$ such that, if $\theta^{(i)\prime} \sim M_n(\theta^{(i)}, \cdot)$, then

$\theta^{(i)\prime} \sim p(\theta\,|\,y_{1:n})$

In Markov chain language, we want $M_n(\theta, \theta')$ to leave $p(\theta\,|\,y_{1:n})$ invariant.
22
Using MCMC
There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as $p(\theta\,|\,y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$ would need to be known up to a normalizing constant, but $p_\theta(y_{1:n})$ is unknown.

However, we can use a very simple Gibbs update. Indeed, note that if

$\left(\theta^{(i)}, X_{1:n}^{(i)}\right) \sim p(\theta, x_{1:n}\,|\,y_{1:n})$

then, sampling $\theta^{(i)\prime} \sim p(\theta\,|\,y_{1:n}, X_{1:n}^{(i)})$, we have $\left(\theta^{(i)\prime}, X_{1:n}^{(i)}\right) \sim p(\theta, x_{1:n}\,|\,y_{1:n})$ and in particular $\theta^{(i)\prime} \sim p(\theta\,|\,y_{1:n})$. Indeed, note that

$p(\theta\,|\,y_{1:n}) = \int p(\theta\,|\,x_{1:n}, y_{1:n})\, p(x_{1:n}\,|\,y_{1:n})\, dx_{1:n}$
23
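The invariance argument above can be checked exactly on a toy discrete joint (the probability table below is arbitrary, for illustration only): pushing p(θ, x) through the kernel that resamples θ from its full conditional p(θ|x) returns the same joint.

```python
# Toy joint p(theta, x) on {0,1} x {0,1}; the numbers are arbitrary
p = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def cond_theta(x):
    """Full conditional p(theta | x) as a dict over theta."""
    z = sum(p[(t, x)] for t in (0, 1))
    return {t: p[(t, x)] / z for t in (0, 1)}

# Push p through the Gibbs kernel (theta, x) -> (theta', x), theta' ~ p(. | x)
q = {k: 0.0 for k in p}
for (t, x), mass in p.items():
    for t_new, w in cond_theta(x).items():
        q[(t_new, x)] += mass * w
# q now equals p exactly: the Gibbs kernel leaves p(theta, x) invariant
```

This is the discrete analogue of the marginalization identity on the slide: summing the kernel against the joint reproduces the joint.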
SMC with MCMC for Parameter Estimation
Given an approximation at time n-1,

$\hat p(\theta, x_{1:n-1}\,|\,y_{1:n-1}) = \frac{1}{N} \sum_{i=1}^N \delta_{\left(\theta^{(i)},\, X_{1:n-1}^{(i)}\right)}(\theta, x_{1:n-1})$

Sample $X_n^{(i)} \sim f_{\theta^{(i)}}(\cdot\,|\,X_{n-1}^{(i)})$, set $X_{1:n}^{(i)} = \left(X_{1:n-1}^{(i)}, X_n^{(i)}\right)$, and then approximate

$\hat p(\theta, x_{1:n}\,|\,y_{1:n}) = \sum_{i=1}^N W_n^{(i)}\, \delta_{\left(\theta^{(i)},\, X_{1:n}^{(i)}\right)}(\theta, x_{1:n}), \qquad W_n^{(i)} \propto g_{\theta^{(i)}}\left(y_n\,|\,X_n^{(i)}\right)$

Resample $\bar X_{1:n}^{(i)} \sim \hat p(x_{1:n}\,|\,y_{1:n})$, then sample $\theta^{(i)} \sim p(\theta\,|\,y_{1:n}, \bar X_{1:n}^{(i)})$ to obtain

$\bar p(\theta, x_{1:n}\,|\,y_{1:n}) = \frac{1}{N} \sum_{i=1}^N \delta_{\left(\theta^{(i)},\, \bar X_{1:n}^{(i)}\right)}(\theta, x_{1:n})$
24
SMC with MCMC for Parameter Estimation
Consider the following model:

$X_{n+1} = \theta X_n + \sigma_v V_{n+1}, \qquad V_n \sim \mathcal{N}(0, 1)$

$Y_n = X_n + \sigma_w W_n, \qquad W_n \sim \mathcal{N}(0, 1), \qquad X_1 \sim \mathcal{N}(0, \sigma_0^2)$

We set the prior on θ as $\theta \sim \mathcal{U}(-1, 1)$.

In this case,

$p(\theta\,|\,x_{1:n}, y_{1:n}) \propto \mathcal{N}(\theta;\, m_n, \sigma_n^2)\; 1_{(-1,1)}(\theta), \qquad \sigma_n^2 = \frac{\sigma_v^2}{\sum_{k=1}^{n-1} x_k^2}, \qquad m_n = \frac{\sum_{k=1}^{n-1} x_k\, x_{k+1}}{\sum_{k=1}^{n-1} x_k^2}$
25
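The Gibbs update for θ in this example can be sketched directly from the truncated-normal conditional above, using simple rejection sampling against the (-1, 1) truncation. The test trajectory below is an illustrative made-up path:

```python
import math, random

def sample_theta(x, sigma_v, rng):
    """Draw theta ~ N(theta; m_n, sigma_n^2) restricted to (-1, 1), with
    sigma_n^2 = sigma_v^2 / sum x_k^2 and m_n = sum x_k x_{k+1} / sum x_k^2."""
    sxx = sum(xk * xk for xk in x[:-1])
    sxy = sum(a * b for a, b in zip(x[:-1], x[1:]))
    m, s = sxy / sxx, sigma_v / math.sqrt(sxx)
    while True:                       # rejection against the (-1, 1) truncation
        th = rng.gauss(m, s)
        if -1.0 < th < 1.0:
            return th

x_path = [1.0, 0.5, 0.25, 0.125]      # for this path m_n = 0.5 exactly
theta = sample_theta(x_path, 0.001, random.Random(0))
```

With a very small σv the conditional concentrates sharply at m_n, so the draw lands essentially at 0.5; rejection sampling is only efficient when m_n is not far outside (-1, 1).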
SMC with MCMC for Parameter Estimation
We use SMC with the Gibbs step. The degeneracy problem remains.

The SMC estimate of E[θ|y1:n] as n increases is shown (from A. Doucet's lecture notes); the plot compares the true value with the SMC/MCMC estimate. The parameter converges to the wrong value.
26
SMC with MCMC for Parameter Estimation
The problem with this approach is that, although we move θ according to $p(\theta\,|\,x_{1:n}, y_{1:n})$, we are still relying implicitly on the SMC approximation of the joint distribution $p(x_{1:n}\,|\,y_{1:n})$.

As n increases, the SMC approximation of $p(x_{1:n}\,|\,y_{1:n})$ deteriorates, and the algorithm cannot converge towards the right solution.
27

Figure: the sufficient statistic $\frac{1}{n} \sum_{k=1}^n \mathbb{E}\left[X_k^2\,|\,y_{1:n}\right]$ computed exactly through the Kalman filter (blue) vs its SMC approximation (red), for a fixed value of θ.
Recursive MLE Parameter Estimation
The log-likelihood can be written as

$\ell_n(\theta) = \log p_\theta(Y_{1:n}) = \sum_{k=1}^n \log p_\theta(Y_k\,|\,Y_{1:k-1})$

Here we compute

$p_\theta(Y_k\,|\,Y_{1:k-1}) = \int g_\theta(Y_k\,|\,x_k)\, p_\theta(x_k\,|\,Y_{1:k-1})\, dx_k$

Under regularity assumptions, $\left\{X_n, Y_n, p_\theta(x_n\,|\,Y_{1:n-1})\right\}$ is an inhomogeneous Markov chain which converges towards its invariant distribution, so that the time-averaged log-likelihood converges:

$\ell(\theta) = \lim_{n \to \infty} \frac{\ell_n(\theta)}{n}$

an expectation of the one-step-ahead predictive log-likelihood with respect to this invariant distribution.
28
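For a linear Gaussian model the prediction decomposition of ℓn(θ) above can be computed exactly with the Kalman filter. The model parameterization and test numbers in this sketch are illustrative assumptions:

```python
import math

def kalman_loglik(y, phi, sv2, sw2):
    """log p(y_{1:n}) = sum_k log p(y_k | y_{1:k-1}) for
    X_k = phi X_{k-1} + V_k, V_k ~ N(0, sv2); Y_k = X_k + W_k, W_k ~ N(0, sw2);
    X_1 ~ N(0, sv2)."""
    m, P = 0.0, sv2                  # predictive mean and variance of X_1
    ll = 0.0
    for yk in y:
        S = P + sw2                  # variance of Y_k given y_{1:k-1}
        ll += -0.5 * (math.log(2.0 * math.pi * S) + (yk - m) ** 2 / S)
        K = P / S                    # Kalman gain
        m, P = m + K * (yk - m), (1.0 - K) * P     # filter update
        m, P = phi * m, phi * phi * P + sv2        # predict X_{k+1}
    return ll

# Sanity check: with phi = 0 the Y_k are i.i.d. N(0, sv2 + sw2)
y = [0.3, -1.2, 0.5]
ll = kalman_loglik(y, 0.0, 1.0, 0.5)
```

Each loop iteration adds exactly one term log p_θ(Y_k | Y_{1:k-1}) of the decomposition, which is what the SMC method approximates in the nonlinear case.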
Robbins-Monro Algorithm for MLE
We can maximize ℓ(θ) by using the gradient and the Robbins-Monro algorithm:

$\theta_{n+1} = \theta_n + \gamma_{n+1}\, \nabla \log p_{\theta_n}(Y_n\,|\,Y_{1:n-1})$

where

$\nabla p_\theta(Y_n\,|\,Y_{1:n-1}) = \int \nabla g_\theta(Y_n\,|\,x_n)\, p_\theta(x_n\,|\,Y_{1:n-1})\, dx_n + \int g_\theta(Y_n\,|\,x_n)\, \nabla p_\theta(x_n\,|\,Y_{1:n-1})\, dx_n$

We thus need to approximate $\nabla p_\theta(x_n\,|\,Y_{1:n-1})$.
29
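A minimal Robbins-Monro sketch with a synthetic noisy gradient; the quadratic objective and step-size constants are illustrative assumptions (in the setting of the slides, the noisy gradient would instead come from the SMC sensitivity approximation):

```python
import random

def robbins_monro(noisy_grad, theta0, n_iter, rng, a=0.5, alpha=0.6):
    """theta_{n+1} = theta_n + gamma_{n+1} * (noisy gradient), gamma_n = a / n^alpha.
    alpha in (1/2, 1] gives sum gamma_n = inf and sum gamma_n^2 < inf."""
    theta = theta0
    for n in range(1, n_iter + 1):
        theta += (a / n ** alpha) * noisy_grad(theta, rng)
    return theta

# Noisy gradient of l(theta) = -(theta - 2)^2 / 2, i.e. (2 - theta) plus N(0, 1) noise
rng = random.Random(0)
theta_hat = robbins_monro(lambda t, r: (2.0 - t) + r.gauss(0.0, 1.0), 0.0, 5000, rng)
```

The decreasing step sizes average out the gradient noise while still allowing the iterates to travel arbitrarily far, which is why the recursion homes in on the maximizer θ = 2.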
Importance Sampling Estimation of Sensitivity
The various proposed approximations are based on the identity

$\nabla p_\theta(x_{1:n}\,|\,Y_{1:n}) = \frac{\nabla p_\theta(x_{1:n}\,|\,Y_{1:n})}{p_\theta(x_{1:n}\,|\,Y_{1:n})}\, p_\theta(x_{1:n}\,|\,Y_{1:n}) = \nabla \log p_\theta(x_{1:n}\,|\,Y_{1:n})\; p_\theta(x_{1:n}\,|\,Y_{1:n})$

We thus can use an SMC approximation of the form

$\widehat{\nabla p}_\theta(x_{1:n}\,|\,Y_{1:n}) = \frac{1}{N} \sum_{i=1}^N \alpha_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad \alpha_n^{(i)} = \nabla \log p_\theta\left(X_{1:n}^{(i)}\,|\,Y_{1:n}\right)$

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of $p_\theta(x_{1:n}\,|\,Y_{1:n})$.
30
Degeneracy of the SMC Algorithm
Score $\nabla p_\theta(Y_{1:n})$ for linear Gaussian models: exact (blue) vs SMC approximation (red).
31
Degeneracy of the SMC Algorithm
Signed measure $\nabla p_\theta(x_n\,|\,Y_{1:n})$ for linear Gaussian models: exact (red) vs SMC (blue/green). Positive and negative particles are mixed.
32
Marginal Importance Sampling for Sensitivity Estimation
An alternative identity is the following:

$\nabla p_\theta(x_n\,|\,Y_{1:n}) = \nabla \log p_\theta(x_n\,|\,Y_{1:n})\; p_\theta(x_n\,|\,Y_{1:n})$

It suggests using an SMC approximation of the form

$\widehat{\nabla p}_\theta(x_n\,|\,Y_{1:n}) = \frac{1}{N} \sum_{i=1}^N \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \beta_n^{(i)} = \nabla \log p_\theta\left(X_n^{(i)}\,|\,Y_{1:n}\right) = \frac{\nabla p_\theta\left(X_n^{(i)}\,|\,Y_{1:n}\right)}{p_\theta\left(X_n^{(i)}\,|\,Y_{1:n}\right)}$

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N^2), as

$p_\theta\left(X_n^{(i)}\,|\,Y_{1:n-1}\right) = \int f_\theta\left(X_n^{(i)}\,|\,x_{n-1}\right)\, p_\theta(x_{n-1}\,|\,Y_{1:n-1})\, dx_{n-1} \approx \frac{1}{N} \sum_{j=1}^N f_\theta\left(X_n^{(i)}\,|\,X_{n-1}^{(j)}\right)$
33
Marginal Importance Sampling for Sensitivity Estimation
The optimal filter satisfies

$p_\theta(x_n\,|\,Y_{1:n}) = \frac{\xi_\theta(x_n)}{\int \xi_\theta(x_n)\, dx_n}, \qquad \xi_\theta(x_n) = g_\theta(Y_n\,|\,x_n) \int f_\theta(x_n\,|\,x_{n-1})\, p_\theta(x_{n-1}\,|\,Y_{1:n-1})\, dx_{n-1}$

The derivatives satisfy the following:

$\nabla p_\theta(x_n\,|\,Y_{1:n}) = \frac{\nabla \xi_\theta(x_n)}{\int \xi_\theta(x_n)\, dx_n} - p_\theta(x_n\,|\,Y_{1:n})\, \frac{\int \nabla \xi_\theta(x_n)\, dx_n}{\int \xi_\theta(x_n)\, dx_n}$

This way we obtain a simple recursion for $\nabla p_\theta(x_n\,|\,Y_{1:n})$ as a function of $\nabla p_\theta(x_{n-1}\,|\,Y_{1:n-1})$ and $p_\theta(x_{n-1}\,|\,Y_{1:n-1})$.
34
SMC Approximation of the Sensitivity
Sample from the mixture proposal

$X_n^{(i)} \sim \eta_n(\cdot), \qquad \eta_n(\cdot) = \sum_{j=1}^N W_{n-1}^{(j)}\, q\left(\cdot\,|\,Y_n, X_{n-1}^{(j)}\right)$

Compute

$\rho_n^{(i)} = \frac{\xi_\theta\left(X_n^{(i)}\right)}{\eta_n\left(X_n^{(i)}\right)}, \qquad \alpha_n^{(i)} = \frac{\nabla \xi_\theta\left(X_n^{(i)}\right)}{\xi_\theta\left(X_n^{(i)}\right)}, \qquad W_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j=1}^N \rho_n^{(j)}}, \qquad \beta_n^{(i)} = W_n^{(i)} \left(\alpha_n^{(i)} - \sum_{j=1}^N W_n^{(j)}\, \alpha_n^{(j)}\right)$

We have

$\hat p_\theta(x_n\,|\,Y_{1:n}) = \sum_{i=1}^N W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \widehat{\nabla p}_\theta(x_n\,|\,Y_{1:n}) = \sum_{i=1}^N \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n)$
35
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Consider the following model
We have
We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)
36
1n n v n
n n w n
X X VY X W
φ σσminus= +
= +
v wθ φ σ σ=
1log ( )n np xθnabla |Y
1( )npθnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the
marginal (bottom)
37
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Consider the following model
We have We use SMC with N=1000 for batch and on-line estimation
For Batch Estimation
For online estimation
38
1
exp2
n n v n
nn n
X X VXY W
φ σ
β
minus= +
=
vθ φ σ β=
11 1log ( )nn n n npθθ θ γ
minusminus= + nabla Y
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar
81-85
39
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates
converge towards the true values
40
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Given data y1n inference relies on
We have seen that SMC are rather inefficient to sample from
so we look here at an MCMC approach For a given parameter value θ SMC can estimate both
It will be useful if we can use SMC within MCMC to sample from
41
( ) ( ) ( )
( ) ( ) ( )
1 1 1 1 1
1 1
| | |
|
n n n n n
n n
p p pwherep p p
θ
θ
θ θ
θ θ
=
=
x y y x y
y y
( ) ( )1 1 1|n n np and pθ θx y y
( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R
Statist SocB (2010) 72 Part 3 pp 269ndash342
( )1| np θ y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from
and
However it is impossible to sample from
We can sample from instead but convergence will be slow
Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability
42
( )1 1| n np θ x y( )1 1| n np θx y
( )1 1| n np θx y
( )1 1 1 1| k n k k np x θ minus +y x x
( )1 1| n np θx y ( )1 1 1~ |n n nqX x y
( ) ( )( ) ( )
1 1 1 1
1 1 1 1
| |
| m n
|i 1 n n n n
n n n n
p q
p p
θ
θ
X y X y
X y X y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution
We know that the SMC approximation degrades an
n increases but we also have under mixing assumptions
Thus things degrade linearly rather than exponentially
These results are useful for small C
43
( )1 1| n np θx y
( )1 1| n np θx y
( )( )1 1 1( ) | i
n n n TV
Cnaw pN
θminus leX x yL
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method
(CBMC) in Molecular simulation There are however significant differences
CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on
44
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
Bayesian Scientific Computing Spring 2013 (N Zabaras)
MCMC We use as proposal distribution and store
the estimate
It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables
The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate
Finally accept with probability Otherwise stay where you are
45
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
( )1 1 1~ | n n nNp θX x y
( )1 |nNp θy
1nX
( )1 |nNp θy1nX ( ) ( )
1
1
m|
in 1|nN
nN
pp
θθ
yy
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Consider the following model
We take the same prior proposal distribution for both CBMC and
SMC Acceptance rates for CBMC (left) and SMC (right) are shown
46
( )
( ) ( )
11 2
12
1
1 25 8cos(12 ) ~ 0152 1
~ 0001 ~ 052
kk k k k
k
kn k k
XX X k V VX
XY W W X
minusminus
minus
= + + ++
= +
N
N N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example We now consider the case when both variances are
unknown and given inverse Gamma (conditionally conjugate) vague priors
We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as
importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step
of invariance distribution with the same proposal
In both cases we update the parameters according to Both algorithms are run with the same computational effort
47
2 2v wσ σ
( )11000 11000 |p θx y
( )1 1 k k k kp x x x yminus +|
( )1500 1500| p θ x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example MH one at a time missed the mode and estimates
If X1n and θ are very correlated the MCMC algorithm is very
inefficient We will next discuss the Marginal Metropolis Hastings
21500
21500
2
| 137 ( )
| 99 ( )
10
v
v
v
MH
PF MCMC
σ
σ
σ
asymp asymp minus =
y
y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with
kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)
It can be easily shown that and
under weak assumptions
Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)
1 Given 119909119905minus1 generate 2 Compute
3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Marginal Metropolis-Hastings Algorithm

Step 1: Given (θ(i-1), X_{1:n}(i-1), p̂_{θ(i-1)}(y_{1:n})), sample θ* ~ q(θ | θ(i-1)) and run an SMC algorithm to obtain p̂_{θ*}(x_{1:n} | y_{1:n}) and p̂_{θ*}(y_{1:n}).

Step 2: Sample X*_{1:n} ~ p̂_{θ*}(x_{1:n} | y_{1:n}).

Step 3: With probability

min{ 1, [p̂_{θ*}(y_{1:n}) p(θ*) q(θ(i-1) | θ*)] / [p̂_{θ(i-1)}(y_{1:n}) p(θ(i-1)) q(θ* | θ(i-1))] }

set θ(i) = θ*, X_{1:n}(i) = X*_{1:n}, p̂_{θ(i)}(y_{1:n}) = p̂_{θ*}(y_{1:n}); otherwise set θ(i) = θ(i-1), X_{1:n}(i) = X_{1:n}(i-1), p̂_{θ(i)}(y_{1:n}) = p̂_{θ(i-1)}(y_{1:n}).
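The three steps above can be sketched compactly. This is a minimal particle marginal MH sketch under simplifying assumptions: a hypothetical linear-Gaussian model X_n = φX_{n-1} + V_n, Y_n = X_n + W_n with unit noise variances, a uniform prior on φ, and a bootstrap particle filter supplying the estimate p̂_θ(y_{1:n}); for brevity only θ is stored, not the sampled path X_{1:n}.

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_pf_loglik(phi, y, N=200):
    """Bootstrap particle filter; returns the estimate log p_theta(y_{1:n})."""
    x = rng.standard_normal(N)                    # X_1 ~ N(0, 1)
    loglik = 0.0
    for yk in y:
        logw = -0.5 * (yk - x) ** 2 - 0.5 * np.log(2 * np.pi)  # g(y | x)
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())            # log of average weight
        idx = rng.choice(N, size=N, p=w / w.sum())  # resample
        x = phi * x[idx] + rng.standard_normal(N)   # propagate
    return loglik

def pmmh(y, n_iters=200, step=0.1):
    """Marginal MH on phi; PF estimates replace the exact likelihoods."""
    phi, ll = 0.0, bootstrap_pf_loglik(0.0, y)
    chain = []
    for _ in range(n_iters):
        phi_prop = phi + step * rng.standard_normal()
        if abs(phi_prop) < 1.0:                   # uniform prior support
            ll_prop = bootstrap_pf_loglik(phi_prop, y)
            if np.log(rng.uniform()) < ll_prop - ll:
                phi, ll = phi_prop, ll_prop
        chain.append(phi)
    return np.array(chain)

# Simulate data with phi = 0.8 and run the sampler.
true_phi, T, x = 0.8, 100, 0.0
ys = []
for _ in range(T):
    x = true_phi * x + rng.standard_normal()
    ys.append(x + rng.standard_normal())
chain = pmmh(np.array(ys))
```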
Marginal Metropolis-Hastings Algorithm

This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(θ | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

When N ≥ 1, the algorithm admits exactly p(θ, x_{1:n} | y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n.

Useful when X_n is moderate-dimensional and θ is high-dimensional. Admits the plug-and-play property (Ionides et al., 2006).

Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553, 2007.
C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
E. L. Ionides, C. Breto and A. A. King (2006), Inference for nonlinear dynamical systems, Proceedings of the Nat. Acad. of Sciences, 103, 18438-18443.
OPEN PROBLEMS
Open Problems

Thus SMC can be successfully used within MCMC.

The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but very helpful in challenging problems.

Several open problems remain:
• Developing efficient SMC methods when E = R^10000.
• Developing a proper SMC method for recursive Bayesian parameter estimation.
• Developing methods to avoid the O(N²) cost for smoothing and sensitivity estimation.
• Developing methods for solving optimal control problems.
• Establishing weaker assumptions ensuring uniform convergence results.
Other Topics

In standard SMC we only sample X_n at time n, and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

• Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
What about if all distributions πn, n ≥ 1, are defined on X?

This case can appear often, for example:

Rare Event Simulation: take π_n(x) ∝ π(x) 1_{E_n}(x), where π has a known normalizing factor and the sets satisfy X = E_1 ⊃ E_2 ⊃ ... ⊃ E_Final = A, with π(A) ≪ 1. The required probability is then Z_Final = π(A).

Other examples: the Boolean satisfiability problem, computing matrix permanents, and computing volumes in high dimensions.
What about if all distributions πn, n ≥ 1, are defined on X?

Apply SMC to an augmented sequence of distributions π̃_n on X^n satisfying

∫ π̃_n(x_{1:n-1}, x_n) dx_{1:n-1} = π_n(x_n).

For example,

π̃_n(x_{1:n}) = π_n(x_n) λ_n(x_{1:n-1} | x_n),

where λ_n is any conditional distribution on X^{n-1}.

• C. Jarzynski, Nonequilibrium Equality for Free Energy Differences, Phys. Rev. Lett. 78, 2690-2693 (1997).
• Gavin E. Crooks, Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible Markovian Systems, Journal of Statistical Physics, March 1998, Volume 90, Issue 5-6, pp 1481-1487.
• W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
• Neal, R. M. (2001), "Annealed importance sampling", Statistics and Computing, vol. 11, pp. 125-139.
• N. Chopin, A sequential particle filter method for static problems, Biometrika, 2002, 89, 3, pp. 539-551.
• P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
SMC For Static Models

Let (π_n(x))_{n≥1} be a sequence of probability distributions defined on X, such that each π_n(x) is known up to a normalizing constant:

π_n(x) = Z_n^{-1} γ_n(x),   with Z_n unknown and γ_n(x) known.

We are interested in sampling from π_n(x) and computing Z_n sequentially.

This is not the same as the standard SMC discussed earlier, which was defined on X^n, e.g. π_n(x_{1:n}) = p(x_{1:n} | y_{1:n}).
SMC For Static Models

We will construct an artificial distribution that is the product of the target distribution from which we want to sample and backward kernels L as follows:

π̃_n(x_{1:n}) = Z_n^{-1} γ̃_n(x_{1:n}),   where   γ̃_n(x_{1:n}) = γ_n(x_n) ∏_{k=1}^{n-1} L_k(x_k | x_{k+1})

(the target times the backward transitions), such that

∫ π̃_n(x_{1:n}) dx_{1:n-1} = π_n(x_n).

The importance weights now become

W_n = γ̃_n(x_{1:n}) / q_n(x_{1:n})
    = [γ_n(x_n) ∏_{k=1}^{n-1} L_k(x_k | x_{k+1})] / [q_{n-1}(x_{1:n-1}) q_n(x_n | x_{n-1})]
    = W_{n-1} · [γ_n(x_n) L_{n-1}(x_{n-1} | x_n)] / [γ_{n-1}(x_{n-1}) q_n(x_n | x_{n-1})].
SMC For Static Models

Any MCMC kernel can be used for the proposal q(·|·).

Since our interest is in computing only

π_n(x_n) = Z_n^{-1} γ_n(x_n) = ∫ π̃_n(x_{1:n}) dx_{1:n-1},

there is no degeneracy problem. The weight update is

W_n = W_{n-1} · [γ_n(x_n) L_{n-1}(x_{n-1} | x_n)] / [γ_{n-1}(x_{n-1}) q_n(x_n | x_{n-1})].

P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
Algorithm: SMC For Static Problems

(1) Initialize at time n = 1.
(2) At time n ≥ 2:
• Sample X_n^(i) ~ q_n(x_n | X_{n-1}^(i)) and augment X_{1:n}^(i) = (X_{1:n-1}^(i), X_n^(i)).
• Compute the importance weights

W_n^(i) ∝ W_{n-1}^(i) · [γ_n(X_n^(i)) L_{n-1}(X_{n-1}^(i) | X_n^(i))] / [γ_{n-1}(X_{n-1}^(i)) q_n(X_n^(i) | X_{n-1}^(i))].

Then the weighted approximation is

π̂_n(x) = Σ_{i=1}^N W_n^(i) δ_{X_n^(i)}(x).

We finally resample X_n^(i) ~ π̂_n(x) to obtain

π̄_n(x) = (1/N) Σ_{i=1}^N δ_{X_n^(i)}(x).
SMC For Static Models: Choice of L

A default choice is to first use a πn-invariant MCMC kernel qn, and then the corresponding reversed kernel

L_{n-1}(x_{n-1} | x_n) = π_n(x_{n-1}) q_n(x_n | x_{n-1}) / π_n(x_n).

Using this easy choice, we can simplify the expression for the weights:

W_n = W_{n-1} · [γ_n(x_n) L_{n-1}(x_{n-1} | x_n)] / [γ_{n-1}(x_{n-1}) q_n(x_n | x_{n-1})]   ⟹   W_n^(i) = W_{n-1}^(i) · γ_n(X_{n-1}^(i)) / γ_{n-1}(X_{n-1}^(i)).

This is known as annealed importance sampling. This particular choice has been used in physics and statistics.

W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
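The simplified weight update above (weights computed from the previous particle position, followed by a π_n-invariant MH move) can be sketched as annealed importance sampling on a toy problem. The Gaussian initial distribution, the linear tempering schedule, and the target are illustrative assumptions, chosen so that the true normalizing constant √(2π) is known.

```python
import numpy as np

rng = np.random.default_rng(0)

def ais_log_evidence(N=2000, n_temps=50, mh_step=1.0):
    """Annealed importance sampling with the update
    W_n = W_{n-1} * gamma_n(x_{n-1}) / gamma_{n-1}(x_{n-1}),
    then one pi_n-invariant random-walk MH move per temperature.

    gamma_n(x) = q0(x)^(1-beta_n) * gamma(x)^beta_n bridges from
    q0 = N(0, 3^2) (normalized) to gamma(x) = exp(-x^2/2), whose
    normalizing constant is sqrt(2*pi)."""
    betas = np.linspace(0.0, 1.0, n_temps)
    s0 = 3.0
    log_q0 = lambda x: -0.5 * (x / s0) ** 2 - np.log(s0 * np.sqrt(2 * np.pi))
    log_gam = lambda x: -0.5 * x ** 2
    log_gamma_n = lambda x, b: (1 - b) * log_q0(x) + b * log_gam(x)

    x = s0 * rng.standard_normal(N)          # X_1 ~ q0 = pi_1
    logW = np.zeros(N)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # weight update uses only the previous particle position
        logW += log_gamma_n(x, b) - log_gamma_n(x, b_prev)
        # pi_n-invariant MH move
        x_prop = x + mh_step * rng.standard_normal(N)
        accept = np.log(rng.uniform(size=N)) < (log_gamma_n(x_prop, b)
                                                - log_gamma_n(x, b))
        x = np.where(accept, x_prop, x)
    m = logW.max()
    return m + np.log(np.exp(logW - m).mean())   # estimate of log Z_n

log_Z = ais_log_evidence()   # close to log(sqrt(2*pi)) ~= 0.919
```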
ONLINE PARAMETER ESTIMATION
Online Bayesian Parameter Estimation

Assume that our state model is defined with some unknown static parameter θ with some prior p(θ):

X_1 ~ μ_θ(·)  and  X_n | (X_{n-1} = x_{n-1}) ~ f_θ(x_n | x_{n-1}),
Y_n | (X_n = x_n) ~ g_θ(y_n | x_n).

Given data y_{1:n}, inference now is based on

p(θ, x_{1:n} | y_{1:n}) = p(θ | y_{1:n}) p_θ(x_{1:n} | y_{1:n}),   where   p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ).

We can use standard SMC, but on the extended space Z_n = (X_n, θ_n):

f̃((x_n, θ_n) | (x_{n-1}, θ_{n-1})) = f_{θ_{n-1}}(x_n | x_{n-1}) δ_{θ_{n-1}}(θ_n),   g̃(y_n | z_n) = g_{θ_n}(y_n | x_n).

Note that θ is a static parameter: it does not evolve with n.
Online Bayesian Parameter Estimation

For fixed θ, using our earlier error estimates,

Var[log p̂_θ(y_{1:n})] ≤ Cn / N.

In a Bayesian context the problem is even more severe, as p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ).

The exponential stability assumption cannot hold, as θ_n = θ_1 for all n.

To mitigate this problem, introduce MCMC steps on θ.

C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
Online Bayesian Parameter Estimation

When the parameter conditional depends on the path only through fixed-dimensional sufficient statistics,

p(θ | y_{1:n}, x_{1:n}) = p(θ | s_n(x_{1:n}, y_{1:n})),

this becomes an elegant algorithm that, however, still has the degeneracy problem, since it uses p(x_{1:n} | y_{1:n}).

As dim(Z_n) = dim(X_n) + dim(θ), such methods are not recommended for high-dimensional θ, especially with vague priors.
Artificial Dynamics for θ

A solution consists of perturbing the locations of the particles in a way that does not modify their distribution, i.e. if at time n

θ^(i) ~ p(θ | y_{1:n}),

then we would like a transition kernel M_n such that if

θ̄^(i) ~ M_n(θ | θ^(i)),   then   θ̄^(i) ~ p(θ | y_{1:n}).

In Markov chain language, we want p(θ | y_{1:n}) to be invariant for M_n(θ, θ̄).
Using MCMC

There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ) would need to be known up to a normalizing constant, but p_θ(y_{1:n}) is unknown.

However, we can use a very simple Gibbs update. Indeed, note that if (θ^(i), X_{1:n}^(i)) ~ p(θ, x_{1:n} | y_{1:n}) and we sample θ̄^(i) ~ p(θ | y_{1:n}, X_{1:n}^(i)), then θ̄^(i) ~ p(θ | y_{1:n}).

Indeed, note that

p(θ | y_{1:n}) = ∫ p(θ | y_{1:n}, x_{1:n}) p(x_{1:n} | y_{1:n}) dx_{1:n}.
SMC with MCMC for Parameter Estimation

Given an approximation at time n-1,

p̂(θ, x_{1:n-1} | y_{1:n-1}) = (1/N) Σ_{i=1}^N δ_{(θ_{n-1}^(i), X_{1:n-1}^(i))}(θ, x_{1:n-1}),

sample X_n^(i) ~ f_{θ_{n-1}^(i)}(x_n | X_{n-1}^(i)), set X̄_{1:n}^(i) = (X_{1:n-1}^(i), X_n^(i)), and then approximate

p̂(θ, x_{1:n} | y_{1:n}) = Σ_{i=1}^N W_n^(i) δ_{(θ_{n-1}^(i), X̄_{1:n}^(i))}(θ, x_{1:n}),   W_n^(i) ∝ g_{θ_{n-1}^(i)}(y_n | X̄_n^(i)).

Resample X_{1:n}^(i) ~ p̂(x_{1:n} | y_{1:n}), then sample θ_n^(i) ~ p(θ | y_{1:n}, X_{1:n}^(i)) to obtain

p̂(θ, x_{1:n} | y_{1:n}) = (1/N) Σ_{i=1}^N δ_{(θ_n^(i), X_{1:n}^(i))}(θ, x_{1:n}).
SMC with MCMC for Parameter Estimation

Consider the following model:

X_{n+1} = θ X_n + σ_v V_{n+1},   V_n ~ N(0, 1),
Y_n = X_n + σ_w W_n,   W_n ~ N(0, 1),
X_0 ~ N(0, 1).

We set the prior on θ as θ ~ U(-1, 1).

In this case,

p(θ | x_{1:n}, y_{1:n}) ∝ N(θ; m_n, σ_n²) 1_{(-1,1)}(θ),
σ_n² = σ_v² (Σ_{k=1}^n x_{k-1}²)^{-1},   m_n = (Σ_{k=1}^n x_{k-1} x_k) / (Σ_{k=1}^n x_{k-1}²).
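The scheme for this example can be sketched as follows. It is a minimal sketch under simplifying assumptions: unit noise variances (σ_v = σ_w = 1), the truncation of θ to (-1, 1) handled crudely by clipping, and each particle carrying the path sufficient statistics S1 = Σ x_{k-1} x_k and S2 = Σ x_{k-1}² for the Gibbs draw. As the next slides show, path degeneracy in these statistics means such a scheme can converge to the wrong value.

```python
import numpy as np

rng = np.random.default_rng(0)

def smc_gibbs_theta(y, N=500):
    """SMC on the extended space Z_n = (X_n, theta_n) for
    X_n = theta X_{n-1} + V_n, Y_n = X_n + W_n (unit variances),
    with a Gibbs refresh theta ~ p(theta | x_{1:n}, y_{1:n})."""
    theta = rng.uniform(-1, 1, N)         # theta ~ U(-1, 1)
    x = rng.standard_normal(N)            # X_0 ~ N(0, 1)
    S1 = np.zeros(N)                      # sum x_{k-1} x_k per path
    S2 = np.zeros(N)                      # sum x_{k-1}^2 per path
    for yk in y:
        x_prev = x
        x = theta * x_prev + rng.standard_normal(N)   # propagate
        S1 = S1 + x_prev * x
        S2 = S2 + x_prev ** 2
        logw = -0.5 * (yk - x) ** 2                   # g(y | x)
        w = np.exp(logw - logw.max())
        idx = rng.choice(N, size=N, p=w / w.sum())    # resample
        x, S1, S2 = x[idx], S1[idx], S2[idx]
        # Gibbs step: theta | path ~ N(S1/S2, 1/S2), truncated to (-1, 1)
        theta = S1 / S2 + rng.standard_normal(N) / np.sqrt(S2)
        np.clip(theta, -0.999, 0.999, out=theta)      # crude truncation
    return theta

# Simulate data from theta = 0.5 and estimate the posterior mean.
true_theta, T, x0 = 0.5, 200, 0.0
xs = []
for _ in range(T):
    x0 = true_theta * x0 + rng.standard_normal()
    xs.append(x0)
ys = np.array(xs) + rng.standard_normal(T)
theta_hat = smc_gibbs_theta(ys).mean()
```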
SMC with MCMC for Parameter Estimation

We use SMC with the Gibbs step. The degeneracy problem remains.

The SMC estimate of E[θ | y_{1:n}] is shown as n increases (from A. Doucet's lecture notes). The parameter converges to the wrong value.

(Figure: SMC/MCMC estimate vs. the true value.)
SMC with MCMC for Parameter Estimation

The problem with this approach is that although we move θ according to p(θ | x_{1:n}, y_{1:n}), we are still relying implicitly on the SMC approximation of the joint distribution p(x_{1:n} | y_{1:n}).

As n increases, the SMC approximation of p(x_{1:n} | y_{1:n}) deteriorates, and the algorithm cannot converge towards the right solution.

(Figure: the sufficient statistic (1/n) Σ_{k=1}^n E[X_k² | y_{1:n}] computed exactly through the Kalman filter (blue) vs. the SMC approximation (red), for a fixed value of θ.)
Recursive MLE Parameter Estimation

The log-likelihood can be written as

ℓ_n(θ) = log p_θ(y_{1:n}) = Σ_{k=1}^n log p_θ(y_k | y_{1:k-1}).

Here we compute

p_θ(y_k | y_{1:k-1}) = ∫ g_θ(y_k | x_k) p_θ(x_k | y_{1:k-1}) dx_k.

Under regularity assumptions, (X_n, Y_n, p_θ(x_n | Y_{1:n-1})) is an inhomogeneous Markov chain which converges towards its invariant distribution, so that

lim_{n→∞} (1/n) ℓ_n(θ) = ℓ(θ) = ∫∫ log( ∫ g_θ(y | x) μ(dx) ) λ_θ(dy, dμ).
Robbins-Monro Algorithm for MLE

We can maximize ℓ(θ) by using the gradient and the Robbins-Monro algorithm:

θ_{n+1} = θ_n + γ_{n+1} ∇_θ log p_θ(Y_{n+1} | Y_{1:n}) |_{θ=θ_n},

where

∇ p_θ(Y_n | Y_{1:n-1}) = ∫ ∇g_θ(Y_n | x_n) p_θ(x_n | Y_{1:n-1}) dx_n + ∫ g_θ(Y_n | x_n) ∇p_θ(x_n | Y_{1:n-1}) dx_n.

We thus need to approximate ∇ p_θ(x_n | Y_{1:n-1}).
Importance Sampling Estimation of Sensitivity

The various proposed approximations are based on the identity

∇ p_θ(x_{1:n} | y_{1:n}) = p_θ(x_{1:n} | y_{1:n}) ∇ log p_θ(x_{1:n} | y_{1:n}).

We thus can use an SMC approximation of the form

∇̂ p_θ(x_{1:n} | y_{1:n}) = (1/N) Σ_{i=1}^N α_n^(i) δ_{X_{1:n}^(i)}(x_{1:n}),   α_n^(i) = ∇ log p_θ(X_{1:n}^(i) | y_{1:n}).

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of p_θ(x_{1:n} | y_{1:n}).
Degeneracy of the SMC Algorithm

(Figure: the score ∇ log p_θ(y_{1:n}) for linear Gaussian models, exact (blue) vs. SMC approximation (red).)

(Figure: the signed measure ∇ p_θ(x_n | y_{1:n}) for linear Gaussian models, exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.)
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:

∇ p_θ(x_n | y_{1:n}) = p_θ(x_n | y_{1:n}) ∇ log p_θ(x_n | y_{1:n}).

It suggests using an SMC approximation of the form

∇̂ p_θ(x_n | y_{1:n}) = (1/N) Σ_{i=1}^N β_n^(i) δ_{X_n^(i)}(x_n),   β_n^(i) = ∇ log p̂_θ(X_n^(i) | y_{1:n}) = ∇ p̂_θ(X_n^(i) | y_{1:n}) / p̂_θ(X_n^(i) | y_{1:n}).

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N²), as

p̂_θ(X_n^(i) | y_{1:n}) ∝ ∫ f_θ(X_n^(i) | x_{n-1}) p̂_θ(x_{n-1} | y_{1:n-1}) dx_{n-1} = (1/N) Σ_{j=1}^N f_θ(X_n^(i) | X_{n-1}^(j)).
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

p_θ(x_n | y_{1:n}) = ξ_θ(x_n) / ∫ ξ_θ(x_n) dx_n,   ξ_θ(x_n) = g_θ(y_n | x_n) ∫ f_θ(x_n | x_{n-1}) p_θ(x_{n-1} | y_{1:n-1}) dx_{n-1}.

The derivatives satisfy the following:

∇ p_θ(x_n | y_{1:n}) = ∇ξ_θ(x_n) / ∫ ξ_θ(x_n) dx_n − p_θ(x_n | y_{1:n}) · [∫ ∇ξ_θ(x_n) dx_n / ∫ ξ_θ(x_n) dx_n].

This way we obtain a simple recursion for ∇ p_θ(x_n | y_{1:n}) as a function of ∇ p_θ(x_{n-1} | y_{1:n-1}) and p_θ(x_{n-1} | y_{1:n-1}).
SMC Approximation of the Sensitivity

Sample from a mixture proposal built on the previous particles,

X_n^(i) ~ Σ_{j=1}^N η_n^(j) q_θ(x_n | Y_n, X_{n-1}^(j)),   η_n^(j) ∝ W_{n-1}^(j) p_θ(Y_n | X_{n-1}^(j)).

Compute

α_n^(i) = ξ_θ(X_n^(i)) / q_θ(X_n^(i) | Y_n),   ρ_n^(i) = ∇ξ_θ(X_n^(i)) / q_θ(X_n^(i) | Y_n),

W_n^(i) = α_n^(i) / Σ_j α_n^(j),   β_n^(i) = ρ_n^(i) / Σ_j α_n^(j) − W_n^(i) · [Σ_j ρ_n^(j) / Σ_j α_n^(j)].

We then have

p̂_θ(x_n | y_{1:n}) = Σ_{i=1}^N W_n^(i) δ_{X_n^(i)}(x_n),   ∇̂ p_θ(x_n | y_{1:n}) = Σ_{i=1}^N β_n^(i) δ_{X_n^(i)}(x_n).
Example: Linear Gaussian Model

Consider the following model:

X_n = φ X_{n-1} + σ_v V_n,   Y_n = X_n + σ_w W_n,   θ = (φ, σ_v, σ_w).

We compare next the SMC estimates of ∇ log p_θ(x_n | y_{1:n}) and ∇ p_θ(y_{1:n}) for N = 1000 to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).
Example: Linear Gaussian Model

(Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).)
Example: Stochastic Volatility Model

Consider the following model:

X_n = φ X_{n-1} + σ_v V_n,   Y_n = β exp(X_n / 2) W_n,   θ = (φ, σ_v, β).

We use SMC with N = 1000 for batch and online estimation.

For batch estimation: θ_{i+1} = θ_i + γ_{i+1} ∇ log p_{θ_i}(y_{1:n}).

For online estimation: θ_{n+1} = θ_n + γ_{n+1} ∇ log p_{θ_n}(y_{n+1} | y_{1:n}).

(Figure: batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.)

(Figure: recursive parameter estimation for the SV model; the estimates converge towards the true values.)
SMC with MCMC for Parameter Estimation

Given data y_{1:n}, inference relies on

p(θ, x_{1:n} | y_{1:n}) = p(θ | y_{1:n}) p_θ(x_{1:n} | y_{1:n}),   where   p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ).

We have seen that SMC is rather inefficient for sampling from p(θ | y_{1:n}), so we look here at an MCMC approach. For a given parameter value θ, SMC can estimate both p_θ(x_{1:n} | y_{1:n}) and p_θ(y_{1:n}).

It would be useful if we could use SMC within MCMC to sample from p(θ | y_{1:n}).

C. Andrieu, R. Holenstein and G. O. Roberts, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010), 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from p(θ | y_{1:n}, x_{1:n}) and p(x_{1:n} | y_{1:n}, θ).

However, it is impossible to sample from p(x_{1:n} | y_{1:n}, θ) directly. We can sample from p(x_k | θ, y_{1:n}, x_{k-1}, x_{k+1}) instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from p(x_{1:n} | y_{1:n}, θ), i.e. sample X*_{1:n} ~ q(x_{1:n} | y_{1:n}) and accept with probability

min{ 1, [p(X*_{1:n} | y_{1:n}, θ) q(X_{1:n} | y_{1:n})] / [p(X_{1:n} | y_{1:n}, θ) q(X*_{1:n} | y_{1:n})] }.
Gibbs Sampling Strategy

We will use the output of an SMC method approximating p(x_{1:n} | y_{1:n}, θ) as a proposal distribution.

We know that the SMC approximation degrades as n increases, but we also have, under mixing assumptions,

|| Law(X_{1:n}^(i)) − p(x_{1:n} | y_{1:n}, θ) ||_TV ≤ Cn / N.

Thus things degrade linearly rather than exponentially. These results are useful for small C.
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Biased Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences.

CBMC looks like an SMC method, but at each step only one particle is selected, which gives N offspring. Thus if you have made the wrong selection, you cannot recover later on.

D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC

We use p̂_θ(x_{1:n} | y_{1:n}) as proposal distribution, sampling X*_{1:n} ~ p̂_θ(x_{1:n} | y_{1:n}), and store the estimate p̂*_θ(y_{1:n}).

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N−1)n variables.

The solution, inspired by CBMC, is as follows. For the current state X_{1:n} of the Markov chain, run a virtual SMC method with only N−1 free particles. Ensure that at each time step k the current state X_k is deterministically selected, and compute the associated estimate p̂_θ(y_{1:n}).

Finally, accept X*_{1:n} with probability

min{ 1, p̂*_θ(y_{1:n}) / p̂_θ(y_{1:n}) }.

Otherwise stay where you are.

D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
Example

Consider the following (standard nonlinear benchmark) model:

X_k = X_{k-1}/2 + 25 X_{k-1}/(1 + X_{k-1}²) + 8 cos(1.2k) + V_k,   V_k ~ N(0, σ_v²),
Y_k = X_k²/20 + W_k,   W_k ~ N(0, σ_w²).

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
Example

We now consider the case when both variances σ_v² and σ_w² are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from p(θ, x_{1:1000} | y_{1:1000}) using two strategies:
• The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution. Average acceptance rate 0.43.
• An algorithm updating the components one at a time using an MH step of invariant distribution p(x_k | x_{k-1}, x_{k+1}, y_k), with the same proposal.

In both cases we update the parameters according to p(θ | x_{1:500}, y_{1:500}). Both algorithms are run with the same computational effort.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example MH one at a time missed the mode and estimates
If X1n and θ are very correlated the MCMC algorithm is very
inefficient We will next discuss the Marginal Metropolis Hastings
21500
21500
2
| 137 ( )
| 99 ( )
10
v
v
v
MH
PF MCMC
σ
σ
σ
asymp asymp minus =
y
y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with
kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)
It can be easily shown that and
under weak assumptions
Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)
1 Given 119909119905minus1 generate 2 Compute
3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Step 1
Step 2
Step 3 With probability
( ) ( )
( ) ( )
1 1( 1)
1 1 1
( 1) ( 1)
~ | ( 1)
|
n ni
n n n
Given i i p
Sample q i
Run an SMC to obtain p p
θ
θθ
θ
θ θ θ
minusminus minus
minus
X y
y x y
( )1 1 1 ~ |n n nSample pθX x y
( ) ( ) ( ) ( ) ( ) ( )
1
1( 1)
( 1) |
( 1) | ( 1min 1
)n
ni
p p q i
p p i q iθ
θ
θ θ θ
θ θ θminus
minus minus minus
y
y
( ) ( ) ( ) ( )
1 1 1 1( )
1 1 1 1( ) ( 1)
( ) ( )
( ) ( ) ( 1) ( 1)
n n n ni
n n n ni i
Set i X i p X p
Otherwise i X i p i X i p
θ θ
θ θ
θ θ
θ θ minus
=
= minus minus
y y
y y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an
approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)
When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists
The higher N the better the performance of the algorithm N scales roughly linearly with n
Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)
Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model
with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
What about if all distributions πn nge 1 are defined on X
Apply SMC to an augmented sequence of distributions For example
1
nn
nonπ
geX
( ) ( )1 1 1 1n n n n n nx d xπ πminus minus =int x x
( ) ( ) ( )1
1 1 1 1 |
n
n nn n n n n n
Use any conditionaldistribution on
x x xπ π π
minus
minus minus=
x x
X
12
bull C Jarzynski Nonequilibrium Equality for Free Energy Differences Phys Rev Lett 78 2690ndash2693 (1997) bull Gavin E Crooks Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible
Markovian Systems Journal of Statistical Physics March 1998 Volume 90 Issue 5-6 pp 1481-1487 bull W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001 bull Neal R M (2001) ``Annealed importance sampling Statistics and Computing vol 11 pp 125-139 bull N Chopin A sequential partile filter method for static problems Biometrika 2002 89 3 pp 539-551 bull P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian bull Statistics 2006
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models Let be a sequence of probability distributions defined on X
st each is known up to a normalizing constant
We are interested to sample from and compute Zn sequentially
This is not the same as the standard SMC discussed earlier which was defined on Xn
13
( ) 1n nxπ
ge
( )n xπ
( )
( )
1n n nUnknown Known
x Z xπ γ=
( )n xπ
( ) ( )1 1 1|n n n npπ =x x y
SMC For Static Models

We will construct an artificial target distribution on X^n that is the product of the target distribution from which we want to sample and backward Markov transition kernels L_k:

\tilde{\pi}_n(x_{1:n}) = \frac{\tilde{\gamma}_n(x_{1:n})}{Z_n}, \qquad \tilde{\gamma}_n(x_{1:n}) = \gamma_n(x_n) \prod_{k=1}^{n-1} L_k(x_{k+1}, x_k),

where the L_k are the backward transitions, such that

\int \tilde{\pi}_n(x_{1:n})\, dx_{1:n-1} = \pi_n(x_n).

The importance weights now become

W_n \propto \frac{\tilde{\gamma}_n(x_{1:n})}{q_n(x_{1:n})} = \frac{\tilde{\gamma}_{n-1}(x_{1:n-1})}{q_{n-1}(x_{1:n-1})} \cdot \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_{n-1}, x_n)} = W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_{n-1}, x_n)}.
SMC For Static Models

Any MCMC kernel can be used for the proposal q(·|·). Since our interest is in computing only

\pi_n(x_n) = \int \tilde{\pi}_n(x_{1:n})\, dx_{1:n-1} = Z_n^{-1} \gamma_n(x_n),

there is no degeneracy problem.
P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006
Algorithm: SMC For Static Problems

(1) Initialize at time n = 1.
(2) At time n ≥ 2:

Sample X_n^{(i)} \sim q_n(\cdot \mid X_{n-1}^{(i)}) and augment X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)}).

Compute the importance weights

W_n^{(i)} \propto W_{n-1}^{(i)}\, \frac{\gamma_n(X_n^{(i)})\, L_{n-1}(X_n^{(i)}, X_{n-1}^{(i)})}{\gamma_{n-1}(X_{n-1}^{(i)})\, q_n(X_{n-1}^{(i)}, X_n^{(i)})}.

Then the weighted approximation is

\hat{\pi}_n(x) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_n^{(i)}}(x).

We finally resample X_n^{(i)} \sim \hat{\pi}_n(x) to obtain

\bar{\pi}_n(x) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_n^{(i)}}(x).
SMC For Static Models: Choice of L

A default choice is first using a πn-invariant MCMC kernel qn, and then the corresponding reversed kernel

L_{n-1}(x_n, x_{n-1}) = \frac{\pi_n(x_{n-1})\, q_n(x_{n-1}, x_n)}{\pi_n(x_n)}.

Using this easy choice we can simplify the expression for the weights:

W_n \propto W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})\, q_n(x_{n-1}, x_n)} = W_{n-1}\, \frac{\gamma_n(x_n)\, \pi_n(x_{n-1})}{\gamma_{n-1}(x_{n-1})\, \pi_n(x_n)} \ \Rightarrow\ W_n \propto W_{n-1}\, \frac{\gamma_n(x_{n-1})}{\gamma_{n-1}(x_{n-1})}.

This is known as annealed importance sampling. This particular choice has been used in physics and statistics.
W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001
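The annealed-importance-sampling weight update above can be sketched in code. Everything below (the bimodal target, the broad Gaussian reference, the linear tempering schedule, the random-walk kernel, and all numerical settings) is an illustrative assumption, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target: an equal-weight bimodal Gaussian mixture (illustrative choice).
def log_gamma_target(x):
    return np.log(0.5 * np.exp(-0.5 * (x - 3.0) ** 2) + 0.5 * np.exp(-0.5 * (x + 3.0) ** 2))

def log_gamma_0(x):
    return -0.5 * x ** 2 / 25.0  # broad N(0, 25) reference, unnormalized

def log_gamma(x, beta):
    # Tempered sequence: gamma_n(x) = gamma_0(x)^(1-beta_n) * gamma_target(x)^beta_n
    return (1.0 - beta) * log_gamma_0(x) + beta * log_gamma_target(x)

N = 2000
betas = np.linspace(0.0, 1.0, 50)
x = 5.0 * rng.standard_normal(N)    # exact samples from pi_1 = N(0, 25)
logw = np.zeros(N)
for b_prev, b in zip(betas[:-1], betas[1:]):
    # AIS weight update: W_n = W_{n-1} * gamma_n(x_{n-1}) / gamma_{n-1}(x_{n-1})
    logw += log_gamma(x, b) - log_gamma(x, b_prev)
    # Move each particle with a pi_n-invariant random-walk Metropolis kernel
    prop = x + rng.standard_normal(N)
    accept = np.log(rng.uniform(size=N)) < log_gamma(prop, b) - log_gamma(x, b)
    x = np.where(accept, prop, x)

# E[prod of ratios] = Z_target / Z_0, so this estimates log(Z_n / Z_1)
log_ratio = logw.max() + np.log(np.mean(np.exp(logw - logw.max())))
w = np.exp(logw - logw.max())
w /= w.sum()
posterior_mean = np.sum(w * x)   # roughly 0 by symmetry of the bimodal target
```

For this choice of reference and target the true log normalizing-constant ratio is log(sqrt(2π)/sqrt(50π)) = log 0.2, which the weight average recovers.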
ONLINE PARAMETER ESTIMATION
Online Bayesian Parameter Estimation

Assume that our state model is defined with some unknown static parameter θ with some prior p(θ):

X_1 \sim \mu_\theta(\cdot), \qquad X_n \mid (X_{n-1} = x_{n-1}) \sim f_\theta(\cdot \mid x_{n-1}), \qquad Y_n \mid (X_n = x_n) \sim g_\theta(\cdot \mid x_n).

Given data y_{1:n}, inference now is based on

p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}), \qquad \text{where } p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).

We can use standard SMC, but on the extended space Z_n = (X_n, \theta_n). Note that θ is a static parameter; it does not evolve with n:

f_\theta(z_{n-1}, z_n) = \delta_{\theta_{n-1}}(\theta_n)\, f_{\theta_{n-1}}(x_n \mid x_{n-1}), \qquad g_\theta(y_n \mid z_n) = g_{\theta_n}(y_n \mid x_n).
Online Bayesian Parameter Estimation

For fixed θ, using our earlier error estimates,

\mathrm{Var}\left[\log \hat{p}_\theta(y_{1:n})\right] \le \frac{Cn}{N}.

In a Bayesian context the problem is even more severe, as

p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).

The exponential stability assumption cannot hold, as \theta_n = \theta_1. To mitigate this problem, introduce MCMC steps on θ.

C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999. P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002. W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001.
Online Bayesian Parameter Estimation

When

p(\theta \mid y_{1:n}, x_{1:n}) = p\big(\theta \mid s_n(x_{1:n}, y_{1:n})\big)

for a fixed-dimensional sufficient statistic s_n, this becomes an elegant algorithm that, however, still has the degeneracy problem, since it uses p(x_{1:n} \mid y_{1:n}).

As dim(Z_n) = dim(X_n) + dim(θ), such methods are not recommended for high-dimensional θ, especially with vague priors.
Artificial Dynamics for θ

A solution consists of perturbing the location of the particles in a way that does not modify their distribution, i.e. if at time n

\theta^{(i)} \sim p(\theta \mid y_{1:n}),

then we would like a transition kernel M_n such that if

\bar{\theta}^{(i)} \sim M_n(\theta^{(i)}, \cdot), \quad \text{then} \quad \bar{\theta}^{(i)} \sim p(\theta \mid y_{1:n}).

In Markov chain language, we want p(\theta \mid y_{1:n}) to be M_n-invariant.
Using MCMC

There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo (MCMC), e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta) would need to be known up to a normalizing constant, but p_\theta(y_{1:n}) is unknown.

However, we can use a very simple Gibbs update: if X_{1:n}^{(i)} \sim p(x_{1:n} \mid y_{1:n}) and \theta^{(i)} \sim p(\theta \mid y_{1:n}, X_{1:n}^{(i)}), then \theta^{(i)} \sim p(\theta \mid y_{1:n}). Indeed, note that

p(\theta \mid y_{1:n}) = \int p(\theta \mid x_{1:n}, y_{1:n})\, p(x_{1:n} \mid y_{1:n})\, dx_{1:n}.
SMC with MCMC for Parameter Estimation

Given an approximation at time n-1,

\hat{p}(\theta, x_{1:n-1} \mid y_{1:n-1}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(\theta_{n-1}^{(i)}, X_{1:n-1}^{(i)})}(\theta, x_{1:n-1}).

Sample X_n^{(i)} \sim f_{\theta_{n-1}^{(i)}}(\cdot \mid X_{n-1}^{(i)}), set X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)}), and then approximate

\hat{p}(\theta, x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{(\theta_{n-1}^{(i)}, X_{1:n}^{(i)})}(\theta, x_{1:n}), \qquad W_n^{(i)} \propto g_{\theta_{n-1}^{(i)}}(y_n \mid X_n^{(i)}).

Resample \bar{X}_{1:n}^{(i)} \sim \hat{p}(x_{1:n} \mid y_{1:n}), then sample \theta_n^{(i)} \sim p(\theta \mid y_{1:n}, \bar{X}_{1:n}^{(i)}) to obtain

\bar{p}(\theta, x_{1:n} \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(\theta_n^{(i)}, \bar{X}_{1:n}^{(i)})}(\theta, x_{1:n}).
SMC with MCMC for Parameter Estimation

Consider the following model:

X_{n+1} = \theta X_n + \sigma_v V_{n+1}, \qquad V_n \overset{iid}{\sim} \mathcal{N}(0, 1),
Y_n = X_n + \sigma_w W_n, \qquad W_n \overset{iid}{\sim} \mathcal{N}(0, 1),
X_0 \sim \mathcal{N}(0, 1).

We set the prior on θ as \theta \sim \mathcal{U}(-1, 1). In this case

p(\theta \mid x_{1:n}, y_{1:n}) \propto \mathcal{N}(\theta; m_n, \sigma_n^2)\, 1_{(-1,1)}(\theta),

where

\sigma_n^2 = \sigma_v^2 \left( \sum_{k=1}^{n} x_{k-1}^2 \right)^{-1}, \qquad m_n = \frac{\sum_{k=1}^{n} x_{k-1} x_k}{\sum_{k=1}^{n} x_{k-1}^2}.
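The conditional Gibbs draw of θ for this linear model can be sketched as follows. The path length, the random seed, and the true parameter values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_v, theta_true = 1.0, 0.7

# Simulate a state path x_{0:n-1} from the model (assumed values for illustration)
n = 500
x = np.zeros(n)
for k in range(1, n):
    x[k] = theta_true * x[k - 1] + sigma_v * rng.standard_normal()

# Conditional p(theta | x_{1:n}, y_{1:n}) ∝ N(theta; m_n, s_n^2) 1_(-1,1)(theta)
s2 = sigma_v ** 2 / np.sum(x[:-1] ** 2)          # sigma_n^2
m = np.sum(x[:-1] * x[1:]) / np.sum(x[:-1] ** 2)  # m_n

# Draw from the truncated Gaussian by rejection: resample until theta lands in (-1, 1)
while True:
    theta = m + np.sqrt(s2) * rng.standard_normal()
    if -1.0 < theta < 1.0:
        break
```

With a long enough path, m_n concentrates near the true θ, so the rejection loop almost always terminates on the first draw.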
SMC with MCMC for Parameter Estimation

We use SMC with the Gibbs step. The degeneracy problem remains. The SMC estimate of E[θ | y_{1:n}] is shown as n increases (from A. Doucet's lecture notes), with the SMC/MCMC estimate plotted against the true value: the parameter converges to the wrong value.
SMC with MCMC for Parameter Estimation

The problem with this approach is that although we move θ according to p(\theta \mid x_{1:n}, y_{1:n}), we are still relying implicitly on the SMC approximation of the joint distribution p(x_{1:n} \mid y_{1:n}).

As n increases, the SMC approximation of p(x_{1:n} \mid y_{1:n}) deteriorates and the algorithm cannot converge towards the right solution.

Figure: the sufficient statistic \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}[X_k^2 \mid y_{1:n}] computed exactly through the Kalman filter (blue) vs. the SMC approximation (red), for a fixed value of θ.
Recursive MLE Parameter Estimation

The log-likelihood can be written as

\ell(\theta) = \log p_\theta(y_{1:n}) = \sum_{k=1}^{n} \log p_\theta(y_k \mid y_{1:k-1}).

Here we compute

p_\theta(y_k \mid y_{1:k-1}) = \int g_\theta(y_k \mid x_k)\, p_\theta(x_k \mid y_{1:k-1})\, dx_k.

Under regularity assumptions, \{X_n, Y_n, p_\theta(x_n \mid Y_{1:n-1})\} is an inhomogeneous Markov chain which converges towards its invariant distribution, and the average log-likelihood converges to its limit

\bar{\ell}(\theta) = \lim_{n \to \infty} \frac{\ell(\theta)}{n},

an expectation of \log \int g_\theta(y \mid x)(\cdot)\, dx under this invariant distribution.
Robbins-Monro Algorithm for MLE

We can maximize \ell(\theta) by using the gradient and the Robbins-Monro algorithm:

\theta_n = \theta_{n-1} + \gamma_n \nabla \log p_\theta(y_n \mid y_{1:n-1}) \big|_{\theta = \theta_{n-1}},

where

\nabla p_\theta(y_n \mid y_{1:n-1}) = \int g_\theta(y_n \mid x_n)\, \nabla p_\theta(x_n \mid y_{1:n-1})\, dx_n + \int \nabla g_\theta(y_n \mid x_n)\, p_\theta(x_n \mid y_{1:n-1})\, dx_n.

We thus need to approximate \nabla p_\theta(x_n \mid y_{1:n-1}).
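The Robbins-Monro recursion is easiest to illustrate on a toy case where the score is available in closed form: i.i.d. Gaussian observations with unknown mean, so that \nabla_\theta \log p_\theta(y_n) = (y_n - \theta)/\sigma^2. All numerical settings are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true, sigma = 2.0, 1.0

theta = 0.0
estimates = []
for n in range(1, 20001):
    y = theta_true + sigma * rng.standard_normal()  # new observation
    gamma = n ** -0.6            # step sizes: sum(gamma_n) = inf, sum(gamma_n^2) < inf
    score = (y - theta) / sigma ** 2   # grad of log N(y; theta, sigma^2) w.r.t. theta
    theta += gamma * score             # Robbins-Monro update
    estimates.append(theta)
```

The step-size conditions (divergent sum, convergent sum of squares) are exactly what the Robbins-Monro theory requires for the iterates to settle at the maximizer.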
Importance Sampling Estimation of Sensitivity

The various proposed approximations are based on the identity

\nabla p_\theta(x_{1:n} \mid y_{1:n}) = \frac{\nabla p_\theta(x_{1:n} \mid y_{1:n})}{p_\theta(x_{1:n} \mid y_{1:n})}\, p_\theta(x_{1:n} \mid y_{1:n}) = \nabla \log p_\theta(x_{1:n} \mid y_{1:n})\; p_\theta(x_{1:n} \mid y_{1:n}).

We can thus use an SMC approximation of the form

\nabla \hat{p}_\theta(x_{1:n} \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \alpha_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad \alpha_n^{(i)} = \nabla \log p_\theta(X_{1:n}^{(i)} \mid y_{1:n}).

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of p_\theta(x_{1:n} \mid y_{1:n}).
Degeneracy of the SMC Algorithm

Score \nabla p_\theta(y_{1:n}) for linear Gaussian models: exact (blue) vs. SMC approximation (red).

Signed measure \nabla p_\theta(x_n \mid y_{1:n}) for linear Gaussian models: exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:

\nabla p_\theta(x_n \mid y_{1:n-1}) = \nabla \log p_\theta(x_n \mid y_{1:n-1})\; p_\theta(x_n \mid y_{1:n-1}).

It suggests using an SMC approximation of the form

\nabla \hat{p}_\theta(x_n \mid y_{1:n-1}) = \frac{1}{N} \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \beta_n^{(i)} = \nabla \log \hat{p}_\theta(X_n^{(i)} \mid y_{1:n-1}) = \frac{\nabla \hat{p}_\theta(X_n^{(i)} \mid y_{1:n-1})}{\hat{p}_\theta(X_n^{(i)} \mid y_{1:n-1})}.

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N^2), as

\hat{p}_\theta(X_n^{(i)} \mid y_{1:n-1}) \propto \int f_\theta(X_n^{(i)} \mid x_{n-1})\, \hat{p}_\theta(x_{n-1} \mid y_{1:n-1})\, dx_{n-1} = \frac{1}{N} \sum_{j=1}^{N} f_\theta(X_n^{(i)} \mid X_{n-1}^{(j)}).
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

p_\theta(x_n \mid y_{1:n}) = \frac{\xi_\theta(x_n)}{\int \xi_\theta(x_n)\, dx_n}, \qquad \xi_\theta(x_n) = \int g_\theta(y_n \mid x_n)\, f_\theta(x_n \mid x_{n-1})\, p_\theta(x_{n-1} \mid y_{1:n-1})\, dx_{n-1}.

The derivatives satisfy the following:

\nabla p_\theta(x_n \mid y_{1:n}) = \frac{\nabla \xi_\theta(x_n)}{\int \xi_\theta(x_n)\, dx_n} - p_\theta(x_n \mid y_{1:n})\, \frac{\int \nabla \xi_\theta(x_n)\, dx_n}{\int \xi_\theta(x_n)\, dx_n}.

This way we obtain a simple recursion for \nabla p_\theta(x_n \mid y_{1:n}) as a function of \nabla p_\theta(x_{n-1} \mid y_{1:n-1}) and p_\theta(x_{n-1} \mid y_{1:n-1}).
SMC Approximation of the Sensitivity

Sample

X_n^{(i)} \sim \sum_{j=1}^{N} \eta^{(j)}\, q(\cdot \mid Y_n, X_{n-1}^{(j)}), \qquad \eta^{(j)} \propto W_{n-1}^{(j)}\, \hat{p}_\theta(Y_n \mid X_{n-1}^{(j)}).

Compute

\alpha_n^{(i)} = \frac{\hat{\xi}_\theta(X_n^{(i)})}{q(X_n^{(i)} \mid Y_n)}, \qquad \rho_n^{(i)} = \frac{\nabla \hat{\xi}_\theta(X_n^{(i)})}{q(X_n^{(i)} \mid Y_n)},

W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j=1}^{N} \alpha_n^{(j)}}, \qquad \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j=1}^{N} \alpha_n^{(j)}} - W_n^{(i)}\, \frac{\sum_{j=1}^{N} \rho_n^{(j)}}{\sum_{j=1}^{N} \alpha_n^{(j)}}.

We have

\hat{p}_\theta(x_n \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \nabla \hat{p}_\theta(x_n \mid y_{1:n}) = \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n).
Example: Linear Gaussian Model

Consider the following model:

X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = X_n + \sigma_w W_n,

with \theta = (\phi, \sigma_v, \sigma_w). We compare the SMC estimates for N = 1000 of \nabla \log p_\theta(x_n \mid y_{1:n}) and of the score \nabla p_\theta(y_{1:n}) to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).

Figures: marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).
Example: Stochastic Volatility Model

Consider the following model:

X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = \beta \exp(X_n / 2)\, W_n,

with \theta = (\phi, \sigma_v, \beta). We use SMC with N = 1000 for batch and online estimation.

For batch estimation:

\theta_n = \theta_{n-1} + \gamma_n \nabla \log p_{\theta_{n-1}}(y_{1:n}).

For online estimation:

\theta_n = \theta_{n-1} + \gamma_n \nabla \log p_\theta(y_n \mid y_{1:n-1}).

Figures: batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85; recursive parameter estimation for the SV model, where the estimates converge towards the true values.
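A draw from this stochastic volatility model is straightforward to simulate; the parameter values, the stationary initialization of the log-volatility, and the series length below are illustrative assumptions, not the slide's settings:

```python
import numpy as np

rng = np.random.default_rng(6)
phi, sigma_v, beta, n = 0.95, 0.3, 0.8, 1000

# Log-volatility follows an AR(1); initialize from its stationary law (assumption)
x = np.zeros(n)
x[0] = sigma_v / np.sqrt(1.0 - phi ** 2) * rng.standard_normal()
for k in range(1, n):
    x[k] = phi * x[k - 1] + sigma_v * rng.standard_normal()

# Observations: returns scaled by the latent volatility beta * exp(X_n / 2)
y = beta * np.exp(x / 2.0) * rng.standard_normal(n)
```

Simulated paths like this are what the batch and online gradient recursions above are run on before being applied to real exchange-rate data.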
SMC with MCMC for Parameter Estimation

Given data y_{1:n}, inference relies on

p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}), \qquad \text{where } p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).

We have seen that SMC is rather inefficient at sampling from p(\theta \mid y_{1:n}), so we look here at an MCMC approach. For a given parameter value θ, SMC can estimate both p_\theta(x_{1:n} \mid y_{1:n}) and p_\theta(y_{1:n}). It would be useful if we could use SMC within MCMC to sample from p(\theta, x_{1:n} \mid y_{1:n}).

bull C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010), 72, Part 3, pp. 269-342.
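For a given θ, the marginal likelihood p_\theta(y_{1:n}) can be estimated with a bootstrap particle filter. The sketch below does this for the linear Gaussian model used earlier; the initial law, parameter values, data length, and particle count are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def bootstrap_loglik(y, phi, sigma_v, sigma_w, N=500):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n}) for
    X_n = phi X_{n-1} + sigma_v V_n, Y_n = X_n + sigma_w W_n (V, W std normal)."""
    x = sigma_v * rng.standard_normal(N)     # X_1 ~ N(0, sigma_v^2) (assumed initial law)
    loglik = 0.0
    for yk in y:
        # Weight by the observation density g(y_k | x_k)
        logw = -0.5 * ((yk - x) / sigma_w) ** 2 - np.log(sigma_w * np.sqrt(2 * np.pi))
        c = logw.max()
        w = np.exp(logw - c)
        loglik += c + np.log(w.mean())       # log of the average weight accumulates log p(y_k | y_{1:k-1})
        x = rng.choice(x, size=N, p=w / w.sum())          # multinomial resampling
        x = phi * x + sigma_v * rng.standard_normal(N)    # propagate with the prior kernel
    return loglik

# Simulate data and evaluate the estimator at the true parameter
phi, sv, sw, n = 0.9, 1.0, 1.0, 100
xs = np.zeros(n)
xs[0] = sv * rng.standard_normal()
for k in range(1, n):
    xs[k] = phi * xs[k - 1] + sv * rng.standard_normal()
ys = xs + sw * rng.standard_normal(n)
ll = bootstrap_loglik(ys, phi, sv, sw)
```

For this model the exact log-likelihood is also available from the Kalman filter, which is what makes it a convenient check on the particle estimate.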
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from p(\theta \mid y_{1:n}, x_{1:n}) and p(x_{1:n} \mid y_{1:n}, \theta).

However, it is impossible to sample from p(x_{1:n} \mid y_{1:n}, \theta) directly. We can sample from the one-at-a-time conditionals p(x_k \mid y_k, x_{k-1}, x_{k+1}, \theta) instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from p(x_{1:n} \mid y_{1:n}, \theta), i.e. sample X^*_{1:n} \sim q(\cdot \mid y_{1:n}, \theta) and accept with probability

\min\left\{1, \frac{p_\theta(X^*_{1:n} \mid y_{1:n})\, q(X_{1:n} \mid y_{1:n}, \theta)}{p_\theta(X_{1:n} \mid y_{1:n})\, q(X^*_{1:n} \mid y_{1:n}, \theta)}\right\}.

We will use the output of an SMC method approximating p_\theta(x_{1:n} \mid y_{1:n}) as the proposal distribution. We know that the SMC approximation degrades as n increases, but under mixing assumptions we also have

\left\| \mathrm{Law}(X_{1:n}^{(i)}) - p_\theta(\cdot \mid y_{1:n}) \right\|_{TV} \le \frac{Cn}{N}.

Thus things degrade linearly rather than exponentially in n. These results are useful for small C.
Configurational Bias Monte Carlo

The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences: CBMC looks like an SMC method, but at each step only one particle is selected, which gives N offspring. Thus if you have made the wrong selection, you cannot recover later on.
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
MCMC

We use \hat{p}_\theta^N(x_{1:n} \mid y_{1:n}) as the proposal distribution, i.e. X^*_{1:n} \sim \hat{p}_\theta^N(\cdot \mid y_{1:n}), and store the associated estimate \hat{p}^{N,*}(y_{1:n} \mid \theta).

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N-1)n variables.

The solution, inspired by CBMC, is as follows. For the current state X_{1:n} of the Markov chain, run a virtual SMC method with only N-1 free particles; ensure that at each time step k the current state X_k is deterministically selected, and compute the associated estimate \hat{p}^N(y_{1:n} \mid \theta).

Finally, accept X^*_{1:n} with probability

\min\left\{1, \frac{\hat{p}^{N,*}(y_{1:n} \mid \theta)}{\hat{p}^N(y_{1:n} \mid \theta)}\right\}.

Otherwise stay where you are.
Example

Consider the following model:

X_k = \frac{X_{k-1}}{2} + \frac{25 X_{k-1}}{1 + X_{k-1}^2} + 8 \cos(1.2 k) + V_k, \qquad V_k \sim \mathcal{N}(0, \sigma_v^2),
Y_k = \frac{X_k^2}{20} + W_k, \qquad W_k \sim \mathcal{N}(0, \sigma_w^2), \qquad X_1 \sim \mathcal{N}(0, 5).

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
Example

We now consider the case when both variances \sigma_v^2, \sigma_w^2 are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from p(\theta, x_{1:n} \mid y_{1:n}) using two strategies:

bull The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution; average acceptance rate 0.43.
bull An algorithm updating the components one at a time using an MH step of invariant distribution p(x_k \mid y_k, x_{k-1}, x_{k+1}, \theta), with the same proposal.

In both cases we update the parameters according to p(\theta \mid x_{1:n}, y_{1:n}). Both algorithms are run with the same computational effort.
Example

MH one at a time missed the mode, with estimates

\mathbb{E}[\sigma_v^2 \mid y_{1:500}] \approx 13.7 \ (\text{MH one at a time}), \qquad \mathbb{E}[\sigma_v^2 \mid y_{1:500}] \approx 9.9 \ (\text{PF-MCMC}), \qquad \text{true value } \sigma_v^2 = 10.

If X_{1:n} and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the marginal Metropolis-Hastings algorithm.
Review of Metropolis Hastings Algorithm

As a review of MH, the proposal distribution is a Markov chain with kernel density q(x, y) = q(y \mid x), and the target distribution is \pi(x).

Algorithm: Generic Metropolis-Hastings Sampler
Initialization: choose an arbitrary starting value x_0.
Iteration t (t \ge 1):
1. Given x_{t-1}, generate \tilde{x} \sim q(x_{t-1}, \cdot).
2. Compute

\rho(x_{t-1}, \tilde{x}) = \min\left\{1, \frac{\pi(\tilde{x})\, q(\tilde{x}, x_{t-1})}{\pi(x_{t-1})\, q(x_{t-1}, \tilde{x})}\right\}.

3. With probability \rho(x_{t-1}, \tilde{x}), accept \tilde{x} and set x_t = \tilde{x}; otherwise reject \tilde{x} and set x_t = x_{t-1}.

It can easily be shown that \pi is invariant for the transition kernel K,

\pi(x') = \int \pi(x)\, K(x, x')\, dx,

and that X^{(i)} \sim \pi(x) as i \to \infty under weak assumptions.
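The generic sampler above, specialized to a symmetric random-walk proposal (so the q terms cancel in ρ), can be sketched as follows; the standard-normal target and all settings are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def log_pi(x):
    return -0.5 * x ** 2   # log target: standard normal, up to a constant

x = 0.0
chain = []
for t in range(20000):
    prop = x + rng.standard_normal()    # symmetric random-walk proposal q(x, .)
    # rho = min{1, pi(prop) q(prop, x) / (pi(x) q(x, prop))}; the q terms cancel
    if np.log(rng.uniform()) < log_pi(prop) - log_pi(x):
        x = prop                        # accept
    chain.append(x)                     # on rejection, the current state is repeated

samples = np.array(chain[5000:])        # discard burn-in
```

Because the target is known only up to a constant, only the ratio π(prop)/π(x) ever appears, which is exactly why MH needs the marginal likelihood when targeting p(θ | y_{1:n}).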
Marginal Metropolis Hastings Algorithm

Consider the target

p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}).

We use the following proposal distribution:

q\big((\theta^*, x^*_{1:n}) \mid (\theta, x_{1:n})\big) = q(\theta^* \mid \theta)\, p_{\theta^*}(x^*_{1:n} \mid y_{1:n}).

Then the acceptance probability becomes

\min\left\{1, \frac{p(\theta^*, x^*_{1:n} \mid y_{1:n})\, q\big((\theta, x_{1:n}) \mid (\theta^*, x^*_{1:n})\big)}{p(\theta, x_{1:n} \mid y_{1:n})\, q\big((\theta^*, x^*_{1:n}) \mid (\theta, x_{1:n})\big)}\right\} = \min\left\{1, \frac{p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta \mid \theta^*)}{p_\theta(y_{1:n})\, p(\theta)\, q(\theta^* \mid \theta)}\right\}.

We will use SMC approximations to compute p_\theta(y_{1:n}) and p_\theta(x_{1:n} \mid y_{1:n}).
Marginal Metropolis Hastings Algorithm

Step 1: Given \theta(i-1), X_{1:n}(i-1) and \hat{p}_{\theta(i-1)}(y_{1:n}), sample \theta^* \sim q(\cdot \mid \theta(i-1)) and run an SMC algorithm to obtain \hat{p}_{\theta^*}(y_{1:n}) and \hat{p}_{\theta^*}(x_{1:n} \mid y_{1:n}).

Step 2: Sample X^*_{1:n} \sim \hat{p}_{\theta^*}(\cdot \mid y_{1:n}).

Step 3: With probability

\min\left\{1, \frac{\hat{p}_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta(i-1) \mid \theta^*)}{\hat{p}_{\theta(i-1)}(y_{1:n})\, p(\theta(i-1))\, q(\theta^* \mid \theta(i-1))}\right\},

set \theta(i) = \theta^*, X_{1:n}(i) = X^*_{1:n} and \hat{p}_{\theta(i)}(y_{1:n}) = \hat{p}_{\theta^*}(y_{1:n}); otherwise set \theta(i) = \theta(i-1), X_{1:n}(i) = X_{1:n}(i-1) and \hat{p}_{\theta(i)}(y_{1:n}) = \hat{p}_{\theta(i-1)}(y_{1:n}).
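Steps 1-3 can be combined into a compact particle marginal MH sketch for the linear Gaussian model, targeting φ alone (the X-sampling step is omitted for brevity). The data, the uniform prior, the random-walk proposal scale, the particle count, and the chain length are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def pf_loglik(y, phi, N=200):
    # Bootstrap particle filter estimate of log p_phi(y_{1:n}) for
    # X_n = phi X_{n-1} + V_n, Y_n = X_n + W_n, with V_n, W_n ~ N(0, 1).
    x = rng.standard_normal(N)
    ll = 0.0
    for yk in y:
        logw = -0.5 * (yk - x) ** 2
        c = logw.max()
        w = np.exp(logw - c)
        ll += c + np.log(w.mean()) - 0.5 * np.log(2.0 * np.pi)
        x = rng.choice(x, size=N, p=w / w.sum())      # resample
        x = phi * x + rng.standard_normal(N)          # propagate
    return ll

# Simulate data with phi = 0.8
n, phi_true = 200, 0.8
xs = np.zeros(n)
for k in range(1, n):
    xs[k] = phi_true * xs[k - 1] + rng.standard_normal()
y = xs + rng.standard_normal(n)

# PMMH: random-walk proposal on phi, uniform prior on (-1, 1)
phi, ll = 0.0, pf_loglik(y, 0.0)
chain = []
for i in range(300):
    phi_prop = phi + 0.1 * rng.standard_normal()
    if -1.0 < phi_prop < 1.0:               # proposals outside the prior support are rejected
        ll_prop = pf_loglik(y, phi_prop)    # Step 1: fresh SMC likelihood estimate
        # Step 3: uniform prior and symmetric q, so only the likelihood ratio remains
        if np.log(rng.uniform()) < ll_prop - ll:
            phi, ll = phi_prop, ll_prop
    chain.append(phi)
```

Note that the stored estimate ll is reused unchanged whenever a proposal is rejected; refreshing it would break the exact invariance property discussed next.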
Marginal Metropolis Hastings Algorithm

This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(θ | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

When N ≥ 1, the algorithm admits exactly p(θ, x_{1:n} | y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n. The method is useful when X_n is moderate-dimensional and θ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

bull Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), 549-553, 2007.
bull C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 72(3), 269-342, June 2010.
bull E.L. Ionides, C. Breto and A.A. King, Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443, 2006.
OPEN PROBLEMS
Open Problems

Thus SMC can be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
bull Developing efficient SMC methods when E = R^10000.
bull Developing a proper SMC method for recursive Bayesian parameter estimation.
bull Developing methods that avoid the O(N^2) cost for smoothing and sensitivity estimation.
bull Developing methods for solving optimal control problems.
bull Establishing weaker assumptions ensuring uniform convergence results.
Other Topics

In standard SMC we only sample X_n at time n and do not allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

bull A. Doucet, M. Briers and S. Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, 15(3), pp. 1-19.
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models Let be a sequence of probability distributions defined on X
st each is known up to a normalizing constant
We are interested to sample from and compute Zn sequentially
This is not the same as the standard SMC discussed earlier which was defined on Xn
13
( ) 1n nxπ
ge
( )n xπ
( )
( )
1n n nUnknown Known
x Z xπ γ=
( )n xπ
( ) ( )1 1 1|n n n npπ =x x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models We will construct an artificial distribution that is the product of the
target distribution from which we want to sample and a backward kernel L as follows
such that The importance weights now become
14
( ) ( )
( ) ( ) ( )
11 1
1
1 11
arg
|
n n n n n
n
n n n n k k kk
T etBackwardTransitions
Z where
x L x x
π γ
γ γ
minus
minus
+=
=
= prod
x x
x
( ) ( )1 1 1nn n n nx dπ π minus= int x x
( )( )
( ) ( )( ) ( )
( ) ( )
( ) ( ) ( )
( ) ( )( ) ( )
1
11 1 1 1 1 1
1 1 21 1 1 1 1
1 1 1 11
1 11
1 1 1
|
| |
||
n
n n k k kn n n n n n k
n n n nn n n n n n
n n k k k n n nk
n n n n nn
n n n n n
x L x xqW W W
q q x L x x q x x
x L x xW
x q x x
γγ γγ γ
γγ
minus
+minus minus =
minus minus minusminus minus
minus minus + minus=
minus minusminus
minus minus minus
= = =
=
prod
prod
x x xx x x
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models
Any MCMC kernel can be used for the proposal q(|)
Since our interest is in computing only
there is no degeneracy problem
15
( ) ( )( ) ( )
1 11
1 1 1
||
n n n n nn n
n n n n n
x L x xW W
x q x xγ
γminus minus
minusminus minus minus
=
( ) ( )1 1 11 ( )nn n n n n nx d xZ
π π γminus= =int x x
P Del Moral A Doucet and A Jasra Sequential Monte Carlo for Bayesian Computation Bayesian Statistics 2006
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2
Sample and augment
Compute the importance weights
Then the weighted approximation is We finally resample from to obtain
16
( )( ) ( )1~ |
i in n n nX q x X minus
( )( ) ( )( )1 1~
i iin n nnX Xminus minusX
( )
( )( )( )
1i
n
Ni
n n n nXi
x W xπ δ=
= sum
( ) ( )( ) ( )
( ) ( ) ( )11
( ) ( )1 ( ) ( ) ( )
1 11
|
|
i i in n nn n
i in n i i i
n n nn n
X L X XW W
X q X X
γ
γ
minusminus
minusminus minusminus
=
( ) ( )( )
1
1i
n
N
n n nXi
x xN
π δ=
= sum
( )( ) ~inn nX xπ
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then
the corresponding reversed kernel Ln-1
Using this easy choice we can simplify the expression for the weights
This is known as annealed importance sampling The particular choice has been used in physics and statistics
17
( ) ( ) ( )( )
1 11 1
|| n n n n n
n n nn n
x q x xL x x
xπ
πminus minus
minus minus =
( ) ( )( ) ( )
( )( ) ( )
( ) ( )( )
1 1 1 11 1
1 1 1 1 1 1
| || |
n n n n n n n n n n n nn n n
n n n n n n n n n n n n
x L x x x x q x xW W W
x q x x x q x x xγ γ π
γ γ πminus minus minus minus
minus minusminus minus minus minus minus minus
= = rArr
( )( )
( )1
1 ( )1 1
in n
n n in n
XW W
X
γ
γminus
minusminus minus
=
W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001
Bayesian Scientific Computing Spring 2013 (N Zabaras) 18
ONLINE PARAMETER ESTIMATION
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static
parameter θ with some prior p(θ)
Given data y1n inference now is based on
We can use standard SMC but on the extended space Zn=(Xnθn)
Note that θ is a static parameter ndashdoes not involve with n
19
( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=
( ) ( )| ~ |n n n n nY X x g y xθ=
( ) ( ) ( )
( ) ( ) ( )
1 1 1 1 1
1 1
| | |
|
n n n n n
n n
p p pwherep p p
θ
θ
θ θ
θ θ
=
prop
x y y x y
y y
( ) ( ) ( ) ( ) ( )11 1| | | |
nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates
In a Bayesian context the problem is even more severe as
Exponential stability assumption cannot hold as θn = θ1
To mitigate this problem introduce MCMC steps on θ
20
( ) ( ) ( )1 1| n np p pθθ θpropy y
C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B
2001
1log ( )nCnVar pNθ
le y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation When
This becomes an elegant algorithm that however still has the degeneracy problem since it uses
As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors
21
( ) ( )1 1 1 1| | n n n n n
FixedDimensional
p p sθ θ
=
y x x y
( )1 1|n np x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a
way that does not modify their distributions ie if at time n
then we would like a transition kernel such that if Then
In Markov chain language we want to be invariant
22
( )( )1|i
n npθ θ~ y
( )iθ
( )( ) ( ) ( )| i i in n n nMθ θ θ~
( )( )1|i
n npθ θ~ y
( ) nM θ θ ( )1| np θ y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Using MCMC There is a whole literature on the design of such kernels known as
Markov chain Monte Carlo eg the Metropolis-Hastings algorithm
We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown
However we can use a very simple Gibbs update
Indeed note that if then if we have
Indeed note that
23
( )1| np θ y( ) ( )1 1|n np pθθ equivy y
( )( ) ( )1 1| i i
n n npθ θ~ y X
( ) ( )( ) ( )1 1 1 ~ |i i
n n n npθ θX x y ( )( ) ( )1 1| i i
n n npθ θ~ y X
( ) ( )( ) ( )1 1 1 ~ |i i
n n n npθ θX x y
( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Given an approximation at time n-1
Sample set and then approximate
Resample then sample to obtain
24
( )( )1
( ) ( )1~ |i
n
i in n nX f x X
θ minusminus
( )( ) ( )( )1 1 1~ i iin nn XminusX X
( ) ( ) ( )( ) ( )1 1 1
1 1 1 1 1 11
1| i in n
N
n n ni
pN θ
θ δ θminus minus
minus minus minus=
= sum X x y x
( )( )1 1 1~ |i
n n npX x y
( )( ) ( ) ( )( )( )( )
11 1
( )( ) ( )1 1 1
1| |iii
nn n
N ii inn n n n n n
ip W W g y
θθθ δ
minusminus=
= propsum X x y x X
( )( ) ( )1 1| i i
n n npθ θ~ y X
( ) ( ) ( )( )( )1
1 1 11
1| iin n
N
n n ni
pN θ
θ δ θ=
= sum X x y x
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Consider the following model
We set the prior on θ as
In this case
25
( )( )
( )
1 1
21 0
~ 01
~ 01
~ 0
n n v n n
n n w n n
X X V V
Y X W W
X
θ σ
σ
σ
+ += +
= +
N
N
N
~ ( 11)θ minusU
( ) ( ) ( )21 1 ( 11)
12 2 2
12 2
|
n n n n
n n
n n k k n kk k
p m
m x x x
θ θ σ θ
σ σ
minus
minus
minus= =
prop
= = sum sum
x y N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains
SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value
26
True Value
SMCMCMC
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ
according to we are still relying implicitly on the approximation of the joint distribution
As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution
27
( )1 1n np θ | x y( )1 1|n np x y
( )1 1|n np x y
Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ
21
1
1 |n
k nk
Xn =
sum y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Recursive MLE Parameter Estimation The log likelihood can be written as
Here we compute
Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution
28
( )1 1 11
l ( ) log ( ) log |n
n n k kk
p p Yθ θθ minus=
= = sumY Y
( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y
( ) 1 1 |n n n nX Y p xθ minusY
l ( )lim l( ) log ( | ) ( ) ( )n
ng y x dx dy d
n θ θ θθ θ micro λ micro
rarrinfin= = int int
Robbins-Monro Algorithm for MLE. We can maximize \ell(\theta) by using the gradient and the Robbins-Monro algorithm
\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta_{n-1}}(Y_n \mid Y_{1:n-1}),
where
\nabla p_\theta(Y_n \mid Y_{1:n-1}) = \int \nabla g_\theta(Y_n \mid x_n)\, p_\theta(x_n \mid Y_{1:n-1})\, dx_n + \int g_\theta(Y_n \mid x_n)\, \nabla p_\theta(x_n \mid Y_{1:n-1})\, dx_n.
We thus need to approximate \nabla p_\theta(x_n \mid Y_{1:n-1}).
29
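The Robbins-Monro recursion itself can be sketched on a toy case where the noisy gradient is available in closed form (an illustration of the stochastic-approximation step only, with a target and step sizes chosen here, not taken from the slides): estimating the mean of a unit-variance Gaussian, where ∇_θ log N(y; θ, 1) = y − θ.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true = 2.0
theta = 0.0
for n in range(1, 5001):
    y_n = rng.normal(mu_true, 1.0)   # new observation
    grad = y_n - theta               # noisy gradient of the per-observation log-likelihood
    gamma = 1.0 / n                  # step sizes with sum(gamma) = inf, sum(gamma^2) < inf
    theta = theta + gamma * grad
```

With γ_n = 1/n this recursion reduces to the running sample mean, which makes the convergence easy to check.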
Importance Sampling Estimation of Sensitivity. The various proposed approximations are based on the identity
\nabla p_\theta(x_{1:n-1} \mid Y_{1:n-1}) = \nabla \log p_\theta(x_{1:n-1} \mid Y_{1:n-1})\; p_\theta(x_{1:n-1} \mid Y_{1:n-1}).
We thus can use an SMC approximation of the form
\nabla \hat p_\theta(x_{1:n-1} \mid Y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^N \alpha_n^{(i)}\, \delta_{X_{1:n-1}^{(i)}}(x_{1:n-1}), \quad \alpha_n^{(i)} = \nabla \log p_\theta(X_{1:n-1}^{(i)} \mid Y_{1:n-1}).
This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of p_\theta(x_{1:n-1} \mid Y_{1:n-1}).
30
Degeneracy of the SMC Algorithm. (Figure: score \nabla p_\theta(Y_{1:n}) for linear Gaussian models; exact (blue) vs. SMC approximation (red).)
31
Degeneracy of the SMC Algorithm. (Figure: signed measure \nabla p_\theta(x_n \mid Y_{1:n}) for linear Gaussian models; exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.)
32
Marginal Importance Sampling for Sensitivity Estimation. An alternative identity is the following:
\nabla p_\theta(x_n \mid Y_{1:n-1}) = \nabla \log p_\theta(x_n \mid Y_{1:n-1})\; p_\theta(x_n \mid Y_{1:n-1}).
It suggests using an SMC approximation of the form
\nabla \hat p_\theta(x_n \mid Y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^N \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \quad \beta_n^{(i)} = \nabla \log p_\theta(X_n^{(i)} \mid Y_{1:n-1}) = \frac{\nabla p_\theta(X_n^{(i)} \mid Y_{1:n-1})}{p_\theta(X_n^{(i)} \mid Y_{1:n-1})}.
Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N^2), as
p_\theta(X_n^{(i)} \mid Y_{1:n-1}) \propto \int f_\theta(X_n^{(i)} \mid x_{n-1})\, p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1} \approx \frac{1}{N}\sum_{j=1}^N f_\theta(X_n^{(i)} \mid X_{n-1}^{(j)}).
33
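The O(N²) cost comes from scoring every new particle against every old one. A minimal sketch of that step for a Gaussian AR(1) transition (particle values and weights here are synthetic stand-ins for the output of a real filter, and the parameter values are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
N, phi, sigma_v = 500, 0.8, 1.0

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Weighted particles approximating p(x_{n-1} | Y_{1:n-1})
x_prev = rng.normal(size=N)
w_prev = np.ones(N) / N
x_new = phi * x_prev + sigma_v * rng.normal(size=N)   # propagated particles

# O(N^2): each of the N new particles is scored against all N old ones
trans = normal_pdf(x_new[:, None], phi * x_prev[None, :], sigma_v)  # (N, N)
pred_density = trans @ w_prev   # p_hat(X_n^(i) | Y_{1:n-1}) for each i
```

The (N, N) broadcasted matrix makes the quadratic cost explicit; this is exactly the term that the open problems at the end of the lecture ask to avoid.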
Marginal Importance Sampling for Sensitivity Estimation. The optimal filter satisfies
p_\theta(x_n \mid Y_{1:n}) = \frac{\xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x_n)\, dx_n}, \quad \xi_{\theta,n}(x_n) = g_\theta(Y_n \mid x_n) \int f_\theta(x_n \mid x_{n-1})\, p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1}.
The derivatives satisfy
\nabla p_\theta(x_n \mid Y_{1:n}) = \frac{\nabla \xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x_n)\, dx_n} - p_\theta(x_n \mid Y_{1:n})\, \frac{\int \nabla \xi_{\theta,n}(x_n)\, dx_n}{\int \xi_{\theta,n}(x_n)\, dx_n}.
This way we obtain a simple recursion for \nabla p_\theta(x_n \mid Y_{1:n}) as a function of \nabla p_\theta(x_{n-1} \mid Y_{1:n-1}) and p_\theta(x_{n-1} \mid Y_{1:n-1}).
34
SMC Approximation of the Sensitivity. Sample
X_n^{(i)} \sim \eta_n(\cdot), \quad \eta_n(x_n) = \sum_{j=1}^N W_{n-1}^{(j)}\, q(x_n \mid Y_n, X_{n-1}^{(j)}).
Compute
\alpha_n^{(i)} = \frac{\xi_{\theta,n}(X_n^{(i)})}{q(X_n^{(i)} \mid Y_n)}, \quad \rho_n^{(i)} = \frac{\nabla \xi_{\theta,n}(X_n^{(i)})}{q(X_n^{(i)} \mid Y_n)},
W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_j \alpha_n^{(j)}}, \quad \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_j \alpha_n^{(j)}} - W_n^{(i)}\, \frac{\sum_j \rho_n^{(j)}}{\sum_j \alpha_n^{(j)}}.
We have
\hat p_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^N W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \quad \nabla \hat p_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^N \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n).
35
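The β weights are the derivative of the self-normalized weights, so they must sum to zero (the derivative of ∫ p_θ(x_n | Y_{1:n}) dx_n = 1 vanishes). A tiny numeric check of that structure, with synthetic α and ρ values standing in for the filter quantities:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1000
alpha = rng.uniform(0.1, 2.0, size=N)   # alpha_n^(i): unnormalized filter weights
rho = rng.normal(size=N)                # rho_n^(i): their theta-derivatives

S = alpha.sum()
W = alpha / S                            # normalized filter weights
beta = rho / S - W * (rho.sum() / S)     # signed derivative weights

# W sums to one; beta sums to zero, reflecting nabla of a probability measure
```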
Example: Linear Gaussian Model. Consider the following model:
X_n = \phi X_{n-1} + \sigma_v V_n,
Y_n = X_n + \sigma_w W_n,
with \theta = (\phi, \sigma_v, \sigma_w).
We compare next the SMC estimates of \nabla \log p_\theta(x_n \mid Y_{1:n}) and \nabla p_\theta(Y_{1:n}) for N = 1000 to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).
36
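For this linear Gaussian model, the exact log-likelihood used as the reference comes from the Kalman filter prediction-error decomposition. A minimal sketch (scalar case; parameter values, seed, and the peak-location check are choices made here, and the score could then be obtained by differentiating this recursion or by finite differences):

```python
import numpy as np

def kalman_loglik(y, phi, sigma_v, sigma_w, m0=0.0, P0=1.0):
    """Exact log p_theta(y_{1:n}) for X_n = phi X_{n-1} + sigma_v V_n, Y_n = X_n + sigma_w W_n."""
    m, P, ll = m0, P0, 0.0
    for yk in y:
        m_pred = phi * m                       # predict mean
        P_pred = phi**2 * P + sigma_v**2       # predict variance
        S = P_pred + sigma_w**2                # innovation variance
        ll += -0.5 * (np.log(2 * np.pi * S) + (yk - m_pred) ** 2 / S)
        K = P_pred / S                         # Kalman gain
        m = m_pred + K * (yk - m_pred)
        P = (1 - K) * P_pred
    return ll

rng = np.random.default_rng(4)
phi, sv, sw, n = 0.9, 0.5, 1.0, 300
x, ys = 0.0, []
for k in range(n):
    x = phi * x + sv * rng.normal()
    ys.append(x + sw * rng.normal())
ys = np.array(ys)

# The exact log-likelihood should favor the true phi over a distant value
lls = {p: kalman_loglik(ys, p, sv, sw) for p in (0.2, 0.9)}
```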
Example: Linear Gaussian Model. (Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and the Hahn-Jordan decomposition of the marginal (bottom).)
37
Example: Stochastic Volatility Model. Consider the following model:
X_n = \phi X_{n-1} + \sigma_v V_n,
Y_n = \beta \exp(X_n/2)\, W_n,
with \theta = (\phi, \sigma_v, \beta). We use SMC with N = 1000 for batch and online estimation.
For batch estimation: \theta_i = \theta_{i-1} + \gamma_i\, \nabla \log p_{\theta_{i-1}}(Y_{1:n}).
For online estimation: \theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta_{n-1}}(Y_n \mid Y_{1:n-1}).
38
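The likelihood terms driving both recursions can be estimated with a bootstrap particle filter. A self-contained sketch for this SV model (parameter values, particle count, and seed are illustrative choices, not the lecture's settings):

```python
import numpy as np

def sv_loglik_pf(y, phi, sigma_v, beta, N=500, seed=0):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n}) for the SV model
    X_n = phi X_{n-1} + sigma_v V_n,  Y_n = beta exp(X_n/2) W_n."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_v / np.sqrt(1 - phi**2), size=N)  # stationary initial law
    ll = 0.0
    for yk in y:
        x = phi * x + sigma_v * rng.normal(size=N)        # propagate from the prior kernel
        sd = beta * np.exp(x / 2)                         # observation scale per particle
        logw = -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * (yk / sd) ** 2
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())                        # log of the average weight
        x = rng.choice(x, size=N, p=w / w.sum())          # multinomial resampling
    return ll

# Simulate a short series and evaluate the likelihood estimate
rng = np.random.default_rng(5)
phi, sv, beta_c, n = 0.9, 0.3, 0.7, 100
x, ys = 0.0, []
for k in range(n):
    x = phi * x + sv * rng.normal()
    ys.append(beta_c * np.exp(x / 2) * rng.normal())
ll_hat = sv_loglik_pf(np.array(ys), phi, sv, beta_c)
```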
Example: Stochastic Volatility Model. Batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.
39
Example: Stochastic Volatility Model. (Figure: recursive parameter estimation for the SV model; the estimates converge towards the true values.)
40
SMC with MCMC for Parameter Estimation. Given data y_{1:n}, inference relies on
p(\theta, x_{1:n} \mid y_{1:n}) = p_\theta(x_{1:n} \mid y_{1:n})\, p(\theta \mid y_{1:n}), \quad where \quad p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).
We have seen that SMC is rather inefficient at sampling from p(\theta, x_{1:n} \mid y_{1:n}), so we look here at an MCMC approach. For a given parameter value θ, SMC can estimate both p_\theta(x_{1:n} \mid y_{1:n}) and p_\theta(y_{1:n}). It will be useful if we can use SMC within MCMC to sample from p(\theta \mid y_{1:n}).
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010), 72, Part 3, pp. 269-342.
41
Gibbs Sampling Strategy. Using Gibbs sampling, we can sample iteratively from p(\theta \mid x_{1:n}, y_{1:n}) and p(x_{1:n} \mid \theta, y_{1:n}).
However, it is impossible to sample from p(x_{1:n} \mid \theta, y_{1:n}) directly. We can sample from p(x_k \mid \theta, y_{1:n}, x_{k-1}, x_{k+1}) instead, but convergence will be slow.
Alternatively, we would like to use a Metropolis-Hastings step to sample from p(x_{1:n} \mid \theta, y_{1:n}); i.e., sample X'_{1:n} \sim q(x_{1:n} \mid y_{1:n}) and accept with probability
\min\left\{1, \frac{p_\theta(X'_{1:n} \mid y_{1:n})\, q(X_{1:n} \mid y_{1:n})}{p_\theta(X_{1:n} \mid y_{1:n})\, q(X'_{1:n} \mid y_{1:n})}\right\}.
42
Gibbs Sampling Strategy. We will use the output of an SMC method approximating p(x_{1:n} \mid \theta, y_{1:n}) as a proposal distribution.
We know that the SMC approximation degrades as n increases, but under mixing assumptions we also have
\left\| \mathrm{Law}\big(X_{1:n}^{(i)}\big) - p(x_{1:n} \mid \theta, y_{1:n}) \right\|_{TV} \le \frac{Cn}{N}.
Thus things degrade linearly rather than exponentially. These results are useful for small C.
43
Configurational Biased Monte Carlo. The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences: CBMC looks like an SMC method, but at each step only one particle is selected, and it gives N offspring. Thus, if you have made the wrong selection, you cannot recover later on.
D. Frenkel, G.C.A.M. Mooij and B. Smit, Novel scheme to study the structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
44
MCMC. We use \hat p_N(x_{1:n} \mid \theta, y_{1:n}) as proposal distribution and store the estimate \hat p_N(y_{1:n} \mid \theta) of the marginal likelihood.
It is impossible to compute analytically the unconditional law of a particle X_{1:n}, as it would require integrating out (N-1)n variables.
The solution, inspired by CBMC, is as follows. For the current state X_{1:n} of the Markov chain, run a virtual SMC method with only N-1 free particles; ensure that at each time step k the current state X_k is deterministically selected, and compute the associated estimate \hat p'_N(y_{1:n} \mid \theta).
Finally, accept the proposed trajectory with probability \min\left\{1, \hat p_N(y_{1:n} \mid \theta)\, /\, \hat p'_N(y_{1:n} \mid \theta)\right\}; otherwise stay where you are.
45
Example. Consider the following model:
X_k = \frac{X_{k-1}}{2} + \frac{25 X_{k-1}}{1 + X_{k-1}^2} + 8\cos(1.2k) + V_k, \quad V_k \sim N(0, 15),
Y_k = \frac{X_k^2}{20} + W_k, \quad W_k \sim N(0, 0.01), \quad X_1 \sim N(0, 5^2).
We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
46
Example. We now consider the case where both variances \sigma_v^2, \sigma_w^2 are unknown and given inverse-Gamma (conditionally conjugate) vague priors.
We sample from p(\theta, x_{1:1000} \mid y_{1:1000}) using two strategies:
The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution; average acceptance rate 0.43.
An algorithm updating the components one at a time using an MH step of invariant distribution p(x_k \mid x_{k-1}, x_{k+1}, y_k) with the same proposal.
In both cases we update the parameters according to p(\theta \mid x_{1:500}, y_{1:500}). Both algorithms are run with the same computational effort.
47
Example. MH one-at-a-time missed the mode and estimates
E[\sigma_v^2 \mid y_{1:500}] \approx 13.7 \ (MH), \quad E[\sigma_v^2 \mid y_{1:500}] \approx 9.9 \ (PF\text{-}MCMC), \quad true\ value\ \sigma_v^2 = 10.
If X_{1:n} and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
48
Review of Metropolis-Hastings Algorithm. As a review of MH: the proposal distribution is a Markov chain with kernel density q(x, y) = q(y \mid x), and the target distribution is \pi(x). Under weak assumptions it can be easily shown that \pi is invariant for the transition kernel K,
\pi(x') = \int \pi(x)\, K(x, x')\, dx, \quad and \quad X^{(i)} \sim \pi(x) \ as \ i \to \infty.
Algorithm (Generic Metropolis-Hastings Sampler).
Initialization: choose an arbitrary starting value x^{(0)}.
Iteration t (t \ge 1):
1. Given x^{(t-1)}, generate x^* \sim q(x^{(t-1)}, \cdot).
2. Compute
\rho(x^{(t-1)}, x^*) = \min\left\{1, \frac{\pi(x^*)\, q(x^*, x^{(t-1)})}{\pi(x^{(t-1)})\, q(x^{(t-1)}, x^*)}\right\}.
3. With probability \rho(x^{(t-1)}, x^*), accept x^* and set x^{(t)} = x^*; otherwise reject x^* and set x^{(t)} = x^{(t-1)}.
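The three steps above can be sketched directly; the target, random-walk proposal, and iteration count below are illustrative choices of mine (for a symmetric proposal, the q-ratio in step 2 cancels):

```python
import numpy as np

def metropolis_hastings(logpi, x0, n_iter, step, rng):
    """Generic random-walk MH: proposal q(x, .) = N(x, step^2), target pi known up to a constant."""
    x = x0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        x_prop = x + step * rng.normal()
        # symmetric proposal: acceptance ratio reduces to pi(x*)/pi(x^{t-1})
        if np.log(rng.uniform()) < logpi(x_prop) - logpi(x):
            x = x_prop
        chain[t] = x
    return chain

rng = np.random.default_rng(6)
# Target: N(3, 1), specified only through its unnormalized log-density
chain = metropolis_hastings(lambda x: -0.5 * (x - 3.0) ** 2, 0.0, 20000, 1.0, rng)
est_mean = chain[5000:].mean()
```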
Marginal Metropolis-Hastings Algorithm. Consider the target
p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}).
We use the following proposal distribution:
q((\theta^*, x^*_{1:n}) \mid (\theta, x_{1:n})) = q(\theta^* \mid \theta)\, p_{\theta^*}(x^*_{1:n} \mid y_{1:n}).
Then the acceptance probability becomes
\min\left\{1, \frac{p(\theta^*, x^*_{1:n} \mid y_{1:n})\, q((\theta, x_{1:n}) \mid (\theta^*, x^*_{1:n}))}{p(\theta, x_{1:n} \mid y_{1:n})\, q((\theta^*, x^*_{1:n}) \mid (\theta, x_{1:n}))}\right\} = \min\left\{1, \frac{p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta \mid \theta^*)}{p_\theta(y_{1:n})\, p(\theta)\, q(\theta^* \mid \theta)}\right\}.
We will use SMC approximations to compute p_\theta(y_{1:n}) and p_\theta(x_{1:n} \mid y_{1:n}).
Marginal Metropolis-Hastings Algorithm.
Step 1: Given \theta(i-1), \hat p_{\theta(i-1)}(y_{1:n}) and X_{1:n}(i-1), sample \theta^* \sim q(\cdot \mid \theta(i-1)) and run an SMC algorithm to obtain \hat p_{\theta^*}(x_{1:n} \mid y_{1:n}) and \hat p_{\theta^*}(y_{1:n}).
Step 2: Sample X^*_{1:n} \sim \hat p_{\theta^*}(\cdot \mid y_{1:n}).
Step 3: With probability
\min\left\{1, \frac{\hat p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta(i-1) \mid \theta^*)}{\hat p_{\theta(i-1)}(y_{1:n})\, p(\theta(i-1))\, q(\theta^* \mid \theta(i-1))}\right\},
set \theta(i) = \theta^*, X_{1:n}(i) = X^*_{1:n} and \hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta^*}(y_{1:n}); otherwise set \theta(i) = \theta(i-1), X_{1:n}(i) = X_{1:n}(i-1) and \hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta(i-1)}(y_{1:n}).
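A compact sketch of these steps for a linear Gaussian model with unknown AR coefficient φ (my own toy setup: uniform prior on (−1, 1), known unit variances, random-walk proposal, and modest particle/iteration counts; the marginal-likelihood estimate of the current state is stored and reused, as the algorithm requires):

```python
import numpy as np

def pf_loglik(y, phi, sigma_v, sigma_w, N, rng):
    """Bootstrap particle filter estimate of log p_phi(y_{1:n})."""
    x = rng.normal(size=N)
    ll = 0.0
    for yk in y:
        x = phi * x + sigma_v * rng.normal(size=N)
        logw = -0.5 * np.log(2 * np.pi * sigma_w**2) - 0.5 * ((yk - x) / sigma_w) ** 2
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())
        x = rng.choice(x, size=N, p=w / w.sum())
    return ll

rng = np.random.default_rng(7)
phi_true, n = 0.7, 200
x, ys = 0.0, []
for k in range(n):                       # data from phi = 0.7, sigma_v = sigma_w = 1
    x = phi_true * x + rng.normal()
    ys.append(x + rng.normal())
ys = np.array(ys)

phi = 0.3                                # initial parameter value
ll = pf_loglik(ys, phi, 1.0, 1.0, 200, rng)
chain = []
for i in range(400):
    phi_prop = phi + 0.1 * rng.normal()  # random-walk proposal on theta
    if -1 < phi_prop < 1:                # uniform prior support
        ll_prop = pf_loglik(ys, phi_prop, 1.0, 1.0, 200, rng)
        # symmetric proposal and flat prior: ratio reduces to the likelihood estimates
        if np.log(rng.uniform()) < ll_prop - ll:
            phi, ll = phi_prop, ll_prop
    chain.append(phi)
est = np.mean(chain[150:])
```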
Marginal Metropolis-Hastings Algorithm. This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(\theta \mid y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).
For any N \ge 1, the algorithm admits exactly p(\theta, x_{1:n} \mid y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.
The higher N, the better the performance of the algorithm; N scales roughly linearly with n. The method is useful when X_n is moderate-dimensional and θ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).
Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pp. 549-553, 2007.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pp. 269-342, June 2010.
Ionides, E.L., Breto, C. and King, A.A. (2006), Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443.
OPEN PROBLEMS
Open Problems. Thus SMC can be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.
Several open problems remain:
Developing efficient SMC methods when E = R^{10000}.
Developing a proper SMC method for recursive Bayesian parameter estimation.
Developing methods to avoid the O(N^2) cost for smoothing and sensitivity estimation.
Developing methods for solving optimal control problems.
Establishing weaker assumptions ensuring uniform convergence results.
54
Other Topics. In standard SMC we only sample X_n at time n, and we do not allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.
Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, pp. 1-19.
55
SMC For Static Models. We will construct an artificial distribution that is the product of the target distribution from which we want to sample and a backward kernel L, as follows:
\tilde\pi_n(x_{1:n}) = \frac{\tilde\gamma_n(x_{1:n})}{Z_n}, \quad where \quad \tilde\gamma_n(x_{1:n}) = \gamma_n(x_n) \prod_{k=1}^{n-1} L_k(x_k \mid x_{k+1}) \quad (target \times backward\ transitions),
such that \pi_n(x_n) = \int \tilde\pi_n(x_{1:n})\, dx_{1:n-1}. The importance weights now become
W_n = \frac{\tilde\gamma_n(x_{1:n})}{q_n(x_{1:n})} = W_{n-1}\, \frac{\tilde\gamma_n(x_{1:n})}{\tilde\gamma_{n-1}(x_{1:n-1})\, q_n(x_n \mid x_{n-1})} = W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_{n-1} \mid x_n)}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \mid x_{n-1})}.
14
SMC For Static Models. Any MCMC kernel can be used for the proposal q(\cdot \mid \cdot). Since our interest is in computing only
\pi_n(x_n) = \int \tilde\pi_n(x_{1:n})\, dx_{1:n-1} = \frac{1}{Z_n}\gamma_n(x_n),
with weights
W_n \propto W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_{n-1} \mid x_n)}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \mid x_{n-1})},
there is no degeneracy problem.
P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
15
Algorithm: SMC For Static Problems.
(1) Initialize at time n = 1.
(2) At time n \ge 2: sample X_n^{(i)} \sim q_n(\cdot \mid X_{n-1}^{(i)}) and augment X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)}). Compute the importance weights
W_n^{(i)} \propto W_{n-1}^{(i)}\, \frac{\gamma_n(X_n^{(i)})\, L_{n-1}(X_{n-1}^{(i)} \mid X_n^{(i)})}{\gamma_{n-1}(X_{n-1}^{(i)})\, q_n(X_n^{(i)} \mid X_{n-1}^{(i)})}.
Then the weighted approximation is \hat\pi_n(x) = \sum_{i=1}^N W_n^{(i)}\, \delta_{X_n^{(i)}}(x). We finally resample \bar X_n^{(i)} \sim \hat\pi_n(x) to obtain \hat\pi_n(x) = \frac{1}{N}\sum_{i=1}^N \delta_{\bar X_n^{(i)}}(x).
16
SMC For Static Models: Choice of L. A default choice is first using a \pi_n-invariant MCMC kernel q_n and then the corresponding reversed kernel
L_{n-1}(x_{n-1} \mid x_n) = \frac{\pi_n(x_{n-1})\, q_n(x_n \mid x_{n-1})}{\pi_n(x_n)}.
Using this easy choice we can simplify the expression for the weights:
W_n \propto W_{n-1}\, \frac{\gamma_n(x_n)\, L_{n-1}(x_{n-1} \mid x_n)}{\gamma_{n-1}(x_{n-1})\, q_n(x_n \mid x_{n-1})} = W_{n-1}\, \frac{\gamma_n(x_{n-1})}{\gamma_{n-1}(x_{n-1})}.
This is known as annealed importance sampling; this particular choice has been used in physics and statistics.
W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001.
17
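The annealed-importance-sampling weight update above can be sketched on a toy problem where the normalizing-constant ratio is known exactly (my own setup: anneal from N(0, 1) to an unnormalized N(0, 0.5²), so that E[W] = Z_1/Z_0 = 0.5; temperature schedule, step size, and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
N, n_temps = 2000, 50
betas = np.linspace(0.0, 1.0, n_temps)
sigma_T = 0.5

def log_gamma(x, b):
    # Unnormalized annealed target: geometric bridge between N(0,1) and N(0, sigma_T^2)
    return -(1 - b) * 0.5 * x**2 - b * 0.5 * x**2 / sigma_T**2

x = rng.normal(size=N)          # exact draws from pi_0 = N(0, 1)
logw = np.zeros(N)
for b_prev, b in zip(betas[:-1], betas[1:]):
    # weight update: gamma_n(x_{n-1}) / gamma_{n-1}(x_{n-1})
    logw += log_gamma(x, b) - log_gamma(x, b_prev)
    # pi_n-invariant random-walk Metropolis move
    x_prop = x + 0.5 * rng.normal(size=N)
    accept = np.log(rng.uniform(size=N)) < log_gamma(x_prop, b) - log_gamma(x, b)
    x = np.where(accept, x_prop, x)

Z_est = np.exp(logw).mean()     # estimates Z_1/Z_0 = sigma_T = 0.5
```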
ONLINE PARAMETER ESTIMATION
Online Bayesian Parameter Estimation. Assume that our state model is defined with some unknown static parameter θ with prior p(\theta):
X_1 \sim \mu_\theta(\cdot) \quad and \quad X_n \mid (X_{n-1} = x_{n-1}) \sim f_\theta(\cdot \mid x_{n-1}),
Y_n \mid (X_n = x_n) \sim g_\theta(\cdot \mid x_n).
Given data y_{1:n}, inference now is based on
p(\theta, x_{1:n} \mid y_{1:n}) = p_\theta(x_{1:n} \mid y_{1:n})\, p(\theta \mid y_{1:n}), \quad where \quad p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).
We can use standard SMC, but on the extended space Z_n = (X_n, \theta_n), with
f((x_n, \theta_n) \mid (x_{n-1}, \theta_{n-1})) = f_{\theta_{n-1}}(x_n \mid x_{n-1})\, \delta_{\theta_{n-1}}(\theta_n), \quad g(y_n \mid z_n) = g_{\theta_n}(y_n \mid x_n).
Note that θ is a static parameter: it does not evolve with n.
19
Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates
In a Bayesian context the problem is even more severe as
Exponential stability assumption cannot hold as θn = θ1
To mitigate this problem introduce MCMC steps on θ
20
( ) ( ) ( )1 1| n np p pθθ θpropy y
C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B
2001
1log ( )nCnVar pNθ
le y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation. When
p(\theta \mid y_{1:n}, x_{1:n}) = p(\theta \mid s_n(x_{1:n}, y_{1:n}))
for a fixed-dimensional sufficient statistic s_n, this becomes an elegant algorithm; it still has the degeneracy problem, however, since it uses the SMC approximation of p(x_{1:n} \mid y_{1:n}).
As \dim(Z_n) = \dim(X_n) + \dim(\theta), such methods are not recommended for high-dimensional θ, especially with vague priors.
21
Artificial Dynamics for θ. A solution consists of perturbing the location of the particles in a way that does not modify their distribution; i.e., if at time n
\theta^{(i)} \sim p(\theta \mid y_{1:n}),
then we would like a transition kernel M_n such that if \bar\theta^{(i)} \sim M_n(\theta^{(i)}, \cdot), then \bar\theta^{(i)} \sim p(\theta \mid y_{1:n}).
In Markov chain language, we want M_n(\theta, \bar\theta) to be p(\theta \mid y_{1:n})-invariant.
22
Using MCMC. There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g., the Metropolis-Hastings algorithm.
We cannot use these algorithms directly, as p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta) would need to be known up to a normalizing constant, but p_\theta(y_{1:n}) is unknown.
However, we can use a very simple Gibbs update. Indeed, if (\theta^{(i)}, X_{1:n}^{(i)}) \sim p(\theta, x_{1:n} \mid y_{1:n}) and we draw \theta^{(i)} \sim p(\theta \mid y_{1:n}, X_{1:n}^{(i)}), then \theta^{(i)} \sim p(\theta \mid y_{1:n}), since
p(\theta \mid y_{1:n}) = \int p(\theta \mid y_{1:n}, x_{1:n})\, p(x_{1:n} \mid y_{1:n})\, dx_{1:n}.
23
SMC with MCMC for Parameter Estimation. Given an approximation at time n-1,
\hat p(\theta, x_{1:n-1} \mid y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^N \delta_{(\theta_{n-1}^{(i)},\, X_{1:n-1}^{(i)})}(\theta, x_{1:n-1}):
Sample X_n^{(i)} \sim f_{\theta_{n-1}^{(i)}}(\cdot \mid X_{n-1}^{(i)}), set \bar X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)}), and then approximate
\hat p(\theta, x_{1:n} \mid y_{1:n}) = \sum_{i=1}^N W_n^{(i)}\, \delta_{(\theta_{n-1}^{(i)},\, \bar X_{1:n}^{(i)})}(\theta, x_{1:n}), \quad W_n^{(i)} \propto g_{\theta_{n-1}^{(i)}}(y_n \mid \bar X_n^{(i)}).
Resample X_{1:n}^{(i)} \sim \hat p(x_{1:n} \mid y_{1:n}), then sample \theta^{(i)} \sim p(\theta \mid y_{1:n}, X_{1:n}^{(i)}) to obtain
\hat p(\theta, x_{1:n} \mid y_{1:n}) = \frac{1}{N}\sum_{i=1}^N \delta_{(\theta^{(i)},\, X_{1:n}^{(i)})}(\theta, x_{1:n}).
24
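A sketch of this propagate/weight/resample/Gibbs-refresh cycle for the AR(1) model of the next slide, carrying per-particle sufficient statistics for the θ-update (my own illustrative implementation: parameter values, seed, and the crude clipping used in place of exact truncated-normal sampling are assumptions, and the path degeneracy the slides warn about is still present here):

```python
import numpy as np

rng = np.random.default_rng(9)
sigma_v = sigma_w = 1.0
theta_true, n = 0.5, 150

# Simulate data from X_{k+1} = theta X_k + V_k, Y_k = X_k + W_k
x, ys = 0.0, []
for k in range(n):
    x = theta_true * x + sigma_v * rng.normal()
    ys.append(x + sigma_w * rng.normal())

N = 1000
xp = rng.normal(size=N)          # particle states X_0^(i)
th = rng.uniform(-1, 1, size=N)  # particle parameters theta^(i) from the prior
s_xy = np.zeros(N)               # sufficient statistic: sum of x_{j-1} x_j
s_xx = np.zeros(N)               # sufficient statistic: sum of x_{j-1}^2
for yk in ys:
    x_new = th * xp + sigma_v * rng.normal(size=N)   # propagate with each particle's theta
    logw = -0.5 * ((yk - x_new) / sigma_w) ** 2      # bootstrap weight (constants cancel)
    w = np.exp(logw - logw.max()); w /= w.sum()
    idx = rng.choice(N, size=N, p=w)                 # resample histories via indices
    s_xy = s_xy[idx] + xp[idx] * x_new[idx]
    s_xx = s_xx[idx] + xp[idx] ** 2
    xp = x_new[idx]
    # Gibbs refresh: theta^(i) ~ p(theta | x_{0:k}^(i)) = N(m, s2), crudely clipped to (-1,1)
    s2 = sigma_v**2 / s_xx
    m = s_xy / s_xx
    th = np.clip(m + np.sqrt(s2) * rng.normal(size=N), -0.9999, 0.9999)
theta_est = th.mean()
```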
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Consider the following model
We set the prior on θ as
In this case
25
( )( )
( )
1 1
21 0
~ 01
~ 01
~ 0
n n v n n
n n w n n
X X V V
Y X W W
X
θ σ
σ
σ
+ += +
= +
N
N
N
~ ( 11)θ minusU
( ) ( ) ( )21 1 ( 11)
12 2 2
12 2
|
n n n n
n n
n n k k n kk k
p m
m x x x
θ θ σ θ
σ σ
minus
minus
minus= =
prop
= = sum sum
x y N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains
SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value
26
True Value
SMCMCMC
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ
according to we are still relying implicitly on the approximation of the joint distribution
As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution
27
( )1 1n np θ | x y( )1 1|n np x y
( )1 1|n np x y
Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ
21
1
1 |n
k nk
Xn =
sum y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Recursive MLE Parameter Estimation The log likelihood can be written as
Here we compute
Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution
28
( )1 1 11
l ( ) log ( ) log |n
n n k kk
p p Yθ θθ minus=
= = sumY Y
( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y
( ) 1 1 |n n n nX Y p xθ minusY
l ( )lim l( ) log ( | ) ( ) ( )n
ng y x dx dy d
n θ θ θθ θ micro λ micro
rarrinfin= = int int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro
algorithm
where
We thus need to approximate
29
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
( ) ( )( ) ( )
1 1 1 1
1 1
( ) | |
| |
n n n n n n n
n n n n n
p Y g Y x p x dx
g Y x p x dx
θ θ θ
θ θ
minus minus
minus
nabla = nabla +
nabla
intint
|Y Y
Y
( ) 1 1|n np xθ minusnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity
We thus can use a SMC approximation of the form
This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of
30
1 1 11 1 1 1 1 1 1
1 1 1
1 1 1 1 1 1 1 1
( )( ) ( )( )
log ( ) ( )
n nn n n n n
n n
n n n n n
pp Y p dp
p p d
θθ θ
θ
θ θ
minusminus minus minus
minus
minus minus minus
nablanabla =
= nabla
int
int
x |Y|Y x |Y xx |Y
x |Y x |Y x
( )( )
1 1 11
( ) ( )1 1 1
1( ) ( )
log ( )
in
Ni
n n n nXi
i in n n
p xN
p
θ
θ
α δ
α
minus=
minus
nabla =
= nabla
sum|Y x
X |Y
1 1 1( )n npθ minusx |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC
approximation (red)
31
1( )npθnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact
(red) vs SMC (bluegreen) Positive and negative particles are mixed
32
1( )n np xθnabla |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation
An alternative identity is the following
It suggests using an SMC approximation of the form
Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as
33
1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y
( )( )
1 1 11
( )( ) ( ) 1 1
1 1 ( )1 1
1( ) ( )
( )log ( )( )
in
Ni
n n n nXi
ii i n n
n n n in n
p xN
p Xp Xp X
θ
θθ
θ
β δ
β
minus=
minusminus
minus
nabla =
nabla= nabla =
sum|Y x
|Y|Y|Y
( ) ( ) ( ) ( )1 1 1 1 1 1 1
1
1( ) ( ) ( ) ( )N
i i i jn n n n n n n n n
jp X f X x p x dx f X X
Nθ θθ θminus minus minus minus minus=
prop = sumint|Y | |Y |
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation
The optimal filter satisfies
The derivatives satisfy the following
This way we obtain a simple recursion of as a function of and
34
( ) ( )
11
1
1 1 1 1 1 1
( )( )( )
( ) | | ( )
n nn n
n n n
n n n n n n n n n
xp xx dx
x g Y x f x x p x dx
θθ
θ
θ θ θ θ
ξξ
ξ minus minus minus minus
=
=
intint
|Y|Y|Y
|Y |Y
111 1
1 1
( )( )( ) ( )( ) ( )
n n nn nn n n n
n n n n n n
x dxxp x p xx dx x dx
θθθ θ
θ θ
ξξξ ξ
nablanablanabla = minus int
int int|Y|Y|Y |Y
|Y Y
1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC Approximation of the Sensitivity Sample
Compute
We have
35
( )
( )( ) ( )
( ) ( )1 1( ) ( )
( )( ) ( )
1( ) ( ) ( ) ( )
( ) ( ) ( )
1 1 1
( ) ( )| |
i ii in n n n
n ni in n n n
Nj
ni iji i i in n
n n n nN N Nj j j
n n nj j j
X Xq X Y q X Y
W W W
θ θ
θ θ
ξ ξα ρ
ρα ρβ
α α α
=
= = =
nabla= =
= = minussum
sum sum sum
Y Y
( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1
1~ | | |
Ni j j j j j
n n n n n n n n nj
X q Y q Y X W p Y Xθ θη ηminus minus minus=
= propsum
( )
( )
( )1
1
( ) ( )1
1
( ) ( )
( ) ( )
in
in
Ni
n n n nXi
Ni i
n n n n nXi
p x W x
p x W x
θ
θ
δ
β δ
=
=
=
nabla =
sum
sum
|Y
|Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Consider the following model
We have
We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)
36
1n n v n
n n w n
X X VY X W
φ σσminus= +
= +
v wθ φ σ σ=
1log ( )n np xθnabla |Y
1( )npθnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the
marginal (bottom)
37
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Consider the following model
We have We use SMC with N=1000 for batch and on-line estimation
For Batch Estimation
For online estimation
38
1
exp2
n n v n
nn n
X X VXY W
φ σ
β
minus= +
=
vθ φ σ β=
11 1log ( )nn n n npθθ θ γ
minusminus= + nabla Y
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar
81-85
39
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates
converge towards the true values
40
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Given data y1n inference relies on
We have seen that SMC are rather inefficient to sample from
so we look here at an MCMC approach For a given parameter value θ SMC can estimate both
It will be useful if we can use SMC within MCMC to sample from
41
( ) ( ) ( )
( ) ( ) ( )
1 1 1 1 1
1 1
| | |
|
n n n n n
n n
p p pwherep p p
θ
θ
θ θ
θ θ
=
=
x y y x y
y y
( ) ( )1 1 1|n n np and pθ θx y y
( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R
Statist SocB (2010) 72 Part 3 pp 269ndash342
( )1| np θ y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from
and
However it is impossible to sample from
We can sample from instead but convergence will be slow
Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability
42
( )1 1| n np θ x y( )1 1| n np θx y
( )1 1| n np θx y
( )1 1 1 1| k n k k np x θ minus +y x x
( )1 1| n np θx y ( )1 1 1~ |n n nqX x y
( ) ( )( ) ( )
1 1 1 1
1 1 1 1
| |
| m n
|i 1 n n n n
n n n n
p q
p p
θ
θ
X y X y
X y X y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution
We know that the SMC approximation degrades an
n increases but we also have under mixing assumptions
Thus things degrade linearly rather than exponentially
These results are useful for small C
43
( )1 1| n np θx y
( )1 1| n np θx y
( )( )1 1 1( ) | i
n n n TV
Cnaw pN
θminus leX x yL
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method
(CBMC) in Molecular simulation There are however significant differences
CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on
44
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
Bayesian Scientific Computing Spring 2013 (N Zabaras)
MCMC We use as proposal distribution and store
the estimate
It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables
The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate
Finally accept with probability Otherwise stay where you are
45
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
( )1 1 1~ | n n nNp θX x y
( )1 |nNp θy
1nX
( )1 |nNp θy1nX ( ) ( )
1
1
m|
in 1|nN
nN
pp
θθ
yy
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Consider the following model
We take the same prior proposal distribution for both CBMC and
SMC Acceptance rates for CBMC (left) and SMC (right) are shown
46
( )
( ) ( )
11 2
12
1
1 25 8cos(12 ) ~ 0152 1
~ 0001 ~ 052
kk k k k
k
kn k k
XX X k V VX
XY W W X
minusminus
minus
= + + ++
= +
N
N N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example: We now consider the case when both variances $\sigma_v^2$ and $\sigma_w^2$ are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from $p\left(\theta, x_{1:1000} \mid y_{1:1000}\right)$ using two strategies:
- the MCMC algorithm with $N=1000$ and the local EKF proposal as importance distribution (average acceptance rate 0.43);
- an algorithm updating the components one at a time using an MH step of invariant distribution $p\left(x_k \mid y_k, x_{k-1}, x_{k+1}\right)$ with the same proposal.

In both cases we update the parameters according to $p\left(\theta \mid x_{1:1000}, y_{1:1000}\right)$. Both algorithms are run with the same computational effort.
Example: MH one-at-a-time missed the mode, and its estimates are off:

$$\mathbb{E}\left[\sigma_v^2 \mid y_{1:500}\right] \approx 13.7 \ \text{(MH one-at-a-time)}, \qquad \mathbb{E}\left[\sigma_v^2 \mid y_{1:500}\right] \approx 9.9 \ \text{(PF-MCMC)}, \qquad \sigma_v^2 = 10 \ \text{(true value)}$$

If $X_{1:n}$ and $\theta$ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
Review of Metropolis-Hastings Algorithm: As a review of MH, the proposal distribution is a Markov chain with kernel density $q(x, y) = q(y \mid x)$ and target distribution $\pi(x)$.

It can be easily shown that $\pi$ is invariant for the transition kernel,

$$\pi\left(x'\right) = \int \pi\left(x\right) K\left(x, x'\right) dx,$$

and that $X^{(i)} \sim \pi(x)$ as $i \to \infty$, under weak assumptions.

Algorithm (Generic Metropolis-Hastings Sampler):
Initialization: choose an arbitrary starting value $x^{(0)}$.
Iteration $t$ ($t \ge 1$):
1. Given $x^{(t-1)}$, generate $\tilde{x} \sim q\left(x^{(t-1)}, x\right)$.
2. Compute

$$\rho\left(x^{(t-1)}, \tilde{x}\right) = \min\left(1, \frac{\pi\left(\tilde{x}\right)\, q\left(\tilde{x}, x^{(t-1)}\right)}{\pi\left(x^{(t-1)}\right)\, q\left(x^{(t-1)}, \tilde{x}\right)}\right)$$

3. With probability $\rho\left(x^{(t-1)}, \tilde{x}\right)$, accept $\tilde{x}$ and set $x^{(t)} = \tilde{x}$; otherwise, reject $\tilde{x}$ and set $x^{(t)} = x^{(t-1)}$.
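The three steps above can be sketched as a random-walk Metropolis sampler. The standard-normal target, step size `tau` and iteration count below are illustrative assumptions, not values from the lecture; with a symmetric proposal the $q$ ratio cancels in step 2.

```python
# Minimal sketch of the generic Metropolis-Hastings sampler above, with a
# Gaussian random-walk proposal q(x, x') = N(x'; x, tau^2) and a standard
# normal target pi(x) specified through its log-density.
import math
import random

def metropolis_hastings(log_pi, x0, n_iter, tau=1.0, seed=0):
    rng = random.Random(seed)
    chain = [x0]
    x = x0
    for _ in range(n_iter):
        x_prop = x + tau * rng.gauss(0.0, 1.0)       # step 1: propose
        log_rho = log_pi(x_prop) - log_pi(x)         # symmetric q cancels
        if rng.random() < math.exp(min(0.0, log_rho)):  # step 3: accept/reject
            x = x_prop
        chain.append(x)
    return chain

chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=3.0, n_iter=20000)
post = chain[2000:]                                  # discard burn-in
mean = sum(post) / len(post)
var = sum((x - mean) ** 2 for x in post) / len(post)
```

The empirical mean and variance of the chain should approach the target's 0 and 1 as the number of iterations grows.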
Marginal Metropolis-Hastings Algorithm: Consider the target

$$p\left(\theta, x_{1:n} \mid y_{1:n}\right) = p\left(\theta \mid y_{1:n}\right)\, p_\theta\left(x_{1:n} \mid y_{1:n}\right)$$

We use the following proposal distribution:

$$q\left(\left(\theta^*, x_{1:n}^*\right) \mid \left(\theta, x_{1:n}\right)\right) = q\left(\theta^* \mid \theta\right)\, p_{\theta^*}\left(x_{1:n}^* \mid y_{1:n}\right)$$

Then the acceptance probability becomes

$$\min\left(1, \frac{p\left(\theta^*, x_{1:n}^* \mid y_{1:n}\right)\, q\left(\left(\theta, x_{1:n}\right) \mid \left(\theta^*, x_{1:n}^*\right)\right)}{p\left(\theta, x_{1:n} \mid y_{1:n}\right)\, q\left(\left(\theta^*, x_{1:n}^*\right) \mid \left(\theta, x_{1:n}\right)\right)}\right) = \min\left(1, \frac{p_{\theta^*}\left(y_{1:n}\right)\, p\left(\theta^*\right)\, q\left(\theta \mid \theta^*\right)}{p_{\theta}\left(y_{1:n}\right)\, p\left(\theta\right)\, q\left(\theta^* \mid \theta\right)}\right)$$

We will use SMC approximations to compute $p_\theta\left(y_{1:n}\right)$ and $p_\theta\left(x_{1:n} \mid y_{1:n}\right)$.
Marginal Metropolis-Hastings Algorithm:

Step 1: Given $\theta(i-1)$, $\hat p_{\theta(i-1)}\left(y_{1:n}\right)$ and $X_{1:n}(i-1)$, sample $\theta^* \sim q\left(\cdot \mid \theta(i-1)\right)$ and run an SMC algorithm to obtain $\hat p_{\theta^*}\left(y_{1:n}\right)$ and $\hat p_{\theta^*}\left(x_{1:n} \mid y_{1:n}\right)$.

Step 2: Sample $X_{1:n}^* \sim \hat p_{\theta^*}\left(x_{1:n} \mid y_{1:n}\right)$.

Step 3: With probability

$$\min\left(1, \frac{\hat p_{\theta^*}\left(y_{1:n}\right)\, p\left(\theta^*\right)\, q\left(\theta(i-1) \mid \theta^*\right)}{\hat p_{\theta(i-1)}\left(y_{1:n}\right)\, p\left(\theta(i-1)\right)\, q\left(\theta^* \mid \theta(i-1)\right)}\right)$$

set $\theta(i) = \theta^*$, $X_{1:n}(i) = X_{1:n}^*$ and $\hat p_{\theta(i)}\left(y_{1:n}\right) = \hat p_{\theta^*}\left(y_{1:n}\right)$; otherwise, set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$ and $\hat p_{\theta(i)}\left(y_{1:n}\right) = \hat p_{\theta(i-1)}\left(y_{1:n}\right)$.
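The steps above can be sketched compactly. The sketch below runs a bootstrap particle filter to get $\hat p_\theta(y_{1:n})$ and does a random-walk move on a scalar $\theta$ with a uniform prior on $(-1,1)$; Step 2 (sampling a state trajectory) is omitted since only $\theta$ is retained here. The linear-Gaussian model, data sizes and tuning constants are illustrative assumptions, not the lecture's experiments.

```python
# Sketch of marginal (particle) Metropolis-Hastings for the scalar model
# X_k = theta*X_{k-1} + V_k, Y_k = X_k + W_k, with V_k, W_k ~ N(0, 1).
import math
import random

rng = random.Random(1)

def log_likelihood_smc(theta, ys, N=100):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n})."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(N)]
    total = 0.0
    for y in ys:
        xs = [theta * x + rng.gauss(0.0, 1.0) for x in xs]   # propagate with f
        ws = [math.exp(-0.5 * (y - x) ** 2) for x in xs]     # g(y|x) up to a constant
        total += math.log(sum(ws) / N) - 0.5 * math.log(2 * math.pi)
        xs = rng.choices(xs, weights=ws, k=N)                # multinomial resampling
    return total

# simulate illustrative data from theta_true = 0.8
theta_true, x = 0.8, 0.0
ys = []
for _ in range(40):
    x = theta_true * x + rng.gauss(0.0, 1.0)
    ys.append(x + rng.gauss(0.0, 1.0))

theta = 0.0
ll = log_likelihood_smc(theta, ys)
chain, accepted = [theta], 0
for _ in range(200):
    theta_prop = theta + 0.1 * rng.gauss(0.0, 1.0)           # q(theta* | theta)
    if abs(theta_prop) < 1.0:                                # uniform prior support
        ll_prop = log_likelihood_smc(theta_prop, ys)
        # acceptance ratio uses the *SMC estimates* of the likelihood
        if rng.random() < math.exp(min(0.0, ll_prop - ll)):
            theta, ll = theta_prop, ll_prop
            accepted += 1
    chain.append(theta)
accept_rate = accepted / 200.0
```

Storing `ll` alongside `theta` mirrors the requirement in Step 3 that the likelihood estimate of the current state be kept and reused, never recomputed.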
Marginal Metropolis-Hastings Algorithm: This algorithm (without sampling $X_{1:n}$) was proposed as an approximate MCMC algorithm to sample from $p\left(\theta \mid y_{1:n}\right)$ in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

When $N \ge 1$, the algorithm admits exactly $p\left(\theta, x_{1:n} \mid y_{1:n}\right)$ as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher $N$, the better the performance of the algorithm; $N$ scales roughly linearly with $n$. It is useful when $X_n$ is moderate-dimensional and $\theta$ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553, 2007.
C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
E. L. Ionides, C. Breto and A. A. King, Inference for nonlinear dynamical systems, Proceedings of the Nat. Acad. of Sciences, 103, 18438-18443, 2006.
OPEN PROBLEMS
Open Problems: Thus SMC can be successfully used within MCMC.

The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
- developing efficient SMC methods when $E = \mathbb{R}^{10000}$;
- developing a proper SMC method for recursive Bayesian parameter estimation;
- developing methods that avoid the $O(N^2)$ cost for smoothing and sensitivity estimation;
- developing methods for solving optimal control problems;
- establishing weaker assumptions ensuring uniform convergence results.
Other Topics: In standard SMC we only sample $X_n$ at time $n$ and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

- Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
SMC For Static Models: Any MCMC kernel can be used for the proposal $q(\cdot \mid \cdot)$. The weights satisfy

$$W_n \propto W_{n-1}\, \frac{\gamma_n\left(x_n\right)\, L_{n-1}\left(x_n, x_{n-1}\right)}{\gamma_{n-1}\left(x_{n-1}\right)\, q_n\left(x_{n-1}, x_n\right)}$$

Since our interest is in computing only

$$\pi_n\left(x_n\right) = \int \pi_n\left(x_{1:n}\right) dx_{1:n-1} = \frac{\gamma_n\left(x_n\right)}{Z_n},$$

there is no degeneracy problem.

P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
Algorithm (SMC For Static Problems):
(1) Initialize at time $n = 1$.
(2) At time $n \ge 2$:
- Sample $X_n^{(i)} \sim q_n\left(x_n \mid X_{n-1}^{(i)}\right)$ and augment $X_{1:n}^{(i)} = \left(X_{1:n-1}^{(i)}, X_n^{(i)}\right)$.
- Compute the importance weights

$$W_n^{(i)} \propto W_{n-1}^{(i)}\, \frac{\gamma_n\left(X_n^{(i)}\right)\, L_{n-1}\left(X_n^{(i)}, X_{n-1}^{(i)}\right)}{\gamma_{n-1}\left(X_{n-1}^{(i)}\right)\, q_n\left(X_{n-1}^{(i)}, X_n^{(i)}\right)}$$

- The weighted approximation is then

$$\hat\pi_n\left(x_n\right) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_n^{(i)}}\left(x_n\right)$$

- We finally resample $X_n^{(i)} \sim \hat\pi_n\left(x_n\right)$ to obtain

$$\bar\pi_n\left(x_n\right) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_n^{(i)}}\left(x_n\right)$$
SMC For Static Models, Choice of $L$: A default choice is first using a $\pi_n$-invariant MCMC kernel $q_n$ and then the corresponding reversed kernel $L_{n-1}$:

$$L_{n-1}\left(x_n, x_{n-1}\right) = \frac{\pi_n\left(x_{n-1}\right)\, q_n\left(x_{n-1}, x_n\right)}{\pi_n\left(x_n\right)}$$

Using this easy choice, we can simplify the expression for the weights:

$$W_n \propto W_{n-1}\, \frac{\gamma_n\left(x_n\right)\, L_{n-1}\left(x_n, x_{n-1}\right)}{\gamma_{n-1}\left(x_{n-1}\right)\, q_n\left(x_{n-1}, x_n\right)} = W_{n-1}\, \frac{\gamma_n\left(x_n\right)\, \pi_n\left(x_{n-1}\right)}{\gamma_{n-1}\left(x_{n-1}\right)\, \pi_n\left(x_n\right)} \;\Rightarrow\; W_n^{(i)} \propto W_{n-1}^{(i)}\, \frac{\gamma_n\left(X_{n-1}^{(i)}\right)}{\gamma_{n-1}\left(X_{n-1}^{(i)}\right)}$$

This is known as annealed importance sampling. This particular choice has been used in physics and statistics.

W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001.
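The annealed scheme above can be sketched end to end: temper from an easy start toward an unnormalized target, weight by $\gamma_n(x_{n-1})/\gamma_{n-1}(x_{n-1})$, resample, then move with a $\pi_n$-invariant random-walk Metropolis kernel. The geometric tempering path, particle number and schedule below are illustrative assumptions; the target's normalizing constant is known here so the SMC estimate can be checked.

```python
# Sketch of annealed importance sampling / SMC for a static target:
# gamma_beta interpolates between a normalized N(0,1) and the unnormalized
# target exp(-x^2 / (2*sigma^2)), whose true Z is sqrt(2*pi)*sigma.
import math
import random

rng = random.Random(42)
N, T = 500, 20
sigma_target = 0.5
Z_true = math.sqrt(2 * math.pi) * sigma_target

def log_gamma(x, beta):
    lg0 = -0.5 * x * x - 0.5 * math.log(2 * math.pi)   # normalized start
    lg1 = -0.5 * x * x / sigma_target ** 2             # unnormalized target
    return (1 - beta) * lg0 + beta * lg1

xs = [rng.gauss(0.0, 1.0) for _ in range(N)]           # exact draws from gamma_0
log_Z = 0.0
for t in range(1, T + 1):
    b_prev, b = (t - 1) / T, t / T
    # incremental weights gamma_t(x_{t-1}) / gamma_{t-1}(x_{t-1})
    logw = [log_gamma(x, b) - log_gamma(x, b_prev) for x in xs]
    m = max(logw)
    ws = [math.exp(lw - m) for lw in logw]
    log_Z += m + math.log(sum(ws) / N)                 # accumulates log(Z_t / Z_{t-1})
    xs = rng.choices(xs, weights=ws, k=N)              # resample
    # pi_t-invariant move: random-walk Metropolis on gamma_t
    moved = []
    for x in xs:
        xp = x + 0.5 * rng.gauss(0.0, 1.0)
        if rng.random() < math.exp(min(0.0, log_gamma(xp, b) - log_gamma(x, b))):
            x = xp
        moved.append(x)
    xs = moved

Z_hat = math.exp(log_Z)
```

Because the weights depend only on the pre-move particles, the MCMC kernel never enters the weight expression, exactly as the simplification above promises.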
ONLINE PARAMETER ESTIMATION
Online Bayesian Parameter Estimation: Assume that our state model is defined with some unknown static parameter $\theta$ with some prior $p(\theta)$:

$$X_1 \sim \mu(\cdot), \qquad X_n \mid \left(X_{n-1} = x_{n-1}\right) \sim f_\theta\left(x_n \mid x_{n-1}\right),$$

$$Y_n \mid \left(X_n = x_n\right) \sim g_\theta\left(y_n \mid x_n\right)$$

Given data $y_{1:n}$, inference now is based on

$$p\left(\theta, x_{1:n} \mid y_{1:n}\right) = p\left(\theta \mid y_{1:n}\right)\, p_\theta\left(x_{1:n} \mid y_{1:n}\right), \quad \text{where} \quad p\left(\theta \mid y_{1:n}\right) \propto p_\theta\left(y_{1:n}\right)\, p\left(\theta\right)$$

We can use standard SMC, but on the extended space $Z_n = \left(X_n, \theta_n\right)$:

$$f\left(z_n \mid z_{n-1}\right) = f_{\theta_{n-1}}\left(x_n \mid x_{n-1}\right)\, \delta_{\theta_{n-1}}\left(\theta_n\right), \qquad g\left(y_n \mid z_n\right) = g_{\theta_n}\left(y_n \mid x_n\right)$$

Note that $\theta$ is a static parameter; it does not evolve with $n$.
Online Bayesian Parameter Estimation: For fixed $\theta$, using our earlier error estimates,

$$\mathrm{Var}\left[\log \hat p_\theta\left(y_{1:n}\right)\right] \le \frac{Cn}{N}$$

In a Bayesian context the problem is even more severe, as

$$p\left(\theta \mid y_{1:n}\right) \propto p_\theta\left(y_{1:n}\right)\, p\left(\theta\right)$$

The exponential stability assumption cannot hold, as $\theta_n = \theta_1$. To mitigate this problem, introduce MCMC steps on $\theta$.

C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian models, JRSS B, 2001.
Online Bayesian Parameter Estimation: When

$$p\left(\theta \mid y_{1:n}, x_{1:n}\right) = p\left(\theta \mid s_n\left(x_{1:n}, y_{1:n}\right)\right)$$

for a fixed-dimensional sufficient statistic $s_n$, this becomes an elegant algorithm that, however, still has the degeneracy problem, since it uses $p\left(x_{1:n} \mid y_{1:n}\right)$.

As $\dim\left(Z_n\right) = \dim\left(X_n\right) + \dim\left(\theta\right)$, such methods are not recommended for high-dimensional $\theta$, especially with vague priors.
Artificial Dynamics for $\theta$: A solution consists of perturbing the locations of the particles in a way that does not modify their distributions; i.e., if at time $n$

$$\theta^{(i)} \sim p\left(\theta \mid y_{1:n}\right),$$

then we would like a transition kernel $M_n$ such that if $\theta'^{(i)} \sim M_n\left(\theta^{(i)}, \cdot\right)$, then $\theta'^{(i)} \sim p\left(\theta \mid y_{1:n}\right)$.

In Markov chain language, we want $p\left(\theta \mid y_{1:n}\right)$ to be invariant for $M_n\left(\theta, \theta'\right)$.
Using MCMC: There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as $p\left(\theta \mid y_{1:n}\right)$ would need to be known up to a normalizing constant, but $p\left(y_{1:n} \mid \theta\right) \equiv p_\theta\left(y_{1:n}\right)$ is unknown.

However, we can use a very simple Gibbs update. Indeed, note that if $\theta^{(i)} \sim p\left(\theta \mid y_{1:n}\right)$ and $X_{1:n}^{(i)} \sim p\left(x_{1:n} \mid y_{1:n}, \theta^{(i)}\right)$, then, if $\theta'^{(i)} \sim p\left(\theta \mid y_{1:n}, X_{1:n}^{(i)}\right)$, we have $\theta'^{(i)} \sim p\left(\theta \mid y_{1:n}\right)$. Indeed,

$$\int p\left(\theta \mid y_{1:n}, x_{1:n}\right)\, p\left(x_{1:n} \mid y_{1:n}\right)\, dx_{1:n} = p\left(\theta \mid y_{1:n}\right)$$
SMC with MCMC for Parameter Estimation: Given an approximation at time $n-1$,

$$\hat p\left(\theta, x_{1:n-1} \mid y_{1:n-1}\right) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\left(\theta_{n-1}^{(i)},\, X_{1:n-1}^{(i)}\right)}\left(\theta, x_{1:n-1}\right)$$

Sample $X_n^{(i)} \sim f_{\theta_{n-1}^{(i)}}\left(x_n \mid X_{n-1}^{(i)}\right)$, set $X_{1:n}^{(i)} = \left(X_{1:n-1}^{(i)}, X_n^{(i)}\right)$, and then approximate

$$\hat p\left(\theta, x_{1:n} \mid y_{1:n}\right) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{\left(\theta_{n-1}^{(i)},\, X_{1:n}^{(i)}\right)}\left(\theta, x_{1:n}\right), \qquad W_n^{(i)} \propto g_{\theta_{n-1}^{(i)}}\left(y_n \mid X_n^{(i)}\right)$$

Resample $X_{1:n}^{(i)} \sim \hat p\left(x_{1:n} \mid y_{1:n}\right)$, then sample $\theta_n^{(i)} \sim p\left(\theta \mid y_{1:n}, X_{1:n}^{(i)}\right)$ to obtain

$$\bar p\left(\theta, x_{1:n} \mid y_{1:n}\right) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\left(\theta_n^{(i)},\, X_{1:n}^{(i)}\right)}\left(\theta, x_{1:n}\right)$$
SMC with MCMC for Parameter Estimation: Consider the following model:

$$X_{n+1} = \theta X_n + \sigma_v V_{n+1}, \qquad V_n \sim \mathcal{N}\left(0, 1\right),$$

$$Y_n = X_n + \sigma_w W_n, \qquad W_n \sim \mathcal{N}\left(0, 1\right), \qquad X_0 \sim \mathcal{N}\left(0, 1\right)$$

We set the prior on $\theta$ as $\theta \sim \mathcal{U}\left(-1, 1\right)$. In this case,

$$p\left(\theta \mid x_{1:n}, y_{1:n}\right) \propto \mathcal{N}\left(\theta;\, m_n, \sigma_n^2\right)\, \mathbf{1}_{(-1,1)}\left(\theta\right)$$

where

$$\sigma_n^2 = \sigma_v^2 \left(\sum_{k=1}^{n} x_{k-1}^2\right)^{-1}, \qquad m_n = \frac{\sum_{k=1}^{n} x_{k-1} x_k}{\sum_{k=1}^{n} x_{k-1}^2}$$
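The conditionally conjugate Gibbs update above is easy to sketch: compute the sufficient statistics, form $m_n$ and $\sigma_n^2$, and draw from the normal restricted to $(-1,1)$. The rejection sampler, $\sigma_v = 1$, and the simulated path below are illustrative assumptions.

```python
# Sketch of the Gibbs update p(theta | x_{0:n}) for the AR(1) example above:
# a N(m_n, sigma_n^2) density truncated to (-1, 1), with sigma_v = 1 so that
# sigma_n^2 = 1 / sum(x_{k-1}^2) and m_n = sum(x_{k-1} x_k) / sum(x_{k-1}^2).
import math
import random

rng = random.Random(7)

def sample_theta_posterior(xs):
    """xs = [x_0, ..., x_n]; returns one draw from p(theta | x_{0:n})."""
    s_xx = sum(x * x for x in xs[:-1])
    s_xy = sum(a * b for a, b in zip(xs[:-1], xs[1:]))
    sigma2 = 1.0 / s_xx
    m = sigma2 * s_xy
    while True:  # rejection: redraw until the sample lands inside (-1, 1)
        theta = rng.gauss(m, math.sqrt(sigma2))
        if -1.0 < theta < 1.0:
            return theta

# simulate a path with theta_true = 0.6 and check the draws concentrate near it
theta_true, x = 0.6, 0.0
path = [x]
for _ in range(2000):
    x = theta_true * x + rng.gauss(0.0, 1.0)
    path.append(x)
draws = [sample_theta_posterior(path) for _ in range(200)]
mean_draw = sum(draws) / len(draws)
```

Rejection is efficient here because the posterior mass sits well inside the prior support; a heavily truncated case would want a proper truncated-normal sampler instead.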
SMC with MCMC for Parameter Estimation: We use SMC with the Gibbs step; the degeneracy problem remains. The figure shows the SMC estimate of $\mathbb{E}\left[\theta \mid y_{1:n}\right]$ as $n$ increases (from A. Doucet's lecture notes), comparing the SMC/MCMC estimate against the true value: the parameter converges to the wrong value.
SMC with MCMC for Parameter Estimation: The problem with this approach is that, although we move $\theta$ according to $p\left(\theta \mid x_{1:n}, y_{1:n}\right)$, we are still relying implicitly on the SMC approximation of the joint distribution $p\left(x_{1:n} \mid y_{1:n}\right)$.

As $n$ increases, the SMC approximation of $p\left(x_{1:n} \mid y_{1:n}\right)$ deteriorates, and the algorithm cannot converge towards the right solution.

The figure shows the sufficient statistic $\frac{1}{n} \sum_{k=1}^{n} \mathbb{E}\left[X_k^2 \mid y_{1:n}\right]$ computed exactly through the Kalman filter (blue) vs. its SMC approximation (red) for a fixed value of $\theta$.
Recursive MLE Parameter Estimation: The log-likelihood can be written as

$$\ell_n\left(\theta\right) = \log p_\theta\left(Y_{1:n}\right) = \sum_{k=1}^{n} \log p_\theta\left(Y_k \mid Y_{1:k-1}\right)$$

Here we compute

$$p_\theta\left(Y_k \mid Y_{1:k-1}\right) = \int g_\theta\left(Y_k \mid x_k\right)\, p_\theta\left(x_k \mid Y_{1:k-1}\right)\, dx_k$$

Under regularity assumptions, $\left(X_n, Y_n, p_\theta\left(x_n \mid Y_{1:n-1}\right)\right)$ is an inhomogeneous Markov chain which converges towards its invariant distribution, and

$$\lim_{n \to \infty} \frac{\ell_n\left(\theta\right)}{n} = \ell\left(\theta\right) = \iint \log g_\theta\left(y \mid x\right)\, \mu_\theta\left(dx, dy\right)$$
Robbins-Monro Algorithm for MLE: We can maximize $\ell\left(\theta\right)$ by using the gradient and the Robbins-Monro algorithm:

$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla_\theta \log p_{\theta_{n-1}}\left(Y_n \mid Y_{1:n-1}\right)$$

where

$$\nabla p_\theta\left(Y_n \mid Y_{1:n-1}\right) = \int g_\theta\left(Y_n \mid x_n\right)\, \nabla p_\theta\left(x_n \mid Y_{1:n-1}\right)\, dx_n + \int \nabla g_\theta\left(Y_n \mid x_n\right)\, p_\theta\left(x_n \mid Y_{1:n-1}\right)\, dx_n$$

We thus need to approximate $\nabla p_\theta\left(x_n \mid Y_{1:n-1}\right)$.
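The Robbins-Monro recursion above can be sketched on a toy case where the noisy gradient is available in closed form: maximizing $\mathbb{E}[\log \mathcal{N}(Y; \theta, 1)]$ from a stream $Y_n \sim \mathcal{N}(2, 1)$, so the maximizer is 2. The step sizes $\gamma_n = 1/n$ satisfy the usual $\sum \gamma_n = \infty$, $\sum \gamma_n^2 < \infty$ conditions; the data stream and seed are illustrative assumptions.

```python
# Sketch of Robbins-Monro stochastic approximation:
# theta_n = theta_{n-1} + gamma_n * (noisy gradient of the log-likelihood).
import random

rng = random.Random(3)
theta = 0.0
for n in range(1, 20001):
    y = 2.0 + rng.gauss(0.0, 1.0)   # new observation Y_n ~ N(2, 1)
    grad = y - theta                # d/dtheta log N(y; theta, 1) at theta_{n-1}
    theta += (1.0 / n) * grad       # gamma_n = 1/n
```

With this particular gradient and step size, the recursion reduces exactly to the running average of the observations, which makes the convergence to 2 transparent.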
Importance Sampling Estimation of Sensitivity: The various proposed approximations are based on the identity

$$\nabla p_\theta\left(x_{1:n} \mid Y_{1:n}\right) = \frac{\nabla p_\theta\left(x_{1:n} \mid Y_{1:n}\right)}{p_\theta\left(x_{1:n} \mid Y_{1:n}\right)}\, p_\theta\left(x_{1:n} \mid Y_{1:n}\right) = \nabla \log p_\theta\left(x_{1:n} \mid Y_{1:n}\right)\, p_\theta\left(x_{1:n} \mid Y_{1:n}\right)$$

We thus can use an SMC approximation of the form

$$\widehat{\nabla p}_\theta\left(x_{1:n} \mid Y_{1:n}\right) = \frac{1}{N} \sum_{i=1}^{N} \alpha_n^{(i)}\, \delta_{X_{1:n}^{(i)}}\left(x_{1:n}\right), \qquad \alpha_n^{(i)} = \nabla \log p_\theta\left(X_{1:n}^{(i)} \mid Y_{1:n}\right)$$

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of $p_\theta\left(x_{1:n} \mid Y_{1:n}\right)$.
Degeneracy of the SMC Algorithm: The figure shows the score $\nabla \log p_\theta\left(Y_{1:n}\right)$ for linear Gaussian models: exact (blue) vs. SMC approximation (red).
Degeneracy of the SMC Algorithm: The figure shows the signed measure $\nabla p_\theta\left(x_n \mid Y_{1:n}\right)$ for linear Gaussian models: exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.
Marginal Importance Sampling for Sensitivity Estimation: An alternative identity is the following:

$$\nabla p_\theta\left(x_n \mid Y_{1:n}\right) = \nabla \log p_\theta\left(x_n \mid Y_{1:n}\right)\, p_\theta\left(x_n \mid Y_{1:n}\right)$$

It suggests using an SMC approximation of the form

$$\widehat{\nabla p}_\theta\left(x_n \mid Y_{1:n}\right) = \frac{1}{N} \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}\left(x_n\right), \qquad \beta_n^{(i)} = \nabla \log \hat p_\theta\left(X_n^{(i)} \mid Y_{1:n}\right) = \frac{\nabla \hat p_\theta\left(X_n^{(i)} \mid Y_{1:n}\right)}{\hat p_\theta\left(X_n^{(i)} \mid Y_{1:n}\right)}$$

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is $O(N^2)$, as

$$p_\theta\left(X_n^{(i)} \mid Y_{1:n-1}\right) = \int f_\theta\left(X_n^{(i)} \mid x_{n-1}\right)\, p_\theta\left(x_{n-1} \mid Y_{1:n-1}\right)\, dx_{n-1} \approx \frac{1}{N} \sum_{j=1}^{N} f_\theta\left(X_n^{(i)} \mid X_{n-1}^{(j)}\right)$$
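The $O(N^2)$ step above is just a double loop: each current particle's predictive density is an average of the transition density over all previous particles. The Gaussian transition and particle values below are illustrative assumptions.

```python
# Sketch of the O(N^2) marginal predictive computation: for every particle i,
# evaluate (1/N) * sum_j f_theta(x_n^i | x_{n-1}^j) with f_theta = N(theta*x, 1).
import math
import random

rng = random.Random(5)
N, theta = 100, 0.7
prev = [rng.gauss(0.0, 1.0) for _ in range(N)]            # particles at time n-1
curr = [theta * x + rng.gauss(0.0, 1.0) for x in prev]    # particles at time n

def f(x_new, x_old):
    return math.exp(-0.5 * (x_new - theta * x_old) ** 2) / math.sqrt(2 * math.pi)

# each of the N current particles sums over all N previous ones: N^2 evaluations
pred = [sum(f(xi, xj) for xj in prev) / N for xi in curr]
```

Each entry of `pred` is a mixture-of-Gaussians density value, which is why reducing this quadratic cost is listed among the open problems later.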
Marginal Importance Sampling for Sensitivity Estimation: The optimal filter satisfies

$$p_\theta\left(x_n \mid Y_{1:n}\right) = \frac{\xi_\theta\left(x_n\right)}{\int \xi_\theta\left(x_n\right) dx_n}, \qquad \xi_\theta\left(x_n\right) = g_\theta\left(Y_n \mid x_n\right) \int f_\theta\left(x_n \mid x_{n-1}\right)\, p_\theta\left(x_{n-1} \mid Y_{1:n-1}\right)\, dx_{n-1}$$

The derivatives satisfy

$$\nabla p_\theta\left(x_n \mid Y_{1:n}\right) = \frac{\nabla \xi_\theta\left(x_n\right)}{\int \xi_\theta\left(x_n\right) dx_n} - p_\theta\left(x_n \mid Y_{1:n}\right)\, \frac{\int \nabla \xi_\theta\left(x_n\right) dx_n}{\int \xi_\theta\left(x_n\right) dx_n}$$

This way, we obtain a simple recursion for $\nabla p_\theta\left(x_n \mid Y_{1:n}\right)$ as a function of $\nabla p_\theta\left(x_{n-1} \mid Y_{1:n-1}\right)$ and $p_\theta\left(x_{n-1} \mid Y_{1:n-1}\right)$.
SMC Approximation of the Sensitivity: Sample

$$X_n^{(i)} \sim \eta_n\left(\cdot\right), \qquad \eta_n\left(x_n\right) = \sum_{j=1}^{N} W_{n-1}^{(j)}\, q\left(x_n \mid Y_n, X_{n-1}^{(j)}\right)$$

Compute

$$\alpha_n^{(i)} = \frac{\xi_\theta\left(X_n^{(i)}\right)}{\eta_n\left(X_n^{(i)}\right)}, \qquad \rho_n^{(i)} = \frac{\nabla \xi_\theta\left(X_n^{(i)}\right)}{\eta_n\left(X_n^{(i)}\right)},$$

$$W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j=1}^{N} \alpha_n^{(j)}}, \qquad \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j=1}^{N} \alpha_n^{(j)}} - W_n^{(i)}\, \frac{\sum_{j=1}^{N} \rho_n^{(j)}}{\sum_{j=1}^{N} \alpha_n^{(j)}}$$

We then have

$$\hat p_\theta\left(x_n \mid Y_{1:n}\right) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_n^{(i)}}\left(x_n\right), \qquad \widehat{\nabla p}_\theta\left(x_n \mid Y_{1:n}\right) = \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}\left(x_n\right)$$
Example, Linear Gaussian Model: Consider the following model:

$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = X_n + \sigma_w W_n, \qquad \theta = \left(\phi, \sigma_v, \sigma_w\right)$$

We compare next the SMC estimates of $\nabla \log p_\theta\left(x_n \mid Y_{1:n}\right)$ and $\nabla p_\theta\left(Y_{1:n}\right)$ for $N = 1000$ to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).
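The exact reference used in this comparison is worth sketching: for the linear Gaussian model, a Kalman filter computes $\log p_\theta(y_{1:n})$ in closed form, and differencing it in $\phi$ gives a score value to check SMC sensitivity estimates against. The simulated data and the finite-difference step (in place of the exact derivative recursion) are illustrative assumptions.

```python
# Sketch: Kalman filter log-likelihood for X_n = phi*X_{n-1} + sigma_v*V_n,
# Y_n = X_n + sigma_w*W_n, plus a central finite difference for the score in phi.
import math
import random

def kalman_loglik(ys, phi, sv, sw):
    m, P = 0.0, sv * sv / max(1e-9, 1.0 - phi * phi)  # stationary prior, assumption
    ll = 0.0
    for y in ys:
        m_pred = phi * m                     # state prediction
        P_pred = phi * phi * P + sv * sv
        S = P_pred + sw * sw                 # predictive variance of Y_n
        ll += -0.5 * (math.log(2 * math.pi * S) + (y - m_pred) ** 2 / S)
        K = P_pred / S                       # Kalman gain
        m = m_pred + K * (y - m_pred)
        P = (1.0 - K) * P_pred
    return ll

rng = random.Random(11)
phi_true, sv, sw = 0.9, 1.0, 1.0
x, ys = 0.0, []
for _ in range(300):
    x = phi_true * x + sv * rng.gauss(0.0, 1.0)
    ys.append(x + sw * rng.gauss(0.0, 1.0))

eps = 1e-5
score_phi = (kalman_loglik(ys, phi_true + eps, sv, sw)
             - kalman_loglik(ys, phi_true - eps, sv, sw)) / (2 * eps)
```

The same `kalman_loglik` routine is what produces the "exact" curves that the SMC score estimates degenerate away from in the figures above.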
Example, Linear Gaussian Model: The figure shows the marginal of the Hahn-Jordan decomposition of the joint (top) and the Hahn-Jordan decomposition of the marginal (bottom).
Example, Stochastic Volatility Model: Consider the following model:

$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = \beta \exp\left(X_n / 2\right) W_n, \qquad \theta = \left(\phi, \sigma_v, \beta\right)$$

We use SMC with $N = 1000$ for batch and online estimation.

For batch estimation:

$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta_{n-1}}\left(Y_{1:n}\right)$$

For online estimation:

$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta_{n-1}}\left(Y_n \mid Y_{1:n-1}\right)$$
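The SV model above is easy to simulate, which is the first step of any of these estimation experiments. The parameter values and sample size below are illustrative assumptions, not those of the lecture's exchange-rate study.

```python
# Sketch: simulate the stochastic volatility model
# X_n = phi*X_{n-1} + sigma_v*V_n, Y_n = beta*exp(X_n/2)*W_n, V_n, W_n ~ N(0,1).
import math
import random

rng = random.Random(2013)
phi, sigma_v, beta = 0.95, 0.3, 0.7
n = 1000
x = rng.gauss(0.0, sigma_v / math.sqrt(1.0 - phi * phi))  # stationary start, assumption
xs, ys = [], []
for _ in range(n):
    x = phi * x + sigma_v * rng.gauss(0.0, 1.0)           # log-volatility
    xs.append(x)
    ys.append(beta * math.exp(x / 2.0) * rng.gauss(0.0, 1.0))  # observed return
```

The returns `ys` are mean-zero but with volatility clustering driven by the latent `xs`, which is what makes $\theta = (\phi, \sigma_v, \beta)$ identifiable from the observations alone.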
Example, Stochastic Volatility Model: Batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.
Example, Stochastic Volatility Model: Recursive parameter estimation for the SV model. The estimates converge towards the true values.
SMC with MCMC for Parameter Estimation: Given data $y_{1:n}$, inference relies on

$$p\left(\theta, x_{1:n} \mid y_{1:n}\right) = p\left(\theta \mid y_{1:n}\right)\, p_\theta\left(x_{1:n} \mid y_{1:n}\right), \quad \text{where} \quad p\left(\theta \mid y_{1:n}\right) \propto p_\theta\left(y_{1:n}\right)\, p\left(\theta\right)$$

We have seen that SMC is rather inefficient for sampling from $p\left(\theta \mid y_{1:n}\right)$, so we look here at an MCMC approach. For a given parameter value $\theta$, SMC can estimate both $p_\theta\left(x_{1:n} \mid y_{1:n}\right)$ and $p_\theta\left(y_{1:n}\right)$. It will be useful if we can use SMC within MCMC to sample from $p\left(\theta, x_{1:n} \mid y_{1:n}\right)$.

C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy: Using Gibbs sampling, we can sample iteratively from $p\left(\theta \mid y_{1:n}, x_{1:n}\right)$ and $p_\theta\left(x_{1:n} \mid y_{1:n}\right)$.

However, it is impossible to sample from $p_\theta\left(x_{1:n} \mid y_{1:n}\right)$ exactly. We can sample from $p\left(x_k \mid y_k, x_{k-1}, x_{k+1}\right)$ instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p_\theta\left(x_{1:n} \mid y_{1:n}\right)$; i.e., sample $X_{1:n}' \sim q\left(x_{1:n} \mid y_{1:n}\right)$ and accept with probability

$$\min\left(1, \frac{p_\theta\left(X_{1:n}' \mid y_{1:n}\right)\, q\left(X_{1:n} \mid y_{1:n}\right)}{p_\theta\left(X_{1:n} \mid y_{1:n}\right)\, q\left(X_{1:n}' \mid y_{1:n}\right)}\right)$$
Gibbs Sampling Strategy: We will use the output of an SMC method approximating $p_\theta\left(x_{1:n} \mid y_{1:n}\right)$ as a proposal distribution.

We know that the SMC approximation degrades as $n$ increases, but we also have, under mixing assumptions,

$$\left\| \mathcal{L}\left(X_{1:n}^{(i)}\right) - p_\theta\left(x_{1:n} \mid y_{1:n}\right) \right\|_{TV} \le \frac{Cn}{N}$$

Thus things degrade linearly rather than exponentially.
These results are useful for small C
43
( )1 1| n np θx y
( )1 1| n np θx y
( )( )1 1 1( ) | i
n n n TV
Cnaw pN
θminus leX x yL
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method
(CBMC) in Molecular simulation There are however significant differences
CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on
44
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
Bayesian Scientific Computing Spring 2013 (N Zabaras)
MCMC We use as proposal distribution and store
the estimate
It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables
The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate
Finally accept with probability Otherwise stay where you are
45
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
( )1 1 1~ | n n nNp θX x y
( )1 |nNp θy
1nX
( )1 |nNp θy1nX ( ) ( )
1
1
m|
in 1|nN
nN
pp
θθ
yy
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Consider the following model
We take the same prior proposal distribution for both CBMC and
SMC Acceptance rates for CBMC (left) and SMC (right) are shown
46
( )
( ) ( )
11 2
12
1
1 25 8cos(12 ) ~ 0152 1
~ 0001 ~ 052
kk k k k
k
kn k k
XX X k V VX
XY W W X
minusminus
minus
= + + ++
= +
N
N N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example We now consider the case when both variances are
unknown and given inverse Gamma (conditionally conjugate) vague priors
We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as
importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step
of invariance distribution with the same proposal
In both cases we update the parameters according to Both algorithms are run with the same computational effort
47
2 2v wσ σ
( )11000 11000 |p θx y
( )1 1 k k k kp x x x yminus +|
( )1500 1500| p θ x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example MH one at a time missed the mode and estimates
If X1n and θ are very correlated the MCMC algorithm is very
inefficient We will next discuss the Marginal Metropolis Hastings
21500
21500
2
| 137 ( )
| 99 ( )
10
v
v
v
MH
PF MCMC
σ
σ
σ
asymp asymp minus =
y
y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with
kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)
It can be easily shown that and
under weak assumptions
Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)
1 Given 119909119905minus1 generate 2 Compute
3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Step 1
Step 2
Step 3 With probability
( ) ( )
( ) ( )
1 1( 1)
1 1 1
( 1) ( 1)
~ | ( 1)
|
n ni
n n n
Given i i p
Sample q i
Run an SMC to obtain p p
θ
θθ
θ
θ θ θ
minusminus minus
minus
X y
y x y
( )1 1 1 ~ |n n nSample pθX x y
( ) ( ) ( ) ( ) ( ) ( )
1
1( 1)
( 1) |
( 1) | ( 1min 1
)n
ni
p p q i
p p i q iθ
θ
θ θ θ
θ θ θminus
minus minus minus
y
y
( ) ( ) ( ) ( )
1 1 1 1( )
1 1 1 1( ) ( 1)
( ) ( )
( ) ( ) ( 1) ( 1)
n n n ni
n n n ni i
Set i X i p X p
Otherwise i X i p i X i p
θ θ
θ θ
θ θ
θ θ minus
=
= minus minus
y y
y y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an
approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)
When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists
The higher N the better the performance of the algorithm N scales roughly linearly with n
Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)
Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model
with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Algorithm SMC For Static Problems (1) Initialize at time n=1 (2) At time nge2
Sample and augment
Compute the importance weights
Then the weighted approximation is We finally resample from to obtain
16
( )( ) ( )1~ |
i in n n nX q x X minus
( )( ) ( )( )1 1~
i iin n nnX Xminus minusX
( )
( )( )( )
1i
n
Ni
n n n nXi
x W xπ δ=
= sum
( ) ( )( ) ( )
( ) ( ) ( )11
( ) ( )1 ( ) ( ) ( )
1 11
|
|
i i in n nn n
i in n i i i
n n nn n
X L X XW W
X q X X
γ
γ
minusminus
minusminus minusminus
=
( ) ( )( )
1
1i
n
N
n n nXi
x xN
π δ=
= sum
( )( ) ~inn nX xπ
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then
the corresponding reversed kernel Ln-1
Using this easy choice we can simplify the expression for the weights
This is known as annealed importance sampling The particular choice has been used in physics and statistics
17
( ) ( ) ( )( )
1 11 1
|| n n n n n
n n nn n
x q x xL x x
xπ
πminus minus
minus minus =
( ) ( )( ) ( )
( )( ) ( )
( ) ( )( )
1 1 1 11 1
1 1 1 1 1 1
| || |
n n n n n n n n n n n nn n n
n n n n n n n n n n n n
x L x x x x q x xW W W
x q x x x q x x xγ γ π
γ γ πminus minus minus minus
minus minusminus minus minus minus minus minus
= = rArr
( )( )
( )1
1 ( )1 1
in n
n n in n
XW W
X
γ
γminus
minusminus minus
=
W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001
Bayesian Scientific Computing Spring 2013 (N Zabaras) 18
ONLINE PARAMETER ESTIMATION
Online Bayesian Parameter Estimation
Assume that our state model is defined with some unknown static parameter θ with some prior p(θ):

$$X_1 \sim \mu_\theta(\cdot) \quad \text{and} \quad X_n \mid (X_{n-1} = x_{n-1}) \sim f_\theta(\cdot \mid x_{n-1}),$$
$$Y_n \mid (X_n = x_n) \sim g_\theta(\cdot \mid x_n)$$

Given data y1:n, inference now is based on

$$p(\theta, x_{1:n} \mid y_{1:n}) = p_\theta(x_{1:n} \mid y_{1:n})\, p(\theta \mid y_{1:n}), \quad \text{where} \quad p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$$

We can use standard SMC, but on the extended space Zn = (Xn, θn):

$$\tilde f(z_n \mid z_{n-1}) = \delta_{\theta_{n-1}}(\theta_n)\, f_{\theta_{n-1}}(x_n \mid x_{n-1}), \quad \tilde g(y_n \mid z_n) = g_{\theta_n}(y_n \mid x_n)$$

Note that θ is a static parameter: it does not evolve with n.

19
Online Bayesian Parameter Estimation
For fixed θ, using our earlier error estimates,

$$\mathrm{Var}\left[\log \widehat p_\theta(y_{1:n})\right] \le \frac{Cn}{N}$$

In a Bayesian context the problem is even more severe, as

$$p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$$

The exponential stability assumption cannot hold, as θn = θ1.

To mitigate this problem, introduce MCMC steps on θ.

C Andrieu, N De Freitas and A Doucet, Sequential MCMC for Bayesian Model Selection, Proc IEEE Workshop HOS, 1999. P Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002. W Gilks and C Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.

20
Online Bayesian Parameter Estimation
When the conditional posterior of θ depends on the path only through a fixed-dimensional sufficient statistic,

$$p(\theta \mid y_{1:n}, x_{1:n}) = p\big(\theta \mid s_n(x_{1:n}, y_{1:n})\big), \quad s_n \ \text{fixed dimensional},$$

this becomes an elegant algorithm that however still has the degeneracy problem, since it uses $p(x_{1:n} \mid y_{1:n})$.

As dim(Zn) = dim(Xn) + dim(θ), such methods are not recommended for high-dimensional θ, especially with vague priors.

21
Artificial Dynamics for θ
A solution consists of perturbing the location of the particles in a way that does not modify their distributions, i.e. if at time n

$$\theta^{(i)} \sim p(\theta \mid y_{1:n}),$$

then we would like a transition kernel $M_n$ such that if

$$\tilde\theta^{(i)} \sim M_n(\theta^{(i)}, \cdot),$$

then $\tilde\theta^{(i)} \sim p(\theta \mid y_{1:n})$.

In Markov chain language, we want $p(\theta \mid y_{1:n})$ to be invariant for $M_n(\theta, \cdot)$.

22
Using MCMC
There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as $p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$ would need to be known up to a normalizing constant, but $p_\theta(y_{1:n})$ is unknown.

However, we can use a very simple Gibbs update: if $(\theta^{(i)}, X_{1:n}^{(i)}) \sim p(\theta, x_{1:n} \mid y_{1:n})$ and we sample $\theta^{(i)} \sim p(\theta \mid y_{1:n}, X_{1:n}^{(i)})$, then we still have $\theta^{(i)} \sim p(\theta \mid y_{1:n})$.

Indeed, note that

$$p(\theta \mid y_{1:n}) = \int p(\theta \mid y_{1:n}, x_{1:n})\, p(x_{1:n} \mid y_{1:n})\, dx_{1:n}$$

23
SMC with MCMC for Parameter Estimation
Given an approximation at time n-1:

$$\widehat p(\theta, x_{1:n-1} \mid y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^N \delta_{(\theta^{(i)},\, X_{1:n-1}^{(i)})}(\theta, x_{1:n-1})$$

Sample $X_n^{(i)} \sim f_{\theta^{(i)}}(\cdot \mid X_{n-1}^{(i)})$, set $X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)})$, and then approximate

$$\widehat p(\theta, x_{1:n} \mid y_{1:n}) = \sum_{i=1}^N W_n^{(i)}\, \delta_{(\theta^{(i)},\, X_{1:n}^{(i)})}(\theta, x_{1:n}), \quad W_n^{(i)} \propto g_{\theta^{(i)}}(y_n \mid X_n^{(i)})$$

Resample $X_{1:n}^{(i)} \sim \widehat p(x_{1:n} \mid y_{1:n})$, then sample $\theta^{(i)} \sim p(\theta \mid y_{1:n}, X_{1:n}^{(i)})$ to obtain

$$\widehat p(\theta, x_{1:n} \mid y_{1:n}) = \frac{1}{N}\sum_{i=1}^N \delta_{(\theta^{(i)},\, X_{1:n}^{(i)})}(\theta, x_{1:n})$$

24
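One step of this SMC-with-Gibbs scheme might look as follows. This is a hedged sketch: `f_sample`, `g_logpdf` and `gibbs_theta` are user-supplied placeholders for the transition sampler, the observation log density, and the θ full conditional.

```python
import numpy as np

def smc_mcmc_step(theta, paths, y_n, f_sample, g_logpdf, gibbs_theta, rng):
    """One step of SMC with a Gibbs refresh of the static parameter (sketch).

    theta: (N,) parameter particles; paths: list of state trajectories.
    """
    N = len(theta)
    # 1. propagate each trajectory with its own parameter value
    x_new = [f_sample(theta[i], paths[i][-1]) for i in range(N)]
    paths = [np.append(paths[i], x_new[i]) for i in range(N)]
    # 2. weight by the likelihood of the new observation
    logw = np.array([g_logpdf(theta[i], y_n, x_new[i]) for i in range(N)])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # 3. resample trajectories, then refresh theta from p(theta | x_{1:n}, y_{1:n})
    idx = rng.choice(N, size=N, p=w)
    paths = [paths[i] for i in idx]
    theta = np.array([gibbs_theta(p) for p in paths])
    return theta, paths
```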
SMC with MCMC for Parameter Estimation
Consider the following model:

$$X_{n+1} = \theta X_n + \sigma_v V_{n+1}, \quad V_n \sim \mathcal N(0,1),$$
$$Y_n = X_n + \sigma_w W_n, \quad W_n \sim \mathcal N(0,1), \quad X_0 \sim \mathcal N(0,1)$$

We set the prior on θ as $\theta \sim \mathcal U(-1,1)$.

In this case

$$p(\theta \mid x_{0:n}, y_{1:n}) \propto \mathcal N(\theta;\, m_n, \sigma_n^2)\, 1_{(-1,1)}(\theta), \quad \sigma_n^2 = \frac{\sigma_v^2}{\sum_{k=1}^n x_{k-1}^2}, \quad m_n = \frac{\sum_{k=1}^n x_{k-1} x_k}{\sum_{k=1}^n x_{k-1}^2}$$

25
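A minimal sketch of the resulting Gibbs draw for θ under the AR(1) model above. Rejection sampling for the truncation to (-1, 1) is an implementation choice for the sketch, not from the slides.

```python
import numpy as np

def theta_conditional(x, sigma_v=1.0, rng=None):
    """Sample theta | x_{0:n} for X_{n+1} = theta X_n + sigma_v V with a
    uniform prior on (-1, 1): a N(m_n, sigma_n^2) truncated to (-1, 1), where
    sigma_n^2 = sigma_v^2 / sum x_{k-1}^2 and m_n = sum x_{k-1} x_k / sum x_{k-1}^2.
    Sampling is by simple rejection (fine away from the boundaries)."""
    rng = rng or np.random.default_rng(0)
    s2 = sigma_v**2 / np.sum(x[:-1]**2)
    m = np.sum(x[:-1] * x[1:]) / np.sum(x[:-1]**2)
    while True:
        th = rng.normal(m, np.sqrt(s2))
        if -1.0 < th < 1.0:
            return th
```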
SMC with MCMC for Parameter Estimation
We use SMC with the Gibbs step. The degeneracy problem remains.

The SMC estimate of E[θ | y1:n] is shown as n increases (from A Doucet, lecture notes). The parameter converges to the wrong value.

(Figure: SMC/MCMC estimate vs the true value.)

26
SMC with MCMC for Parameter Estimation
The problem with this approach is that although we move θ according to $p(\theta \mid x_{1:n}, y_{1:n})$, we are still relying implicitly on the approximation of the joint distribution $p(x_{1:n} \mid y_{1:n})$.

As n increases, the SMC approximation of $p(x_{1:n} \mid y_{1:n})$ deteriorates and the algorithm cannot converge towards the right solution.

(Figure: the sufficient statistic $\frac{1}{n}\sum_{k=1}^n E[X_k^2 \mid y_{1:n}]$ computed exactly through the Kalman filter (blue) vs the SMC approximation (red), for a fixed value of θ.)

27
Recursive MLE Parameter Estimation
The log likelihood can be written as

$$\ell_n(\theta) = \log p_\theta(Y_{1:n}) = \sum_{k=1}^n \log p_\theta(Y_k \mid Y_{1:k-1})$$

Here we compute

$$p_\theta(Y_k \mid Y_{1:k-1}) = \int g_\theta(Y_k \mid x_k)\, p_\theta(x_k \mid Y_{1:k-1})\, dx_k$$

Under regularity assumptions, $\{X_n, Y_n, p_\theta(x_n \mid Y_{1:n-1})\}$ is an inhomogeneous Markov chain which converges towards its invariant distribution $\lambda_\theta$, and

$$\lim_{n\to\infty} \frac{\ell_n(\theta)}{n} = \ell(\theta) = \int\!\!\int \log\left(\int g_\theta(y \mid x)\, \mu(dx)\right) \lambda_\theta(dy, d\mu)$$

28
Robbins-Monro Algorithm for MLE
We can maximize ℓ(θ) by using the gradient and the Robbins-Monro algorithm:

$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta}(Y_n \mid Y_{1:n-1}),$$

where

$$\nabla p_\theta(Y_n \mid Y_{1:n-1}) = \int \nabla g_\theta(Y_n \mid x_n)\, p_\theta(x_n \mid Y_{1:n-1})\, dx_n + \int g_\theta(Y_n \mid x_n)\, \nabla p_\theta(x_n \mid Y_{1:n-1})\, dx_n$$

We thus need to approximate $\nabla p_\theta(x_n \mid Y_{1:n-1})$.

29
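The Robbins-Monro recursion itself is easy to sketch. Here is a toy version with an artificially noisy gradient; the step size $\gamma_n = 1/n$ and the noise level are assumptions for the sketch.

```python
import numpy as np

def robbins_monro(grad, theta0, n_iter=5000, seed=0):
    """Robbins-Monro stochastic approximation: theta_n = theta_{n-1} +
    gamma_n * (noisy gradient), with gamma_n = 1/n satisfying
    sum gamma_n = inf and sum gamma_n^2 < inf."""
    rng = np.random.default_rng(seed)
    theta = theta0
    for n in range(1, n_iter + 1):
        gamma = 1.0 / n
        # grad(theta) plus simulated estimation noise
        theta = theta + gamma * (grad(theta) + rng.normal(scale=0.1))
    return theta
```

For the concave toy objective $\ell(\theta) = -(\theta - 2)^2/2$, whose gradient is $2 - \theta$, the iterates settle near the maximizer 2.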
Importance Sampling Estimation of Sensitivity
The various proposed approximations are based on the identity

$$\nabla p_\theta(x_n \mid Y_{1:n-1}) = \int \frac{\nabla p_\theta(x_{1:n} \mid Y_{1:n-1})}{p_\theta(x_{1:n} \mid Y_{1:n-1})}\, p_\theta(x_{1:n} \mid Y_{1:n-1})\, dx_{1:n-1} = \int \nabla \log p_\theta(x_{1:n} \mid Y_{1:n-1})\, p_\theta(x_{1:n} \mid Y_{1:n-1})\, dx_{1:n-1}$$

We thus can use an SMC approximation of the form

$$\nabla \widehat p_\theta(x_n \mid Y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^N \alpha_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \quad \alpha_n^{(i)} = \nabla \log p_\theta(X_{1:n}^{(i)} \mid Y_{1:n-1})$$

This is simple but inefficient, as it is based on IS on spaces of increasing dimension, and in addition it relies on being able to obtain a good approximation of $p_\theta(x_{1:n} \mid Y_{1:n-1})$.

30
Degeneracy of the SMC Algorithm
The score $\nabla \log p_\theta(Y_{1:n})$ for linear Gaussian models: exact (blue) vs SMC approximation (red).

31
Degeneracy of the SMC Algorithm
The signed measure $\nabla p_\theta(x_n \mid Y_{1:n})$ for linear Gaussian models: exact (red) vs SMC (blue/green). Positive and negative particles are mixed.

32
Marginal Importance Sampling for Sensitivity Estimation
An alternative identity is the following:

$$\nabla p_\theta(x_n \mid Y_{1:n-1}) = \nabla \log p_\theta(x_n \mid Y_{1:n-1})\; p_\theta(x_n \mid Y_{1:n-1})$$

It suggests using an SMC approximation of the form

$$\nabla \widehat p_\theta(x_n \mid Y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^N \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \quad \beta_n^{(i)} = \nabla \log p_\theta(X_n^{(i)} \mid Y_{1:n-1}) = \frac{\nabla p_\theta(X_n^{(i)} \mid Y_{1:n-1})}{p_\theta(X_n^{(i)} \mid Y_{1:n-1})}$$

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N²), as

$$p_\theta(X_n^{(i)} \mid Y_{1:n-1}) \propto \int f_\theta(X_n^{(i)} \mid x_{n-1})\, p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1} \approx \frac{1}{N}\sum_{j=1}^N f_\theta(X_n^{(i)} \mid X_{n-1}^{(j)})$$

33
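The O(N²) cost comes from evaluating the mixture above at every current particle. A direct sketch, with the transition log density supplied by the user:

```python
import numpy as np

def predictive_density_at_particles(X_curr, X_prev, W_prev, f_logpdf):
    """O(N^2) pointwise estimate of p(X_n^(i) | y_{1:n-1}) ~=
    sum_j W_{n-1}^(j) f(X_n^(i) | X_{n-1}^(j)), as needed by the marginal
    sensitivity estimator. f_logpdf must broadcast over arrays."""
    # pairwise transition log densities: entry (i, j) = log f(X_curr[i] | X_prev[j])
    logf = f_logpdf(X_curr[:, None], X_prev[None, :])
    return np.sum(W_prev[None, :] * np.exp(logf), axis=1)
```

With all previous particles at 0 and uniform weights, the estimate at 0 reduces to the standard normal density value.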
Marginal Importance Sampling for Sensitivity Estimation
The optimal filter satisfies

$$p_\theta(x_n \mid Y_{1:n}) = \frac{\xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x)\, dx}, \quad \xi_{\theta,n}(x_n) = g_\theta(Y_n \mid x_n) \int f_\theta(x_n \mid x_{n-1})\, p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1}$$

The derivatives satisfy the following:

$$\nabla p_\theta(x_n \mid Y_{1:n}) = \frac{\nabla \xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x)\, dx} - p_\theta(x_n \mid Y_{1:n})\, \frac{\int \nabla \xi_{\theta,n}(x)\, dx}{\int \xi_{\theta,n}(x)\, dx}$$

This way we obtain a simple recursion for $\nabla p_\theta(x_n \mid Y_{1:n})$ as a function of $\nabla p_\theta(x_{n-1} \mid Y_{1:n-1})$ and $p_\theta(x_{n-1} \mid Y_{1:n-1})$.

34
SMC Approximation of the Sensitivity
Sample

$$X_n^{(i)} \sim q_n(\cdot \mid Y_{1:n}), \quad q_n(x \mid Y_{1:n}) = \sum_{j=1}^N \eta_{n-1}^{(j)}\, q(x \mid Y_n, X_{n-1}^{(j)}), \quad \eta_{n-1}^{(j)} \propto W_{n-1}^{(j)}\, \widehat p_\theta(Y_n \mid X_{n-1}^{(j)})$$

Compute

$$\alpha_n^{(i)} = \frac{\widehat\xi_{\theta,n}(X_n^{(i)})}{q_n(X_n^{(i)} \mid Y_{1:n})}, \quad \rho_n^{(i)} = \frac{\nabla \widehat\xi_{\theta,n}(X_n^{(i)})}{q_n(X_n^{(i)} \mid Y_{1:n})},$$
$$W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j=1}^N \alpha_n^{(j)}}, \quad \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j=1}^N \alpha_n^{(j)}} - W_n^{(i)}\, \frac{\sum_{j=1}^N \rho_n^{(j)}}{\sum_{j=1}^N \alpha_n^{(j)}}$$

We have

$$\widehat p_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^N W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \quad \nabla \widehat p_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^N \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n)$$

35
Example: Linear Gaussian Model
Consider the following model:

$$X_n = \phi X_{n-1} + \sigma_v V_n, \quad Y_n = X_n + \sigma_w W_n, \quad \theta = (\phi, \sigma_v, \sigma_w)$$

We compute $\nabla \log p_\theta(x_n \mid Y_{1:n})$ and $\nabla p_\theta(Y_{1:n})$.

We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black).

36
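For reference, the exact log likelihood of this linear Gaussian model via the Kalman filter can be sketched as follows. The scalar recursion is standard; the initial moments `m0`, `P0` are assumptions for the sketch. Its gradient in θ can then be compared against the SMC sensitivity estimate, e.g. by finite differences.

```python
import numpy as np

def kalman_loglik(y, phi, sigma_v, sigma_w, m0=0.0, P0=1.0):
    """Exact log p_theta(y_{1:n}) for X_n = phi X_{n-1} + sigma_v V_n,
    Y_n = X_n + sigma_w W_n, via the scalar Kalman filter."""
    m, P, ll = m0, P0, 0.0
    for yk in y:
        # predict
        m, P = phi * m, phi**2 * P + sigma_v**2
        # innovation contribution to the log likelihood
        S = P + sigma_w**2
        ll += -0.5 * (np.log(2 * np.pi * S) + (yk - m)**2 / S)
        # update
        K = P / S
        m, P = m + K * (yk - m), (1 - K) * P
    return ll
```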
Example: Linear Gaussian Model
Marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).

37
Example: Stochastic Volatility Model
Consider the following model:

$$X_n = \phi X_{n-1} + \sigma_v V_n, \quad Y_n = \beta \exp(X_n/2)\, W_n, \quad \theta = (\phi, \sigma_v, \beta)$$

We use SMC with N=1000 for batch and on-line estimation.

For batch estimation:

$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta_{n-1}}(Y_{1:n})$$

For online estimation:

$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta}(Y_n \mid Y_{1:n-1})$$

38
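A sketch of simulating this SV model; the parameter values below are illustrative only, not the values used in the slides.

```python
import numpy as np

def simulate_sv(n, phi=0.9, sigma_v=0.3, beta=0.7, seed=0):
    """Simulate X_n = phi X_{n-1} + sigma_v V_n, Y_n = beta exp(X_n/2) W_n,
    with V_n, W_n iid standard normal and a stationary start for X."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    y = np.zeros(n)
    x[0] = rng.normal(0.0, sigma_v / np.sqrt(1 - phi**2))  # stationary draw
    y[0] = beta * np.exp(x[0] / 2) * rng.normal()
    for k in range(1, n):
        x[k] = phi * x[k - 1] + sigma_v * rng.normal()
        y[k] = beta * np.exp(x[k] / 2) * rng.normal()
    return x, y
```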
Example: Stochastic Volatility Model
Batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.

39
Example: Stochastic Volatility Model
Recursive parameter estimation for the SV model. The estimates converge towards the true values.

40
SMC with MCMC for Parameter Estimation
Given data y1:n, inference relies on

$$p(\theta, x_{1:n} \mid y_{1:n}) = p_\theta(x_{1:n} \mid y_{1:n})\, p(\theta \mid y_{1:n}), \quad \text{where} \quad p(\theta \mid y_{1:n}) = \frac{p_\theta(y_{1:n})\, p(\theta)}{p(y_{1:n})}$$

We have seen that SMC is rather inefficient for sampling from $p(\theta \mid y_{1:n})$, so we look here at an MCMC approach.

For a given parameter value θ, SMC can estimate both $p_\theta(x_{1:n} \mid y_{1:n})$ and $p_\theta(y_{1:n})$.

It will be useful if we can use SMC within MCMC to sample from $p(\theta, x_{1:n} \mid y_{1:n})$.

• C Andrieu, R Holenstein and GO Roberts, Particle Markov chain Monte Carlo methods, J R Statist Soc B (2010) 72, Part 3, pp 269-342.

41
Gibbs Sampling Strategy
Using Gibbs sampling, we can sample iteratively from $p(\theta \mid y_{1:n}, x_{1:n})$ and $p(x_{1:n} \mid y_{1:n}, \theta)$.

However, it is impossible to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$ directly. We can sample from $p(x_k \mid x_{k-1}, x_{k+1}, y_k)$ one component at a time instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$, i.e. sample $X_{1:n}^* \sim q(\cdot \mid y_{1:n})$ and accept with probability

$$\min\left(1,\; \frac{p_\theta(X_{1:n}^* \mid y_{1:n})\, q(X_{1:n} \mid y_{1:n})}{p_\theta(X_{1:n} \mid y_{1:n})\, q(X_{1:n}^* \mid y_{1:n})}\right)$$

42
Gibbs Sampling Strategy
We will use the output of an SMC method approximating $p(x_{1:n} \mid y_{1:n}, \theta)$ as a proposal distribution.

We know that the SMC approximation degrades as n increases, but we also have, under mixing assumptions,

$$\left\| \mathcal{L}\big(X_{1:n}^{(i)}\big) - p(x_{1:n} \mid y_{1:n}, \theta) \right\|_{TV} \le \frac{Cn}{N}$$

Thus things degrade linearly rather than exponentially. These results are useful for small C.

43
Configurational Biased Monte Carlo
The idea is related to that of the Configurational Biased MC method (CBMC) in molecular simulation. There are, however, significant differences.

CBMC looks like an SMC method, but at each step only one particle is selected, and it gives N offspring. Thus if you have made the wrong selection, you cannot recover later on.

44

D Frenkel, G C A M Mooij and B Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J Phys Condens Matter 3 (1991) 3053-3076.
MCMC
We use $\widehat p_{N,\theta}(x_{1:n} \mid y_{1:n})$, the SMC approximation built with N particles, as proposal distribution, i.e. $X_{1:n}^* \sim \widehat p_{N,\theta}(\cdot \mid y_{1:n})$, and we store the associated marginal likelihood estimate $\widehat p^{\,*}_{N,\theta}(y_{1:n})$.

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N-1)n variables.

The solution, inspired by CBMC, is as follows. For the current state $X_{1:n}$ of the Markov chain, run a virtual SMC method with only N-1 free particles; ensure that at each time step k the current state $X_k$ is deterministically selected, and compute the associated estimate $\widehat p_{N,\theta}(y_{1:n})$.

Finally, accept $X_{1:n}^*$ with probability

$$\min\left(1,\; \frac{\widehat p^{\,*}_{N,\theta}(y_{1:n})}{\widehat p_{N,\theta}(y_{1:n})}\right)$$

Otherwise stay where you are.

45
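The accept/reject step that exchanges the current SMC estimate for the proposed one can be sketched as follows, working in log space for numerical stability (the function name is illustrative):

```python
import numpy as np

def exchange_accept(loglik_hat_new, loglik_hat_cur, rng):
    """Accept the proposed SMC run (with its sampled path) with probability
    min(1, p_hat_new / p_hat_cur), using *estimated* marginal likelihoods
    passed in as log values."""
    log_ratio = loglik_hat_new - loglik_hat_cur
    return np.log(rng.uniform()) < min(0.0, log_ratio)
```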
Example
Consider the following model:

$$X_k = \frac{X_{k-1}}{2} + \frac{25\, X_{k-1}}{1 + X_{k-1}^2} + 8\cos(1.2k) + V_k, \quad V_k \sim \mathcal N(0, \sigma_v^2),$$
$$Y_k = \frac{X_k^2}{20} + W_k, \quad W_k \sim \mathcal N(0, \sigma_w^2), \quad X_1 \sim \mathcal N(0, 5)$$

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.

46
Example
We now consider the case when both variances $\sigma_v^2, \sigma_w^2$ are unknown and given inverse Gamma (conditionally conjugate) vague priors.

We sample from $p(\theta, x_{1:1000} \mid y_{1:1000})$ using two strategies: the MCMC algorithm with N=1000 and the local EKF proposal as importance distribution (average acceptance rate 0.43), and an algorithm updating the components one at a time using an MH step of invariant distribution $p(x_k \mid x_{k-1}, x_{k+1}, y_k)$ with the same proposal.

In both cases we update the parameters according to $p(\theta \mid x_{1:500}, y_{1:500})$. Both algorithms are run with the same computational effort.

47
Example
MH one-at-a-time missed the mode and estimates

$$E[\sigma_v^2 \mid y_{1:500}] \approx 13.7 \ \text{(MH one-at-a-time)}, \qquad E[\sigma_v^2 \mid y_{1:500}] \approx 9.9 \ \text{(PF-MCMC)}, \qquad \sigma_v^2 = 10 \ \text{(true value)}$$

If X1:n and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
Review of Metropolis-Hastings Algorithm
As a review of MH: the proposal distribution is a Markov chain with kernel density $q(x, y) = q(y \mid x)$ and target distribution $\pi(x)$.

It can be easily shown that π is invariant under the transition kernel,

$$\pi(x') = \int \pi(x)\, K(x, x')\, dx,$$

and, under weak assumptions, $X^{(i)} \sim \pi$ as $i \to \infty$.

Algorithm (Generic Metropolis-Hastings Sampler)
Initialization: choose an arbitrary starting value $x^{(0)}$.
Iteration t (t ≥ 1):
1. Given $x^{(t-1)}$, generate $\tilde x \sim q(x^{(t-1)}, \cdot)$.
2. Compute
$$\rho(x^{(t-1)}, \tilde x) = \min\left(1,\; \frac{\pi(\tilde x)\, q(\tilde x, x^{(t-1)})}{\pi(x^{(t-1)})\, q(x^{(t-1)}, \tilde x)}\right)$$
3. With probability $\rho(x^{(t-1)}, \tilde x)$, accept $\tilde x$ and set $x^{(t)} = \tilde x$; otherwise reject $\tilde x$ and set $x^{(t)} = x^{(t-1)}$.
Marginal Metropolis-Hastings Algorithm
Consider the target

$$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n})$$

We use the following proposal distribution:

$$q\big((\theta^*, x_{1:n}^*) \mid (\theta, x_{1:n})\big) = q(\theta^* \mid \theta)\, p_{\theta^*}(x_{1:n}^* \mid y_{1:n})$$

Then the acceptance probability becomes

$$\min\left(1,\; \frac{p(\theta^*, x_{1:n}^* \mid y_{1:n})\, q\big((\theta, x_{1:n}) \mid (\theta^*, x_{1:n}^*)\big)}{p(\theta, x_{1:n} \mid y_{1:n})\, q\big((\theta^*, x_{1:n}^*) \mid (\theta, x_{1:n})\big)}\right) = \min\left(1,\; \frac{p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta \mid \theta^*)}{p_{\theta}(y_{1:n})\, p(\theta)\, q(\theta^* \mid \theta)}\right)$$

We will use SMC approximations to compute $p_\theta(y_{1:n})$ and $p_\theta(x_{1:n} \mid y_{1:n})$.
Marginal Metropolis-Hastings Algorithm
Step 1: Given $\theta(i-1)$, $\widehat p_{\theta(i-1)}(y_{1:n})$ and $X_{1:n}(i-1)$, sample $\theta^* \sim q(\cdot \mid \theta(i-1))$ and run an SMC algorithm to obtain $\widehat p_{\theta^*}(y_{1:n})$ and $\widehat p_{\theta^*}(x_{1:n} \mid y_{1:n})$.

Step 2: Sample $X_{1:n}^* \sim \widehat p_{\theta^*}(\cdot \mid y_{1:n})$.

Step 3: With probability

$$\min\left(1,\; \frac{\widehat p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta(i-1) \mid \theta^*)}{\widehat p_{\theta(i-1)}(y_{1:n})\, p(\theta(i-1))\, q(\theta^* \mid \theta(i-1))}\right)$$

set $\theta(i) = \theta^*$, $X_{1:n}(i) = X_{1:n}^*$, $\widehat p_{\theta(i)}(y_{1:n}) = \widehat p_{\theta^*}(y_{1:n})$; otherwise set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$, $\widehat p_{\theta(i)}(y_{1:n}) = \widehat p_{\theta(i-1)}(y_{1:n})$.
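Putting the three steps together gives a particle marginal MH sketch. Here `loglik_hat` stands in for the SMC estimate of $\log p_\theta(y_{1:n})$ (any unbiased likelihood estimator works); storing the estimate and reusing it for the current state is the key point. Names and signatures are illustrative.

```python
import numpy as np

def pmmh(y, loglik_hat, log_prior, theta0, n_iter, prop_std=0.1, seed=0):
    """Particle marginal Metropolis-Hastings sketch for a scalar theta.
    loglik_hat(theta, y, rng) returns an estimate of log p_theta(y_{1:n})."""
    rng = np.random.default_rng(seed)
    theta, ll = theta0, loglik_hat(theta0, y, rng)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        theta_p = theta + prop_std * rng.normal()      # symmetric q(theta* | theta)
        ll_p = loglik_hat(theta_p, y, rng)             # fresh SMC run at theta*
        log_alpha = (ll_p + log_prior(theta_p)) - (ll + log_prior(theta))
        if np.log(rng.uniform()) < log_alpha:
            theta, ll = theta_p, ll_p                  # full version also keeps X*_{1:n}
        chain[i] = theta
    return chain
```

Using the exact likelihood in place of the SMC estimate recovers an ordinary marginal MH sampler, which is a convenient way to test the surrounding code.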
Marginal Metropolis-Hastings Algorithm
This algorithm (without sampling X1:n) was proposed as an approximate MCMC algorithm to sample from p(θ | y1:n) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

When N ≥ 1, the algorithm admits exactly p(θ, x1:n | y1:n) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n.

It is useful when Xn is moderate dimensional and θ high dimensional, and it admits the plug-and-play property (Ionides et al, 2006).

Jesus Fernandez-Villaverde & Juan F Rubio-Ramirez, 2007, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553. C Andrieu, A Doucet and R Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010. E L Ionides, C Breto and A A King (2006), Inference for nonlinear dynamical systems, Proceedings of the Nat Acad of Sciences, 103, 18438-18443.
OPEN PROBLEMS
Open Problems
Thus SMC can be successfully used within MCMC.

The major advantage of SMC is that it builds automatically efficient very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
- Developing efficient SMC methods when you have E = R^10000
- Developing a proper SMC method for recursive Bayesian parameter estimation
- Developing methods to avoid the O(N²) cost for smoothing and sensitivity
- Developing methods for solving optimal control problems
- Establishing weaker assumptions ensuring uniform convergence results

54
Other Topics
In standard SMC we only sample Xn at time n and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

55

• Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC For Static Models Choice of L A default choice is first using a πn-invariant MCMC kernel qn and then
the corresponding reversed kernel Ln-1
Using this easy choice we can simplify the expression for the weights
This is known as annealed importance sampling The particular choice has been used in physics and statistics
17
( ) ( ) ( )( )
1 11 1
|| n n n n n
n n nn n
x q x xL x x
xπ
πminus minus
minus minus =
( ) ( )( ) ( )
( )( ) ( )
( ) ( )( )
1 1 1 11 1
1 1 1 1 1 1
| || |
n n n n n n n n n n n nn n n
n n n n n n n n n n n n
x L x x x x q x xW W W
x q x x x q x x xγ γ π
γ γ πminus minus minus minus
minus minusminus minus minus minus minus minus
= = rArr
( )( )
( )1
1 ( )1 1
in n
n n in n
XW W
X
γ
γminus
minusminus minus
=
W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B 2001
Bayesian Scientific Computing Spring 2013 (N Zabaras) 18
ONLINE PARAMETER ESTIMATION
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static
parameter θ with some prior p(θ)
Given data y1n inference now is based on
We can use standard SMC but on the extended space Zn=(Xnθn)
Note that θ is a static parameter ndashdoes not involve with n
19
( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=
( ) ( )| ~ |n n n n nY X x g y xθ=
( ) ( ) ( )
( ) ( ) ( )
1 1 1 1 1
1 1
| | |
|
n n n n n
n n
p p pwherep p p
θ
θ
θ θ
θ θ
=
prop
x y y x y
y y
( ) ( ) ( ) ( ) ( )11 1| | | |
nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation For fixed θ using our earlier error estimates
In a Bayesian context the problem is even more severe as
Exponential stability assumption cannot hold as θn = θ1
To mitigate this problem introduce MCMC steps on θ
20
( ) ( ) ( )1 1| n np p pθθ θpropy y
C Andrieu N De Freitas and A Doucet Sequential MCMC for Bayesian Model Selection Proc IEEE Workshop HOS 1999 P Fearnhead MCMC sufficient statistics and particle filters JCGS 2002 W Gilks and C Berzuini Following a moving target MC inference for dynamic Bayesian Models JRSS B
2001
1log ( )nCnVar pNθ
le y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation When
This becomes an elegant algorithm that however still has the degeneracy problem since it uses
As dim(Zn) = dim(Xn) + dim(θ) such methods are not recommended for high-dimensional θ especially with vague priors
21
( ) ( )1 1 1 1| | n n n n n
FixedDimensional
p p sθ θ
=
y x x y
( )1 1|n np x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a
way that does not modify their distributions ie if at time n
then we would like a transition kernel such that if Then
In Markov chain language we want to be invariant
22
( )( )1|i
n npθ θ~ y
( )iθ
( )( ) ( ) ( )| i i in n n nMθ θ θ~
( )( )1|i
n npθ θ~ y
( ) nM θ θ ( )1| np θ y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Using MCMC There is a whole literature on the design of such kernels known as
Markov chain Monte Carlo eg the Metropolis-Hastings algorithm
We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown
However we can use a very simple Gibbs update
Indeed note that if then if we have
Indeed note that
23
( )1| np θ y( ) ( )1 1|n np pθθ equivy y
( )( ) ( )1 1| i i
n n npθ θ~ y X
( ) ( )( ) ( )1 1 1 ~ |i i
n n n npθ θX x y ( )( ) ( )1 1| i i
n n npθ θ~ y X
( ) ( )( ) ( )1 1 1 ~ |i i
n n n npθ θX x y
( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Given an approximation at time n-1
Sample set and then approximate
Resample then sample to obtain
24
( )( )1
( ) ( )1~ |i
n
i in n nX f x X
θ minusminus
( )( ) ( )( )1 1 1~ i iin nn XminusX X
( ) ( ) ( )( ) ( )1 1 1
1 1 1 1 1 11
1| i in n
N
n n ni
pN θ
θ δ θminus minus
minus minus minus=
= sum X x y x
( )( )1 1 1~ |i
n n npX x y
( )( ) ( ) ( )( )( )( )
11 1
( )( ) ( )1 1 1
1| |iii
nn n
N ii inn n n n n n
ip W W g y
θθθ δ
minusminus=
= propsum X x y x X
( )( ) ( )1 1| i i
n n npθ θ~ y X
( ) ( ) ( )( )( )1
1 1 11
1| iin n
N
n n ni
pN θ
θ δ θ=
= sum X x y x
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Consider the following model
We set the prior on θ as
In this case
25
( )( )
( )
1 1
21 0
~ 01
~ 01
~ 0
n n v n n
n n w n n
X X V V
Y X W W
X
θ σ
σ
σ
+ += +
= +
N
N
N
~ ( 11)θ minusU
( ) ( ) ( )21 1 ( 11)
12 2 2
12 2
|
n n n n
n n
n n k k n kk k
p m
m x x x
θ θ σ θ
σ σ
minus
minus
minus= =
prop
= = sum sum
x y N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains
SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value
26
True Value
SMCMCMC
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ
according to we are still relying implicitly on the approximation of the joint distribution
As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution
27
( )1 1n np θ | x y( )1 1|n np x y
( )1 1|n np x y
Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ
21
1
1 |n
k nk
Xn =
sum y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Recursive MLE Parameter Estimation The log likelihood can be written as
Here we compute
Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution
28
( )1 1 11
l ( ) log ( ) log |n
n n k kk
p p Yθ θθ minus=
= = sumY Y
( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y
( ) 1 1 |n n n nX Y p xθ minusY
l ( )lim l( ) log ( | ) ( ) ( )n
ng y x dx dy d
n θ θ θθ θ micro λ micro
rarrinfin= = int int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro
algorithm
where
We thus need to approximate
29
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
( ) ( )( ) ( )
1 1 1 1
1 1
( ) | |
| |
n n n n n n n
n n n n n
p Y g Y x p x dx
g Y x p x dx
θ θ θ
θ θ
minus minus
minus
nabla = nabla +
nabla
intint
|Y Y
Y
( ) 1 1|n np xθ minusnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity
We thus can use a SMC approximation of the form
This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of
30
1 1 11 1 1 1 1 1 1
1 1 1
1 1 1 1 1 1 1 1
( )( ) ( )( )
log ( ) ( )
n nn n n n n
n n
n n n n n
pp Y p dp
p p d
θθ θ
θ
θ θ
minusminus minus minus
minus
minus minus minus
nablanabla =
= nabla
int
int
x |Y|Y x |Y xx |Y
x |Y x |Y x
( )( )
1 1 11
( ) ( )1 1 1
1( ) ( )
log ( )
in
Ni
n n n nXi
i in n n
p xN
p
θ
θ
α δ
α
minus=
minus
nabla =
= nabla
sum|Y x
X |Y
1 1 1( )n npθ minusx |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC
approximation (red)
31
1( )npθnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact
(red) vs SMC (bluegreen) Positive and negative particles are mixed
32
1( )n np xθnabla |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation
An alternative identity is the following
It suggests using an SMC approximation of the form
Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as
33
1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y
( )( )
1 1 11
( )( ) ( ) 1 1
1 1 ( )1 1
1( ) ( )
( )log ( )( )
in
Ni
n n n nXi
ii i n n
n n n in n
p xN
p Xp Xp X
θ
θθ
θ
β δ
β
minus=
minusminus
minus
nabla =
nabla= nabla =
sum|Y x
|Y|Y|Y
( ) ( ) ( ) ( )1 1 1 1 1 1 1
1
1( ) ( ) ( ) ( )N
i i i jn n n n n n n n n
jp X f X x p x dx f X X
Nθ θθ θminus minus minus minus minus=
prop = sumint|Y | |Y |
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation
The optimal filter satisfies
The derivatives satisfy the following
This way we obtain a simple recursion of as a function of and
34
( ) ( )
11
1
1 1 1 1 1 1
( )( )( )
( ) | | ( )
n nn n
n n n
n n n n n n n n n
xp xx dx
x g Y x f x x p x dx
θθ
θ
θ θ θ θ
ξξ
ξ minus minus minus minus
=
=
intint
|Y|Y|Y
|Y |Y
111 1
1 1
( )( )( ) ( )( ) ( )
n n nn nn n n n
n n n n n n
x dxxp x p xx dx x dx
θθθ θ
θ θ
ξξξ ξ
nablanablanabla = minus int
int int|Y|Y|Y |Y
|Y Y
1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC Approximation of the Sensitivity Sample
Compute
We have
35
( )
( )( ) ( )
( ) ( )1 1( ) ( )
( )( ) ( )
1( ) ( ) ( ) ( )
( ) ( ) ( )
1 1 1
( ) ( )| |
i ii in n n n
n ni in n n n
Nj
ni iji i i in n
n n n nN N Nj j j
n n nj j j
X Xq X Y q X Y
W W W
θ θ
θ θ
ξ ξα ρ
ρα ρβ
α α α
=
= = =
nabla= =
= = minussum
sum sum sum
Y Y
( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1
1~ | | |
Ni j j j j j
n n n n n n n n nj
X q Y q Y X W p Y Xθ θη ηminus minus minus=
= propsum
( )
( )
( )1
1
( ) ( )1
1
( ) ( )
( ) ( )
in
in
Ni
n n n nXi
Ni i
n n n n nXi
p x W x
p x W x
θ
θ
δ
β δ
=
=
=
nabla =
sum
sum
|Y
|Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Consider the following model
We have
We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)
36
1n n v n
n n w n
X X VY X W
φ σσminus= +
= +
v wθ φ σ σ=
1log ( )n np xθnabla |Y
1( )npθnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the
marginal (bottom)
37
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Consider the following model
We have We use SMC with N=1000 for batch and on-line estimation
For Batch Estimation
For online estimation
38
1
exp2
n n v n
nn n
X X VXY W
φ σ
β
minus= +
=
vθ φ σ β=
11 1log ( )nn n n npθθ θ γ
minusminus= + nabla Y
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar
81-85
39
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates
converge towards the true values
40
SMC with MCMC for Parameter Estimation. Given data y_{1:n}, inference relies on

$$p\left(\theta, x_{1:n} \mid y_{1:n}\right) = p_\theta\left(x_{1:n} \mid y_{1:n}\right) p\left(\theta \mid y_{1:n}\right), \quad \text{where} \quad p\left(\theta \mid y_{1:n}\right) \propto p_\theta\left(y_{1:n}\right) p(\theta)$$

We have seen that SMC is rather inefficient at sampling from p(θ, x_{1:n} | y_{1:n}), so we look here at an MCMC approach. For a given parameter value θ, SMC can estimate both p_θ(x_{1:n} | y_{1:n}) and p_θ(y_{1:n}). It will be useful if we can use SMC within MCMC to sample from p(θ, x_{1:n} | y_{1:n}).

• C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy. Using Gibbs sampling, we can sample iteratively from p(θ | y_{1:n}, x_{1:n}) and p(x_{1:n} | θ, y_{1:n}).

However, it is impossible to sample from p(x_{1:n} | θ, y_{1:n}) exactly. We can sample from p(x_k | θ, y_{1:n}, x_{k-1}, x_{k+1}) instead, one component at a time, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from p(x_{1:n} | θ, y_{1:n}); i.e., sample $X^*_{1:n} \sim q\left(x_{1:n} \mid y_{1:n}\right)$ and accept with probability

$$\min\left\{1,\ \frac{p_\theta\left(X^*_{1:n} \mid y_{1:n}\right) q\left(X_{1:n} \mid y_{1:n}\right)}{p_\theta\left(X_{1:n} \mid y_{1:n}\right) q\left(X^*_{1:n} \mid y_{1:n}\right)}\right\}$$
Gibbs Sampling Strategy. We will use the output of an SMC method approximating p_θ(x_{1:n} | y_{1:n}) as a proposal distribution.

We know that the SMC approximation degrades as n increases, but under mixing assumptions we also have

$$\left\| \mathrm{Law}\left(X^{(i)}_{1:n}\right) - p_\theta\left(\cdot \mid y_{1:n}\right) \right\|_{TV} \le \frac{C\, n}{N}$$

Thus things degrade linearly rather than exponentially in n. These results are useful for small C.
Configurational Bias Monte Carlo. The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences: CBMC looks like an SMC method, but at each step only one particle is selected, and it gives N offspring. Thus if you have made the wrong selection, you cannot recover later on.

• D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study the structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC. We use the SMC approximation $\hat{p}_N\left(x_{1:n} \mid \theta, y_{1:n}\right)$ as proposal distribution: sample $X^*_{1:n} \sim \hat{p}_N\left(x_{1:n} \mid \theta, y_{1:n}\right)$ and store the estimate $\hat{p}_N\left(y_{1:n} \mid \theta\right)$.

It is impossible to compute analytically the unconditional law of a particle $X^*_{1:n}$, as it would require integrating out (N-1)n variables.

The solution, inspired by CBMC, is as follows. For the current state X_{1:n} of the Markov chain, run a virtual SMC method with only N-1 free particles; ensure that at each time step k the current state X_k is deterministically selected, and compute the associated estimate $\hat{p}_N\left(y_{1:n} \mid \theta\right)$. Finally, accept $X^*_{1:n}$ with probability

$$\min\left\{1,\ \frac{\hat{p}^{\,*}_N\left(y_{1:n} \mid \theta\right)}{\hat{p}_N\left(y_{1:n} \mid \theta\right)}\right\}$$

Otherwise stay where you are.
Example. Consider the following model:

$$X_k = \frac{X_{k-1}}{2} + \frac{25\, X_{k-1}}{1 + X_{k-1}^2} + 8\cos(1.2k) + V_k, \qquad V_k \sim \mathcal{N}\left(0, \sigma_v^2\right), \qquad X_1 \sim \mathcal{N}\left(0, \sigma_1^2\right)$$
$$Y_k = \frac{X_k^2}{20} + W_k, \qquad W_k \sim \mathcal{N}\left(0, \sigma_w^2\right)$$

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
Example. We now consider the case when both variances σ_v², σ_w² are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from p(θ, x_{1:1000} | y_{1:1000}) using two strategies:
- The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution. Average acceptance rate 0.43.
- An algorithm updating the components one at a time using an MH step of invariant distribution p(x_k | x_{k-1}, x_{k+1}, y_k), with the same proposal.

In both cases we update the parameters according to p(θ | x_{1:500}, y_{1:500}). Both algorithms are run with the same computational effort.
Example. MH one-at-a-time missed the mode, and the estimates are

$$\mathbb{E}\left[\sigma_v^2 \mid y_{1:500}\right] \approx 13.7 \ (\text{MH}), \qquad \mathbb{E}\left[\sigma_v^2 \mid y_{1:500}\right] \approx 9.9 \ (\text{PF-MCMC}), \qquad \sigma_v^2 = 10 \ (\text{true value})$$

If X_{1:n} and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
Review of the Metropolis-Hastings Algorithm. As a review of MH: the proposal distribution is a Markov chain with kernel density q(x, y) = q(y | x), and the target distribution is π(x).

Algorithm (Generic Metropolis-Hastings Sampler).
Initialization: choose an arbitrary starting value $x^{(0)}$.
Iteration t (t ≥ 1):
1. Given $x^{(t-1)}$, generate $x^* \sim q\left(x^{(t-1)}, \cdot\right)$.
2. Compute
$$\rho\left(x^{(t-1)}, x^*\right) = \min\left\{1,\ \frac{\pi\left(x^*\right) q\left(x^*, x^{(t-1)}\right)}{\pi\left(x^{(t-1)}\right) q\left(x^{(t-1)}, x^*\right)}\right\}$$
3. With probability $\rho\left(x^{(t-1)}, x^*\right)$, accept $x^*$ and set $x^{(t)} = x^*$; otherwise reject $x^*$ and set $x^{(t)} = x^{(t-1)}$.

It can be easily shown that π is invariant for the transition kernel,
$$\pi\left(x'\right) = \int \pi(x)\, K\left(x, x'\right) dx,$$
and that $X^{(i)} \sim \pi(x)$ as $i \to \infty$ under weak assumptions.
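The generic sampler above fits in a few lines. The following is a minimal sketch (not from the lecture): it assumes a symmetric Gaussian random-walk proposal, so the Hastings ratio q(x*, x^(t-1))/q(x^(t-1), x*) cancels and only the target ratio remains.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_iters, step=1.0, rng=random):
    """Generic random-walk Metropolis-Hastings sampler.

    log_target: unnormalized log density log pi(x).  The proposal
    q(x, y) is a symmetric Gaussian random walk, so the acceptance
    probability reduces to min{1, pi(x*)/pi(x^(t-1))}.
    """
    x = x0
    chain = []
    for _ in range(n_iters):
        x_star = x + rng.gauss(0.0, step)             # x* ~ q(x^(t-1), .)
        log_rho = log_target(x_star) - log_target(x)  # log acceptance ratio
        if math.log(rng.random()) < min(0.0, log_rho):
            x = x_star                                # accept: x^(t) = x*
        chain.append(x)                               # else keep x^(t-1)
    return chain

# Usage: sample from a standard normal target, pi(x) propto exp(-x^2/2).
random.seed(0)
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=3.0, n_iters=5000)
```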
Marginal Metropolis-Hastings Algorithm. Consider the target

$$p\left(\theta, x_{1:n} \mid y_{1:n}\right) = p\left(\theta \mid y_{1:n}\right) p_\theta\left(x_{1:n} \mid y_{1:n}\right)$$

We use the following proposal distribution:

$$q\left(\left(\theta^*, x^*_{1:n}\right) \mid \left(\theta, x_{1:n}\right)\right) = q\left(\theta^* \mid \theta\right) p_{\theta^*}\left(x^*_{1:n} \mid y_{1:n}\right)$$

Then the acceptance probability becomes

$$\min\left\{1,\ \frac{p\left(\theta^*, x^*_{1:n} \mid y_{1:n}\right) q\left(\left(\theta, x_{1:n}\right) \mid \left(\theta^*, x^*_{1:n}\right)\right)}{p\left(\theta, x_{1:n} \mid y_{1:n}\right) q\left(\left(\theta^*, x^*_{1:n}\right) \mid \left(\theta, x_{1:n}\right)\right)}\right\} = \min\left\{1,\ \frac{p_{\theta^*}\left(y_{1:n}\right) p\left(\theta^*\right) q\left(\theta \mid \theta^*\right)}{p_\theta\left(y_{1:n}\right) p(\theta)\, q\left(\theta^* \mid \theta\right)}\right\}$$

We will use SMC approximations to compute p_θ(y_{1:n}) and p_θ(x_{1:n} | y_{1:n}).
Marginal Metropolis-Hastings Algorithm.

Step 1: Given $\theta(i-1)$, $X_{1:n}(i-1)$ and $\hat{p}_{\theta(i-1)}\left(y_{1:n}\right)$, sample $\theta^* \sim q\left(\theta \mid \theta(i-1)\right)$ and run an SMC algorithm to obtain $\hat{p}_{\theta^*}\left(x_{1:n} \mid y_{1:n}\right)$ and $\hat{p}_{\theta^*}\left(y_{1:n}\right)$.

Step 2: Sample $X^*_{1:n} \sim \hat{p}_{\theta^*}\left(x_{1:n} \mid y_{1:n}\right)$.

Step 3: With probability

$$\min\left\{1,\ \frac{\hat{p}_{\theta^*}\left(y_{1:n}\right) p\left(\theta^*\right) q\left(\theta(i-1) \mid \theta^*\right)}{\hat{p}_{\theta(i-1)}\left(y_{1:n}\right) p\left(\theta(i-1)\right) q\left(\theta^* \mid \theta(i-1)\right)}\right\}$$

set $\theta(i) = \theta^*$, $X_{1:n}(i) = X^*_{1:n}$, $\hat{p}_{\theta(i)}\left(y_{1:n}\right) = \hat{p}_{\theta^*}\left(y_{1:n}\right)$; otherwise set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$, $\hat{p}_{\theta(i)}\left(y_{1:n}\right) = \hat{p}_{\theta(i-1)}\left(y_{1:n}\right)$.
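Steps 1-3 above can be sketched concretely. The following is an illustrative particle marginal MH sampler, not the lecture's code: the AR(1)-with-unit-noises observation model, the flat prior on θ, and all function names are assumptions made for the sketch. The intractable p_θ(y_{1:n}) is replaced everywhere by the particle filter estimate, and the symmetric random-walk proposal makes the q-ratio cancel.

```python
import math
import random

def pf_loglik(theta, ys, N=200, rng=random):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n}) for the
    illustrative model X_n = theta X_{n-1} + V_n, Y_n = X_n + W_n with
    V_n, W_n ~ N(0, 1): accumulates log( (1/N) sum_i g(y_k | X_k^(i)) )."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(N)]
    ll = 0.0
    for y in ys:
        xs = [theta * x + rng.gauss(0.0, 1.0) for x in xs]       # f_theta
        ws = [math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2 * math.pi)
              for x in xs]                                       # g_theta
        s = sum(ws)
        if s == 0.0:                 # all weights underflowed: reject later
            return float("-inf")
        ll += math.log(s / N)
        xs = rng.choices(xs, weights=ws, k=N)                    # resample
    return ll

def pmmh(ys, n_iters, step=0.15, rng=random):
    """Marginal MH on theta with a flat prior: accept theta* with
    probability min{1, p_hat_{theta*}(y) / p_hat_theta(y)} (symmetric q),
    storing the likelihood estimate alongside the current state."""
    theta = 0.0
    ll = pf_loglik(theta, ys, rng=rng)
    chain = []
    for _ in range(n_iters):
        theta_star = theta + rng.gauss(0.0, step)    # q(theta* | theta)
        ll_star = pf_loglik(theta_star, ys, rng=rng)
        if math.log(rng.random()) < ll_star - ll:
            theta, ll = theta_star, ll_star          # accept and store
        chain.append(theta)
    return chain
```

Note the design point emphasized on the slides: the stored estimate p̂_θ(y_{1:n}) for the current state is reused, never recomputed, which is what makes the chain exact for any N ≥ 1.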
Marginal Metropolis-Hastings Algorithm. This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(θ | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

When N ≥ 1, the algorithm admits exactly p(θ, x_{1:n} | y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n. The method is useful when X_n is moderate-dimensional and θ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

• Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553, 2007.
• C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
• Ionides, E. L., Breto, C. and King, A. A. (2006), Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443.
OPEN PROBLEMS

Open Problems. Thus SMC can be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
- Developing efficient SMC methods when E = R^10000.
- Developing a proper SMC method for recursive Bayesian parameter estimation.
- Developing methods that avoid the O(N^2) cost for smoothing and sensitivity estimation.
- Developing methods for solving optimal control problems.
- Establishing weaker assumptions ensuring uniform convergence results.
Other Topics. In standard SMC we only sample X_n at time n, and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

• Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
ONLINE PARAMETER ESTIMATION
Online Bayesian Parameter Estimation. Assume that our state model is defined with some unknown static parameter θ with prior p(θ):

$$X_1 \sim \mu_\theta(\cdot) \quad \text{and} \quad X_n \mid \left(X_{n-1} = x_{n-1}\right) \sim f_\theta\left(x_n \mid x_{n-1}\right), \qquad Y_n \mid \left(X_n = x_n\right) \sim g_\theta\left(y_n \mid x_n\right)$$

Given data y_{1:n}, inference now is based on

$$p\left(\theta, x_{1:n} \mid y_{1:n}\right) = p_\theta\left(x_{1:n} \mid y_{1:n}\right) p\left(\theta \mid y_{1:n}\right), \quad \text{where} \quad p\left(\theta \mid y_{1:n}\right) \propto p_\theta\left(y_{1:n}\right) p(\theta)$$

We can use standard SMC, but on the extended space Z_n = (X_n, θ_n):

$$f\left(z_n \mid z_{n-1}\right) = f_{\theta_{n-1}}\left(x_n \mid x_{n-1}\right) \delta_{\theta_{n-1}}\left(\theta_n\right), \qquad g\left(y_n \mid z_n\right) = g_{\theta_n}\left(y_n \mid x_n\right)$$

Note that θ is a static parameter: it does not evolve with n.
Online Bayesian Parameter Estimation. For fixed θ, using our earlier error estimates,

$$\mathrm{Var}\left[\log \hat{p}_\theta\left(y_{1:n}\right)\right] \le \frac{C\, n}{N}$$

In a Bayesian context the problem is even more severe, as $p\left(\theta \mid y_{1:n}\right) \propto p_\theta\left(y_{1:n}\right) p(\theta)$. The exponential stability assumption cannot hold, as θ_n = θ_1 for all n. To mitigate this problem, introduce MCMC steps on θ.

• C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
• P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
• W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.
Online Bayesian Parameter Estimation. When

$$p\left(\theta \mid y_{1:n}, x_{1:n}\right) = p\left(\theta \mid s_n\left(x_{1:n}, y_{1:n}\right)\right)$$

for a fixed-dimensional sufficient statistic s_n, this becomes an elegant algorithm that, however, still has the degeneracy problem, since it uses p(x_{1:n} | y_{1:n}).

As dim(Z_n) = dim(X_n) + dim(θ), such methods are not recommended for high-dimensional θ, especially with vague priors.
Artificial Dynamics for θ. A solution consists of perturbing the location of the particles in a way that does not modify their distribution; i.e., if at time n

$$\theta^{(i)} \sim p\left(\theta \mid y_{1:n}\right),$$

then we would like a transition kernel M_n such that if $\theta'^{(i)} \sim M_n\left(\theta^{(i)}, \cdot\right)$ then $\theta'^{(i)} \sim p\left(\theta \mid y_{1:n}\right)$.

In Markov chain language, we want p(θ | y_{1:n}) to be invariant for M_n.
Using MCMC. There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g., the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as p(θ | y_{1:n}) would need to be known up to a normalizing constant, but p_θ(y_{1:n}) is unknown.

However, we can use a very simple Gibbs update: if $\left(\theta^{(i)}, X^{(i)}_{1:n}\right) \sim p\left(\theta, x_{1:n} \mid y_{1:n}\right)$, then sampling $\theta'^{(i)} \sim p\left(\theta \mid y_{1:n}, X^{(i)}_{1:n}\right)$ yields $\theta'^{(i)} \sim p\left(\theta \mid y_{1:n}\right)$. Indeed, note that

$$p\left(\theta \mid y_{1:n}\right) = \int p\left(\theta \mid x_{1:n}, y_{1:n}\right) p\left(x_{1:n} \mid y_{1:n}\right) dx_{1:n}$$
SMC with MCMC for Parameter Estimation. Given an approximation at time n-1,

$$\hat{p}\left(x_{1:n-1}, \theta \mid y_{1:n-1}\right) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\left(X^{(i)}_{1:n-1},\, \theta^{(i)}_{n-1}\right)}\left(x_{1:n-1}, \theta\right)$$

Sample $X^{(i)}_n \sim f_{\theta^{(i)}_{n-1}}\left(\cdot \mid X^{(i)}_{n-1}\right)$, set $\overline{X}^{(i)}_{1:n} = \left(X^{(i)}_{1:n-1}, X^{(i)}_n\right)$, and then approximate

$$\hat{p}\left(x_{1:n}, \theta \mid y_{1:n}\right) = \sum_{i=1}^{N} W^{(i)}_n\, \delta_{\left(\overline{X}^{(i)}_{1:n},\, \theta^{(i)}_{n-1}\right)}\left(x_{1:n}, \theta\right), \qquad W^{(i)}_n \propto g_{\theta^{(i)}_{n-1}}\left(y_n \mid X^{(i)}_n\right)$$

Resample $X^{(i)}_{1:n} \sim \hat{p}\left(x_{1:n} \mid y_{1:n}\right)$, then sample $\theta^{(i)}_n \sim p\left(\theta \mid y_{1:n}, X^{(i)}_{1:n}\right)$ to obtain

$$\hat{p}\left(x_{1:n}, \theta \mid y_{1:n}\right) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\left(X^{(i)}_{1:n},\, \theta^{(i)}_n\right)}\left(x_{1:n}, \theta\right)$$
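One propagate-weight-resample-refresh cycle of this scheme might look as follows. This is an illustrative sketch only: the AR(1) model with unit noises and the helper `sample_theta_given_path` (which stands for the draw from p(θ | x_{1:n}, y_{1:n}), conditionally conjugate in the next example) are assumptions, not the lecture's code.

```python
import math
import random

def smc_gibbs_step(particles, y, sample_theta_given_path, rng=random):
    """One SMC step with an MCMC (Gibbs) refresh of theta.

    particles: list of (path, theta) pairs approximating
    p(x_{1:n-1}, theta | y_{1:n-1}) for the illustrative model
    X_n = theta X_{n-1} + V_n, Y_n = X_n + W_n with unit noises.
    """
    N = len(particles)
    # 1. propagate each path with its own theta: X_n ~ f_theta(. | X_{n-1})
    prop = [(path + [th * path[-1] + rng.gauss(0.0, 1.0)], th)
            for path, th in particles]
    # 2. weight by g(y_n | x_n) and resample the (path, theta) pairs
    ws = [math.exp(-0.5 * (y - path[-1]) ** 2) for path, _ in prop]
    prop = rng.choices(prop, weights=ws, k=N)
    # 3. Gibbs-refresh theta conditional on each resampled path
    return [(path, sample_theta_given_path(path, rng)) for path, _ in prop]
```

Note that step 3 refreshes θ but, as the next slides stress, the refresh still conditions on the stored paths, so the path-degeneracy problem is not removed.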
SMC with MCMC for Parameter Estimation. Consider the following model:

$$X_{n+1} = \theta X_n + \sigma_v V_{n+1}, \qquad V_n \overset{iid}{\sim} \mathcal{N}(0, 1),$$
$$Y_n = X_n + \sigma_w W_n, \qquad W_n \overset{iid}{\sim} \mathcal{N}(0, 1), \qquad X_0 \sim \mathcal{N}(0, 1)$$

We set the prior on θ as θ ~ U(-1, 1). In this case

$$p\left(\theta \mid x_{1:n}, y_{1:n}\right) \propto \mathcal{N}\left(\theta;\, m_n, \sigma_n^2\right) 1_{(-1,1)}(\theta), \qquad \sigma_n^{-2} = \frac{1}{\sigma_v^2} \sum_{k=2}^{n} x_{k-1}^2, \qquad m_n = \frac{\sigma_n^2}{\sigma_v^2} \sum_{k=2}^{n} x_{k-1}\, x_k$$
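The conditionally conjugate update above can be sketched directly from the sufficient statistics. A hedged illustration (σ_v = 1 by default; the rejection loop handles the U(-1,1) truncation and is adequate unless the Gaussian puts almost no mass inside (-1,1)):

```python
import math
import random

def sample_theta(xs, sigma_v=1.0, rng=random):
    """Draw theta ~ p(theta | x_{0:n}, y_{1:n}) for the model
    X_{n+1} = theta X_n + sigma_v V_{n+1} with prior theta ~ U(-1, 1):
    a N(m_n, sigma_n^2) restricted to (-1, 1), sampled by rejection."""
    s_xx = sum(x * x for x in xs[:-1])                   # sum_k x_{k-1}^2
    s_xy = sum(a * b for a, b in zip(xs[:-1], xs[1:]))   # sum_k x_{k-1} x_k
    var_n = sigma_v ** 2 / s_xx                          # sigma_n^2
    m_n = s_xy / s_xx                                    # = (sigma_n^2/sigma_v^2) s_xy
    while True:
        t = rng.gauss(m_n, math.sqrt(var_n))
        if -1.0 < t < 1.0:
            return t
```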
SMC with MCMC for Parameter Estimation. We use SMC with the Gibbs step; the degeneracy problem remains. The SMC estimate of E[θ | y_{1:n}] as n increases is shown (true value vs SMC-MCMC estimate; from A. Doucet's lecture notes). The parameter converges to the wrong value.
SMC with MCMC for Parameter Estimation. The problem with this approach is that, although we move θ according to p(θ | x_{1:n}, y_{1:n}), we are still relying implicitly on the SMC approximation of the joint distribution p(x_{1:n} | y_{1:n}).

As n increases, the SMC approximation of p(x_{1:n} | y_{1:n}) deteriorates, and the algorithm cannot converge towards the right solution.

Figure: the sufficient statistic $\frac{1}{n} \sum_{k=1}^{n} \mathbb{E}\left[X_k^2 \mid y_{1:n}\right]$ computed exactly through the Kalman filter (blue) vs its SMC approximation (red), for a fixed value of θ.
Recursive MLE Parameter Estimation. The log-likelihood can be written as

$$\ell_n(\theta) = \log p_\theta\left(Y_{1:n}\right) = \sum_{k=1}^{n} \log p_\theta\left(Y_k \mid Y_{1:k-1}\right)$$

Here we compute

$$p_\theta\left(Y_k \mid Y_{1:k-1}\right) = \int g_\theta\left(Y_k \mid x_k\right) p_\theta\left(x_k \mid Y_{1:k-1}\right) dx_k$$

Under regularity assumptions, $\left(X_n, Y_n, p_\theta\left(x_n \mid Y_{1:n-1}\right)\right)$ is an inhomogeneous Markov chain which converges towards its invariant distribution $\mu_\theta$, so that

$$\lim_{n \to \infty} \frac{\ell_n(\theta)}{n} = \ell(\theta) = \int \log\left(\int g_\theta(y \mid x)\, \lambda(dx)\right) \mu_\theta(d\lambda, dy)$$
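For a linear Gaussian model, the prediction-error decomposition above can be evaluated exactly with the Kalman filter. The scalar sketch below is an illustration only (the N(m0, p0) prior on X_0 is an assumption); it returns ℓ_n(θ) = Σ_k log p_θ(Y_k | Y_{1:k-1}) for X_n = φ X_{n-1} + σ_v V_n, Y_n = X_n + σ_w W_n.

```python
import math

def kalman_loglik(ys, phi, sv, sw, m0=0.0, p0=1.0):
    """Prediction-error decomposition of log p_theta(Y_{1:n}): each term
    is the Gaussian predictive density p(Y_k | Y_{1:k-1}) produced by the
    Kalman filter recursion for the scalar linear Gaussian model."""
    m, p = m0, p0          # posterior mean/variance of X_{k-1}
    ll = 0.0
    for y in ys:
        mp = phi * m                   # predictive mean of X_k
        pp = phi * phi * p + sv * sv   # predictive variance of X_k
        s = pp + sw * sw               # variance of Y_k | Y_{1:k-1}
        ll += -0.5 * (math.log(2 * math.pi * s) + (y - mp) ** 2 / s)
        k = pp / s                     # Kalman gain
        m = mp + k * (y - mp)
        p = (1 - k) * pp
    return ll
```

This is the exact quantity against which the slides compare the SMC estimates for the linear Gaussian example.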
Robbins-Monro Algorithm for MLE. We can maximize ℓ(θ) by using the gradient and the Robbins-Monro algorithm:

$$\theta_{n+1} = \theta_n + \gamma_{n+1}\, \nabla \log p_{\theta_n}\left(Y_n \mid Y_{1:n-1}\right)$$

where

$$\nabla p_\theta\left(Y_n \mid Y_{1:n-1}\right) = \int g_\theta\left(Y_n \mid x_n\right) \nabla p_\theta\left(x_n \mid Y_{1:n-1}\right) dx_n + \int \nabla g_\theta\left(Y_n \mid x_n\right) p_\theta\left(x_n \mid Y_{1:n-1}\right) dx_n$$

We thus need to approximate $\nabla p_\theta\left(x_n \mid Y_{1:n-1}\right)$.
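The recursion above is stochastic gradient ascent with step sizes γ_n satisfying Σ γ_n = ∞ and Σ γ_n² < ∞. A minimal sketch on an illustrative quadratic objective (not the state-space setting), with γ_n = a/n:

```python
import random

def robbins_monro(noisy_grad, theta0, n_iters, a=1.0, rng=random):
    """Robbins-Monro stochastic approximation:
    theta_{n+1} = theta_n + gamma_{n+1} * H(theta_n, noise),
    where H is an unbiased estimate of the gradient of l(theta)
    and gamma_n = a / n."""
    theta = theta0
    for n in range(1, n_iters + 1):
        theta += (a / n) * noisy_grad(theta, rng)
    return theta

# Usage: maximize l(theta) = -(theta - 2)^2 from noisy gradient
# evaluations grad = -2 (theta - 2) + N(0, 1) noise.
random.seed(5)
theta_hat = robbins_monro(
    lambda t, rng: -2.0 * (t - 2.0) + rng.gauss(0.0, 1.0), 0.0, 50000)
```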
Importance Sampling Estimation of Sensitivity. The various proposed approximations are based on the identity

$$\nabla p_\theta\left(x_{1:n} \mid Y_{1:n}\right) = \frac{\nabla p_\theta\left(x_{1:n} \mid Y_{1:n}\right)}{p_\theta\left(x_{1:n} \mid Y_{1:n}\right)}\, p_\theta\left(x_{1:n} \mid Y_{1:n}\right) = \nabla \log p_\theta\left(x_{1:n} \mid Y_{1:n}\right) \cdot p_\theta\left(x_{1:n} \mid Y_{1:n}\right)$$

We thus can use an SMC approximation of the form

$$\nabla \hat{p}_\theta\left(x_{1:n} \mid Y_{1:n}\right) = \frac{1}{N} \sum_{i=1}^{N} \alpha^{(i)}_n\, \delta_{X^{(i)}_{1:n}}\left(x_{1:n}\right), \qquad \alpha^{(i)}_n = \nabla \log p_\theta\left(X^{(i)}_{1:n} \mid Y_{1:n}\right)$$

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of $p_\theta\left(x_{1:n} \mid Y_{1:n}\right)$.
Degeneracy of the SMC Algorithm. The score $\nabla p_\theta\left(Y_{1:n}\right)$ for linear Gaussian models: exact (blue) vs SMC approximation (red).
Degeneracy of the SMC Algorithm. The signed measure $\nabla p_\theta\left(x_n \mid Y_{1:n}\right)$ for linear Gaussian models: exact (red) vs SMC (blue/green). Positive and negative particles are mixed.
Marginal Importance Sampling for Sensitivity Estimation. An alternative identity is the following:

$$\nabla p_\theta\left(x_n \mid Y_{1:n}\right) = \nabla \log p_\theta\left(x_n \mid Y_{1:n}\right) \cdot p_\theta\left(x_n \mid Y_{1:n}\right)$$

It suggests using an SMC approximation of the form

$$\nabla \hat{p}_\theta\left(x_n \mid Y_{1:n}\right) = \frac{1}{N} \sum_{i=1}^{N} \beta^{(i)}_n\, \delta_{X^{(i)}_n}\left(x_n\right), \qquad \beta^{(i)}_n = \nabla \log p_\theta\left(X^{(i)}_n \mid Y_{1:n}\right) = \frac{\nabla p_\theta\left(X^{(i)}_n \mid Y_{1:n}\right)}{p_\theta\left(X^{(i)}_n \mid Y_{1:n}\right)}$$

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N²), as

$$p_\theta\left(X^{(i)}_n \mid Y_{1:n-1}\right) = \int f_\theta\left(X^{(i)}_n \mid x_{n-1}\right) p_\theta\left(x_{n-1} \mid Y_{1:n-1}\right) dx_{n-1} \approx \frac{1}{N} \sum_{j=1}^{N} f_\theta\left(X^{(i)}_n \mid X^{(j)}_{n-1}\right)$$
Marginal Importance Sampling for Sensitivity Estimation. The optimal filter satisfies

$$p_\theta\left(x_n \mid Y_{1:n}\right) = \frac{\xi_{\theta,n}\left(x_n\right)}{\int \xi_{\theta,n}\left(x_n\right) dx_n}, \qquad \xi_{\theta,n}\left(x_n\right) = \int g_\theta\left(Y_n \mid x_n\right) f_\theta\left(x_n \mid x_{n-1}\right) p_\theta\left(x_{n-1} \mid Y_{1:n-1}\right) dx_{n-1}$$

The derivatives satisfy

$$\nabla p_\theta\left(x_n \mid Y_{1:n}\right) = \frac{\nabla \xi_{\theta,n}\left(x_n\right)}{\int \xi_{\theta,n}\left(x_n\right) dx_n} - p_\theta\left(x_n \mid Y_{1:n}\right) \frac{\int \nabla \xi_{\theta,n}\left(x_n\right) dx_n}{\int \xi_{\theta,n}\left(x_n\right) dx_n}$$

This way we obtain a simple recursion for $\nabla p_\theta\left(x_n \mid Y_{1:n}\right)$ as a function of $\nabla p_\theta\left(x_{n-1} \mid Y_{1:n-1}\right)$ and $p_\theta\left(x_{n-1} \mid Y_{1:n-1}\right)$.
SMC Approximation of the Sensitivity. Sample

$$X^{(i)}_n \sim \eta_n\left(\cdot\right), \qquad \eta_n\left(x_n\right) = \sum_{j=1}^{N} W^{(j)}_{n-1}\, q_\theta\left(x_n \mid Y_n, X^{(j)}_{n-1}\right)$$

Compute

$$\alpha^{(i)}_n = \frac{\xi_{\theta,n}\left(X^{(i)}_n\right)}{\eta_n\left(X^{(i)}_n\right)}, \qquad \rho^{(i)}_n = \frac{\nabla \xi_{\theta,n}\left(X^{(i)}_n\right)}{\eta_n\left(X^{(i)}_n\right)}, \qquad W^{(i)}_n = \frac{\alpha^{(i)}_n}{\sum_{j=1}^{N} \alpha^{(j)}_n}, \qquad \beta^{(i)}_n = \frac{\rho^{(i)}_n}{\alpha^{(i)}_n} - \frac{\sum_{j=1}^{N} \rho^{(j)}_n}{\sum_{j=1}^{N} \alpha^{(j)}_n}$$

We have

$$\hat{p}_\theta\left(x_n \mid Y_{1:n}\right) = \sum_{i=1}^{N} W^{(i)}_n\, \delta_{X^{(i)}_n}\left(x_n\right), \qquad \nabla \hat{p}_\theta\left(x_n \mid Y_{1:n}\right) = \sum_{i=1}^{N} \beta^{(i)}_n\, W^{(i)}_n\, \delta_{X^{(i)}_n}\left(x_n\right)$$
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Consider the following model
We have
We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)
36
1n n v n
n n w n
X X VY X W
φ σσminus= +
= +
v wθ φ σ σ=
1log ( )n np xθnabla |Y
1( )npθnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the
marginal (bottom)
37
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Consider the following model
We have We use SMC with N=1000 for batch and on-line estimation
For Batch Estimation
For online estimation
38
1
exp2
n n v n
nn n
X X VXY W
φ σ
β
minus= +
=
vθ φ σ β=
11 1log ( )nn n n npθθ θ γ
minusminus= + nabla Y
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar
81-85
39
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates
converge towards the true values
40
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Given data y1n inference relies on
We have seen that SMC are rather inefficient to sample from
so we look here at an MCMC approach For a given parameter value θ SMC can estimate both
It will be useful if we can use SMC within MCMC to sample from
41
( ) ( ) ( )
( ) ( ) ( )
1 1 1 1 1
1 1
| | |
|
n n n n n
n n
p p pwherep p p
θ
θ
θ θ
θ θ
=
=
x y y x y
y y
( ) ( )1 1 1|n n np and pθ θx y y
( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R
Statist SocB (2010) 72 Part 3 pp 269ndash342
( )1| np θ y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from
and
However it is impossible to sample from
We can sample from instead but convergence will be slow
Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability
42
( )1 1| n np θ x y( )1 1| n np θx y
( )1 1| n np θx y
( )1 1 1 1| k n k k np x θ minus +y x x
( )1 1| n np θx y ( )1 1 1~ |n n nqX x y
( ) ( )( ) ( )
1 1 1 1
1 1 1 1
| |
| m n
|i 1 n n n n
n n n n
p q
p p
θ
θ
X y X y
X y X y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution
We know that the SMC approximation degrades an
n increases but we also have under mixing assumptions
Thus things degrade linearly rather than exponentially
These results are useful for small C
43
( )1 1| n np θx y
( )1 1| n np θx y
( )( )1 1 1( ) | i
n n n TV
Cnaw pN
θminus leX x yL
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method
(CBMC) in Molecular simulation There are however significant differences
CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on
44
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
Bayesian Scientific Computing Spring 2013 (N Zabaras)
MCMC We use as proposal distribution and store
the estimate
It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables
The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate
Finally accept with probability Otherwise stay where you are
45
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
( )1 1 1~ | n n nNp θX x y
( )1 |nNp θy
1nX
( )1 |nNp θy1nX ( ) ( )
1
1
m|
in 1|nN
nN
pp
θθ
yy
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Consider the following model
We take the same prior proposal distribution for both CBMC and
SMC Acceptance rates for CBMC (left) and SMC (right) are shown
46
( )
( ) ( )
11 2
12
1
1 25 8cos(12 ) ~ 0152 1
~ 0001 ~ 052
kk k k k
k
kn k k
XX X k V VX
XY W W X
minusminus
minus
= + + ++
= +
N
N N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example We now consider the case when both variances are
unknown and given inverse Gamma (conditionally conjugate) vague priors
We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as
importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step
of invariance distribution with the same proposal
In both cases we update the parameters according to Both algorithms are run with the same computational effort
47
2 2v wσ σ
( )11000 11000 |p θx y
( )1 1 k k k kp x x x yminus +|
( )1500 1500| p θ x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example MH one at a time missed the mode and estimates
If X1n and θ are very correlated the MCMC algorithm is very
inefficient We will next discuss the Marginal Metropolis Hastings
21500
21500
2
| 137 ( )
| 99 ( )
10
v
v
v
MH
PF MCMC
σ
σ
σ
asymp asymp minus =
y
y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with
kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)
It can be easily shown that and
under weak assumptions
Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)
1 Given 119909119905minus1 generate 2 Compute
3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Step 1
Step 2
Step 3 With probability
( ) ( )
( ) ( )
1 1( 1)
1 1 1
( 1) ( 1)
~ | ( 1)
|
n ni
n n n
Given i i p
Sample q i
Run an SMC to obtain p p
θ
θθ
θ
θ θ θ
minusminus minus
minus
X y
y x y
( )1 1 1 ~ |n n nSample pθX x y
( ) ( ) ( ) ( ) ( ) ( )
1
1( 1)
( 1) |
( 1) | ( 1min 1
)n
ni
p p q i
p p i q iθ
θ
θ θ θ
θ θ θminus
minus minus minus
y
y
( ) ( ) ( ) ( )
1 1 1 1( )
1 1 1 1( ) ( 1)
( ) ( )
( ) ( ) ( 1) ( 1)
n n n ni
n n n ni i
Set i X i p X p
Otherwise i X i p i X i p
θ θ
θ θ
θ θ
θ θ minus
=
= minus minus
y y
y y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an
approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)
When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists
The higher N the better the performance of the algorithm N scales roughly linearly with n
Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)
Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model
with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation Assume that our state model is defined with some unknown static
parameter θ with some prior p(θ)
Given data y1n inference now is based on
We can use standard SMC but on the extended space Zn=(Xnθn)
Note that θ is a static parameter ndashdoes not involve with n
19
( ) ( )1 1 1 1~ () | ~ |n n n n nX and X X x f x xθmicro minus minus minus=
( ) ( )| ~ |n n n n nY X x g y xθ=
( ) ( ) ( )
( ) ( ) ( )
1 1 1 1 1
1 1
| | |
|
n n n n n
n n
p p pwherep p p
θ
θ
θ θ
θ θ
=
prop
x y y x y
y y
( ) ( ) ( ) ( ) ( )11 1| | | |
nn n n n n n n n nf z z f x x g y z g y xθ θ θδ θminusminus minus= =
Online Bayesian Parameter Estimation

For fixed θ, using our earlier error estimates,

$\operatorname{Var}\big[\log \hat{p}_\theta(y_{1:n})\big] \le \frac{Cn}{N}$

In a Bayesian context the problem is even more severe, as

$p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$

The exponential stability assumption cannot hold, as θ_n = θ_1.

To mitigate this problem, introduce MCMC steps on θ.

C. Andrieu, N. De Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
P. Fearnhead, MCMC, sufficient statistics and particle filters, JCGS, 2002.
W. Gilks and C. Berzuini, Following a moving target: MC inference for dynamic Bayesian Models, JRSS B, 2001.

20
Online Bayesian Parameter Estimation

When

$p(\theta \mid y_{1:n}, x_{1:n}) = p\big(\theta \mid s_n(x_{1:n}, y_{1:n})\big)$

for a fixed-dimensional sufficient statistic $s_n(x_{1:n}, y_{1:n})$, this becomes an elegant algorithm that, however, still has the degeneracy problem, since it uses $p(x_{1:n} \mid y_{1:n})$.

As dim(Z_n) = dim(X_n) + dim(θ), such methods are not recommended for high-dimensional θ, especially with vague priors.

21
Artificial Dynamics for θ

A solution consists of perturbing the location of the particles in a way that does not modify their distribution; i.e., if at time n

$\theta^{(i)} \sim p(\theta \mid y_{1:n}),$

then we would like a transition kernel $M_n$ such that if $\bar{\theta}^{(i)} \sim M_n(\theta^{(i)}, \cdot)$, then $\bar{\theta}^{(i)} \sim p(\theta \mid y_{1:n})$.

In Markov chain language, we want $p(\theta \mid y_{1:n})$ to be invariant for $M_n(\theta, \cdot)$.

22
Using MCMC

There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo (MCMC), e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as $p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$ would need to be known up to a normalizing constant, but $p_\theta(y_{1:n})$ is unknown.

However, we can use a very simple Gibbs update: if $X_{1:n}^{(i)} \sim p(x_{1:n} \mid y_{1:n})$ and $\bar{\theta}^{(i)} \sim p\big(\theta \mid y_{1:n}, X_{1:n}^{(i)}\big)$, then $\bar{\theta}^{(i)} \sim p(\theta \mid y_{1:n})$.

Indeed, note that

$p(\theta \mid y_{1:n}) = \int p(\theta \mid x_{1:n}, y_{1:n})\, p(x_{1:n} \mid y_{1:n})\, dx_{1:n}$

23
SMC with MCMC for Parameter Estimation

Given an approximation at time n-1,

$\hat{p}(\theta, x_{1:n-1} \mid y_{1:n-1}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\big(\theta_{n-1}^{(i)},\, X_{1:n-1}^{(i)}\big)}(\theta, x_{1:n-1})$

Sample $X_n^{(i)} \sim f_{\theta_{n-1}^{(i)}}\big(\cdot \mid X_{n-1}^{(i)}\big)$, set $\bar{X}_{1:n}^{(i)} = \big(X_{1:n-1}^{(i)}, X_n^{(i)}\big)$, and then approximate

$\hat{p}(\theta, x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{\big(\theta_{n-1}^{(i)},\, \bar{X}_{1:n}^{(i)}\big)}(\theta, x_{1:n}), \qquad W_n^{(i)} \propto g_{\theta_{n-1}^{(i)}}\big(y_n \mid X_n^{(i)}\big)$

Resample $X_{1:n}^{(i)} \sim \hat{p}(x_{1:n} \mid y_{1:n})$, then sample $\theta_n^{(i)} \sim p\big(\theta \mid y_{1:n}, X_{1:n}^{(i)}\big)$ to obtain

$\hat{p}(\theta, x_{1:n} \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\big(\theta_n^{(i)},\, X_{1:n}^{(i)}\big)}(\theta, x_{1:n})$

24
SMC with MCMC for Parameter Estimation

Consider the following model:

$X_{n+1} = \theta X_n + \sigma_v V_{n+1}, \qquad V_n \overset{iid}{\sim} \mathcal{N}(0, 1)$
$Y_n = X_n + \sigma_w W_n, \qquad W_n \overset{iid}{\sim} \mathcal{N}(0, 1), \qquad X_0 \sim \mathcal{N}(0, \sigma^2)$

We set the prior on θ as $\theta \sim \mathcal{U}(-1, 1)$.

In this case,

$p(\theta \mid x_{1:n}, y_{1:n}) \propto \mathcal{N}(\theta;\, m_n, \sigma_n^2)\, 1_{(-1,1)}(\theta),$
$\sigma_n^{-2} = \sigma_v^{-2} \sum_{k=1}^{n-1} x_k^2, \qquad m_n = \sigma_n^2\, \sigma_v^{-2} \sum_{k=1}^{n-1} x_k x_{k+1}$

25
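As a rough sketch of the algorithm on this model, each particle can carry the two sufficient statistics $\sum_k x_k x_{k+1}$ and $\sum_k x_k^2$, so the Gibbs step draws θ from the truncated normal above. The noise scales, particle number, data length, and true parameter value below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def trunc_normal(m, s, lo=-1.0, hi=1.0, max_tries=1000):
    """Rejection sampler for N(m, s^2) restricted to (lo, hi)."""
    for _ in range(max_tries):
        t = rng.normal(m, s)
        if lo < t < hi:
            return t
    return np.clip(m, lo + 1e-6, hi - 1e-6)  # fallback for extreme cases

def smc_gibbs(y, N=200, sv=1.0, sw=1.0):
    """SMC with a Gibbs move on theta for X_{n+1} = theta X_n + sv V_n,
    Y_n = X_n + sw W_n, theta ~ U(-1,1); sv = sw = 1 is an assumed setting."""
    theta = rng.uniform(-1.0, 1.0, N)
    x = rng.normal(0.0, 1.0, N)
    s_xy = np.zeros(N)   # running sum x_k x_{k+1} per particle
    s_xx = np.zeros(N)   # running sum x_k^2 per particle
    for yn in y[1:]:
        x_new = theta * x + sv * rng.normal(0.0, 1.0, N)
        w = np.exp(-0.5 * ((yn - x_new) / sw) ** 2)
        w /= w.sum()
        idx = rng.choice(N, N, p=w)                 # resample
        s_xy = s_xy[idx] + x[idx] * x_new[idx]
        s_xx = s_xx[idx] + x[idx] ** 2
        x = x_new[idx]
        # Gibbs step: theta^(i) ~ N(m_n, sigma_n^2) truncated to (-1,1)
        var = sv ** 2 / np.maximum(s_xx, 1e-12)
        mean = s_xy / np.maximum(s_xx, 1e-12)
        theta = np.array([trunc_normal(m, np.sqrt(v)) for m, v in zip(mean, var)])
    return theta

# simulate data from an assumed true value theta = 0.5
true_theta = 0.5
xs = [rng.normal()]
for _ in range(60):
    xs.append(true_theta * xs[-1] + rng.normal())
ys = np.array(xs) + rng.normal(size=len(xs))
theta_post = smc_gibbs(ys)
print(theta_post.mean())
```

Even though each Gibbs draw is exact, the statistics $s_{xy}, s_{xx}$ are computed along resampled particle paths, so they inherit the path-degeneracy illustrated on the next slide.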
SMC with MCMC for Parameter Estimation

We use SMC with the Gibbs step. The degeneracy problem remains.

[Figure: SMC estimate of $E[\theta \mid y_{1:n}]$ as n increases, SMC (MCMC) vs true value (from A. Doucet's lecture notes). The parameter converges to the wrong value.]

26
SMC with MCMC for Parameter Estimation

The problem with this approach is that although we move θ according to $p(\theta \mid x_{1:n}, y_{1:n})$, we are still relying implicitly on the SMC approximation of the joint distribution $p(x_{1:n} \mid y_{1:n})$.

As n increases, the SMC approximation of $p(x_{1:n} \mid y_{1:n})$ deteriorates and the algorithm cannot converge towards the right solution.

[Figure: the sufficient statistic $\frac{1}{n} \sum_{k=1}^{n} E\big[X_k^2 \mid y_{1:n}\big]$ computed exactly through the Kalman filter (blue) vs the SMC approximation (red), for a fixed value of θ.]

27
Recursive MLE Parameter Estimation

The log-likelihood can be written as

$\ell_n(\theta) = \log p_\theta(y_{1:n}) = \sum_{k=1}^{n} \log p_\theta(y_k \mid y_{1:k-1})$

Here we compute

$p_\theta(y_k \mid y_{1:k-1}) = \int g_\theta(y_k \mid x_k)\, p_\theta(x_k \mid y_{1:k-1})\, dx_k$

Under regularity assumptions, $\big(X_n, Y_n, p_\theta(x_n \mid Y_{1:n-1})\big)$ is an inhomogeneous Markov chain which converges towards its invariant distribution $\lambda_\theta$, and

$\lim_{n \to \infty} \frac{\ell_n(\theta)}{n} = \ell(\theta) = \iint \log\left(\int g_\theta(y \mid x)\, \mu(dx)\right) \lambda_\theta(d\mu, dy)$

28
Robbins-Monro Algorithm for MLE

We can maximize ℓ(θ) by using the gradient and the Robbins-Monro algorithm,

$\theta_n = \theta_{n-1} + \gamma_n \nabla_\theta \log p_{\theta_{n-1}}(y_n \mid y_{1:n-1})$

where

$\nabla p_\theta(y_n \mid y_{1:n-1}) = \int \nabla g_\theta(y_n \mid x_n)\, p_\theta(x_n \mid y_{1:n-1})\, dx_n + \int g_\theta(y_n \mid x_n)\, \nabla p_\theta(x_n \mid y_{1:n-1})\, dx_n$

We thus need to approximate $\nabla p_\theta(x_n \mid y_{1:n-1})$.

29
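A minimal Robbins-Monro iteration on a toy concave objective, with the noisy SMC gradient estimate replaced by an exact gradient plus Gaussian noise; the objective, step-size schedule, and noise level are all assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Robbins-Monro stochastic gradient ascent on the toy objective
# l(theta) = -(theta - 2)^2 / 2, whose gradient (2 - theta) is observed
# only up to zero-mean noise -- a stand-in for the noisy SMC estimate
# of grad log p_theta(y_n | y_{1:n-1}).
theta = 0.0
for n in range(1, 5001):
    gamma = 1.0 / n ** 0.7           # sum gamma_n = inf, sum gamma_n^2 < inf
    noisy_grad = (2.0 - theta) + rng.normal(0.0, 1.0)
    theta += gamma * noisy_grad
print(theta)   # close to the maximizer 2
```

The decaying steps average out the gradient noise, which is why the same recursion can be driven by a particle-filter gradient estimate.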
Importance Sampling Estimation of Sensitivity

The various proposed approximations are based on the identity

$\nabla p_\theta(x_{1:n} \mid y_{1:n}) = \frac{\nabla p_\theta(x_{1:n} \mid y_{1:n})}{p_\theta(x_{1:n} \mid y_{1:n})}\, p_\theta(x_{1:n} \mid y_{1:n}) = \nabla \log p_\theta(x_{1:n} \mid y_{1:n})\; p_\theta(x_{1:n} \mid y_{1:n})$

We can thus use an SMC approximation of the form

$\widehat{\nabla p}_\theta(x_{1:n} \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \alpha_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad \alpha_n^{(i)} = \nabla \log p_\theta\big(X_{1:n}^{(i)} \mid y_{1:n}\big)$

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of $p_\theta(x_{1:n} \mid y_{1:n})$.

30
Degeneracy of the SMC Algorithm

[Figure: the score $\nabla p_\theta(y_{1:n})$ for linear Gaussian models: exact (blue) vs SMC approximation (red).]

31

Degeneracy of the SMC Algorithm

[Figure: the signed measure $\nabla p_\theta(x_n \mid y_{1:n})$ for linear Gaussian models: exact (red) vs SMC (blue/green). Positive and negative particles are mixed.]

32
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:

$\nabla p_\theta(x_n \mid y_{1:n}) = \nabla \log p_\theta(x_n \mid y_{1:n})\; p_\theta(x_n \mid y_{1:n})$

It suggests using an SMC approximation of the form

$\widehat{\nabla p}_\theta(x_n \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \beta_n^{(i)} = \nabla \log \hat{p}_\theta\big(X_n^{(i)} \mid y_{1:n}\big) = \frac{\nabla \hat{p}_\theta\big(X_n^{(i)} \mid y_{1:n}\big)}{\hat{p}_\theta\big(X_n^{(i)} \mid y_{1:n}\big)}$

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N²), as

$\hat{p}_\theta\big(X_n^{(i)} \mid y_{1:n-1}\big) \propto \int f_\theta\big(X_n^{(i)} \mid x_{n-1}\big)\, \hat{p}_\theta(x_{n-1} \mid y_{1:n-1})\, dx_{n-1} = \frac{1}{N} \sum_{j=1}^{N} f_\theta\big(X_n^{(i)} \mid X_{n-1}^{(j)}\big)$

33
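The O(N²) cost is simply the evaluation of this N-term mixture at all N particle locations. A sketch with an assumed AR(1) Gaussian transition (parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def gauss(x, m, s):
    """Gaussian density N(x; m, s^2)."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# O(N^2) evaluation of the marginal predictive density at every particle:
# p_hat(X_n^(i)) = (1/N) sum_j f_theta(X_n^(i) | X_{n-1}^(j)),
# with an assumed transition f_theta(x | x') = N(x; phi x', sigma^2).
phi, sigma, N = 0.8, 1.0, 300
x_prev = rng.normal(size=N)                            # particles at time n-1
x_curr = phi * x_prev + sigma * rng.normal(size=N)     # particles at time n

# broadcasting builds the full N x N kernel matrix -- the O(N^2) cost
K = gauss(x_curr[:, None], phi * x_prev[None, :], sigma)
p_hat = K.mean(axis=1)                                 # one density per particle
print(p_hat.shape)
```

Each of the N rows of `K` is one particle's N-term mixture, which makes the quadratic scaling in N explicit.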
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

$p_\theta(x_n \mid y_{1:n}) = \frac{\xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x_n)\, dx_n}, \qquad \xi_{\theta,n}(x_n) = g_\theta(y_n \mid x_n) \int f_\theta(x_n \mid x_{n-1})\, p_\theta(x_{n-1} \mid y_{1:n-1})\, dx_{n-1}$

The derivatives satisfy

$\nabla p_\theta(x_n \mid y_{1:n}) = \frac{\nabla \xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x_n)\, dx_n} - p_\theta(x_n \mid y_{1:n})\, \frac{\int \nabla \xi_{\theta,n}(x_n)\, dx_n}{\int \xi_{\theta,n}(x_n)\, dx_n}$

This way we obtain a simple recursion for $\nabla p_\theta(x_n \mid y_{1:n})$ as a function of $\nabla p_\theta(x_{n-1} \mid y_{1:n-1})$ and $p_\theta(x_{n-1} \mid y_{1:n-1})$.

34
SMC Approximation of the Sensitivity

Sample

$X_n^{(i)} \sim \eta_n(\cdot), \qquad \eta_n(x_n) = \sum_{j=1}^{N} W_{n-1}^{(j)}\, q_\theta\big(x_n \mid Y_n, X_{n-1}^{(j)}\big)$

Compute

$\alpha_n^{(i)} = \frac{\xi_{\theta,n}\big(X_n^{(i)}\big)}{\eta_n\big(X_n^{(i)}\big)}, \qquad \rho_n^{(i)} = \frac{\nabla \xi_{\theta,n}\big(X_n^{(i)}\big)}{\eta_n\big(X_n^{(i)}\big)},$
$W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j} \alpha_n^{(j)}}, \qquad \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j} \alpha_n^{(j)}} - W_n^{(i)}\, \frac{\sum_{j} \rho_n^{(j)}}{\sum_{j} \alpha_n^{(j)}}$

We then have

$\hat{p}_\theta(x_n \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \widehat{\nabla p}_\theta(x_n \mid y_{1:n}) = \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n)$

35
Example: Linear Gaussian Model

Consider the following model:

$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = X_n + \sigma_w W_n, \qquad \theta = (\phi, \sigma_v, \sigma_w)$

We compare the SMC estimates of $\nabla \log p_\theta(x_n \mid y_{1:n})$ and $\nabla p_\theta(y_{1:n})$ for N = 1000 to their exact values given by the Kalman filter derivative (exact in cyan vs SMC in black).

36
Example: Linear Gaussian Model

[Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).]

37
Example: Stochastic Volatility Model

Consider the following model:

$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = \beta \exp(X_n / 2)\, W_n, \qquad \theta = (\phi, \sigma_v, \beta)$

We use SMC with N = 1000 for batch and online estimation.

For batch estimation:

$\theta_k = \theta_{k-1} + \gamma_k \nabla \log p_{\theta_{k-1}}(y_{1:n})$

For online estimation:

$\theta_n = \theta_{n-1} + \gamma_n \nabla \log p_{\theta_{n-1}}(y_n \mid y_{1:n-1})$

38
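For experimentation, the stochastic volatility model is easy to simulate; the parameter values below are assumed for illustration and are not the ones behind the lecture's plots:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate the stochastic volatility model of the slide,
#   X_n = phi X_{n-1} + sigma_v V_n,  Y_n = beta exp(X_n / 2) W_n,
# with assumed parameter values.
phi, sigma_v, beta = 0.95, 0.3, 0.6
n = 1000
x = np.empty(n)
x[0] = rng.normal(0.0, sigma_v / np.sqrt(1 - phi ** 2))  # stationary start
for k in range(1, n):
    x[k] = phi * x[k - 1] + sigma_v * rng.normal()
y = beta * np.exp(x / 2) * rng.normal(size=n)            # observations
print(y[:3])
```

The latent log-volatility x is a stationary AR(1) process, so starting it from its stationary distribution avoids an initial transient in the simulated returns.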
Example: Stochastic Volatility Model

[Figure: batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.]

39

Example: Stochastic Volatility Model

[Figure: recursive parameter estimation for the SV model. The estimates converge towards the true values.]

40
SMC with MCMC for Parameter Estimation

Given data $y_{1:n}$, inference relies on

$p(\theta, x_{1:n} \mid y_{1:n}) = p_\theta(x_{1:n} \mid y_{1:n})\, p(\theta \mid y_{1:n}), \quad \text{where } p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$

We have seen that SMC is rather inefficient for sampling from $p(\theta \mid y_{1:n})$, so we look here at an MCMC approach. For a given parameter value θ, SMC can estimate both $p_\theta(x_{1:n} \mid y_{1:n})$ and $p_\theta(y_{1:n})$.

It will be useful if we can use SMC within MCMC to sample from $p(\theta, x_{1:n} \mid y_{1:n})$.

C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269-342.

41
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from $p(\theta \mid y_{1:n}, x_{1:n})$ and $p(x_{1:n} \mid y_{1:n}, \theta)$.

However, it is impossible to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$ directly. We can sample one component at a time from $p(x_k \mid y_{1:n}, x_{1:k-1}, x_{k+1:n}, \theta)$ instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$; i.e., sample $X_{1:n}^{*} \sim q(x_{1:n} \mid y_{1:n})$ and accept with probability

$\min\left\{1,\; \frac{p_\theta\big(X_{1:n}^{*} \mid y_{1:n}\big)\, q\big(X_{1:n} \mid y_{1:n}\big)}{p_\theta\big(X_{1:n} \mid y_{1:n}\big)\, q\big(X_{1:n}^{*} \mid y_{1:n}\big)}\right\}$

42
Gibbs Sampling Strategy

We will use the output of an SMC method approximating $p(x_{1:n} \mid y_{1:n}, \theta)$ as a proposal distribution.

We know that the SMC approximation degrades as n increases, but under mixing assumptions we also have

$\left\| \mathcal{L}\big(X_{1:n}^{(i)}\big) - p_\theta(x_{1:n} \mid y_{1:n}) \right\|_{TV} \le \frac{Cn}{N}$

Thus things degrade linearly rather than exponentially. These results are useful for small C.

43
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences.

CBMC looks like an SMC method, but at each step only one particle is selected, which gives N offspring. Thus if you make the wrong selection, you cannot recover later on.

D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study the structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.

44
MCMC

We use $\hat{p}_N(x_{1:n} \mid y_{1:n}, \theta)$ as the proposal distribution and store the estimate $\hat{p}_N(y_{1:n} \mid \theta)$.

It is impossible to compute analytically the unconditional law of a particle $X_{1:n}$, as it would require integrating out (N-1)n variables.

The solution, inspired by CBMC, is as follows. For the current state $X_{1:n}$ of the Markov chain, run a virtual SMC method with only N-1 free particles: ensure that at each time step k the current state $X_k$ is deterministically selected, and compute the associated estimate $\hat{p}_N(y_{1:n} \mid \theta)$.

Finally, accept the proposed $X_{1:n}^{*}$ with probability

$\min\left\{1,\; \frac{\hat{p}_N^{*}(y_{1:n} \mid \theta)}{\hat{p}_N(y_{1:n} \mid \theta)}\right\}$

Otherwise stay where you are.

D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study the structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.

45
Example

Consider the following model:

$X_k = \frac{X_{k-1}}{2} + \frac{25 X_{k-1}}{1 + X_{k-1}^2} + 8 \cos(1.2 k) + V_k, \qquad V_k \sim \mathcal{N}(0, 10)$
$Y_k = \frac{X_k^2}{20} + W_k, \qquad W_k \sim \mathcal{N}(0, 1), \qquad X_1 \sim \mathcal{N}(0, 5)$

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.

46
Example

We now consider the case where both variances $\sigma_v^2, \sigma_w^2$ are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from $p(\theta, x_{1:1000} \mid y_{1:1000})$ using two strategies:

The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution. Average acceptance rate 0.43.

An algorithm updating the components one at a time using an MH step of invariant distribution $p(x_k \mid y_k, x_{k-1}, x_{k+1})$ with the same proposal.

In both cases we update the parameters according to $p(\theta \mid x_{1:500}, y_{1:500})$. Both algorithms are run with the same computational effort.

47
Example

MH one-at-a-time misses the mode, with estimates

$E\big[\sigma_v^2 \mid y_{1:500}\big] \approx 13.7 \;\; (\text{MH}), \qquad E\big[\sigma_v^2 \mid y_{1:500}\big] \approx 9.9 \;\; (\text{PF-MCMC}), \qquad \sigma_v^2 = 10 \;\; (\text{true value})$

If $X_{1:n}$ and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the marginal Metropolis-Hastings algorithm.
Review of Metropolis-Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density $q(x, y) = q(y \mid x)$, and the target distribution is $\pi(x)$.

It can be easily shown that π is invariant under the transition kernel,

$\pi(x') = \int \pi(x)\, K(x, x')\, dx,$

and that $X^{(i)} \sim \pi$ as $i \to \infty$, under weak assumptions.

Algorithm (Generic Metropolis-Hastings Sampler)
Initialization: choose an arbitrary starting value $x^{0}$.
Iteration t (t ≥ 1):
1. Given $x^{t-1}$, generate $x^{*} \sim q(x^{t-1}, \cdot)$.
2. Compute
$\rho(x^{t-1}, x^{*}) = \min\left\{1,\; \frac{\pi(x^{*})\, q(x^{*}, x^{t-1})}{\pi(x^{t-1})\, q(x^{t-1}, x^{*})}\right\}$
3. With probability $\rho(x^{t-1}, x^{*})$, accept $x^{*}$ and set $x^{t} = x^{*}$; otherwise reject $x^{*}$ and set $x^{t} = x^{t-1}$.
Marginal Metropolis-Hastings Algorithm

Consider the target

$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n})$

We use the following proposal distribution:

$q\big((\theta^{*}, x_{1:n}^{*}) \mid (\theta, x_{1:n})\big) = q(\theta^{*} \mid \theta)\, p_{\theta^{*}}\big(x_{1:n}^{*} \mid y_{1:n}\big)$

Then the acceptance probability becomes

$\min\left\{1,\; \frac{p\big(\theta^{*}, x_{1:n}^{*} \mid y_{1:n}\big)\, q\big((\theta, x_{1:n}) \mid (\theta^{*}, x_{1:n}^{*})\big)}{p\big(\theta, x_{1:n} \mid y_{1:n}\big)\, q\big((\theta^{*}, x_{1:n}^{*}) \mid (\theta, x_{1:n})\big)}\right\} = \min\left\{1,\; \frac{p_{\theta^{*}}(y_{1:n})\, p(\theta^{*})\, q(\theta \mid \theta^{*})}{p_\theta(y_{1:n})\, p(\theta)\, q(\theta^{*} \mid \theta)}\right\}$

We will use SMC approximations to compute $p_\theta(y_{1:n})$ and $p_\theta(x_{1:n} \mid y_{1:n})$.
Marginal Metropolis-Hastings Algorithm

Step 1: Given $\theta(i-1)$, $\hat{p}_{\theta(i-1)}(y_{1:n})$ and $X_{1:n}(i-1)$, sample $\theta^{*} \sim q(\cdot \mid \theta(i-1))$ and run an SMC algorithm to obtain $\hat{p}_{\theta^{*}}(x_{1:n} \mid y_{1:n})$ and $\hat{p}_{\theta^{*}}(y_{1:n})$.

Step 2: Sample $X_{1:n}^{*} \sim \hat{p}_{\theta^{*}}(x_{1:n} \mid y_{1:n})$.

Step 3: With probability

$\min\left\{1,\; \frac{\hat{p}_{\theta^{*}}(y_{1:n})\, p(\theta^{*})\, q(\theta(i-1) \mid \theta^{*})}{\hat{p}_{\theta(i-1)}(y_{1:n})\, p(\theta(i-1))\, q(\theta^{*} \mid \theta(i-1))}\right\},$

set $\theta(i) = \theta^{*}$, $X_{1:n}(i) = X_{1:n}^{*}$ and $\hat{p}_{\theta(i)}(y_{1:n}) = \hat{p}_{\theta^{*}}(y_{1:n})$; otherwise set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$ and $\hat{p}_{\theta(i)}(y_{1:n}) = \hat{p}_{\theta(i-1)}(y_{1:n})$.
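Steps 1-3 can be sketched end to end for a toy model: a bootstrap particle filter supplies the likelihood estimate $\hat{p}_\theta(y_{1:n})$, and a random-walk marginal MH chain runs on top of it. The linear Gaussian model, uniform prior, proposal scale, particle numbers, and chain length below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)

def log_lik_pf(theta, y, N=100):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n}) for the
    assumed model X_n = theta X_{n-1} + V_n, Y_n = X_n + W_n (unit noise)."""
    x = rng.normal(size=N)
    ll = 0.0
    for yn in y:
        w = np.exp(-0.5 * (yn - x) ** 2) / np.sqrt(2 * np.pi)
        ll += np.log(w.mean())                       # p_hat(y_n | y_{1:n-1})
        x = x[rng.choice(N, N, p=w / w.sum())]       # resample
        x = theta * x + rng.normal(size=N)           # propagate
    return ll

def pmmh(y, iters=200, step=0.1):
    """Marginal MH with the exact likelihood replaced by its SMC estimate."""
    theta = 0.0
    ll = log_lik_pf(theta, y)
    chain = []
    for _ in range(iters):
        theta_star = theta + step * rng.normal()      # q(theta* | theta), symmetric
        if abs(theta_star) < 1.0:                     # uniform prior on (-1,1)
            ll_star = log_lik_pf(theta_star, y)
            if np.log(rng.uniform()) < ll_star - ll:  # prior and q ratios cancel
                theta, ll = theta_star, ll_star
        chain.append(theta)
    return np.array(chain)

# synthetic data from an assumed true value theta = 0.5
x = 0.0
ys = []
for _ in range(50):
    x = 0.5 * x + rng.normal()
    ys.append(x + rng.normal())
chain = pmmh(np.array(ys))
print(chain[-1])
```

Note that the accepted likelihood estimate is stored and reused for the current state, exactly as in Step 3; rerunning the filter for the current state at every iteration would break the exact-approximation property.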
Marginal Metropolis-Hastings Algorithm

This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(θ | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

For any N ≥ 1, the algorithm admits exactly p(θ, x_{1:n} | y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n.

It is useful when X_n is moderate-dimensional and θ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

J. Fernandez-Villaverde & J. F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pp. 549-553, 2007.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B, 72(3), pp. 269-342, 2010.
E. L. Ionides, C. Breto and A. A. King, Inference for nonlinear dynamical systems, Proc. Nat. Acad. Sciences, 103, pp. 18438-18443, 2006.
OPEN PROBLEMS

53
Open Problems

Thus SMC can be successfully used within MCMC.

The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
Developing efficient SMC methods when, say, $E = \mathbb{R}^{10000}$.
Developing a proper SMC method for recursive Bayesian parameter estimation.
Developing methods to avoid the O(N²) cost for smoothing and sensitivity estimation.
Developing methods for solving optimal control problems.
Establishing weaker assumptions ensuring uniform convergence results.

54
Other Topics

In standard SMC we only sample X_n at time n and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

A. Doucet, M. Briers and S. Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, 15(3), pp. 1-19.

55
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Step 1
Step 2
Step 3 With probability
( ) ( )
( ) ( )
1 1( 1)
1 1 1
( 1) ( 1)
~ | ( 1)
|
n ni
n n n
Given i i p
Sample q i
Run an SMC to obtain p p
θ
θθ
θ
θ θ θ
minusminus minus
minus
X y
y x y
( )1 1 1 ~ |n n nSample pθX x y
( ) ( ) ( ) ( ) ( ) ( )
1
1( 1)
( 1) |
( 1) | ( 1min 1
)n
ni
p p q i
p p i q iθ
θ
θ θ θ
θ θ θminus
minus minus minus
y
y
( ) ( ) ( ) ( )
1 1 1 1( )
1 1 1 1( ) ( 1)
( ) ( )
( ) ( ) ( 1) ( 1)
n n n ni
n n n ni i
Set i X i p X p
Otherwise i X i p i X i p
θ θ
θ θ
θ θ
θ θ minus
=
= minus minus
y y
y y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an
approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)
When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists
The higher N the better the performance of the algorithm N scales roughly linearly with n
Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)
Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model
with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Online Bayesian Parameter Estimation

When p(θ | y_1:n, x_1:n) = p(θ | s_n(x_1:n, y_1:n)) for a fixed-dimensional statistic s_n, this becomes an elegant algorithm that, however, still suffers from the degeneracy problem, since it uses the SMC approximation of p(x_1:n | y_1:n).

As dim(Z_n) = dim(X_n) + dim(θ), such methods are not recommended for high-dimensional θ, especially with vague priors.
Artificial Dynamics for θ

A solution consists of perturbing the location of the particles in a way that does not modify their distributions; i.e., if at time n we have θ^(i) ~ p(θ | y_1:n), then we would like a transition kernel M_n(θ, θ') such that if θ'^(i) ~ M_n(θ^(i), ·), then θ'^(i) ~ p(θ | y_1:n).

In Markov chain language, we want p(θ | y_1:n) to be invariant under M_n.
Using MCMC

There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo (MCMC), e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as p(θ | y_1:n) ∝ p_θ(y_1:n) p(θ) would need to be known up to a normalizing constant, but p_θ(y_1:n) is unknown.

However, we can use a very simple Gibbs update: if X_1:n^(i) ~ p(x_1:n | y_1:n) and we then draw θ^(i) ~ p(θ | y_1:n, X_1:n^(i)), we have θ^(i) ~ p(θ | y_1:n). Indeed,

∫ p(θ | x_1:n, y_1:n) p(x_1:n | y_1:n) dx_1:n = p(θ | y_1:n).
SMC with MCMC for Parameter Estimation

Given an approximation at time n-1,

p̂(θ, x_1:n-1 | y_1:n-1) = (1/N) Σ_{i=1}^N δ_{(θ_{n-1}^(i), X_1:n-1^(i))}(θ, x_1:n-1).

Sample X_n^(i) ~ f_{θ_{n-1}^(i)}(· | X_{n-1}^(i)), set X_1:n^(i) = (X_1:n-1^(i), X_n^(i)), and then approximate

p̂(θ, x_1:n | y_1:n) = Σ_{i=1}^N W_n^(i) δ_{(θ_{n-1}^(i), X_1:n^(i))}(θ, x_1:n), where W_n^(i) ∝ g_{θ_{n-1}^(i)}(y_n | X_n^(i)).

Resample X_1:n^(i) ~ p̂(x_1:n | y_1:n), then sample θ_n^(i) ~ p(θ | y_1:n, X_1:n^(i)) to obtain

p̂(θ, x_1:n | y_1:n) = (1/N) Σ_{i=1}^N δ_{(θ_n^(i), X_1:n^(i))}(θ, x_1:n).
SMC with MCMC for Parameter Estimation

Consider the following model:

X_{n+1} = θ X_n + V_{n+1}, V_n ~ N(0, σ_v²),
Y_n = X_n + W_n, W_n ~ N(0, σ_w²),
X_0 ~ N(0, 1).

We set the prior on θ as θ ~ U(-1, 1). In this case

p(θ | x_1:n, y_1:n) ∝ N(θ; m_n, σ_n²) 1_{(-1,1)}(θ),

where σ_n^{-2} = σ_v^{-2} Σ_{k=1}^n x_{k-1}² and m_n = σ_n² σ_v^{-2} Σ_{k=1}^n x_{k-1} x_k.
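The update above can be sketched in code. This is a minimal illustration only, not the lecture's experiment: the noise scales, data length, particle number, and the crude clipping used in place of exact truncated-normal sampling are all assumptions.

```python
import numpy as np

# Sketch of SMC with a Gibbs step for theta on the AR(1) model
# X_{n+1} = theta*X_n + V_n, Y_n = X_n + W_n, prior theta ~ U(-1,1).
rng = np.random.default_rng(0)
sigma_v, sigma_w, theta_true, T, N = 1.0, 1.0, 0.7, 200, 500

# Simulate synthetic data
x = np.zeros(T)
x[0] = rng.standard_normal()
for n in range(1, T):
    x[n] = theta_true * x[n - 1] + sigma_v * rng.standard_normal()
y = x + sigma_w * rng.standard_normal(T)

def gibbs_theta(paths, n, rng):
    # theta | x_{0:n} ~ N(m_n, sigma_n^2) restricted to (-1,1), with
    # sigma_n^{-2} = sigma_v^{-2} sum x_{k-1}^2, m_n = sigma_n^2 sigma_v^{-2} sum x_{k-1} x_k
    s2 = sigma_v**2 / np.sum(paths[:, :n] ** 2, axis=1)
    m = (s2 / sigma_v**2) * np.sum(paths[:, :n] * paths[:, 1:n + 1], axis=1)
    draw = m + np.sqrt(s2) * rng.standard_normal(len(m))
    return np.clip(draw, -0.999, 0.999)   # crude truncation to the prior support

theta = rng.uniform(-1, 1, N)             # theta particles
paths = np.zeros((N, T))                  # path particles X_{0:n}
paths[:, 0] = rng.standard_normal(N)
for n in range(1, T):
    paths[:, n] = theta * paths[:, n - 1] + sigma_v * rng.standard_normal(N)
    logw = -0.5 * ((y[n] - paths[:, n]) / sigma_w) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    paths = paths[rng.choice(N, size=N, p=w)]   # resample whole paths
    theta = gibbs_theta(paths, n, rng)          # Gibbs move on theta

theta_hat = theta.mean()
```

Running such a sketch for long data records typically exhibits the degeneracy discussed on the following slides: the path approximation collapses, so the Gibbs draws for θ stop exploring the full posterior.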
SMC with MCMC for Parameter Estimation

We use SMC with the Gibbs step. The degeneracy problem remains: the SMC estimate of E[θ | y_1:n] is shown as n increases (from A. Doucet's lecture notes), and the parameter converges to the wrong value.

[Figure: SMC/MCMC estimate of E[θ | y_1:n] vs the true value.]
SMC with MCMC for Parameter Estimation

The problem with this approach is that although we move θ according to p(θ | x_1:n, y_1:n), we are still relying implicitly on the SMC approximation of the joint distribution p(x_1:n | y_1:n).

As n increases, the SMC approximation of p(x_1:n | y_1:n) deteriorates, and the algorithm cannot converge towards the right solution.

[Figure: the sufficient statistic (1/n) Σ_{k=1}^n E[X_k² | y_1:n] computed exactly through the Kalman filter (blue) vs its SMC approximation (red), for a fixed value of θ.]
Recursive MLE Parameter Estimation

The log-likelihood can be written as

ℓ_n(θ) = log p_θ(Y_1:n) = Σ_{k=1}^n log p_θ(Y_k | Y_1:k-1),

where we compute

p_θ(Y_k | Y_1:k-1) = ∫ g_θ(Y_k | x_k) p_θ(x_k | Y_1:k-1) dx_k.

Under regularity assumptions, (X_n, Y_n, p_θ(x_n | Y_1:n-1)) is an inhomogeneous Markov chain which converges towards its invariant distribution, and

lim_{n→∞} ℓ_n(θ)/n = ℓ(θ) = ∫∫ log g_θ(y | x) μ_θ(dx) λ_θ(dy).
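The predictive decomposition of ℓ_n(θ) is exactly what a bootstrap particle filter estimates term by term. The sketch below (the model, parameter values, and data are illustrative assumptions) checks the particle estimate of log p_θ(Y_1:n) against the exact Kalman-filter log-likelihood on a linear Gaussian model:

```python
import numpy as np

# l_n(theta) = sum_k log p_theta(Y_k | Y_{1:k-1}), estimated with a bootstrap PF
# on X_k = phi X_{k-1} + V_k, Y_k = X_k + W_k, and compared to the Kalman filter.
rng = np.random.default_rng(1)
phi, sv, sw, T, N = 0.9, 1.0, 1.0, 50, 2000

x = np.zeros(T)
x[0] = rng.standard_normal()
for k in range(1, T):
    x[k] = phi * x[k - 1] + sv * rng.standard_normal()
y = x + sw * rng.standard_normal(T)

def pf_loglik(y, phi, sv, sw, N, rng):
    X = rng.standard_normal(N)            # X_0 ~ N(0, 1)
    ll = 0.0
    for k in range(len(y)):
        if k > 0:
            X = phi * X + sv * rng.standard_normal(N)
        logw = -0.5 * np.log(2 * np.pi * sw**2) - 0.5 * ((y[k] - X) / sw) ** 2
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())        # log p_hat(Y_k | Y_{1:k-1})
        X = X[rng.choice(N, size=N, p=w / w.sum())]   # resample
    return ll

def kalman_loglik(y, phi, sv, sw):
    m, P, ll = 0.0, 1.0, 0.0
    for k in range(len(y)):
        if k > 0:
            m, P = phi * m, phi * phi * P + sv**2     # predict
        S = P + sw**2                                  # innovation variance
        ll += -0.5 * np.log(2 * np.pi * S) - 0.5 * (y[k] - m) ** 2 / S
        K = P / S
        m, P = m + K * (y[k] - m), (1 - K) * P         # update
    return ll

ll_pf = pf_loglik(y, phi, sv, sw, N, rng)
ll_kf = kalman_loglik(y, phi, sv, sw)
```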
Robbins-Monro Algorithm for MLE

We can maximize ℓ(θ) by using the gradient and the Robbins-Monro algorithm

θ_n = θ_{n-1} + γ_n ∇ log p_{θ_{1:n-1}}(Y_n | Y_1:n-1),

where

∇ p_θ(Y_n | Y_1:n-1) = ∫ ∇ g_θ(Y_n | x_n) p_θ(x_n | Y_1:n-1) dx_n + ∫ g_θ(Y_n | x_n) ∇ p_θ(x_n | Y_1:n-1) dx_n.

We thus need to approximate ∇ p_θ(x_n | Y_1:n-1).
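The Robbins-Monro recursion itself is easy to illustrate on a toy problem where the gradient is only observed in noise. The objective, noise model, and step sizes below are assumptions, chosen so that Σ γ_n = ∞ and Σ γ_n² < ∞:

```python
import numpy as np

# Robbins-Monro: theta_n = theta_{n-1} + gamma_n * (noisy gradient), here on the
# toy concave objective l(theta) = -(theta - 2)^2 / 2 with gradient 2 - theta.
rng = np.random.default_rng(2)

def noisy_grad(theta):
    # unbiased but noisy estimate of the true gradient 2 - theta
    return (2.0 - theta) + rng.standard_normal()

theta = -5.0
for n in range(1, 5001):
    gamma = 1.0 / n            # sum gamma_n = inf, sum gamma_n^2 < inf
    theta = theta + gamma * noisy_grad(theta)
```

Despite the unit-variance noise on every gradient evaluation, the decreasing step sizes average it out and the iterates settle near the maximizer θ = 2.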
Importance Sampling Estimation of Sensitivity

The various proposed approximations are based on the identity

∇ p_θ(x_1:n | Y_1:n) = [∇ p_θ(x_1:n | Y_1:n) / p_θ(x_1:n | Y_1:n)] p_θ(x_1:n | Y_1:n) = ∇ log p_θ(x_1:n | Y_1:n) p_θ(x_1:n | Y_1:n).

We thus can use an SMC approximation of the form

∇̂ p_θ(x_1:n | Y_1:n) = (1/N) Σ_{i=1}^N α_n^(i) δ_{X_1:n^(i)}(x_1:n), α_n^(i) = ∇ log p_θ(X_1:n^(i) | Y_1:n).

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of ∇ log p_θ(x_1:n | Y_1:n).
Degeneracy of the SMC Algorithm

[Figure: the score ∇ p_θ(Y_1:n) for linear Gaussian models, exact (blue) vs SMC approximation (red).]

Degeneracy of the SMC Algorithm

[Figure: the signed measure ∇ p_θ(x_n | Y_1:n) for linear Gaussian models, exact (red) vs SMC (blue/green). Positive and negative particles are mixed.]
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:

∇ p_θ(x_n | Y_1:n-1) = ∇ log p_θ(x_n | Y_1:n-1) p_θ(x_n | Y_1:n-1).

It suggests using an SMC approximation of the form

∇̂ p_θ(x_n | Y_1:n-1) = (1/N) Σ_{i=1}^N β_n^(i) δ_{X_n^(i)}(x_n), β_n^(i) = ∇ p_θ(X_n^(i) | Y_1:n-1) / p_θ(X_n^(i) | Y_1:n-1) = ∇ log p_θ(X_n^(i) | Y_1:n-1).

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N²), as

p_θ(X_n^(i) | Y_1:n-1) ∝ ∫ f_θ(X_n^(i) | x_{n-1}) p_θ(x_{n-1} | Y_1:n-1) dx_{n-1} ≈ (1/N) Σ_{j=1}^N f_θ(X_n^(i) | X_{n-1}^(j)).
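The O(N²) bottleneck is the evaluation of the mixture Σ_j W_{n-1}^(j) f_θ(X_n^(i) | X_{n-1}^(j)) at every particle i. A vectorized sketch, for an assumed AR(1) Gaussian transition (the model and particle values are illustrative):

```python
import numpy as np

# O(N^2) evaluation of the particle predictive density
# p_hat(X_n^(i) | Y_{1:n-1}) = sum_j W_{n-1}^(j) f_theta(X_n^(i) | X_{n-1}^(j)),
# with f_theta(x' | x) = N(x'; phi*x, sigma_v^2) as an illustrative transition.
rng = np.random.default_rng(3)
phi, sigma_v, N = 0.9, 1.0, 300

Xprev = rng.standard_normal(N)                  # particles at time n-1
W = np.full(N, 1.0 / N)                         # their normalized weights
Xnew = phi * Xprev + sigma_v * rng.standard_normal(N)   # particles at time n

# N x N matrix of transition densities f(Xnew_i | Xprev_j): the O(N^2) work
diff = Xnew[:, None] - phi * Xprev[None, :]
F = np.exp(-0.5 * (diff / sigma_v) ** 2) / np.sqrt(2 * np.pi * sigma_v**2)
pred = F @ W                                    # p_hat(X_n^(i) | Y_{1:n-1}), one value per i
```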
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

p_θ(x_n | Y_1:n) = ξ_n^θ(x_n) / ∫ ξ_n^θ(x_n) dx_n, where ξ_n^θ(x_n) = g_θ(Y_n | x_n) ∫ f_θ(x_n | x_{n-1}) p_θ(x_{n-1} | Y_1:n-1) dx_{n-1}.

The derivatives satisfy

∇ p_θ(x_n | Y_1:n) = ∇ ξ_n^θ(x_n) / ∫ ξ_n^θ(x_n) dx_n - p_θ(x_n | Y_1:n) [∫ ∇ ξ_n^θ(x_n) dx_n / ∫ ξ_n^θ(x_n) dx_n].

This way we obtain a simple recursion for ∇ p_θ(x_n | Y_1:n) as a function of ∇ p_θ(x_{n-1} | Y_1:n-1) and p_θ(x_{n-1} | Y_1:n-1).
SMC Approximation of the Sensitivity

Sample

X_n^(i) ~ η_n(·), where η_n(x_n) = Σ_{j=1}^N W_{n-1}^(j) q(x_n | Y_n, X_{n-1}^(j)).

Compute

α_n^(i) = ξ̂_n^θ(X_n^(i)) / η_n(X_n^(i)), ρ_n^(i) = ∇ ξ̂_n^θ(X_n^(i)) / η_n(X_n^(i)),

W_n^(i) = α_n^(i) / Σ_j α_n^(j), β_n^(i) = ρ_n^(i) / α_n^(i) - Σ_j ρ_n^(j) / Σ_j α_n^(j).

We have

p̂_θ(x_n | Y_1:n) = Σ_{i=1}^N W_n^(i) δ_{X_n^(i)}(x_n),

∇̂ p_θ(x_n | Y_1:n) = Σ_{i=1}^N W_n^(i) β_n^(i) δ_{X_n^(i)}(x_n).
Example: Linear Gaussian Model

Consider the following model:

X_n = φ X_{n-1} + σ_v V_n,
Y_n = X_n + σ_w W_n,

with θ = (φ, σ_v, σ_w). We compute ∇ log p_θ(x_n | Y_1:n).

We compare next the SMC estimate of the score ∇ p_θ(Y_1:n) for N = 1000 to its exact value given by the Kalman filter derivative (exact in cyan vs SMC in black).
Example: Linear Gaussian Model

[Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).]
Example: Stochastic Volatility Model

Consider the following model:

X_n = φ X_{n-1} + σ_v V_n,
Y_n = β exp(X_n / 2) W_n,

with θ = (φ, σ_v, β). We use SMC with N = 1000 for batch and online estimation.

For batch estimation: θ_n = θ_{n-1} + γ_n ∇ log p_{θ_{n-1}}(Y_1:n).

For online estimation: θ_n = θ_{n-1} + γ_n ∇ log p_{θ_{1:n-1}}(Y_n | Y_1:n-1).
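A minimal simulation of this SV model helps fix ideas; the parameter values below are illustrative assumptions, not those of the exchange-rate study:

```python
import numpy as np

# Simulate the stochastic volatility model X_n = phi X_{n-1} + sigma_v V_n,
# Y_n = beta exp(X_n / 2) W_n, with V_n, W_n iid N(0, 1).
rng = np.random.default_rng(4)
phi, sigma_v, beta, T = 0.95, 0.3, 0.7, 1000

X = np.zeros(T)
X[0] = rng.standard_normal() * sigma_v / np.sqrt(1 - phi**2)  # stationary start
for n in range(1, T):
    X[n] = phi * X[n - 1] + sigma_v * rng.standard_normal()
Y = beta * np.exp(X / 2) * rng.standard_normal(T)             # observed returns
```

The latent log-volatility X_n is a stationary AR(1) process, and the observations Y_n show the volatility clustering typical of exchange-rate returns.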
Example: Stochastic Volatility Model

[Figure: batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.]

Example: Stochastic Volatility Model

[Figure: recursive parameter estimation for the SV model; the estimates converge towards the true values.]
SMC with MCMC for Parameter Estimation

Given data y_1:n, inference relies on

p(θ, x_1:n | y_1:n) = p(θ | y_1:n) p_θ(x_1:n | y_1:n), where p(θ | y_1:n) ∝ p_θ(y_1:n) p(θ).

We have seen that SMC is rather inefficient for sampling from p(θ | y_1:n), so we look here at an MCMC approach. For a given parameter value θ, SMC can estimate both p_θ(y_1:n) and p_θ(x_1:n | y_1:n).

It will be useful if we can use SMC within MCMC to sample from p(θ, x_1:n | y_1:n).

C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010), 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from p(θ | y_1:n, x_1:n) and p(x_1:n | y_1:n, θ).

However, it is impossible to sample exactly from p(x_1:n | y_1:n, θ). We can sample from p(x_k | y_k, x_{k-1}, x_{k+1}, θ) one component at a time instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from p(x_1:n | y_1:n, θ), i.e. sample X'_1:n ~ q(x_1:n | y_1:n) and accept with probability

min{1, [p(X'_1:n | y_1:n, θ) q(X_1:n | y_1:n)] / [p(X_1:n | y_1:n, θ) q(X'_1:n | y_1:n)]}.
Gibbs Sampling Strategy

We will use the output of an SMC method approximating p(x_1:n | y_1:n, θ) as a proposal distribution.

We know that the SMC approximation degrades as n increases, but under mixing assumptions we also have

|| Law(X_1:n^(i)) - p(x_1:n | y_1:n, θ) ||_TV ≤ C n / N.

Thus things degrade linearly rather than exponentially in n. These results are useful for small C.
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Biased Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences.

CBMC looks like an SMC method, but at each step only one particle is selected, which gives N offspring. Thus if you have made the wrong selection, you cannot recover later on.

D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study the structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC

We use p̂_N(x_1:n | y_1:n, θ) as proposal distribution for X'_1:n, and store the estimate p̂_N(y_1:n | θ) of the marginal likelihood.

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N-1)n variables.

The solution, inspired by CBMC, is as follows. For the current state X_1:n of the Markov chain, run a virtual SMC method with only N-1 free particles. Ensure that at each time step k the current state X_k is deterministically selected, and compute the associated estimate p̂_N(y_1:n | θ).

Finally, accept X'_1:n with probability min{1, p̂'_N(y_1:n | θ) / p̂_N(y_1:n | θ)}. Otherwise stay where you are.
Example

Consider the following model:

X_k = X_{k-1}/2 + 25 X_{k-1}/(1 + X_{k-1}²) + 8 cos(1.2 k) + V_k, V_k ~ N(0, σ_v²),
Y_k = X_k²/20 + W_k, W_k ~ N(0, σ_w²).

We take the same prior proposal distribution for both CBMC and SMC.

[Figure: acceptance rates for CBMC (left) and SMC (right).]
Example

We now consider the case when both variances σ_v², σ_w² are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from p(θ, x_1:1000 | y_1:1000) using two strategies:

The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution; average acceptance rate 0.43.

An algorithm updating the components one at a time using an MH step of invariant distribution p(x_k | y_k, x_{k-1}, x_{k+1}, θ) with the same proposal.

In both cases we update the parameters according to p(θ | x_1:500, y_1:500). Both algorithms are run with the same computational effort.
Example

MH one-at-a-time missed the mode, and the estimates are

E[σ_v² | y_1:500] ≈ 13.7 (MH one-at-a-time),
E[σ_v² | y_1:500] ≈ 9.9 (PF-MCMC),

with true value σ_v² = 10.

If X_1:n and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
Review of Metropolis-Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density q(x, y) = q(y | x), and the target distribution is π(x).

It can be easily shown that π(x') = ∫ π(x) K(x, x') dx (π is invariant under the transition kernel K), and X^(i) ~ π(x) as i → ∞ under weak assumptions.

Algorithm (Generic Metropolis-Hastings Sampler):

Initialization: choose an arbitrary starting value x^(0).

Iteration t (t ≥ 1):
1. Given x^(t-1), generate x* ~ q(x^(t-1), x).
2. Compute ρ(x^(t-1), x*) = min{1, [π(x*) q(x*, x^(t-1))] / [π(x^(t-1)) q(x^(t-1), x*)]}.
3. With probability ρ(x^(t-1), x*), accept x* and set x^(t) = x*. Otherwise reject x* and set x^(t) = x^(t-1).
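The sampler above transcribes directly into code. This sketch assumes a Gaussian random-walk proposal (so the q-ratio cancels) and a standard normal target, both illustrative choices:

```python
import numpy as np

# Generic Metropolis-Hastings with a random-walk proposal q(x, y) = N(y; x, tau^2)
# targeting pi(x) = N(x; 0, 1), known only up to a normalizing constant.
rng = np.random.default_rng(5)

def log_pi(x):
    return -0.5 * x * x            # log target, up to an additive constant

def mh(n_iter, x0=0.0, tau=1.0):
    x = x0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = x + tau * rng.standard_normal()   # step 1: generate from q
        log_rho = log_pi(prop) - log_pi(x)       # step 2: symmetric q cancels
        if np.log(rng.uniform()) < log_rho:      # step 3: accept or reject
            x = prop
        chain[t] = x
    return chain

chain = mh(20000)
```

Note the rejected moves still record the current state in the chain; dropping them would bias the samples.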
Marginal Metropolis-Hastings Algorithm

Consider the target

p(θ, x_1:n | y_1:n) = p(θ | y_1:n) p_θ(x_1:n | y_1:n).

We use the following proposal distribution:

q((θ*, x*_1:n); (θ, x_1:n)) = q(θ* | θ) p_θ*(x*_1:n | y_1:n).

Then the acceptance probability becomes

min{1, [p(θ*, x*_1:n | y_1:n) q((θ, x_1:n); (θ*, x*_1:n))] / [p(θ, x_1:n | y_1:n) q((θ*, x*_1:n); (θ, x_1:n))]}
= min{1, [p_θ*(y_1:n) p(θ*) q(θ | θ*)] / [p_θ(y_1:n) p(θ) q(θ* | θ)]}.

We will use SMC approximations to compute p_θ(y_1:n) and p_θ(x_1:n | y_1:n).
Marginal Metropolis-Hastings Algorithm

Step 1: Given θ(i-1), X_1:n(i-1) and p̂_{θ(i-1)}(y_1:n), sample θ* ~ q(θ | θ(i-1)) and run an SMC algorithm to obtain p̂_{θ*}(x_1:n | y_1:n) and p̂_{θ*}(y_1:n).

Step 2: Sample X*_1:n ~ p̂_{θ*}(x_1:n | y_1:n).

Step 3: With probability

min{1, [p̂_{θ*}(y_1:n) p(θ*) q(θ(i-1) | θ*)] / [p̂_{θ(i-1)}(y_1:n) p(θ(i-1)) q(θ* | θ(i-1))]},

set θ(i) = θ*, X_1:n(i) = X*_1:n and p̂_{θ(i)}(y_1:n) = p̂_{θ*}(y_1:n). Otherwise set θ(i) = θ(i-1), X_1:n(i) = X_1:n(i-1) and p̂_{θ(i)}(y_1:n) = p̂_{θ(i-1)}(y_1:n).
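The three steps can be sketched as follows. This is an illustrative particle marginal MH on an assumed linear Gaussian toy model with a uniform prior and random-walk proposal (all settings are assumptions); for brevity the sketch keeps only the θ-chain, omitting Step 2's path sample, and the SMC run of Step 1 supplies the marginal likelihood estimate:

```python
import numpy as np

# Particle marginal MH: replace the intractable p_theta(y_{1:n}) in the MH ratio
# by a bootstrap-particle-filter estimate. Toy model: X_k = theta X_{k-1} + V_k,
# Y_k = X_k + W_k, V, W ~ N(0,1), prior theta ~ U(-1, 1).
rng = np.random.default_rng(6)
T, N, theta_true = 100, 200, 0.8

x = np.zeros(T)
x[0] = rng.standard_normal()
for k in range(1, T):
    x[k] = theta_true * x[k - 1] + rng.standard_normal()
y = x + rng.standard_normal(T)

def pf_loglik(theta):
    # Step 1's SMC run: returns log p_hat(y_{1:T} | theta) (up to a theta-free constant)
    X = rng.standard_normal(N)
    ll = 0.0
    for k in range(T):
        if k > 0:
            X = theta * X + rng.standard_normal(N)
        logw = -0.5 * (y[k] - X) ** 2
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())
        X = X[rng.choice(N, size=N, p=w / w.sum())]
    return ll

theta, ll = 0.0, pf_loglik(0.0)
accepted, chain = 0, []
for i in range(200):                                # short chain, for illustration
    prop = theta + 0.1 * rng.standard_normal()      # sample q(. | theta)
    if abs(prop) < 1.0:                             # flat prior: reject outside (-1,1)
        ll_prop = pf_loglik(prop)                   # fresh SMC estimate for theta*
        if np.log(rng.uniform()) < ll_prop - ll:    # Step 3: accept or reject
            theta, ll = prop, ll_prop
            accepted += 1
    chain.append(theta)
acc_rate = accepted / 200
```

Crucially, the stored estimate p̂_{θ(i)}(y_1:n) is reused until the next acceptance; re-estimating it at the current state would break the exact invariance property.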
Marginal Metropolis-Hastings Algorithm

This algorithm (without sampling X_1:n) was proposed as an approximate MCMC algorithm to sample from p(θ | y_1:n) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

For any N ≥ 1, the algorithm admits exactly p(θ, x_1:n | y_1:n) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n. It is useful when X_n is moderate-dimensional and θ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

J. Fernandez-Villaverde & J. F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553, 2007.
C. Andrieu, A. Doucet & R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
E. L. Ionides, C. Breto & A. A. King, Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443, 2006.
OPEN PROBLEMS
Open Problems

SMC can thus be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
Developing efficient SMC methods when E = R^10000.
Developing a proper SMC method for recursive Bayesian parameter estimation.
Developing methods to avoid the O(N²) cost for smoothing and sensitivity estimation.
Developing methods for solving optimal control problems.
Establishing weaker assumptions ensuring uniform convergence results.
Other Topics

In standard SMC we only sample X_n at time n, and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

A. Doucet, M. Briers and S. Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, pages 1-19.
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Artificial Dynamics for θ A solution consists of perturbing the location of the particles in a
way that does not modify their distributions ie if at time n
then we would like a transition kernel such that if Then
In Markov chain language we want to be invariant
22
( )( )1|i
n npθ θ~ y
( )iθ
( )( ) ( ) ( )| i i in n n nMθ θ θ~
( )( )1|i
n npθ θ~ y
( ) nM θ θ ( )1| np θ y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Using MCMC There is a whole literature on the design of such kernels known as
Markov chain Monte Carlo eg the Metropolis-Hastings algorithm
We cannot use these algorithms directly as would need to be known up to a normalizing constant but is unknown
However we can use a very simple Gibbs update
Indeed note that if then if we have
Indeed note that
23
( )1| np θ y( ) ( )1 1|n np pθθ equivy y
( )( ) ( )1 1| i i
n n npθ θ~ y X
( ) ( )( ) ( )1 1 1 ~ |i i
n n n npθ θX x y ( )( ) ( )1 1| i i
n n npθ θ~ y X
( ) ( )( ) ( )1 1 1 ~ |i i
n n n npθ θX x y
( ) ( ) ( )1 1 1 1 1| | |n n n n np p d pθ θ θ θ=int x y y x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Given an approximation at time n-1
Sample set and then approximate
Resample then sample to obtain
24
( )( )1
( ) ( )1~ |i
n
i in n nX f x X
θ minusminus
( )( ) ( )( )1 1 1~ i iin nn XminusX X
( ) ( ) ( )( ) ( )1 1 1
1 1 1 1 1 11
1| i in n
N
n n ni
pN θ
θ δ θminus minus
minus minus minus=
= sum X x y x
( )( )1 1 1~ |i
n n npX x y
( )( ) ( ) ( )( )( )( )
11 1
( )( ) ( )1 1 1
1| |iii
nn n
N ii inn n n n n n
ip W W g y
θθθ δ
minusminus=
= propsum X x y x X
( )( ) ( )1 1| i i
n n npθ θ~ y X
( ) ( ) ( )( )( )1
1 1 11
1| iin n
N
n n ni
pN θ
θ δ θ=
= sum X x y x
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Consider the following model
We set the prior on θ as
In this case
25
( )( )
( )
1 1
21 0
~ 01
~ 01
~ 0
n n v n n
n n w n n
X X V V
Y X W W
X
θ σ
σ
σ
+ += +
= +
N
N
N
~ ( 11)θ minusU
( ) ( ) ( )21 1 ( 11)
12 2 2
12 2
|
n n n n
n n
n n k k n kk k
p m
m x x x
θ θ σ θ
σ σ
minus
minus
minus= =
prop
= = sum sum
x y N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation We use SMC with Gibbs step The degeneracy problem remains
SMC estimate is shown of [θ|y1n] as n increases (From A Doucet lecture notes) The parameter converges to the wrong value
26
True Value
SMCMCMC
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation The problem with this approach is that although we move θ
according to we are still relying implicitly on the approximation of the joint distribution
As n increases the SMC approximation of deteriorates and the algorithm cannot converge towards the right solution
27
( )1 1n np θ | x y( )1 1|n np x y
( )1 1|n np x y
Sufficient statistics computed exactly through the Kalman filter (blue) vs SMC approximation (red) for a fixed value of θ
21
1
1 |n
k nk
Xn =
sum y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Recursive MLE Parameter Estimation The log likelihood can be written as
Here we compute
Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution
28
( )1 1 11
l ( ) log ( ) log |n
n n k kk
p p Yθ θθ minus=
= = sumY Y
( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y
( ) 1 1 |n n n nX Y p xθ minusY
l ( )lim l( ) log ( | ) ( ) ( )n
ng y x dx dy d
n θ θ θθ θ micro λ micro
rarrinfin= = int int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro
algorithm
where
We thus need to approximate
29
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
( ) ( )( ) ( )
1 1 1 1
1 1
( ) | |
| |
n n n n n n n
n n n n n
p Y g Y x p x dx
g Y x p x dx
θ θ θ
θ θ
minus minus
minus
nabla = nabla +
nabla
intint
|Y Y
Y
( ) 1 1|n np xθ minusnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity
We thus can use a SMC approximation of the form
This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of
30
1 1 11 1 1 1 1 1 1
1 1 1
1 1 1 1 1 1 1 1
( )( ) ( )( )
log ( ) ( )
n nn n n n n
n n
n n n n n
pp Y p dp
p p d
θθ θ
θ
θ θ
minusminus minus minus
minus
minus minus minus
nablanabla =
= nabla
int
int
x |Y|Y x |Y xx |Y
x |Y x |Y x
( )( )
1 1 11
( ) ( )1 1 1
1( ) ( )
log ( )
in
Ni
n n n nXi
i in n n
p xN
p
θ
θ
α δ
α
minus=
minus
nabla =
= nabla
sum|Y x
X |Y
1 1 1( )n npθ minusx |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC
approximation (red)
31
1( )npθnabla Y
Degeneracy of the SMC Algorithm

[Figure: signed measure \nabla p_\theta(x_n \mid Y_{1:n}) for a linear Gaussian model, exact (red) vs SMC (blue/green); positive and negative particles are mixed.]
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:

  \nabla p_\theta(x_n \mid Y_{1:n-1}) = \nabla \log p_\theta(x_n \mid Y_{1:n-1}) \, p_\theta(x_n \mid Y_{1:n-1}).

It suggests using an SMC approximation of the form

  \nabla p_\theta(x_n \mid Y_{1:n-1}) \approx \frac{1}{N} \sum_{i=1}^{N} \beta_n^{(i)} \, \delta_{X_n^{(i)}}(x_n),  \quad  \beta_n^{(i)} = \nabla \log p_\theta(X_n^{(i)} \mid Y_{1:n-1}) = \frac{\nabla p_\theta(X_n^{(i)} \mid Y_{1:n-1})}{p_\theta(X_n^{(i)} \mid Y_{1:n-1})}.

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is O(N^2), as

  p_\theta(X_n^{(i)} \mid Y_{1:n-1}) \propto \int f_\theta(X_n^{(i)} \mid x_{n-1}) \, p_\theta(x_{n-1} \mid Y_{1:n-1}) \, dx_{n-1} \approx \frac{1}{N} \sum_{j=1}^{N} f_\theta(X_n^{(i)} \mid X_{n-1}^{(j)}).
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

  p_\theta(x_n \mid Y_{1:n-1}) = \frac{\xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x_n) \, dx_n},  \quad  \xi_{\theta,n}(x_n) = \int g_\theta(Y_{n-1} \mid x_{n-1}) \, f_\theta(x_n \mid x_{n-1}) \, p_\theta(x_{n-1} \mid Y_{1:n-2}) \, dx_{n-1}.

The derivatives satisfy the following:

  \nabla p_\theta(x_n \mid Y_{1:n-1}) = \frac{\nabla \xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x_n) \, dx_n} - p_\theta(x_n \mid Y_{1:n-1}) \, \frac{\int \nabla \xi_{\theta,n}(x_n) \, dx_n}{\int \xi_{\theta,n}(x_n) \, dx_n}.

This way we obtain a simple recursion for \nabla p_\theta(x_n \mid Y_{1:n-1}) as a function of \nabla p_\theta(x_{n-1} \mid Y_{1:n-2}) and p_\theta(x_{n-1} \mid Y_{1:n-2}).
SMC Approximation of the Sensitivity

Sample

  X_n^{(i)} \sim q_\theta(\cdot \mid Y_n),  \quad  q_\theta(x_n \mid Y_n) = \sum_{j=1}^{N} W_{n-1}^{(j)} \, q_\theta(x_n \mid Y_n, X_{n-1}^{(j)}).

Compute

  \alpha_n^{(i)} = \frac{\xi_{\theta,n}(X_n^{(i)})}{q_\theta(X_n^{(i)} \mid Y_n)},  \quad  \rho_n^{(i)} = \frac{\nabla \xi_{\theta,n}(X_n^{(i)})}{q_\theta(X_n^{(i)} \mid Y_n)},

  W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j=1}^{N} \alpha_n^{(j)}},  \quad  \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\alpha_n^{(i)}} - \frac{\sum_{j=1}^{N} \rho_n^{(j)}}{\sum_{j=1}^{N} \alpha_n^{(j)}}.

We then have

  p_\theta(x_n \mid Y_{1:n-1}) \approx \sum_{i=1}^{N} W_n^{(i)} \, \delta_{X_n^{(i)}}(x_n),  \quad  \nabla p_\theta(x_n \mid Y_{1:n-1}) \approx \sum_{i=1}^{N} W_n^{(i)} \beta_n^{(i)} \, \delta_{X_n^{(i)}}(x_n).
Example: Linear Gaussian Model

Consider the following model:

  X_n = \phi X_{n-1} + \sigma_v V_n,
  Y_n = X_n + \sigma_w W_n,

with \theta = (\phi, \sigma_v, \sigma_w). We compare next the SMC estimates (N = 1000) of \nabla \log p_\theta(x_n \mid Y_{1:n}) and \nabla p_\theta(Y_{1:n}) to their exact values given by the Kalman filter derivative (exact in cyan vs SMC in black).
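For intuition on the quantities being compared, here is a sketch (not the lecture's code) contrasting the exact Kalman-filter log-likelihood with a bootstrap particle-filter estimate for this model; the values phi = 0.8, sigma_v = sigma_w = 1 and the data length are illustrative assumptions.

```python
import math, random

# Exact Kalman-filter log-likelihood vs bootstrap particle-filter estimate
# for X_n = phi X_{n-1} + sigma_v V_n, Y_n = X_n + sigma_w W_n, X_0 = 0.
random.seed(1)
phi, sv, sw = 0.8, 1.0, 1.0

x, ys = 0.0, []
for _ in range(50):                         # simulate observations
    x = phi * x + sv * random.gauss(0, 1)
    ys.append(x + sw * random.gauss(0, 1))

def kalman_loglik(ys):
    m, P, ll = 0.0, sv**2, 0.0              # predictive law of X_1
    for y in ys:
        S = P + sw**2                       # innovation variance
        ll += -0.5 * (math.log(2 * math.pi * S) + (y - m)**2 / S)
        K = P / S                           # Kalman gain: update step
        m, P = m + K * (y - m), (1 - K) * P
        m, P = phi * m, phi**2 * P + sv**2  # predict next state
    return ll

def pf_loglik(ys, N=2000):
    parts, ll = [0.0] * N, 0.0              # bootstrap particle filter
    for y in ys:
        parts = [phi * p + sv * random.gauss(0, 1) for p in parts]
        ws = [math.exp(-0.5 * ((y - p) / sw)**2) / (sw * math.sqrt(2 * math.pi))
              for p in parts]
        ll += math.log(sum(ws) / N)         # factor p-hat(y_n | y_{1:n-1})
        parts = random.choices(parts, weights=ws, k=N)  # multinomial resampling
    return ll

print(f"Kalman log-lik  : {kalman_loglik(ys):.2f}")
print(f"Particle log-lik: {pf_loglik(ys):.2f}")
```

The two numbers should agree closely for moderate N; differentiating such likelihood estimates with respect to theta is precisely where the degeneracy discussed above appears.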
Example: Linear Gaussian Model

[Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).]
Example: Stochastic Volatility Model

Consider the following model:

  X_n = \phi X_{n-1} + \sigma_v V_n,
  Y_n = \beta \exp(X_n / 2) \, W_n,

with \theta = (\phi, \sigma_v, \beta). We use SMC with N = 1000 for batch and online estimation.

For batch estimation:

  \theta_n = \theta_{n-1} + \gamma_n \nabla \log p_{\theta_{n-1}}(Y_{1:n}).

For online estimation:

  \theta_n = \theta_{n-1} + \gamma_n \nabla \log p_{\theta_{n-1}}(Y_n \mid Y_{1:n-1}).
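To make the model concrete, a minimal simulation sketch, assuming the illustrative values phi = 0.95, sigma_v = 0.3, beta = 0.7 (not taken from the slides):

```python
import math, random

# Simulate the stochastic volatility model: X_n is the latent log-volatility,
# Y_n the observed return, conditionally N(0, beta^2 exp(X_n)).
random.seed(3)
phi, sv, beta = 0.95, 0.3, 0.7

x = random.gauss(0, sv / math.sqrt(1 - phi**2))  # stationary start for X
xs, ys = [], []
for n in range(500):
    x = phi * x + sv * random.gauss(0, 1)               # latent AR(1)
    ys.append(beta * math.exp(x / 2) * random.gauss(0, 1))  # observed return
    xs.append(x)

print(f"largest |return|: {max(abs(y) for y in ys):.2f}")
```

Because the latent chain mixes slowly (phi close to 1), the simulated returns exhibit the volatility clustering that makes this a standard SMC test case.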
Example: Stochastic Volatility Model

[Figure: batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.]
Example: Stochastic Volatility Model

[Figure: recursive parameter estimation for the SV model; the estimates converge towards the true values.]
SMC with MCMC for Parameter Estimation

Given data y_{1:n}, inference relies on

  p(\theta, x_{1:n} \mid y_{1:n}) = p_\theta(x_{1:n} \mid y_{1:n}) \, p(\theta \mid y_{1:n}),  \quad  \text{where}  \quad  p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n}) \, p(\theta).

We have seen that SMC is rather inefficient for sampling from p(\theta, x_{1:n} \mid y_{1:n}), so we look here at an MCMC approach. For a given parameter value \theta, SMC can estimate both p_\theta(x_{1:n} \mid y_{1:n}) and p_\theta(y_{1:n}). It would be useful if we could use SMC within MCMC to sample from p(\theta \mid y_{1:n}).

C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from p(\theta \mid x_{1:n}, y_{1:n}) and p(x_{1:n} \mid \theta, y_{1:n}).

However, it is impossible to sample from p(x_{1:n} \mid \theta, y_{1:n}) exactly. We can sample from p(x_k \mid \theta, y_{1:n}, x_{k-1}, x_{k+1}) instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from p(x_{1:n} \mid \theta, y_{1:n}); i.e. sample X'_{1:n} \sim q(\cdot \mid y_{1:n}) and accept with probability

  \min\left\{1, \; \frac{p(X'_{1:n} \mid \theta, y_{1:n}) \, q(X_{1:n} \mid y_{1:n})}{p(X_{1:n} \mid \theta, y_{1:n}) \, q(X'_{1:n} \mid y_{1:n})}\right\}.
Gibbs Sampling Strategy

We will use the output of an SMC method approximating p(x_{1:n} \mid \theta, y_{1:n}) as a proposal distribution.

We know that the SMC approximation degrades as n increases, but under mixing assumptions we also have

  \left\| \mathrm{Law}(X_{1:n}^{(i)}) - p(\cdot \mid \theta, y_{1:n}) \right\|_{TV} \le \frac{C n}{N}.

Thus things degrade linearly rather than exponentially. These results are useful for small C.
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences: CBMC looks like an SMC method, but at each step only one particle is selected, and it gives N offspring. Thus if you have made the wrong selection, you cannot recover later on.

D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study the structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC

We use \hat{p}_\theta^N(x_{1:n} \mid y_{1:n}) as proposal distribution, sampling X'_{1:n} \sim \hat{p}_\theta^N(\cdot \mid y_{1:n}), and store the associated estimate \hat{p}_\theta^N(y_{1:n}).

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N-1)n variables.

The solution, inspired by CBMC, is as follows. For the current state X_{1:n} of the Markov chain, run a virtual SMC method with only N-1 free particles; ensure that at each time step k the current state X_k is deterministically selected, and compute the associated estimate \hat{p}_\theta^N(y_{1:n}).

Finally, accept X'_{1:n} with probability

  \min\left\{1, \; \frac{\hat{p}_\theta^{N,\mathrm{new}}(y_{1:n})}{\hat{p}_\theta^N(y_{1:n})}\right\},

where the numerator is the estimate from the SMC run that produced X'_{1:n} and the denominator the one attached to the current path. Otherwise stay where you are.
Example

Consider the following model:

  X_k = \frac{1}{2} X_{k-1} + 25 \frac{X_{k-1}}{1 + X_{k-1}^2} + 8 \cos(1.2 k) + V_k,  \quad  V_k \sim N(0, 15),
  Y_k = \frac{X_k^2}{20} + W_k,  \quad  W_k \sim N(0, 0.01),  \quad  X_1 \sim N(0, 5^2).

We take the same prior proposal distribution for both CBMC and SMC.

[Figure: acceptance rates for CBMC (left) and SMC (right).]
Example

We now consider the case where both variances \sigma_v^2, \sigma_w^2 are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from p(\theta, x_{1:1000} \mid y_{1:1000}) using two strategies:
- the MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution (average acceptance rate 0.43);
- an algorithm updating the components one at a time using an MH step of invariant distribution p(x_k \mid x_{k-1}, x_{k+1}, y_k) with the same proposal.

In both cases we update the parameters according to p(\theta \mid x_{1:500}, y_{1:500}). Both algorithms are run with the same computational effort.
Example

MH one-at-a-time missed the mode, with estimates

  E[\sigma_v^2 \mid y_{1:500}] \approx 13.7  (MH one-at-a-time),
  E[\sigma_v^2 \mid y_{1:500}] \approx 9.9  (PF-MCMC),

against the true value \sigma_v^2 = 10.

If X_{1:n} and \theta are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
Review of Metropolis-Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density q(x, y) = q(y | x), and \pi(x) is the target distribution.

It can be easily shown that \pi is invariant for the transition kernel,

  \pi(x') = \int \pi(x) \, K(x, x') \, dx,

and that X^{(i)} \sim \pi(x) as i \to \infty under weak assumptions.

Algorithm: Generic Metropolis-Hastings Sampler
Initialization: choose an arbitrary starting value x^{(0)}.
Iteration t (t \ge 1):
1. Given x^{(t-1)}, generate \tilde{x} \sim q(x^{(t-1)}, \cdot).
2. Compute
  \rho(x^{(t-1)}, \tilde{x}) = \min\left\{1, \; \frac{\pi(\tilde{x}) \, q(\tilde{x}, x^{(t-1)})}{\pi(x^{(t-1)}) \, q(x^{(t-1)}, \tilde{x})}\right\}.
3. With probability \rho(x^{(t-1)}, \tilde{x}), accept \tilde{x} and set x^{(t)} = \tilde{x}; otherwise reject and set x^{(t)} = x^{(t-1)}.
Marginal Metropolis-Hastings Algorithm

Consider the target

  p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n}) \, p_\theta(x_{1:n} \mid y_{1:n}).

We use the following proposal distribution:

  q\big((\theta', x'_{1:n}) \mid (\theta, x_{1:n})\big) = q(\theta' \mid \theta) \, p_{\theta'}(x'_{1:n} \mid y_{1:n}).

Then the acceptance probability becomes

  \min\left\{1, \; \frac{p(\theta', x'_{1:n} \mid y_{1:n}) \, q\big((\theta, x_{1:n}) \mid (\theta', x'_{1:n})\big)}{p(\theta, x_{1:n} \mid y_{1:n}) \, q\big((\theta', x'_{1:n}) \mid (\theta, x_{1:n})\big)}\right\} = \min\left\{1, \; \frac{p_{\theta'}(y_{1:n}) \, p(\theta') \, q(\theta \mid \theta')}{p_\theta(y_{1:n}) \, p(\theta) \, q(\theta' \mid \theta)}\right\}.

We will use SMC approximations to compute p_\theta(y_{1:n}) and p_\theta(x_{1:n} \mid y_{1:n}).
Marginal Metropolis-Hastings Algorithm

Step 1. Given \theta(i-1), X_{1:n}(i-1) and \hat{p}_{\theta(i-1)}(y_{1:n}): sample \theta^* \sim q(\cdot \mid \theta(i-1)) and run an SMC algorithm to obtain \hat{p}_{\theta^*}(y_{1:n}) and \hat{p}_{\theta^*}(x_{1:n} \mid y_{1:n}).

Step 2. Sample X^*_{1:n} \sim \hat{p}_{\theta^*}(\cdot \mid y_{1:n}).

Step 3. With probability

  \min\left\{1, \; \frac{\hat{p}_{\theta^*}(y_{1:n}) \, p(\theta^*) \, q(\theta(i-1) \mid \theta^*)}{\hat{p}_{\theta(i-1)}(y_{1:n}) \, p(\theta(i-1)) \, q(\theta^* \mid \theta(i-1))}\right\},

set \theta(i) = \theta^*, X_{1:n}(i) = X^*_{1:n} and \hat{p}_{\theta(i)}(y_{1:n}) = \hat{p}_{\theta^*}(y_{1:n}); otherwise set \theta(i) = \theta(i-1), X_{1:n}(i) = X_{1:n}(i-1) and \hat{p}_{\theta(i)}(y_{1:n}) = \hat{p}_{\theta(i-1)}(y_{1:n}).
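The three steps can be sketched end to end. This is a minimal sketch, not the lecture's code: it assumes a linear Gaussian model X_n = phi X_{n-1} + V_n, Y_n = X_n + W_n with unit noise variances, a flat prior on phi over (-1, 1), and a symmetric random-walk proposal (so the prior and proposal terms cancel in Step 3); only the theta-chain is kept, and all settings are illustrative.

```python
import math, random

# Particle marginal MH sketch: the bootstrap particle filter supplies the
# unbiased likelihood estimate p-hat_phi(y_{1:n}) used in the MH ratio.
random.seed(2)
TRUE_PHI = 0.7

x, ys = 0.0, []
for _ in range(60):                          # simulated data
    x = TRUE_PHI * x + random.gauss(0, 1)
    ys.append(x + random.gauss(0, 1))

def pf_loglik(phi, N=200):
    """Step 1: one SMC sweep, returning log p-hat_phi(y_{1:n})."""
    parts, ll = [0.0] * N, 0.0
    for y in ys:
        parts = [phi * p + random.gauss(0, 1) for p in parts]
        ws = [math.exp(-0.5 * (y - p) ** 2) for p in parts]
        ll += math.log(sum(ws) / N)          # constants cancel in the MH ratio
        parts = random.choices(parts, weights=ws, k=N)
    return ll

phi, ll = 0.0, pf_loglik(0.0)
chain = []
for i in range(300):
    prop = phi + random.gauss(0, 0.15)       # Step 1: theta* ~ q(.|theta)
    if abs(prop) < 1:                        # flat prior support
        ll_prop = pf_loglik(prop)
        if random.random() < math.exp(min(0.0, ll_prop - ll)):  # Step 3
            phi, ll = prop, ll_prop          # accept; keep the estimate
    chain.append(phi)

print(f"PMMH posterior mean of phi: {sum(chain[100:]) / 200:.2f}")
```

The key design point, exactly as in Step 3 above, is that the stored likelihood estimate of the current state is reused rather than recomputed, which is what makes the chain exact despite the noisy likelihood.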
Marginal Metropolis-Hastings Algorithm

This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(\theta \mid y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

When N \ge 1, the algorithm admits exactly p(\theta, x_{1:n} \mid y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n. It is useful when X_n is moderate-dimensional and \theta high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez (2007), On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553.
C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society: Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
E. L. Ionides, C. Breto and A. A. King (2006), Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443.
OPEN PROBLEMS
Open Problems

Thus SMC can be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
- developing efficient SMC methods when E = R^{10000};
- developing a proper SMC method for recursive Bayesian parameter estimation;
- developing methods that avoid the O(N^2) cost for smoothing and sensitivity estimation;
- developing methods for solving optimal control problems;
- establishing weaker assumptions ensuring uniform convergence results.
Other Topics

In standard SMC we only sample X_n at time n and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
Using MCMC

There is a whole literature on the design of such kernels, known as Markov chain Monte Carlo, e.g. the Metropolis-Hastings algorithm.

We cannot use these algorithms directly, as p(\theta \mid y_{1:n}) would need to be known up to a normalizing constant, but p_\theta(y_{1:n}) \equiv p(y_{1:n} \mid \theta) is unknown.

However, we can use a very simple Gibbs update. Indeed, note that if \theta^{(i)} \sim p(\theta \mid y_{1:n}) and X_{1:n}^{(i)} \sim p_{\theta^{(i)}}(x_{1:n} \mid y_{1:n}), then (\theta^{(i)}, X_{1:n}^{(i)}) \sim p(\theta, x_{1:n} \mid y_{1:n}), since

  p(x_{1:n} \mid y_{1:n}) = \int p(x_{1:n} \mid \theta, y_{1:n}) \, p(\theta \mid y_{1:n}) \, d\theta.
SMC with MCMC for Parameter Estimation

Given an approximation at time n-1,

  \hat{p}(\theta, x_{1:n-1} \mid y_{1:n-1}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(\theta_{n-1}^{(i)}, X_{1:n-1}^{(i)})}(\theta, x_{1:n-1}).

Sample X_n'^{(i)} \sim f_{\theta_{n-1}^{(i)}}(\cdot \mid X_{n-1}^{(i)}), set X_{1:n}'^{(i)} = (X_{1:n-1}^{(i)}, X_n'^{(i)}), and then approximate

  \hat{p}(\theta, x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)} \, \delta_{(\theta_{n-1}^{(i)}, X_{1:n}'^{(i)})}(\theta, x_{1:n}),  \quad  W_n^{(i)} \propto g_{\theta_{n-1}^{(i)}}(y_n \mid X_n'^{(i)}).

Resample X_{1:n}^{(i)} \sim \hat{p}(\cdot \mid y_{1:n}), then sample \theta_n^{(i)} \sim p(\theta \mid X_{1:n}^{(i)}, y_{1:n}) to obtain

  \hat{p}(\theta, x_{1:n} \mid y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(\theta_n^{(i)}, X_{1:n}^{(i)})}(\theta, x_{1:n}).
SMC with MCMC for Parameter Estimation

Consider the following model:

  X_{n+1} = \theta X_n + \sigma_v V_{n+1},  \quad  V_n \sim N(0, 1),
  Y_n = X_n + \sigma_w W_n,  \quad  W_n \sim N(0, 1),
  X_0 \sim N(0, 1).

We set the prior on \theta as \theta \sim U(-1, 1). In this case

  p(\theta \mid x_{1:n}, y_{1:n}) \propto N(\theta; m_n, \sigma_n^2) \, 1_{(-1,1)}(\theta),  \quad  \sigma_n^2 = \frac{\sigma_v^2}{\sum_{k=1}^{n} x_{k-1}^2},  \quad  m_n = \frac{\sum_{k=1}^{n} x_{k-1} x_k}{\sum_{k=1}^{n} x_{k-1}^2}.
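The Gibbs draw for theta can be sketched directly from these formulas. This is a minimal sketch under illustrative assumptions (sigma_v = 1, true theta = 0.5, a path of length 2000); the truncation to (-1, 1) is handled by simple rejection.

```python
import random

# Given a state path x_{0:n}, the full conditional of theta is
# N(m_n, sigma_n^2) truncated to (-1, 1), with the statistics below.
random.seed(4)
sigma_v, true_theta = 1.0, 0.5

xs = [random.gauss(0, 1)]                  # X_0 ~ N(0, 1)
for _ in range(2000):                      # simulate a path with theta = 0.5
    xs.append(true_theta * xs[-1] + sigma_v * random.gauss(0, 1))

sxx = sum(x * x for x in xs[:-1])                     # sum of x_{k-1}^2
sxy = sum(a * b for a, b in zip(xs[:-1], xs[1:]))     # sum of x_{k-1} x_k
m, s = sxy / sxx, (sigma_v**2 / sxx) ** 0.5           # m_n and sigma_n

theta = random.gauss(m, s)                 # rejection sampling on (-1, 1)
while not -1.0 < theta < 1.0:
    theta = random.gauss(m, s)

print(f"conditional mean m_n = {m:.2f}")
```

With a long path the conditional concentrates near the true value, which is exactly why the degeneracy of the path approximation (next slide) is so damaging: the theta-draws inherit whatever bias the degenerate path carries.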
SMC with MCMC for Parameter Estimation

We use SMC with the Gibbs step. The degeneracy problem remains.

[Figure: SMC estimate of E[\theta \mid y_{1:n}] as n increases (from A. Doucet's lecture notes), true value vs SMC-MCMC estimate; the parameter converges to the wrong value.]
SMC with MCMC for Parameter Estimation

The problem with this approach is that, although we move \theta according to p(\theta \mid x_{1:n}, y_{1:n}), we still rely implicitly on the SMC approximation of the joint distribution p(x_{1:n} \mid y_{1:n}). As n increases, the SMC approximation of p(x_{1:n} \mid y_{1:n}) deteriorates, and the algorithm cannot converge towards the right solution.

[Figure: the sufficient statistic \frac{1}{n} \sum_{k=1}^{n} E[X_k^2 \mid y_{1:n}] computed exactly through the Kalman filter (blue) vs the SMC approximation (red), for a fixed value of \theta.]
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Recursive MLE Parameter Estimation The log likelihood can be written as
Here we compute
Under regularity assumptions is an inhomogeneous Markov chain whish converges towards its invariant distribution
28
( )1 1 11
l ( ) log ( ) log |n
n n k kk
p p Yθ θθ minus=
= = sumY Y
( ) ( ) ( )1 1 1 1| | |k k k k k k kp Y g Y x p x dxθ θ θminus minus= intY Y
( ) 1 1 |n n n nX Y p xθ minusY
l ( )lim l( ) log ( | ) ( ) ( )n
ng y x dx dy d
n θ θ θθ θ micro λ micro
rarrinfin= = int int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Robbins- Monro Algorithm for MLE We can maximize l(q) by using the gradient and the Robbins-Monro
algorithm
where
We thus need to approximate
29
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
( ) ( )( ) ( )
1 1 1 1
1 1
( ) | |
| |
n n n n n n n
n n n n n
p Y g Y x p x dx
g Y x p x dx
θ θ θ
θ θ
minus minus
minus
nabla = nabla +
nabla
intint
|Y Y
Y
( ) 1 1|n np xθ minusnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Importance Sampling Estimation of Sensitivity The various proposed approximations are based on the identity
We thus can use a SMC approximation of the form
This is simple but inefficient as based on IS on spaces of increasing dimension and in addition it relies on being able to obtain a good approximation of
30
1 1 11 1 1 1 1 1 1
1 1 1
1 1 1 1 1 1 1 1
( )( ) ( )( )
log ( ) ( )
n nn n n n n
n n
n n n n n
pp Y p dp
p p d
θθ θ
θ
θ θ
minusminus minus minus
minus
minus minus minus
nablanabla =
= nabla
int
int
x |Y|Y x |Y xx |Y
x |Y x |Y x
( )( )
1 1 11
( ) ( )1 1 1
1( ) ( )
log ( )
in
Ni
n n n nXi
i in n n
p xN
p
θ
θ
α δ
α
minus=
minus
nabla =
= nabla
sum|Y x
X |Y
1 1 1( )n npθ minusx |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Degeneracy of the SMC Algorithm Score for linear Gaussian models exact (blue) vs SMC
approximation (red)
31
1( )npθnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Degeneracy of the SMC Algorithm Signed measure ) for linear Gaussian models exact
(red) vs SMC (bluegreen) Positive and negative particles are mixed
32
1( )n np xθnabla |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation
An alternative identity is the following
It suggests using an SMC approximation of the form
Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as
33
1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y
( )( )
1 1 11
( )( ) ( ) 1 1
1 1 ( )1 1
1( ) ( )
( )log ( )( )
in
Ni
n n n nXi
ii i n n
n n n in n
p xN
p Xp Xp X
θ
θθ
θ
β δ
β
minus=
minusminus
minus
nabla =
nabla= nabla =
sum|Y x
|Y|Y|Y
( ) ( ) ( ) ( )1 1 1 1 1 1 1
1
1( ) ( ) ( ) ( )N
i i i jn n n n n n n n n
jp X f X x p x dx f X X
Nθ θθ θminus minus minus minus minus=
prop = sumint|Y | |Y |
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation
The optimal filter satisfies
The derivatives satisfy the following
This way we obtain a simple recursion of as a function of and
34
( ) ( )
11
1
1 1 1 1 1 1
( )( )( )
( ) | | ( )
n nn n
n n n
n n n n n n n n n
xp xx dx
x g Y x f x x p x dx
θθ
θ
θ θ θ θ
ξξ
ξ minus minus minus minus
=
=
intint
|Y|Y|Y
|Y |Y
111 1
1 1
( )( )( ) ( )( ) ( )
n n nn nn n n n
n n n n n n
x dxxp x p xx dx x dx
θθθ θ
θ θ
ξξξ ξ
nablanablanabla = minus int
int int|Y|Y|Y |Y
|Y Y
1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC Approximation of the Sensitivity Sample
Compute
We have
35
( )
( )( ) ( )
( ) ( )1 1( ) ( )
( )( ) ( )
1( ) ( ) ( ) ( )
( ) ( ) ( )
1 1 1
( ) ( )| |
i ii in n n n
n ni in n n n
Nj
ni iji i i in n
n n n nN N Nj j j
n n nj j j
X Xq X Y q X Y
W W W
θ θ
θ θ
ξ ξα ρ
ρα ρβ
α α α
=
= = =
nabla= =
= = minussum
sum sum sum
Y Y
( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1
1~ | | |
Ni j j j j j
n n n n n n n n nj
X q Y q Y X W p Y Xθ θη ηminus minus minus=
= propsum
( )
( )
( )1
1
( ) ( )1
1
( ) ( )
( ) ( )
in
in
Ni
n n n nXi
Ni i
n n n n nXi
p x W x
p x W x
θ
θ
δ
β δ
=
=
=
nabla =
sum
sum
|Y
|Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Consider the following model
We have
We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)
36
1n n v n
n n w n
X X VY X W
φ σσminus= +
= +
v wθ φ σ σ=
1log ( )n np xθnabla |Y
1( )npθnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the
marginal (bottom)
37
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Consider the following model
We have We use SMC with N=1000 for batch and on-line estimation
For Batch Estimation
For online estimation
38
1
exp2
n n v n
nn n
X X VXY W
φ σ
β
minus= +
=
vθ φ σ β=
11 1log ( )nn n n npθθ θ γ
minusminus= + nabla Y
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar
81-85
39
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates
converge towards the true values
40
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Given data y1n inference relies on
We have seen that SMC are rather inefficient to sample from
so we look here at an MCMC approach For a given parameter value θ SMC can estimate both
It will be useful if we can use SMC within MCMC to sample from
41
( ) ( ) ( )
( ) ( ) ( )
1 1 1 1 1
1 1
| | |
|
n n n n n
n n
p p pwherep p p
θ
θ
θ θ
θ θ
=
=
x y y x y
y y
( ) ( )1 1 1|n n np and pθ θx y y
( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R
Statist SocB (2010) 72 Part 3 pp 269ndash342
( )1| np θ y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from
and
However it is impossible to sample from
We can sample from instead but convergence will be slow
Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability
42
( )1 1| n np θ x y( )1 1| n np θx y
( )1 1| n np θx y
( )1 1 1 1| k n k k np x θ minus +y x x
( )1 1| n np θx y ( )1 1 1~ |n n nqX x y
( ) ( )( ) ( )
1 1 1 1
1 1 1 1
| |
| m n
|i 1 n n n n
n n n n
p q
p p
θ
θ
X y X y
X y X y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution
We know that the SMC approximation degrades an
n increases but we also have under mixing assumptions
Thus things degrade linearly rather than exponentially
These results are useful for small C
43
( )1 1| n np θx y
( )1 1| n np θx y
( )( )1 1 1( ) | i
n n n TV
Cnaw pN
θminus leX x yL
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method
(CBMC) in Molecular simulation There are however significant differences
CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on
44
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
Bayesian Scientific Computing Spring 2013 (N Zabaras)
MCMC We use as proposal distribution and store
the estimate
It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables
The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate
Finally accept with probability Otherwise stay where you are
45
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
( )1 1 1~ | n n nNp θX x y
( )1 |nNp θy
1nX
( )1 |nNp θy1nX ( ) ( )
1
1
m|
in 1|nN
nN
pp
θθ
yy
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Consider the following model
We take the same prior proposal distribution for both CBMC and
SMC Acceptance rates for CBMC (left) and SMC (right) are shown
46
( )
( ) ( )
11 2
12
1
1 25 8cos(12 ) ~ 0152 1
~ 0001 ~ 052
kk k k k
k
kn k k
XX X k V VX
XY W W X
minusminus
minus
= + + ++
= +
N
N N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example We now consider the case when both variances are
unknown and given inverse Gamma (conditionally conjugate) vague priors
We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as
importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step
of invariance distribution with the same proposal
In both cases we update the parameters according to Both algorithms are run with the same computational effort
47
2 2v wσ σ
( )11000 11000 |p θx y
( )1 1 k k k kp x x x yminus +|
( )1500 1500| p θ x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example MH one at a time missed the mode and estimates
If X1n and θ are very correlated the MCMC algorithm is very
inefficient We will next discuss the Marginal Metropolis Hastings
21500
21500
2
| 137 ( )
| 99 ( )
10
v
v
v
MH
PF MCMC
σ
σ
σ
asymp asymp minus =
y
y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with
kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)
It can be easily shown that and
under weak assumptions
Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)
1 Given 119909119905minus1 generate 2 Compute
3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Step 1
Step 2
Step 3 With probability
( ) ( )
( ) ( )
1 1( 1)
1 1 1
( 1) ( 1)
~ | ( 1)
|
n ni
n n n
Given i i p
Sample q i
Run an SMC to obtain p p
θ
θθ
θ
θ θ θ
minusminus minus
minus
X y
y x y
( )1 1 1 ~ |n n nSample pθX x y
( ) ( ) ( ) ( ) ( ) ( )
1
1( 1)
( 1) |
( 1) | ( 1min 1
)n
ni
p p q i
p p i q iθ
θ
θ θ θ
θ θ θminus
minus minus minus
y
y
( ) ( ) ( ) ( )
1 1 1 1( )
1 1 1 1( ) ( 1)
( ) ( )
( ) ( ) ( 1) ( 1)
n n n ni
n n n ni i
Set i X i p X p
Otherwise i X i p i X i p
θ θ
θ θ
θ θ
θ θ minus
=
= minus minus
y y
y y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an
approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)
When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists
The higher N the better the performance of the algorithm N scales roughly linearly with n
Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)
Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model
with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
SMC with MCMC for Parameter Estimation

Given an approximation at time $n-1$,
$$\hat{p}(\theta, x_{1:n-1} \mid y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^{N} \delta_{(\theta_{n-1}^{(i)},\, X_{1:n-1}^{(i)})}(\theta, x_{1:n-1}),$$
sample $X_n^{(i)} \sim f_{\theta_{n-1}^{(i)}}(\cdot \mid X_{n-1}^{(i)})$, set $X_{1:n}^{(i)} = (X_{1:n-1}^{(i)}, X_n^{(i)})$, and approximate
$$\hat{p}(\theta, x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{(\theta_{n-1}^{(i)},\, X_{1:n}^{(i)})}(\theta, x_{1:n}), \qquad W_n^{(i)} \propto g_{\theta_{n-1}^{(i)}}(y_n \mid X_n^{(i)}).$$
Resample $\bar{X}_{1:n}^{(i)} \sim \hat{p}(x_{1:n} \mid y_{1:n})$, then sample $\theta_n^{(i)} \sim p(\theta \mid y_{1:n}, \bar{X}_{1:n}^{(i)})$ to obtain
$$\hat{p}(\theta, x_{1:n} \mid y_{1:n}) = \frac{1}{N}\sum_{i=1}^{N} \delta_{(\theta_{n}^{(i)},\, \bar{X}_{1:n}^{(i)})}(\theta, x_{1:n}).$$

24
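A minimal sketch of one such update, with the state transition sampler, observation log-density, and parameter move passed in as callbacks (all function names here are illustrative, not from the slides):

```python
import numpy as np

def smc_mcmc_step(theta, X, y_n, f_sample, g_logpdf, theta_move, rng):
    """One SMC update with an MCMC move on the parameter particles.

    theta : (N,) parameter particles at time n-1
    X     : (N, n-1) state trajectories up to time n-1
    """
    N = X.shape[0]
    # 1. propagate each trajectory with its own parameter value
    x_new = f_sample(theta, X[:, -1], rng)
    X = np.column_stack([X, x_new])
    # 2. weight by the likelihood of the new observation y_n
    logw = g_logpdf(theta, x_new, y_n)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # 3. resample trajectories jointly with their parameters
    idx = rng.choice(N, size=N, p=w)
    X, theta = X[idx], theta[idx]
    # 4. MCMC/Gibbs move: draw theta^(i) ~ p(theta | X_{1:n}^(i), y_{1:n})
    theta = theta_move(theta, X, rng)
    return theta, X
```

For a concrete model, `f_sample` and `g_logpdf` would implement the transition and observation densities, and `theta_move` the conditional draw of the parameter.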
SMC with MCMC for Parameter Estimation

Consider the following model:
$$X_{n+1} = \theta X_n + \sigma_v V_{n+1}, \quad V_n \overset{iid}{\sim} \mathcal{N}(0,1); \qquad Y_n = X_n + \sigma_w W_n, \quad W_n \overset{iid}{\sim} \mathcal{N}(0,1); \qquad X_0 \sim \mathcal{N}(0, \sigma^2).$$
We set the prior on $\theta$ as $\theta \sim \mathcal{U}(-1, 1)$. In this case
$$p(\theta \mid x_{1:n}, y_{1:n}) \propto \mathcal{N}(\theta;\, m_n, \sigma_n^2)\, 1_{(-1,1)}(\theta),$$
where
$$\sigma_n^2 = \sigma_v^2 \Big(\sum_{k=1}^{n-1} x_k^2\Big)^{-1}, \qquad m_n = \frac{\sum_{k=1}^{n-1} x_k x_{k+1}}{\sum_{k=1}^{n-1} x_k^2}.$$

25
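Under this model, the Gibbs draw of $\theta$ from its truncated-Gaussian conditional can be sketched as follows (rejection sampling on $(-1,1)$; names illustrative):

```python
import numpy as np

def sample_theta_posterior(x, sigma_v, rng):
    """Draw theta ~ p(theta | x_{1:n}) ∝ N(theta; m_n, s2_n) 1_(-1,1)(theta)
    for the AR(1) model X_{k+1} = theta X_k + sigma_v V_{k+1}."""
    s2 = sigma_v**2 / np.sum(x[:-1] ** 2)
    m = np.sum(x[:-1] * x[1:]) / np.sum(x[:-1] ** 2)
    while True:  # rejection step: truncate the Gaussian to (-1, 1)
        th = rng.normal(m, np.sqrt(s2))
        if -1.0 < th < 1.0:
            return th
```

With a long stationary trajectory the draw concentrates near the true autoregression coefficient.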
SMC with MCMC for Parameter Estimation

We use SMC with a Gibbs step; the degeneracy problem remains. The SMC estimate of $E[\theta \mid y_{1:n}]$ is shown as $n$ increases (from A. Doucet's lecture notes): the parameter converges to the wrong value.

[Figure: SMC/MCMC estimate of $E[\theta \mid y_{1:n}]$ vs. the true value.]

26
SMC with MCMC for Parameter Estimation

The problem with this approach is that although we move $\theta$ according to $p(\theta \mid x_{1:n}, y_{1:n})$, we are still relying implicitly on the SMC approximation of the joint distribution $p(x_{1:n} \mid y_{1:n})$. As $n$ increases, the SMC approximation of $p(x_{1:n} \mid y_{1:n})$ deteriorates, and the algorithm cannot converge towards the right solution.

[Figure: the sufficient statistic $\frac{1}{n}\sum_{k=1}^{n} E[X_k^2 \mid y_{1:n}]$ computed exactly through the Kalman filter (blue) vs. the SMC approximation (red) for a fixed value of $\theta$.]

27
Recursive MLE Parameter Estimation

The log-likelihood can be written as
$$\ell_n(\theta) = \log p_\theta(Y_{1:n}) = \sum_{k=1}^{n} \log p_\theta(Y_k \mid Y_{1:k-1}).$$
Here we compute
$$p_\theta(Y_k \mid Y_{1:k-1}) = \int g_\theta(Y_k \mid x_k)\, p_\theta(x_k \mid Y_{1:k-1})\, dx_k.$$
Under regularity assumptions, $\{X_n, Y_n, p_\theta(x_n \mid Y_{1:n-1})\}$ is an inhomogeneous Markov chain which converges towards its invariant distribution $\mu_\theta$, and
$$\lim_{n\to\infty} \frac{\ell_n(\theta)}{n} = \ell(\theta) = \int \log\Big( \int g_\theta(y \mid x)\, \lambda(dx) \Big)\, \mu_\theta(dy, d\lambda).$$

28
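The predictive factorization of the log-likelihood is exactly what a bootstrap particle filter estimates term by term. A minimal sketch for an assumed AR(1)-plus-noise model (all names illustrative):

```python
import numpy as np

def pf_loglik(y, theta, sigma_v, sigma_w, N, rng):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n}) via the
    predictive factorization sum_k log p_theta(y_k | y_{1:k-1}),
    for X_n = theta X_{n-1} + sigma_v V_n, Y_n = X_n + sigma_w W_n."""
    x = rng.normal(0.0, 1.0, size=N)          # X_0 ~ N(0, 1) (assumed)
    ll = 0.0
    for y_k in y:
        x = theta * x + sigma_v * rng.normal(size=N)   # propagate
        logw = (-0.5 * ((y_k - x) / sigma_w) ** 2
                - np.log(sigma_w * np.sqrt(2 * np.pi)))
        m = logw.max()                          # stabilized log-sum-exp
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())              # log p_hat(y_k | y_{1:k-1})
        x = x[rng.choice(N, size=N, p=w / w.sum())]    # resample
    return ll
```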
Robbins-Monro Algorithm for MLE

We can maximize $\ell(\theta)$ by using the gradient and the Robbins-Monro algorithm
$$\theta_n = \theta_{n-1} + \gamma_n \nabla \log p_{\theta_{n-1}}(Y_n \mid Y_{1:n-1}),$$
where
$$\nabla p_\theta(Y_n \mid Y_{1:n-1}) = \int \nabla g_\theta(Y_n \mid x_n)\, p_\theta(x_n \mid Y_{1:n-1})\, dx_n + \int g_\theta(Y_n \mid x_n)\, \nabla p_\theta(x_n \mid Y_{1:n-1})\, dx_n.$$
We thus need to approximate $\nabla p_\theta(x_n \mid Y_{1:n-1})$.

29
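The Robbins-Monro recursion itself is easy to sketch on a toy problem where a noisy score is available directly; here the score of $\mathcal{N}(y; \theta, 1)$ with a stream $y \sim \mathcal{N}(2, 1)$, so the maximizer is $\theta = 2$ (a toy stand-in for the filter-based score, purely illustrative):

```python
import numpy as np

def robbins_monro(grad, theta0, n_iter, rng, gamma0=1.0, decay=0.6):
    """Stochastic approximation theta_n = theta_{n-1} + gamma_n * grad_n,
    with gamma_n = gamma0 / n**decay (so sum gamma_n = inf, sum gamma_n^2 < inf)."""
    theta = theta0
    for n in range(1, n_iter + 1):
        theta = theta + gamma0 / n**decay * grad(theta, rng)
    return theta

rng = np.random.default_rng(0)
# toy noisy score: d/dtheta log N(y; theta, 1) = y - theta, with y ~ N(2, 1)
score = lambda th, rng: rng.normal(2.0, 1.0) - th
theta_hat = robbins_monro(score, 0.0, 5000, rng)
```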
Importance Sampling Estimation of Sensitivity

The various proposed approximations are based on the identity
$$\nabla p_\theta(x_{1:n} \mid Y_{1:n}) = \frac{\nabla p_\theta(x_{1:n} \mid Y_{1:n})}{p_\theta(x_{1:n} \mid Y_{1:n})}\, p_\theta(x_{1:n} \mid Y_{1:n}) = \nabla \log p_\theta(x_{1:n} \mid Y_{1:n})\; p_\theta(x_{1:n} \mid Y_{1:n}).$$
We thus can use an SMC approximation of the form
$$\nabla \hat{p}_\theta(x_{1:n} \mid Y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \alpha_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad \alpha_n^{(i)} = \nabla \log p_\theta\big(X_{1:n}^{(i)} \mid Y_{1:n}\big).$$
This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of $p_\theta(x_{1:n} \mid Y_{1:n})$.

30
Degeneracy of the SMC Algorithm

Score $\nabla p_\theta(Y_{1:n})$ for linear Gaussian models: exact (blue) vs. SMC approximation (red).

31
Degeneracy of the SMC Algorithm

Signed measure $\nabla p_\theta(x_n \mid Y_{1:n})$ for linear Gaussian models: exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.

32
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:
$$\nabla p_\theta(x_n \mid Y_{1:n}) = \nabla \log p_\theta(x_n \mid Y_{1:n})\; p_\theta(x_n \mid Y_{1:n}).$$
It suggests using an SMC approximation of the form
$$\nabla \hat{p}_\theta(x_n \mid Y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \beta_n^{(i)} = \nabla \log \hat{p}_\theta\big(X_n^{(i)} \mid Y_{1:n}\big) = \frac{\nabla \hat{p}_\theta\big(X_n^{(i)} \mid Y_{1:n}\big)}{\hat{p}_\theta\big(X_n^{(i)} \mid Y_{1:n}\big)}.$$
Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is $O(N^2)$, as
$$\hat{p}_\theta\big(X_n^{(i)} \mid Y_{1:n-1}\big) \propto \int f_\theta\big(X_n^{(i)} \mid x_{n-1}\big)\, \hat{p}_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1} = \frac{1}{N} \sum_{j=1}^{N} f_\theta\big(X_n^{(i)} \mid X_{n-1}^{(j)}\big).$$

33
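The $O(N^2)$ cost comes from the pairwise sum in the last display. A sketch for an assumed Gaussian AR(1) transition (names illustrative):

```python
import numpy as np

def marginal_predictive(x_new, x_prev, theta, sigma_v):
    """O(N^2) marginal predictive density for an AR(1) transition
    f_theta(x' | x) = N(x'; theta * x, sigma_v^2):
    p_hat(x_new[i]) = (1/N) sum_j f_theta(x_new[i] | x_prev[j])."""
    d = x_new[:, None] - theta * x_prev[None, :]          # (N, N) pairwise differences
    f = np.exp(-0.5 * (d / sigma_v) ** 2) / (sigma_v * np.sqrt(2 * np.pi))
    return f.mean(axis=1)
```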
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies
$$p_\theta(x_n \mid Y_{1:n}) = \frac{\xi_n^\theta(x_n)}{\int \xi_n^\theta(x_n)\, dx_n}, \qquad \xi_n^\theta(x_n) = g_\theta(Y_n \mid x_n) \int f_\theta(x_n \mid x_{n-1})\, p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1}.$$
The derivatives satisfy the following:
$$\nabla p_\theta(x_n \mid Y_{1:n}) = \frac{\nabla \xi_n^\theta(x_n)}{\int \xi_n^\theta(x_n)\, dx_n} - p_\theta(x_n \mid Y_{1:n})\, \frac{\int \nabla \xi_n^\theta(x_n)\, dx_n}{\int \xi_n^\theta(x_n)\, dx_n}.$$
This way we obtain a simple recursion for $\nabla p_\theta(x_n \mid Y_{1:n})$ as a function of $\nabla p_\theta(x_{n-1} \mid Y_{1:n-1})$ and $p_\theta(x_{n-1} \mid Y_{1:n-1})$.

34
SMC Approximation of the Sensitivity

Sample
$$X_n^{(i)} \sim q(\cdot \mid Y_n) = \sum_{j=1}^{N} \eta_{n-1}^{(j)}\, q\big(\cdot \mid Y_n, X_{n-1}^{(j)}\big), \qquad \eta_{n-1}^{(j)} \propto W_{n-1}^{(j)}\, \hat{p}_\theta\big(Y_n \mid X_{n-1}^{(j)}\big).$$
Compute
$$\alpha_n^{(i)} = \frac{\xi_n^\theta\big(X_n^{(i)}\big)}{q\big(X_n^{(i)} \mid Y_n\big)}, \qquad \rho_n^{(i)} = \frac{\nabla \xi_n^\theta\big(X_n^{(i)}\big)}{q\big(X_n^{(i)} \mid Y_n\big)},$$
$$W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j} \alpha_n^{(j)}}, \qquad \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j} \alpha_n^{(j)}} - W_n^{(i)}\, \frac{\sum_{j} \rho_n^{(j)}}{\sum_{j} \alpha_n^{(j)}}.$$
We have
$$\hat{p}_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \nabla \hat{p}_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n).$$

35
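Given pointwise evaluations of $\xi$, $\nabla\xi$ and the proposal density, the weights $W$ and $\beta$ take only a few lines; note that by construction $\sum_i W^{(i)} = 1$ and $\sum_i \beta^{(i)} = 0$ (the derivative of a probability measure integrates to zero). A sketch with the inputs assumed precomputed:

```python
import numpy as np

def sensitivity_weights(xi, grad_xi, q):
    """Given xi_n(X^i), grad xi_n(X^i), and proposal densities q(X^i | Y_n),
    return (W, beta) as defined on the slide."""
    alpha = xi / q
    rho = grad_xi / q
    S = alpha.sum()
    W = alpha / S
    beta = rho / S - W * (rho.sum() / S)   # sums to zero by construction
    return W, beta
```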
Example: Linear Gaussian Model

Consider the following model:
$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = X_n + \sigma_w W_n, \qquad \theta = (\phi, \sigma_v, \sigma_w).$$
We compare next the SMC estimates of $\nabla \log p_\theta(x_n \mid Y_{1:n})$ and $\nabla p_\theta(Y_{1:n})$ for $N = 1000$ to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).

36
Example: Linear Gaussian Model

Marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).

37
Example: Stochastic Volatility Model

Consider the following model:
$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = \beta \exp(X_n / 2)\, W_n, \qquad \theta = (\phi, \sigma_v, \beta).$$
We use SMC with $N = 1000$ for batch and on-line estimation. For batch estimation,
$$\theta_{k+1} = \theta_k + \gamma_{k+1} \nabla \log p_{\theta_k}(Y_{1:n});$$
for online estimation,
$$\theta_n = \theta_{n-1} + \gamma_n \nabla \log p_{\theta_{n-1}}(Y_n \mid Y_{1:n-1}).$$

38
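The stochastic volatility model above is straightforward to simulate, which is how the synthetic data for the recursive-estimation experiment would be generated (the stationary initialization is an assumption, not from the slides):

```python
import numpy as np

def simulate_sv(n, phi, sigma_v, beta, rng):
    """Simulate the stochastic volatility model
    X_n = phi X_{n-1} + sigma_v V_n,  Y_n = beta exp(X_n / 2) W_n."""
    x = np.empty(n)
    # start the log-volatility at its stationary distribution (assumed)
    x[0] = rng.normal(0.0, sigma_v / np.sqrt(1 - phi**2))
    for k in range(1, n):
        x[k] = phi * x[k - 1] + sigma_v * rng.normal()
    y = beta * np.exp(x / 2) * rng.normal(size=n)
    return x, y
```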
Example: Stochastic Volatility Model

Batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.

39
Example: Stochastic Volatility Model

Recursive parameter estimation for the SV model: the estimates converge towards the true values.

40
SMC with MCMC for Parameter Estimation

Given data $y_{1:n}$, inference relies on
$$p(\theta, x_{1:n} \mid y_{1:n}) = p_\theta(x_{1:n} \mid y_{1:n})\, p(\theta \mid y_{1:n}), \qquad \text{where} \qquad p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).$$
We have seen that SMC is rather inefficient for sampling from $p(\theta \mid y_{1:n})$, so we look here at an MCMC approach. For a given parameter value $\theta$, SMC can estimate both $p_\theta(x_{1:n} \mid y_{1:n})$ and $p_\theta(y_{1:n})$. It will be useful if we can use SMC within MCMC to sample from $p(\theta, x_{1:n} \mid y_{1:n})$.

41

C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from $p(\theta \mid y_{1:n}, x_{1:n})$ and $p(x_{1:n} \mid y_{1:n}, \theta)$. However, it is impossible to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$ directly. We can sample from the one-at-a-time conditionals $p(x_k \mid y_{1:n}, x_{1:k-1}, x_{k+1:n}, \theta)$ instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$; i.e., sample $X_{1:n}^* \sim q(x_{1:n} \mid y_{1:n})$ and accept with probability
$$\min\left(1,\; \frac{p\big(X_{1:n}^* \mid y_{1:n}, \theta\big)\, q\big(X_{1:n} \mid y_{1:n}\big)}{p\big(X_{1:n} \mid y_{1:n}, \theta\big)\, q\big(X_{1:n}^* \mid y_{1:n}\big)}\right).$$

42
Gibbs Sampling Strategy

We will use the output of an SMC method approximating $p(x_{1:n} \mid y_{1:n}, \theta)$ as a proposal distribution. We know that the SMC approximation degrades as $n$ increases, but under mixing assumptions we also have
$$\Big\| \mathcal{L}\big(X_{1:n}^{(i)}\big) - p(x_{1:n} \mid y_{1:n}, \theta) \Big\|_{TV} \le \frac{C\, n}{N}.$$
Thus things degrade linearly rather than exponentially in $n$. These results are useful for small $C$.

43
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation; there are, however, significant differences. CBMC looks like an SMC method, but at each step only one particle is selected, and it gives $N$ offspring. Thus if you have made the wrong selection, you cannot recover later on.

44

D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study the structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC

We use $\hat{p}_N(x_{1:n} \mid y_{1:n}, \theta)$ as a proposal distribution and store the estimate $\hat{p}_N(y_{1:n} \mid \theta)$. It is impossible to compute analytically the unconditional law of a particle $X_{1:n}$, as it would require integrating out $(N-1)\,n$ variables.

The solution, inspired by CBMC, is as follows. For the current state $X_{1:n}$ of the Markov chain, run a virtual SMC method with only $N-1$ free particles; ensure that at each time step $k$ the current state $X_k$ is deterministically selected, and compute the associated estimate $\hat{p}_N(y_{1:n} \mid \theta)$. Finally, accept the proposed $X_{1:n}^*$ with probability
$$\min\left(1,\; \frac{\hat{p}_N^{*}(y_{1:n} \mid \theta)}{\hat{p}_N(y_{1:n} \mid \theta)}\right);$$
otherwise stay where you are.

45
Example

Consider the following model:
$$X_k = \frac{X_{k-1}}{2} + \frac{25 X_{k-1}}{1 + X_{k-1}^2} + 8 \cos(1.2 k) + V_k, \qquad Y_k = \frac{X_k^2}{20} + W_k,$$
with $V_k \sim \mathcal{N}(0, \sigma_v^2)$ and $W_k \sim \mathcal{N}(0, \sigma_w^2)$.

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.

46
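A minimal simulator for this nonlinear benchmark model (noise standard deviations and the initial distribution are left as free parameters, since the exact values are not legible in the transcript):

```python
import numpy as np

def simulate_benchmark(n, sigma_v, sigma_w, rng):
    """Simulate X_k = X_{k-1}/2 + 25 X_{k-1}/(1 + X_{k-1}^2) + 8 cos(1.2 k) + V_k,
    Y_k = X_k^2 / 20 + W_k, with Gaussian noises."""
    x = np.empty(n)
    y = np.empty(n)
    x[0] = rng.normal(0.0, 1.0)   # illustrative initial distribution (assumed)
    for k in range(n):
        if k > 0:
            x[k] = (x[k - 1] / 2 + 25 * x[k - 1] / (1 + x[k - 1] ** 2)
                    + 8 * np.cos(1.2 * k) + sigma_v * rng.normal())
        y[k] = x[k] ** 2 / 20 + sigma_w * rng.normal()
    return x, y
```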
Example

We now consider the case where both variances $\sigma_v^2, \sigma_w^2$ are unknown and given inverse-Gamma (conditionally conjugate) vague priors. We sample from $p(\theta, x_{1:1000} \mid y_{1:1000})$ using two strategies: (i) the MCMC algorithm with $N = 1000$ and the local EKF proposal as importance distribution, with average acceptance rate 0.43; and (ii) an algorithm updating the components one at a time using an MH step of invariant distribution $p(x_k \mid x_{k-1}, x_{k+1}, y_k)$ with the same proposal.

In both cases we update the parameters according to $p(\theta \mid x_{1:500}, y_{1:500})$. Both algorithms are run with the same computational effort.

47
Example

MH one-at-a-time missed the mode, and the estimates are
$$E\big[\sigma_v^2 \mid y_{1:500}\big] \approx 13.7 \ (\text{MH one-at-a-time}), \qquad E\big[\sigma_v^2 \mid y_{1:500}\big] \approx 9.9 \ (\text{PF-MCMC}), \qquad \sigma_v^2 = 10 \ (\text{true value}).$$
If $X_{1:n}$ and $\theta$ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the marginal Metropolis-Hastings algorithm.
Review of Metropolis-Hastings Algorithm

As a review of MH: the proposal is a Markov chain with kernel density $q(x, y) = q(y \mid x)$, and the target distribution is $\pi(x)$. It can be easily shown that $\pi$ is invariant for the transition kernel,
$$\pi(y) = \int \pi(x)\, K(x, y)\, dx,$$
and that $X^{(i)} \sim \pi(x)$ as $i \to \infty$ under weak assumptions.

Algorithm (Generic Metropolis-Hastings Sampler). Initialization: choose an arbitrary starting value $x^{(0)}$. Iteration $t$ ($t \ge 1$):
1. Given $x^{(t-1)}$, generate $\tilde{x} \sim q\big(x^{(t-1)}, x\big)$.
2. Compute
$$\rho\big(x^{(t-1)}, \tilde{x}\big) = \min\left(1,\; \frac{\pi(\tilde{x})\, q\big(\tilde{x}, x^{(t-1)}\big)}{\pi\big(x^{(t-1)}\big)\, q\big(x^{(t-1)}, \tilde{x}\big)}\right).$$
3. With probability $\rho\big(x^{(t-1)}, \tilde{x}\big)$, accept $\tilde{x}$ and set $x^{(t)} = \tilde{x}$; otherwise reject $\tilde{x}$ and set $x^{(t)} = x^{(t-1)}$.
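The generic iteration above, specialized to a symmetric Gaussian random-walk proposal (so the $q$ terms cancel in the ratio), can be sketched as follows; the $\mathcal{N}(3, 1)$ target is purely illustrative:

```python
import numpy as np

def metropolis_hastings(log_pi, x0, n_iter, step, rng):
    """Random-walk MH targeting pi, given through its log-density log_pi."""
    x = x0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        x_prop = x + step * rng.normal()
        # accept with prob min(1, pi(x')/pi(x)); compare in log space for stability
        if np.log(rng.uniform()) < log_pi(x_prop) - log_pi(x):
            x = x_prop
        chain[t] = x
    return chain

# illustrative target: N(3, 1), unnormalized log-density
chain = metropolis_hastings(lambda x: -0.5 * (x - 3.0) ** 2,
                            0.0, 20000, 1.0, np.random.default_rng(3))
```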
Marginal Metropolis-Hastings Algorithm

Consider the target
$$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}).$$
We use the following proposal distribution:
$$q\big((\theta^*, x_{1:n}^*) \mid (\theta, x_{1:n})\big) = q(\theta^* \mid \theta)\, p_{\theta^*}\big(x_{1:n}^* \mid y_{1:n}\big).$$
Then the acceptance probability becomes
$$\min\left(1,\; \frac{p\big(\theta^*, x_{1:n}^* \mid y_{1:n}\big)\, q\big((\theta, x_{1:n}) \mid (\theta^*, x_{1:n}^*)\big)}{p\big(\theta, x_{1:n} \mid y_{1:n}\big)\, q\big((\theta^*, x_{1:n}^*) \mid (\theta, x_{1:n})\big)}\right) = \min\left(1,\; \frac{p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta \mid \theta^*)}{p_\theta(y_{1:n})\, p(\theta)\, q(\theta^* \mid \theta)}\right).$$
We will use SMC approximations to compute $p_\theta(y_{1:n})$ and $p_\theta(x_{1:n} \mid y_{1:n})$.
Marginal Metropolis-Hastings Algorithm

Step 1: Given $\theta^{(i-1)}$, $X_{1:n}^{(i-1)}$ and $\hat{p}_{\theta^{(i-1)}}(y_{1:n})$, sample $\theta^* \sim q\big(\theta \mid \theta^{(i-1)}\big)$ and run an SMC algorithm to obtain $\hat{p}_{\theta^*}(x_{1:n} \mid y_{1:n})$ and $\hat{p}_{\theta^*}(y_{1:n})$.

Step 2: Sample $X_{1:n}^* \sim \hat{p}_{\theta^*}(x_{1:n} \mid y_{1:n})$.

Step 3: With probability
$$\min\left(1,\; \frac{\hat{p}_{\theta^*}(y_{1:n})\, p(\theta^*)\, q\big(\theta^{(i-1)} \mid \theta^*\big)}{\hat{p}_{\theta^{(i-1)}}(y_{1:n})\, p\big(\theta^{(i-1)}\big)\, q\big(\theta^* \mid \theta^{(i-1)}\big)}\right),$$
set $\theta^{(i)} = \theta^*$, $X_{1:n}^{(i)} = X_{1:n}^*$, $\hat{p}_{\theta^{(i)}}(y_{1:n}) = \hat{p}_{\theta^*}(y_{1:n})$; otherwise set $\theta^{(i)} = \theta^{(i-1)}$, $X_{1:n}^{(i)} = X_{1:n}^{(i-1)}$, $\hat{p}_{\theta^{(i)}}(y_{1:n}) = \hat{p}_{\theta^{(i-1)}}(y_{1:n})$.
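The three steps above amount to the particle marginal MH (PMMH) sampler: the intractable likelihood in the MH ratio is replaced by its particle-filter estimate. A compact sketch for a scalar AR(1)-plus-noise model with unit noise variances, a flat $\mathcal{U}(-1,1)$ prior, and a symmetric random-walk proposal (all assumed for illustration; the state trajectory draw of Step 2 is omitted for brevity):

```python
import numpy as np

def pf_loglik_ar1(y, theta, N, rng):
    """Bootstrap PF estimate of log p_theta(y_{1:n}) for
    X_n = theta X_{n-1} + V_n, Y_n = X_n + W_n, with unit-variance noises."""
    x = rng.normal(size=N)
    ll = 0.0
    for y_k in y:
        x = theta * x + rng.normal(size=N)
        logw = -0.5 * (y_k - x) ** 2
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean()) - 0.5 * np.log(2.0 * np.pi)
        x = x[rng.choice(N, size=N, p=w / w.sum())]
    return ll

def pmmh(y, n_iter, N, step, rng):
    """Particle marginal MH on theta: random-walk proposal, U(-1,1) prior,
    PF estimate of the likelihood in the acceptance ratio."""
    theta, ll = 0.0, pf_loglik_ar1(y, 0.0, N, rng)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        th_prop = theta + step * rng.normal()
        if abs(th_prop) < 1.0:                        # stay inside the prior support
            ll_prop = pf_loglik_ar1(y, th_prop, N, rng)
            if np.log(rng.uniform()) < ll_prop - ll:  # flat prior, symmetric proposal
                theta, ll = th_prop, ll_prop
        chain[i] = theta
    return chain
```

Each proposed parameter requires a full particle-filter run, which is why the cost of the method is dominated by Step 1.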
Marginal Metropolis-Hastings Algorithm

This algorithm (without sampling $X_{1:n}$) was proposed as an approximate MCMC algorithm to sample from $p(\theta \mid y_{1:n})$ in (Fernandez-Villaverde & Rubio-Ramirez, 2007). When $N \ge 1$, the algorithm admits exactly $p(\theta, x_{1:n} \mid y_{1:n})$ as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher $N$, the better the performance of the algorithm; $N$ scales roughly linearly with $n$. The method is useful when $X_n$ is moderate-dimensional and $\theta$ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, 2007, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
Ionides, E. L., Breto, C. and King, A. A. (2006), Inference for nonlinear dynamical systems, Proceedings of the Nat. Acad. of Sciences, 103, 18438-18443.
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation
We use SMC with a Gibbs step; the degeneracy problem remains. The SMC estimate of $\mathbb{E}[\theta \mid y_{1:n}]$ is shown as $n$ increases (from A. Doucet's lecture notes): the parameter converges to the wrong value. (Figure: SMC/MCMC estimate vs. true value.)

SMC with MCMC for Parameter Estimation
The problem with this approach is that, although we move $\theta$ according to $p(\theta \mid x_{1:n}, y_{1:n})$, we are still relying implicitly on the SMC approximation of the joint distribution $p(x_{1:n} \mid y_{1:n})$.
As $n$ increases, the SMC approximation of $p(x_{1:n} \mid y_{1:n})$ deteriorates and the algorithm cannot converge towards the right solution. (Figure: sufficient statistics, e.g. $\mathbb{E}\big[\sum_{k=1}^{n} X_k^2 \,\big|\, y_{1:n}\big]$, computed exactly through the Kalman filter (blue) vs. the SMC approximation (red) for a fixed value of $\theta$.)
Recursive MLE Parameter Estimation
The log-likelihood can be written as
$$\ell_n(\theta) = \log p_\theta(Y_{1:n}) = \sum_{k=1}^{n} \log p_\theta(Y_k \mid Y_{1:k-1})$$
Here we compute
$$p_\theta(Y_k \mid Y_{1:k-1}) = \int g_\theta(Y_k \mid x_k)\, p_\theta(x_k \mid Y_{1:k-1})\, dx_k$$
Under regularity assumptions, $\left(X_n, Y_n, p_\theta(x_n \mid Y_{1:n-1})\right)$ is an inhomogeneous Markov chain which converges towards its invariant distribution $\mu_\theta$, so that
$$\lim_{n \to \infty} \frac{\ell_n(\theta)}{n} = \ell(\theta) = \iint \log g_\theta(y \mid x)\, \mu_\theta(dx, dy)$$
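The predictive decomposition above is exactly what a particle filter estimates term by term. A minimal sketch for a linear Gaussian model (the model, function name, and parameter values are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def bootstrap_pf_loglik(y, phi, sigma_v, sigma_w, N=500, rng=None):
    """Estimate l_n(theta) = sum_k log p_theta(y_k | y_{1:k-1}) for
    X_k = phi X_{k-1} + sigma_v V_k,  Y_k = X_k + sigma_w W_k."""
    rng = np.random.default_rng(rng)
    x = rng.normal(0.0, sigma_v, size=N)       # particles from the initial law
    loglik = 0.0
    for yk in y:
        x = phi * x + sigma_v * rng.standard_normal(N)   # propagate through f_theta
        # log g_theta(y_k | x_k) for Gaussian observation noise
        logw = -0.5 * ((yk - x) / sigma_w) ** 2 - np.log(sigma_w * np.sqrt(2 * np.pi))
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())         # log p_hat(y_k | y_{1:k-1})
        x = rng.choice(x, size=N, p=w / w.sum())         # multinomial resampling
    return loglik
```

Summing the per-step log predictive estimates gives an (biased-in-log, unbiased-in-likelihood) estimate of $\ell_n(\theta)$.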
Robbins-Monro Algorithm for MLE
We can maximize $\ell(\theta)$ by using the gradient and the Robbins-Monro algorithm
$$\theta_n = \theta_{n-1} + \gamma_n \left. \nabla \log p_\theta(Y_n \mid Y_{1:n-1}) \right|_{\theta = \theta_{n-1}}$$
where
$$\nabla p_\theta(Y_n \mid Y_{1:n-1}) = \int \nabla g_\theta(Y_n \mid x_n)\, p_\theta(x_n \mid Y_{1:n-1})\, dx_n + \int g_\theta(Y_n \mid x_n)\, \nabla p_\theta(x_n \mid Y_{1:n-1})\, dx_n$$
We thus need to approximate $\nabla p_\theta(x_n \mid Y_{1:n-1})$.
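As a toy illustration of the Robbins-Monro recursion itself (not of the SMC gradient approximation discussed next): maximizing the likelihood of a Gaussian with unknown mean from a stream of observations, with step sizes $\gamma_n = 1/n$. All values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 2.5
theta = 0.0                  # initial guess
for n, y in enumerate(rng.normal(true_mean, 1.0, size=5000), start=1):
    gamma = 1.0 / n          # Robbins-Monro steps: sum diverges, sum of squares converges
    grad = y - theta         # d/dtheta log N(y; theta, 1)
    theta += gamma * grad
print(theta)                 # converges towards true_mean
```

With this choice of $\gamma_n$ the recursion reproduces the running sample mean, which is the MLE here.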
Importance Sampling Estimation of Sensitivity
The various proposed approximations are based on the identity
$$\nabla p_\theta(x_{1:n} \mid Y_{1:n}) = \nabla \log p_\theta(x_{1:n} \mid Y_{1:n})\; p_\theta(x_{1:n} \mid Y_{1:n})$$
We can thus use an SMC approximation of the form
$$\widehat{\nabla p_\theta}(x_{1:n} \mid Y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \alpha_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad \alpha_n^{(i)} = \nabla \log p_\theta\!\left(X_{1:n}^{(i)} \mid Y_{1:n}\right)$$
This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of $p_\theta(x_{1:n} \mid Y_{1:n})$.
Degeneracy of the SMC Algorithm
Score $\nabla p_\theta(Y_{1:n})$ for linear Gaussian models: exact (blue) vs. SMC approximation (red).

Degeneracy of the SMC Algorithm
Signed measure $\nabla p_\theta(x_n \mid Y_{1:n})$ for linear Gaussian models: exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.
Marginal Importance Sampling for Sensitivity Estimation
An alternative identity is the following:
$$\nabla p_\theta(x_n \mid Y_{1:n}) = \nabla \log p_\theta(x_n \mid Y_{1:n})\; p_\theta(x_n \mid Y_{1:n})$$
It suggests using an SMC approximation of the form
$$\widehat{\nabla p_\theta}(x_n \mid Y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \beta_n^{(i)} = \nabla \log p_\theta\!\left(X_n^{(i)} \mid Y_{1:n}\right) = \frac{\nabla p_\theta\!\left(X_n^{(i)} \mid Y_{1:n}\right)}{p_\theta\!\left(X_n^{(i)} \mid Y_{1:n}\right)}$$
Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is $O(N^2)$, as
$$p_\theta\!\left(X_n^{(i)} \mid Y_{1:n-1}\right) = \int f_\theta\!\left(X_n^{(i)} \mid x_{n-1}\right) p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1} \approx \frac{1}{N} \sum_{j=1}^{N} f_\theta\!\left(X_n^{(i)} \mid X_{n-1}^{(j)}\right)$$
Marginal Importance Sampling for Sensitivity Estimation
The optimal filter satisfies
$$p_\theta(x_n \mid Y_{1:n}) = \frac{\xi_n^\theta(x_n)}{\int \xi_n^\theta(x_n)\, dx_n}, \qquad \xi_n^\theta(x_n) = g_\theta(Y_n \mid x_n) \int f_\theta(x_n \mid x_{n-1})\, p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1}$$
The derivatives satisfy the following:
$$\nabla p_\theta(x_n \mid Y_{1:n}) = \frac{\nabla \xi_n^\theta(x_n)}{\int \xi_n^\theta(x_n)\, dx_n} - p_\theta(x_n \mid Y_{1:n})\, \frac{\int \nabla \xi_n^\theta(x_n)\, dx_n}{\int \xi_n^\theta(x_n)\, dx_n}$$
This way we obtain a simple recursion for $\nabla p_\theta(x_n \mid Y_{1:n})$ as a function of $\nabla p_\theta(x_{n-1} \mid Y_{1:n-1})$ and $p_\theta(x_{n-1} \mid Y_{1:n-1})$.
SMC Approximation of the Sensitivity
Sample
$$X_n^{(i)} \sim \eta_n(\cdot), \qquad \eta_n(x_n) = \frac{1}{N} \sum_{j=1}^{N} q_\theta\!\left(x_n \mid Y_n, X_{n-1}^{(j)}\right)$$
Compute
$$\alpha_n^{(i)} = \frac{\xi_n^\theta\!\left(X_n^{(i)}\right)}{\eta_n\!\left(X_n^{(i)}\right)}, \qquad \rho_n^{(i)} = \frac{\nabla \xi_n^\theta\!\left(X_n^{(i)}\right)}{\eta_n\!\left(X_n^{(i)}\right)}, \qquad W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j=1}^{N} \alpha_n^{(j)}}, \qquad \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j=1}^{N} \alpha_n^{(j)}} - W_n^{(i)}\, \frac{\sum_{j=1}^{N} \rho_n^{(j)}}{\sum_{j=1}^{N} \alpha_n^{(j)}}$$
We then have
$$\widehat{p}_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \widehat{\nabla p}_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n)$$
Note that $\sum_i \beta_n^{(i)} = 0$, as required of the derivative of a probability measure.
Example: Linear Gaussian Model
Consider the following model:
$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = X_n + \sigma_w W_n, \qquad \theta = (\phi, \sigma_v, \sigma_w)$$
We compare next the SMC estimates of $\nabla \log p_\theta(x_n \mid Y_{1:n})$ and $\nabla p_\theta(Y_{1:n})$ for $N = 1000$ to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).
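For this linear Gaussian model the exact reference quantities are available in closed form via the Kalman filter. A minimal sketch computing the exact log-likelihood (the initialization $X_1 \sim \mathcal{N}(0, \sigma_v^2)$ and all parameter values are illustrative assumptions):

```python
import numpy as np

def kalman_loglik(y, phi, sigma_v, sigma_w):
    """Exact log p_theta(y_{1:n}) for X_n = phi X_{n-1} + sigma_v V_n,
    Y_n = X_n + sigma_w W_n, assuming X_1 ~ N(0, sigma_v^2)."""
    m, P = 0.0, sigma_v**2            # predictive mean/variance of X_1
    loglik = 0.0
    for yk in y:
        S = P + sigma_w**2            # predictive variance of Y_k
        loglik += -0.5 * (np.log(2 * np.pi * S) + (yk - m) ** 2 / S)
        K = P / S                     # Kalman gain
        m, P = m + K * (yk - m), (1 - K) * P      # update with y_k
        m, P = phi * m, phi**2 * P + sigma_v**2   # predict X_{k+1}
    return loglik
```

The exact score $\nabla \log p_\theta(Y_{1:n})$ can then be obtained, e.g., by finite differences of `kalman_loglik` in each component of $\theta$.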
Example: Linear Gaussian Model
Marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).

Example: Stochastic Volatility Model
Consider the following model:
$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = \beta \exp\!\left(X_n / 2\right) W_n, \qquad \theta = (\phi, \sigma_v, \beta)$$
We use SMC with $N = 1000$ for batch and online estimation.
For batch estimation:
$$\theta_n = \theta_{n-1} + \gamma_n \left. \nabla \log p_\theta(Y_{1:n}) \right|_{\theta = \theta_{n-1}}$$
For online estimation:
$$\theta_n = \theta_{n-1} + \gamma_n \left. \nabla \log p_\theta(Y_n \mid Y_{1:n-1}) \right|_{\theta = \theta_{n-1}}$$
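A quick simulation sketch of this stochastic volatility model, useful for reproducing such experiments (the parameter values and the stationary initialization are illustrative assumptions):

```python
import numpy as np

def simulate_sv(n, phi=0.95, sigma_v=0.3, beta=0.7, rng=None):
    """Simulate X_n = phi X_{n-1} + sigma_v V_n, Y_n = beta exp(X_n/2) W_n."""
    rng = np.random.default_rng(rng)
    x = np.empty(n)
    # start the log-volatility at its stationary distribution (requires |phi| < 1)
    x[0] = rng.normal(0.0, sigma_v / np.sqrt(1 - phi**2))
    for k in range(1, n):
        x[k] = phi * x[k - 1] + sigma_v * rng.standard_normal()
    y = beta * np.exp(x / 2) * rng.standard_normal(n)
    return x, y
```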
Example: Stochastic Volatility Model
Batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.

Example: Stochastic Volatility Model
Recursive parameter estimation for the SV model. The estimates converge towards the true values.
SMC with MCMC for Parameter Estimation
Given data $y_{1:n}$, inference relies on
$$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}), \qquad \text{where } p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$$
We have seen that SMC is rather inefficient at sampling from $p(\theta \mid y_{1:n})$, so we look here at an MCMC approach. For a given parameter value $\theta$, SMC can estimate both $p_\theta(x_{1:n} \mid y_{1:n})$ and $p_\theta(y_{1:n})$.
It would be useful if we could use SMC within MCMC to sample from $p(\theta, x_{1:n} \mid y_{1:n})$.
- C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy
Using Gibbs sampling, we can sample iteratively from $p(\theta \mid x_{1:n}, y_{1:n})$ and $p(x_{1:n} \mid \theta, y_{1:n})$.
However, it is impossible to sample from $p(x_{1:n} \mid \theta, y_{1:n})$ directly. We can sample from $p(x_k \mid \theta, x_{k-1}, x_{k+1}, y_k)$ instead, but convergence will be slow.
Alternatively, we would like to use a Metropolis-Hastings step to sample from $p(x_{1:n} \mid \theta, y_{1:n})$; i.e., sample $X_{1:n}^* \sim q(\cdot \mid y_{1:n})$ and accept with probability
$$\min\left\{1, \frac{p\!\left(X_{1:n}^* \mid \theta, y_{1:n}\right) q\!\left(X_{1:n} \mid y_{1:n}\right)}{p\!\left(X_{1:n} \mid \theta, y_{1:n}\right) q\!\left(X_{1:n}^* \mid y_{1:n}\right)}\right\}$$
Gibbs Sampling Strategy
We will use the output of an SMC method approximating $p(x_{1:n} \mid \theta, y_{1:n})$ as a proposal distribution.
We know that the SMC approximation degrades as $n$ increases, but under mixing assumptions we also have
$$\left\| \mathrm{Law}\!\left(X_{1:n}^{(i)}\right) - p_\theta(\cdot \mid y_{1:n}) \right\|_{TV} \le \frac{C\, n}{N}$$
Thus things degrade linearly rather than exponentially. These results are useful for small $C$.
Configurational Biased Monte Carlo
The idea is related to that of the Configurational Biased MC (CBMC) method in molecular simulation. There are, however, significant differences: CBMC looks like an SMC method, but at each step only one particle is selected, and it gives $N$ offspring. Thus, if you have made the wrong selection, you cannot recover later on.
- D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC
We use the SMC approximation $\widehat{p}_{\theta,N}(x_{1:n} \mid y_{1:n})$ as proposal distribution, i.e. $X_{1:n}^* \sim \widehat{p}_{\theta,N}(\cdot \mid y_{1:n})$, and store the associated marginal likelihood estimate $\widehat{p}_{\theta,N}(y_{1:n})$.
It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out $(N-1)\,n$ variables.
The solution, inspired by CBMC, is as follows. For the current state $X_{1:n}$ of the Markov chain, run a virtual SMC method with only $N-1$ free particles; ensure that at each time step $k$ the current state $X_k$ is deterministically selected, and compute the associated estimate $\widehat{p}_{\theta,N}(y_{1:n})$.
Finally, accept $X_{1:n}^*$ with probability
$$\min\left\{1, \frac{\widehat{p}^{\,*}_{\theta,N}(y_{1:n})}{\widehat{p}_{\theta,N}(y_{1:n})}\right\}$$
Otherwise stay where you are.
Example
Consider the following model:
$$X_k = \frac{X_{k-1}}{2} + \frac{25 X_{k-1}}{1 + X_{k-1}^2} + 8 \cos(1.2 k) + V_k, \qquad Y_k = \frac{X_k^2}{20} + W_k$$
with Gaussian noise terms $V_k$ and $W_k$ and a Gaussian initial distribution.
We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
Example
We now consider the case where both variances $\sigma_v^2$ and $\sigma_w^2$ are unknown and are given inverse-Gamma (conditionally conjugate) vague priors.
We sample from $p(\theta, x_{1:1000} \mid y_{1:1000})$ using two strategies:
- The MCMC algorithm with $N = 1000$ and the local EKF proposal as importance distribution; average acceptance rate 0.43.
- An algorithm updating the components one at a time using an MH step of invariant distribution $p(x_k \mid \theta, x_{k-1}, x_{k+1}, y_k)$ with the same proposal.
In both cases we update the parameters according to $p(\theta \mid x_{1:500}, y_{1:500})$. Both algorithms are run with the same computational effort.
Example
MH one-at-a-time missed the mode, with estimates
$$\mathbb{E}\!\left[\sigma_v^2 \mid y_{1:500}\right] \approx 13.7 \ (\text{MH one-at-a-time}), \qquad \mathbb{E}\!\left[\sigma_v^2 \mid y_{1:500}\right] \approx 9.9 \ (\text{PF-MCMC}), \qquad \text{true value } \sigma_v^2 = 10$$
If $X_{1:n}$ and $\theta$ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
Review of Metropolis Hastings Algorithm
As a review of MH: the proposal distribution is a Markov chain with kernel density $q(x, y) = q(y \mid x)$ and target distribution $\pi(x)$.

Algorithm (Generic Metropolis-Hastings Sampler):
Initialization: choose an arbitrary starting value $x^{(0)}$.
Iteration $t$ ($t \ge 1$):
1. Given $x^{(t-1)}$, generate $\tilde{x} \sim q(x \mid x^{(t-1)})$.
2. Compute
$$\rho\!\left(x^{(t-1)}, \tilde{x}\right) = \min\left\{1, \frac{\pi(\tilde{x})\, q\!\left(x^{(t-1)} \mid \tilde{x}\right)}{\pi\!\left(x^{(t-1)}\right) q\!\left(\tilde{x} \mid x^{(t-1)}\right)}\right\}$$
3. With probability $\rho(x^{(t-1)}, \tilde{x})$, accept $\tilde{x}$ and set $x^{(t)} = \tilde{x}$; otherwise reject $\tilde{x}$ and set $x^{(t)} = x^{(t-1)}$.

It can easily be shown that, under weak assumptions, $X^{(i)} \sim \pi(x)$ as $i \to \infty$, and that $\pi$ is invariant for the transition kernel: $\pi(x') = \int \pi(x)\, K(x, x')\, dx$.
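The three steps above can be sketched as follows; the Gaussian random-walk proposal and standard-normal target are illustrative choices (with a symmetric proposal the $q$ terms cancel in the acceptance ratio):

```python
import numpy as np

def metropolis_hastings(log_pi, x0, n_iter, step=1.0, rng=None):
    """Generic MH sampler with Gaussian random-walk proposal
    q(x | x_prev) = N(x_prev, step^2)."""
    rng = np.random.default_rng(rng)
    x, samples = x0, []
    for _ in range(n_iter):
        prop = x + step * rng.standard_normal()         # 1. generate from q
        log_rho = min(0.0, log_pi(prop) - log_pi(x))    # 2. acceptance probability
        if np.log(rng.random()) < log_rho:              # 3. accept / reject
            x = prop
        samples.append(x)
    return np.array(samples)

# Target: standard normal, pi(x) proportional to exp(-x^2/2)
chain = metropolis_hastings(lambda x: -0.5 * x**2, x0=3.0, n_iter=20000, rng=1)
```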
Marginal Metropolis Hastings Algorithm
Consider the target
$$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n})$$
We use the following proposal distribution:
$$q\!\left((\theta^*, x_{1:n}^*) \mid (\theta, x_{1:n})\right) = q(\theta^* \mid \theta)\, p_{\theta^*}(x_{1:n}^* \mid y_{1:n})$$
Then the acceptance probability becomes
$$\min\left\{1, \frac{p(\theta^*, x_{1:n}^* \mid y_{1:n})\, q\!\left((\theta, x_{1:n}) \mid (\theta^*, x_{1:n}^*)\right)}{p(\theta, x_{1:n} \mid y_{1:n})\, q\!\left((\theta^*, x_{1:n}^*) \mid (\theta, x_{1:n})\right)}\right\} = \min\left\{1, \frac{p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta \mid \theta^*)}{p_\theta(y_{1:n})\, p(\theta)\, q(\theta^* \mid \theta)}\right\}$$
We will use SMC approximations to compute $p_\theta(y_{1:n})$ and $p_\theta(x_{1:n} \mid y_{1:n})$.
Marginal Metropolis Hastings Algorithm
Step 1: Given $\theta(i-1)$, $X_{1:n}(i-1)$ and $\widehat{p}_{\theta(i-1)}(y_{1:n})$, sample $\theta^* \sim q(\cdot \mid \theta(i-1))$ and run an SMC algorithm to obtain $\widehat{p}_{\theta^*}(x_{1:n} \mid y_{1:n})$ and $\widehat{p}_{\theta^*}(y_{1:n})$.
Step 2: Sample $X_{1:n}^* \sim \widehat{p}_{\theta^*}(x_{1:n} \mid y_{1:n})$.
Step 3: With probability
$$\min\left\{1, \frac{\widehat{p}_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta(i-1) \mid \theta^*)}{\widehat{p}_{\theta(i-1)}(y_{1:n})\, p(\theta(i-1))\, q(\theta^* \mid \theta(i-1))}\right\}$$
set $\theta(i) = \theta^*$, $X_{1:n}(i) = X_{1:n}^*$ and $\widehat{p}_{\theta(i)}(y_{1:n}) = \widehat{p}_{\theta^*}(y_{1:n})$. Otherwise set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$ and $\widehat{p}_{\theta(i)}(y_{1:n}) = \widehat{p}_{\theta(i-1)}(y_{1:n})$.
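A compact sketch of Steps 1-3 for a toy linear Gaussian model with $\theta = \phi$, unit noise variances, and a flat prior on $(-1, 1)$ (here $p(\theta^*)/p(\theta) = 1$ and $q$ is symmetric, so the ratio reduces to the likelihood estimates); the model, particle filter, and tuning values are illustrative assumptions:

```python
import numpy as np

def pf_loglik(y, phi, N=200, rng=None):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n}) for
    X_k = phi X_{k-1} + V_k, Y_k = X_k + W_k, with V_k, W_k ~ N(0, 1)."""
    rng = np.random.default_rng(rng)
    x = rng.standard_normal(N)
    ll = 0.0
    for yk in y:
        x = phi * x + rng.standard_normal(N)
        logw = -0.5 * (yk - x) ** 2
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean()) - 0.5 * np.log(2 * np.pi)
        x = rng.choice(x, size=N, p=w / w.sum())   # resample
    return ll

def pmmh(y, n_iter=200, step=0.1, rng=None):
    """Marginal MH over theta = phi using the PF marginal likelihood estimate."""
    rng = np.random.default_rng(rng)
    theta = 0.0
    ll = pf_loglik(y, theta, rng=rng.integers(1 << 31))   # stored estimate
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal()       # Step 1: propose theta*
        if abs(prop) < 1.0:                               # flat prior support
            ll_prop = pf_loglik(y, prop, rng=rng.integers(1 << 31))
            if np.log(rng.random()) < ll_prop - ll:       # Step 3: accept / reject
                theta, ll = prop, ll_prop                 # keep theta and its estimate
        chain.append(theta)
    return np.array(chain)
```

Sampling $X_{1:n}^*$ from the particle approximation (Step 2) is omitted here for brevity; it amounts to drawing one particle path with probability proportional to its final weight.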
Marginal Metropolis Hastings Algorithm
This algorithm (without sampling $X_{1:n}$) was proposed as an approximate MCMC algorithm to sample from $p(\theta \mid y_{1:n})$ by Fernandez-Villaverde & Rubio-Ramirez (2007).
For any $N \ge 1$, the algorithm admits exactly $p(\theta, x_{1:n} \mid y_{1:n})$ as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.
The higher $N$, the better the performance of the algorithm; $N$ scales roughly linearly with $n$. The method is useful when $X_n$ is moderate-dimensional and $\theta$ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).
- Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, 2007, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553.
- C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
- Ionides, E. L., Breto, C. and King, A. A. (2006), Inference for nonlinear dynamical systems, Proceedings of the Nat. Acad. of Sciences, 103, 18438-18443.
OPEN PROBLEMS

Open Problems
Thus SMC can be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.
Several open problems remain:
- Developing efficient SMC methods when $E = \mathbb{R}^{10000}$.
- Developing a proper SMC method for recursive Bayesian parameter estimation.
- Developing methods to avoid the $O(N^2)$ cost for smoothing and sensitivity estimation.
- Developing methods for solving optimal control problems.
- Establishing weaker assumptions ensuring uniform convergence results.
Other Topics
In standard SMC we only sample $X_n$ at time $n$ and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account through block sampling strategies.
- Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Recursive MLE Parameter Estimation

The log likelihood can be written as

$$\ell_n(\theta) = \log p_\theta(Y_{1:n}) = \sum_{k=1}^{n} \log p_\theta(Y_k \mid Y_{1:k-1})$$

Here we compute

$$p_\theta(Y_k \mid Y_{1:k-1}) = \int g_\theta(Y_k \mid x_k)\, p_\theta(x_k \mid Y_{1:k-1})\, dx_k$$

Under regularity assumptions, $\{X_n, Y_n, p_\theta(x_n \mid Y_{1:n-1})\}$ is an inhomogeneous Markov chain which converges towards its invariant distribution, so that the normalized log likelihood converges to an ergodic average of the log observation density,

$$\lim_{n\to\infty} \frac{\ell_n(\theta)}{n} = \ell(\theta) = \iint \log g_\theta(y \mid x)\, \mu_\theta(dx)\, \lambda(dy)$$
Robbins-Monro Algorithm for MLE

We can maximize $\ell(\theta)$ by using the gradient and the Robbins-Monro algorithm

$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta_{n-1}}(Y_n \mid Y_{1:n-1})$$

where

$$\nabla p_\theta(Y_n \mid Y_{1:n-1}) = \int \nabla g_\theta(Y_n \mid x_n)\, p_\theta(x_n \mid Y_{1:n-1})\, dx_n + \int g_\theta(Y_n \mid x_n)\, \nabla p_\theta(x_n \mid Y_{1:n-1})\, dx_n$$

We thus need to approximate $\nabla p_\theta(x_n \mid Y_{1:n-1})$.
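The stochastic-approximation update above can be illustrated on the simplest possible case. The sketch below (an illustrative Gaussian model and step-size schedule, not the SMC-based gradient of these slides) runs Robbins-Monro with noisy gradient evaluations to recover the mean of a Gaussian from a stream of observations:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 2.5

# Noisy gradient of the Gaussian log-likelihood w.r.t. the mean:
# for one observation y ~ N(theta, 1), d/dtheta log p_theta(y) = y - theta.
theta = 0.0
for n in range(1, 20001):
    y = true_mean + rng.standard_normal()  # stream of observations
    gamma = 1.0 / n                        # steps: sum gamma = inf, sum gamma^2 < inf
    theta = theta + gamma * (y - theta)    # Robbins-Monro update

print(theta)  # converges towards true_mean
```

With $\gamma_n = 1/n$ this particular recursion is exactly the running sample mean, which makes the convergence easy to check.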
Importance Sampling Estimation of Sensitivity

The various proposed approximations are based on the identity

$$\nabla p_\theta(x_{1:n} \mid Y_{1:n}) = \frac{\nabla p_\theta(x_{1:n} \mid Y_{1:n})}{p_\theta(x_{1:n} \mid Y_{1:n})}\; p_\theta(x_{1:n} \mid Y_{1:n}) = \nabla \log p_\theta(x_{1:n} \mid Y_{1:n})\; p_\theta(x_{1:n} \mid Y_{1:n})$$

We thus can use an SMC approximation of the form

$$\widehat{\nabla p}_\theta(x_{1:n} \mid Y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \alpha_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad \alpha_n^{(i)} = \nabla \log p_\theta\big(X_{1:n}^{(i)} \mid Y_{1:n}\big)$$

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of $p_\theta(x_{1:n} \mid Y_{1:n})$.
Degeneracy of the SMC Algorithm

Score $\nabla p_\theta(Y_{1:n})$ for linear Gaussian models: exact (blue) vs. SMC approximation (red).

Signed measure $\nabla p_\theta(x_n \mid Y_{1:n})$ for linear Gaussian models: exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:

$$\nabla p_\theta(x_n \mid Y_{1:n}) = \nabla \log p_\theta(x_n \mid Y_{1:n})\; p_\theta(x_n \mid Y_{1:n})$$

It suggests using an SMC approximation of the form

$$\widehat{\nabla p}_\theta(x_n \mid Y_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \beta_n^{(i)} = \nabla \log p_\theta\big(X_n^{(i)} \mid Y_{1:n}\big) = \frac{\nabla p_\theta\big(X_n^{(i)} \mid Y_{1:n}\big)}{p_\theta\big(X_n^{(i)} \mid Y_{1:n}\big)}$$

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is $O(N^2)$, as

$$p_\theta\big(X_n^{(i)} \mid Y_{1:n}\big) \propto \int f_\theta\big(X_n^{(i)} \mid x_{n-1}\big)\, p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1} \approx \frac{1}{N} \sum_{j=1}^{N} f_\theta\big(X_n^{(i)} \mid X_{n-1}^{(j)}\big)$$
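The $O(N^2)$ cost comes from evaluating the predictive density of each particle against all particles of the previous step. A vectorized sketch, assuming Gaussian dynamics $f_\theta(x \mid x') = \mathcal N(x;\, \phi x', \sigma_v^2)$ with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(7)
N, phi, sv = 500, 0.8, 1.0
x_prev = rng.standard_normal(N)                       # particles at time n-1 (equally weighted)
x_curr = phi * x_prev + sv * rng.standard_normal(N)   # particles at time n

# Predictive density of each current particle: an N x N pairwise evaluation.
diff = x_curr[:, None] - phi * x_prev[None, :]
f = np.exp(-0.5 * (diff / sv) ** 2) / (np.sqrt(2 * np.pi) * sv)
pred = f.mean(axis=1)   # (1/N) * sum_j f(X_n^i | X_{n-1}^j), one value per particle
```

The `diff` matrix has $N^2$ entries, which is exactly the quadratic cost the slide refers to.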
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

$$p_\theta(x_n \mid Y_{1:n}) = \frac{\xi_\theta(x_n)}{\int \xi_\theta(x_n)\, dx_n}, \qquad \xi_\theta(x_n) = g_\theta(Y_n \mid x_n) \int f_\theta(x_n \mid x_{n-1})\, p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1}$$

The derivatives satisfy the following:

$$\nabla p_\theta(x_n \mid Y_{1:n}) = \frac{\nabla \xi_\theta(x_n)}{\int \xi_\theta(x_n)\, dx_n} - p_\theta(x_n \mid Y_{1:n})\, \frac{\int \nabla \xi_\theta(x_n)\, dx_n}{\int \xi_\theta(x_n)\, dx_n}$$

This way we obtain a simple recursion for $\nabla p_\theta(x_n \mid Y_{1:n})$ as a function of $\nabla p_\theta(x_{n-1} \mid Y_{1:n-1})$ and $p_\theta(x_{n-1} \mid Y_{1:n-1})$.
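The derivative formula is just the quotient rule applied to the normalized filter. Writing $Z_\theta = \int \xi_\theta(x_n)\, dx_n$ and exchanging $\nabla$ and $\int$ (so that $\nabla Z_\theta = \int \nabla \xi_\theta(x_n)\, dx_n$):

```latex
\nabla p_\theta(x_n \mid Y_{1:n})
  = \nabla\!\left(\frac{\xi_\theta(x_n)}{Z_\theta}\right)
  = \frac{\nabla \xi_\theta(x_n)}{Z_\theta}
    - \frac{\xi_\theta(x_n)}{Z_\theta}\cdot\frac{\nabla Z_\theta}{Z_\theta}
  = \frac{\nabla \xi_\theta(x_n)}{Z_\theta}
    - p_\theta(x_n \mid Y_{1:n})\,
      \frac{\int \nabla \xi_\theta(x_n)\, dx_n}{Z_\theta}
```

which is exactly the recursion on the slide.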
SMC Approximation of the Sensitivity

Sample

$$X_n^{(i)} \sim \eta_n(\cdot), \qquad \eta_n(x_n) = \sum_{j=1}^{N} W_{n-1}^{(j)}\, q\big(x_n \mid Y_n, X_{n-1}^{(j)}\big)$$

Compute

$$\alpha_n^{(i)} = \frac{\xi_\theta\big(X_n^{(i)}\big)}{\eta_n\big(X_n^{(i)}\big)}, \qquad \rho_n^{(i)} = \frac{\nabla \xi_\theta\big(X_n^{(i)}\big)}{\eta_n\big(X_n^{(i)}\big)}$$

$$W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j=1}^{N} \alpha_n^{(j)}}, \qquad \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j=1}^{N} \alpha_n^{(j)}} - W_n^{(i)}\, \frac{\sum_{j=1}^{N} \rho_n^{(j)}}{\sum_{j=1}^{N} \alpha_n^{(j)}}$$

We have

$$\hat p_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \widehat{\nabla p}_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n)$$
Example: Linear Gaussian Model

Consider the following model:

$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = X_n + \sigma_w W_n$$

We have $\theta = (\phi, \sigma_v, \sigma_w)$.

We compare next the SMC estimates of $\nabla \log p_\theta(x_n \mid Y_{1:n})$ and of the score $\nabla p_\theta(Y_{1:n})$ for $N = 1000$ to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).
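For this linear Gaussian model the exact quantities are indeed available in closed form via the Kalman filter. A minimal sketch (stationary initialization and the specific parameter values are assumptions for illustration) computes the exact log likelihood and a finite-difference score in $\phi$:

```python
import numpy as np

def kalman_loglik(y, phi, sv, sw):
    """Exact log-likelihood of X_n = phi X_{n-1} + sv V_n, Y_n = X_n + sw W_n."""
    m, P = 0.0, sv**2 / (1 - phi**2)   # stationary initial distribution (assumption)
    ll = 0.0
    for yk in y:
        S = P + sw**2                                   # predictive variance of Y_k
        ll += -0.5 * (np.log(2 * np.pi * S) + (yk - m) ** 2 / S)
        K = P / S                                       # Kalman gain
        m, P = m + K * (yk - m), (1 - K) * P            # filtering update
        m, P = phi * m, phi**2 * P + sv**2              # time update
    return ll

rng = np.random.default_rng(1)
phi, sv, sw = 0.8, 1.0, 0.5
x, y = 0.0, []
for n in range(200):
    x = phi * x + sv * rng.standard_normal()
    y.append(x + sw * rng.standard_normal())

ll = kalman_loglik(y, phi, sv, sw)
# Score w.r.t. phi by central finite differences (a stand-in for the exact
# Kalman filter derivative used in the comparison).
eps = 1e-5
score_phi = (kalman_loglik(y, phi + eps, sv, sw)
             - kalman_loglik(y, phi - eps, sv, sw)) / (2 * eps)
print(ll, score_phi)
```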
Example: Linear Gaussian Model

Marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).
Example: Stochastic Volatility Model

Consider the following model:

$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = \beta \exp(X_n / 2)\, W_n$$

We have $\theta = (\phi, \sigma_v, \beta)$. We use SMC with $N = 1000$ for batch and on-line estimation.

For batch estimation:

$$\theta_{k+1} = \theta_k + \gamma_{k+1}\, \nabla \log p_{\theta_k}(Y_{1:n})$$

For online estimation:

$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_{\theta_{n-1}}(Y_n \mid Y_{1:n-1})$$
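The stochastic volatility model is easy to simulate, which is how synthetic-data experiments like the recursive estimation below are set up. A minimal sketch; the parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
phi, sigma_v, beta = 0.95, 0.3, 0.7   # illustrative parameter values
T = 500

x = np.empty(T)   # latent log-volatility
y = np.empty(T)   # observed returns
x[0] = sigma_v / np.sqrt(1 - phi**2) * rng.standard_normal()  # stationary start
for n in range(T):
    if n > 0:
        x[n] = phi * x[n - 1] + sigma_v * rng.standard_normal()
    y[n] = beta * np.exp(x[n] / 2) * rng.standard_normal()
```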
Example: Stochastic Volatility Model

Batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.

Recursive parameter estimation for the SV model: the estimates converge towards the true values.
SMC with MCMC for Parameter Estimation

Given data $y_{1:n}$, inference relies on

$$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}), \quad \text{where} \quad p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$$

We have seen that SMC is rather inefficient for sampling from $p(\theta \mid y_{1:n})$, so we look here at an MCMC approach. For a given parameter value $\theta$, SMC can estimate both $p_\theta(x_{1:n} \mid y_{1:n})$ and $p_\theta(y_{1:n})$.

It will be useful if we can use SMC within MCMC to sample from $p(\theta, x_{1:n} \mid y_{1:n})$.

C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269-342.
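Both quantities come out of a single bootstrap particle filter pass. The sketch below (for the linear Gaussian model of the earlier example, with assumed parameter values) returns the estimate of $\log p_\theta(y_{1:n})$ that such an SMC-within-MCMC scheme consumes:

```python
import numpy as np

def bootstrap_pf_loglik(y, phi, sv, sw, N=1000, rng=None):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n}) for
    X_n = phi X_{n-1} + sv V_n,  Y_n = X_n + sw W_n  (illustrative model)."""
    rng = rng or np.random.default_rng()
    x = sv / np.sqrt(1 - phi**2) * rng.standard_normal(N)  # stationary init (assumption)
    ll = 0.0
    for yk in y:
        # Propagate with the prior dynamics (bootstrap proposal).
        x = phi * x + sv * rng.standard_normal(N)
        # Weight by the likelihood g(y_k | x_k), in log-space for stability.
        logw = -0.5 * np.log(2 * np.pi * sw**2) - 0.5 * (yk - x) ** 2 / sw**2
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())     # running estimate of log p(y_{1:k})
        # Multinomial resampling.
        x = x[rng.choice(N, size=N, p=w / w.sum())]
    return ll

rng = np.random.default_rng(3)
phi, sv, sw = 0.8, 1.0, 0.5
xs, ys = 0.0, []
for n in range(100):
    xs = phi * xs + sv * rng.standard_normal()
    ys.append(xs + sw * rng.standard_normal())
ll_hat = bootstrap_pf_loglik(ys, phi, sv, sw, N=2000, rng=rng)
print(ll_hat)
```

Storing the resampled ancestries in addition to `ll_hat` would also give the approximation of $p_\theta(x_{1:n} \mid y_{1:n})$ mentioned above; it is omitted here to keep the sketch short.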
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from $p(\theta \mid x_{1:n}, y_{1:n})$ and $p(x_{1:n} \mid \theta, y_{1:n})$.

However, it is impossible to sample from $p(x_{1:n} \mid \theta, y_{1:n})$ directly. We can sample from $p(x_k \mid \theta, y_{1:n}, x_{1:k-1}, x_{k+1:n})$ instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p(x_{1:n} \mid \theta, y_{1:n})$, i.e. sample $X'_{1:n} \sim q(\cdot \mid y_{1:n})$ and accept with probability

$$\min\left(1,\; \frac{p_\theta(X'_{1:n} \mid y_{1:n})\, q(X_{1:n} \mid y_{1:n})}{p_\theta(X_{1:n} \mid y_{1:n})\, q(X'_{1:n} \mid y_{1:n})}\right)$$
Gibbs Sampling Strategy

We will use the output of an SMC method approximating $p(x_{1:n} \mid \theta, y_{1:n})$ as a proposal distribution.

We know that the SMC approximation degrades as $n$ increases, but under mixing assumptions we also have

$$\left\| \mathcal{L}\big(X_{1:n}^{(i)}\big) - p\big(\cdot \mid \theta, y_{1:n}\big) \right\|_{TV} \le \frac{C\, n}{N}$$

Thus things degrade linearly rather than exponentially. These results are useful for small $C$.
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Biased MC method (CBMC) in molecular simulation. There are, however, significant differences: CBMC looks like an SMC method, but at each step only one particle is selected, which gives $N$ offspring. Thus, if you have made the wrong selection, you cannot recover later on.

D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC

We use $\hat p_N(x_{1:n} \mid \theta, y_{1:n})$ as proposal distribution for $X'_{1:n}$ and store the associated estimate $\hat p_N(y_{1:n} \mid \theta)$ of the marginal likelihood.

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out $(N-1)n$ variables.

The solution, inspired by CBMC, is as follows: for the current state $X_{1:n}$ of the Markov chain, run a virtual SMC method with only $N-1$ free particles; ensure that at each time step $k$ the current state $X_k$ is deterministically selected, and compute the associated estimate $\hat p'_N(y_{1:n} \mid \theta)$.

Finally, accept $X'_{1:n}$ with probability

$$\min\left(1,\; \frac{\hat p_N(y_{1:n} \mid \theta)}{\hat p'_N(y_{1:n} \mid \theta)}\right)$$

Otherwise stay where you are.
Example

Consider the following model:

$$X_k = \frac{X_{k-1}}{2} + 25\,\frac{X_{k-1}}{1 + X_{k-1}^2} + 8\cos(1.2 k) + V_k, \qquad V_k \sim \mathcal N(0, \sigma_v^2)$$

$$Y_k = \frac{X_k^2}{20} + W_k, \qquad W_k \sim \mathcal N(0, \sigma_w^2)$$

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
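This benchmark is easy to simulate directly; the noise variances below are illustrative assumptions (the later slide quotes a true value $\sigma_v^2 = 10$):

```python
import numpy as np

rng = np.random.default_rng(6)
T = 100
sigma_v2, sigma_w2 = 10.0, 1.0   # illustrative variances

x = np.empty(T)
y = np.empty(T)
x[0] = rng.standard_normal()     # illustrative initial condition
for k in range(T):
    if k > 0:
        x[k] = (x[k - 1] / 2 + 25 * x[k - 1] / (1 + x[k - 1] ** 2)
                + 8 * np.cos(1.2 * k) + np.sqrt(sigma_v2) * rng.standard_normal())
    y[k] = x[k] ** 2 / 20 + np.sqrt(sigma_w2) * rng.standard_normal()
```

The squared observation makes the likelihood bimodal in the sign of $X_k$, which is what makes this model a standard stress test for filtering algorithms.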
Example

We now consider the case when both variances $\sigma_v^2, \sigma_w^2$ are unknown and given inverse Gamma (conditionally conjugate) vague priors.

We sample from $p(\theta, x_{1:1000} \mid y_{1:1000})$ using two strategies:
- The MCMC algorithm with $N = 1000$ and the local EKF proposal as importance distribution. Average acceptance rate 0.43.
- An algorithm updating the components one at a time using an MH step of invariant distribution $p(x_k \mid \theta, x_{k-1}, x_{k+1}, y_k)$ with the same proposal.

In both cases we update the parameters according to $p(\theta \mid x_{1:500}, y_{1:500})$. Both algorithms are run with the same computational effort.
Example

The one-at-a-time MH algorithm missed the mode, and the estimates are

$$\mathbb{E}\big[\sigma_v^2 \mid y_{1:500}\big] \approx 13.7 \ \text{(MH one at a time)}, \qquad \mathbb{E}\big[\sigma_v^2 \mid y_{1:500}\big] \approx 9.9 \ \text{(PF-MCMC)}, \qquad \sigma_v^2 = 10 \ \text{(true value)}$$

If $X_{1:n}$ and $\theta$ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the marginal Metropolis-Hastings algorithm.
Review of Metropolis-Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density $q(x, y) = q(y \mid x)$, and the target distribution is $\pi(x)$.

It can be easily shown that $\pi$ is invariant for the transition kernel $K$,

$$\pi(x') = \int \pi(x)\, K(x, x')\, dx,$$

and that $X^{(i)} \sim \pi(x)$ as $i \to \infty$ under weak assumptions.

Algorithm (Generic Metropolis-Hastings Sampler).
Initialization: choose an arbitrary starting value $x^{(0)}$.
Iteration $t$ ($t \ge 1$):
1. Given $x^{(t-1)}$, generate $\tilde x \sim q(x^{(t-1)}, \cdot)$.
2. Compute
$$\rho\big(x^{(t-1)}, \tilde x\big) = \min\left(1,\; \frac{\pi(\tilde x)\, q(\tilde x, x^{(t-1)})}{\pi(x^{(t-1)})\, q(x^{(t-1)}, \tilde x)}\right)$$
3. With probability $\rho(x^{(t-1)}, \tilde x)$, accept $\tilde x$ and set $x^{(t)} = \tilde x$; otherwise reject $\tilde x$ and set $x^{(t)} = x^{(t-1)}$.
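The three steps above fit in a few lines of Python. A minimal generic sketch with a symmetric Gaussian random-walk proposal (so the $q$-ratio cancels in step 2); the standard normal target and the step size are illustrative choices:

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_iter, step=1.0, rng=None):
    """Generic random-walk MH sampler with Gaussian proposal q(y|x) = N(x, step^2)."""
    rng = rng or np.random.default_rng()
    x = x0
    samples = np.empty(n_iter)
    for t in range(n_iter):
        x_prop = x + step * rng.standard_normal()      # 1. propose
        log_rho = log_target(x_prop) - log_target(x)   # 2. acceptance ratio (q cancels)
        if np.log(rng.random()) < log_rho:             # 3. accept / reject
            x = x_prop
        samples[t] = x
    return samples

rng = np.random.default_rng(4)
samples = metropolis_hastings(lambda x: -0.5 * x**2,   # log of N(0,1), up to a constant
                              x0=3.0, n_iter=50000, step=2.0, rng=rng)
print(samples.mean(), samples.std())  # close to 0 and 1
```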
Marginal Metropolis-Hastings Algorithm

Consider the target

$$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n})$$

We use the following proposal distribution:

$$q\big((\theta, x_{1:n}), (\theta', x'_{1:n})\big) = q(\theta' \mid \theta)\, p_{\theta'}(x'_{1:n} \mid y_{1:n})$$

Then the acceptance probability becomes

$$\min\left(1,\; \frac{p(\theta', x'_{1:n} \mid y_{1:n})\, q(\theta \mid \theta')\, p_\theta(x_{1:n} \mid y_{1:n})}{p(\theta, x_{1:n} \mid y_{1:n})\, q(\theta' \mid \theta)\, p_{\theta'}(x'_{1:n} \mid y_{1:n})}\right) = \min\left(1,\; \frac{p_{\theta'}(y_{1:n})\, p(\theta')\, q(\theta \mid \theta')}{p_\theta(y_{1:n})\, p(\theta)\, q(\theta' \mid \theta)}\right)$$

We will use SMC approximations to compute $p_\theta(y_{1:n})$ and $p_\theta(x_{1:n} \mid y_{1:n})$.
Marginal Metropolis-Hastings Algorithm

Step 1: Given $\theta(i-1)$, $X_{1:n}(i-1)$ and $\hat p_{\theta(i-1)}(y_{1:n})$, sample $\theta^* \sim q(\cdot \mid \theta(i-1))$ and run an SMC algorithm to obtain $\hat p_{\theta^*}(x_{1:n} \mid y_{1:n})$ and $\hat p_{\theta^*}(y_{1:n})$.

Step 2: Sample $X^*_{1:n} \sim \hat p_{\theta^*}(x_{1:n} \mid y_{1:n})$.

Step 3: With probability

$$\min\left(1,\; \frac{\hat p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta(i-1) \mid \theta^*)}{\hat p_{\theta(i-1)}(y_{1:n})\, p(\theta(i-1))\, q(\theta^* \mid \theta(i-1))}\right)$$

set $\theta(i) = \theta^*$, $X_{1:n}(i) = X^*_{1:n}$ and $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta^*}(y_{1:n})$; otherwise set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$ and $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta(i-1)}(y_{1:n})$.
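The three steps above can be sketched as a compact particle marginal MH run. Everything below is an illustrative assumption: the linear Gaussian model from the earlier examples, a flat prior on $\phi \in (-1, 1)$, and small particle/iteration counts so the sketch runs quickly:

```python
import numpy as np

def pf_loglik(y, phi, sv, sw, N, rng):
    """Bootstrap PF estimate of log p_phi(y_{1:n}) for X_n = phi X_{n-1} + sv V_n, Y_n = X_n + sw W_n."""
    x = sv / np.sqrt(1 - phi**2) * rng.standard_normal(N)
    ll = 0.0
    for yk in y:
        x = phi * x + sv * rng.standard_normal(N)
        logw = -0.5 * np.log(2 * np.pi * sw**2) - 0.5 * (yk - x) ** 2 / sw**2
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())
        x = x[rng.choice(N, size=N, p=w / w.sum())]
    return ll

rng = np.random.default_rng(5)
sv, sw, phi_true = 1.0, 0.5, 0.8
xs, ys = 0.0, []
for n in range(100):
    xs = phi_true * xs + sv * rng.standard_normal()
    ys.append(xs + sw * rng.standard_normal())

# PMMH: random-walk proposal on phi, flat prior on (-1, 1).
phi = 0.5
ll = pf_loglik(ys, phi, sv, sw, 500, rng)     # Step 1 analogue for the initial state
chain = []
for i in range(300):
    phi_prop = phi + 0.1 * rng.standard_normal()          # Step 1: propose theta*
    if abs(phi_prop) < 1.0:                               # flat prior support
        ll_prop = pf_loglik(ys, phi_prop, sv, sw, 500, rng)
        # Step 3: accept using the likelihood *estimates* in place of the exact likelihoods.
        if np.log(rng.random()) < ll_prop - ll:
            phi, ll = phi_prop, ll_prop
    chain.append(phi)
chain = np.array(chain)
print(chain[150:].mean())   # posterior mean estimate of phi
```

Step 2 (sampling $X^*_{1:n}$ from the particle approximation) is omitted here since only $\theta$ is of interest; it would amount to tracing one ancestral path of the filter.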
Marginal Metropolis-Hastings Algorithm

This algorithm (without sampling $X_{1:n}$) was proposed as an approximate MCMC algorithm to sample from $p(\theta \mid y_{1:n})$ in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

For any $N \ge 1$, the algorithm admits exactly $p(\theta, x_{1:n} \mid y_{1:n})$ as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher $N$, the better the performance of the algorithm; $N$ scales roughly linearly with $n$. The algorithm is useful when $X_n$ is moderate-dimensional and $\theta$ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, 2007, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553.
C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
Ionides, E. L., Breto, C. and King, A. A. (2006), Inference for nonlinear dynamical systems, Proceedings of the Nat. Acad. of Sciences, 103, 18438-18443.
OPEN PROBLEMS

Open Problems

Thus SMC can be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient very high dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
- Developing efficient SMC methods for very high dimensional state spaces, e.g. $E = \mathbb{R}^{10000}$.
- Developing a proper SMC method for recursive Bayesian parameter estimation.
- Developing methods to avoid the $O(N^2)$ cost for smoothing and sensitivity estimation.
- Developing methods for solving optimal control problems.
- Establishing weaker assumptions ensuring uniform convergence results.
Other Topics

In standard SMC we only sample $X_n$ at time $n$ and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Importance Sampling Estimation of Sensitivity

The various proposed approximations are based on the identity

\[
\nabla p_\theta(x_n \mid Y_{1:n-1})
= \int \nabla p_\theta(x_{1:n} \mid Y_{1:n-1})\, dx_{1:n-1}
= \int \nabla \log p_\theta(x_{1:n} \mid Y_{1:n-1})\; p_\theta(x_{1:n} \mid Y_{1:n-1})\, dx_{1:n-1}.
\]

We can thus use an SMC approximation of the form

\[
\widehat{\nabla p}_\theta(x_n \mid Y_{1:n-1})
= \frac{1}{N} \sum_{i=1}^{N} \alpha_n^{(i)}\, \delta_{X_n^{(i)}}(x_n),
\qquad
\alpha_n^{(i)} = \nabla \log p_\theta\big(X_{1:n}^{(i)} \mid Y_{1:n-1}\big).
\]

This is simple but inefficient, as it is based on importance sampling on spaces of increasing dimension; in addition, it relies on being able to obtain a good approximation of $p_\theta(x_{1:n} \mid Y_{1:n-1})$.

30
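The identity above rewrites a derivative of a density as a score-weighted expectation, which is what makes a Monte Carlo approximation possible. As a minimal numerical check, consider a hypothetical one-dimensional model (not from the lecture), $X \sim \mathcal{N}(\theta, 1)$, where $\nabla_\theta \log p_\theta(x) = x - \theta$ and the gradient of $\mathbb{E}[X^2] = \theta^2 + 1$ is known to be $2\theta$:

```python
# Numerical check of the identity  grad p = (grad log p) * p  via Monte Carlo.
# Hypothetical toy model: X ~ N(theta, 1); we estimate d/dtheta E[X^2] = 2*theta
# using the score function grad_theta log p_theta(x) = (x - theta).
import numpy as np

rng = np.random.default_rng(0)
theta = 1.5
n_samples = 200_000

x = rng.normal(theta, 1.0, size=n_samples)   # draws from p_theta
score = x - theta                            # grad_theta log p_theta(x)
grad_estimate = np.mean(x**2 * score)        # E[ h(X) * grad log p_theta(X) ]

print(grad_estimate)  # close to the exact value 2*theta = 3.0
```

The same score-weighting idea underlies the SMC estimator: the particles play the role of the samples, and the score of the path distribution replaces $(x - \theta)$.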
Degeneracy of the SMC Algorithm

[Figure: the score $\nabla p_\theta(Y_{1:n})$ for a linear Gaussian model, exact (blue) vs. SMC approximation (red).]

31

[Figure: the signed measure $\nabla p_\theta(x_n \mid Y_{1:n})$ for a linear Gaussian model, exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.]

32
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:

\[
\nabla p_\theta(x_n \mid Y_{1:n-1})
= \nabla \log p_\theta(x_n \mid Y_{1:n-1})\; p_\theta(x_n \mid Y_{1:n-1}).
\]

It suggests using an SMC approximation of the form

\[
\widehat{\nabla p}_\theta(x_n \mid Y_{1:n-1})
= \frac{1}{N} \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n),
\qquad
\beta_n^{(i)}
= \nabla \log p_\theta\big(X_n^{(i)} \mid Y_{1:n-1}\big)
= \frac{\nabla p_\theta\big(X_n^{(i)} \mid Y_{1:n-1}\big)}{p_\theta\big(X_n^{(i)} \mid Y_{1:n-1}\big)}.
\]

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is $O(N^2)$, as

\[
p_\theta\big(X_n^{(i)} \mid Y_{1:n-1}\big)
\propto \int f_\theta\big(X_n^{(i)} \mid x_{n-1}\big)\, p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1}
= \frac{1}{N} \sum_{j=1}^{N} f_\theta\big(X_n^{(i)} \mid X_{n-1}^{(j)}\big).
\]

33
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

\[
p_\theta(x_n \mid Y_{1:n}) = \frac{\xi_n^\theta(x_n)}{\int \xi_n^\theta(x_n)\, dx_n},
\qquad
\xi_n^\theta(x_n) = g_\theta(Y_n \mid x_n) \int f_\theta(x_n \mid x_{n-1})\, p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1}.
\]

The derivatives satisfy the following:

\[
\nabla p_\theta(x_n \mid Y_{1:n})
= \frac{\nabla \xi_n^\theta(x_n)}{\int \xi_n^\theta(x_n)\, dx_n}
- p_\theta(x_n \mid Y_{1:n})\, \frac{\int \nabla \xi_n^\theta(x_n)\, dx_n}{\int \xi_n^\theta(x_n)\, dx_n}.
\]

This way we obtain a simple recursion for $\nabla p_\theta(x_n \mid Y_{1:n})$ as a function of $\nabla p_\theta(x_{n-1} \mid Y_{1:n-1})$ and $p_\theta(x_{n-1} \mid Y_{1:n-1})$.

34
SMC Approximation of the Sensitivity

Sample from the mixture proposal built from the previous weighted particles,

\[
X_n^{(i)} \sim \eta_n(\cdot),
\qquad
\eta_n(x_n) = \sum_{j=1}^{N} W_{n-1}^{(j)}\, q_\theta\big(x_n \mid Y_n, X_{n-1}^{(j)}\big).
\]

Compute

\[
\alpha_n^{(i)} = \frac{\xi_n^\theta\big(X_n^{(i)}\big)}{\eta_n\big(X_n^{(i)}\big)},
\qquad
\rho_n^{(i)} = \frac{\nabla \xi_n^\theta\big(X_n^{(i)}\big)}{\eta_n\big(X_n^{(i)}\big)},
\]
\[
W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j=1}^{N} \alpha_n^{(j)}},
\qquad
\beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j=1}^{N} \alpha_n^{(j)}}
- W_n^{(i)}\, \frac{\sum_{j=1}^{N} \rho_n^{(j)}}{\sum_{j=1}^{N} \alpha_n^{(j)}}.
\]

We have

\[
\hat p_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n),
\qquad
\widehat{\nabla p}_\theta(x_n \mid Y_{1:n}) = \sum_{i=1}^{N} \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n).
\]

35
Example: Linear Gaussian Model

Consider the following model:

\[
X_n = \phi X_{n-1} + \sigma_v V_n,
\qquad
Y_n = X_n + \sigma_w W_n,
\qquad
\theta = (\phi, \sigma_v, \sigma_w).
\]

We compare next the SMC estimates for $N = 1000$ of $\nabla \log p_\theta(x_n \mid Y_{1:n})$ and $\nabla p_\theta(Y_{1:n})$ to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).

36
Example: Linear Gaussian Model

[Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).]

37
Example: Stochastic Volatility Model

Consider the following model:

\[
X_n = \phi X_{n-1} + \sigma_v V_n,
\qquad
Y_n = \beta \exp(X_n / 2)\, W_n,
\qquad
\theta = (\phi, \sigma_v, \beta).
\]

We use SMC with $N = 1000$ for batch and on-line estimation.

For batch estimation:
\[
\theta_n = \theta_{n-1} + \gamma_n \nabla \log p_{\theta_{n-1}}(Y_{1:n}).
\]

For on-line estimation:
\[
\theta_n = \theta_{n-1} + \gamma_n \nabla \log p_{\theta_{n-1}}(Y_n \mid Y_{1:n-1}).
\]

38
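The Robbins-Monro recursions above can be sketched on a hypothetical i.i.d. toy model where the score is available in closed form, so that the stochastic-approximation mechanics are visible in isolation (in the state-space setting the score term would instead come from an SMC approximation). For $Y_n \sim \mathcal{N}(\mu, 1)$ the score is $\partial_\mu \log p_\mu(y_n) = y_n - \mu$:

```python
# Robbins-Monro sketch of the online update theta_n = theta_{n-1} + gamma_n * score,
# on a hypothetical i.i.d. model Y_n ~ N(mu, 1) whose score (y_n - mu) is exact.
import numpy as np

rng = np.random.default_rng(1)
true_mu = 2.0
y = rng.normal(true_mu, 1.0, size=5000)     # simulated observations

mu = 0.0                                     # initial parameter guess
for n, y_n in enumerate(y, start=1):
    gamma_n = 1.0 / n**0.6                   # steps: sum gamma = inf, sum gamma^2 < inf
    mu += gamma_n * (y_n - mu)               # theta_n = theta_{n-1} + gamma_n * score

print(mu)  # converges towards true_mu = 2.0
```

The step-size exponent 0.6 is one conventional choice in the admissible range $(1/2, 1]$; faster decay reduces noise but slows the transient.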
Example: Stochastic Volatility Model

[Figure: batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.]

39

[Figure: recursive parameter estimation for the SV model; the estimates converge towards the true values.]

40
SMC with MCMC for Parameter Estimation

Given data $y_{1:n}$, inference relies on

\[
p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}),
\qquad \text{where} \quad
p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).
\]

We have seen that SMC is rather inefficient for sampling from $p(\theta, x_{1:n} \mid y_{1:n})$, so we look here at an MCMC approach.

For a given parameter value $\theta$, SMC can estimate both $p_\theta(x_{1:n} \mid y_{1:n})$ and $p_\theta(y_{1:n})$. It would therefore be useful if we could use SMC within MCMC to sample from $p(\theta, x_{1:n} \mid y_{1:n})$.

• C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269-342.

41
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from $p(\theta \mid y_{1:n}, x_{1:n})$ and $p_\theta(x_{1:n} \mid y_{1:n})$.

However, it is impossible to sample from $p_\theta(x_{1:n} \mid y_{1:n})$ directly. We can sample from $p_\theta(x_k \mid y_{1:n}, x_{k-1}, x_{k+1})$ one component at a time instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p_\theta(x_{1:n} \mid y_{1:n})$; i.e., sample $X^\star_{1:n} \sim q_\theta(x_{1:n} \mid y_{1:n})$ and accept with probability

\[
\min\left\{1,\;
\frac{p_\theta\big(X^\star_{1:n} \mid y_{1:n}\big)\, q_\theta\big(X_{1:n} \mid y_{1:n}\big)}
     {p_\theta\big(X_{1:n} \mid y_{1:n}\big)\, q_\theta\big(X^\star_{1:n} \mid y_{1:n}\big)}
\right\}.
\]

42
Gibbs Sampling Strategy

We will use the output of an SMC method approximating $p_\theta(x_{1:n} \mid y_{1:n})$ as a proposal distribution.

We know that the SMC approximation degrades as $n$ increases, but under mixing assumptions we also have

\[
\Big\| \mathcal{L}\big(X_{1:n}^{(i)}\big) - p_\theta(\cdot \mid y_{1:n}) \Big\|_{TV} \le \frac{C\, n}{N}.
\]

Thus things degrade linearly rather than exponentially in $n$. These results are useful for small $C$.

43
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Biased Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences: CBMC looks like an SMC method, but at each step only one particle is selected, and it gives $N$ offspring. Thus, if you have made the wrong selection, you cannot recover later on.

• D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.

44
MCMC

We use $\hat p^N_\theta(x_{1:n} \mid y_{1:n})$ as proposal distribution, sampling $X^\star_{1:n} \sim \hat p^N_\theta(x_{1:n} \mid y_{1:n})$, and store the associated estimate $\hat p^N_\theta(y_{1:n})$ of the marginal likelihood.

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out $(N-1)n$ variables.

The solution, inspired by CBMC, is as follows. For the current state $X_{1:n}$ of the Markov chain, run a virtual SMC method with only $N-1$ free particles; ensure that at each time step $k$ the current state $X_k$ is deterministically selected, and compute the associated estimate $\tilde p^N_\theta(y_{1:n})$.

Finally, accept $X^\star_{1:n}$ with probability

\[
\min\left\{1,\; \frac{\hat p^N_\theta(y_{1:n})}{\tilde p^N_\theta(y_{1:n})}\right\}.
\]

Otherwise stay where you are.

45
Example

Consider the following model:

\[
X_k = \frac{X_{k-1}}{2} + \frac{25\, X_{k-1}}{1 + X_{k-1}^2} + 8\cos(1.2\, k) + V_k,
\qquad
V_k \sim \mathcal{N}(0, \sigma_v^2),
\]
\[
Y_k = \frac{X_k^2}{20} + W_k,
\qquad
W_k \sim \mathcal{N}(0, \sigma_w^2),
\qquad
X_1 \sim \mathcal{N}(0, 5).
\]

We take the same prior proposal distribution for both CBMC and SMC.

[Figure: acceptance rates for CBMC (left) and SMC (right).]

46
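A bootstrap particle filter for the benchmark model above can be sketched as follows. The numerical noise variances here ($\sigma_v^2 = 10$, $\sigma_w^2 = 1$) are assumed values chosen for illustration, not taken from the slide:

```python
# Bootstrap particle filter sketch for the benchmark nonlinear model above.
# Assumed (illustrative) noise variances: sigma_v^2 = 10, sigma_w^2 = 1.
import numpy as np

rng = np.random.default_rng(2)
sigma_v, sigma_w = np.sqrt(10.0), 1.0
T, N = 100, 1000

def f(x, k):
    """Mean of the state transition at time k."""
    return x / 2 + 25 * x / (1 + x**2) + 8 * np.cos(1.2 * k)

# Simulate a trajectory and observations from the model
x_true, y = np.zeros(T), np.zeros(T)
x_true[0] = rng.normal(0, np.sqrt(5))
y[0] = x_true[0]**2 / 20 + rng.normal(0, sigma_w)
for k in range(1, T):
    x_true[k] = f(x_true[k - 1], k) + rng.normal(0, sigma_v)
    y[k] = x_true[k]**2 / 20 + rng.normal(0, sigma_w)

# Bootstrap filter: propagate with the prior dynamics, weight by the likelihood
particles = rng.normal(0, np.sqrt(5), size=N)
filt_mean = np.zeros(T)
for k in range(T):
    if k > 0:
        particles = f(particles, k) + rng.normal(0, sigma_v, size=N)
    logw = -0.5 * (y[k] - particles**2 / 20) ** 2 / sigma_w**2
    w = np.exp(logw - logw.max())            # stabilized weights
    w /= w.sum()
    filt_mean[k] = np.sum(w * particles)     # filtering mean estimate
    particles = rng.choice(particles, size=N, p=w)   # multinomial resampling

print(np.mean(np.abs(filt_mean - x_true)))   # average filtering error
```

Because the observation depends on $X_k^2$, the filtering distribution is often bimodal in sign, which is precisely what makes this model a standard stress test for CBMC- and SMC-based proposals.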
Example

We now consider the case where both variances $\sigma_v^2, \sigma_w^2$ are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from $p(\theta, x_{1:1000} \mid y_{1:1000})$ using two strategies:
- The MCMC algorithm with $N = 1000$ and the local EKF proposal as importance distribution; average acceptance rate 0.43.
- An algorithm updating the components one at a time using an MH step of invariant distribution $p(x_k \mid x_{k-1}, x_{k+1}, y_k)$ with the same proposal.

In both cases we update the parameters according to $p(\theta \mid x_{1:500}, y_{1:500})$. Both algorithms are run with the same computational effort.

47
Example

MH one at a time missed the mode, and the estimates are

\[
\mathbb{E}\big[\sigma_v^2 \mid y_{1:500}\big] \approx 13.7 \ \text{(MH one at a time)},
\qquad
\mathbb{E}\big[\sigma_v^2 \mid y_{1:500}\big] \approx 9.9 \ \text{(PF-MCMC)},
\qquad
\sigma_v^2 = 10 \ \text{(true value)}.
\]

If $X_{1:n}$ and $\theta$ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
Review of Metropolis-Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density $q(x, y) = q(y \mid x)$, and the target distribution is $\pi(x)$.

It can easily be shown that $\pi$ is invariant for the transition kernel,

\[
\pi(x') = \int \pi(x)\, K(x, x')\, dx,
\]

and that $X^{(i)} \sim \pi(x)$ as $i \to \infty$ under weak assumptions.

Algorithm: Generic Metropolis-Hastings Sampler

Initialization: choose an arbitrary starting value $x^{(0)}$.

Iteration $t$ ($t \ge 1$):
1. Given $x^{(t-1)}$, generate $x^\star \sim q(x^{(t-1)}, \cdot)$.
2. Compute
\[
\rho\big(x^{(t-1)}, x^\star\big)
= \min\left\{1,\; \frac{\pi(x^\star)\, q\big(x^\star, x^{(t-1)}\big)}{\pi\big(x^{(t-1)}\big)\, q\big(x^{(t-1)}, x^\star\big)}\right\}.
\]
3. With probability $\rho\big(x^{(t-1)}, x^\star\big)$, accept $x^\star$ and set $x^{(t)} = x^\star$; otherwise reject $x^\star$ and set $x^{(t)} = x^{(t-1)}$.
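The three steps above can be sketched directly in code. The standard normal target and the Gaussian random-walk proposal below are illustrative choices (with a symmetric proposal the $q$ terms cancel in the acceptance ratio):

```python
# Minimal generic Metropolis-Hastings sampler implementing steps 1-3 above,
# with a Gaussian random-walk proposal and a standard normal target
# (both chosen purely for illustration).
import numpy as np

rng = np.random.default_rng(3)

def log_target(x):
    return -0.5 * x**2                            # log pi(x), up to a constant

def mh_sampler(n_iter, x0=0.0, step=1.0):
    x = x0
    chain = np.empty(n_iter)
    n_accept = 0
    for t in range(n_iter):
        x_prop = x + step * rng.normal()          # step 1: propose from q(x, .)
        log_rho = log_target(x_prop) - log_target(x)  # step 2: symmetric q cancels
        if np.log(rng.uniform()) < log_rho:       # step 3: accept / reject
            x = x_prop
            n_accept += 1
        chain[t] = x
    return chain, n_accept / n_iter

chain, acc_rate = mh_sampler(20_000)
print(chain.mean(), chain.var(), acc_rate)        # mean near 0, variance near 1
```

Working with log densities, as here, avoids numerical underflow; this matters even more once $\pi$ is replaced by an SMC likelihood estimate as in the marginal algorithm that follows.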
Marginal Metropolis-Hastings Algorithm

Consider the target

\[
p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n}).
\]

We use the following proposal distribution:

\[
q\big((\theta, x_{1:n});\, (\theta^\star, x^\star_{1:n})\big)
= q(\theta^\star \mid \theta)\, p_{\theta^\star}\big(x^\star_{1:n} \mid y_{1:n}\big).
\]

Then the acceptance probability becomes

\[
\min\left\{1,\;
\frac{p\big(\theta^\star, x^\star_{1:n} \mid y_{1:n}\big)\, q\big((\theta^\star, x^\star_{1:n});\, (\theta, x_{1:n})\big)}
     {p\big(\theta, x_{1:n} \mid y_{1:n}\big)\, q\big((\theta, x_{1:n});\, (\theta^\star, x^\star_{1:n})\big)}
\right\}
= \min\left\{1,\;
\frac{p_{\theta^\star}(y_{1:n})\, p(\theta^\star)\, q(\theta \mid \theta^\star)}
     {p_\theta(y_{1:n})\, p(\theta)\, q(\theta^\star \mid \theta)}
\right\}.
\]

We will use SMC approximations to compute $p_\theta(y_{1:n})$ and $p_\theta(x_{1:n} \mid y_{1:n})$.
Marginal Metropolis-Hastings Algorithm

Step 1: Given $\theta(i-1)$, $X_{1:n}(i-1)$ and $\hat p_{\theta(i-1)}(y_{1:n})$, sample $\theta^\star \sim q(\theta \mid \theta(i-1))$ and run an SMC algorithm to obtain $\hat p_{\theta^\star}(x_{1:n} \mid y_{1:n})$ and $\hat p_{\theta^\star}(y_{1:n})$.

Step 2: Sample $X^\star_{1:n} \sim \hat p_{\theta^\star}(x_{1:n} \mid y_{1:n})$.

Step 3: With probability

\[
\min\left\{1,\;
\frac{\hat p_{\theta^\star}(y_{1:n})\, p(\theta^\star)\, q\big(\theta(i-1) \mid \theta^\star\big)}
     {\hat p_{\theta(i-1)}(y_{1:n})\, p\big(\theta(i-1)\big)\, q\big(\theta^\star \mid \theta(i-1)\big)}
\right\},
\]

set $\theta(i) = \theta^\star$, $X_{1:n}(i) = X^\star_{1:n}$, $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta^\star}(y_{1:n})$; otherwise set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$, $\hat p_{\theta(i)}(y_{1:n}) = \hat p_{\theta(i-1)}(y_{1:n})$.
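A compact sketch of steps 1-3 (particle marginal MH) is given below for a hypothetical linear Gaussian model $X_t = \phi X_{t-1} + V_t$, $Y_t = X_t + W_t$ with unknown $\phi$; the flat prior on $\phi$, the random-walk proposal scale, and all numerical settings are illustrative assumptions, and for brevity the sampled trajectory $X^\star_{1:n}$ of Step 2 is not stored:

```python
# Particle marginal Metropolis-Hastings (PMMH) sketch following steps 1-3 above.
# Hypothetical model: X_t = phi * X_{t-1} + V_t,  Y_t = X_t + W_t,  V,W ~ N(0,1).
import numpy as np

rng = np.random.default_rng(4)

def simulate(phi, T):
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal()
    return x + rng.normal(size=T)                 # observations y_{1:T}

def pf_loglik(phi, y, N=200):
    """Bootstrap particle filter estimate of log p_phi(y_{1:T}), up to a constant."""
    particles = rng.normal(size=N)
    ll = 0.0
    for t in range(len(y)):
        if t > 0:
            particles = phi * particles + rng.normal(size=N)
        logw = -0.5 * (y[t] - particles) ** 2     # N(y; x, 1) up to a constant
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())                # incremental marginal likelihood
        particles = rng.choice(particles, size=N, p=w / w.sum())
    return ll

y = simulate(phi=0.8, T=100)

phi, ll = 0.5, pf_loglik(0.5, y)                  # initial state and its estimate
chain = []
for i in range(500):
    phi_prop = phi + 0.1 * rng.normal()           # step 1: theta* ~ q(. | theta)
    ll_prop = pf_loglik(phi_prop, y)              # step 1: SMC estimate of p_theta*(y)
    # step 3: marginal MH ratio (flat prior on phi, symmetric proposal assumed)
    if np.log(rng.uniform()) < ll_prop - ll:
        phi, ll = phi_prop, ll_prop               # store theta and its estimate
    chain.append(phi)

print(np.mean(chain[200:]))                       # posterior mean estimate of phi
```

The key design point, as stated on the slide, is that the stored likelihood estimate $\hat p_{\theta(i)}(y_{1:n})$ is reused in later acceptance ratios rather than recomputed, which is what makes the chain exact despite the noisy estimates.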
Marginal Metropolis-Hastings Algorithm

This algorithm (without sampling $X_{1:n}$) was proposed as an approximate MCMC algorithm to sample from $p(\theta \mid y_{1:n})$ in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

When $N \ge 1$, the algorithm admits exactly $p(\theta, x_{1:n} \mid y_{1:n})$ as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher $N$, the better the performance of the algorithm; $N$ scales roughly linearly with $n$.

The method is useful when $X_n$ is moderate-dimensional and $\theta$ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

• Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553, 2007.
• C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
• E. L. Ionides, C. Breto and A. A. King, Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443, 2006.
OPEN PROBLEMS

53
Open Problems

Thus SMC can be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
- Developing efficient SMC methods when $E = \mathbb{R}^{10000}$.
- Developing a proper SMC method for recursive Bayesian parameter estimation.
- Developing methods that avoid the $O(N^2)$ cost for smoothing and sensitivity estimation.
- Developing methods for solving optimal control problems.
- Establishing weaker assumptions ensuring uniform convergence results.

54
Other Topics

In standard SMC, we only sample $X_n$ at time $n$ and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

55

• Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)

Degeneracy of the SMC Algorithm

Signed-measure approximation of $\nabla p_\theta(x_n \mid Y_{1:n})$ for a linear Gaussian model: exact (red) vs. SMC (blue/green). Positive and negative particles are mixed.
Marginal Importance Sampling for Sensitivity Estimation

An alternative identity is the following:

$$\nabla p_\theta(x_n \mid Y_{1:n-1}) = \nabla \log p_\theta(x_n \mid Y_{1:n-1})\; p_\theta(x_n \mid Y_{1:n-1})$$

It suggests using an SMC approximation of the form

$$\nabla p_\theta(x_n \mid Y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^N \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \beta_n^{(i)} = \nabla \log p_\theta\big(X_n^{(i)} \mid Y_{1:n-1}\big) = \frac{\nabla p_\theta\big(X_n^{(i)} \mid Y_{1:n-1}\big)}{p_\theta\big(X_n^{(i)} \mid Y_{1:n-1}\big)}$$

Such an approximation relies on a pointwise approximation of both the filter and its derivative. The computational complexity is $O(N^2)$, as

$$p_\theta\big(X_n^{(i)} \mid Y_{1:n-1}\big) \propto \int f_\theta\big(X_n^{(i)} \mid x_{n-1}\big)\, p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1} = \frac{1}{N}\sum_{j=1}^N f_\theta\big(X_n^{(i)} \mid X_{n-1}^{(j)}\big)$$

where the last equality uses the particle approximation of the filter at time $n-1$.
Marginal Importance Sampling for Sensitivity Estimation

The optimal filter satisfies

$$p_\theta(x_n \mid Y_{1:n}) = \frac{\xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x_n)\, dx_n}, \qquad \xi_{\theta,n}(x_n) = g_\theta(Y_n \mid x_n) \int f_\theta(x_n \mid x_{n-1})\, p_\theta(x_{n-1} \mid Y_{1:n-1})\, dx_{n-1}$$

The derivatives satisfy

$$\nabla p_\theta(x_n \mid Y_{1:n}) = \frac{\nabla \xi_{\theta,n}(x_n)}{\int \xi_{\theta,n}(x_n)\, dx_n} - p_\theta(x_n \mid Y_{1:n})\, \frac{\int \nabla \xi_{\theta,n}(x_n)\, dx_n}{\int \xi_{\theta,n}(x_n)\, dx_n}$$

This way we obtain a simple recursion for $\nabla p_\theta(x_n \mid Y_{1:n})$ as a function of $\nabla p_\theta(x_{n-1} \mid Y_{1:n-1})$ and $p_\theta(x_{n-1} \mid Y_{1:n-1})$.
SMC Approximation of the Sensitivity

Sample from a mixture proposal built on the previous particle cloud:

$$X_n^{(i)} \sim q_\theta(\cdot \mid Y_n, \eta_{n-1}), \qquad q_\theta(x_n \mid Y_n, \eta_{n-1}) = \sum_{j=1}^N W_{n-1}^{(j)}\, q_\theta\big(x_n \mid Y_n, X_{n-1}^{(j)}\big)$$

Compute

$$\alpha_n^{(i)} = \frac{\xi_{\theta,n}\big(X_n^{(i)}\big)}{q_\theta\big(X_n^{(i)} \mid Y_n, \eta_{n-1}\big)}, \quad \rho_n^{(i)} = \frac{\nabla \xi_{\theta,n}\big(X_n^{(i)}\big)}{q_\theta\big(X_n^{(i)} \mid Y_n, \eta_{n-1}\big)}, \quad W_n^{(i)} = \frac{\alpha_n^{(i)}}{\sum_{j=1}^N \alpha_n^{(j)}}, \quad \beta_n^{(i)} = \frac{\rho_n^{(i)}}{\sum_{j=1}^N \alpha_n^{(j)}} - W_n^{(i)}\, \frac{\sum_{j=1}^N \rho_n^{(j)}}{\sum_{j=1}^N \alpha_n^{(j)}}$$

We then have

$$p_\theta(x_n \mid Y_{1:n}) \approx \sum_{i=1}^N W_n^{(i)}\, \delta_{X_n^{(i)}}(x_n), \qquad \nabla p_\theta(x_n \mid Y_{1:n}) \approx \sum_{i=1}^N \beta_n^{(i)}\, \delta_{X_n^{(i)}}(x_n)$$
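The weight computations above can be sketched in a few lines. Since the slide leaves $\xi_{\theta,n}$ and its gradient model-specific, the arrays $\alpha$ and $\rho$ below are filled with arbitrary placeholder values; the point is the algebra producing $W$ and $\beta$, and the check that the derivative measure has zero total mass ($\sum_i \beta_n^{(i)} = 0$), as it must since $\int \nabla p_\theta(x_n \mid Y_{1:n})\, dx_n = \nabla 1 = 0$.

```python
import numpy as np

def sensitivity_weights(alpha, rho):
    """Given unnormalized weights alpha_i = xi(X_i)/q(X_i) and their derivatives
    rho_i = grad xi(X_i)/q(X_i), return normalized weights W_i and the
    derivative weights beta_i from the slide."""
    s_alpha, s_rho = alpha.sum(), rho.sum()
    W = alpha / s_alpha
    beta = rho / s_alpha - W * (s_rho / s_alpha)
    return W, beta

rng = np.random.default_rng(1)
alpha = rng.uniform(0.1, 1.0, size=1000)  # alpha_i must be positive
rho = rng.normal(size=1000)               # rho_i can take either sign
W, beta = sensitivity_weights(alpha, rho)
print(W.sum(), beta.sum())  # W sums to 1; beta sums to (numerically) zero
```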
Example: Linear Gaussian Model

Consider the following model:

$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = X_n + \sigma_w W_n, \qquad \theta = (\phi, \sigma_v, \sigma_w)$$

We compare next the SMC estimates ($N = 1000$) of $\nabla \log p_\theta(x_n \mid Y_{1:n})$ and $\nabla p_\theta(Y_{1:n})$ to their exact values given by the Kalman filter derivative (exact in cyan vs. SMC in black).
Example: Linear Gaussian Model

Marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).
Example: Stochastic Volatility Model

Consider the following model:

$$X_n = \phi X_{n-1} + \sigma_v V_n, \qquad Y_n = \beta \exp(X_n/2)\, W_n, \qquad \theta = (\phi, \sigma_v, \beta)$$

We use SMC with $N = 1000$ for batch and online estimation.

For batch estimation:

$$\theta_n = \theta_{n-1} + \gamma_n \nabla_\theta \log p_{\theta_{n-1}}(Y_{1:n})$$

For online estimation:

$$\theta_n = \theta_{n-1} + \gamma_n \nabla_\theta \log p_{\theta_{n-1}}(Y_n \mid Y_{1:n-1})$$
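As a toy illustration of the online Robbins-Monro update $\theta_n = \theta_{n-1} + \gamma_n \nabla \log p_{\theta_{n-1}}(Y_n \mid \cdot)$, the sketch below swaps the SV model for a much simpler assumed setting, i.i.d. $Y_n \sim \mathcal{N}(\theta^*, 1)$, where the one-step score is available in closed form ($Y_n - \theta$) instead of requiring an SMC estimate:

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true = 1.5
theta = 0.0                           # initial guess
for n in range(1, 20001):
    y = rng.normal(theta_true, 1.0)   # new observation
    gamma = 1.0 / n ** 0.6            # Robbins-Monro step sizes
    score = y - theta                 # grad_theta log N(y; theta, 1)
    theta += gamma * score            # stochastic gradient ascent step
print(theta)  # ends up near theta_true = 1.5
```

The step sizes $\gamma_n = n^{-0.6}$ satisfy the usual Robbins-Monro conditions $\sum \gamma_n = \infty$, $\sum \gamma_n^2 < \infty$.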
Example: Stochastic Volatility Model

Batch parameter estimation for the daily Pound/Dollar exchange rate, 81-85.

Example: Stochastic Volatility Model

Recursive parameter estimation for the SV model. The estimates converge towards the true values.
SMC with MCMC for Parameter Estimation

Given data $y_{1:n}$, inference relies on

$$p(\theta, x_{1:n} \mid y_{1:n}) = p_\theta(x_{1:n} \mid y_{1:n})\, p(\theta \mid y_{1:n}), \qquad \text{where } p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta)$$

We have seen that SMC is rather inefficient at sampling from $p(\theta \mid y_{1:n})$, so we look here at an MCMC approach. For a given parameter value $\theta$, SMC can estimate both $p_\theta(x_{1:n} \mid y_{1:n})$ and $p_\theta(y_{1:n})$. It would therefore be useful if we could use SMC within MCMC to sample from $p(\theta, x_{1:n} \mid y_{1:n})$.

• C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from $p(\theta \mid y_{1:n}, x_{1:n})$ and $p(x_{1:n} \mid y_{1:n}, \theta)$.

However, it is impossible to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$ directly. We can sample from $p(x_k \mid \theta, y_{1:n}, x_{1:k-1}, x_{k+1:n})$ instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from $p(x_{1:n} \mid y_{1:n}, \theta)$; i.e., sample $X_{1:n}^* \sim q(\cdot \mid y_{1:n})$ and accept with probability

$$\min\left\{1,\; \frac{p_\theta(X_{1:n}^* \mid y_{1:n})\, q(X_{1:n} \mid y_{1:n})}{p_\theta(X_{1:n} \mid y_{1:n})\, q(X_{1:n}^* \mid y_{1:n})}\right\}$$
Gibbs Sampling Strategy

We will use the output of an SMC method approximating $p_\theta(x_{1:n} \mid y_{1:n})$ as a proposal distribution.

We know that the SMC approximation degrades as $n$ increases, but under mixing assumptions we also have

$$\left\| \mathrm{Law}\big(X_{1:n}^{(i)}\big) - p_\theta(x_{1:n} \mid y_{1:n}) \right\|_{TV} \le \frac{C\, n}{N}$$

Thus things degrade linearly rather than exponentially in $n$. These results are useful when $C$ is small.
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Biased Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences: CBMC looks like an SMC method, but at each step only one particle is selected, which gives $N$ offspring. Thus, if you have made the wrong selection, you cannot recover later on.

D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC

We use $\widehat{p}_N(x_{1:n} \mid y_{1:n}, \theta)$ as proposal distribution, i.e. $X_{1:n}^* \sim \widehat{p}_N(\cdot \mid y_{1:n}, \theta)$, and store the associated estimate $\widehat{p}_N^*(y_{1:n} \mid \theta)$.

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out $(N-1)n$ variables.

The solution, inspired by CBMC, is as follows. For the current state $X_{1:n}$ of the Markov chain, run a virtual SMC method with only $N-1$ free particles; ensure that at each time step $k$ the current state $X_k$ is deterministically selected, and compute the associated estimate $\widehat{p}_N(y_{1:n} \mid \theta)$.

Finally, accept $X_{1:n}^*$ with probability

$$\min\left\{1,\; \frac{\widehat{p}_N^*(y_{1:n} \mid \theta)}{\widehat{p}_N(y_{1:n} \mid \theta)}\right\}$$

Otherwise stay where you are.
Example

Consider the following model:

$$X_k = \frac{X_{k-1}}{2} + 25\,\frac{X_{k-1}}{1 + X_{k-1}^2} + 8\cos(1.2k) + V_k, \qquad Y_k = \frac{X_k^2}{20} + W_k$$

with $V_k \sim \mathcal{N}(0, \sigma_v^2)$ and $W_k \sim \mathcal{N}(0, \sigma_w^2)$.

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
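A quick simulation of this benchmark model; the noise standard deviations and initial distribution below are illustrative choices, not values taken from the slide:

```python
import numpy as np

def simulate(T, sigma_v=1.0, sigma_w=1.0, seed=3):
    """Simulate the nonlinear benchmark state-space model from the slide
    (noise scales and X_1 distribution are assumed, for illustration)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(T)
    y = np.zeros(T)
    x[0] = rng.normal()  # assumed initial distribution
    for k in range(T):
        if k > 0:
            drift = (x[k-1] / 2 + 25 * x[k-1] / (1 + x[k-1] ** 2)
                     + 8 * np.cos(1.2 * k))
            x[k] = drift + sigma_v * rng.normal()
        y[k] = x[k] ** 2 / 20 + sigma_w * rng.normal()
    return x, y

x, y = simulate(100)
print(x.shape, y.shape)  # (100,) (100,)
```

The squared observation $Y_k = X_k^2/20 + W_k$ makes the filtering distribution bimodal, which is what makes this a standard stress test for particle methods.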
Example

We now consider the case where both variances $\sigma_v^2$, $\sigma_w^2$ are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from $p(\theta, x_{1:1000} \mid y_{1:1000})$ using two strategies:
• The MCMC algorithm with $N = 1000$ and the local EKF proposal as importance distribution; average acceptance rate 0.43.
• An algorithm updating the components one at a time using an MH step of invariant distribution $p(x_k \mid x_{k-1}, x_{k+1}, y_k)$ with the same proposal.

In both cases we update the parameters according to $p(\theta \mid x_{1:500}, y_{1:500})$. Both algorithms are run with the same computational effort.
Example

MH one at a time missed the mode, and the estimates are

$$\mathbb{E}\big[\sigma_v^2 \mid y_{1:500}\big] \approx 13.7 \;(\text{MH}), \qquad \mathbb{E}\big[\sigma_v^2 \mid y_{1:500}\big] \approx 9.9 \;(\text{PF-MCMC}), \qquad \text{true value } \sigma_v^2 = 10$$

If $X_{1:n}$ and $\theta$ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
Review of Metropolis Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density $q(x, y) = q(y \mid x)$, and the target distribution is $\pi(x)$.

It can be easily shown that $\pi$ is invariant under the transition kernel,

$$\pi(x') = \int \pi(x)\, K(x, x')\, dx,$$

and that $X^{(i)} \sim \pi(x)$ as $i \to \infty$ under weak assumptions.

Algorithm: Generic Metropolis Hastings Sampler
Initialization: choose an arbitrary starting value $x^{(0)}$.
Iteration $t$ ($t \ge 1$):
1. Given $x^{(t-1)}$, generate $\tilde{x} \sim q(x^{(t-1)}, x)$.
2. Compute
$$\rho\big(x^{(t-1)}, \tilde{x}\big) = \min\left\{1,\; \frac{\pi(\tilde{x})\, q(\tilde{x}, x^{(t-1)})}{\pi(x^{(t-1)})\, q(x^{(t-1)}, \tilde{x})}\right\}$$
3. With probability $\rho(x^{(t-1)}, \tilde{x})$, accept $\tilde{x}$ and set $x^{(t)} = \tilde{x}$; otherwise reject $\tilde{x}$ and set $x^{(t)} = x^{(t-1)}$.
Marginal Metropolis Hastings Algorithm

Consider the target

$$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n})$$

We use the following proposal distribution:

$$q\big((\theta^*, x_{1:n}^*) \mid (\theta, x_{1:n})\big) = q(\theta^* \mid \theta)\, p_{\theta^*}(x_{1:n}^* \mid y_{1:n})$$

Then the acceptance probability becomes

$$\min\left\{1,\; \frac{p(\theta^*, x_{1:n}^* \mid y_{1:n})\, q\big((\theta, x_{1:n}) \mid (\theta^*, x_{1:n}^*)\big)}{p(\theta, x_{1:n} \mid y_{1:n})\, q\big((\theta^*, x_{1:n}^*) \mid (\theta, x_{1:n})\big)}\right\} = \min\left\{1,\; \frac{p_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta \mid \theta^*)}{p_\theta(y_{1:n})\, p(\theta)\, q(\theta^* \mid \theta)}\right\}$$

We will use SMC approximations to compute $p_\theta(y_{1:n})$ and $p_\theta(x_{1:n} \mid y_{1:n})$.
Marginal Metropolis Hastings Algorithm

Step 1: Given $\theta(i-1)$, $X_{1:n}(i-1)$ and $\widehat{p}_{\theta(i-1)}(y_{1:n})$, sample $\theta^* \sim q(\cdot \mid \theta(i-1))$ and run an SMC algorithm to obtain $\widehat{p}_{\theta^*}(x_{1:n} \mid y_{1:n})$ and $\widehat{p}_{\theta^*}(y_{1:n})$.

Step 2: Sample $X_{1:n}^* \sim \widehat{p}_{\theta^*}(x_{1:n} \mid y_{1:n})$.

Step 3: With probability

$$\min\left\{1,\; \frac{\widehat{p}_{\theta^*}(y_{1:n})\, p(\theta^*)\, q(\theta(i-1) \mid \theta^*)}{\widehat{p}_{\theta(i-1)}(y_{1:n})\, p(\theta(i-1))\, q(\theta^* \mid \theta(i-1))}\right\}$$

set $\theta(i) = \theta^*$, $X_{1:n}(i) = X_{1:n}^*$ and $\widehat{p}_{\theta(i)}(y_{1:n}) = \widehat{p}_{\theta^*}(y_{1:n})$; otherwise set $\theta(i) = \theta(i-1)$, $X_{1:n}(i) = X_{1:n}(i-1)$ and $\widehat{p}_{\theta(i)}(y_{1:n}) = \widehat{p}_{\theta(i-1)}(y_{1:n})$.
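The three steps above can be sketched compactly for a linear Gaussian model with unknown $\phi$ (all settings, the flat prior, the random-walk $q(\theta^* \mid \theta)$, and the model itself, are illustrative assumptions). A bootstrap particle filter supplies the likelihood estimate $\widehat{p}_\theta(y_{1:n})$ used in the acceptance ratio; the sampling of $X_{1:n}^*$ in Step 2 is omitted for brevity.

```python
import numpy as np

def log_lik_pf(phi, y, N=200, rng=None):
    """Bootstrap particle filter estimate of log p_phi(y_{1:n}) for the model
    X_n = phi X_{n-1} + V_n, Y_n = X_n + W_n, with V_n, W_n ~ N(0, 1)."""
    rng = rng or np.random.default_rng()
    x = rng.normal(size=N)
    ll = 0.0
    for yt in y:
        x = phi * x + rng.normal(size=N)              # propagate particles
        logw = -0.5 * (yt - x) ** 2                   # N(0,1) observation density
        ll += np.log(np.mean(np.exp(logw)) + 1e-300) - 0.5 * np.log(2 * np.pi)
        w = np.exp(logw)
        w /= w.sum()
        x = x[rng.choice(N, size=N, p=w)]             # multinomial resampling
    return ll

def pmmh(y, n_iter=200, seed=5):
    rng = np.random.default_rng(seed)
    phi, ll = 0.0, log_lik_pf(0.0, y, rng=rng)
    chain = []
    for _ in range(n_iter):
        phi_prop = phi + 0.1 * rng.normal()           # Step 1: propose theta*,
        ll_prop = log_lik_pf(phi_prop, y, rng=rng)    # then run SMC for p_hat
        # Step 3: with a flat prior and symmetric q, only the likelihood
        # estimates remain in the acceptance ratio.
        if np.log(rng.uniform()) < ll_prop - ll:
            phi, ll = phi_prop, ll_prop
        chain.append(phi)
    return np.array(chain)

rng = np.random.default_rng(6)
x_true = 0.0
y = []
for _ in range(50):                                   # simulate data with phi = 0.8
    x_true = 0.8 * x_true + rng.normal()
    y.append(x_true + rng.normal())
chain = pmmh(np.array(y))
print(chain.shape)  # (200,)
```

Replacing the exact likelihood by an unbiased SMC estimate is what makes this an exact ("pseudo-marginal") algorithm, in line with the invariance result cited below.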
Marginal Metropolis Hastings Algorithm

This algorithm (without sampling $X_{1:n}$) was proposed as an approximate MCMC algorithm to sample from $p(\theta \mid y_{1:n})$ in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

For any $N \ge 1$, the algorithm admits exactly $p(\theta, x_{1:n} \mid y_{1:n})$ as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher $N$, the better the performance of the algorithm; $N$ should scale roughly linearly with $n$. The method is useful when $X_n$ is moderate-dimensional and $\theta$ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, 2007. On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553.
C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
Ionides, E. L., Breto, C. and King, A. A. (2006). Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443.
OPEN PROBLEMS

Open Problems

SMC can thus be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but very helpful in challenging problems.

Several open problems remain:
• Developing efficient SMC methods for very high-dimensional state spaces, e.g. $E = \mathbb{R}^{10000}$.
• Developing a proper SMC method for recursive Bayesian parameter estimation.
• Developing methods that avoid the $O(N^2)$ cost for smoothing and sensitivity estimation.
• Developing methods for solving optimal control problems.
• Establishing weaker assumptions ensuring uniform convergence results.
Other Topics

In standard SMC we only sample $X_n$ at time $n$, and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

• Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation
An alternative identity is the following
It suggests using an SMC approximation of the form
Such an approximation relies on a pointwise approximation of both the filter and its derivative The computational complexity is O(N2) as
33
1 1 1 1 1 1( ) log ( ) ( )n n n n n np x p x p xθ θ θminus minus minusnabla = nabla|Y |Y |Y
( )( )
1 1 11
( )( ) ( ) 1 1
1 1 ( )1 1
1( ) ( )
( )log ( )( )
in
Ni
n n n nXi
ii i n n
n n n in n
p xN
p Xp Xp X
θ
θθ
θ
β δ
β
minus=
minusminus
minus
nabla =
nabla= nabla =
sum|Y x
|Y|Y|Y
( ) ( ) ( ) ( )1 1 1 1 1 1 1
1
1( ) ( ) ( ) ( )N
i i i jn n n n n n n n n
jp X f X x p x dx f X X
Nθ θθ θminus minus minus minus minus=
prop = sumint|Y | |Y |
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation
The optimal filter satisfies
The derivatives satisfy the following
This way we obtain a simple recursion of as a function of and
34
( ) ( )
11
1
1 1 1 1 1 1
( )( )( )
( ) | | ( )
n nn n
n n n
n n n n n n n n n
xp xx dx
x g Y x f x x p x dx
θθ
θ
θ θ θ θ
ξξ
ξ minus minus minus minus
=
=
intint
|Y|Y|Y
|Y |Y
111 1
1 1
( )( )( ) ( )( ) ( )
n n nn nn n n n
n n n n n n
x dxxp x p xx dx x dx
θθθ θ
θ θ
ξξξ ξ
nablanablanabla = minus int
int int|Y|Y|Y |Y
|Y Y
1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC Approximation of the Sensitivity Sample
Compute
We have
35
( )
( )( ) ( )
( ) ( )1 1( ) ( )
( )( ) ( )
1( ) ( ) ( ) ( )
( ) ( ) ( )
1 1 1
( ) ( )| |
i ii in n n n
n ni in n n n
Nj
ni iji i i in n
n n n nN N Nj j j
n n nj j j
X Xq X Y q X Y
W W W
θ θ
θ θ
ξ ξα ρ
ρα ρβ
α α α
=
= = =
nabla= =
= = minussum
sum sum sum
Y Y
( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1
1~ | | |
Ni j j j j j
n n n n n n n n nj
X q Y q Y X W p Y Xθ θη ηminus minus minus=
= propsum
( )
( )
( )1
1
( ) ( )1
1
( ) ( )
( ) ( )
in
in
Ni
n n n nXi
Ni i
n n n n nXi
p x W x
p x W x
θ
θ
δ
β δ
=
=
=
nabla =
sum
sum
|Y
|Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Consider the following model
We have
We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)
36
1n n v n
n n w n
X X VY X W
φ σσminus= +
= +
v wθ φ σ σ=
1log ( )n np xθnabla |Y
1( )npθnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the
marginal (bottom)
37
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Consider the following model
We have We use SMC with N=1000 for batch and on-line estimation
For Batch Estimation
For online estimation
38
1
exp2
n n v n
nn n
X X VXY W
φ σ
β
minus= +
=
vθ φ σ β=
11 1log ( )nn n n npθθ θ γ
minusminus= + nabla Y
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar
81-85
39
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates
converge towards the true values
40
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Given data y1n inference relies on
We have seen that SMC are rather inefficient to sample from
so we look here at an MCMC approach For a given parameter value θ SMC can estimate both
It will be useful if we can use SMC within MCMC to sample from
41
( ) ( ) ( )
( ) ( ) ( )
1 1 1 1 1
1 1
| | |
|
n n n n n
n n
p p pwherep p p
θ
θ
θ θ
θ θ
=
=
x y y x y
y y
( ) ( )1 1 1|n n np and pθ θx y y
( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R
Statist SocB (2010) 72 Part 3 pp 269ndash342
( )1| np θ y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from
and
However it is impossible to sample from
We can sample from instead but convergence will be slow
Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability
42
( )1 1| n np θ x y( )1 1| n np θx y
( )1 1| n np θx y
( )1 1 1 1| k n k k np x θ minus +y x x
( )1 1| n np θx y ( )1 1 1~ |n n nqX x y
( ) ( )( ) ( )
1 1 1 1
1 1 1 1
| |
| m n
|i 1 n n n n
n n n n
p q
p p
θ
θ
X y X y
X y X y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution
We know that the SMC approximation degrades an
n increases but we also have under mixing assumptions
Thus things degrade linearly rather than exponentially
These results are useful for small C
43
( )1 1| n np θx y
( )1 1| n np θx y
( )( )1 1 1( ) | i
n n n TV
Cnaw pN
θminus leX x yL
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method
(CBMC) in Molecular simulation There are however significant differences
CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on
44
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
Bayesian Scientific Computing Spring 2013 (N Zabaras)
MCMC We use as proposal distribution and store
the estimate
It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables
The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate
Finally accept with probability Otherwise stay where you are
45
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
( )1 1 1~ | n n nNp θX x y
( )1 |nNp θy
1nX
( )1 |nNp θy1nX ( ) ( )
1
1
m|
in 1|nN
nN
pp
θθ
yy
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Consider the following model
We take the same prior proposal distribution for both CBMC and
SMC Acceptance rates for CBMC (left) and SMC (right) are shown
46
( )
( ) ( )
11 2
12
1
1 25 8cos(12 ) ~ 0152 1
~ 0001 ~ 052
kk k k k
k
kn k k
XX X k V VX
XY W W X
minusminus
minus
= + + ++
= +
N
N N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example We now consider the case when both variances are
unknown and given inverse Gamma (conditionally conjugate) vague priors
We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as
importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step
of invariance distribution with the same proposal
In both cases we update the parameters according to Both algorithms are run with the same computational effort
47
2 2v wσ σ
( )11000 11000 |p θx y
( )1 1 k k k kp x x x yminus +|
( )1500 1500| p θ x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example MH one at a time missed the mode and estimates
If X1n and θ are very correlated the MCMC algorithm is very
inefficient We will next discuss the Marginal Metropolis Hastings
21500
21500
2
| 137 ( )
| 99 ( )
10
v
v
v
MH
PF MCMC
σ
σ
σ
asymp asymp minus =
y
y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with
kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)
It can be easily shown that and
under weak assumptions
Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)
1 Given 119909119905minus1 generate 2 Compute
3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Step 1
Step 2
Step 3 With probability
( ) ( )
( ) ( )
1 1( 1)
1 1 1
( 1) ( 1)
~ | ( 1)
|
n ni
n n n
Given i i p
Sample q i
Run an SMC to obtain p p
θ
θθ
θ
θ θ θ
minusminus minus
minus
X y
y x y
( )1 1 1 ~ |n n nSample pθX x y
( ) ( ) ( ) ( ) ( ) ( )
1
1( 1)
( 1) |
( 1) | ( 1min 1
)n
ni
p p q i
p p i q iθ
θ
θ θ θ
θ θ θminus
minus minus minus
y
y
( ) ( ) ( ) ( )
1 1 1 1( )
1 1 1 1( ) ( 1)
( ) ( )
( ) ( ) ( 1) ( 1)
n n n ni
n n n ni i
Set i X i p X p
Otherwise i X i p i X i p
θ θ
θ θ
θ θ
θ θ minus
=
= minus minus
y y
y y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an
approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)
When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists
The higher N the better the performance of the algorithm N scales roughly linearly with n
Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)
Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model
with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Importance Sampling for Sensitivity Estimation
The optimal filter satisfies
The derivatives satisfy the following
This way we obtain a simple recursion of as a function of and
34
( ) ( )
11
1
1 1 1 1 1 1
( )( )( )
( ) | | ( )
n nn n
n n n
n n n n n n n n n
xp xx dx
x g Y x f x x p x dx
θθ
θ
θ θ θ θ
ξξ
ξ minus minus minus minus
=
=
intint
|Y|Y|Y
|Y |Y
111 1
1 1
( )( )( ) ( )( ) ( )
n n nn nn n n n
n n n n n n
x dxxp x p xx dx x dx
θθθ θ
θ θ
ξξξ ξ
nablanablanabla = minus int
int int|Y|Y|Y |Y
|Y Y
1( )n np xθnabla |Y1 1 1( )n np xθ minus minusnabla |Y 1 1 1( )n np xθ minus minus|Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC Approximation of the Sensitivity Sample
Compute
We have
35
( )
( )( ) ( )
( ) ( )1 1( ) ( )
( )( ) ( )
1( ) ( ) ( ) ( )
( ) ( ) ( )
1 1 1
( ) ( )| |
i ii in n n n
n ni in n n n
Nj
ni iji i i in n
n n n nN N Nj j j
n n nj j j
X Xq X Y q X Y
W W W
θ θ
θ θ
ξ ξα ρ
ρα ρβ
α α α
=
= = =
nabla= =
= = minussum
sum sum sum
Y Y
( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1
1~ | | |
Ni j j j j j
n n n n n n n n nj
X q Y q Y X W p Y Xθ θη ηminus minus minus=
= propsum
( )
( )
( )1
1
( ) ( )1
1
( ) ( )
( ) ( )
in
in
Ni
n n n nXi
Ni i
n n n n nXi
p x W x
p x W x
θ
θ
δ
β δ
=
=
=
nabla =
sum
sum
|Y
|Y
Example: Linear Gaussian Model

Consider the following model:

  X_n = φ X_{n-1} + σ_v V_n,
  Y_n = X_n + σ_w W_n,  θ = (φ, σ_v, σ_w).

We have the quantities ∇ log p_θ(x_n | Y_{1:n}) and ∇p_θ(Y_{1:n}).

We compare next the SMC estimate for N = 1000 to its exact value given by the Kalman filter derivative (exact in cyan vs. SMC in black).
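For this linear Gaussian model the exact reference values come from the Kalman filter; a minimal sketch (plain Python; the initial mean and variance defaults are illustrative assumptions) of the scalar filter that produces the exact log-likelihood log p_θ(Y_{1:n}):

```python
import math

def kalman_loglik(ys, phi, sigma_v, sigma_w, m0=0.0, P0=1.0):
    """Exact log p_theta(Y_{1:n}) for X_n = phi X_{n-1} + sigma_v V_n,
    Y_n = X_n + sigma_w W_n, via the scalar Kalman filter."""
    m, P = m0, P0
    ll = 0.0
    for y in ys:
        # prediction step
        m_pred = phi * m
        P_pred = phi * phi * P + sigma_v ** 2
        # innovation: Y_n | Y_{1:n-1} ~ N(m_pred, S)
        S = P_pred + sigma_w ** 2
        ll += -0.5 * (math.log(2.0 * math.pi * S) + (y - m_pred) ** 2 / S)
        # update step
        K = P_pred / S
        m = m_pred + K * (y - m_pred)
        P = (1.0 - K) * P_pred
    return ll
```

The exact derivative ∇_θ log p_θ(Y_{1:n}) can then be obtained, e.g., by finite differences of this function and checked against the SMC estimate.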
Example: Linear Gaussian Model

[Figure] Marginal of the Hahn–Jordan decomposition of the joint (top) and Hahn–Jordan decomposition of the marginal (bottom).
Example: Stochastic Volatility Model

Consider the following model:

  X_n = φ X_{n-1} + σ_v V_n,
  Y_n = β exp(X_n / 2) W_n,  θ = (φ, σ_v, β).

We use SMC with N = 1000 for batch and on-line estimation.

For batch estimation:

  θ_n = θ_{n-1} + γ_n ∇ log p_{θ_{n-1}}(Y_{1:n}).

For online estimation:

  θ_n = θ_{n-1} + γ_n ∇ log p_{θ_{n-1}}(Y_n | Y_{1:n-1}).
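Both updates are stochastic-gradient (Robbins–Monro) steps; a minimal sketch, with a user-supplied score estimate standing in for the SMC-based gradient (the quadratic toy score below is an illustrative assumption, not part of the SV model):

```python
def robbins_monro(score_estimate, theta0, n_steps=1000, gamma0=0.1, decay=0.6):
    """theta_n = theta_{n-1} + gamma_n * (estimated score at theta_{n-1}),
    with step sizes gamma_n = gamma0 / n**decay, decay in (0.5, 1]."""
    theta = theta0
    for n in range(1, n_steps + 1):
        gamma = gamma0 / n ** decay
        theta = theta + gamma * score_estimate(theta)
    return theta

# toy stand-in for the SMC gradient: score of a N(2, 1) log-likelihood in its mean
theta_hat = robbins_monro(lambda th: 2.0 - th, theta0=0.0)
```

With decay in (0.5, 1], the step sizes satisfy the usual Robbins–Monro conditions (Σ γ_n = ∞, Σ γ_n² < ∞), so the iterates drift toward the root of the score.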
Example: Stochastic Volatility Model

[Figure] Batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.
Example: Stochastic Volatility Model

[Figure] Recursive parameter estimation for the SV model. The estimates converge towards the true values.
SMC with MCMC for Parameter Estimation

Given data y_{1:n}, inference relies on

  p(θ, x_{1:n} | y_{1:n}) = p_θ(x_{1:n} | y_{1:n}) p(θ | y_{1:n}),  where  p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ).

We have seen that SMC methods are rather inefficient for sampling from p(θ | y_{1:n}), so we look here at an MCMC approach.

For a given parameter value θ, SMC can estimate both p_θ(x_{1:n} | y_{1:n}) and p_θ(y_{1:n}).

It will be useful if we can use SMC within MCMC to sample from p_θ(x_{1:n} | y_{1:n}).

• C. Andrieu, R. Holenstein and G.O. Roberts, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy

Using a Gibbs sampler, we can sample iteratively from p(θ | x_{1:n}, y_{1:n}) and p(x_{1:n} | θ, y_{1:n}).

However, it is impossible to sample from p(x_{1:n} | θ, y_{1:n}) directly. We can sample from p(x_k | θ, y_{1:n}, x_{1:k-1}, x_{k+1:n}) instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from p(x_{1:n} | θ, y_{1:n}), i.e. sample X*_{1:n} ~ q(· | y_{1:n}) and accept with probability

  min{1, [p_θ(X*_{1:n} | y_{1:n}) q(X_{1:n} | y_{1:n})] / [p_θ(X_{1:n} | y_{1:n}) q(X*_{1:n} | y_{1:n})]}.
Gibbs Sampling Strategy

We will use the output of an SMC method approximating p_θ(x_{1:n} | y_{1:n}) as a proposal distribution.

We know that the SMC approximation degrades as n increases, but under mixing assumptions we also have

  || Law(X_{1:n}^(i)) − p_θ(· | y_{1:n}) ||_TV ≤ C n / N.

Thus things degrade linearly rather than exponentially in n. These results are useful for small C.
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Biased MC (CBMC) method in molecular simulation. There are, however, significant differences.

CBMC looks like an SMC method, but at each step only one particle is selected, and it gives N offspring. Thus, if you have made the wrong selection, you cannot recover later on.

D. Frenkel, G.C.A.M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC

We use p̂_N(x_{1:n} | θ, y_{1:n}) as the proposal distribution for X*_{1:n} and store the estimate p̂_N(y_{1:n} | θ).

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N−1)n variables.

The solution, inspired by CBMC, is as follows. For the current state X_{1:n} of the Markov chain, run a virtual SMC method with only N−1 free particles. Ensure that at each time step k the current state X_k is deterministically selected, and compute the associated estimate p̃_N(y_{1:n} | θ).

Finally, accept X*_{1:n} with probability

  min{1, p̂_N(y_{1:n} | θ) / p̃_N(y_{1:n} | θ)}.

Otherwise, stay where you are.
Example

Consider the following model:

  X_k = X_{k-1}/2 + 25 X_{k-1}/(1 + X_{k-1}²) + 8 cos(1.2 k) + V_k,  V_k ~ N(0, σ_v²),
  Y_k = X_k²/20 + W_k,  W_k ~ N(0, σ_w²).

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
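As a sketch, the benchmark model above can be simulated in a few lines of Python (the seeded initial state draw is an illustrative assumption; the noise standard deviations are left as parameters):

```python
import math
import random

def simulate(n, sigma_v, sigma_w, seed=0):
    """Simulate the nonlinear benchmark model:
    X_k = X_{k-1}/2 + 25 X_{k-1}/(1 + X_{k-1}^2) + 8 cos(1.2 k) + V_k,
    Y_k = X_k^2 / 20 + W_k."""
    rng = random.Random(seed)
    xs, ys = [], []
    x = rng.gauss(0.0, 1.0)  # initial state (assumed)
    for k in range(1, n + 1):
        x = 0.5 * x + 25.0 * x / (1.0 + x * x) + 8.0 * math.cos(1.2 * k) \
            + rng.gauss(0.0, sigma_v)
        xs.append(x)
        ys.append(x * x / 20.0 + rng.gauss(0.0, sigma_w))
    return xs, ys
```

Because the observation Y_k depends on X_k only through X_k², the posterior is bimodal in the sign of the state, which is what makes this a standard stress test for particle methods.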
Example

We now consider the case when both variances σ_v², σ_w² are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from p(θ, x_{1:1000} | y_{1:1000}) using two strategies:
• The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution. Average acceptance rate 0.43.
• An algorithm updating the components one at a time using an MH step of invariant distribution p(x_k | x_{k-1}, x_{k+1}, y_k) with the same proposal.

In both cases we update the parameters according to their full conditional p(θ | x_{1:n}, y_{1:n}). Both algorithms are run with the same computational effort.
Example

MH one-at-a-time missed the mode, and the estimates are

  E[σ_v² | y_{1:500}] ≈ 13.7 (MH),  E[σ_v² | y_{1:500}] ≈ 9.9 (PF-MCMC),  with true value σ_v² = 10.

If X_{1:n} and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
Review of Metropolis Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density q(x, y) = q(y | x), and the target distribution is π(x).

It can be easily shown that π is invariant under the transition kernel,

  π(x*) = ∫ π(x) K(x, x*) dx,

and that X^(i) ~ π(x) as i → ∞, under weak assumptions.

Algorithm (Generic Metropolis Hastings Sampler):

Initialization: choose an arbitrary starting value x^(0).

Iteration t (t ≥ 1):
1. Given x^(t-1), generate x* ~ q(x^(t-1), ·).
2. Compute
   ρ(x^(t-1), x*) = min{1, [π(x*) q(x*, x^(t-1))] / [π(x^(t-1)) q(x^(t-1), x*)]}.
3. With probability ρ(x^(t-1), x*), accept x* and set x^t = x*. Otherwise, reject x* and set x^t = x^(t-1).
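The generic sampler above translates directly into code; a minimal sketch with a symmetric Gaussian random-walk proposal, so the q-ratio cancels (the standard normal target is an illustrative assumption):

```python
import math
import random

def metropolis_hastings(log_target, x0, n_iter, step=1.0, seed=0):
    """Generic MH sampler with symmetric proposal q(x, .) = N(x, step^2)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_iter):
        x_star = x + rng.gauss(0.0, step)   # 1. generate x* ~ q(x^{t-1}, .)
        # 2. rho = min{1, pi(x*) / pi(x^{t-1})} since the proposal is symmetric
        if math.log(rng.random()) < log_target(x_star) - log_target(x):
            x = x_star                       # 3. accept x* ...
        samples.append(x)                    # ... otherwise keep x^{t-1}
    return samples

# illustrative target: standard normal, log pi(x) = -x^2/2 up to a constant
samples = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000, seed=1)
```

Working with log π avoids underflow, and only the ratio of target values is needed, so the normalizing constant of π never has to be computed.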
Marginal Metropolis Hastings Algorithm

Consider the target

  p(θ, x_{1:n} | y_{1:n}) = p_θ(x_{1:n} | y_{1:n}) p(θ | y_{1:n}).

We use the following proposal distribution:

  q((θ*, x*_{1:n}) | (θ, x_{1:n})) = q(θ* | θ) p_{θ*}(x*_{1:n} | y_{1:n}).

Then the acceptance probability becomes

  min{1, [p(θ*, x*_{1:n} | y_{1:n}) q((θ, x_{1:n}) | (θ*, x*_{1:n}))] / [p(θ, x_{1:n} | y_{1:n}) q((θ*, x*_{1:n}) | (θ, x_{1:n}))]}
  = min{1, [p_{θ*}(y_{1:n}) p(θ*) q(θ | θ*)] / [p_θ(y_{1:n}) p(θ) q(θ* | θ)]}.

We will use SMC approximations to compute p_θ(y_{1:n}) and p_θ(x_{1:n} | y_{1:n}).
Marginal Metropolis Hastings Algorithm

Step 1: Given θ(i−1), X_{1:n}(i−1) and p̂_{θ(i−1)}(y_{1:n}), sample θ* ~ q(· | θ(i−1)) and run an SMC method to obtain p̂_{θ*}(x_{1:n} | y_{1:n}) and p̂_{θ*}(y_{1:n}).

Step 2: Sample X*_{1:n} ~ p̂_{θ*}(· | y_{1:n}).

Step 3: With probability

  min{1, [p̂_{θ*}(y_{1:n}) p(θ*) q(θ(i−1) | θ*)] / [p̂_{θ(i−1)}(y_{1:n}) p(θ(i−1)) q(θ* | θ(i−1))]},

set θ(i) = θ*, X_{1:n}(i) = X*_{1:n} and p̂_{θ(i)}(y_{1:n}) = p̂_{θ*}(y_{1:n}). Otherwise, set θ(i) = θ(i−1), X_{1:n}(i) = X_{1:n}(i−1) and p̂_{θ(i)}(y_{1:n}) = p̂_{θ(i−1)}(y_{1:n}).
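The three steps above can be sketched compactly, using a bootstrap particle filter to produce p̂_θ(y_{1:n}) for the linear Gaussian model from the earlier example. Everything specific below is an illustrative assumption: σ_v = σ_w = 1 fixed, a flat prior and random-walk proposal on φ, and Step 2 (sampling a state trajectory) omitted so only θ is tracked:

```python
import math
import random

def bootstrap_loglik(ys, phi, sigma_v, sigma_w, N=200, rng=None):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n}) for
    X_n = phi X_{n-1} + sigma_v V_n, Y_n = X_n + sigma_w W_n."""
    rng = rng or random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(N)]
    ll = 0.0
    for y in ys:
        # propagate with the prior dynamics (bootstrap proposal)
        xs = [phi * x + rng.gauss(0.0, sigma_v) for x in xs]
        # weight by the Gaussian likelihood g(y | x)
        ws = [math.exp(-0.5 * ((y - x) / sigma_w) ** 2)
              / (sigma_w * math.sqrt(2.0 * math.pi)) for x in xs]
        ll += math.log(sum(ws) / N)
        xs = rng.choices(xs, weights=ws, k=N)  # multinomial resampling
    return ll

def pmmh(ys, phi0, n_iter=50, step=0.05, seed=0):
    """Particle marginal MH on phi with flat prior and random-walk proposal."""
    rng = random.Random(seed)
    phi = phi0
    ll = bootstrap_loglik(ys, phi, 1.0, 1.0, rng=rng)  # stored estimate
    chain = []
    for _ in range(n_iter):
        phi_star = phi + rng.gauss(0.0, step)                         # Step 1: propose
        ll_star = bootstrap_loglik(ys, phi_star, 1.0, 1.0, rng=rng)   # run SMC
        # Step 3: accept with min{1, p_hat* / p_hat} (flat prior, symmetric proposal)
        if math.log(rng.random()) < ll_star - ll:
            phi, ll = phi_star, ll_star
        chain.append(phi)
    return chain
```

The key point mirrored here is that the stored estimate p̂ is recycled on rejection rather than recomputed, which is what makes the chain exact despite the noisy likelihood.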
Marginal Metropolis Hastings Algorithm

This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(θ | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

When N ≥ 1, the algorithm admits exactly p(θ, x_{1:n} | y_{1:n}) as its invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n.

The algorithm is useful when X_n is moderate-dimensional and θ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, 2007. On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pp. 549-553.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pp. 269-342, June 2010.
Ionides, E.L., Breto, C. and King, A.A. (2006). Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443.
OPEN PROBLEMS
Open Problems

Thus SMC can be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
• Developing efficient SMC methods when E = R^10000.
• Developing a proper SMC method for recursive Bayesian parameter estimation.
• Developing methods that avoid O(N²) cost for smoothing and sensitivity estimation.
• Developing methods for solving optimal control problems.
• Establishing weaker assumptions that ensure uniform convergence results.
Other Topics

In standard SMC we only sample X_n at time n and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

• Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, pp. 1-19.
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC Approximation of the Sensitivity Sample
Compute
We have
35
( )
( )( ) ( )
( ) ( )1 1( ) ( )
( )( ) ( )
1( ) ( ) ( ) ( )
( ) ( ) ( )
1 1 1
( ) ( )| |
i ii in n n n
n ni in n n n
Nj
ni iji i i in n
n n n nN N Nj j j
n n nj j j
X Xq X Y q X Y
W W W
θ θ
θ θ
ξ ξα ρ
ρα ρβ
α α α
=
= = =
nabla= =
= = minussum
sum sum sum
Y Y
( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )1 1 1
1~ | | |
Ni j j j j j
n n n n n n n n nj
X q Y q Y X W p Y Xθ θη ηminus minus minus=
= propsum
( )
( )
( )1
1
( ) ( )1
1
( ) ( )
( ) ( )
in
in
Ni
n n n nXi
Ni i
n n n n nXi
p x W x
p x W x
θ
θ
δ
β δ
=
=
=
nabla =
sum
sum
|Y
|Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Consider the following model
We have
We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)
36
1n n v n
n n w n
X X VY X W
φ σσminus= +
= +
v wθ φ σ σ=
1log ( )n np xθnabla |Y
1( )npθnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the
marginal (bottom)
37
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Consider the following model
We have We use SMC with N=1000 for batch and on-line estimation
For Batch Estimation
For online estimation
38
1
exp2
n n v n
nn n
X X VXY W
φ σ
β
minus= +
=
vθ φ σ β=
11 1log ( )nn n n npθθ θ γ
minusminus= + nabla Y
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar
81-85
39
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates
converge towards the true values
40
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Given data y1n inference relies on
We have seen that SMC are rather inefficient to sample from
so we look here at an MCMC approach For a given parameter value θ SMC can estimate both
It will be useful if we can use SMC within MCMC to sample from
41
( ) ( ) ( )
( ) ( ) ( )
1 1 1 1 1
1 1
| | |
|
n n n n n
n n
p p pwherep p p
θ
θ
θ θ
θ θ
=
=
x y y x y
y y
( ) ( )1 1 1|n n np and pθ θx y y
( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R
Statist SocB (2010) 72 Part 3 pp 269ndash342
( )1| np θ y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from
and
However it is impossible to sample from
We can sample from instead but convergence will be slow
Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability
42
( )1 1| n np θ x y( )1 1| n np θx y
( )1 1| n np θx y
( )1 1 1 1| k n k k np x θ minus +y x x
( )1 1| n np θx y ( )1 1 1~ |n n nqX x y
( ) ( )( ) ( )
1 1 1 1
1 1 1 1
| |
| m n
|i 1 n n n n
n n n n
p q
p p
θ
θ
X y X y
X y X y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution
We know that the SMC approximation degrades an
n increases but we also have under mixing assumptions
Thus things degrade linearly rather than exponentially
These results are useful for small C
43
( )1 1| n np θx y
( )1 1| n np θx y
( )( )1 1 1( ) | i
n n n TV
Cnaw pN
θminus leX x yL
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method
(CBMC) in Molecular simulation There are however significant differences
CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on
44
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
Bayesian Scientific Computing Spring 2013 (N Zabaras)
MCMC We use as proposal distribution and store
the estimate
It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables
The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate
Finally accept with probability Otherwise stay where you are
45
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
( )1 1 1~ | n n nNp θX x y
( )1 |nNp θy
1nX
( )1 |nNp θy1nX ( ) ( )
1
1
m|
in 1|nN
nN
pp
θθ
yy
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Consider the following model
We take the same prior proposal distribution for both CBMC and
SMC Acceptance rates for CBMC (left) and SMC (right) are shown
46
( )
( ) ( )
11 2
12
1
1 25 8cos(12 ) ~ 0152 1
~ 0001 ~ 052
kk k k k
k
kn k k
XX X k V VX
XY W W X
minusminus
minus
= + + ++
= +
N
N N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example We now consider the case when both variances are
unknown and given inverse Gamma (conditionally conjugate) vague priors
We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as
importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step
of invariance distribution with the same proposal
In both cases we update the parameters according to Both algorithms are run with the same computational effort
47
2 2v wσ σ
( )11000 11000 |p θx y
( )1 1 k k k kp x x x yminus +|
( )1500 1500| p θ x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example MH one at a time missed the mode and estimates
If X1n and θ are very correlated the MCMC algorithm is very
inefficient We will next discuss the Marginal Metropolis Hastings
21500
21500
2
| 137 ( )
| 99 ( )
10
v
v
v
MH
PF MCMC
σ
σ
σ
asymp asymp minus =
y
y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with
kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)
It can be easily shown that and
under weak assumptions
Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)
1 Given 119909119905minus1 generate 2 Compute
3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Step 1
Step 2
Step 3 With probability
( ) ( )
( ) ( )
1 1( 1)
1 1 1
( 1) ( 1)
~ | ( 1)
|
n ni
n n n
Given i i p
Sample q i
Run an SMC to obtain p p
θ
θθ
θ
θ θ θ
minusminus minus
minus
X y
y x y
( )1 1 1 ~ |n n nSample pθX x y
( ) ( ) ( ) ( ) ( ) ( )
1
1( 1)
( 1) |
( 1) | ( 1min 1
)n
ni
p p q i
p p i q iθ
θ
θ θ θ
θ θ θminus
minus minus minus
y
y
( ) ( ) ( ) ( )
1 1 1 1( )
1 1 1 1( ) ( 1)
( ) ( )
( ) ( ) ( 1) ( 1)
n n n ni
n n n ni i
Set i X i p X p
Otherwise i X i p i X i p
θ θ
θ θ
θ θ
θ θ minus
=
= minus minus
y y
y y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an
approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)
When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists
The higher N the better the performance of the algorithm N scales roughly linearly with n
Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)
Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model
with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Consider the following model
We have
We compare next the SMC estimate for N=1000 to its exact value given by the Kalman filter derivative (exact with cyan vs SMC with black)
36
1n n v n
n n w n
X X VY X W
φ σσminus= +
= +
v wθ φ σ σ=
1log ( )n np xθnabla |Y
1( )npθnabla Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Linear Gaussian Model Marginal of the Hahn Jordan of the joint (top) and Hahn Jordan of the
marginal (bottom)
37
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Consider the following model
We have We use SMC with N=1000 for batch and on-line estimation
For Batch Estimation
For online estimation
38
1
exp2
n n v n
nn n
X X VXY W
φ σ
β
minus= +
=
vθ φ σ β=
11 1log ( )nn n n npθθ θ γ
minusminus= + nabla Y
1 11 1 1log ( )nn n n n np Yθθ θ γ
minusminus minus= + nabla |Y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Batch Parameter Estimation for Daily Exchange Rate PoundDollar
81-85
39
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Stochastic Volatility Model Recursive parameter estimation for SV model The estimates
converge towards the true values
40
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Given data y1n inference relies on
We have seen that SMC are rather inefficient to sample from
so we look here at an MCMC approach For a given parameter value θ SMC can estimate both
It will be useful if we can use SMC within MCMC to sample from
41
( ) ( ) ( )
( ) ( ) ( )
1 1 1 1 1
1 1
| | |
|
n n n n n
n n
p p pwherep p p
θ
θ
θ θ
θ θ
=
=
x y y x y
y y
( ) ( )1 1 1|n n np and pθ θx y y
( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R
Statist SocB (2010) 72 Part 3 pp 269ndash342
( )1| np θ y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from
and
However it is impossible to sample from
We can sample from instead but convergence will be slow
Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability
42
( )1 1| n np θ x y( )1 1| n np θx y
( )1 1| n np θx y
( )1 1 1 1| k n k k np x θ minus +y x x
( )1 1| n np θx y ( )1 1 1~ |n n nqX x y
( ) ( )( ) ( )
1 1 1 1
1 1 1 1
| |
| m n
|i 1 n n n n
n n n n
p q
p p
θ
θ
X y X y
X y X y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution
We know that the SMC approximation degrades an
n increases but we also have under mixing assumptions
Thus things degrade linearly rather than exponentially
These results are useful for small C
43
( )1 1| n np θx y
( )1 1| n np θx y
( )( )1 1 1( ) | i
n n n TV
Cnaw pN
θminus leX x yL
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method
(CBMC) in Molecular simulation There are however significant differences
CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on
44
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
Bayesian Scientific Computing Spring 2013 (N Zabaras)
MCMC We use as proposal distribution and store
the estimate
It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables
The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate
Finally accept with probability Otherwise stay where you are
45
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
( )1 1 1~ | n n nNp θX x y
( )1 |nNp θy
1nX
( )1 |nNp θy1nX ( ) ( )
1
1
m|
in 1|nN
nN
pp
θθ
yy
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Consider the following model
We take the same prior proposal distribution for both CBMC and
SMC Acceptance rates for CBMC (left) and SMC (right) are shown
46
( )
( ) ( )
11 2
12
1
1 25 8cos(12 ) ~ 0152 1
~ 0001 ~ 052
kk k k k
k
kn k k
XX X k V VX
XY W W X
minusminus
minus
= + + ++
= +
N
N N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example We now consider the case when both variances are
unknown and given inverse Gamma (conditionally conjugate) vague priors
We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as
importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step
of invariance distribution with the same proposal
In both cases we update the parameters according to Both algorithms are run with the same computational effort
47
2 2v wσ σ
( )11000 11000 |p θx y
( )1 1 k k k kp x x x yminus +|
( )1500 1500| p θ x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example MH one at a time missed the mode and estimates
If X1n and θ are very correlated the MCMC algorithm is very
inefficient We will next discuss the Marginal Metropolis Hastings
21500
21500
2
| 137 ( )
| 99 ( )
10
v
v
v
MH
PF MCMC
σ
σ
σ
asymp asymp minus =
y
y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with
kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)
It can be easily shown that and
under weak assumptions
Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)
1 Given 119909119905minus1 generate 2 Compute
3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Step 1
Step 2
Step 3 With probability
( ) ( )
( ) ( )
1 1( 1)
1 1 1
( 1) ( 1)
~ | ( 1)
|
n ni
n n n
Given i i p
Sample q i
Run an SMC to obtain p p
θ
θθ
θ
θ θ θ
minusminus minus
minus
X y
y x y
( )1 1 1 ~ |n n nSample pθX x y
( ) ( ) ( ) ( ) ( ) ( )
1
1( 1)
( 1) |
( 1) | ( 1min 1
)n
ni
p p q i
p p i q iθ
θ
θ θ θ
θ θ θminus
minus minus minus
y
y
( ) ( ) ( ) ( )
1 1 1 1( )
1 1 1 1( ) ( 1)
( ) ( )
( ) ( ) ( 1) ( 1)
n n n ni
n n n ni i
Set i X i p X p
Otherwise i X i p i X i p
θ θ
θ θ
θ θ
θ θ minus
=
= minus minus
y y
y y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm
This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(θ | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).
When N ≥ 1, the algorithm admits exactly p(θ, x_{1:n} | y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.
The higher N, the better the performance of the algorithm; N scales roughly linearly with n.
Useful when X_n is moderate-dimensional and θ is high-dimensional. Admits the plug-and-play property (Ionides et al., 2006).

- Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez (2007), On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553.
- C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
- Ionides, E. L., Breto, C. and King, A. A. (2006), Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443.
OPEN PROBLEMS
Open Problems
Thus SMC can be successfully used within MCMC.
The major advantage of SMC is that it automatically builds efficient very high-dimensional proposal distributions based only on low-dimensional proposals.
This is computationally expensive, but it is very helpful in challenging problems.
Several open problems remain:
- Developing efficient SMC methods when E = R^10000.
- Developing a proper SMC method for recursive Bayesian parameter estimation.
- Developing methods to avoid the O(N^2) cost for smoothing and sensitivity estimation.
- Developing methods for solving optimal control problems.
- Establishing weaker assumptions ensuring uniform convergence results.
Other Topics
In standard SMC we only sample X_n at time n and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

- Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
Example: Linear Gaussian Model
[Figure: marginal of the Hahn-Jordan decomposition of the joint (top) and Hahn-Jordan decomposition of the marginal (bottom).]
Example: Stochastic Volatility Model
Consider the following model:

X_n = φ X_{n-1} + σ_v V_n,
Y_n = β exp(X_n / 2) W_n,

with θ = (φ, σ_v, β). We use SMC with N = 1000 particles for batch and online estimation.

For batch estimation: θ_k = θ_{k-1} + γ_k ∇_θ log p_{θ_{k-1}}(Y_{1:n}).

For online estimation: θ_n = θ_{n-1} + γ_n ∇_θ log p_{θ_{n-1}}(Y_n | Y_{1:n-1}).
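The SMC likelihood estimate underlying these gradient updates comes from a bootstrap particle filter for the SV model, which can be sketched as follows. This is a minimal illustration; the simulated series, parameter values, and particle count are placeholders, not the slides' experiment, and |φ| < 1 is assumed so the state has a stationary distribution.

```python
import numpy as np

def sv_loglik(y, phi, sigma_v, beta, N=1000, seed=0):
    """Bootstrap particle filter estimate of log p_theta(y_{1:n}) for
        X_n = phi X_{n-1} + sigma_v V_n,   Y_n = beta exp(X_n / 2) W_n,
    with V_n, W_n iid N(0,1) and |phi| < 1 assumed."""
    rng = np.random.default_rng(seed)
    # start from the stationary distribution of the AR(1) log-volatility
    x = rng.normal(0.0, sigma_v / np.sqrt(1.0 - phi**2), size=N)
    ll = 0.0
    for yt in y:
        x = phi * x + sigma_v * rng.standard_normal(N)   # propagate
        sd = beta * np.exp(x / 2.0)                      # observation std dev
        logw = -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * (yt / sd) ** 2
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())                       # log-sum-exp update
        x = x[rng.choice(N, size=N, p=w / w.sum())]      # multinomial resampling
    return ll

# Simulate a short series from the model and evaluate the likelihood estimate.
rng = np.random.default_rng(1)
phi, sigma_v, beta = 0.9, 0.3, 0.7
xs, ys = 0.0, []
for _ in range(50):
    xs = phi * xs + sigma_v * rng.standard_normal()
    ys.append(beta * np.exp(xs / 2.0) * rng.standard_normal())
ll = sv_loglik(np.array(ys), phi, sigma_v, beta, N=500)
```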
Example: Stochastic Volatility Model
[Figure: batch parameter estimation for the daily Pound/Dollar exchange rate, 1981-85.]
Example: Stochastic Volatility Model
[Figure: recursive parameter estimation for the SV model. The estimates converge towards the true values.]
SMC with MCMC for Parameter Estimation
Given data y_{1:n}, inference relies on

p(θ, x_{1:n} | y_{1:n}) = p(θ | y_{1:n}) p_θ(x_{1:n} | y_{1:n}),  where  p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ).

We have seen that SMC is rather inefficient for sampling from p(θ | y_{1:n}), so we look here at an MCMC approach. For a given parameter value θ, SMC can estimate both p_θ(x_{1:n} | y_{1:n}) and p_θ(y_{1:n}). It will be useful if we can use SMC within MCMC to sample from p(θ, x_{1:n} | y_{1:n}).

- C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010), 72, Part 3, pp. 269-342.
Gibbs Sampling Strategy
Using Gibbs sampling, we can sample iteratively from p(θ | x_{1:n}, y_{1:n}) and p(x_{1:n} | θ, y_{1:n}).

However, it is impossible to sample from p(x_{1:n} | θ, y_{1:n}) directly. We can sample one component at a time from p(x_k | θ, x_{k-1}, x_{k+1}, y_k) instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from p(x_{1:n} | θ, y_{1:n}); i.e., sample X*_{1:n} ~ q(x_{1:n} | y_{1:n}) and accept with probability

min{ 1, [ p_θ(X*_{1:n} | y_{1:n}) q(X_{1:n} | y_{1:n}) ] / [ p_θ(X_{1:n} | y_{1:n}) q(X*_{1:n} | y_{1:n}) ] }.
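The slow one-at-a-time alternative can be sketched as a single-site MH sweep over the full conditionals. The linear-Gaussian chain, the initial distribution X_0 ~ N(0, σ_v²), and the step size below are illustrative assumptions, not the slides' model.

```python
import numpy as np

def single_site_mh_sweep(x, y, sigma_v=1.0, sigma_w=1.0, step=0.5, rng=None):
    """One MH sweep over the full conditionals p(x_k | x_{k-1}, x_{k+1}, y_k)
    for the illustrative model X_k = X_{k-1} + V_k, Y_k = X_k + W_k,
    with X_0 ~ N(0, sigma_v^2) assumed."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(x)

    def log_cond(k, xk):
        # transition term from x_{k-1} (or the initial distribution for k = 0)
        lp = -0.5 * ((xk - (x[k - 1] if k > 0 else 0.0)) / sigma_v) ** 2
        # transition term into x_{k+1}, if it exists
        if k < n - 1:
            lp += -0.5 * ((x[k + 1] - xk) / sigma_v) ** 2
        # likelihood term from y_k
        lp += -0.5 * ((y[k] - xk) / sigma_w) ** 2
        return lp

    for k in range(n):
        prop = x[k] + step * rng.standard_normal()
        # symmetric proposal: accept with min(1, ratio of full conditionals)
        if np.log(rng.uniform()) < log_cond(k, prop) - log_cond(k, x[k]):
            x[k] = prop
    return x

# Run many sweeps on simulated data; each sweep updates x_1, ..., x_n in turn.
rng = np.random.default_rng(2)
y = rng.standard_normal(20)
x = np.zeros(20)
for _ in range(100):
    x = single_site_mh_sweep(x, y, rng=rng)
```

Because each x_k moves only conditionally on its fixed neighbors, strongly correlated paths mix very slowly, which is the weakness the block and marginal strategies address.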
Gibbs Sampling Strategy
We will use the output of an SMC method approximating p_θ(x_{1:n} | y_{1:n}) as a proposal distribution.

We know that the SMC approximation degrades as n increases, but under mixing assumptions we also have

|| Law(X^(i)_{1:n}) - p_θ(· | y_{1:n}) ||_TV ≤ C n / N.

Thus things degrade linearly rather than exponentially in n. These results are useful for small C.
Configurational Biased Monte Carlo
The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences: CBMC looks like an SMC method, but at each step only one particle is selected, which then gives N offspring. Thus if you have made the wrong selection, you cannot recover later on.

- D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study the structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC
We use the SMC approximation p_{θ,N}(x_{1:n} | y_{1:n}) as proposal distribution, sampling X*_{1:n} ~ p_{θ,N}(x_{1:n} | y_{1:n}), and store the estimate p_{θ,N}(y_{1:n}).

It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N-1)n variables.

The solution, inspired by CBMC, is as follows: for the current state X_{1:n} of the Markov chain, run a virtual SMC method with only N-1 free particles; ensure that at each time step k the current state X_k is deterministically selected, and compute the associated estimate p'_{θ,N}(y_{1:n}).

Finally, accept X*_{1:n} with probability min{ 1, p_{θ,N}(y_{1:n}) / p'_{θ,N}(y_{1:n}) }; otherwise stay where you are.
Example
Consider the following model:

X_k = X_{k-1}/2 + 25 X_{k-1}/(1 + X_{k-1}^2) + 8 cos(1.2k) + V_k,
Y_k = X_k^2/20 + W_k,

with V_k and W_k independent Gaussian noise sequences.

We take the same prior proposal distribution for both CBMC and SMC. [Figure: acceptance rates for CBMC (left) and SMC (right).]
Example
We now consider the case when both variances σ_v^2, σ_w^2 are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from p(x_{1:1000}, θ | y_{1:1000}) using two strategies:
- the MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution (average acceptance rate 0.43);
- an algorithm updating the components one at a time, using an MH step of invariant distribution p(x_k | x_{k-1}, x_{k+1}, y_k) with the same proposal.

In both cases we update the parameters according to p(θ | x_{1:500}, y_{1:500}). Both algorithms are run with the same computational effort.
Example
MH one-at-a-time missed the mode, and the estimates are

E[σ_v^2 | y_{1:500}] ≈ 13.7 (MH one-at-a-time),
E[σ_v^2 | y_{1:500}] ≈ 9.9 (PF-MCMC),

while the true value is σ_v^2 = 10.

If X_{1:n} and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model
with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Example: Stochastic Volatility Model

Recursive parameter estimation for the SV model: the estimates converge towards the true values.
SMC with MCMC for Parameter Estimation

Given data y_{1:n}, inference relies on

  p(θ, x_{1:n} | y_{1:n}) = p_θ(x_{1:n} | y_{1:n}) p(θ | y_{1:n}),  where  p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ).

We have seen that SMC is rather inefficient for sampling from p(θ | y_{1:n}), so we look here at an MCMC approach.

For a given parameter value θ, SMC can estimate both p_θ(x_{1:n} | y_{1:n}) and p_θ(y_{1:n}).

It would be useful if we could use SMC within MCMC to sample from p(θ, x_{1:n} | y_{1:n}).

• C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Statist. Soc. B (2010) 72, Part 3, pp. 269–342.
Gibbs Sampling Strategy

Using Gibbs sampling, we can sample iteratively from p(θ | x_{1:n}, y_{1:n}) and p_θ(x_{1:n} | y_{1:n}).

However, it is impossible to sample from p_θ(x_{1:n} | y_{1:n}) exactly.

We can sample from the one-at-a-time conditionals p_θ(x_k | y_k, x_{k−1}, x_{k+1}) instead, but convergence will be slow.

Alternatively, we would like to use a Metropolis-Hastings step to sample from p_θ(x_{1:n} | y_{1:n}); i.e., sample X'_{1:n} ~ q_θ(x_{1:n} | y_{1:n}) and accept with probability

  min{1, [p_θ(X'_{1:n} | y_{1:n}) q_θ(X_{1:n} | y_{1:n})] / [p_θ(X_{1:n} | y_{1:n}) q_θ(X'_{1:n} | y_{1:n})]}.
Gibbs Sampling Strategy

We will use the output of an SMC method approximating p_θ(x_{1:n} | y_{1:n}) as a proposal distribution.

We know that the SMC approximation degrades as n increases, but under mixing assumptions we also have

  ‖ Law(X^(i)_{1:n}) − p_θ(x_{1:n} | y_{1:n}) ‖_TV ≤ C n / N.

Thus things degrade linearly rather than exponentially. These results are useful for small C.
Configurational Biased Monte Carlo

The idea is related to that of the Configurational Bias Monte Carlo (CBMC) method in molecular simulation. There are, however, significant differences: CBMC looks like an SMC method, but at each step only one particle is selected, and it gives N offspring. Thus if you have made the wrong selection, you cannot recover later on.

• D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC

We use X'_{1:n} ~ p̂_N(x_{1:n} | y_{1:n}) as proposal distribution and store the associated likelihood estimate p̂_N(y_{1:n} | θ).

It is impossible to compute analytically the unconditional law of a particle, as this would require integrating out (N−1)×n variables.

The solution, inspired by CBMC, is as follows. For the current state X_{1:n} of the Markov chain, run a virtual SMC method with only N−1 free particles; ensure that at each time step k the current state X_k is deterministically selected, and compute the associated estimate p̃_N(y_{1:n} | θ).

Finally, accept X'_{1:n} with probability min{1, p̂_N(y_{1:n} | θ) / p̃_N(y_{1:n} | θ)}, where p̂_N comes from the SMC run that produced X'_{1:n} and p̃_N from the virtual run built around the current state. Otherwise stay where you are.

• D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
Example

Consider the following model:

  X_k = X_{k−1}/2 + 25 X_{k−1}/(1 + X_{k−1}²) + 8 cos(1.2 k) + V_k,   V_k ~ N(0, σ_v²),
  Y_k = X_k²/20 + W_k,   W_k ~ N(0, σ_w²).

We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown [figure].
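As a concrete illustration (not from the slides), the model above can be simulated and its likelihood estimated with a minimal bootstrap particle filter. This is a sketch under stated assumptions: the values σ_v² = 10 and σ_w² = 1, the initial law X_1 ~ N(0, σ_v²), and all function names are ours (the true σ_v² = 10 is quoted two slides below).

```python
import numpy as np

def simulate(n, sigma_v2=10.0, sigma_w2=1.0, rng=None):
    """Simulate the benchmark model above (0-based time index)."""
    rng = rng or np.random.default_rng(0)
    x = np.zeros(n)
    y = np.zeros(n)
    x[0] = rng.normal(0.0, np.sqrt(sigma_v2))  # assumed initial law
    for k in range(n):
        if k > 0:
            x[k] = (x[k - 1] / 2.0
                    + 25.0 * x[k - 1] / (1.0 + x[k - 1] ** 2)
                    + 8.0 * np.cos(1.2 * k)
                    + rng.normal(0.0, np.sqrt(sigma_v2)))
        y[k] = x[k] ** 2 / 20.0 + rng.normal(0.0, np.sqrt(sigma_w2))
    return x, y

def bootstrap_pf_loglik(y, N=500, sigma_v2=10.0, sigma_w2=1.0, rng=None):
    """Bootstrap particle filter: returns the estimate log p_hat(y_{1:n} | theta)."""
    rng = rng or np.random.default_rng(1)
    x = rng.normal(0.0, np.sqrt(sigma_v2), size=N)  # initial particles
    loglik = 0.0
    for k in range(len(y)):
        if k > 0:  # propagate through the state equation (prior proposal)
            x = (x / 2.0 + 25.0 * x / (1.0 + x ** 2) + 8.0 * np.cos(1.2 * k)
                 + rng.normal(0.0, np.sqrt(sigma_v2), size=N))
        # weight by the observation density N(y_k; x_k^2 / 20, sigma_w2)
        logw = -0.5 * (y[k] - x ** 2 / 20.0) ** 2 / sigma_w2
        m = logw.max()
        w = np.exp(logw - m)
        # accumulate log of the average weight (with the Gaussian constant)
        loglik += m + np.log(w.mean()) - 0.5 * np.log(2.0 * np.pi * sigma_w2)
        x = x[rng.choice(N, size=N, p=w / w.sum())]  # multinomial resampling
    return loglik
```

Running the filter repeatedly at a fixed θ gives independent unbiased estimates of p_θ(y_{1:n}), which is exactly the quantity consumed by the marginal Metropolis-Hastings algorithm discussed on the following slides.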
Example

We now consider the case when both variances σ_v² and σ_w² are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from p(θ, x_{1:1000} | y_{1:1000}) using two strategies:
• The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution; average acceptance rate 0.43.
• An algorithm updating the components one at a time using an MH step of invariant distribution p_θ(x_k | y_k, x_{k−1}, x_{k+1}) with the same proposal.

In both cases we update the parameters according to p(θ | x_{1:n}, y_{1:n}). Both algorithms are run with the same computational effort.
Example

MH one-at-a-time missed the mode; the estimates are

  E[σ_v² | y_{1:500}] ≈ 13.7 (MH one-at-a-time),   E[σ_v² | y_{1:500}] ≈ 9.9 (PF-MCMC),   true value σ_v² = 10.

If X_{1:n} and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis-Hastings algorithm.
Review of Metropolis-Hastings Algorithm

As a review of MH: the proposal is a Markov chain with kernel density q(x, y) = q(y | x), and the target distribution is π(x).

Algorithm (Generic Metropolis-Hastings Sampler)
Initialization: choose an arbitrary starting value x^(0).
Iteration t (t ≥ 1):
1. Given x^(t−1), generate x* ~ q(x | x^(t−1)).
2. Compute
   ρ(x^(t−1), x*) = min{1, [π(x*) q(x^(t−1) | x*)] / [π(x^(t−1)) q(x* | x^(t−1))]}.
3. With probability ρ(x^(t−1), x*), accept x* and set x^(t) = x*; otherwise reject x* and set x^(t) = x^(t−1).

It can easily be shown that π is invariant for the transition kernel K,

  π(x') = ∫ π(x) K(x, x') dx,

and that X^(i) ~ π(x) as i → ∞ under weak assumptions.
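The three steps above can be sketched in a few lines. A minimal, illustrative implementation (the function names are ours, not from the lecture), working on the log scale for numerical stability:

```python
import numpy as np

def metropolis_hastings(log_target, proposal_sample, log_proposal, x0, n_iter, rng=None):
    """Generic MH sampler following the three steps above."""
    rng = rng or np.random.default_rng(0)
    x = x0
    chain = [x0]
    for _ in range(n_iter):
        x_star = proposal_sample(rng, x)                       # step 1: propose
        log_rho = (log_target(x_star) + log_proposal(x, x_star)
                   - log_target(x) - log_proposal(x_star, x))  # step 2: log ratio
        if np.log(rng.uniform()) < min(0.0, log_rho):          # step 3: accept/reject
            x = x_star
        chain.append(x)
    return np.array(chain)

# Demo: sample N(0, 1) with a Gaussian random-walk proposal (symmetric, so the
# q-terms cancel, but they are kept for generality).
step = 1.0
chain = metropolis_hastings(
    log_target=lambda x: -0.5 * x ** 2,
    proposal_sample=lambda rng, x: x + step * rng.standard_normal(),
    log_proposal=lambda x_to, x_from: -0.5 * ((x_to - x_from) / step) ** 2,
    x0=0.0, n_iter=5000)
```

The sampler only needs the target up to a normalizing constant, which is what makes the marginal Metropolis-Hastings construction on the next slides possible.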
Marginal Metropolis-Hastings Algorithm

Consider the target

  p(θ, x_{1:n} | y_{1:n}) = p(θ | y_{1:n}) p_θ(x_{1:n} | y_{1:n}).

We use the following proposal distribution:

  q((θ*, x*_{1:n}) | (θ, x_{1:n})) = q(θ* | θ) p_{θ*}(x*_{1:n} | y_{1:n}).

Then the acceptance probability becomes

  min{1, [p(θ*, x*_{1:n} | y_{1:n}) q((θ, x_{1:n}) | (θ*, x*_{1:n}))] / [p(θ, x_{1:n} | y_{1:n}) q((θ*, x*_{1:n}) | (θ, x_{1:n}))]}
  = min{1, [p_{θ*}(y_{1:n}) p(θ*) q(θ | θ*)] / [p_θ(y_{1:n}) p(θ) q(θ* | θ)]}.

We will use SMC approximations to compute p_θ(y_{1:n}) and p_θ(x_{1:n} | y_{1:n}).
Marginal Metropolis-Hastings Algorithm

Step 1: Given θ(i−1), X_{1:n}(i−1) and p̂_{θ(i−1)}(y_{1:n}):
sample θ* ~ q(θ | θ(i−1)), and run an SMC algorithm to obtain p̂_{θ*}(x_{1:n} | y_{1:n}) and p̂_{θ*}(y_{1:n}).

Step 2: Sample X*_{1:n} ~ p̂_{θ*}(x_{1:n} | y_{1:n}).

Step 3: With probability

  min{1, [p̂_{θ*}(y_{1:n}) p(θ*) q(θ(i−1) | θ*)] / [p̂_{θ(i−1)}(y_{1:n}) p(θ(i−1)) q(θ* | θ(i−1))]},

set θ(i) = θ*, X_{1:n}(i) = X*_{1:n}, p̂_{θ(i)}(y_{1:n}) = p̂_{θ*}(y_{1:n}); otherwise set θ(i) = θ(i−1), X_{1:n}(i) = X_{1:n}(i−1), p̂_{θ(i)}(y_{1:n}) = p̂_{θ(i−1)}(y_{1:n}).
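Steps 1-3 can be sketched as follows. This is a minimal illustration, not the lecture's code: `log_lik_hat` stands in for one SMC run returning log p̂_θ(y_{1:n}); Step 2 (sampling X_{1:n}) is omitted because the toy demo has no latent states; and the demo plugs in the exact Gaussian likelihood, so the sampler reduces to an ordinary marginal MH on p(θ | y).

```python
import numpy as np

def pmmh(log_prior, log_lik_hat, q_sample, log_q, theta0, n_iter, rng=None):
    """Particle marginal MH: Steps 1-3 above, with the likelihood estimate stored."""
    rng = rng or np.random.default_rng(0)
    theta, ll = theta0, log_lik_hat(theta0, rng)
    chain = [theta0]
    for _ in range(n_iter):
        theta_star = q_sample(rng, theta)                  # Step 1: propose theta*
        ll_star = log_lik_hat(theta_star, rng)             # Step 1: "SMC" run at theta*
        log_rho = (ll_star + log_prior(theta_star) + log_q(theta, theta_star)
                   - ll - log_prior(theta) - log_q(theta_star, theta))
        if np.log(rng.uniform()) < min(0.0, log_rho):      # Step 3: accept/reject
            theta, ll = theta_star, ll_star                # keep theta* and its estimate
        chain.append(theta)
    return np.array(chain)

# Toy demo: y_i ~ N(theta, 1) with a vague N(0, 10^2) prior on theta.
rng = np.random.default_rng(42)
y = rng.normal(2.0, 1.0, size=100)
chain = pmmh(log_prior=lambda th: -0.5 * th ** 2 / 100.0,
             log_lik_hat=lambda th, rng: -0.5 * np.sum((y - th) ** 2),
             q_sample=lambda rng, th: th + 0.3 * rng.standard_normal(),
             log_q=lambda a, b: 0.0,                       # symmetric random walk
             theta0=0.0, n_iter=3000)
```

In a full state-space application, `log_lik_hat` would be one run of a bootstrap particle filter at θ*, and Step 2 would draw a trajectory from the filter's weighted particle approximation.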
Marginal Metropolis-Hastings Algorithm

This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(θ | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

For any N ≥ 1, the algorithm admits exactly p(θ, x_{1:n} | y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n.

It is useful when X_n is moderate-dimensional and θ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

• Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, 2007, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553.
• C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269–342, June 2010.
• Ionides, E. L., Breto, C. and King, A. A. (2006), Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443.
OPEN PROBLEMS

Open Problems

Thus SMC can be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
• Developing efficient SMC methods when E = R^10000.
• Developing a proper SMC method for recursive Bayesian parameter estimation.
• Developing methods to avoid O(N²) cost for smoothing and sensitivity estimation.
• Developing methods for solving optimal control problems.
• Establishing weaker assumptions ensuring uniform convergence results.
Other Topics

In standard SMC we only sample X_n at time n and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

• Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1–19.
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
SMC with MCMC for Parameter Estimation Given data y1n inference relies on
We have seen that SMC are rather inefficient to sample from
so we look here at an MCMC approach For a given parameter value θ SMC can estimate both
It will be useful if we can use SMC within MCMC to sample from
41
( ) ( ) ( )
( ) ( ) ( )
1 1 1 1 1
1 1
| | |
|
n n n n n
n n
p p pwherep p p
θ
θ
θ θ
θ θ
=
=
x y y x y
y y
( ) ( )1 1 1|n n np and pθ θx y y
( )1 1|n np θ x ybull C Andrieu R Holenstein and GO Roberts Particle Markov chain Monte Carlo methods J R
Statist SocB (2010) 72 Part 3 pp 269ndash342
( )1| np θ y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from
and
However it is impossible to sample from
We can sample from instead but convergence will be slow
Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability
42
( )1 1| n np θ x y( )1 1| n np θx y
( )1 1| n np θx y
( )1 1 1 1| k n k k np x θ minus +y x x
( )1 1| n np θx y ( )1 1 1~ |n n nqX x y
( ) ( )( ) ( )
1 1 1 1
1 1 1 1
| |
| m n
|i 1 n n n n
n n n n
p q
p p
θ
θ
X y X y
X y X y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution
We know that the SMC approximation degrades an
n increases but we also have under mixing assumptions
Thus things degrade linearly rather than exponentially
These results are useful for small C
43
( )1 1| n np θx y
( )1 1| n np θx y
( )( )1 1 1( ) | i
n n n TV
Cnaw pN
θminus leX x yL
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method
(CBMC) in Molecular simulation There are however significant differences
CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on
44
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
Bayesian Scientific Computing Spring 2013 (N Zabaras)
MCMC We use as proposal distribution and store
the estimate
It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables
The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate
Finally accept with probability Otherwise stay where you are
45
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
( )1 1 1~ | n n nNp θX x y
( )1 |nNp θy
1nX
( )1 |nNp θy1nX ( ) ( )
1
1
m|
in 1|nN
nN
pp
θθ
yy
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Consider the following model
We take the same prior proposal distribution for both CBMC and
SMC Acceptance rates for CBMC (left) and SMC (right) are shown
46
( )
( ) ( )
11 2
12
1
1 25 8cos(12 ) ~ 0152 1
~ 0001 ~ 052
kk k k k
k
kn k k
XX X k V VX
XY W W X
minusminus
minus
= + + ++
= +
N
N N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example We now consider the case when both variances are
unknown and given inverse Gamma (conditionally conjugate) vague priors
We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as
importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step
of invariance distribution with the same proposal
In both cases we update the parameters according to Both algorithms are run with the same computational effort
47
2 2v wσ σ
( )11000 11000 |p θx y
( )1 1 k k k kp x x x yminus +|
( )1500 1500| p θ x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example MH one at a time missed the mode and estimates
If X1n and θ are very correlated the MCMC algorithm is very
inefficient We will next discuss the Marginal Metropolis Hastings
21500
21500
2
| 137 ( )
| 99 ( )
10
v
v
v
MH
PF MCMC
σ
σ
σ
asymp asymp minus =
y
y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with
kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)
It can be easily shown that and
under weak assumptions
Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)
1 Given 119909119905minus1 generate 2 Compute
3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Step 1
Step 2
Step 3 With probability
( ) ( )
( ) ( )
1 1( 1)
1 1 1
( 1) ( 1)
~ | ( 1)
|
n ni
n n n
Given i i p
Sample q i
Run an SMC to obtain p p
θ
θθ
θ
θ θ θ
minusminus minus
minus
X y
y x y
( )1 1 1 ~ |n n nSample pθX x y
( ) ( ) ( ) ( ) ( ) ( )
1
1( 1)
( 1) |
( 1) | ( 1min 1
)n
ni
p p q i
p p i q iθ
θ
θ θ θ
θ θ θminus
minus minus minus
y
y
( ) ( ) ( ) ( )
1 1 1 1( )
1 1 1 1( ) ( 1)
( ) ( )
( ) ( ) ( 1) ( 1)
n n n ni
n n n ni i
Set i X i p X p
Otherwise i X i p i X i p
θ θ
θ θ
θ θ
θ θ minus
=
= minus minus
y y
y y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an
approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)
When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists
The higher N the better the performance of the algorithm N scales roughly linearly with n
Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)
Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model
with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy Using a Gibbs Sampling we can sample iteratively from
and
However it is impossible to sample from
We can sample from instead but convergence will be slow
Alternatively we would like to use Metropolis Hastings step to sample from ie sample and accept with probability
42
( )1 1| n np θ x y( )1 1| n np θx y
( )1 1| n np θx y
( )1 1 1 1| k n k k np x θ minus +y x x
( )1 1| n np θx y ( )1 1 1~ |n n nqX x y
( ) ( )( ) ( )
1 1 1 1
1 1 1 1
| |
| m n
|i 1 n n n n
n n n n
p q
p p
θ
θ
X y X y
X y X y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Gibbs Sampling Strategy We will use the output of an SMC method approximating as a proposal distribution
We know that the SMC approximation degrades an
n increases but we also have under mixing assumptions
Thus things degrade linearly rather than exponentially
These results are useful for small C
43
( )1 1| n np θx y
( )1 1| n np θx y
( )( )1 1 1( ) | i
n n n TV
Cnaw pN
θminus leX x yL
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Configurational Biased Monte Carlo The idea is related to that in the Configurational Biased MC Method
(CBMC) in Molecular simulation There are however significant differences
CBMC looks like an SMC method but at each step only one particle is selected that gives N offsprings Thus if you have done the wrong selection then you cannot recover later on
44
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
Bayesian Scientific Computing Spring 2013 (N Zabaras)
MCMC We use as proposal distribution and store
the estimate
It is impossible to compute analytically the unconditional law of a particle as it would require integrating out (N-1)n variables
The solution inspired by CBMC is as follows For the current state of the Markov chain run a virtual SMC method with only N-1 free particles Ensure that at each time step k the current state Xk is deterministically selected and compute the associated estimate
Finally accept with probability Otherwise stay where you are
45
D Frenkel G C A M Mooij and B Smit Novel scheme to study structural and thermal properties of continuously deformable molecules I Phys Condens Matter 3 (1991) 3053-3076
( )1 1 1~ | n n nNp θX x y
( )1 |nNp θy
1nX
( )1 |nNp θy1nX ( ) ( )
1
1
m|
in 1|nN
nN
pp
θθ
yy
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Consider the following model
We take the same prior proposal distribution for both CBMC and
SMC Acceptance rates for CBMC (left) and SMC (right) are shown
46
( )
( ) ( )
11 2
12
1
1 25 8cos(12 ) ~ 0152 1
~ 0001 ~ 052
kk k k k
k
kn k k
XX X k V VX
XY W W X
minusminus
minus
= + + ++
= +
N
N N
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example We now consider the case when both variances are
unknown and given inverse Gamma (conditionally conjugate) vague priors
We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as
importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step
of invariance distribution with the same proposal
In both cases we update the parameters according to Both algorithms are run with the same computational effort
47
2 2v wσ σ
( )11000 11000 |p θx y
( )1 1 k k k kp x x x yminus +|
( )1500 1500| p θ x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example MH one at a time missed the mode and estimates
If X1n and θ are very correlated the MCMC algorithm is very
inefficient We will next discuss the Marginal Metropolis Hastings
21500
21500
2
| 137 ( )
| 99 ( )
10
v
v
v
MH
PF MCMC
σ
σ
σ
asymp asymp minus =
y
y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with
kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)
It can be easily shown that and
under weak assumptions
Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)
1 Given 119909119905minus1 generate 2 Compute
3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Marginal Metropolis Hastings Algorithm
Consider the target
   p(θ, x_{1:n} | y_{1:n}) = p(θ | y_{1:n}) p_θ(x_{1:n} | y_{1:n}).
We use the following proposal distribution:
   q((θ*, x*_{1:n}) | (θ, x_{1:n})) = q(θ* | θ) p_{θ*}(x*_{1:n} | y_{1:n}).
Then the acceptance probability becomes
   min{ 1, [p(θ*, x*_{1:n} | y_{1:n}) q((θ, x_{1:n}) | (θ*, x*_{1:n}))] / [p(θ, x_{1:n} | y_{1:n}) q((θ*, x*_{1:n}) | (θ, x_{1:n}))] }
   = min{ 1, [p_{θ*}(y_{1:n}) p(θ*) q(θ | θ*)] / [p_θ(y_{1:n}) p(θ) q(θ* | θ)] }.
We will use SMC approximations to compute p_θ(y_{1:n}) and p_θ(x_{1:n} | y_{1:n}).
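The simplification of the acceptance ratio follows from the factorized target and proposal; writing it out, with Bayes' rule p(θ | y_{1:n}) ∝ p_θ(y_{1:n}) p(θ) giving the last step:

\[
\frac{p(\theta^{*}, x^{*}_{1:n} \mid y_{1:n})\, q(\theta \mid \theta^{*})\, p_{\theta}(x_{1:n} \mid y_{1:n})}
     {p(\theta, x_{1:n} \mid y_{1:n})\, q(\theta^{*} \mid \theta)\, p_{\theta^{*}}(x^{*}_{1:n} \mid y_{1:n})}
= \frac{p(\theta^{*} \mid y_{1:n})\, q(\theta \mid \theta^{*})}
       {p(\theta \mid y_{1:n})\, q(\theta^{*} \mid \theta)}
= \frac{p_{\theta^{*}}(y_{1:n})\, p(\theta^{*})\, q(\theta \mid \theta^{*})}
       {p_{\theta}(y_{1:n})\, p(\theta)\, q(\theta^{*} \mid \theta)} .
\]

The path terms p_θ(x_{1:n} | y_{1:n}) cancel exactly, so the acceptance ratio never involves the sampled trajectories; only the marginal likelihoods p_θ(y_{1:n}) are needed, and these are exactly what SMC estimates.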
Marginal Metropolis Hastings Algorithm
Step 1: Given θ(i−1), X_{1:n}(i−1) and p̂_{θ(i−1)}(y_{1:n}), sample θ* ~ q(θ | θ(i−1)) and run an SMC algorithm to obtain p̂_{θ*}(y_{1:n}) and p̂_{θ*}(x_{1:n} | y_{1:n}).
Step 2: Sample X*_{1:n} ~ p̂_{θ*}(x_{1:n} | y_{1:n}).
Step 3: With probability
   min{ 1, [p̂_{θ*}(y_{1:n}) p(θ*) q(θ(i−1) | θ*)] / [p̂_{θ(i−1)}(y_{1:n}) p(θ(i−1)) q(θ* | θ(i−1))] },
set θ(i) = θ*, X_{1:n}(i) = X*_{1:n} and p̂_{θ(i)}(y_{1:n}) = p̂_{θ*}(y_{1:n}); otherwise set θ(i) = θ(i−1), X_{1:n}(i) = X_{1:n}(i−1) and p̂_{θ(i)}(y_{1:n}) = p̂_{θ(i−1)}(y_{1:n}).
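The three steps can be sketched end to end on a toy problem. Everything model-specific below is an illustrative assumption, not from the slides: the linear-Gaussian state-space model, the flat prior, the random-walk proposal, and the numerical settings. A bootstrap particle filter supplies the estimate p̂_θ(y_{1:n}); Step 2 (retaining a sampled state path) is omitted since only θ is kept here.

```python
import numpy as np

def pf_loglik(theta, y, N, rng):
    """Bootstrap particle filter estimate of log p(y_{1:n} | theta) for the
    illustrative model X_1 ~ N(0,1), X_k = theta X_{k-1} + V_k,
    Y_k = X_k + W_k, with V_k, W_k ~ N(0, 1)."""
    x = rng.normal(size=N)                            # particles at time 1
    ll = 0.0
    for k, yk in enumerate(y):
        if k > 0:
            x = theta * x + rng.normal(size=N)        # propagate through the dynamics
        logw = -0.5 * (yk - x) ** 2 - 0.5 * np.log(2 * np.pi)
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())                    # likelihood increment
        x = x[rng.choice(N, size=N, p=w / w.sum())]   # multinomial resampling
    return ll

def pmmh(y, theta0, n_iters, N=200, step=0.15, seed=1):
    """Marginal MH on theta with a flat prior and symmetric random walk;
    the exact likelihood is replaced by the particle filter estimate."""
    rng = np.random.default_rng(seed)
    theta, ll = theta0, pf_loglik(theta0, y, N, rng)  # initialize theta and its estimate
    chain = np.empty(n_iters)
    for i in range(n_iters):
        prop = theta + step * rng.normal()            # Step 1: propose theta*
        ll_prop = pf_loglik(prop, y, N, rng)          #         ... and run a fresh SMC
        if np.log(rng.uniform()) < ll_prop - ll:      # Step 3: accept/reject
            theta, ll = prop, ll_prop                 # keep theta* AND its estimate
        chain[i] = theta
    return chain

# Simulate data with theta = 0.8, then recover it
rng = np.random.default_rng(0)
x, ys = 0.0, []
for k in range(100):
    x = 0.8 * x + rng.normal()
    ys.append(x + rng.normal())
chain = pmmh(np.array(ys), theta0=0.3, n_iters=300)
```

Note that on acceptance the stored estimate p̂_{θ}(y_{1:n}) is replaced along with θ; re-estimating the likelihood of the current state would break the exactness of the algorithm.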
Marginal Metropolis Hastings Algorithm
This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(θ | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).
For any N ≥ 1, the algorithm admits exactly p(θ, x_{1:n} | y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.
The higher N, the better the performance of the algorithm; N should be scaled roughly linearly with n.
It is useful when X_n is moderate-dimensional and θ is high-dimensional, and it admits the plug-and-play property (Ionides et al., 2006).

J. Fernandez-Villaverde & J. F. Rubio-Ramirez (2007), On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), 549-553.
C. Andrieu, A. Doucet & R. Holenstein (2010), Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B, 72(3), 269-342.
E. L. Ionides, C. Breto & A. A. King (2006), Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443.
53
OPEN PROBLEMS
Open Problems
Thus SMC can be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.
Several open problems remain:
• Developing efficient SMC methods when E = R^10000.
• Developing a proper SMC method for recursive Bayesian parameter estimation.
• Developing methods to avoid the O(N²) cost for smoothing and sensitivity estimation.
• Developing methods for solving optimal control problems.
• Establishing weaker assumptions ensuring uniform convergence results.
54
Other Topics
In standard SMC we only sample X_n at time n, and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.
55
• A. Doucet, M. Briers & S. Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, 15(3), 1-19.
Gibbs Sampling Strategy
We will use the output of an SMC method approximating p_θ(x_{1:n} | y_{1:n}) as a proposal distribution.
We know that the SMC approximation degrades as n increases, but under mixing assumptions we also have
   || Law(X^(i)_{1:n}) − p_θ(x_{1:n} | y_{1:n}) ||_TV ≤ C n / N.
Thus things degrade linearly rather than exponentially in n. These results are useful for small C.
43
Configurational Biased Monte Carlo
The idea is related to that of the Configurational Bias MC method (CBMC) in molecular simulation; there are, however, significant differences.
CBMC looks like an SMC method, but at each step only one particle is selected, and it gives N offspring. Thus if you have made the wrong selection, you cannot recover later on.
44
D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
MCMC
We use X'_{1:n} ~ p̂_N(x_{1:n} | y_{1:n}) as proposal distribution and store the estimate p̂_N(y_{1:n} | θ).
It is impossible to compute analytically the unconditional law of a particle, as it would require integrating out (N−1)n variables.
The solution, inspired by CBMC, is as follows. For the current state X_{1:n} of the Markov chain, run a virtual SMC method with only N−1 free particles; ensure that at each time step k the current state X_k is deterministically selected, and compute the associated estimate p̂_N(y_{1:n} | θ).
Finally, accept X'_{1:n} with probability
   min{ 1, p̂_N^new(y_{1:n} | θ) / p̂_N(y_{1:n} | θ) }.
Otherwise stay where you are.
45
D. Frenkel, G. C. A. M. Mooij and B. Smit, Novel scheme to study structural and thermal properties of continuously deformable molecules, J. Phys.: Condens. Matter 3 (1991) 3053-3076.
Example
Consider the following model:
   X_k = X_{k-1}/2 + 25 X_{k-1}/(1 + X_{k-1}²) + 8 cos(1.2 k) + V_k,   V_k ~ N(0, σ_v²),
   Y_k = X_k²/20 + W_k,   W_k ~ N(0, σ_w²).
We take the same prior proposal distribution for both CBMC and SMC. Acceptance rates for CBMC (left) and SMC (right) are shown.
46
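This benchmark model is straightforward to simulate; a minimal sketch follows. The initial-state distribution and the noise scales passed in the example call are assumptions, since those details are garbled in the transcript.

```python
import numpy as np

def simulate_ssm(n, sigma_v, sigma_w, rng):
    """Simulate the benchmark nonlinear state-space model above."""
    x = np.empty(n)
    y = np.empty(n)
    x[0] = rng.normal(scale=np.sqrt(5.0))      # assumed X_1 scale (illustrative)
    for k in range(n):
        if k > 0:
            x[k] = (x[k - 1] / 2.0
                    + 25.0 * x[k - 1] / (1.0 + x[k - 1] ** 2)
                    + 8.0 * np.cos(1.2 * k)
                    + rng.normal(scale=sigma_v))
        y[k] = x[k] ** 2 / 20.0 + rng.normal(scale=sigma_w)
    return x, y

# Illustrative noise scales (sigma_v^2 = 10 matches the later example slide)
x, y = simulate_ssm(500, sigma_v=np.sqrt(10.0), sigma_w=1.0,
                    rng=np.random.default_rng(0))
```

The squared observation Y_k = X_k²/20 + W_k makes the filtering distribution bimodal in the sign of X_k, which is what makes this model a stress test for the samplers compared here.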
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example Consider the following model
We take the same prior proposal distribution for both CBMC and
SMC Acceptance rates for CBMC (left) and SMC (right) are shown
46
( )
( ) ( )
11 2
12
1
1 25 8cos(12 ) ~ 0152 1
~ 0001 ~ 052
kk k k k
k
kn k k
XX X k V VX
XY W W X
minusminus
minus
= + + ++
= +
N
N N
Example: We now consider the case when both variances σ_v^2 and σ_w^2 are unknown and given inverse-Gamma (conditionally conjugate) vague priors.

We sample from p(θ, x_{1:1000} | y_{1:1000}) using two strategies:
- The MCMC algorithm with N = 1000 and the local EKF proposal as importance distribution. Average acceptance rate: 0.43.
- An algorithm updating the components one at a time using an MH step of invariant distribution p(x_k | x_{k-1}, x_{k+1}, y_k) with the same proposal.

In both cases we update the parameters according to p(θ | x_{1:500}, y_{1:500}). Both algorithms are run with the same computational effort.

47
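The second strategy, single-site MH updates with invariant distribution p(x_k | x_{k-1}, x_{k+1}, y_k), can be sketched as one sweep over the state sequence. The AR(1) toy model, random-walk proposal, and placeholder observations below are assumptions for illustration, not the slide's model:

```python
import numpy as np

rng = np.random.default_rng(3)

def one_at_a_time_sweep(x, y, log_f, log_g, step=0.5):
    """One sweep of single-site MH updates. Each x_k is refreshed by an MH
    step whose invariant distribution is p(x_k | x_{k-1}, x_{k+1}, y_k).
    log_f(x_prev, x) is the transition log-density, log_g(x, y) the
    observation log-density; the random-walk proposal is an assumption."""
    n = len(x)
    for k in range(n):
        prop = x[k] + step * rng.normal()

        def local(v):
            # Full conditional of x_k up to a constant: observation term
            # plus the two neighbouring transition terms.
            lp = log_g(v, y[k])
            if k > 0:
                lp += log_f(x[k - 1], v)
            if k < n - 1:
                lp += log_f(v, x[k + 1])
            return lp

        if np.log(rng.uniform()) < local(prop) - local(x[k]):
            x[k] = prop
    return x

# Toy AR(1) model with unit noises: x_k = 0.9 x_{k-1} + v_k, y_k = x_k + w_k.
log_f = lambda xp, x: -0.5 * (x - 0.9 * xp) ** 2
log_g = lambda x, y: -0.5 * (y - x) ** 2
y_obs = rng.normal(size=50)  # placeholder synthetic observations
x_state = one_at_a_time_sweep(np.zeros(50), y_obs, log_f, log_g)
```

Repeating such sweeps, alternated with a draw of θ from p(θ | x_{1:n}, y_{1:n}), gives the one-at-a-time Gibbs-style sampler compared on this slide.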
Example: MH one-at-a-time missed the mode, and the estimates are

E(σ_v^2 | y_{1:500}) ≈ 13.7 (MH one-at-a-time),  E(σ_v^2 | y_{1:500}) ≈ 9.9 (PF-MCMC),

while the true value is σ_v^2 = 10.

If X_{1:n} and θ are very correlated, the MCMC algorithm is very inefficient. We will next discuss the Marginal Metropolis Hastings algorithm.
Review of Metropolis Hastings Algorithm

As a review of MH: the proposal distribution is a Markov chain with kernel density q(x, y) = q(y | x) and target distribution π(x).

It can be easily shown, under weak assumptions, that X^(i) ~ π(x) as i → ∞ and that π is invariant under the transition kernel K:

π(x') = ∫ π(x) K(x, x') dx

Algorithm (Generic Metropolis Hastings Sampler)
Initialization: choose an arbitrary starting value x^(0).
Iteration t (t ≥ 1):
1. Given x^(t−1), generate x* ~ q(x | x^(t−1)).
2. Compute
   ρ(x^(t−1), x*) = min{ 1, [ π(x*) q(x^(t−1) | x*) ] / [ π(x^(t−1)) q(x* | x^(t−1)) ] }
3. With probability ρ(x^(t−1), x*), accept x* and set x^(t) = x*. Otherwise reject x* and set x^(t) = x^(t−1).
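A minimal sketch of the generic sampler above, assuming a one-dimensional standard-normal target and a symmetric random-walk proposal (so the q-ratio cancels in ρ); both choices are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis_hastings(log_pi, x0, n_iters, step=1.0):
    """Random-walk Metropolis-Hastings for a 1-D target pi(x).
    With the symmetric proposal q(x*|x) = N(x, step^2) the ratio
    q(x|x*)/q(x*|x) cancels, so rho depends only on pi."""
    x = x0
    chain = np.empty(n_iters)
    for t in range(n_iters):
        x_star = x + step * rng.normal()
        # Acceptance probability min(1, pi(x*)/pi(x)), computed in log space.
        log_rho = min(0.0, log_pi(x_star) - log_pi(x))
        if np.log(rng.uniform()) < log_rho:
            x = x_star          # accept
        chain[t] = x            # on rejection the previous value is repeated
    return chain

# Target: standard normal (log-density up to a constant).
chain = metropolis_hastings(lambda x: -0.5 * x ** 2, x0=5.0, n_iters=20000)
```

After a short burn-in from x^(0) = 5, the chain's sample mean settles near the target mean 0, which is the invariance property stated above in action.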
Marginal Metropolis Hastings Algorithm

Consider the target

p(θ, x_{1:n} | y_{1:n}) = p(θ | y_{1:n}) p_θ(x_{1:n} | y_{1:n})

We use the following proposal distribution:

q( (θ*, x*_{1:n}) | (θ, x_{1:n}) ) = q(θ* | θ) p_{θ*}(x*_{1:n} | y_{1:n})

Then the acceptance probability becomes

min{ 1, [ p(θ*, x*_{1:n} | y_{1:n}) q( (θ, x_{1:n}) | (θ*, x*_{1:n}) ) ] / [ p(θ, x_{1:n} | y_{1:n}) q( (θ*, x*_{1:n}) | (θ, x_{1:n}) ) ] }
  = min{ 1, [ p_{θ*}(y_{1:n}) p(θ*) q(θ | θ*) ] / [ p_θ(y_{1:n}) p(θ) q(θ* | θ) ] }

We will use SMC approximations to compute p_θ(y_{1:n}) and p_θ(x_{1:n} | y_{1:n}).
Marginal Metropolis Hastings Algorithm

Step 1: Given θ(i−1), X_{1:n}(i−1) and p̂_{θ(i−1)}(y_{1:n}):
- Sample θ* ~ q(θ | θ(i−1)).
- Run an SMC algorithm to obtain p̂_{θ*}(x_{1:n} | y_{1:n}) and p̂_{θ*}(y_{1:n}).

Step 2: Sample X*_{1:n} ~ p̂_{θ*}(x_{1:n} | y_{1:n}).

Step 3: With probability

min{ 1, [ p̂_{θ*}(y_{1:n}) p(θ*) q(θ(i−1) | θ*) ] / [ p̂_{θ(i−1)}(y_{1:n}) p(θ(i−1)) q(θ* | θ(i−1)) ] }

set θ(i) = θ*, X_{1:n}(i) = X*_{1:n} and p̂_{θ(i)}(y_{1:n}) = p̂_{θ*}(y_{1:n}). Otherwise set θ(i) = θ(i−1), X_{1:n}(i) = X_{1:n}(i−1) and p̂_{θ(i)}(y_{1:n}) = p̂_{θ(i−1)}(y_{1:n}).
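The three steps above can be sketched as a particle marginal MH (PMMH) sampler. The toy linear-Gaussian model, flat prior on θ, random-walk proposal, and particle count below are all illustrative assumptions, not the slides' setup; the bootstrap filter supplies the SMC evidence estimate p̂_θ(y_{1:n}), and the X_{1:n} draw of Step 2 is omitted to keep the sketch short:

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_log_evidence(theta, y, N=200):
    """Bootstrap particle filter for the toy model
    x_k = theta * x_{k-1} + v_k, y_k = x_k + w_k (both noises N(0,1)).
    Returns an estimate of log p_theta(y_{1:n})."""
    x = rng.normal(size=N)                       # assumed x_0 ~ N(0, 1)
    log_Z = 0.0
    for yk in y:
        x = theta * x + rng.normal(size=N)       # propagate through the prior
        logw = -0.5 * (yk - x) ** 2              # Gaussian log-likelihood (no const)
        m = logw.max()
        w = np.exp(logw - m)
        # Accumulate log of the average unnormalised weight, restoring the constant.
        log_Z += m + np.log(w.mean()) - 0.5 * np.log(2 * np.pi)
        x = x[rng.choice(N, size=N, p=w / w.sum())]   # multinomial resampling
    return log_Z

def pmmh(y, n_iters, step=0.1):
    """Marginal MH on theta: flat prior and symmetric random-walk proposal,
    so the acceptance ratio reduces to the ratio of evidence estimates."""
    theta = 0.0
    log_Z = bootstrap_log_evidence(theta, y)
    chain = np.empty(n_iters)
    for i in range(n_iters):
        theta_star = theta + step * rng.normal()      # Step 1: propose theta*
        log_Z_star = bootstrap_log_evidence(theta_star, y)  # Step 1: run SMC
        if np.log(rng.uniform()) < log_Z_star - log_Z:      # Step 3: accept/reject
            theta, log_Z = theta_star, log_Z_star
        chain[i] = theta
    return chain

# Synthetic data from the toy model with theta = 0.5.
true_theta, x_cur, y = 0.5, 0.0, []
for _ in range(100):
    x_cur = true_theta * x_cur + rng.normal()
    y.append(x_cur + rng.normal())
theta_chain = pmmh(np.array(y), n_iters=300)
```

Note that the stored evidence estimate p̂_{θ(i−1)}(y_{1:n}) is reused on rejection exactly as in Step 3; rerunning the filter for the current θ would break the exact invariance property discussed on the next slide.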
Marginal Metropolis Hastings Algorithm

This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(θ | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

When N ≥ 1, the algorithm admits exactly p(θ, x_{1:n} | y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N scales roughly linearly with n. Useful when X_n is moderate-dimensional and θ is high-dimensional. Admits the plug-and-play property (Ionides et al., 2006).

- Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez (2007), On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553.
- C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269-342, June 2010.
- E. L. Ionides, C. Breto and A. A. King (2006), Inference for nonlinear dynamical systems, Proceedings of the Nat. Acad. of Sciences, 103, 18438-18443.
OPEN PROBLEMS

53
Open Problems

Thus SMC can be successfully used within MCMC. The major advantage of SMC is that it automatically builds efficient, very high-dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
- Developing efficient SMC methods when E = R^10000.
- Developing a proper SMC method for recursive Bayesian parameter estimation.
- Developing methods to avoid the O(N^2) cost for smoothing and sensitivity estimation.
- Developing methods for solving optimal control problems.
- Establishing weaker assumptions ensuring uniform convergence results.

54
Other Topics

In standard SMC we only sample X_n at time n and we don't allow ourselves to modify the past. It is possible to modify the algorithm to take this into account.

55

- Arnaud Doucet, Mark Briers and Stéphane Sénécal, Efficient Block Sampling Strategies for Sequential Monte Carlo Methods, Journal of Computational and Graphical Statistics, Volume 15, Number 3, Pages 1-19.
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example We now consider the case when both variances are
unknown and given inverse Gamma (conditionally conjugate) vague priors
We sample from using two strategies The MCMC algorithm with N=1000 and the local EKF proposal as
importance distribution Average acceptance rate 043 An algorithm updating the components one at a time using an MH step
of invariance distribution with the same proposal
In both cases we update the parameters according to Both algorithms are run with the same computational effort
47
2 2v wσ σ
( )11000 11000 |p θx y
( )1 1 k k k kp x x x yminus +|
( )1500 1500| p θ x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example MH one at a time missed the mode and estimates
If X1n and θ are very correlated the MCMC algorithm is very
inefficient We will next discuss the Marginal Metropolis Hastings
21500
21500
2
| 137 ( )
| 99 ( )
10
v
v
v
MH
PF MCMC
σ
σ
σ
asymp asymp minus =
y
y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with
kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)
It can be easily shown that and
under weak assumptions
Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)
1 Given 119909119905minus1 generate 2 Compute
3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Step 1
Step 2
Step 3 With probability
( ) ( )
( ) ( )
1 1( 1)
1 1 1
( 1) ( 1)
~ | ( 1)
|
n ni
n n n
Given i i p
Sample q i
Run an SMC to obtain p p
θ
θθ
θ
θ θ θ
minusminus minus
minus
X y
y x y
( )1 1 1 ~ |n n nSample pθX x y
( ) ( ) ( ) ( ) ( ) ( )
1
1( 1)
( 1) |
( 1) | ( 1min 1
)n
ni
p p q i
p p i q iθ
θ
θ θ θ
θ θ θminus
minus minus minus
y
y
( ) ( ) ( ) ( )
1 1 1 1( )
1 1 1 1( ) ( 1)
( ) ( )
( ) ( ) ( 1) ( 1)
n n n ni
n n n ni i
Set i X i p X p
Otherwise i X i p i X i p
θ θ
θ θ
θ θ
θ θ minus
=
= minus minus
y y
y y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an
approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)
When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists
The higher N the better the performance of the algorithm N scales roughly linearly with n
Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)
Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model
with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Example MH one at a time missed the mode and estimates
If X1n and θ are very correlated the MCMC algorithm is very
inefficient We will next discuss the Marginal Metropolis Hastings
21500
21500
2
| 137 ( )
| 99 ( )
10
v
v
v
MH
PF MCMC
σ
σ
σ
asymp asymp minus =
y
y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with
kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)
It can be easily shown that and
under weak assumptions
Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)
1 Given 119909119905minus1 generate 2 Compute
3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Step 1
Step 2
Step 3 With probability
( ) ( )
( ) ( )
1 1( 1)
1 1 1
( 1) ( 1)
~ | ( 1)
|
n ni
n n n
Given i i p
Sample q i
Run an SMC to obtain p p
θ
θθ
θ
θ θ θ
minusminus minus
minus
X y
y x y
( )1 1 1 ~ |n n nSample pθX x y
( ) ( ) ( ) ( ) ( ) ( )
1
1( 1)
( 1) |
( 1) | ( 1min 1
)n
ni
p p q i
p p i q iθ
θ
θ θ θ
θ θ θminus
minus minus minus
y
y
( ) ( ) ( ) ( )
1 1 1 1( )
1 1 1 1( ) ( 1)
( ) ( )
( ) ( ) ( 1) ( 1)
n n n ni
n n n ni i
Set i X i p X p
Otherwise i X i p i X i p
θ θ
θ θ
θ θ
θ θ minus
=
= minus minus
y y
y y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an
approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)
When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists
The higher N the better the performance of the algorithm N scales roughly linearly with n
Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)
Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model
with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Review of Metropolis Hastings Algorithm As a review of MH the proposal distribution is a Markov Chain with
kernel density 119902 119909 119910 =q(y|x) and target distribution 120587(119909)
It can be easily shown that and
under weak assumptions
Algorithm Generic Metropolis Hastings Sampler Initialization Choose an arbitrary starting value 1199090 Iteration 119905 (119905 ge 1)
1 Given 119909119905minus1 generate 2 Compute
3 With probability 120588(119909119905minus1 119909) accept 119909 and set 119909119905 = 119909 Otherwise reject 119909 and set 119909119905 = 119909119905minus1
( 1) )~ ( tx xq x minus
( ) ( )( 1)
( 1)( 1) 1
( ) (min 1
))
(
tt
t tx
xx q x xx
x xqπρ
π
minusminus
minus minus
=
( ) ~ ( )iX x as iπ rarr infin
( ) ( ) ( )Transition Kernel
x x K x x dxπ π= int
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Step 1
Step 2
Step 3 With probability
( ) ( )
( ) ( )
1 1( 1)
1 1 1
( 1) ( 1)
~ | ( 1)
|
n ni
n n n
Given i i p
Sample q i
Run an SMC to obtain p p
θ
θθ
θ
θ θ θ
minusminus minus
minus
X y
y x y
( )1 1 1 ~ |n n nSample pθX x y
( ) ( ) ( ) ( ) ( ) ( )
1
1( 1)
( 1) |
( 1) | ( 1min 1
)n
ni
p p q i
p p i q iθ
θ
θ θ θ
θ θ θminus
minus minus minus
y
y
( ) ( ) ( ) ( )
1 1 1 1( )
1 1 1 1( ) ( 1)
( ) ( )
( ) ( ) ( 1) ( 1)
n n n ni
n n n ni i
Set i X i p X p
Otherwise i X i p i X i p
θ θ
θ θ
θ θ
θ θ minus
=
= minus minus
y y
y y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an
approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)
When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists
The higher N the better the performance of the algorithm N scales roughly linearly with n
Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)
Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model
with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
- Slide Number 1
- Contents
- References
- References
- References
- References
- References
- Slide Number 8
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- What about if all distributions pn nge1 are defined on X
- SMC For Static Models
- SMC For Static Models
- SMC For Static Models
- Algorithm SMC For Static Problems
- SMC For Static Models Choice of L
- Slide Number 18
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Online Bayesian Parameter Estimation
- Artificial Dynamics for q
- Using MCMC
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- SMC with MCMC for Parameter Estimation
- Recursive MLE Parameter Estimation
- Robbins-Monro Algorithm for MLE
- Importance Sampling Estimation of Sensitivity
- Degeneracy of the SMC Algorithm
- Degeneracy of the SMC Algorithm
- Marginal Importance Sampling for Sensitivity Estimation
- Marginal Importance Sampling for Sensitivity Estimation
- SMC Approximation of the Sensitivity
- Example Linear Gaussian Model
- Example Linear Gaussian Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- Example Stochastic Volatility Model
- SMC with MCMC for Parameter Estimation
- Gibbs Sampling Strategy
- Gibbs Sampling Strategy
- Configurational Biased Monte Carlo
- MCMC
- Example
- Example
- Example
- Review of Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Marginal Metropolis Hastings Algorithm
- Slide Number 53
- Open Problems
- Other Topics
-
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Consider the target
We use the following proposal distribution
Then the acceptance probability becomes
We will use SMC approximations to compute
( ) ( ) ( )1 1 1 1 1| | |n n n n np p pθθ θ= x y y x y
( ) ( )( ) ( ) ( ) 1 1 1 1 | | |n n n nq q p
θθ θ θ θ=x x x y
( ) ( ) ( )( )( ) ( ) ( )( )( ) ( ) ( )( ) ( ) ( )
1 1 1 1
1 1 1 1
1
1
| min 1
|
min 1
|
|
|
|
n n n n
n n n n
n
n
p q
p q
p p q
p p qθ
θ
θ
θ
θ θ
θ θ
θ
θ θ θ
θ θ
=
x y x x
x y x x
y
y
( ) ( )1 1 1 |n n np and pθ θy x y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm Step 1
Step 2
Step 3 With probability
( ) ( )
( ) ( )
1 1( 1)
1 1 1
( 1) ( 1)
~ | ( 1)
|
n ni
n n n
Given i i p
Sample q i
Run an SMC to obtain p p
θ
θθ
θ
θ θ θ
minusminus minus
minus
X y
y x y
( )1 1 1 ~ |n n nSample pθX x y
( ) ( ) ( ) ( ) ( ) ( )
1
1( 1)
( 1) |
( 1) | ( 1min 1
)n
ni
p p q i
p p i q iθ
θ
θ θ θ
θ θ θminus
minus minus minus
y
y
( ) ( ) ( ) ( )
1 1 1 1( )
1 1 1 1( ) ( 1)
( ) ( )
( ) ( ) ( 1) ( 1)
n n n ni
n n n ni i
Set i X i p X p
Otherwise i X i p i X i p
θ θ
θ θ
θ θ
θ θ minus
=
= minus minus
y y
y y
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm This algorithm (without sampling X1n) was proposed as an
approximate MCMC algorithm to sample from p (θ | y1n) in (Fernandez-Villaverde amp Rubio-Ramirez 2007)
When Nge1 the algorithm admits exactly p (θ x1n | y1n) as invariant distribution (Andrieu D amp Holenstein 2010) A particle version of the Gibbs sampler also exists
The higher N the better the performance of the algorithm N scales roughly linearly with n
Useful when Xn is moderate dimensional amp θ high dimensional Admits the plug and play property (Ionides et al 2006)
Jesus Fernandez-Villaverde amp Juan F Rubio-Ramirez 2007 On the solution of the growth model
with investment-specific technological change Applied Economics Letters 14(8) pages 549-553 C Andrieu A Doucet R Holenstein Particle Markov chain Monte Carlo methods Journal of the Royal
Statistical Society Series B (Statistical Methodology) Volume 72 Issue 3 pages 269ndash342 June 2010 IONIDES E L BRETO C AND KING A A (2006) Inference for nonlinear dynamical
systems Proceedings of the Nat Acad of Sciences 103 18438-18443 doi Supporting online material Pdf and supporting text
Bayesian Scientific Computing Spring 2013 (N Zabaras) 53
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems Thus SMC can be successfully used within MCMC
The major advantage of SMC is that it builds automatically efficient
very high dimensional proposal distributions based only on low-dimensional proposals
This is computationally expensive but it is very helpful in challenging problems
Several open problems remain Developing efficient SMC methods when you have E = R10000 Developing a proper SMC method for recursive Bayesian parameter
estimation Developing methods to avoid O(N2) for smoothing and sensitivity Developing methods for solving optimal control problems Establishing weaker assumptions ensuring uniform convergence results
54
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Other Topics In standard SMC we only sample Xn at time n and we donrsquot allow
ourselves to modify the past
It is possible to modify the algorithm to take this into account
55
bull Arnaud DOUCET Mark BRIERS and Steacutephane SEacuteNEacuteCAL Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Journal of Computational and Graphical Statistics Volume 15 Number 3 Pages 1ndash19
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm

Step 1: Given θ(i−1), sample θ* ~ q(θ | θ(i−1)).

Step 2: Run an SMC algorithm to obtain the estimates p̂(y_{1:n} | θ*) and p̂_{θ*}(x_{1:n} | y_{1:n}).
Sample X*_{1:n} ~ p̂_{θ*}(x_{1:n} | y_{1:n}).

Step 3: With probability

\[
\min\left\{1,\;
\frac{\hat{p}(y_{1:n}\mid\theta^{*})\, p(\theta^{*})\, q(\theta(i-1)\mid\theta^{*})}
     {\hat{p}(y_{1:n}\mid\theta(i-1))\, p(\theta(i-1))\, q(\theta^{*}\mid\theta(i-1))}
\right\}
\]

set θ(i) = θ*, X_{1:n}(i) = X*_{1:n}; otherwise set θ(i) = θ(i−1), X_{1:n}(i) = X_{1:n}(i−1).
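The three steps above can be sketched in code. The following is a minimal illustrative implementation, not the lecture's own: it assumes a toy linear-Gaussian state-space model X_n = θ X_{n−1} + V_n, Y_n = X_n + W_n with standard normal noise, a flat prior p(θ) ∝ 1, and a Gaussian random-walk proposal q, so the prior and proposal terms cancel in the Step 3 ratio. The bootstrap particle filter supplies the likelihood estimate p̂(y_{1:n} | θ) and a sampled trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter(theta, y, N=200):
    """Bootstrap particle filter for the toy model above.
    Returns a log-likelihood estimate log p̂(y_{1:n} | θ) and one
    trajectory drawn from the particle approximation of p_θ(x_{1:n} | y_{1:n})."""
    n = len(y)
    x = rng.normal(size=N)                       # X_1 ~ N(0, 1)
    paths = x[None, :].copy()
    loglik = 0.0
    for t in range(n):
        if t > 0:
            x = theta * x + rng.normal(size=N)   # propagate through the prior
            paths = np.vstack([paths, x])
        logw = -0.5 * (y[t] - x) ** 2 - 0.5 * np.log(2 * np.pi)
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())           # accumulate log p̂(y_t | y_{1:t-1}, θ)
        idx = rng.choice(N, size=N, p=w / w.sum())   # multinomial resampling
        x, paths = x[idx], paths[:, idx]
    return loglik, paths[:, rng.integers(N)]

def pmmh(y, n_iters=500, sigma_q=0.1):
    """Marginal (particle) Metropolis-Hastings over θ with a random-walk proposal."""
    theta = 0.0
    loglik, x_path = particle_filter(theta, y)
    chain = []
    for _ in range(n_iters):
        theta_star = theta + sigma_q * rng.normal()            # Step 1
        loglik_star, x_star = particle_filter(theta_star, y)   # Step 2
        # Step 3: symmetric q and flat prior cancel, leaving the likelihood ratio
        if np.log(rng.uniform()) < loglik_star - loglik:
            theta, loglik, x_path = theta_star, loglik_star, x_star
        chain.append(theta)
    return np.array(chain)

# Simulate data with true theta = 0.8, then run the sampler
x_true = [rng.normal()]
for _ in range(49):
    x_true.append(0.8 * x_true[-1] + rng.normal())
y = np.array(x_true) + rng.normal(size=50)
chain = pmmh(y)
print(chain[250:].mean())  # posterior mean of theta, roughly near the true value
```

Note that, per the slide that follows, the accepted chain targets p(θ, x_{1:n} | y_{1:n}) exactly for any number of particles N; N only affects the mixing of the chain.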
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Marginal Metropolis Hastings Algorithm

This algorithm (without sampling X_{1:n}) was proposed as an approximate MCMC algorithm to sample from p(θ | y_{1:n}) in (Fernandez-Villaverde & Rubio-Ramirez, 2007).

For any N ≥ 1, the algorithm admits exactly p(θ, x_{1:n} | y_{1:n}) as invariant distribution (Andrieu, Doucet & Holenstein, 2010). A particle version of the Gibbs sampler also exists.

The higher N, the better the performance of the algorithm; N should scale roughly linearly with n.

Useful when X_n is moderate dimensional and θ is high dimensional. Admits the plug and play property (Ionides et al., 2006).

• Jesus Fernandez-Villaverde & Juan F. Rubio-Ramirez, On the solution of the growth model with investment-specific technological change, Applied Economics Letters, 14(8), pages 549-553, 2007.
• C. Andrieu, A. Doucet and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society, Series B (Statistical Methodology), Volume 72, Issue 3, pages 269–342, June 2010.
• E. L. Ionides, C. Breto and A. A. King, Inference for nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 103, 18438-18443, 2006.
53
Bayesian Scientific Computing Spring 2013 (N Zabaras)
OPEN PROBLEMS
Bayesian Scientific Computing Spring 2013 (N Zabaras)
Open Problems

Thus SMC can be successfully used within MCMC.

The major advantage of SMC is that it automatically builds efficient very high dimensional proposal distributions based only on low-dimensional proposals. This is computationally expensive, but it is very helpful in challenging problems.

Several open problems remain:
• Developing efficient SMC methods when E = R^10000 (very high dimensional state spaces)
• Developing a proper SMC method for recursive Bayesian parameter estimation
• Developing methods to avoid the O(N²) cost for smoothing and sensitivity estimation
• Developing methods for solving optimal control problems
• Establishing weaker assumptions ensuring uniform convergence results
54