sampling methods -- iihic/8803-fall-09/slides/8803-09-lec18.pdf · henrik i. christensen (rim@gt)...
Introduction MCMC Gibbs Sampling Slice Sampling Hybrid MC Summary
Sampling Methods – II
Henrik I. Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA
[email protected]
Henrik I. Christensen (RIM@GT) Sampling Methods – II 1 / 23
Outline
1 Introduction
2 Markov Chain Monte Carlo
3 Gibbs Sampling
4 Slice Sampling
5 Hybrid Monte-Carlo
6 Summary
Introduction
Last time we talked about sampling methods
Generation of distribution estimates based on sampling of the input space
Discussed rejection sampling and importance sampling
A typical problem is high rejection rates and poor generalization to higher-dimensional spaces
Today: discussion of methods that generalize to higher-dimensional spaces.
Markov Chain Monte Carlo
We will sample from a proposal distribution
We maintain a record of the current sample z(τ) and use a proposal distribution q(z|z(τ))
Assume p(z) = p̃(z)/Zp
Assume we can evaluate p̃(z)
Generate a candidate sample z∗ and accept it if a criterion is satisfied.
Metropolis Algorithm
Assume a symmetric proposal, q(zA|zB) = q(zB|zA)
The acceptance criterion is then
A(z∗, z(τ)) = min(1, p̃(z∗) / p̃(z(τ)))
Generate a random number u ∈ (0, 1)
Update
z(τ+1) = z∗ if A(z∗, z(τ)) > u, otherwise z(τ+1) = z(τ)
I.e., a candidate that improves on the current state is always accepted; otherwise it is accepted with probability equal to the density ratio
The basic Metropolis sampler is a restricted random walk and as such not very efficient
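The update rule above can be sketched in a few lines of Python. The target p_tilde here is an illustrative choice (a standard Gaussian with its normalizer Zp dropped), and the proposal is a symmetric Gaussian random walk, so the simple Metropolis criterion applies:

```python
import math
import random

def p_tilde(z):
    # Unnormalized target density: exp(-z^2/2), i.e. a standard
    # Gaussian with the normalizing constant Zp dropped.
    return math.exp(-0.5 * z * z)

def metropolis(p_tilde, z0, n_samples, step=1.0, seed=0):
    """Basic Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    z = z0
    samples = []
    for _ in range(n_samples):
        z_star = z + rng.gauss(0.0, step)            # symmetric proposal q(z*|z)
        a = min(1.0, p_tilde(z_star) / p_tilde(z))   # acceptance A(z*, z(tau))
        if a > rng.random():                         # accept if A > u, u ~ U(0,1)
            z = z_star
        samples.append(z)                            # otherwise keep the old state
    return samples

samples = metropolis(p_tilde, z0=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With enough iterations the sample mean and variance approach those of the standard Gaussian, illustrating that the random walk, while inefficient, does converge.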
Markov Chains
Assume we have a sequence of random variables z(1), z(2), z(3), ..., z(M)
A first-order Markov chain is defined by the conditional independence
p(z(m+1)|z(1), z(2), ..., z(m)) = p(z(m+1)|z(m))
The marginal probability is then given by the transition probabilities and the initial prior
p(z(m+1)) = Σz(m) p(z(m+1)|z(m)) p(z(m))
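The marginal update is just a sum over the previous state, i.e. a matrix–vector product. A minimal sketch, using an illustrative two-state transition matrix (not from the slides):

```python
def propagate(prior, T, steps):
    """Propagate a marginal through a homogeneous Markov chain:
    p(z^(m+1)) = sum over z^(m) of p(z^(m+1)|z^(m)) p(z^(m)).
    T[i][j] is p(next state = j | current state = i)."""
    p = list(prior)
    n = len(p)
    for _ in range(steps):
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
    return p

# Illustrative two-state chain; its stationary distribution is (5/6, 1/6).
T = [[0.9, 0.1],
     [0.5, 0.5]]
p10 = propagate([1.0, 0.0], T, 10)
```

After a handful of steps the marginal is already close to the stationary distribution, regardless of the prior.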
Markov Chain Properties
A Markov chain is called homogeneous when the transition probabilities p(·|·) are the same for all steps
A distribution p∗(z) is invariant (stationary) if it is left unchanged by the chain, i.e.
p∗(z) = Σz′ p(z|z′) p∗(z′)
A sufficient condition for invariance is that the transition probabilities satisfy detailed balance:
p∗(z)p(z′|z) = p∗(z′)p(z|z′)
We require that the desired distribution is invariant and that the chain converges to it as m → ∞
This property is called ergodicity, and the final distribution is termed the equilibrium distribution
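Both conditions can be checked numerically for a small discrete chain. The two-state transition matrix below is an illustrative example (not from the slides) whose stationary distribution follows from its transition rates:

```python
# Two-state chain: rows of T are p(z'|z) (illustrative numbers).
T = [[0.8, 0.2],
     [0.4, 0.6]]
# Its stationary distribution, found from pi[0] * 0.2 = pi[1] * 0.4.
pi = [2 / 3, 1 / 3]

# Invariance: p*(z) = sum over z' of p(z|z') p*(z')
invariant = all(
    abs(pi[z] - sum(T[zp][z] * pi[zp] for zp in range(2))) < 1e-12
    for z in range(2)
)

# Detailed balance: p*(z) p(z'|z) = p*(z') p(z|z')
balanced = all(
    abs(pi[z] * T[z][zp] - pi[zp] * T[zp][z]) < 1e-12
    for z in range(2) for zp in range(2)
)
```

Here detailed balance holds, and as the slide states, it is sufficient (though not necessary) for invariance.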
Gibbs Sampling
Gibbs sampling is a widely applicable MCMC algorithm
Consider a distribution p(z) = p(z1, z2, ..., zM)
In each step one of the variables is sampled conditioned on the current values of the other variables
Example: consider p(z1, z2, z3)
Updated by sampling in turn from
p(z1 | z2(τ), z3(τ))   p(z2 | z1(τ), z3(τ))   p(z3 | z1(τ), z2(τ))
Continue until convergence
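As a concrete instance of the sweep above, consider a zero-mean bivariate Gaussian with unit variances and correlation rho (an illustrative target, not one from the slides). Its conditionals are the standard Gaussian conditionals z1 | z2 ~ N(rho·z2, 1 − rho²), and symmetrically, so each Gibbs step is a one-dimensional Gaussian draw:

```python
import math
import random

def gibbs_bivariate_gaussian(rho, n_samples, seed=0):
    """Gibbs sampler for a zero-mean bivariate Gaussian with unit
    variances and correlation rho. Each step samples one coordinate
    from its exact conditional: z1 | z2 ~ N(rho*z2, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    z1, z2 = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        z1 = rng.gauss(rho * z2, sd)   # sample p(z1 | z2)
        z2 = rng.gauss(rho * z1, sd)   # sample p(z2 | z1)
        samples.append((z1, z2))
    return samples

samples = gibbs_bivariate_gaussian(rho=0.8, n_samples=20000)
m1 = sum(z1 for z1, _ in samples) / len(samples)
corr = sum(z1 * z2 for z1, z2 in samples) / len(samples)
```

The empirical correlation of the draws recovers rho; with strong correlation the axis-aligned updates mix slowly, which motivates the figure on the next slide.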
Gibbs Example
[Figure: Gibbs sampling in a correlated two-dimensional distribution over (z1, z2); the sampler takes axis-aligned steps of typical size l, the width of the conditional distributions, while the distribution extends over the much larger scale L]
Gibbs Sampling in Graphical Models
Initialize the variables in the parent tree and traverse the tree/graph
Slice Sampling
Metropolis is sensitive to the sampling step size
Slice sampling introduces an auxiliary height variable and adapts the sampling interval, so the step size is explored automatically.
[Figure: (a) given z(τ), a height u is drawn uniformly from (0, p̃(z(τ))), defining a "slice" through p̃(z); (b) the next sample is drawn uniformly from a bracket [zmin, zmax] covering the slice]
Hybrid Monte-Carlo
The Metropolis algorithm has step size issues
Introduction of a method with adaptive step sizes and low rejection rates
Adoption of a dynamical-systems approach to the sampling problem
Dynamical Systems
In physics the Hamiltonian expresses the total energy of a system
If we consider a particle in motion, the momentum is
r = dz/dτ
The joint space of state and momentum (z, r) is referred to as the phase space
We can rewrite the probability as
p(z) = (1/Zp) exp(−E(z))
The acceleration / rate of change of the momentum is
dr/dτ = −∂E(z)/∂z
The kinetic energy is K(r) = (1/2)‖r‖²
Hamiltonian model
The Hamiltonian is then
H(z , r) = E (z) + K (r)
The coupled system is then
dzi/dτ = ∂H/∂ri
dri/dτ = −∂H/∂zi
Hamiltonian model
The Hamiltonian is constant along a trajectory, but energy can be traded between z and r
We can control the motion of the dynamical system; as an example, r can be redrawn as a sample from its own distribution p(r) ∝ exp(−K(r))
This parallels Newton-Raphson optimization, where gradient information is used to control the step size.
Leapfrog Discretization
Discretization with alternating half-step updates of the variables:
ri(τ + ε/2) = ri(τ) − (ε/2) ∂E/∂zi(z(τ))
zi(τ + ε) = zi(τ) + ε ri(τ + ε/2)
ri(τ + ε) = ri(τ + ε/2) − (ε/2) ∂E/∂zi(z(τ + ε))
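The three updates above can be chained so that consecutive half-steps in r merge into full steps. A minimal sketch; the quadratic energy E(z) = z²/2 is an illustrative choice (a harmonic oscillator), for which near-conservation of H is easy to verify:

```python
def leapfrog(z, r, grad_E, eps, n_steps):
    """Leapfrog integration: half-step in r, full steps alternating
    z and r, then a final half-step in r."""
    r = r - 0.5 * eps * grad_E(z)
    for _ in range(n_steps - 1):
        z = z + eps * r
        r = r - eps * grad_E(z)
    z = z + eps * r
    r = r - 0.5 * eps * grad_E(z)
    return z, r

# Illustrative energy: E(z) = z^2/2, so grad E(z) = z and
# H(z, r) = z^2/2 + r^2/2.
H = lambda z, r: 0.5 * z * z + 0.5 * r * r
z0, r0 = 1.0, 0.0
z1, r1 = leapfrog(z0, r0, lambda z: z, eps=0.1, n_steps=100)
drift = abs(H(z1, r1) - H(z0, r0))
```

The energy error of leapfrog oscillates but stays bounded (of order ε²) rather than accumulating, which is why it is the standard integrator here.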
Hybrid Monte-Carlo
Consider a state (z, r) and an updated state (z∗, r∗)
We then accept the candidate with probability
min(1, exp(H(z, r) − H(z∗, r∗)))
Given that the Hamiltonian is in principle conserved by the dynamics, the strategy is to make a 'random' change to the momentum before the leapfrog integration and then consider the update.
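Putting the pieces together, one iteration is: refresh the momentum, run leapfrog, then accept or reject with the probability above. A minimal one-dimensional sketch; the standard-Gaussian target E(z) = z²/2 is an illustrative choice:

```python
import math
import random

def hmc(grad_E, E, z0, n_samples, eps=0.1, n_steps=20, seed=0):
    """Hybrid (Hamiltonian) Monte Carlo for a 1-D target with
    p(z) proportional to exp(-E(z)). Each iteration resamples the
    momentum, integrates the dynamics with leapfrog, and applies a
    Metropolis step with probability min(1, exp(H - H*))."""
    rng = random.Random(seed)
    z = z0
    samples = []
    for _ in range(n_samples):
        r = rng.gauss(0.0, 1.0)              # 'random' momentum refresh
        z_new, r_new = z, r
        # Leapfrog integration of the Hamiltonian dynamics.
        r_new -= 0.5 * eps * grad_E(z_new)
        for _ in range(n_steps - 1):
            z_new += eps * r_new
            r_new -= eps * grad_E(z_new)
        z_new += eps * r_new
        r_new -= 0.5 * eps * grad_E(z_new)
        # Metropolis correction for the discretization error.
        h_old = E(z) + 0.5 * r * r
        h_new = E(z_new) + 0.5 * r_new * r_new
        if rng.random() < min(1.0, math.exp(h_old - h_new)):
            z = z_new
        samples.append(z)
    return samples

# Illustrative standard-Gaussian target: E(z) = z^2/2, grad E(z) = z.
samples = hmc(lambda z: z, lambda z: 0.5 * z * z, 0.0, 5000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Because leapfrog nearly conserves H, the acceptance rate stays high even for long trajectories, and the gradient steers the proposals away from pure random-walk behavior.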
Summary
MCMC is about tracking the state of the chain during sampling
How can we use the current estimate to update the variables iteratively?
Strategies for the update:
Metropolis - basic random walk
Slicing - a way to adapt step sizes
Gibbs Sampling - stepwise updating of individual variables
Hybrid MCMC - a way to integrate gradient information