Langevin-type sampling methods

Page 1:

Based on summer literature review

School of Physics, Peking University

Ziming Liu

Advisor: Zheng Zhang, UCSB ECE

Langevin-type sampling methods

Page 2:

The big party

[Slide diagram: a map of topics spanning Computer Science, Math, and Physics: Robustness, Quantum Field Theory, Statistical Physics, Quantum Information, Information Theory, Tensor Networks, Quantum Dynamics, Tensor Analysis, Partial Differential Equations, Complexity, Machine Learning, Hamiltonian Monte Carlo, Renormalization Group, Ising model, Matrix product states, Tensor trains, Tensorized neural networks, Schrodinger equation, Kohn-Sham equation, entanglement, Classical-quantum correspondence.]

The goal of this talk: Quantum-inspired Hamiltonian Monte Carlo.

Page 3:

Overview

1. Introduction to Bayesian models

2. Introduction to Langevin dynamics (1st-, 2nd-, and 3rd-order)

3. Hamiltonian Monte Carlo (HMC)

Page 4:


1. Introduction to Bayesian models

Page 5:

Maxwell-Boltzmann distribution

Description: for an isothermal, single-particle system at temperature $T$: for a state $x$ with energy $E(x)$, the probability density is $p(x) \propto \exp\!\left(-\frac{E(x)}{k_B T}\right)$.

Theory

• Maximum entropy principle for isolated system (canonical ensemble)

• Minimum free energy principle for isothermal system

Link between the pdf and the energy function:

• Ideal gas: $E = \frac{q^2}{2m}$, so $p(q) \propto \exp\!\left(-\frac{q^2}{2mk_BT}\right)$

• Static isothermal atmosphere: $E = U(x)$, so $p(x) \propto \exp\!\left(-\frac{U(x)}{k_BT}\right)$

• Gas in a well: $E = U(x) + \frac{q^2}{2m}$, so $p(x,q) \propto \exp\!\left(-\frac{U(x) + q^2/2m}{k_BT}\right)$

Page 6:

Bayesian model

What?

$$p(\theta|D) \propto p(D|\theta)\,p(\theta)$$
posterior $\propto$ likelihood $\times$ prior, where $\theta$ denotes the model parameters.

Link to regression models:

$$p(\theta|D) = \exp(-U(\theta)), \qquad U(\theta) = -\log p(D|\theta) - \log p(\theta)$$
(regression error + regularization term)

$$\theta^* = \arg\min_\theta U(\theta) = \arg\max_\theta p(\theta|D)$$

Maximum a posteriori (MAP) estimation = global optimum of $U$.
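Since the rest of the talk works with the energy $U(\theta)$ and its gradient, here is a minimal sketch of what they look like for a toy linear-Gaussian model; the model choice and the `noise_var` / `prior_var` parameters are illustrative assumptions, not from the slides.

```python
import numpy as np

def neg_log_posterior(theta, X, y, noise_var=1.0, prior_var=10.0):
    """U(theta) = -log p(D|theta) - log p(theta), up to additive constants.

    Assumes a linear-Gaussian model y ~ N(X theta, noise_var I)
    and an isotropic Gaussian prior theta ~ N(0, prior_var I).
    """
    residual = y - X @ theta
    data_term = 0.5 * residual @ residual / noise_var   # regression error
    prior_term = 0.5 * theta @ theta / prior_var        # regularization term
    return data_term + prior_term

def grad_neg_log_posterior(theta, X, y, noise_var=1.0, prior_var=10.0):
    """Gradient of U(theta); this is what Langevin dynamics and HMC will use."""
    return -X.T @ (y - X @ theta) / noise_var + theta / prior_var
```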

Page 7:

Bayesian model

Why?

$$\theta^* = \arg\min_\theta U(\theta) = \arg\max_\theta p(\theta|D)$$

Maximum a posteriori (MAP) estimation = global optimum of $U$.

Limitations of MAP:
1. No uncertainty quantification (point estimate)
2. Risk of overfitting!

[Figure: MAP point estimate vs. full Bayesian posterior.]

Page 8:


https://zhuanlan.zhihu.com/p/33563623

Page 9:

Markov Chain Monte Carlo (MCMC)

$$P_s(\theta) = P(\theta|D)$$
The steady (stationary) distribution of the Markov chain is set equal to the posterior distribution of the Bayesian model.

Given a steady distribution, how do we construct a Markov chain? The steady-distribution condition is (simplified to) detailed balance.

[Diagram: two three-state chains, one satisfying detailed balance, and one that has a steady distribution but no detailed balance.]

Page 10:

Detailed balance

[Diagram: two states with populations $N_1$ (or probabilities $p_1$) and $N_2$ (or $p_2$), and transition rates $Q(1 \to 2) = 1/2$ and $Q(2 \to 1) = 1/2$.]

Detailed balance requires the two fluxes to match: $N_1\,Q(1 \to 2) = N_2\,Q(2 \to 1)$.

Page 11:

Metropolis-Hastings (MH)

The MH algorithm for sampling from a target distribution $p(x)$, using a transition kernel $Q$, consists of the following steps:

• For $t = 1, 2, \dots$
• Sample $y$ from $Q(y|x_t)$. Think of $y$ as a proposed value of $x_{t+1}$.
• Compute the acceptance probability
$$A(x_t \to y) = \min\!\left(1,\ \frac{p(y)\,Q(x_t|y)}{p(x_t)\,Q(y|x_t)}\right)$$
• With probability $A$, accept the proposed value and set $x_{t+1} = y$. Otherwise, set $x_{t+1} = x_t$.

Physical picture:
• $p(x)$: number of particles at state $x$
• $Q(y|x)$: transition rate from $x$ to $y$
• $p(x)\,Q(y|x)$: number of particles that transit from $x$ to $y$
• $p(x)\,Q(y|x)\,A(x \to y)$: number of accepted particles that transit from $x$ to $y$

Detailed balance: $p(x_t)\,Q(y|x_t)\,A(x_t \to y) = p(y)\,Q(x_t|y)\,A(y \to x_t)$.
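A minimal Python sketch of this loop with a symmetric Gaussian random-walk proposal, so the $Q$ ratio cancels (the function and parameter names are illustrative assumptions):

```python
import numpy as np

def metropolis_hastings(target_log_p, x0, n_steps=10_000, step_size=0.5, rng=None):
    """Sample from p(x) proportional to exp(target_log_p(x)).

    The proposal is a symmetric Gaussian random walk, so Q(y|x) = Q(x|y)
    and the acceptance ratio reduces to p(y) / p(x_t) (the Metropolis case).
    """
    rng = rng or np.random.default_rng()
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    log_p = target_log_p(x)
    samples = []
    for _ in range(n_steps):
        y = x + step_size * rng.standard_normal(x.shape)   # propose y ~ Q(y|x_t)
        log_p_y = target_log_p(y)
        if np.log(rng.uniform()) < log_p_y - log_p:         # accept with probability A
            x, log_p = y, log_p_y
        samples.append(x)
    return np.array(samples)

# Example: standard 1-D Gaussian target, p(x) proportional to exp(-x^2 / 2)
chain = metropolis_hastings(lambda x: -0.5 * float(x @ x), x0=[0.0])
```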

Page 12:

Metropolis algorithm

The Metropolis algorithm for sampling from a target distribution $p(x)$, proposing moves through a random walk, consists of the following steps:

• For $t = 1, 2, \dots$
• Random walk to $y$ from $x_t$. Think of $y$ as a proposed value of $x_{t+1}$.
• Compute the acceptance probability
$$A(x_t \to y) = \min\!\left(1,\ \frac{p(y)}{p(x_t)}\right)$$
• With probability $A$, accept the proposed value and set $x_{t+1} = y$. Otherwise, set $x_{t+1} = x_t$.

Comment: the random walk is symmetric, so $Q(y|x) = Q(x|y)$.

In thermodynamic models, $P(x) \sim \exp(-U(x)/T)$ (the Boltzmann distribution).

[Figure: random-walk proposals on an energy landscape $U(x)$.]

Step size trade-off:
• large: low acceptance rate
• small: correlated, not independent samples

Page 13:

2. Introduction to Langevin Dynamics

Page 14:

Zoo of Langevin dynamics

• Stochastic Gradient Langevin Dynamics (cite = 718), 1st order, general:
Welling, Max, and Yee W. Teh. "Bayesian learning via stochastic gradient Langevin dynamics." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.

• Stochastic sampling using Fisher information (cite = 207), 1st order, Gaussian:
Ahn, Sungjin, Anoop Korattikara, and Max Welling. "Bayesian posterior sampling via stochastic gradient Fisher scoring." arXiv preprint arXiv:1206.6380 (2012).

• Stochastic Gradient Hamiltonian Monte Carlo (cite = 300), 2nd order:
Chen, Tianqi, Emily Fox, and Carlos Guestrin. "Stochastic gradient Hamiltonian Monte Carlo." International Conference on Machine Learning. 2014.

• Stochastic sampling using Nose-Hoover thermostat (cite = 140), 3rd order:
Ding, Nan, et al. "Bayesian sampling using stochastic gradient thermostats." Advances in Neural Information Processing Systems. 2014.

Page 15:

1st order Langevin dynamics

(also known as Brownian motion or Wiener process)

$$dx = -\nabla f(x)\,dt + \beta^{-1/2}\,dW(t), \qquad \rho_s \propto \exp(-\beta f(x))$$

$f(x)$: energy function (Bayesian) / loss function (optimization).

Physical intuition ($f(x) = \mathrm{const}$), properties of the medium:
• a heat bath at temperature $T$;
• the bath hits the ball (mass $m$) every $t_0$, i.e. it "saves up" for one big hit, transferring momentum $p \sim \exp\!\left(-\frac{p^2}{2Tt_0}\right)$;
• overdamped (friction coefficient $\gamma$ large), small relaxation time.

① The ball gains a momentum $p$ from the fluctuating particles around it.
② It travels in the damping medium: $m\ddot{x} = -\gamma\dot{x}$, so $\dot{x} = \frac{p}{m}\exp\!\left(-\frac{\gamma}{m}t\right)$ and $x = \frac{p}{\gamma}\left(1 - \exp\!\left(-\frac{\gamma}{m}t\right)\right)$.
③ Under the overdamped condition, $\exp\!\left(-\frac{\gamma}{m}t_0\right) \to 0$, so at time $t$ the total displacement is $\frac{p}{\gamma} \propto p$ with $p \sim \exp\!\left(-\frac{p^2}{2Tt_0}\right)$, i.e. $dx \propto \frac{1}{\gamma}\sqrt{T}\,dW(t_0)$ (with $\beta = 1/T$).
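A minimal sketch of simulating this first-order SDE with an Euler-Maruyama step (the step size, β, and the Gaussian example are illustrative; the noise is scaled so that the iterates target $\rho_s \propto \exp(-\beta f(x))$):

```python
import numpy as np

def langevin_sampler(grad_f, x0, n_steps=10_000, step=1e-2, beta=1.0, rng=None):
    """Euler-Maruyama step for overdamped (1st order) Langevin dynamics.

    For small steps the iterates approximately sample the stationary
    density rho_s proportional to exp(-beta * f(x)).
    """
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        noise = np.sqrt(2.0 * step / beta) * rng.standard_normal(x.shape)
        x = x - step * grad_f(x) + noise
        samples.append(x)
    return np.array(samples)

# Example: f(x) = x^2 / 2, so the target is a standard Gaussian
chain = langevin_sampler(grad_f=lambda x: x, x0=np.zeros(1))
```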

Page 16:

2nd order Langevin dynamics

$$d\vec{x} = \vec{p}\,dt$$
$$d\vec{p} = -\nabla f(\vec{x})\,dt - A\vec{p}\,dt + \sqrt{2AT}\,dW$$

$-\nabla f(\vec{x})$: conservative force; $-A\vec{p}$: damping force; $\sqrt{2AT}\,dW$: thermal "force".

Invariant measure:
$$P_s(\vec{x},\vec{p}) \propto \exp\!\left(-\left(\frac{p^2}{2} + U(\vec{x})\right)\Big/T\right)$$
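A rough Euler-Maruyama sketch of these equations (in practice a splitting scheme such as BAOAB is preferred; `A`, `T`, and the step size are illustrative choices):

```python
import numpy as np

def underdamped_langevin(grad_f, x0, n_steps=10_000, dt=1e-2, A=1.0, T=1.0, rng=None):
    """Euler-Maruyama discretization of
        dx = p dt,
        dp = -grad f(x) dt - A p dt + sqrt(2 A T) dW,
    whose invariant measure is proportional to exp(-(p^2/2 + f(x)) / T).
    """
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    p = np.zeros_like(x)
    samples = []
    for _ in range(n_steps):
        x = x + dt * p
        noise = np.sqrt(2.0 * A * T * dt) * rng.standard_normal(p.shape)
        p = p - dt * grad_f(x) - dt * A * p + noise
        samples.append(x)
    return np.array(samples)

# Example: f(x) = x^2 / 2
chain = underdamped_langevin(grad_f=lambda x: x, x0=np.zeros(1))
```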

Page 17:

Fokker-Planck equation for 2nd order LD

One-dimensional random walk (invariance principle): let $\phi_x(t)$ be the occupation probability of site $x$ at time $t$; at each step the walker hops to $x-1$ or $x+1$ with probability $1/2$ each, so

$$\frac{\partial \phi}{\partial t} = \frac{1}{2}\left(\phi_{x-1} + \phi_{x+1} - 2\phi_x\right) \approx \frac{1}{2}\frac{\partial^2 \phi}{\partial x^2}$$

[Table: dynamical equations (SDEs) and their corresponding Fokker-Planck equations.]
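A quick numerical check of this diffusion limit (the grid size, step count, and initial width below are arbitrary illustrative choices):

```python
import numpy as np

# 1-D grid; start from a narrow Gaussian bump (avoids the even/odd parity
# artifact of starting from a single spike).
n, steps, sigma0 = 401, 400, 3.0
x = np.arange(n) - n // 2
phi = np.exp(-x**2 / (2 * sigma0**2))
phi /= phi.sum()

# Random-walk update from the slide: phi_x <- (phi_{x-1} + phi_{x+1}) / 2.
for _ in range(steps):
    phi = 0.5 * (np.roll(phi, 1) + np.roll(phi, -1))

# Diffusion limit d(phi)/dt = (1/2) d^2(phi)/dx^2: the variance grows by
# 2 * D * t = steps, on top of the initial variance sigma0^2.
var = sigma0**2 + steps
gauss = np.exp(-x**2 / (2 * var))
gauss /= gauss.sum()
print(np.abs(phi - gauss).max())  # small relative to phi.max(): the walk matches the diffusion limit
```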

Page 18:

3rd order Langevin dynamics (special)

$$d\vec{\theta} = \vec{p}\,dt$$
$$d\vec{p} = -\nabla U(\vec{\theta})\,dt - \zeta\vec{p}\,dt + \sqrt{2AT}\,dW$$
$$d\zeta = \left(\frac{p^2}{n} - T_0\right)dt \qquad \text{(thermal term: the thermostat)}$$

Negative feedback loop: when $\vec{p}\uparrow$, $\frac{p^2}{n} \sim T > T_0$, so $\zeta\uparrow$, which puts more friction on $\vec{p}$, so $\vec{p}\downarrow$.
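A minimal sketch of a sampler built from these equations with a simple Euler discretization (`A`, `T`, `T0`, the step size, and the initialization are illustrative; in the stochastic-gradient setting `grad_U` would be a minibatch estimate):

```python
import numpy as np

def nose_hoover_sampler(grad_U, theta0, n_steps=10_000, dt=1e-2,
                        A=1.0, T=1.0, T0=1.0, rng=None):
    """Euler discretization of the thermostatted (3rd order) system
        d(theta) = p dt,
        dp = -grad U(theta) dt - zeta p dt + sqrt(2 A T) dW,
        d(zeta) = (p.p / n - T0) dt.
    The friction zeta adapts so the kinetic energy stays near the target T0.
    """
    rng = rng or np.random.default_rng()
    theta = np.array(theta0, dtype=float)
    n = theta.size
    p = np.zeros_like(theta)
    zeta = A                                  # one common choice of initial friction
    samples = []
    for _ in range(n_steps):
        theta = theta + dt * p
        noise = np.sqrt(2.0 * A * T * dt) * rng.standard_normal(theta.shape)
        p = p - dt * grad_U(theta) - dt * zeta * p + noise
        zeta = zeta + dt * (np.sum(p * p) / n - T0)
        samples.append(theta)
    return np.array(samples)

# Example: U(theta) = theta.theta / 2
chain = nose_hoover_sampler(grad_U=lambda th: th, theta0=np.zeros(2))
```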

Page 19:

3rd order Langevin dynamics (general)

$$dq = M^{-1}p\,dt$$
$$dp = -\nabla U(q)\,dt + \sigma_F\sqrt{\Delta t}\,M^{1/2}\,dW - \zeta p\,dt + \sigma_A M^{1/2}\,dW$$
$$d\zeta = \frac{1}{\mu}\left(p^T M^{-1}p - N_d k_B T\right)dt - \gamma\zeta\,dt + \sqrt{2k_B T\gamma}\,dW$$

Invariant measure: $\exp(-\beta U(q))\,\exp(-\beta p^T M^{-1}p/2)\,\exp\!\left(-\mu(\zeta - \hat{\gamma})^2/2\right)$, with $\hat{\gamma} = \beta(\sigma_F^2 + \sigma_A^2)/2$.

Page 20:


Slides from: https://ergodic.org.uk/~bl/Data/Slides/MD4.pdf

Page 21:

3rd order Langevin dynamics

$$dq = M^{-1}p\,dt$$
$$dp = -\nabla U(q)\,dt + \sigma_F\sqrt{\Delta t}\,M^{1/2}\,dW - \zeta p\,dt + \sigma_A M^{1/2}\,dW$$
$$d\zeta = \frac{1}{\mu}\left(p^T M^{-1}p - N_d k_B T\right)dt - \gamma\zeta\,dt + \sqrt{2k_B T\gamma}\,dW$$

Building blocks:
• Hamiltonian dynamics: $\exp(-\beta U(q))\,\exp(-\beta p^T M^{-1}p/2)$
• Thermostat: $\exp(-\beta p^T M^{-1}p/2)\,\exp(-\mu\zeta^2/2)$
• OU process for $\zeta$: $\exp(-\mu\zeta^2/2)$
• Noise for $p$: $\exp\!\left(-\frac{\mu\zeta^2}{2}\right) \to \exp\!\left(-\frac{\mu(\zeta - \hat{\gamma})^2}{2}\right)$

Invariant measure: $\exp(-\beta U(q))\,\exp(-\beta p^T M^{-1}p/2)\,\exp\!\left(-\mu(\zeta - \hat{\gamma})^2/2\right)$, with $\hat{\gamma} = \beta(\sigma_F^2 + \sigma_A^2)/2$.

Page 22:

3. Hamiltonian Monte Carlo (HMC)

Neal, Radford M. "MCMC using Hamiltonian dynamics." Handbook of Markov Chain Monte Carlo 2.11 (2011): 2.

Betancourt, Michael. "A conceptual introduction to Hamiltonian Monte Carlo." arXiv preprint arXiv:1701.02434 (2017).

Page 23:

2nd order Langevin dynamics

$$d\vec{x} = \vec{p}\,dt$$
$$d\vec{p} = -\nabla f(\vec{x})\,dt - A\vec{p}\,dt + \sqrt{2AT}\,dW$$

$-\nabla f(\vec{x})$: conservative force; $-A\vec{p}$: damping force; $\sqrt{2AT}\,dW$: thermal "force".

Invariant measure:
$$P_s(\vec{x},\vec{p}) \propto \exp\!\left(-\left(\frac{p^2}{2} + U(\vec{x})\right)\Big/T\right)$$

Setting $A = 0$ removes the damping and thermal terms:
$$d\vec{x} = \vec{p}\,dt, \qquad d\vec{p} = -\nabla f(\vec{x})\,dt$$

Page 24:

Hamiltonian dynamics

Hamiltonian equations:
$$d\vec{x} = M^{-1}\vec{p}\,dt \quad \text{(definition of momentum)}$$
$$d\vec{p} = -\nabla U(\vec{x})\,dt \quad \text{(momentum theorem; } f(\vec{x}) = -\nabla U(\vec{x}) \text{ is the conservative force)}$$

Hamiltonian:
$$H(x,p) = \frac{1}{2}p^T M^{-1}p + U(x) \qquad \text{(kinetic energy + potential energy)}$$

Energy conservation:
$$dH = p^T M^{-1}dp + dU(x) = -dx^T\nabla U(x) + dU(x) = 0$$

Page 25:

Steady distribution

Hamiltonian equations:
$$d\vec{x} = M^{-1}\vec{p}\,dt, \qquad d\vec{p} = -\nabla U(\vec{x})\,dt$$

Steady distribution:
$$p_s(x,q) \propto \exp\!\left(-U(\vec{x}) - \frac{1}{2}q^T M^{-1}q\right), \qquad p_s(x) \propto \exp(-U(\vec{x})) = p(x|D)$$

Fokker-Planck equation:
$$\partial_t p + \left(\frac{\partial p}{\partial x}\right)^{\!T}\frac{\partial H}{\partial q} - \left(\frac{\partial p}{\partial q}\right)^{\!T}\frac{\partial H}{\partial x} = 0$$

(also known as Liouville's theorem in physics)

Page 26:

Example: 1d spring-mass system

$$E = \frac{1}{2}kx^2 + \frac{q^2}{2m}$$

$$x = A\sin(\omega t + \phi_0), \qquad q = m\omega A\cos(\omega t + \phi_0), \qquad \omega = \sqrt{k/m}$$

[Figure: the trajectory in the $(x, q)$ phase plane.]

No ergodicity?

Page 27:

Example: 1d spring-mass system interacting with a heat bath

① Momentum resampling ($m = k_B T = 1$): $q \sim \mathcal{N}(0,1)$, i.e. the Maxwell-Boltzmann distribution $p(q) \propto \exp\!\left(-\frac{q^2}{2mk_BT}\right)$.

② Travel on an energy level for a certain time ($L$ steps).

[Figure: trajectory in the $(x, q)$ phase plane, alternating between momentum-resampling jumps (①) and travel along energy levels (②).]

Ensemble average ↔ time average.
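A small sketch of this scheme for the spring-mass system, alternating momentum resampling with the exact Hamiltonian flow (the travel times and step count are arbitrary illustrative choices):

```python
import numpy as np

# 1-D spring-mass system with m = k = k_B T = 1, so omega = 1 and the exact
# Hamiltonian flow is a rotation in the (x, q) phase plane.
rng = np.random.default_rng(0)
x, samples = 0.0, []
for _ in range(20_000):
    q = rng.standard_normal()          # (1) momentum resampling, q ~ N(0, 1)
    t = rng.uniform(0.5, 1.5)          # (2) travel on the energy level for a while
    x, q = x * np.cos(t) + q * np.sin(t), -x * np.sin(t) + q * np.cos(t)
    samples.append(x)

# The x-marginal should match exp(-x^2 / 2): mean near 0, variance near 1.
print(np.mean(samples), np.var(samples))
```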

Page 28:

2nd LD & HMC

[Figure: trajectories in the $(x, q)$ phase plane.]

2nd order LD: continuous scattering. HMC: discrete scattering (the bath "saves up" for one big hit at a time).

Page 29:

Algorithm

1. Momentum resampling
2. Hamiltonian dynamics (leapfrog scheme)
3. Metropolis-Hastings correction
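A compact sketch of one HMC iteration built from these three pieces (the leapfrog step count `L`, step size `eps`, and the Gaussian example are illustrative; `U` and `grad_U` follow the earlier notation):

```python
import numpy as np

def hmc_step(U, grad_U, x, L=20, eps=0.1, rng=None):
    """One HMC transition: resample momentum, run L leapfrog steps, MH-correct."""
    rng = rng or np.random.default_rng()
    q = rng.standard_normal(x.shape)           # 1. momentum resampling, q ~ N(0, I)
    x_new, q_new = x.copy(), q.copy()

    q_new -= 0.5 * eps * grad_U(x_new)         # 2. leapfrog integration of H = U(x) + q.q/2
    for i in range(L):
        x_new += eps * q_new
        if i < L - 1:
            q_new -= eps * grad_U(x_new)
    q_new -= 0.5 * eps * grad_U(x_new)

    # 3. Metropolis-Hastings accept/reject on the total energy change
    dH = (U(x_new) + 0.5 * q_new @ q_new) - (U(x) + 0.5 * q @ q)
    if np.log(rng.uniform()) < -dH:
        return x_new
    return x

# Example: sample a standard 2-D Gaussian, U(x) = x.x / 2
x = np.zeros(2)
chain = []
for _ in range(5_000):
    x = hmc_step(U=lambda v: 0.5 * v @ v, grad_U=lambda v: v, x=x)
    chain.append(x)
```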

Page 30:

Euler vs. leapfrog

Euler's method:
$$q(t+\epsilon) = q(t) - \epsilon\,\frac{\partial U}{\partial x}(x(t)), \qquad x(t+\epsilon) = x(t) + \epsilon\,\frac{q(t)}{m}$$

Leapfrog method:
$$q\!\left(t+\tfrac{\epsilon}{2}\right) = q(t) - \frac{\epsilon}{2}\,\frac{\partial U}{\partial x}(x(t))$$
$$x(t+\epsilon) = x(t) + \epsilon\,\frac{q(t+\epsilon/2)}{m}$$
$$q(t+\epsilon) = q\!\left(t+\tfrac{\epsilon}{2}\right) - \frac{\epsilon}{2}\,\frac{\partial U}{\partial x}(x(t+\epsilon))$$

e.g. $U(x) = \frac{1}{2}kx^2$:

Euler:
$$\begin{pmatrix} x(t+\epsilon) \\ q(t+\epsilon) \end{pmatrix} = \begin{pmatrix} 1 & \epsilon/m \\ -k\epsilon & 1 \end{pmatrix}\begin{pmatrix} x(t) \\ q(t) \end{pmatrix}, \qquad \det > 1\text{: not volume-preserving!}$$

Leapfrog:
$$\begin{pmatrix} x(t+\epsilon) \\ q(t+\epsilon) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -k\epsilon/2 & 1 \end{pmatrix}\begin{pmatrix} 1 & \epsilon/m \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ -k\epsilon/2 & 1 \end{pmatrix}\begin{pmatrix} x(t) \\ q(t) \end{pmatrix}, \qquad \det = 1\text{: volume-preserving!}$$
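A quick numerical comparison of the two update rules on this harmonic oscillator ($k = m = 1$; the step size and number of steps are illustrative):

```python
import numpy as np

def euler(x, q, eps):                  # Euler: kick and drift both use old values
    return x + eps * q, q - eps * x    # U(x) = x^2 / 2, m = 1, so dU/dx = x

def leapfrog(x, q, eps):               # half kick, full drift, half kick
    q -= 0.5 * eps * x
    x += eps * q
    q -= 0.5 * eps * x
    return x, q

energy = lambda x, q: 0.5 * x**2 + 0.5 * q**2

for stepper in (euler, leapfrog):
    x, q = 1.0, 0.0
    for _ in range(1_000):
        x, q = stepper(x, q, eps=0.1)
    # Euler's energy drifts far above the initial 0.5; leapfrog stays near 0.5.
    print(stepper.__name__, energy(x, q))
```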

Page 31:

Euler vs. leapfrog

Euler's method: diverges. Leapfrog: stable.

Page 32:

MCMC & HMC

Random-walk MCMC uses the position $x$ only, with $P(x) \sim \exp(-U(x)/T)$.

HMC uses the position $x$ plus a momentum $q$, with $P(x,q) \sim \exp(-(U(x) + K(q))/T)$.

Hamiltonian dynamics → energy conservation → always accept! (in the limit of step size → 0)

Page 33:

MCMC & HMC

Neal, Radford M. "MCMC using Hamiltonian dynamics." Handbook of Markov Chain Monte Carlo 2.11 (2011): 2.

Page 34:

HMC limitations

① Ill-conditioned distributions: need different masses in different directions.
② Multimodal distributions: hard to escape from one mode.
③ Discontinuous distributions: large energy gap, low acceptance rate.
④ Spiky distributions: large gradients, low acceptance rate.
⑤ Large training datasets: expensive gradient computation.

Page 35:

HMC variants

• Riemannian HMC: Girolami, Mark, Ben Calderhead, and Siu A. Chin. "Riemannian manifold Hamiltonian Monte Carlo." arXiv preprint arXiv:0907.1100 (2009).

• Magnetic HMC: Tripuraneni, Nilesh, et al. "Magnetic Hamiltonian Monte Carlo." Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.

• Wormhole HMC: Lan, Shiwei, Jeffrey Streets, and Babak Shahbaba. "Wormhole Hamiltonian Monte Carlo." Twenty-Eighth AAAI Conference on Artificial Intelligence. 2014.

• Continuously tempered HMC: Graham, Matthew M., and Amos J. Storkey. "Continuously tempered Hamiltonian Monte Carlo." arXiv preprint arXiv:1704.03338 (2017).

• Stochastic Gradient HMC: Chen, Tianqi, Emily Fox, and Carlos Guestrin. "Stochastic gradient Hamiltonian Monte Carlo." International Conference on Machine Learning. 2014.

• Stochastic Gradient Thermostat: Ding, Nan, et al. "Bayesian sampling using stochastic gradient thermostats." Advances in Neural Information Processing Systems. 2014.

• Relativistic Monte Carlo: Lu, Xiaoyu, et al. "Relativistic Monte Carlo." arXiv preprint arXiv:1609.04388 (2016).

• Optics HMC: Afshar, Hadi Mohasel, and Justin Domke. "Reflection, refraction, and Hamiltonian Monte Carlo." Advances in Neural Information Processing Systems. 2015.