A short history of MCMC

A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data

Christian P. Robert and George Casella
Université Paris-Dauphine, IUF, & CREST, and University of Florida

April 2, 2011


Talk given in Bristol, April 2, 2011, at Julian Besag's memorial.



Page 2: A short history of MCMC


In memoriam, Julian Besag, 1945–2010

Page 3: A short history of MCMC

Introduction

Markov chain Monte Carlo (MCMC) methods have been around for almost as long as Monte Carlo techniques, even though their impact on statistics was not truly felt until the late 1980s and early 1990s.

Contents: distinction between Metropolis–Hastings-based algorithms and those related to Gibbs sampling, and a brief entry into the "second-generation MCMC revolution".


Page 5: A short history of MCMC

A few landmarks

The realization that Markov chains could be used in a wide variety of situations only came to "mainstream statisticians" with Gelfand and Smith (1990), despite earlier publications in the statistical literature, like Hastings (1970), and growing awareness in spatial statistics (Besag, 1986).

Several reasons:

- lack of computing machinery
- lack of background on Markov chains
- lack of trust in the practicality of the method


Page 7: A short history of MCMC

Before the revolution

Los Alamos

Bombs before the revolution

Monte Carlo methods were born in Los Alamos, New Mexico, during WWII, mostly developed by physicists working on atomic bombs, eventually producing the Metropolis algorithm in the early 1950s.

[Metropolis, Rosenbluth, Rosenbluth, Teller and Teller, 1953]


Page 9: A short history of MCMC

Monte Carlo genesis

The Monte Carlo method is usually traced to Ulam and von Neumann:

- Stanislaw Ulam associated the idea with an intractable combinatorial computation about "solitaire", attempted in 1946
- the idea was enthusiastically adopted by John von Neumann for implementation on neutron diffusion
- the name "Monte Carlo" was suggested by Nicholas Metropolis

[Eckhardt, 1987]


Page 13: A short history of MCMC

Monte Carlo with computers

Very close "coincidence" with the appearance of the very first computer, ENIAC, born Feb. 1946, on which von Neumann implemented Monte Carlo in 1947.

The same year, Ulam and von Neumann (re)invented the inversion and accept–reject techniques. In 1949 came the very first symposium on Monte Carlo and the very first paper.

[Metropolis and Ulam, 1949]


Page 15: A short history of MCMC

Metropolis et al., 1953

The Metropolis et al. (1953) paper

The very first MCMC algorithm is associated with the second computer, MANIAC, built in Los Alamos in early 1952.

Besides Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller contributed to creating the Metropolis algorithm...

Page 16: A short history of MCMC

Motivating problem

Computation of integrals of the form

I = ∫ F(p, q) exp{−E(p, q)/kT} dp dq / ∫ exp{−E(p, q)/kT} dp dq,

with the energy E defined as

E(p, q) = (1/2) Σ_{i=1}^{N} Σ_{j=1, j≠i}^{N} V(d_ij),

where N is the number of particles, V a potential function, and d_ij the distance between particles i and j.

Page 17: A short history of MCMC

Boltzmann distribution

The Boltzmann distribution exp{−E(p, q)/kT} is parameterised by the temperature T, k being the Boltzmann constant, with a normalisation factor

Z(T) = ∫ exp{−E(p, q)/kT} dp dq

not available in closed form.

Page 18: A short history of MCMC

Computational challenge

Since p and q are 2N-dimensional vectors, numerical integration is impossible.

Moreover, standard Monte Carlo techniques fail to correctly approximate I: exp{−E(p, q)/kT} is very small for most realizations of random configurations (p, q) of the particle system.

Page 19: A short history of MCMC

Metropolis algorithm

Consider a random-walk modification of the N particles: for each 1 ≤ i ≤ N, values

x′_i = x_i + α ξ_{1i} and y′_i = y_i + α ξ_{2i}

are proposed, where both ξ_{1i} and ξ_{2i} are uniform U(−1, 1). The energy difference between the new and the previous configuration is ∆E, and the new configuration is accepted with probability

1 ∧ exp{−∆E/kT},

and otherwise the previous configuration is replicated.*

*Counting one more time in the average of the F(p_t, q_t)'s over the τ moves of the random walk.
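The accept/replicate rule above can be sketched in a few lines of code. This is a minimal generic sketch, not the 1953 particle implementation: the function name `metropolis_1953`, the scalar toy energy, and all parameter defaults are my own illustrative choices.

```python
import math
import random

def metropolis_1953(energy, x0, n_steps, alpha=0.5, kT=1.0):
    """Random-walk Metropolis in the spirit of Metropolis et al. (1953).

    energy : callable returning E(x) for a configuration x (list of floats)
    alpha  : maximum displacement of the uniform proposal
    """
    x = list(x0)
    chain = [list(x)]
    for _ in range(n_steps):
        # propose x'_i = x_i + alpha * xi_i with xi_i ~ U(-1, 1)
        prop = [xi + alpha * random.uniform(-1.0, 1.0) for xi in x]
        delta_E = energy(prop) - energy(x)
        # accept with probability 1 ∧ exp(-ΔE/kT); otherwise the previous
        # configuration is replicated (counted one more time in the average)
        if delta_E <= 0 or random.random() < math.exp(-delta_E / kT):
            x = prop
        chain.append(list(x))
    return chain
```

With the toy energy E(x) = x²/2 and kT = 1, the invariant distribution is a standard normal, which gives a quick sanity check of the sampler.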

Page 20: A short history of MCMC

Convergence

Validity of the algorithm is established by proving

1. irreducibility
2. ergodicity, that is, convergence to the stationary distribution.

The second part is obtained via a discretization of the space: Metropolis et al. note that the proposal is reversible, then establish that exp{−E/kT} is invariant.

Application to the specific problem of the rigid-sphere collision model. The number of iterations of the Metropolis algorithm seems limited: 16 steps for burn-in and 48 to 64 subsequent iterations (which still required four to five hours on the Los Alamos MANIAC).


Page 23: A short history of MCMC

Physics and chemistry

The method of Markov chain Monte Carlo immediately had wide use in physics and chemistry.

[Geyer & Thompson, 1992]

- Hammersley and Handscomb, 1967
- Piekaar and Clarenburg, 1967
- Kennedy and Kutil, 1985
- Sokal, 1989
- &tc...

Page 24: A short history of MCMC

Physics and chemistry

Statistics has always been fuelled by energetic mining of the physics literature.

[Clifford, 1993]

Page 25: A short history of MCMC

Hastings, 1970

A fair generalisation

In Biometrika 1970, Hastings defines the MCMC methodology for finite and reversible Markov chains, the continuous case being discretised.

The generic acceptance probability for a move from state i to state j is

α_ij = s_ij / (1 + (π_i/π_j)(q_ij/q_ji)),

where s_ij is a symmetric function.
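Hastings' generic form recovers both the Metropolis and the Barker acceptance probabilities through different choices of the symmetric function s_ij. A small numerical sketch (function names are mine; the symmetric-proposal case q_ij = q_ji is assumed, so the q-ratio defaults to 1):

```python
def metropolis_alpha(pi_i, pi_j):
    # Metropolis et al. (1953): alpha = min(1, pi_j / pi_i)
    return min(1.0, pi_j / pi_i)

def barker_alpha(pi_i, pi_j):
    # Barker (1965): alpha = pi_j / (pi_i + pi_j)
    return pi_j / (pi_i + pi_j)

def hastings_alpha(pi_i, pi_j, s_ij=1.0, q_ratio=1.0):
    # Hastings (1970): alpha_ij = s_ij / (1 + (pi_i/pi_j) * (q_ij/q_ji))
    return s_ij / (1.0 + (pi_i / pi_j) * q_ratio)
```

Taking s_ij = 1 gives Barker's rule, while s_ij = 1 + min(π_i/π_j, π_j/π_i) gives the Metropolis rule, which one can verify numerically for any pair of target values.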


Page 27: A short history of MCMC

State of the art

Note

A generic form that encompasses both Metropolis et al. (1953) and Barker (1965).

Peskun's ordering is not yet discovered: Hastings mentions that little is known about the relative merits of those two choices, (even though) Metropolis's method may be preferable.

He warns against high rejection rates as indicative of a poor choice of transition matrix, but does not mention the opposite pitfall of low rejection rates.

Page 28: A short history of MCMC

What else?!

Items included in the paper are

- a Poisson target with a ±1 random walk proposal;
- a normal target with a uniform random walk proposal mixed with its reflection (i.e. centered at −X(t) rather than X(t));
- a multivariate target for which Hastings introduces Gibbs sampling, updating one component at a time and defining the composed transition as satisfying the stationarity condition, because each component leaves the target invariant;
- a reference to Erhman, Fosdick and Handscomb (1960) as a preliminary, if specific, instance of this Metropolis-within-Gibbs sampler;
- an importance sampling version of MCMC;
- some remarks about error assessment;
- a Gibbs sampler for random orthogonal matrices.
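The second item (a normal target with a uniform random walk mixed with its reflection) can be sketched as follows. This is my own toy version, not Hastings' exact setup: since both the walk centred at x and the walk centred at −x are symmetric kernels, the mixture is symmetric too, and the plain Metropolis ratio applies.

```python
import math
import random

def reflected_rw_metropolis(n_steps, delta=1.0, p_reflect=0.5, seed=0):
    """N(0,1) target; uniform U(c - delta, c + delta) proposal whose centre c
    is the reflection -x with probability p_reflect, and x otherwise."""
    rng = random.Random(seed)
    x, chain = 0.0, []
    for _ in range(n_steps):
        centre = -x if rng.random() < p_reflect else x
        y = centre + rng.uniform(-delta, delta)
        # Metropolis acceptance for the standard normal target:
        # accept if log u < (x^2 - y^2) / 2
        if math.log(rng.random()) < 0.5 * (x * x - y * y):
            x = y
        chain.append(x)
    return chain
```

The reflection component lets the chain jump between the two sides of the mode in a single move, which tends to speed up mixing for symmetric targets.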

Page 29: A short history of MCMC

Three years later

Peskun (1973) compares Metropolis's and Barker's acceptance probabilities and shows (again in a discrete setup) that Metropolis's is optimal (in terms of the asymptotic variance of any empirical average).

The proof is a direct consequence of Kemeny and Snell (1960) on asymptotic variance. Peskun also establishes that this variance can improve upon the iid case if and only if the eigenvalues of P − A are all negative, where A is the transition matrix corresponding to iid simulation and P the transition matrix corresponding to the Metropolis algorithm, but he concludes that the trace of P − A is always positive.


Page 31: A short history of MCMC

Julian's early works

Julian's early works (1)

In the early 1970s, Hammersley, Clifford, and Besag were working on the specification of joint distributions from conditional distributions, and on necessary and sufficient conditions for the conditional distributions to be compatible with a joint distribution.

[Hammersley and Clifford, 1971]

What is the most general form of the conditional probability functions that define a coherent joint function? And what will the joint look like?

[Besag, 1972]

Page 34: A short history of MCMC

Hammersley–Clifford theorem

Theorem (Hammersley–Clifford)

The joint distribution of a vector associated with a dependence graph must be represented as a product of functions over the cliques of the graph, i.e., of functions depending only on the components indexed by the labels in the clique.

[Cressie, 1993; Lauritzen, 1996]

Page 35: A short history of MCMC

Theorem (Hammersley–Clifford)

A probability distribution P with positive and continuous density f satisfies the pairwise Markov property with respect to an undirected graph G if and only if it factorizes according to G, i.e., (F) ≡ (G).

[Cressie, 1993; Lauritzen, 1996]

Page 36: A short history of MCMC

Theorem (Hammersley–Clifford)

Under the positivity condition, the joint distribution g satisfies

g(y_1, …, y_p) ∝ ∏_{j=1}^{p} [ g_{ℓ_j}(y_{ℓ_j} | y_{ℓ_1}, …, y_{ℓ_{j−1}}, y′_{ℓ_{j+1}}, …, y′_{ℓ_p}) / g_{ℓ_j}(y′_{ℓ_j} | y_{ℓ_1}, …, y_{ℓ_{j−1}}, y′_{ℓ_{j+1}}, …, y′_{ℓ_p}) ]

for every permutation ℓ on {1, 2, …, p} and every y′ ∈ 𝒴.

[Cressie, 1993; Lauritzen, 1996]
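The factorization can be checked numerically on a toy case: for p = 2 and the identity permutation, the product of conditional ratios must recover the joint ratio g(y)/g(y′) for every pair of configurations. The small positive joint on {0,1}² below and all function names are illustrative assumptions of mine.

```python
import itertools

# a hypothetical positive joint on {0,1}^2 (any positive table works)
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def cond1(y1, y2):  # full conditional g_1(y1 | y2)
    z = joint[(0, y2)] + joint[(1, y2)]
    return joint[(y1, y2)] / z

def cond2(y2, y1):  # full conditional g_2(y2 | y1)
    z = joint[(y1, 0)] + joint[(y1, 1)]
    return joint[(y1, y2)] / z

def hc_ratio(y, yp):
    # product of conditional ratios for the identity permutation (p = 2)
    return (cond1(y[0], yp[1]) / cond1(yp[0], yp[1])) * \
           (cond2(y[1], y[0]) / cond2(yp[1], y[0]))

# the product recovers g(y)/g(y') for every pair of configurations
for y, yp in itertools.product(joint, joint):
    assert abs(hc_ratio(y, yp) - joint[y] / joint[yp]) < 1e-12
```

Positivity matters here: every configuration in the table must have strictly positive mass for the conditional ratios to be well defined, which is exactly the constraint the authors were dissatisfied with.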

Page 37: A short history of MCMC

An apocryphal theorem

The Hammersley–Clifford theorem was never published by its authors, but only through Grimmett (1973), Preston (1973), Sherman (1973), and Besag (1974). The authors were dissatisfied with the positivity constraint: the joint density could only be recovered from the full conditionals when the support of the joint was the product of the supports of the full conditionals (with obvious counter-examples). Moussouris' counter-example put a full stop to their endeavors.

[Hammersley, 1974]

Page 38: A short history of MCMC

To Gibbs or not to Gibbs?

Julian Besag should certainly be credited, to a large extent, with the (re?-)discovery of the Gibbs sampler.


The simulation procedure is to consider the sites cyclically and, at each stage, to amend or leave unaltered the particular site value in question, according to a probability distribution whose elements depend upon the current value at neighboring sites (...) However, the technique is unlikely to be particularly helpful in many other than binary situations and the Markov chain itself has no practical interpretation.

[Besag, 1974]
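The cyclic single-site updating Besag describes is exactly what later became known as the Gibbs sampler. A minimal sketch for a binary chain with Ising-type interactions (my own toy model, not Besag's spatial setting; the function name and parameters are illustrative):

```python
import math
import random

def gibbs_binary_chain(n_sites, beta, n_sweeps):
    """Cyclic single-site updating for a binary (+/-1) chain with
    nearest-neighbour interaction strength beta."""
    x = [random.choice([-1, 1]) for _ in range(n_sites)]
    for _ in range(n_sweeps):
        for i in range(n_sites):  # visit the sites cyclically
            # the full conditional depends only on the neighbouring sites
            nb = (x[i - 1] if i > 0 else 0) + (x[i + 1] if i < n_sites - 1 else 0)
            # P(x_i = +1 | neighbours) = e^{beta*nb} / (e^{beta*nb} + e^{-beta*nb})
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * nb))
            x[i] = 1 if random.random() < p_plus else -1
    return x
```

Each site update draws from the exact conditional given its neighbours, so the sweep leaves the joint field distribution invariant.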

Page 40: A short history of MCMC

Broader perspective

In 1964, Hammersley and Handscomb wrote a (the first?) textbook on Monte Carlo methods. They cover such topics as

- "crude Monte Carlo";
- importance sampling;
- control variates; and
- "conditional Monte Carlo", which looks surprisingly like a missing-data Gibbs completion approach.

They state in the Preface:

We are convinced nevertheless that Monte Carlo methods will one day reach an impressive maturity.

Page 41: A short history of MCMC

Clicking in

After Peskun (1973), MCMC lay mostly dormant in the mainstream statistical world for about ten years; then several papers and books highlighted its usefulness in specific settings:

- Geman and Geman (1984)
- Besag (1986)
- Strauss (1986)
- Ripley (Stochastic Simulation, 1987)
- Tanner and Wong (1987)
- Younes (1988)

Page 42: A short history of MCMC

Enters the Gibbs sampler

Geman and Geman (1984), building on Metropolis et al. (1953), Hastings (1970), and Peskun (1973), constructed a Gibbs sampler for optimisation in a discrete image processing problem without completion.

They are responsible for the name Gibbs sampling, because the method was used for the Bayesian study of Gibbs random fields, linked to the physicist Josiah Willard Gibbs (1839–1903).

Back to Metropolis et al., 1953: the Gibbs sampler is used as a simulated annealing algorithm and ergodicity is proven on the collection of global maxima.

Page 43: A short history of MCMC

Besag (1986) integrates GS for SA...

...easy to construct the transition matrix Q, of a discrete time Markov chain, with state space Ω and limit distribution (4). Simulated annealing proceeds by running an associated time-inhomogeneous Markov chain with transition matrices Q_T, where T is progressively decreased according to a prescribed "schedule" to a value close to zero.

[Besag, 1986]

Page 44: A short history of MCMC

...and links with Metropolis–Hastings...

There are various related methods of constructing a manageable Q_T (Hastings, 1970). Geman and Geman (1984) adopt the simplest, which they term the "Gibbs sampler" (...) time reversibility, a common ingredient in this type of problem (see, for example, Besag, 1977a), is present at individual stages but not over complete cycles, though Peter Green has pointed out that it returns if Q_T is taken over a pair of cycles, the second of which visits pixels in reverse order.

[Besag, 1986]

Page 45: A short history of MCMC

...seeing the larger picture,...

As Geman and Geman (1984) point out, any property of the (posterior) distribution P(x|y) can be simulated by running the Gibbs sampler at "temperature" T = 1. Thus, if x_i maximizes P(x_i|y), then it is the most frequently occurring colour at pixel i in an infinite realization of the Markov chain with transition matrix Q of Section 2.3. The x_i's can therefore be simultaneously estimated from a single finite realization of the chain. It is not yet clear how long the realization needs to be, particularly for estimation near colour boundaries, but the amount of computation required is generally prohibitive for routine purposes.

[Besag, 1986]

Page 46: A short history of MCMC

...seeing the larger picture,...

P(x|y) can be simulated using the Gibbs sampler, as suggested by Grenander (1983) and by Geman and Geman (1984). My dismissal of such an approach for routine applications was somewhat cavalier: purpose-built array processors could become relatively inexpensive (...) suppose that, for 100 complete cycles say, images have been collected from the Gibbs sampler (or by Metropolis' method), following a "settling-in" period of perhaps another 100 cycles, which should cater for fairly intricate priors (...) These 100 images should often be adequate for estimating properties of the posterior (...) and for making approximate associated confidence statements, as mentioned by Mr Haslett.

[Besag, 1986]

Page 47: A short history of MCMC

...if not going fully Bayes!

...a neater and more efficient procedure [for parameter estimation] is to adopt maximum "pseudo-likelihood" estimation (Besag, 1975).


I have become increasingly enamoured with the Bayesian paradigm.

[Besag, 1986]


The pair (x_i, β_i) is then a (bivariate) Markov field and can be reconstructed as a bivariate process by the methods described in Professor Besag's paper.

[Clifford, 1986]


The simulation-based estimator E_post Ψ(X) will differ from the m.a.p. estimator Ψ(x).

[Silverman, 1986]

Page 51: A short history of MCMC

Discussants of Besag (1986)

An impressive who's who: D.M. Titterington, P. Clifford, P. Green, P. Brown, B. Silverman, F. Critchley, F. Kelly, K. Mardia, C. Jennison, J. Kent, D. Spiegelhalter, H. Wynn, D. and S. Geman, J. Haslett, J. Kay, H. Künsch, P. Switzer, B. Torsney, &tc

Page 52: A short history of MCMC

A comment on Besag (1986)

While special purpose algorithms will determine the utility of the Bayesian methods, the general purpose methods, stochastic relaxation and simulation of solutions of the Langevin equation (Grenander, 1983; Geman and Geman, 1984; Gidas, 1985a; Geman and Hwang, 1986), have proven enormously convenient and versatile. We are able to apply a single computer program to every new problem by merely changing the subroutine that computes the energy function in the Gibbs representation of the posterior distribution.

[Geman and McClure, 1986]

Page 53: A short history of MCMC

Another one

It is easy to compute exact marginal and joint posterior probabilities of currently unobserved features, conditional on those clinical findings currently available (Spiegelhalter, 1986a,b), the updating taking the form of 'propagating evidence' through the network (...) it would be interesting to see if the techniques described tonight, which are of intermediate complexity, may have any applications in this new and exciting area [causal networks].

[Spiegelhalter, 1986]

Page 54: A short history of MCMC

The candidate's formula

Representation of the marginal likelihood as

m(x) = π(θ) f(x|θ) / π(θ|x),

or of the marginal predictive as

p_n(y⋆|y) = f(y⋆|θ) π_n(θ|y) / π_{n+1}(θ|y, y⋆).

[Besag, 1989]

Why candidate?

"Equation (2) appeared without explanation in a Durham University undergraduate final examination script of 1984. Regrettably, the student's name is no longer known to me."
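The candidate's formula is an identity in θ: since π(θ|x) = π(θ)f(x|θ)/m(x), the ratio π(θ)f(x|θ)/π(θ|x) equals m(x) at every θ. This can be verified on a hypothetical conjugate normal model (θ ~ N(0, τ²), x|θ ~ N(θ, σ²)), where prior, likelihood, posterior, and marginal are all available in closed form; all names and numbers below are my own illustration.

```python
import math

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# hypothetical conjugate setup: theta ~ N(0, tau2), x | theta ~ N(theta, sig2)
tau2, sig2, x = 2.0, 1.0, 0.7

def candidate_marginal(theta):
    # m(x) = pi(theta) f(x|theta) / pi(theta|x), for ANY choice of theta
    prior = norm_pdf(theta, 0.0, tau2)
    lik = norm_pdf(x, theta, sig2)
    post = norm_pdf(theta, tau2 * x / (tau2 + sig2), tau2 * sig2 / (tau2 + sig2))
    return prior * lik / post

exact = norm_pdf(x, 0.0, tau2 + sig2)  # closed-form marginal: x ~ N(0, tau2 + sig2)
# the identity holds at every theta
for theta in (-1.0, 0.0, 0.3, 2.5):
    assert abs(candidate_marginal(theta) - exact) < 1e-12
```

In practice only an estimate of π(θ|x) is available, so the formula is typically evaluated at a high-density point such as the posterior mode, which is the route Chib (1995) takes.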


Page 56: A short history of MCMC

Implications

- Newton and Raftery (1994) used this representation to derive the [infamous] harmonic mean approximation to the marginal likelihood
- Gelfand and Dey (1994) also relied on this formula for the same purpose, in a more general perspective
- Geyer and Thompson (1995) derived MLEs by a Monte Carlo approximation to the normalising constant
- Chib (1995) uses this representation to build an MCMC approximation to the marginal likelihood

Page 60: A short history of MCMC


The Revolution

Final steps to

Impact

“This is surely a revolution.” [Clifford, 1993]

“[Gibbs sampler] use seems to have been isolated in the spatial statistics community until Gelfand and Smith (1990)” [Geyer, 1990]

Geman and Geman (1984) was one more spark that led to the explosion, as it had a clear influence on Gelfand, Green, Smith, Spiegelhalter and others. It sparked new interest in Bayesian methods, statistical computing, algorithms, and stochastic processes through the use of computing algorithms such as the Gibbs sampler and the Metropolis–Hastings algorithm.

Page 62: A short history of MCMC


The Revolution

Final steps to

Data augmentation

Tanner and Wong (1987) has essentially the same ingredients as Gelfand and Smith (1990): simulating from the conditionals is simulating from the joint.

Lower impact:

I emphasis on missing data problems (hence data augmentation)

I MCMC approximation to the target at every iteration

π(θ|x) ≈ (1/K) ∑_{k=1}^{K} π(θ|x, z_{t,k}) ,  z_{t,k} ∼ π_{t−1}(z|x) ,

I too close to Rubin’s (1978) multiple imputation

I theoretical backup based on functional analysis (Markov kernel had to be uniformly bounded and equicontinuous)


Page 64: A short history of MCMC


The Revolution

Gelfand and Smith, 1990

Epiphany

In June 1989, at a Bayesian workshop in Sherbrooke, Québec, Adrian Smith exposed for the first time (?) the generic features of the Gibbs sampler, exhibiting a ten-line Fortran program handling a random effect model

Yij = θi + εij , i = 1, . . . ,K, j = 1, . . . , J,

θi ∼ N(µ, σ²θ) ,  εij ∼ N(0, σ²ε)

by full conditionals on µ, σθ, σε... [Gelfand and Smith, 1990]

This was enough to convince the whole audience!
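That demonstration can be sketched in a few lines of modern code. This is an illustrative reimplementation, not Gelfand and Smith's original Fortran; the flat prior on µ and the IG(1, 1) priors on both variances are assumptions made here for a complete sampler:

```python
# A minimal Gibbs sampler for the random-effect model
#   Y_ij = θ_i + ε_ij,  θ_i ~ N(µ, σθ²),  ε_ij ~ N(0, σε²),
# cycling through the full conditionals on θ, µ, σθ², σε².
import numpy as np

rng = np.random.default_rng(0)
K, J = 8, 5
true_theta = rng.normal(2.0, 1.0, size=K)
y = true_theta[:, None] + rng.normal(0.0, 0.5, size=(K, J))   # simulated data

def gibbs(y, n_iter=2000, a=1.0, b=1.0):
    K, J = y.shape
    mu, s2_t, s2_e = 0.0, 1.0, 1.0
    theta = y.mean(axis=1).copy()
    mu_draws = []
    for _ in range(n_iter):
        # θ_i | rest: Gaussian with precision J/σε² + 1/σθ²
        prec = J / s2_e + 1.0 / s2_t
        mean = (J * y.mean(axis=1) / s2_e + mu / s2_t) / prec
        theta = rng.normal(mean, np.sqrt(1.0 / prec))
        # µ | rest (flat prior): N(mean of θ, σθ²/K)
        mu = rng.normal(theta.mean(), np.sqrt(s2_t / K))
        # σθ² | rest: inverse gamma
        s2_t = 1.0 / rng.gamma(a + K / 2, 1.0 / (b + 0.5 * ((theta - mu) ** 2).sum()))
        # σε² | rest: inverse gamma
        s2_e = 1.0 / rng.gamma(a + K * J / 2, 1.0 / (b + 0.5 * ((y - theta[:, None]) ** 2).sum()))
        mu_draws.append(mu)
    return np.array(mu_draws)

mu_draws = gibbs(y)
print(mu_draws[1000:].mean())   # posterior mean of µ, close to the grand mean of y
```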

Page 65: A short history of MCMC


The Revolution

Gelfand and Smith, 1990

Garden of Eden

In the early 1990s, researchers found that Gibbs and then Metropolis–Hastings algorithms would crack almost any problem! A flood of papers followed applying MCMC:

I linear mixed models (Gelfand et al., 1990; Zeger and Karim, 1991;Wang et al., 1993, 1994)

I generalized linear mixed models (Albert and Chib, 1993)

I mixture models (Tanner and Wong, 1987; Diebolt and X., 1990,1994; Escobar and West, 1993)

I changepoint analysis (Carlin et al., 1992)

I point processes (Grenander and Møller, 1994)

I &tc

Page 66: A short history of MCMC


The Revolution

Gelfand and Smith, 1990

Garden of Eden

In the early 1990s, researchers found that Gibbs and then Metropolis–Hastings algorithms would crack almost any problem! A flood of papers followed applying MCMC:

I genomics (Stephens and Smith, 1993; Lawrence et al., 1993;Churchill, 1995; Geyer and Thompson, 1995)

I ecology (George and X, 1992; Dupuis, 1995)

I variable selection in regression (George and McCulloch, 1993)

I spatial statistics (Raftery and Banfield, 1991)

I longitudinal studies (Lange et al., 1992)

I &tc

Page 67: A short history of MCMC


The Revolution

Gelfand and Smith, 1990

[some of the] early theoretical advances

“It may well be remembered as the afternoon of the 11 Bayesians” [Clifford, 1993]

I Geyer and Thompson, 1992, relied on MCMC methods for ML estimation

I Smith and Roberts, 1993, discussed convergence diagnoses and applications, incl. mixtures, for Gibbs and Metropolis–Hastings

I Besag and Green, 1993, stated the desiderata for convergence and connected MCMC with auxiliary and antithetic variables

I Tierney, 1994, laid out all of the assumptions needed to analyze the Markov chains and then developed their properties, in particular convergence of ergodic averages and central limit theorems

I Liu, Wong and Kong, 1994, 1995, analyzed the covariance structure of Gibbs sampling, and formally established the validity of Rao–Blackwellization in Gibbs sampling

I Mengersen and Tweedie, 1996, set the tone for the study of the speed of convergence of MCMC algorithms to the target distribution

I Gilks, Clayton and Spiegelhalter, 1993

I &tc...
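The Rao–Blackwellization result of Liu, Wong and Kong can be illustrated on a toy Gibbs sampler: for a bivariate normal target with correlation ρ, estimate E[X] either by averaging the X draws or by averaging the conditional expectations E[X | Y_t] = ρY_t. The settings below are assumptions for a sketch, not the paper's experiment:

```python
# Rao-Blackwellization in Gibbs sampling, bivariate normal toy example:
# the conditional-expectation estimator of E[X] has smaller variance
# than the plain average of the X draws.
import numpy as np

rho = 0.5
rng = np.random.default_rng(1)

def gibbs_estimates(n=500):
    x, y = 0.0, 0.0
    sd = np.sqrt(1 - rho**2)
    xs, cond = [], []
    for _ in range(n):
        x = rng.normal(rho * y, sd)     # X | Y ~ N(ρY, 1-ρ²)
        y = rng.normal(rho * x, sd)     # Y | X ~ N(ρX, 1-ρ²)
        xs.append(x)
        cond.append(rho * y)            # E[X | Y = y], the Rao-Blackwellized term
    return np.mean(xs), np.mean(cond)

# Replicate the sampler and compare the variability of the two estimators
runs = np.array([gibbs_estimates() for _ in range(200)])
var_raw, var_rb = runs.var(axis=0)
print(var_raw > var_rb)   # Rao-Blackwellization reduces the variance here
```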

Page 74: A short history of MCMC


The Revolution

Convergence diagnoses

Convergence diagnoses

Can we really tell when a complicated Markov chain hasreached equilibrium? Frankly, I doubt it.

[Clifford, 1993]

Explosion of methods

I Gelman and Rubin (1991)

I Besag and Green (1992)

I Geyer (1992)

I Raftery and Lewis (1992)

I Cowles and Carlin (1996) coda

I Brooks and Roberts (1998)

I &tc
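The diagnostic behind Gelman and Rubin's proposal compares between-chain and within-chain variability across parallel chains. A minimal sketch of the potential scale reduction factor on synthetic chains (the synthetic data and thresholds are illustrative assumptions):

```python
# Sketch of the Gelman-Rubin potential scale reduction factor (R-hat).
import numpy as np

def gelman_rubin(chains):
    """chains: array of shape (m, n), i.e. m parallel chains of length n."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()          # within-chain variance
    B = n * chain_means.var(ddof=1)                # between-chain variance
    var_hat = (n - 1) / n * W + B / n              # pooled variance estimate
    return np.sqrt(var_hat / W)                    # R-hat, near 1 at convergence

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                 # 4 chains sampling the same target
stuck = mixed + np.arange(4)[:, None]              # 4 chains stuck at different levels
print(gelman_rubin(mixed))   # close to 1
print(gelman_rubin(stuck))   # well above 1: no sign of convergence
```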

Page 75: A short history of MCMC


After the Revolution

Particle systems

Particles, again

Iterating importance sampling is about as old as Monte Carlo methods themselves!

[Hammersley and Morton, 1954; Rosenbluth and Rosenbluth, 1955]

Found in the molecular simulation literature of the 1950s, with self-avoiding random walks, and in signal processing

[Marshall, 1965; Handschin and Mayne, 1969]

Use of the term “particle” dates back to Kitagawa (1996), and Carpenter et al. (1997) coined the term “particle filter”.


Page 77: A short history of MCMC


After the Revolution

Particle systems

Bootstrap filter and sequential Monte Carlo

Gordon, Salmond and Smith (1993) introduced the bootstrap filter which, while formally connected with importance sampling, involves past simulations and possible MCMC steps (Gilks and Berzuini, 2001). Sequential imputation was developed in Kong, Liu and Wong (1994), while Liu and Chen (1995) first formally pointed out the importance of resampling in “sequential Monte Carlo”, a term they coined.
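The bootstrap filter alternates propagation through the state equation, weighting by the likelihood, and multinomial resampling. A minimal sketch on an assumed toy state-space model x_t = 0.9 x_{t−1} + v_t, y_t = x_t + w_t with standard normal noises (the model and settings are illustrative, not from the paper):

```python
# Minimal bootstrap particle filter on a toy linear-Gaussian state-space model.
import numpy as np

rng = np.random.default_rng(2)
T, N = 50, 1000

# Simulate a latent AR(1) state and noisy observations of it
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.normal()
y = x + rng.normal(size=T)

particles = rng.normal(size=N)
means = []
for t in range(T):
    particles = 0.9 * particles + rng.normal(size=N)   # propagate through the prior
    logw = -0.5 * (y[t] - particles) ** 2              # weight by the N(x_t, 1) likelihood
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(N, size=N, p=w)                   # multinomial resampling
    particles = particles[idx]
    means.append(particles.mean())

# Filtered means should track the latent state better than the raw observations
print(np.mean((np.array(means) - x) ** 2))
```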

Page 78: A short history of MCMC


After the Revolution

Particle systems

pMC versus pMCMC

I Recycling of past simulations is legitimate to build better importance sampling functions, as in population Monte Carlo

[Iba, 2000; Cappé et al., 2004; Del Moral et al., 2007]

I Recent synthesis by Andrieu, Doucet, and Holenstein (2010) using particles to build an evolving MCMC kernel pθ(y1:T ) in state space models p(x1:T )p(y1:T |x1:T ), along with Andrieu’s and Roberts’ (2009) use of approximations in MCMC acceptance steps

[Kennedy and Kuti, 1985]
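The pseudo-marginal idea of Andrieu and Roberts can be sketched on a toy target: run Metropolis–Hastings with an unbiased *estimate* of the density (here N(0,1) multiplied by positive mean-one noise) and, provided the estimate of the current point is stored and reused, the chain still targets the exact distribution. The noise model and all tuning below are assumptions for illustration:

```python
# Toy pseudo-marginal Metropolis-Hastings: noisy but unbiased density
# evaluations, exact N(0,1) target.
import numpy as np

rng = np.random.default_rng(3)

def noisy_density(theta):
    """Unbiased estimator of the N(0,1) density at theta (Gamma noise, mean 1)."""
    return np.exp(-0.5 * theta**2) * rng.gamma(shape=10.0, scale=0.1)

theta, dens = 0.0, noisy_density(0.0)
draws = []
for _ in range(50_000):
    prop = theta + rng.normal(scale=1.0)
    prop_dens = noisy_density(prop)
    # crucial detail: reuse the stored estimate for the current point
    if rng.random() < prop_dens / dens:
        theta, dens = prop, prop_dens
    draws.append(theta)

samples = np.array(draws[5000:])
print(samples.mean(), samples.var())   # close to the exact moments 0 and 1
```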

Page 79: A short history of MCMC


After the Revolution

Reversible jump

Reversible jump

Generally considered as the second Revolution.

Formalisation of a Markov chain moving across models and parameter spaces allowed for the Bayesian processing of a wide variety of models and led to the success of Bayesian model choice.

Definition of a proper balance condition on cross-model Markov kernels gives a generic setup for exploring variable dimension spaces, even when the number of models under comparison is infinite.

[Green, 1995]

Page 80: A short history of MCMC


After the Revolution

Perfect sampling

Perfect sampling

Seminal paper of Propp and Wilson (1996) showed how to use MCMC methods to produce an exact (or perfect) simulation from the target.

Outburst of papers, particularly from Jesper Møller and coauthors, but the excitement somehow dried out [except in dedicated areas], as construction of perfect samplers is hard and coalescence times very high...

[Møller and Waagepetersen, 2003]
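Coupling from the past can be sketched on a tiny monotone chain: start the extreme chains at the top and bottom of the state space at time −T, drive both with the same randomness, and double T until they coalesce; the common value at time 0 is then an exact stationary draw. The reflected random walk below (uniform stationary law) is a toy chain chosen for illustration, not from the paper:

```python
# Propp-Wilson coupling from the past on a reflected random walk over {0,...,4}.
import numpy as np

K = 4

def update(x, u):
    """Monotone random-walk update driven by a common uniform u."""
    return max(x - 1, 0) if u < 0.5 else min(x + 1, K)

def cftp(rng):
    T = 1
    us = []                      # stored randomness: us[t] drives time -(t+1)
    while True:
        while len(us) < T:
            us.append(rng.random())
        lo, hi = 0, K            # extreme chains started at time -T
        for t in range(T - 1, -1, -1):
            lo, hi = update(lo, us[t]), update(hi, us[t])
        if lo == hi:             # coalesced: exact draw from the stationary law
            return lo
        T *= 2                   # otherwise restart further in the past

rng = np.random.default_rng(4)
samples = [cftp(rng) for _ in range(5000)]
freqs = np.bincount(samples, minlength=K + 1) / len(samples)
print(freqs)                     # close to the uniform stationary distribution
```

Note that the stored uniforms must be reused when restarting from a more distant past; fresh randomness at each restart would bias the draw.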


Page 82: A short history of MCMC


After the Revolution

Envoi

To be continued...

...standing on the shoulders of giants