lecture 3 introduction to monte carlo markov chain (mcmc) methods

Lecture 3

Introduction to Markov chain Monte Carlo (MCMC) methods

Lecture Contents

• How does WinBUGS fit a regression?

• Gibbs Sampling

• Convergence and burn-in

• How many iterations?

• Logistic regression example

Linear regression

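• Let us revisit our simple linear regression model

$$\text{height}_i = \beta_0 + \beta_1\,\text{weight}_i + e_i, \qquad e_i \sim N(0, \sigma^2)$$

• To this model we added the following priors in WinBUGS

$$\beta_0 \sim N(0, m_0), \quad \beta_1 \sim N(0, m_1), \quad 1/\sigma^2 \sim \Gamma(\varepsilon, \varepsilon), \quad \text{where } m_0 = m_1 = 10^6,\ \varepsilon = 10^{-3}$$

• Ideally we would sample from the joint posterior distribution

$$p(\beta_0, \beta_1, \sigma^2 \mid y)$$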

Linear Regression ctd.

• In this case we can sample from the joint posterior as described in the last lecture

• However this is not the case for all models and so we will now describe other simulation-based methods that can be used.

• These methods come from a family of methods called Markov chain Monte Carlo (MCMC) methods and here we will focus on a method called Gibbs Sampling.

MCMC Methods

• Goal: To sample from the joint posterior distribution

$$p(\beta_0, \beta_1, \sigma^2 \mid y)$$

• Problem: For complex models this involves multidimensional integration.

• Solution: It may be possible to sample from the conditional posterior distributions

$$p(\beta_0 \mid y, \beta_1, \sigma^2), \quad p(\beta_1 \mid y, \beta_0, \sigma^2), \quad p(\sigma^2 \mid y, \beta_0, \beta_1)$$

• It can be shown that after convergence such a sampling approach generates dependent samples from the joint posterior distribution.

Gibbs Sampling

• When we can sample directly from the conditional posterior distributions then such an algorithm is known as Gibbs Sampling.

• This proceeds as follows for the linear regression example:

• Firstly give all unknown parameters starting values, $\beta_0(0),\ \beta_1(0),\ \sigma^2(0)$.

• Next loop through the following steps:

Gibbs Sampling ctd.

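• Sample from

$$p(\beta_0 \mid y, \beta_1(0), \sigma^2(0)) \text{ to generate } \beta_0(1),$$

and then from

$$p(\beta_1 \mid y, \beta_0(1), \sigma^2(0)) \text{ to generate } \beta_1(1),$$

and then from

$$p(\sigma^2 \mid y, \beta_0(1), \beta_1(1)) \text{ to generate } \sigma^2(1).$$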

These steps are then repeated with the generated values from this loop replacing the starting values. The chain of values produced by this procedure is known as a Markov chain, and it is hoped that this chain converges to its equilibrium distribution, which is the joint posterior distribution.

Calculating the conditional distributions

• In order for the algorithm to work we need to sample from the conditional posterior distributions.

• If these distributions have standard forms then it is easy to draw random samples from them.

• Mathematically we write down the full posterior and assume all parameters are constants apart from the parameter of interest.

• We then try to match the resulting formulae to a standard distribution.

Matching distributional forms

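• If a parameter θ follows a Normal(μ, σ²) distribution then we can write

$$p(\theta) \propto \exp(a\theta^2 + b\theta + \text{const}), \quad \text{where } a = -\frac{1}{2\sigma^2} \text{ and } b = \frac{\mu}{\sigma^2}$$

• Similarly if θ follows a Gamma(α, β) distribution then we can write

$$p(\theta) \propto \theta^{a} \exp(b\theta), \quad \text{where } a = \alpha - 1 \text{ and } b = -\beta$$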
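The matching rules above can be inverted to read off the distribution's parameters from the coefficients a and b. The following is a minimal sketch of that inversion; the function names are illustrative only and do not come from WinBUGS or the lecture.

```python
def normal_from_exponent(a, b):
    """Recover (mean, variance) from p(theta) ∝ exp(a*theta**2 + b*theta + const)."""
    if a >= 0:
        raise ValueError("a must be negative for a proper Normal density")
    variance = -1.0 / (2.0 * a)  # invert a = -1 / (2 * sigma^2)
    mean = b * variance          # invert b = mu / sigma^2
    return mean, variance


def gamma_from_exponent(a, b):
    """Recover (shape, rate) from p(theta) ∝ theta**a * exp(b*theta)."""
    if a <= -1 or b >= 0:
        raise ValueError("need a > -1 and b < 0 for a proper Gamma density")
    shape = a + 1.0  # invert a = alpha - 1
    rate = -b        # invert b = -beta
    return shape, rate
```

For example, coefficients a = -0.5 and b = 2 correspond to a Normal distribution with mean 2 and variance 1.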

Step 1: β0

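$$
\begin{aligned}
p(\beta_0 \mid y, \beta_1, \sigma^2) &\propto p(\beta_0)\, p(y \mid \beta_0, \beta_1, \sigma^2) \\
&\propto \frac{1}{\sqrt{m_0}} \exp\left(-\frac{\beta_0^2}{2 m_0}\right) \prod_i \frac{1}{\sqrt{\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y_i - \beta_0 - x_i\beta_1)^2\right) \\
&\propto \exp\left(-\frac{1}{2}\left(\frac{N}{\sigma^2} + \frac{1}{m_0}\right)\beta_0^2 + \frac{1}{\sigma^2}\sum_i (y_i - x_i\beta_1)\,\beta_0 + \text{const}\right)
\end{aligned}
$$

Matching powers gives

$$\frac{1}{\sigma^2_{\beta_0}} = \frac{N}{\sigma^2} + \frac{1}{m_0}, \qquad b = \frac{\mu_{\beta_0}}{\sigma^2_{\beta_0}} = \frac{1}{\sigma^2}\sum_i (y_i - x_i\beta_1),$$

and so

$$\beta_0 \mid y, \beta_1, \sigma^2 \sim N\left(\frac{\sum_i (y_i - x_i\beta_1)}{N + \sigma^2/m_0},\ \frac{\sigma^2}{N + \sigma^2/m_0}\right)$$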

Step 2: β1

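$$
\begin{aligned}
p(\beta_1 \mid y, \beta_0, \sigma^2) &\propto p(\beta_1)\, p(y \mid \beta_0, \beta_1, \sigma^2) \\
&\propto \frac{1}{\sqrt{m_1}} \exp\left(-\frac{\beta_1^2}{2 m_1}\right) \prod_i \frac{1}{\sqrt{\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y_i - \beta_0 - x_i\beta_1)^2\right) \\
&\propto \exp\left(-\frac{1}{2}\left(\frac{\sum_i x_i^2}{\sigma^2} + \frac{1}{m_1}\right)\beta_1^2 + \frac{1}{\sigma^2}\sum_i x_i (y_i - \beta_0)\,\beta_1 + \text{const}\right)
\end{aligned}
$$

Matching powers gives

$$\frac{1}{\sigma^2_{\beta_1}} = \frac{\sum_i x_i^2}{\sigma^2} + \frac{1}{m_1}, \qquad b = \frac{\mu_{\beta_1}}{\sigma^2_{\beta_1}} = \frac{1}{\sigma^2}\sum_i x_i (y_i - \beta_0),$$

and so

$$\beta_1 \mid y, \beta_0, \sigma^2 \sim N\left(\frac{\sum_i x_i y_i - \beta_0 \sum_i x_i}{\sum_i x_i^2 + \sigma^2/m_1},\ \frac{\sigma^2}{\sum_i x_i^2 + \sigma^2/m_1}\right)$$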

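Step 3: 1/σ²

$$
\begin{aligned}
p(1/\sigma^2 \mid y, \beta_0, \beta_1) &\propto p(1/\sigma^2)\, p(y \mid \beta_0, \beta_1, \sigma^2) \\
&\propto \left(\frac{1}{\sigma^2}\right)^{\varepsilon - 1} \exp\left(-\frac{\varepsilon}{\sigma^2}\right) \prod_i \left(\frac{1}{\sigma^2}\right)^{1/2} \exp\left(-\frac{1}{2\sigma^2}(y_i - \beta_0 - x_i\beta_1)^2\right) \\
&\propto \left(\frac{1}{\sigma^2}\right)^{N/2 + \varepsilon - 1} \exp\left(-\frac{1}{\sigma^2}\left(\varepsilon + \frac{1}{2}\sum_i (y_i - \beta_0 - x_i\beta_1)^2\right)\right)
\end{aligned}
$$

Matching terms gives

$$1/\sigma^2 \mid y, \beta_0, \beta_1 \sim \Gamma(a, b), \quad \text{where } a = \varepsilon + \frac{N}{2},\ b = \varepsilon + \frac{1}{2}\sum_i e_i^2 \text{ and } e_i = y_i - \beta_0 - x_i\beta_1$$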

Algorithm Summary

Repeat the following three steps

• 1. Generate β0 from its Normal conditional distribution.

• 2. Generate β1 from its Normal conditional distribution.

• 3. Generate 1/σ² from its Gamma conditional distribution.
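The three steps above can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the WinBUGS implementation: the function name, the starting values and the default number of iterations are assumptions, while the prior settings m0 = m1 = 10^6 and ε = 10^-3 follow the priors given earlier.

```python
import numpy as np


def gibbs_linear_regression(x, y, n_iter=2000, m0=1e6, m1=1e6, eps=1e-3, seed=0):
    """Gibbs sampler for y_i = beta0 + beta1 * x_i + e_i, e_i ~ N(0, sigma2).

    Returns an (n_iter, 3) array with columns beta0, beta1, sigma2.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    beta0, beta1, sigma2 = 0.0, 0.0, 1.0  # arbitrary starting values (assumed)
    draws = np.empty((n_iter, 3))
    for t in range(n_iter):
        # Step 1: beta0 | y, beta1, sigma2 is Normal.
        var0 = 1.0 / (n / sigma2 + 1.0 / m0)
        mean0 = var0 * np.sum(y - beta1 * x) / sigma2
        beta0 = rng.normal(mean0, np.sqrt(var0))
        # Step 2: beta1 | y, beta0, sigma2 is Normal.
        var1 = 1.0 / (np.sum(x**2) / sigma2 + 1.0 / m1)
        mean1 = var1 * np.sum(x * (y - beta0)) / sigma2
        beta1 = rng.normal(mean1, np.sqrt(var1))
        # Step 3: 1/sigma2 | y, beta0, beta1 is Gamma(a, b) with rate b.
        resid = y - beta0 - beta1 * x
        a = eps + n / 2.0
        b = eps + 0.5 * np.sum(resid**2)
        precision = rng.gamma(shape=a, scale=1.0 / b)  # NumPy uses scale = 1 / rate
        sigma2 = 1.0 / precision
        draws[t] = beta0, beta1, sigma2
    return draws
```

Calling gibbs_linear_regression(weight, height) would return the full chain, including the early iterations, so the burn-in still has to be discarded by the user before summarising.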

Convergence and burn-in

Two questions that immediately spring to mind are:

1. We start from arbitrary starting values so when can we safely say that our samples are from the correct distribution?

2. After this point how long should we run the chain for and store values?

Checking Convergence

This is the researcher's responsibility!

• Convergence is to a target distribution (the required posterior), not to a single value as in ML methods.

• Once convergence has been reached, samples should look like a random scatter about a stable mean value.

Convergence

• Convergence occurs here at around 100 iterations.

[Figure: trace plot of beta[1] against iteration, iterations 1 to 1000, vertical axis 7.0 to 9.0]

Checking convergence 2

• One approach (in WinBUGS) is to run many long chains with widely differing starting values.

• WinBUGS also has the Brooks-Gelman-Rubin diagnostic, which is based on the ratio of between-chain to within-chain variability (as in ANOVA). This ratio should approach 1.0 as the chains converge.

• MLwiN has other diagnostics that we will cover on Wednesday.

Demo of multiple chains in WinBUGS

• Here we transfer to the computer for a demonstration with the regression example of multiple chains (also mention node info)

[Figure: plot of beta0 for chains 1:2 against iteration, iterations 1 to 100, vertical axis 100.0 to 300.0]

Demo of multiple chains in WinBUGS

• Average 80% interval within-chains (blue) and pooled 80% interval between chains (green) – converge to stable values

• Ratio pooled:average interval width (red) – converge to 1.

[Figure: diagnostic plot for beta0, chains 1:2, against iteration, iterations 1 to 200, vertical axis 0.0 to 1.5]
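The ratio of pooled to average within-chain interval widths described on this slide can be sketched as follows. This is a simplified illustration and not WinBUGS's exact computation: the function names are assumptions, and the ratio is computed once over all supplied iterations rather than over a sequence of iteration windows.

```python
import numpy as np


def interval_width(samples, coverage=0.8):
    """Width of the central interval containing `coverage` of the samples."""
    lower, upper = np.quantile(samples, [(1 - coverage) / 2, (1 + coverage) / 2])
    return upper - lower


def pooled_to_within_ratio(chains, coverage=0.8):
    """Ratio of pooled interval width to average within-chain interval width.

    `chains` has shape (n_chains, n_iterations); values near 1 are
    consistent with the chains sampling the same distribution.
    """
    chains = np.asarray(chains, dtype=float)
    pooled = interval_width(chains.ravel(), coverage)
    within = np.mean([interval_width(chain, coverage) for chain in chains])
    return pooled / within
```

Chains started from widely differing values that have not yet mixed give a pooled interval much wider than the within-chain intervals, so the ratio is well above 1.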

Convergence in more complex models

• Convergence in linear regression is (almost) instantaneous.

• Here is an example of slower convergence

[Figure: two plots of carmean for chains 1:2 against iteration: one over iterations 1 to 500 with vertical axis 0.0 to 4.0, and one over iterations 1 to 1000 with vertical axis -25.0 to 100.0]

How many iterations after convergence?

• After convergence, further iterations are needed to obtain samples for posterior inference.

• More iterations = more accurate posterior estimates.

• MCMC chains are dependent samples and so the dependence or autocorrelation in the chain will influence how many iterations we need.

• Accuracy of the posterior estimates can be assessed by the Monte Carlo standard error (MCSE) for each parameter.

• Methods for calculating MCSE are given in later lectures.
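As a rough illustration of how autocorrelation affects accuracy, the sketch below estimates the MCSE of a posterior mean by batch means. Batch means is one common approach and is an assumption here; it is not necessarily the method used in the later lectures, and the function name and default number of batches are illustrative.

```python
import numpy as np


def mcse_batch_means(samples, n_batches=20):
    """Estimate the Monte Carlo standard error of the sample mean.

    The chain is split into consecutive batches; if the batches are long
    enough to be roughly independent, the spread of the batch means
    reflects the autocorrelation within the chain.
    """
    samples = np.asarray(samples, dtype=float)
    batch_size = len(samples) // n_batches
    if batch_size < 1:
        raise ValueError("need at least one sample per batch")
    trimmed = samples[: batch_size * n_batches]  # drop any leftover samples
    batch_means = trimmed.reshape(n_batches, batch_size).mean(axis=1)
    return batch_means.std(ddof=1) / np.sqrt(n_batches)
```

For an independent sample this is close to the usual standard deviation divided by the square root of the number of samples; for a positively autocorrelated chain it is larger, which is why such chains need more iterations.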

Inference using posterior samples from MCMC runs

• A powerful feature of MCMC and the Bayesian approach is that all inference is based on the joint posterior distribution.

• We can therefore address a wide range of substantive questions by appropriate summaries of the posterior.

• Typically report either the mean or median of the posterior samples for each parameter of interest as a point estimate

• 2.5% and 97.5% percentiles of the posterior sample for each parameter give a 95% posterior credible interval (interval within which the parameter lies with probability 0.95)

Derived Quantities

• Once we have a sample from the posterior we can answer lots of questions simply by investigating this sample.

Examples:

What is the probability that θ > 0?

What is the probability that θ1 > θ2?

What is a 95% interval for θ1/(θ1 + θ2)?

See later for examples of these sorts of derived quantities.
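Questions of this kind reduce to simple calculations on the stored posterior draws. The following is a minimal sketch, assuming the draws for each parameter are held in NumPy arrays of equal length; the function names are illustrative.

```python
import numpy as np


def summarise(samples):
    """Point estimates and a 95% credible interval from posterior draws."""
    samples = np.asarray(samples, dtype=float)
    lower, upper = np.percentile(samples, [2.5, 97.5])
    return {
        "mean": float(samples.mean()),
        "median": float(np.median(samples)),
        "interval95": (float(lower), float(upper)),
    }


def posterior_probability(event):
    """Proportion of draws for which the event (a boolean array) holds."""
    return float(np.mean(event))
```

With arrays theta1 and theta2 of draws, posterior_probability(theta1 > theta2) estimates the probability that θ1 > θ2, and summarise(theta1 / (theta1 + theta2))["interval95"] gives a 95% interval for the ratio.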

Logistic regression example

• In the practical that follows we will look at the following dataset of rat tumours and fit a logistic regression model to it:

Dose level   Number of rats   Number with tumors
0            14               4
1            34               4
2            34               2

Logistic regression model

• A standard Bayesian logistic regression model for this data can be written as follows:

$$
\begin{aligned}
y_i &\sim \text{Binomial}(n_i, p_i) \\
\text{logit}(p_i) &= \beta_0 + \beta_1\,\text{dose}_i \\
\beta_0 &\sim N(0, m_0), \quad \beta_1 \sim N(0, m_1)
\end{aligned}
$$

• WinBUGS can fit this model but can we write out the conditional posterior distributions and use Gibbs Sampling?

Conditional distribution for β0

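$$
\begin{aligned}
p(\beta_0 \mid y, \beta_1) &\propto p(\beta_0)\, p(y \mid \beta_0, \beta_1) \\
&\propto \frac{1}{\sqrt{m_0}} \exp\left(-\frac{\beta_0^2}{2 m_0}\right) \prod_i \left(\frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}\right)^{y_i} \left(\frac{1}{1 + \exp(\beta_0 + \beta_1 x_i)}\right)^{n_i - y_i}
\end{aligned}
$$

$$p(\beta_0 \mid y, \beta_1) \sim\ ?$$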

This distribution is not a standard distribution and so we cannot simply simulate from it using a standard random number generator. However, both WinBUGS and MLwiN can fit this model using MCMC. We will not see how until day 5.
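Although we cannot sample from this conditional directly, we can still evaluate it up to a constant. The sketch below does this on the log scale for the rat tumour data, with the dose level as x. The prior variance m0 = 10^6 is an assumption carried over from the linear regression priors, and the function name is illustrative.

```python
import numpy as np

# Rat tumour data from the practical: dose level, number of rats, number with tumours.
dose = np.array([0.0, 1.0, 2.0])
n_rats = np.array([14, 34, 34])
n_tumours = np.array([4, 4, 2])


def log_cond_beta0(beta0, beta1, m0=1e6):
    """Unnormalised log of p(beta0 | y, beta1) for the logistic model."""
    eta = beta0 + beta1 * dose  # linear predictor, logit(p_i)
    log_prior = -beta0**2 / (2.0 * m0)
    # y*log(p) + (n - y)*log(1 - p) simplifies to y*eta - n*log(1 + exp(eta)).
    log_lik = np.sum(n_tumours * eta - n_rats * np.logaddexp(0.0, eta))
    return log_prior + log_lik
```

Evaluating log_cond_beta0 over a grid of beta0 values for a fixed beta1 shows the shape of the conditional, even though it does not match any standard form.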

Hints for the next practical

• In the next practical you will be creating WinBUGS code for a logistic regression model.

• In this practical you get less help, so I suggest looking at the Seeds example in the WinBUGS examples. The Seeds example is more complicated than what you require, but it shows the necessary WinBUGS statements.
