Priors, Normal Models, Computing Posteriors
st5219: Bayesian hierarchical modelling, lecture 2.4

TRANSCRIPT

Page 1: Priors, Normal Models, Computing Posteriors

Priors, Normal Models, Computing Posteriors
st5219: Bayesian hierarchical modelling
lecture 2.4

Page 2: Priors, Normal Models, Computing Posteriors

Monte Carlo: requires knowing the distribution; often you don't

Importance sampling: requires being able to sample from something vaguely like the posterior; often you can't

Markov chain Monte Carlo can almost always be used

An all-purpose sampling tool

Page 3: Priors, Normal Models, Computing Posteriors

A discrete-time stochastic process θ_1, θ_2, ... obeying the Markov property

Denote the distribution of θ_{t+1} given θ_t as π(θ_{t+1} | θ_t)

We consider chains taking values in some part of ℝ^d

The chain may have a stationary distribution, meaning that if θ_s comes from the stationary distribution then so does θ_t for s < t

Markov chains

The big idea: if you could set up a Markov chain that had a stationary distribution equal to the posterior, you could sample just by simulating the Markov chain and then using its trajectory as your sample from the posterior
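
A minimal R sketch of this idea (illustrative, not the lecture's example): the AR(1) chain below has stationary distribution N(0, 1/(1-0.5^2)), so its trajectory can be used as a sample from that distribution.

# AR(1) chain: x[t+1] = 0.5*x[t] + N(0,1); stationary distribution N(0, 4/3)
x=rep(0,10000)
for(t in 1:9999) x[t+1]=0.5*x[t]+rnorm(1)
c(sd(x), sqrt(1/(1-0.5^2)))   # empirical vs theoretical sd: roughly equal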

Page 4: Priors, Normal Models, Computing Posteriors

Obviously, if π(θ_{t+1} | θ_t) = f(θ_{t+1} | data) then you would just be sampling the posterior

When Monte Carlo is not possible, it’s really easy to set up a Markov chain with this property using the Metropolis-Hastings algorithm

Metropolis et al. (1953) J Chem Phys 21:1087–92
Hastings (1970) Biometrika 57:97–109

Markov chain Monte Carlo

Page 5: Priors, Normal Models, Computing Posteriors

Metropolis-Hastings algorithm

Board work
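
The algorithm itself was board work; as a minimal R sketch of the standard Metropolis-Hastings recipe, assuming an illustrative log target logf (an unnormalised standard normal here) and a symmetric normal proposal:

logf=function(theta) -theta^2/2        # log target, known only up to a constant
mh=function(n,theta0,bw){
  out=rep(0,n); theta=theta0
  for(i in 1:n){
    prop=rnorm(1,theta,bw)             # symmetric proposal around current value
    # accept with probability min(1, f(prop)/f(theta)); qs cancel by symmetry
    if(log(runif(1)) < logf(prop)-logf(theta)) theta=prop
    out[i]=theta                       # on rejection, the old value is repeated
  }
  out
}
draws=mh(10000,0,1)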

Page 6: Priors, Normal Models, Computing Posteriors

Suppose you can calculate f(θ | data) only up to a constant of proportionality

Eg f(data | θ) f(θ)

No problem: the constant of proportionality cancels

1: constants
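
In symbols (the standard Metropolis-Hastings acceptance probability, notation as above):

\alpha = \min\left\{1,\ \frac{f(\theta^*\mid \mathrm{data})\,q(\theta^*\to\theta)}{f(\theta\mid \mathrm{data})\,q(\theta\to\theta^*)}\right\}
       = \min\left\{1,\ \frac{f(\mathrm{data}\mid\theta^*)\,f(\theta^*)\,q(\theta^*\to\theta)}{f(\mathrm{data}\mid\theta)\,f(\theta)\,q(\theta\to\theta^*)}\right\}

since the normalising constant f(data) appears in both numerator and denominator and so cancels.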

Page 7: Priors, Normal Models, Computing Posteriors

You can choose almost whatever you want for q(θ → θ*). A common choice is N(θ, Σ), with Σ an arbitrary (co)variance

Note that for 1-D you have q(θ → θ*) = q(θ* → θ), and so the qs cancel

This cancelling of qs is common to all distributions symmetric around the current value

2: choice of q

Board work
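
A quick R check of the symmetry, with illustrative values: the normal density of the proposed value around the current one equals the density of the current value around the proposed one.

# q(theta -> theta*) = dnorm(theta*, theta, bw); swap the two values:
dnorm(0.7, 0.4, 0.1)   # density of proposing 0.7 from 0.4
dnorm(0.4, 0.7, 0.1)   # density of proposing 0.4 from 0.7: identical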

Page 8: Priors, Normal Models, Computing Posteriors

If you can propose from the posterior itself, so that q(θ → θ*) = f(θ* | data), then α = 1 and you always accept proposals

So Monte Carlo is a special case of MCMC

3: special case Monte Carlo
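
Spelling that out: with q(θ → θ*) = f(θ* | data), and hence q(θ* → θ) = f(θ | data),

\alpha = \min\left\{1,\ \frac{f(\theta^*\mid\mathrm{data})\,f(\theta\mid\mathrm{data})}{f(\theta\mid\mathrm{data})\,f(\theta^*\mid\mathrm{data})}\right\} = 1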

Page 9: Priors, Normal Models, Computing Posteriors

If using normal proposals, how to choose the standard deviation or bandwidth?

Aim for 20% to 40% of proposals accepted and you’re close to optimal

Too small: very slow movement
Too big: very slow movement
Goldilocks: fast movement

4: choice of bandwidth
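
To monitor this in R (reusing draws from the mh sketch above): with a continuous proposal, repeated values essentially only arise from rejections, so the move rate estimates the acceptance rate.

# proportion of iterations where the chain actually moved
mean(diff(draws) != 0)   # aim for roughly 0.2 to 0.4, then adjust bw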

Page 10: Priors, Normal Models, Computing Posteriors

If you know the conditional posterior of a parameter or block of parameters given the other parameters, you can just propose from that conditional

This gives α = 1

This is called Gibbs sampling: nothing special, just a good MCMC algorithm

5: special case Gibbs sampler
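
A minimal Gibbs sketch (illustrative, not the lecture's example): for a standard bivariate normal with correlation rho, both full conditionals are known normals, so every draw is accepted.

rho=0.8; n=10000
x=y=rep(0,n)
for(i in 2:n){
  x[i]=rnorm(1, rho*y[i-1], sqrt(1-rho^2))  # x | y ~ N(rho*y, 1-rho^2)
  y[i]=rnorm(1, rho*x[i],   sqrt(1-rho^2))  # y | x ~ N(rho*x, 1-rho^2)
}
cor(x,y)   # close to 0.8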

Page 11: Priors, Normal Models, Computing Posteriors

It is common to start with an arbitrary θ_0

To stop this biasing your estimates, usually discard samples from a "burn-in" period

This lets the chain forget where you started it

If you start near the posterior, a short burn-in is OK; if you start far from the posterior, a longer burn-in is needed

6: burn in
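
In R this is just dropping the first block of iterations (the burn-in length of 1000 is illustrative):

burnin=1000
kept=draws[-(1:burnin)]   # discard the burn-in before summarising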

Page 13: Priors, Normal Models, Computing Posteriors

Running multiple chains in parallel allows:
◦ you to check convergence to the same distribution even from initial values far from each other
◦ you to utilise X processors (eg on a server) to get a sample X times as big in the same amount of time

Make sure you start with different seeds though (eg not set.seed(666) all the time)

7: multiple chains
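
A sketch of the bookkeeping, reusing the mh function above with illustrative seeds and dispersed starting values:

chains=list()
for(k in 1:4){
  set.seed(k)                                        # a different seed per chain
  chains[[k]]=mh(10000, theta0=rnorm(1,0,10), bw=1)  # dispersed starting values
}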

Page 14: Priors, Normal Models, Computing Posteriors

If 2+ parameters are tightly correlated, then sampling one at a time will not work efficiently

Several options:
◦ reparametrise the model so that the posteriors are more orthogonal
◦ use a multivariate proposal distribution that accounts for the correlation

8: correlation of parameters
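
One way to build the second option in R (Sigma is an illustrative guess at the posterior's shape): draw correlated increments via the Cholesky factor.

Sigma=matrix(c(0.01,0.008,0.008,0.01),2,2)   # assumed proposal covariance
U=chol(Sigma)                                # upper triangular, t(U)%*%U == Sigma
propose=function(theta) theta + as.vector(rnorm(2) %*% U)   # correlated step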

Page 15: Priors, Normal Models, Computing Posteriors

The cowboy approach is to look at a trace plot of the chain only (Butcher’s test)

More formal methods exist (see tutorial 2)

9: assessing convergence
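
The trace plot itself is one line of R (for draws from the mh sketch above); more formal alternatives include, eg, the Gelman-Rubin diagnostic in the coda package.

plot(draws, type="l", xlab="iteration", ylab="theta")   # look for drift or sticking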

Page 16: Priors, Normal Models, Computing Posteriors

As before:

logposterior=function(cu){
  cu$logp=dbinom(98,727,cu$p*cu$sigma,log=TRUE)+   # binomial likelihood
    dbeta(cu$p,1,1,log=TRUE)+                      # p ~ U(0,1) prior
    dbeta(cu$sigma,630,136,log=TRUE)               # informative prior on sigma
  cu
}

New:

rejecter=function(cu){
  reject=FALSE
  if(cu$p<0)reject=TRUE
  if(cu$p>1)reject=TRUE
  if(cu$sigma<0)reject=TRUE
  if(cu$sigma>1)reject=TRUE
  reject                      # TRUE if the proposal left the prior's support
}

An example: H1N1 again

Page 17: Priors, Normal Models, Computing Posteriors

current=list(p=0.5,sigma=0.5)
current=logposterior(current)
NDRAWS=10000
dump=list(p=rep(0,NDRAWS),sigma=rep(0,NDRAWS))
for(iteration in 1:NDRAWS)
{
  old=current
  current$p=rnorm(1,current$p,0.1)          # symmetric normal proposals
  current$sigma=rnorm(1,current$sigma,0.1)  # with bandwidth 0.1
  REJECT=rejecter(current)
  # (the loop body continues on the next slide)

An example: H1N1 again

Page 18: Priors, Normal Models, Computing Posteriors

  if(!REJECT)
  {
    current=logposterior(current)
    accept_prob=current$logp-old$logp   # log acceptance ratio; qs cancel
    lu=log(runif(1))
    if(lu>accept_prob)REJECT=TRUE
  }
  if(REJECT)current=old                 # on rejection, keep the old state
  dump$p[iteration]=current$p
  dump$sigma[iteration]=current$sigma
}

An example: H1N1 again
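
Once the loop finishes, dump holds the draws; illustrative examples of summaries one might compute:

mean(dump$p)                        # posterior mean of p
quantile(dump$p, c(0.025,0.975))    # 95% credible interval
plot(dump$p, dump$sigma, pch=".")   # joint posterior scatter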

Pages 19–21: Using that routine (output plots)

Page 22: Priors, Normal Models, Computing Posteriors

The choice of bandwidth is arbitrary. Asymptotically, it doesn't matter. But in practice, you need to choose it right...

Bandwidths

Pages 23–25: Using bandwidths = 1 (output plots)

Pages 26–28: Using bandwidths = 0.01 (output plots)

Pages 29–31: Using bandwidths = 0.001 (output plots)

Page 32: Priors, Normal Models, Computing Posteriors

Same dataset, but now the non-informative priors:
◦ p ~ U(0,1)
◦ σ ~ U(0,1)

Another example
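
Under these priors only the sigma line of logposterior changes; a minimal sketch:

logposterior=function(cu){
  cu$logp=dbinom(98,727,cu$p*cu$sigma,log=TRUE)+
    dbeta(cu$p,1,1,log=TRUE)+          # p ~ U(0,1)
    dbeta(cu$sigma,1,1,log=TRUE)       # sigma ~ U(0,1), replacing Beta(630,136)
  cu
}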

Page 33: Using bandwidths = 0.001 (output plot)
Page 34: Using bandwidths = 0.01 (output plot)
Page 35: Using bandwidths = 0.1 (output plot)
Page 36: Using bandwidths = 1 (output plot)
Page 37: Using bandwidths = 0.1 (output plot)

Page 38: Priors, Normal Models, Computing Posteriors

Example 2: why does this not work?
◦ Tightly correlated posterior
◦ Plus a weird shape
◦ Very hard to design a local movement rule to encourage swift mixing through the joint posterior distribution

Page 39: Priors, Normal Models, Computing Posteriors

Summary

Monte Carlo: use whenever you can, but you rarely are able to

Importance sampling: use if you can find a distribution quite close to the posterior

MCMC:
• good general-purpose tool
• sometimes an art to get working effectively

Page 40: Priors, Normal Models, Computing Posteriors

Next week: everything you already know how to do, differently

Versions of:
• t-tests
• regression
• etc

After that:
• hierarchical modelling
• BUGS