Priors, Normal Models, Computing Posteriors
st5219: Bayesian hierarchical modelling
lecture 2.4
Monte Carlo: requires knowing the distribution---often you don't
Importance sampling: requires being able to sample from something vaguely like the posterior---often you can't
Markov chain Monte Carlo can almost always be used
An all purpose sampling tool
Discrete time stochastic process obeying the Markov property
Denote the distribution of θ_{t+1} given θ_t as π(θ_{t+1} | θ_t)
We consider θ_t on some part of ℝ^d
θ_t may have a stationary distribution, meaning that if θ_s comes from the stationary distribution then so does θ_t for s < t
Markov chains
The big idea: if you could set up a Markov chain that had a stationary distribution
equal to the posterior, you could sample just by simulating a Markov chain and
then using its trajectory as your sample from the posterior
Obviously if π(θ_{t+1} | θ_t) = f(θ_{t+1} | data) then you would just be sampling the posterior
When Monte Carlo is not possible, it’s really easy to set up a Markov chain with this property using the Metropolis-Hastings algorithm
Metropolis et al (1953) J Chem Phys 21:1087--92 Hastings (1970) Biometrika 57:97--109
Markov chain Monte Carlo
Metropolis-Hastings algorithm
Board work
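The details were done on the board; for reference, this is the standard form of one Metropolis-Hastings iteration, in the notation used here (θ_t the current state, θ* the proposal):

```latex
\begin{aligned}
&\text{1. Propose } \theta^* \sim q(\theta_t \to \cdot)\\[2pt]
&\text{2. Compute } \alpha = \min\!\left(1,\ \frac{f(\theta^* \mid \text{data})\, q(\theta^* \to \theta_t)}{f(\theta_t \mid \text{data})\, q(\theta_t \to \theta^*)}\right)\\[2pt]
&\text{3. With probability } \alpha \text{ set } \theta_{t+1} = \theta^*; \text{ otherwise set } \theta_{t+1} = \theta_t
\end{aligned}
```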
Suppose you can calculate f(θ | data) only up to a constant of proportionality, eg f(data | θ)f(θ)
No problem: the constant of proportionality cancels
1: constants
You can choose almost whatever you want for q(θ → θ*). A common choice is N(θ, Σ), with Σ an arbitrary (co)variance
Note that for 1-D you have q(θ → θ*) = N(θ*; θ, σ²) = N(θ; θ*, σ²) = q(θ* → θ), and so the qs cancel
This cancelling of qs is common to all proposal distributions symmetric around the current value
2: choice of q
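As a quick numerical sanity check (my own illustration, not from the slides; `norm_pdf` is a hand-rolled helper), the normal proposal density really is symmetric in its two arguments, which is why the qs cancel:

```python
import math

def norm_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

a, b, s = 0.3, 1.7, 0.5
q_ab = norm_pdf(b, a, s)   # q(a -> b): density of proposing b from a
q_ba = norm_pdf(a, b, s)   # q(b -> a): density of proposing a from b
print(abs(q_ab - q_ba) < 1e-15)  # the two are equal, so they cancel in alpha
```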
Board work
If you can propose from the posterior itself, so that q(θ → θ*) = f(θ* | data), then α = 1 and you always accept proposals
So Monte Carlo is a special case of MCMC
3: special case Monte Carlo
If using normal proposals, how to choose the standard deviation or bandwidth?
Aim for 20% to 40% of proposals accepted and you’re close to optimal
Too small: very slow movement
Too big: very slow movement (almost all proposals rejected)
Goldilocks: fast movement
4: choice of bandwidth
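The Goldilocks effect is easy to see numerically. Here is a small sketch (my own illustration, using a standard normal target rather than a real posterior) that measures the acceptance rate of a random-walk sampler at different bandwidths:

```python
import math
import random

def mh_acceptance_rate(bandwidth, ndraws=20000, seed=1):
    """Random-walk Metropolis on a standard normal target; returns the
    fraction of proposals accepted. Illustrative toy, not from the slides."""
    rng = random.Random(seed)
    x = 0.0
    accepted = 0
    for _ in range(ndraws):
        prop = rng.gauss(x, bandwidth)
        # log target is -x^2/2 up to a constant, and the constant cancels
        log_alpha = -0.5 * prop * prop + 0.5 * x * x
        if math.log(rng.random()) < log_alpha:
            x = prop
            accepted += 1
    return accepted / ndraws

for bw in (0.01, 1.0, 100.0):
    print(bw, round(mh_acceptance_rate(bw), 2))
```

Tiny bandwidths accept almost everything but crawl through the posterior; huge ones propose wild jumps that are almost always rejected; in between, movement is fastest.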
If you know the conditional posterior of a parameter (or block of parameters) given the other parameters, you can just propose from that conditional
This gives α = 1
This is called Gibbs sampling---nothing special, just a good MCMC algorithm
5: special case Gibbs sampler
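A minimal sketch of Gibbs sampling (my own toy example, not from the lecture): for a bivariate normal with unit variances and correlation ρ, the full conditionals are known exactly, so each coordinate is drawn from its conditional and every draw is accepted:

```python
import random

def gibbs_bivariate_normal(rho, ndraws=5000, seed=42):
    """Gibbs sampler for a bivariate normal with unit variances and
    correlation rho. The full conditionals are exact:
      x | y ~ N(rho*y, 1 - rho^2),  y | x ~ N(rho*x, 1 - rho^2),
    so alpha = 1 and every draw is accepted. Toy illustration."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x = y = 0.0
    draws = []
    for _ in range(ndraws):
        x = rng.gauss(rho * y, sd)   # update x from its full conditional
        y = rng.gauss(rho * x, sd)   # update y from its full conditional
        draws.append((x, y))
    return draws

draws = gibbs_bivariate_normal(rho=0.8)
n = len(draws)
mx = sum(d[0] for d in draws) / n
my = sum(d[1] for d in draws) / n
cov = sum((d[0] - mx) * (d[1] - my) for d in draws) / n
vx = sum((d[0] - mx) ** 2 for d in draws) / n
vy = sum((d[1] - my) ** 2 for d in draws) / n
corr = cov / (vx * vy) ** 0.5
print(round(corr, 1))  # empirical correlation, close to the target 0.8
```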
It is common to start with an arbitrary θ_0
To stop this biasing your estimates, usually discard samples from a "burn in" period
This lets the chain forget where you started it
If you start near the posterior, a short burn in is ok
If you start far from the posterior, a longer burn in is needed
6: burn in
Running multiple chains in parallel allows:
◦ you to check convergence to the same distribution even from initial values far from each other
◦ you to utilise X processors (eg on a server) to get a sample X times as big in the same amount of time
Make sure you start with different seeds though (eg do not set.seed(666) every time)
7: multiple chains
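The seed point can be made concrete with a toy sketch (my own, in Python rather than R; i.i.d. draws stand in for an MCMC run):

```python
import random

def run_chain(seed, ndraws=1000):
    """Toy 'chain': i.i.d. standard normal draws stand in for MCMC output.
    The point is only that each chain gets its own seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(ndraws)]

chains = [run_chain(seed) for seed in (101, 202, 303)]
# Chains started from different seeds give different trajectories...
print(chains[0][0] != chains[1][0])
# ...whereas reusing one seed just replicates the same chain X times
print(run_chain(666) == run_chain(666))
```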
If 2+ parameters are tightly correlated, then sampling one at a time will not work efficiently
Several options:
◦ reparametrise the model so that the posteriors are more orthogonal
◦ use a multivariate proposal distribution that accounts for the correlation
8: correlation of parameters
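The second option can be sketched as follows (a hypothetical helper of my own, not from the lecture): a 2-d proposal whose components are correlated, built from a hand-rolled Cholesky factor of the 2×2 covariance:

```python
import random

def correlated_proposal(theta, sds, rho, rng):
    """Propose a 2-d move whose two components are correlated with
    coefficient rho, via the Cholesky factor of the 2x2 covariance.
    Hypothetical helper for illustration only."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    step1 = sds[0] * z1
    step2 = sds[1] * (rho * z1 + (1.0 - rho * rho) ** 0.5 * z2)
    return (theta[0] + step1, theta[1] + step2)

rng = random.Random(0)
props = [correlated_proposal((0.0, 0.0), (0.1, 0.1), 0.9, rng)
         for _ in range(20000)]
n = len(props)
mx = sum(p[0] for p in props) / n
my = sum(p[1] for p in props) / n
cov = sum((p[0] - mx) * (p[1] - my) for p in props) / n
vx = sum((p[0] - mx) ** 2 for p in props) / n
vy = sum((p[1] - my) ** 2 for p in props) / n
corr = cov / (vx * vy) ** 0.5
print(round(corr, 2))  # empirical correlation of the proposed steps
```

Aligning the proposal with the posterior's correlation lets the chain move along the ridge instead of repeatedly stepping off it.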
The cowboy approach is to look at a trace plot of the chain only (Butcher’s test)
More formal methods exist (see tutorial 2)
9: assessing convergence
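One of the more formal methods (covered properly in tutorial 2) is the Gelman-Rubin statistic, which compares between-chain and within-chain variance. A bare-bones sketch of it, run on toy chains rather than real MCMC output:

```python
import random

def rhat(chains):
    """Basic Gelman-Rubin statistic: ratio of a pooled variance estimate to
    the within-chain variance. Values near 1 suggest convergence. Sketch only."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)      # between
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m                  # within
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5

rng = random.Random(7)
# Converged: all chains sample the same distribution
same = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(4)]
# Not converged: chains stuck around different modes
stuck = [[rng.gauss(mode, 1) for _ in range(2000)] for mode in (0, 5, 10, 15)]
print(rhat(same) < 1.1, rhat(stuck) > 1.5)
```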
As before:
logposterior = function(cu) {
  cu$logp = dbinom(98, 727, cu$p * cu$sigma, log=TRUE) +
    dbeta(cu$p, 1, 1, log=TRUE) +
    dbeta(cu$sigma, 630, 136, log=TRUE)
  cu
}
New:
rejecter = function(cu) {
  reject = FALSE
  if (cu$p < 0) reject = TRUE
  if (cu$p > 1) reject = TRUE
  if (cu$sigma < 0) reject = TRUE
  if (cu$sigma > 1) reject = TRUE
  reject
}
An example: H1N1 again
current = list(p=0.5, sigma=0.5)
current = logposterior(current)
NDRAWS = 10000
dump = list(p=rep(0, NDRAWS), sigma=rep(0, NDRAWS))
for (iteration in 1:NDRAWS) {
  old = current
  current$p = rnorm(1, current$p, 0.1)
  current$sigma = rnorm(1, current$sigma, 0.1)
  REJECT = rejecter(current)
  if (!REJECT) {
    current = logposterior(current)
An example: H1N1 again
  if (!REJECT) {
    current = logposterior(current)
    accept_prob = current$logp - old$logp
    lu = log(runif(1))
    if (lu > accept_prob) REJECT = TRUE
  }
  if (REJECT) current = old
  dump$p[iteration] = current$p
  dump$sigma[iteration] = current$sigma
}
An example: H1N1 again
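For readers following along in another language, here is a hedged Python transcription of the routine above (same model: 98 positives out of 727, Beta(1,1) prior on p, Beta(630,136) prior on σ; the function names and default bandwidth are mine, and normalising constants are dropped since they cancel):

```python
import math
import random

def log_posterior(p, sigma):
    """Unnormalised log posterior: Binomial(727, p*sigma) likelihood for 98
    positives, Beta(1,1) prior on p (a constant), Beta(630,136) prior on
    sigma. Only density kernels are kept; constants cancel in the MH ratio."""
    theta = p * sigma
    loglik = 98 * math.log(theta) + (727 - 98) * math.log(1 - theta)
    logprior = (630 - 1) * math.log(sigma) + (136 - 1) * math.log(1 - sigma)
    return loglik + logprior

def sample(ndraws=2000, bandwidth=0.01, seed=1):
    """Random-walk MH over (p, sigma), mirroring the R routine above."""
    rng = random.Random(seed)
    p, sigma = 0.5, 0.5
    logp = log_posterior(p, sigma)
    draws = []
    for _ in range(ndraws):
        p_new = rng.gauss(p, bandwidth)
        s_new = rng.gauss(sigma, bandwidth)
        # reject out-of-range proposals outright, as rejecter() does
        if 0 < p_new < 1 and 0 < s_new < 1:
            logp_new = log_posterior(p_new, s_new)
            if math.log(rng.random()) < logp_new - logp:
                p, sigma, logp = p_new, s_new, logp_new
        draws.append((p, sigma))
    return draws

draws = sample()
print(all(0 < p < 1 and 0 < s < 1 for p, s in draws))
```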
Using that routine
The choice of bandwidth is arbitrary
Asymptotically, it doesn't matter
But in practice, you need to choose it right...
Bandwidths
Using bandwidths = 1
Using bandwidths = 0.01
Using bandwidths = 0.001
Same dataset, but now the non-informative prior:
◦ p ~ U(0,1)
◦ σ ~ U(0,1)
Another example
Using bandwidths = 0.001
Using bandwidths = 0.01
Using bandwidths = 0.1
Using bandwidths = 1
Example 2
Why does this not work?
Tightly correlated posterior
Plus weird shape
Very hard to design a local movement rule to encourage swift mixing through the joint posterior distribution
Summary
Monte Carlo: use whenever you can, but rarely are able
Importance sampling: if you can find a distribution quite close to the posterior
MCMC:
• good general purpose tool
• sometimes an art to get working effectively
Next week: everything you already know how to do, differently
Versions of:
• t-tests
• regression
• etc
After that:
• hierarchical modelling
• BUGS