Priors, Normal Models, Computing Posteriors
st5219: Bayesian hierarchical modelling, lecture 2.3
Why posteriors need to be computed
If you can find a conjugate prior for your data-parameter model, it's very sensible to use that and to coerce your prior into that form.
E.g. if you were going to use a normal prior but then noticed the gamma is conjugate in your model, convert your normal prior into a gamma and use that.
If not, you need to do some computation.
In the past: this was a big problem that precluded Bayesian analysis outside of the ivory tower.
Now: personal computers make this no problem lah.
What's the challenge?
Frequentist: for ML estimation you need to find the maximum of a surface or curve. Approaches: calculus; deterministic search algorithms using (i) gradients or (ii) heights only; stochastic algorithms, e.g. simulated annealing or cross entropy.
Bayesian: for Bayesian estimation you need to find the whole posterior.
Optimisation is fairly simple; quadrature is quite difficult: the curse of dimensionality.
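To see why quadrature suffers from the curse while optimisation often does not, count the function evaluations a grid search needs. A minimal sketch in R (the 100-points-per-axis grid is an illustrative assumption, not from the slides):

# a grid with k points per axis in d dimensions needs k^d evaluations
grid_points = function(k, d) k^d
grid_points(100, 1)    # 1 parameter:   100 evaluations
grid_points(100, 2)    # 2 parameters:  10,000 evaluations
grid_points(100, 10)   # 10 parameters: 1e20 evaluations -- hopeless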
Toy problems vs real problems
Toy problems: illustrate concepts; simple data, few complications; often you can just use calculus; usually few parameters :o)
Real problems: solve real problems; complex data, always loads of complications; cannot do the differentiation; usually lots of parameters :o(
Lots of parameters prohibit grid searches.
Simulations
Assume you're interested in the lengths of elephant tusks. There will be some distribution of these. How do you try to estimate that distribution? You take a sample of elephants and measure them. Why not do the same with a posterior distribution?
Simulations
It is easy to sample statistical distributions (more easily than elephants), and you can build up large samples quickly. The most popular method is MCMC; others are Monte Carlo and importance sampling.
If your sample is of size 10 000 or 100 000, then for practical purposes it IS the distribution you're sampling from. So:
1. Sample the posterior
2. Calculate the stuff you're interested in, like the posterior mean (the mean of the sample), quantiles, etc (a minimal sketch follows below)
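A minimal sketch of the two-step recipe, using a target we can sample directly (the N(3, 2²) distribution here is an illustrative assumption, standing in for a posterior):

set.seed(1)
# step 1: sample the distribution (100 000 draws effectively ARE the distribution)
draws = rnorm(100000, mean = 3, sd = 2)
# step 2: calculate the stuff you're interested in from the sample
mean(draws)                        # close to the true mean, 3
quantile(draws, c(0.025, 0.975))   # close to the true 2.5% and 97.5% quantiles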
Monte Carlo
A district of Monaco, famous for its casino. The roulette wheel gave inspiration to US scientists working on the Manhattan project.
Read Metropolis (1987): http://library.lanl.gov/cgi-bin/getfile?00326866.pdf
When to use Monte Carlo
If your posterior distribution is known, and you can simulate from it, but it's awkward to do anything else.
Example: you have data (the between-sex difference in prey items per hour within songlark nests) that you assume are a random draw from a N(μ, σ²) distribution.
x̄ = 3.18, s = 4.42, n = 22
Prior: (μ, σ²) ~ NSIG(0, 1/100, 1/100, 1/100)
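The conjugate update used in the code below is the standard normal-scaled-inverse-χ² result; spelled out in LaTeX (matching the names mu_n, kappa_n, nu_n, sigma2_n in the code), it reads:

\mu_n = \frac{\kappa_0 \mu_0 + n \bar{x}}{\kappa_0 + n}, \qquad
\kappa_n = \kappa_0 + n, \qquad
\nu_n = \nu_0 + n,

\sigma^2_n = \frac{1}{\nu_n}\left[ \nu_0 \sigma^2_0 + (n-1)s^2 + \frac{\kappa_0 \, n \, (\bar{x} - \mu_0)^2}{\kappa_0 + n} \right],

after which \sigma^2 \mid \text{data} \sim \text{Scale-Inv-}\chi^2(\nu_n, \sigma^2_n) and \mu \mid \sigma^2, \text{data} \sim \mathrm{N}(\mu_n, \sigma^2/\kappa_n).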
Example Monte Carlo code

library(geoR)   # provides rinvchisq for scaled inverse chi-squared draws
set.seed(666)
# prior hyperparameters and number of simulation draws
mu_0 = 0; kappa_0 = 1; nu_0 = 1; sigma2_0 = 100
ndraws = 100000
# sample the prior
prior_sigma2 = rinvchisq(ndraws, nu_0, sigma2_0)
prior_mu = rnorm(ndraws, mu_0, sqrt(prior_sigma2 / kappa_0))  # rnorm takes the sd, not the variance
# data summaries
xbar = 3.18; s = 4.42; n = 22
# conjugate update of the hyperparameters
mu_n = (kappa_0 / (kappa_0 + n)) * mu_0 + (n / (kappa_0 + n)) * xbar
kappa_n = kappa_0 + n
nu_n = nu_0 + n
sigma2_n = (nu_0 * sigma2_0 + (n - 1) * s * s
            + kappa_0 * n * ((xbar - mu_0)^2) / (kappa_0 + n)) / nu_n
# sample the posterior
posterior_sigma2 = rinvchisq(ndraws, nu_n, sigma2_n)
posterior_mu = rnorm(ndraws, mu_n, sqrt(posterior_sigma2 / kappa_n))
# plot a random subset of the draws (q indexes the subset)
q = sample(ndraws, 10000)
plot(posterior_mu[q], sqrt(posterior_sigma2[q]),
     pch = 16, col = rgb(0, 0, 0, 0.1),
     xlab = expression(mu), ylab = expression(sigma))
Plot of posterior: [figure: scatter of the joint posterior draws of μ against σ]
Marginal posteriors

summariser = function(x) {
  print(paste(
    "Posterior mean is ", signif(mean(x), 3),
    ", 95%CI is (", signif(quantile(x, 0.025), 3),
    ",", signif(quantile(x, 0.975), 3), ")", sep = ""),
    quote = FALSE)
}
summariser(posterior_mu)
[1] Posterior mean is 3.04, 95%CI is (0.776,5.3)
summariser(posterior_sigma2)
[1] Posterior mean is 24.7, 95%CI is (13.6,44.5)
summariser(sqrt(posterior_sigma2))
[1] Posterior mean is 4.91, 95%CI is (3.69,6.67)
summariser(sqrt(posterior_sigma2)/posterior_mu)
[1] Posterior mean is 1.37, 95%CI is (0.913,5.79)
Wondrousness of Bayesianism
Once you have a sample from the posterior, the world is your oyster!
Want a confidence interval for the variance? For the standard deviation? For the coefficient of variation? Each is just a pair of quantiles of the transformed sample, as above.
How would you do this in the frequentist paradigm? See Koopmans et al (1964) Biometrika 51:25--32. It's not nice.
Problem with Monte Carlo
Usually an exact form for the posterior cannot be obtained. Two alternative simulation-based approaches present themselves:
Importance sampling
Markov chain Monte Carlo
Importance Sampling
The idea of importance sampling is that instead of sampling from the actual posterior, you simulate from some other distribution and "correct" the sample by weighting it.
If you want f(θ|data), you can simulate from q(θ) and evaluate its density, and you can evaluate the density f(θ|data) up to a constant of proportionality, then you can use importance sampling.
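Why the weighting corrects the sample (a step not spelled out on the slides): write the posterior as f(θ|data) = c·h(θ), where h is computable and c is the unknown normalising constant. Then for θᵢ ~ q,

E_f[g(\theta)] = \int g(\theta) \, \frac{f(\theta \mid \text{data})}{q(\theta)} \, q(\theta) \, d\theta
\approx \frac{\sum_i w_i \, g(\theta_i)}{\sum_i w_i}, \qquad w_i = \frac{h(\theta_i)}{q(\theta_i)},

and the unknown constant c cancels in the ratio, which is why the posterior density only needs to be known up to proportionality.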
Importance sampling recipe
1. Generate a sample of θs from q(θ)
2. For each sample i, associate a weight wᵢ ∝ f(θᵢ|data)/q(θᵢ)
3. Scale the ws such that Σᵢ wᵢ = 1
4. The posterior mean of any function g(θ) of interest is E[g(θ)] ≈ Σᵢ wᵢ g(θᵢ)
5. You can get things like confidence intervals by working with the CDF of the marginals (a weighted-summary sketch follows the H1N1 code below)
Example: H1N1
Naïve version: simulate (p, σ) from U(0,1)², i.e. q(p,σ) = 1 over [0,1]²
Weight by the posterior; scale the weights to sum to one
Code

set.seed(666)
logpost = function(p, sigma) {
  dbinom(98, 727, p * sigma, log = TRUE) +
    dbeta(p, 1, 1, log = TRUE) +
    dbeta(sigma, 630, 136, log = TRUE)
}
NDRAWS = 10000
sampled = list(p = runif(NDRAWS), sigma = runif(NDRAWS))
logposteriors = logpost(sampled$p, sampled$sigma)
logposteriors = logposteriors - max(logposteriors)  # stabilise before exponentiating
weights = exp(logposteriors)   # q = 1 on [0,1]^2, so no proposal density to divide by
weights = weights / sum(weights)
plot(sampled$p, sampled$sigma, cex = 20 * weights,
     pch = 16, xlab = 'p', ylab = expression(sigma),
     xlim = 0:1, ylim = 0:1)
Naïve sample: [figure: sampled points sized by importance weight]
Better code

set.seed(666)
logpost = function(p, sigma) {
  dbinom(98, 727, p * sigma, log = TRUE) +
    dbeta(p, 1, 1, log = TRUE) +
    dbeta(sigma, 630, 136, log = TRUE)
}
NDRAWS = 10000
# draw sigma from its (informative) prior instead of uniformly
sampled = list(p = runif(NDRAWS), sigma = rbeta(NDRAWS, 630, 136))
logposteriors = logpost(sampled$p, sampled$sigma)
logposteriors = logposteriors - max(logposteriors)
# divide by the proposal density q (subtract on the log scale)
weights = exp(logposteriors - dbeta(sampled$sigma, 630, 136, log = TRUE))
weights = weights / sum(weights)
Better sample: [figure: sampled points sized by importance weight]
Even better code

set.seed(666)
logpost = function(p, sigma) {
  dbinom(98, 727, p * sigma, log = TRUE) +
    dbeta(p, 1, 1, log = TRUE) +
    dbeta(sigma, 630, 136, log = TRUE)
}
NDRAWS = 10000
# draw both parameters from distributions close to the posterior
sampled = list(p = rbeta(NDRAWS, 20, 100), sigma = rbeta(NDRAWS, 630, 136))
logposteriors = logpost(sampled$p, sampled$sigma)
logposteriors = logposteriors - max(logposteriors)
weights = exp(logposteriors - dbeta(sampled$sigma, 630, 136, log = TRUE)
              - dbeta(sampled$p, 20, 100, log = TRUE))
weights = weights / sum(weights)
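Step 5 of the recipe (intervals from a weighted sample) is not implemented on the slides; here is a minimal weighted analogue of the earlier summariser, assuming x is a vector of draws and w the normalised weights:

weighted_summariser = function(x, w) {
  post_mean = sum(w * x)    # E[g(theta)] = sum_i w_i g(theta_i)
  o = order(x)              # sort the draws ...
  cdf = cumsum(w[o])        # ... to build the weighted CDF of the marginal
  ci = x[o][c(which(cdf >= 0.025)[1], which(cdf >= 0.975)[1])]
  print(paste("Posterior mean is ", signif(post_mean, 3),
              ", 95%CI is (", signif(ci[1], 3), ",", signif(ci[2], 3), ")",
              sep = ""), quote = FALSE)
}
weighted_summariser(sampled$p, weights)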
Even better sample: [figure: sampled points sized by importance weight]
Problems with importance sampling
1. Choice of sampling distribution q: an obvious choice, the prior, leads to a simple weight function (the likelihood), but if the prior is overdispersed relative to the posterior, lots of samples are wasted
2. An alternative is to do it iteratively: try the prior first, then tune q to be nearer to the area that looks like the posterior, and iterate until a good distribution is found
3. Ideally, you would sample directly from the posterior itself: then the weights are all 1/m
4. Dealing with a weighted sample is a bit of a nuisance
An alternative
Finding the posterior is simpler if the computer will automatically move in the right direction, even given a rubbish initial guess at q.
Markov chain Monte Carlo is an ideal way to achieve this.
Next section!!!