Priors, Normal Models, Computing Posteriors
st5219: Bayesian hierarchical modelling, lecture 2.3
Why posteriors need to be computed
If you can find a conjugate prior for your data-parameter model, it's very sensible to use that and to coerce your prior into that form.
E.g. if you were going to use a normal prior but then noticed the gamma is conjugate in your model, convert your normal prior into a gamma and use that.
If not, you need to do some computation.
In the past: this was a big problem that precluded Bayesian analysis outside of the ivory tower.
Now: personal computers make this no problem lah.
What's the challenge?
Frequentist: for ML estimation you need to find the maximum of a surface or curve. Approaches: calculus; deterministic search algorithms using (i) gradients or (ii) heights only; stochastic algorithms, e.g. simulated annealing or cross entropy.
Bayesian: for Bayesian estimation you need to find the whole posterior.
Optimisation is fairly simple; quadrature is quite difficult: the curse of dimensionality.
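To see why quadrature suffers from the curse while optimisation often does not, count the function evaluations a grid search needs. A minimal sketch in R (the 100-points-per-axis grid is an illustrative assumption, not from the slides):

# a grid with k points per axis in d dimensions needs k^d evaluations
grid_points = function(k, d) k^d
grid_points(100, 1)    # 1 parameter:   100 evaluations
grid_points(100, 2)    # 2 parameters:  10,000 evaluations
grid_points(100, 10)   # 10 parameters: 1e20 evaluations -- hopeless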
Toy problems vs real problems
Toy problems: illustrate concepts; simple data, few complications; often you can just use calculus; usually few parameters :o)
Real problems: solve real problems; complex data, always loads of complications; cannot do the differentiation; usually lots of parameters :o(
Lots of parameters prohibit grid searches.
Simulations
Assume you're interested in the lengths of elephant tusks. There will be some distribution of these. How do you try to estimate that distribution? You take a sample of elephants and measure them. Why not do the same with a posterior distribution?
Simulations
It is easy to sample statistical distributions (more easily than elephants), and you can build up large samples quickly. The most popular method is MCMC; others are Monte Carlo and importance sampling.
If your sample is of size 10 000 or 100 000, then for practical purposes it IS the distribution you're sampling from. So:
1. Sample the posterior
2. Calculate the stuff you're interested in, like the posterior mean (the mean of the sample), quantiles, etc (a minimal sketch follows below)
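A minimal sketch of the two-step recipe, using a target we can sample directly (the N(3, 2²) distribution here is an illustrative assumption, standing in for a posterior):

set.seed(1)
# step 1: sample the distribution (100 000 draws effectively ARE the distribution)
draws = rnorm(100000, mean = 3, sd = 2)
# step 2: calculate the stuff you're interested in from the sample
mean(draws)                        # close to the true mean, 3
quantile(draws, c(0.025, 0.975))   # close to the true 2.5% and 97.5% quantiles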
Monte Carlo
A district of Monaco, famous for its casino. The roulette wheel gave inspiration to US scientists working on the Manhattan project.
Read Metropolis (1987): http://library.lanl.gov/cgi-bin/getfile?00326866.pdf
When to use Monte Carlo
If your posterior distribution is known, and you can simulate from it, but it's awkward to do anything else.
Example: you have data (the between-sex difference in prey items per hour within songlark nests) that you assume are a random draw from a N(μ, σ²) distribution.
x̄ = 3.18, s = 4.42, n = 22
Prior: (μ, σ²) ~ NSIG(0, 1/100, 1/100, 1/100)
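The conjugate update used in the code below is the standard normal-scaled-inverse-χ² result; spelled out in LaTeX (matching the names mu_n, kappa_n, nu_n, sigma2_n in the code), it reads:

\mu_n = \frac{\kappa_0 \mu_0 + n \bar{x}}{\kappa_0 + n}, \qquad
\kappa_n = \kappa_0 + n, \qquad
\nu_n = \nu_0 + n,

\sigma^2_n = \frac{1}{\nu_n}\left[ \nu_0 \sigma^2_0 + (n-1)s^2 + \frac{\kappa_0 \, n \, (\bar{x} - \mu_0)^2}{\kappa_0 + n} \right],

after which \sigma^2 \mid \text{data} \sim \text{Scale-Inv-}\chi^2(\nu_n, \sigma^2_n) and \mu \mid \sigma^2, \text{data} \sim \mathrm{N}(\mu_n, \sigma^2/\kappa_n).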
Example Monte Carlo code

library(geoR)   # provides rinvchisq for scaled inverse chi-squared draws
set.seed(666)
# prior hyperparameters and number of simulation draws
mu_0 = 0; kappa_0 = 1; nu_0 = 1; sigma2_0 = 100
ndraws = 100000
# sample the prior
prior_sigma2 = rinvchisq(ndraws, nu_0, sigma2_0)
prior_mu = rnorm(ndraws, mu_0, sqrt(prior_sigma2 / kappa_0))  # rnorm takes the sd, not the variance
# data summaries
xbar = 3.18; s = 4.42; n = 22
# conjugate update of the hyperparameters
mu_n = (kappa_0 / (kappa_0 + n)) * mu_0 + (n / (kappa_0 + n)) * xbar
kappa_n = kappa_0 + n
nu_n = nu_0 + n
sigma2_n = (nu_0 * sigma2_0 + (n - 1) * s * s
            + kappa_0 * n * ((xbar - mu_0)^2) / (kappa_0 + n)) / nu_n
# sample the posterior
posterior_sigma2 = rinvchisq(ndraws, nu_n, sigma2_n)
posterior_mu = rnorm(ndraws, mu_n, sqrt(posterior_sigma2 / kappa_n))
# plot a random subset of the draws (q indexes the subset)
q = sample(ndraws, 10000)
plot(posterior_mu[q], sqrt(posterior_sigma2[q]),
     pch = 16, col = rgb(0, 0, 0, 0.1),
     xlab = expression(mu), ylab = expression(sigma))
Plot of posterior: [figure: scatter of the joint posterior draws of μ against σ]
Marginal posteriors

summariser = function(x) {
  print(paste(
    "Posterior mean is ", signif(mean(x), 3),
    ", 95%CI is (", signif(quantile(x, 0.025), 3),
    ",", signif(quantile(x, 0.975), 3), ")", sep = ""),
    quote = FALSE)
}
summariser(posterior_mu)
[1] Posterior mean is 3.04, 95%CI is (0.776,5.3)
summariser(posterior_sigma2)
[1] Posterior mean is 24.7, 95%CI is (13.6,44.5)
summariser(sqrt(posterior_sigma2))
[1] Posterior mean is 4.91, 95%CI is (3.69,6.67)
summariser(sqrt(posterior_sigma2)/posterior_mu)
[1] Posterior mean is 1.37, 95%CI is (0.913,5.79)
Wondrousness of Bayesianism
Once you have a sample from the posterior, the world is your oyster!
Want a confidence interval for the variance? For the standard deviation? For the coefficient of variation? Each is just a pair of quantiles of the transformed sample, as above.
How would you do this in the frequentist paradigm? See Koopmans et al (1964) Biometrika 51:25--32. It's not nice.
Problem with Monte Carlo
Usually an exact form for the posterior cannot be obtained. Two alternative simulation-based approaches present themselves:
Importance sampling
Markov chain Monte Carlo
Importance Sampling
The idea of importance sampling is that instead of sampling from the actual posterior, you simulate from some other distribution and "correct" the sample by weighting it.
If you want f(θ|data), you can simulate from q(θ) and evaluate its density, and you can evaluate the density f(θ|data) up to a constant of proportionality, then you can use importance sampling.
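Why the weighting corrects the sample (a step not spelled out on the slides): write the posterior as f(θ|data) = c·h(θ), where h is computable and c is the unknown normalising constant. Then for θᵢ ~ q,

E_f[g(\theta)] = \int g(\theta) \, \frac{f(\theta \mid \text{data})}{q(\theta)} \, q(\theta) \, d\theta
\approx \frac{\sum_i w_i \, g(\theta_i)}{\sum_i w_i}, \qquad w_i = \frac{h(\theta_i)}{q(\theta_i)},

and the unknown constant c cancels in the ratio, which is why the posterior density only needs to be known up to proportionality.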
Importance sampling recipe
1. Generate a sample of θs from q(θ)
2. For each sample i, associate a weight wᵢ ∝ f(θᵢ|data)/q(θᵢ)
3. Scale the ws such that Σᵢ wᵢ = 1
4. The posterior mean of any function g(θ) of interest is E[g(θ)] ≈ Σᵢ wᵢ g(θᵢ)
5. You can get things like confidence intervals by working with the CDF of the marginals (a weighted-summary sketch follows the H1N1 code below)
Example: H1N1
Naïve version: simulate (p, σ) from U(0,1)², i.e. q(p,σ) = 1 over [0,1]²
Weight by the posterior; scale the weights to sum to one
Code

set.seed(666)
logpost = function(p, sigma) {
  dbinom(98, 727, p * sigma, log = TRUE) +
    dbeta(p, 1, 1, log = TRUE) +
    dbeta(sigma, 630, 136, log = TRUE)
}
NDRAWS = 10000
sampled = list(p = runif(NDRAWS), sigma = runif(NDRAWS))
logposteriors = logpost(sampled$p, sampled$sigma)
logposteriors = logposteriors - max(logposteriors)  # stabilise before exponentiating
weights = exp(logposteriors)   # q = 1 on [0,1]^2, so no proposal density to divide by
weights = weights / sum(weights)
plot(sampled$p, sampled$sigma, cex = 20 * weights,
     pch = 16, xlab = 'p', ylab = expression(sigma),
     xlim = 0:1, ylim = 0:1)
Naïve sample: [figure: sampled points sized by importance weight]
Better code

set.seed(666)
logpost = function(p, sigma) {
  dbinom(98, 727, p * sigma, log = TRUE) +
    dbeta(p, 1, 1, log = TRUE) +
    dbeta(sigma, 630, 136, log = TRUE)
}
NDRAWS = 10000
# draw sigma from its (informative) prior instead of uniformly
sampled = list(p = runif(NDRAWS), sigma = rbeta(NDRAWS, 630, 136))
logposteriors = logpost(sampled$p, sampled$sigma)
logposteriors = logposteriors - max(logposteriors)
# divide by the proposal density q (subtract on the log scale)
weights = exp(logposteriors - dbeta(sampled$sigma, 630, 136, log = TRUE))
weights = weights / sum(weights)
Better sample: [figure: sampled points sized by importance weight]
Even better code

set.seed(666)
logpost = function(p, sigma) {
  dbinom(98, 727, p * sigma, log = TRUE) +
    dbeta(p, 1, 1, log = TRUE) +
    dbeta(sigma, 630, 136, log = TRUE)
}
NDRAWS = 10000
# draw both parameters from distributions close to the posterior
sampled = list(p = rbeta(NDRAWS, 20, 100), sigma = rbeta(NDRAWS, 630, 136))
logposteriors = logpost(sampled$p, sampled$sigma)
logposteriors = logposteriors - max(logposteriors)
weights = exp(logposteriors - dbeta(sampled$sigma, 630, 136, log = TRUE)
              - dbeta(sampled$p, 20, 100, log = TRUE))
weights = weights / sum(weights)
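Step 5 of the recipe (intervals from a weighted sample) is not implemented on the slides; here is a minimal weighted analogue of the earlier summariser, assuming x is a vector of draws and w the normalised weights:

weighted_summariser = function(x, w) {
  post_mean = sum(w * x)    # E[g(theta)] = sum_i w_i g(theta_i)
  o = order(x)              # sort the draws ...
  cdf = cumsum(w[o])        # ... to build the weighted CDF of the marginal
  ci = x[o][c(which(cdf >= 0.025)[1], which(cdf >= 0.975)[1])]
  print(paste("Posterior mean is ", signif(post_mean, 3),
              ", 95%CI is (", signif(ci[1], 3), ",", signif(ci[2], 3), ")",
              sep = ""), quote = FALSE)
}
weighted_summariser(sampled$p, weights)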
Even better sample: [figure: sampled points sized by importance weight]
Problems with importance sampling
1. Choice of sampling distribution q: an obvious choice, the prior, leads to a simple weight function (the likelihood), but if the prior is overdispersed relative to the posterior, lots of samples are wasted
2. An alternative is to do it iteratively: try the prior first, then tune q to be nearer to the area that looks like the posterior, and iterate until a good distribution is found
3. Ideally, you would sample directly from the posterior itself: then the weights are all 1/m
4. Dealing with a weighted sample is a bit of a nuisance
An alternative
Finding the posterior is simpler if the computer will automatically move in the right direction, even given a rubbish initial guess at q.
Markov chain Monte Carlo is an ideal way to achieve this.
Next section!!!