Priors, Normal Models, Computing Posteriors

st5219: Bayesian hierarchical modelling, lecture 2.3


Page 1: Priors, Normal Models, Computing Posteriors

Priors, Normal Models, Computing Posteriors

st5219: Bayesian hierarchical modelling, lecture 2.3

Page 2: Priors, Normal Models, Computing Posteriors

Why posteriors need to be computed

If you can find a conjugate prior for your data-parameter model, it is very sensible to use that and to coerce your prior into that form. For example, if you were going to use a normal prior but then noticed that the gamma is conjugate in your model, convert your normal prior into a gamma and use that (see the sketch below).

If not, you need to do some computation. In the past, this was a big problem that precluded Bayesian analysis outside of the ivory tower. Now, personal computers make this no problem lah.
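A minimal sketch of what conjugacy buys you, using an assumed beta-binomial example (not from the slides): a Beta(a, b) prior with binomial data gives a Beta posterior in closed form, so the posterior needs no computation beyond arithmetic.

# Assumed beta-binomial illustration: x successes in n trials with a
# Beta(a,b) prior give a Beta(a+x, b+n-x) posterior exactly.
a=2; b=2                                    # prior parameters
x=7; n=20                                   # data
a_post=a+x; b_post=b+n-x                    # conjugate update
qbeta(c(0.025,0.5,0.975),a_post,b_post)     # exact posterior quantiles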

Page 3: Priors, Normal Models, Computing Posteriors

What's the challenge?

Frequentist: for ML estimation you need to find the maximum of a curve or surface. Approaches: calculus; deterministic search algorithms using (i) gradients or (ii) heights only; stochastic algorithms, e.g. simulated annealing or cross entropy. Optimisation is fairly simple.

Bayesian: for Bayesian estimation you need to find the whole posterior. Quadrature is quite difficult.
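To make the contrast concrete, here is a minimal sketch under an assumed toy binomial model (7 successes in 20 trials; not from the slides): the ML estimate is one call to an optimiser, while the Bayesian answer needs the whole normalised curve.

# ML: optimisation is one line.
loglik=function(p) dbinom(7,20,p,log=TRUE)
optimise(loglik,c(0,1),maximum=TRUE)$maximum   # ~ 7/20

# Bayes: even in 1-D, quadrature means evaluating and normalising
# the entire curve (flat prior assumed).
p_grid=seq(0.001,0.999,length=999)             # grid spacing 0.001
unnorm=exp(loglik(p_grid))                     # unnormalised posterior
posterior=unnorm/sum(unnorm*0.001)             # numerically normalised density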

Page 4: Priors, Normal Models, Computing Posteriors

The curse of dimensionality

Toy problems: illustrate concepts; simple data, few complications; often just use calculus; usually few parameters :o)

Real problems: solving real problems means complex data, always loads of complications; you cannot do the differentiation; usually lots of parameters :o(

Lots of parameters prohibit grid searches (see the arithmetic below).
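The grid-search point is simple arithmetic (an assumed illustration, not from the slides): at 100 grid points per axis, a d-parameter model needs 100^d evaluations.

# Assumed illustration of grid-search cost.
for (d in c(1,2,5,10)) cat(d,"parameters:",100^d,"evaluations\n")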

Page 5: Priors, Normal Models, Computing Posteriors

Simulations

Assume you're interested in the lengths of elephant tusks. There will be some distribution of these. How do you try to estimate that distribution?

Page 6: Priors, Normal Models, Computing Posteriors

Why not?

Page 7: Priors, Normal Models, Computing Posteriors

Simulations

It is easy to sample statistical distributions (more easily than elephants), and you can build up large samples quickly. The most popular method is MCMC; others are Monte Carlo and importance sampling. If your sample is of size 10 000 or 100 000, then it IS the distribution you're sampling from.

1. Sample the posterior.
2. Calculate the stuff you're interested in, like the posterior mean (the mean of the sample), quantiles, etc. (A minimal sketch follows.)
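Here are those two steps for a distribution we can already sample directly (an assumed gamma example, not from the slides):

# Assumed example: treat a large sample as the distribution itself.
draws=rgamma(100000,shape=2,rate=0.5)   # step 1: sample
mean(draws)                             # step 2: ~ E[X] = 4
quantile(draws,c(0.025,0.975))          # step 2: ~ central 95% interval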

Page 8: Priors, Normal Models, Computing Posteriors

Monte Carlo

Capital of Monaco. Casino. The roulette wheel gave inspiration to US scientists working on the Manhattan project. Read Metropolis (1987): http://library.lanl.gov/cgi-bin/getfile?00326866.pdf

Page 9: Priors, Normal Models, Computing Posteriors

When to use Monte Carlo

Use it if your posterior distribution is known, and you can simulate from it, but it's awkward to do anything else.

Example: you have data (between-sex difference in prey items per hour within songlark nests) that you assume are a random draw from a N(μ, σ²) distribution.

x̄ = 3.18, s = 4.42, n = 22. Prior: (μ, σ²) ~ NSIG(0, 1/100, 1/100, 1/100).

Page 10: Priors, Normal Models, Computing Posteriors

Example Monte Carlo code

library(geoR); set.seed(666)
mu_0=0; kappa_0=1; nu_0=1; sigma2_0=100
ndraws=100000
prior_sigma2=rinvchisq(ndraws,nu_0,sigma2_0)
prior_mu=rnorm(ndraws,mu_0,sqrt(prior_sigma2/kappa_0))   # rnorm takes an sd, so sqrt the variance
xbar=3.18; s=4.42; n=22
mu_n=(kappa_0/(kappa_0+n))*mu_0 + (n/(kappa_0+n))*xbar
kappa_n=kappa_0+n
nu_n=nu_0+n
sigma2_n=(nu_0*sigma2_0 + (n-1)*s*s
          + kappa_0*n*((xbar-mu_0)^2)/(kappa_0+n))/nu_n
posterior_sigma2=rinvchisq(ndraws,nu_n,sigma2_n)
posterior_mu=rnorm(ndraws,mu_n,sqrt(posterior_sigma2/kappa_n))
q=sample(ndraws,5000)   # q was undefined in the original; thin the draws for plotting
plot(posterior_mu[q],sqrt(posterior_sigma2[q]),
     pch=16,col=rgb(0,0,0,0.1),
     xlab=expression(mu),ylab=expression(sigma))

Page 11: Priors, Normal Models, Computing Posteriors

Plot of posterior

Page 12: Priors, Normal Models, Computing Posteriors

Marginal posteriors

summariser=function(x){print(paste(
  "Posterior mean is ",signif(mean(x),3),
  ", 95%CI is (",signif(quantile(x,0.025),3),
  ",",signif(quantile(x,0.975),3),")",sep=""),quote=FALSE)}

summariser(posterior_mu)
[1] Posterior mean is 3.04, 95%CI is (0.776,5.3)
summariser(posterior_sigma2)
[1] Posterior mean is 24.7, 95%CI is (13.6,44.5)
summariser(sqrt(posterior_sigma2))
[1] Posterior mean is 4.91, 95%CI is (3.69,6.67)
summariser(sqrt(posterior_sigma2)/posterior_mu)
[1] Posterior mean is 1.37, 95%CI is (0.913,5.79)

Page 13: Priors, Normal Models, Computing Posteriors

Wondrousness of Bayesianism

Once you have a sample from the posterior, the world is your oyster! Want a confidence interval for the variance? Want a confidence interval for the standard deviation? Want a confidence interval for the coefficient of variation? How would you do this in the frequentist paradigm? See Koopmans et al (1964) Biometrika 51:25–32. It's not nice.

Page 14: Priors, Normal Models, Computing Posteriors

Problem with Monte Carlo

Usually an exact form for the posterior cannot be obtained. Two alternative simulation-based approaches present themselves: importance sampling and Markov chain Monte Carlo.

Page 15: Priors, Normal Models, Computing Posteriors

Importance Sampling

The idea of importance sampling is that instead of sampling from the actual posterior, you simulate from some other distribution and "correct" the sample by weighting it.

If you want f(θ|data), you can simulate from and evaluate the density of q(θ), and you can evaluate the density of f(θ|data) up to a constant of proportionality, then you can use importance sampling (a minimal sketch follows).
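A minimal sketch of the weighting idea under an assumed univariate setup (target Gamma(3,1), proposal Exp(1); not from the slides):

# Assumed target f = Gamma(3,1), proposal q = Exp(1).
theta=rexp(100000)                  # simulate from q
w=dgamma(theta,3,1)/dexp(theta)     # weights f(theta)/q(theta)
w=w/sum(w)                          # normalise the weights
sum(w*theta)                        # ~ E_f[theta] = 3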

Page 16: Priors, Normal Models, Computing Posteriors

Page 17: Priors, Normal Models, Computing Posteriors

Importance sampling recipe

1. Generate a sample of θ from q(θ).
2. For each sample i, associate a weight w_i ∝ f(θ_i|data)/q(θ_i).
3. Scale the w's such that Σ_i w_i = 1.
4. The posterior mean of any function g(θ) of interest is E[g(θ)] ≈ Σ_i w_i g(θ_i).
5. You can get things like confidence intervals by working with the CDF of the marginals (see the helper sketched below).
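Step 5 can be done with a small helper (a hypothetical function, not from the slides): sort the draws, cumulate the normalised weights to get the weighted CDF, and invert it.

# Hypothetical helper: quantiles from a weighted sample via its CDF.
weighted_quantile=function(theta,w,probs){
  o=order(theta)
  cw=cumsum(w[o])                  # weighted CDF at the sorted draws
  theta[o][sapply(probs,function(p) which(cw>=p)[1])]
}
# e.g. weighted_quantile(theta,w,c(0.025,0.975)) for a 95% interval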

Page 18: Priors, Normal Models, Computing Posteriors

Example: H1N1

Naïve version: simulate (p, σ) from U(0,1)², i.e. q(p,σ) = 1 over [0,1]². Weight by the posterior and scale the weights to sum to one.

Page 19: Priors, Normal Models, Computing Posteriors

Code

set.seed(666)
logpost=function(p,sigma){
  dbinom(98,727,p*sigma,log=TRUE) +
    dbeta(p,1,1,log=TRUE) +
    dbeta(sigma,630,136,log=TRUE)
}
NDRAWS=10000
sampled=list(p=runif(NDRAWS),sigma=runif(NDRAWS))
logposteriors=logpost(sampled$p,sampled$sigma)
logposteriors=logposteriors-max(logposteriors)   # stabilise before exp()
weights=exp(logposteriors)
weights=weights/sum(weights)
plot(sampled$p,sampled$sigma,cex=20*weights,pch=16,xlab='p',
     ylab=expression(sigma),xlim=0:1,ylim=0:1)
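With the normalised weights in hand, posterior summaries follow the recipe above; a short sketch using the slide's variable names (the effective-sample-size line is an added diagnostic, not in the original code):

sum(weights*sampled$p)       # posterior mean of p
sum(weights*sampled$sigma)   # posterior mean of sigma
1/sum(weights^2)             # effective sample size, out of NDRAWS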

Page 20: Priors, Normal Models, Computing Posteriors

Naïve sample

Page 21: Priors, Normal Models, Computing Posteriors

Naïve sample

Page 22: Priors, Normal Models, Computing Posteriors

Better code

set.seed(666)
logpost=function(p,sigma){
  dbinom(98,727,p*sigma,log=TRUE) +
    dbeta(p,1,1,log=TRUE) +
    dbeta(sigma,630,136,log=TRUE)
}
NDRAWS=10000
sampled=list(p=runif(NDRAWS),sigma=rbeta(NDRAWS,630,136))
logposteriors=logpost(sampled$p,sampled$sigma)
logposteriors=logposteriors-max(logposteriors)
weights=exp(logposteriors-dbeta(sampled$sigma,630,136,log=TRUE))   # divide by q(sigma)
weights=weights/sum(weights)

Page 23: Priors, Normal Models, Computing Posteriors

Better sample

Page 24: Priors, Normal Models, Computing Posteriors

Better sample

Page 25: Priors, Normal Models, Computing Posteriors

Even better code

set.seed(666)
logpost=function(p,sigma){
  dbinom(98,727,p*sigma,log=TRUE) +
    dbeta(p,1,1,log=TRUE) +
    dbeta(sigma,630,136,log=TRUE)
}
NDRAWS=10000
sampled=list(p=rbeta(NDRAWS,20,100),sigma=rbeta(NDRAWS,630,136))
logposteriors=logpost(sampled$p,sampled$sigma)
logposteriors=logposteriors-max(logposteriors)
weights=exp(logposteriors-dbeta(sampled$sigma,630,136,log=TRUE)
            -dbeta(sampled$p,20,100,log=TRUE))   # divide by q(p)q(sigma)
weights=weights/sum(weights)
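One way to quantify "better" (an added diagnostic, not in the slides) is the effective sample size 1/Σ w_i²: computed after each of the three weight calculations, it grows as q moves closer to the posterior.

# Added diagnostic: effective sample size of a normalised weighted sample.
1/sum(weights^2)   # compare after the naive, better and even better runs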

Page 26: Priors, Normal Models, Computing Posteriors

Better sample

Page 27: Priors, Normal Models, Computing Posteriors

Better sample

Page 28: Priors, Normal Models, Computing Posteriors

Problems with importance sampling

1. Choice of sampling distribution q: an obvious choice, the prior, leads to a simple weight function (the likelihood), but if the prior is overdispersed relative to the posterior, lots of samples are wasted.
2. An alternative would be to do it iteratively: try the prior first, then tune q to be nearer the area that looks like the posterior, and iterate until a good distribution is found.
3. Ideally, you would sample directly from the posterior, so that the weights are all 1/m.
4. Dealing with a weighted sample is a bit of a nuisance.

Page 29: Priors, Normal Models, Computing Posteriors

An alternative

Finding the posterior is simpler if the computer will automatically move in the right direction, even given a rubbish guess at q. Markov chain Monte Carlo is an ideal way to achieve this.

Next section!!!