ST5219: Bayesian hierarchical modelling, lecture 2.1
PRIORS, NORMAL MODELS, COMPUTING POSTERIORS


Page 1: PRIORS, NORMAL MODELS, COMPUTING POSTERIORS

Page 2: Plan for lecture

Priors: how to choose them, different types
The normal distribution in Bayesianism
Tutorial 1: over to you
Computing posteriors: Monte Carlo, importance sampling, Markov chain Monte Carlo

Page 3: What is random?

FREQUENTISM
Something with a long-run frequency distribution is random, e.g. coin tosses, patients in a clinical trial, “measurement” errors?

BAYESIANISM
Everything you don’t know is random: unobserved data, parameters, unknown states, hypotheses. Observed data still arise from a probability model.

Knock-on effects on how to estimate things and assess hypotheses.

Page 4: This week: practical issues

CHOOSING A PRIOR
Very misunderstood. “How did you choose your priors?” Please never answer “Oh, I just made them up”. For data analysis, you need a strong rationale for your choice of prior.

DOING COMPUTATIONS
(later)

Page 5: An example to illustrate priors

Following infection: body creates antibodies

These target the pathogen and remain in the blood

Antibodies can provide data on historic disease exposure

Page 6: H1N1 in Singapore

Cook, Chen, Lim (2010) Emerg Inf Dis DOI:10.3201/EID.1610.100840

Page 7: Serology of H1N1

Singapore study: longitudinal

Chen et al (2010) J Am Med Assoc 303:1383--91

Page 8: Measurements

Observation lies in (x_ij, 2x_ij) for individual i, observation j

Define “seroconversion” to be a “four-fold” rise in antibody levels, i.e. y_i = 1 if x_i2 ≥ 4 x_i1 and 0 otherwise

Out of 727 participants with follow-up, we have 98 seroconversions

Q: what proportion were infected?
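For orientation, the raw numbers give an observed seroconversion proportion of
\[
\hat{\pi} = \frac{98}{727} \approx 0.135,
\]
i.e. roughly 13.5% of the followed-up participants seroconverted; the question is how to get from this to the proportion infected.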

Page 9: AIDS ≠ influenza A H1N1

The seroconversion “test” is not perfect: its sensitivity is only about 80%

The infection rate should therefore be higher than the seroconversion rate

Board work
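A minimal sketch of the model the board work presumably sets up (the symbols are those used later in the lecture): if p is the proportion infected and σ is the probability that an infected person seroconverts, and uninfected people essentially never show a four-fold rise, then
\[
\Pr(\text{seroconvert}) = p\,\sigma, \qquad y \sim \mathrm{Bin}(n, p\sigma), \qquad n = 727,\; y = 98,
\]
so a crude moment-style estimate is \(\hat{p} \approx (98/727)/0.8 \approx 0.17\), above the raw seroconversion rate as expected.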

Page 10: Bayesian approach

Need some priors

Last time: “U(0,1) is a good way to represent lack of knowledge of a probability”

Before we collected the JAMA data, we didn’t know what p would be, and a prior p ~ U(0,1) makes sense

But there are data out there on σ!

Page 11: Other data

Zambon et al (2001) Arch Intern Med 161:2116--22

Page 12: Other data

m = 791, y = 629

This can give you a prior!!!

σ ~ Be(630,163)

Board work
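A minimal computational sketch, not from the slides (variable names and the grid/Monte Carlo scheme are my choices), of how the Be(630,163) prior for σ combines with a U(0,1) prior for p and the 98-out-of-727 seroconversion data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    n, y = 727, 98                            # follow-up participants, seroconversions
    sigma = rng.beta(630, 163, 20_000)        # draws from the informative prior for sigma

    p_grid = np.linspace(0.001, 0.999, 999)   # grid for p, uniform prior

    # For each p on the grid, average the Bin(n, p*sigma) likelihood over the sigma draws,
    # i.e. integrate sigma out by simple Monte Carlo.
    lik = np.array([stats.binom.pmf(y, n, p * sigma).mean() for p in p_grid])
    post = lik / lik.sum()                    # normalised posterior for p on the grid

    print("posterior mean of p:", round(float(np.sum(p_grid * post)), 3))

The Monte Carlo, importance sampling and MCMC methods listed in the lecture plan do the same job without needing a grid.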

Page 13: Kinds of priors

NON-INFORMATIVE
p ~ U(0,1), σ² ~ U(0,∞), μ ~ U(-∞, ∞), β ~ N(0,1000²)
Should give you no information about that parameter except what is in the data

INFORMATIVE
σ ~ Be(630,163), μ ~ N(15.2, 6.8²)
Lets you supplement the natural information content of the data when there is not enough information on that aspect
Can give information on other parameters indirectly

Page 14: How to choose?

Scenario 1. You are trying to reach an optimal decision in the presence of uncertainty: use whatever information you can, even if subjective, via informative priors

Scenario 2. You are trying to estimate parameters for a scientific data analysis (you cannot or don’t want to use external data): use non-informative priors

Scenario 3. You are trying to estimate parameters for a scientific data analysis (you have good external data): use non-informative priors for those bits you have no data for or in which you want your own data to speak for themselves; use informative priors elsewhere

Page 15: Whence came that Be(630,163)?

Step 1: uniform prior for σ

Step 2: fit model to Zambon data

Step 3: posterior for that becomes prior for main analysis

Board work
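A short worked version of the board calculation, assuming the standard beta-binomial update described on page 16: with a uniform Be(1,1) prior for σ and the Zambon data of y = 629 seroconversions among m = 791 confirmed infections,
\[
\sigma \mid y \;\sim\; \mathrm{Be}(1 + 629,\; 1 + 791 - 629) \;=\; \mathrm{Be}(630,\, 163),
\]
which is the informative prior carried into the main analysis (prior mean 630/793 ≈ 0.79).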

Page 16: Conjugacy

The beta distribution is conjugate to a binomial model, in that if you start with a beta prior and use it in a binomial model for p and x, you end with a beta posterior of known form

I.e. if p ~ Be(a,b) and x ~ Bin(n,p), then p|x ~ Be(a+x, b+n-x)

Other conjugate priors exist for simple models, e.g. ...

Board work
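A one-line sketch of why the update holds, by Bayes’ theorem:
\[
\pi(p \mid x) \;\propto\; p^{a-1}(1-p)^{b-1} \times p^{x}(1-p)^{n-x} \;=\; p^{a+x-1}(1-p)^{b+n-x-1},
\]
which is the kernel of a Be(a+x, b+n-x) density, so the posterior is again beta.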

Page 17: Why is it ok to take posteriors and turn them into priors?

It’s the incremental nature of accumulated knowledge

E.g. the Zambon study:

Stage   Prior     Data (y,m)   Posterior
0       Be(1,1)   (0,0)        Be(1,1)
1       Be(1,1)   (1,1)        Be(2,1)
2       Be(2,1)   (1,2)        Be(2,2)
3       Be(2,2)   (1,3)        Be(2,3)
4       Be(2,3)   (2,4)        Be(3,4)
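A small numerical illustration of the same idea (the observations here are made up, not the Zambon counts): updating a Be(1,1) prior one observation at a time, feeding each posterior back in as the next prior, lands on the same Be(a,b) as a single batch update.

    # Sequential vs batch beta-binomial updating: a toy check that
    # "yesterday's posterior is today's prior" gives the same answer
    # as using all the data at once. Counts are illustrative only.
    data = [1, 0, 0, 1, 1, 0, 1]   # 1 = seroconversion, 0 = not

    a, b = 1, 1                    # Be(1,1) uniform prior
    for x in data:                 # one-at-a-time updates
        a, b = a + x, b + (1 - x)

    y, n = sum(data), len(data)    # batch update
    assert (a, b) == (1 + y, 1 + n - y)
    print(f"posterior after sequential or batch updating: Be({a},{b})")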

Page 18: Effective sample sizes

You can think of the parameters of the Be(a,b) as representing:
a best guess of the proportion, a/(a+b)
a “sample size” that the prior is equivalent to, (a+b)

This is an easy way to transform published results into beta priors: take the point estimate (MLE, say) and the sample size and transform to get a and b, as in the sketch below.

(So a uniform prior is like adding one positive and one negative value to your data set: is this fair???)
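A tiny helper along these lines; the function name and the example call are mine, for illustration:

    def beta_from_estimate(p_hat: float, n_eff: float) -> tuple[float, float]:
        """Beta(a, b) prior with best guess a/(a+b) = p_hat and 'sample size' a + b = n_eff."""
        a = p_hat * n_eff
        b = (1.0 - p_hat) * n_eff
        return a, b

    # Zambon et al.: 629 seroconversions out of 791 confirmed infections
    print(beta_from_estimate(629 / 791, 791))   # roughly (629, 162)

Note that this moment-matching route gives Be(629, 162), close to but not identical to the Be(630, 163) obtained by pushing a uniform prior through the Zambon data.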

Page 19: Other converting methods

Take a point estimate and CI and convert to 2 parameters to represent your prior.

E.g. the infectious period is a popular parameter in infectious disease epidemiology: the average time from infection to recovery.

For no good reason, it is often assumed to be exponential with mean λ, say.

Fraser et al (2009) Science 324:1557--61 suggest an estimate of the generation period of 1.91 with 95% CI (1.3, 2.71).

Board work
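One way the board work might proceed (the choice of a gamma prior and the code below are mine, for illustration): treat the reported 95% CI as the 2.5% and 97.5% quantiles of a two-parameter gamma prior for the mean generation time and solve for its parameters numerically.

    import numpy as np
    from scipy import stats, optimize

    lo, hi = 1.3, 2.71             # reported 95% CI for the mean generation time
    target = np.array([lo, hi])

    def quantile_gap(params):
        """Mismatch between gamma(shape, scale) 2.5%/97.5% quantiles and the CI."""
        shape, scale = np.exp(params)   # work on the log scale to keep both positive
        q = stats.gamma.ppf([0.025, 0.975], shape, scale=scale)
        return q - target

    sol = optimize.least_squares(quantile_gap, x0=np.log([10.0, 0.2]))
    shape, scale = np.exp(sol.x)
    print(f"shape ~ {shape:.1f}, scale ~ {scale:.3f}, implied mean ~ {shape * scale:.2f}")

Matching a normal or log-normal instead would be equally defensible; the point is simply to turn (estimate, CI) into two prior parameters.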

Page 20: Two final thoughts on priors

I mentioned U(-∞, ∞) as a non-informative prior. What’s the density function for U(-∞, ∞)?

Board work
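The kind of one-liner the board work might contain: a uniform density on the whole real line would have to be some constant c > 0, but
\[
\int_{-\infty}^{\infty} c \, dx = \infty \neq 1,
\]
so no proper density exists; such a prior can only be written up to proportionality, e.g. π(μ) ∝ 1.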

Page 21: Two final thoughts on priors

A prior such as U(-∞, ∞) is called an improper prior as it does not have a proper density function.

Improper priors sometimes give proper posteriors: it depends on whether the integral of the likelihood (times the prior) is finite.

A prior that is not improper is a proper prior.

Page 22: Two final thoughts on priors

Just because a prior is flat in one representation does not mean it is flat in another

E.g. for an exponential model (for survival analysis, say)

Board work
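A sketch of the kind of change-of-variables calculation the board work might show (the parameterisation is my choice): if the exponential rate λ is given a flat prior, π(λ) ∝ 1, then the implied prior on the mean μ = 1/λ is
\[
\pi_{\mu}(\mu) \;=\; \pi_{\lambda}\!\left(\tfrac{1}{\mu}\right) \left| \frac{d\lambda}{d\mu} \right| \;\propto\; \frac{1}{\mu^{2}},
\]
so a prior that is flat for the rate is strongly informative about the mean.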