
Page 1: Weakly informative priors - Columbia University (gelman/presentations/wipnew2..., 2014-04-23)

Weakly informative priors

Andrew Gelman
Department of Statistics and Department of Political Science

Columbia University

23 Apr 2014

Collaborators (in order of appearance): Gary King, Frederic Bois, Aleks Jakulin, Vince Dorie, Sophia Rabe-Hesketh, Jingchen Liu, Yeojin Chung, Matt Schofield, Ben Goodrich, . . .

Andrew Gelman Weakly informative priors

Page 2

Weakly informative priors for . . .

I Two applied examples

1. Identifying a three-component mixture (1990)
2. Population variation in toxicology (1996)

I Some default priors:

3. Logistic regression (2008)
4. Hierarchical models (2006, 2011)
5. Covariance matrices (2011)
6. Mixture models (2011)

Page 3

1. Identifying a three-component mixture

I Maximum likelihood estimate blows up

I Bayes posterior with flat prior blows up too!

Page 4

Priors!

I Mixture component 1: mean has N(−0.1, 0.1²) prior, standard deviation has inverse-χ²₄(0.1²) prior

I Mixture component 2: mean has N(+0.1, 0.1²) prior, standard deviation has inverse-χ²₄(0.1²) prior

I Mixture component 3: mean has N(0, 0.3²) prior, standard deviation has inverse-χ²₄(0.2²) prior

I Three mixture parameters have a Dirichlet(19, 19, 4) prior

Page 5

Separating into Republicans, Democrats, and open seats

I Beyond “weakly informative” by using incumbency information

Page 6

2. Weakly informative priors for population variation in toxicology

I Pharmacokinetic parameters such as the “Michaelis-Menten coefficient”

I Wide uncertainty: prior guess for θ is 15 with a factor of 100 of uncertainty, log θ ∼ N(log(15), log(10)²)

I Population model: data on several people j, log θj ∼ N(log(15), log(10)²) ????

I Hierarchical prior distribution:
  I log θj ∼ N(µ, σ²), σ ≈ log(2)
  I µ ∼ N(log(15), log(10)²)

I Weakly informative
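The “factor of 100” claim can be checked directly: with log θ ∼ N(log 15, (log 10)²), about 95% of the prior mass falls within a factor of 100 of the guess (±2 sd on the log scale). A minimal sketch, using only the standard library:

```python
import math

def normal_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# log(theta) ~ N(log 15, (log 10)^2): how much prior mass lies within
# a factor of 100 of the guess, i.e. theta in [15/100, 15*100]?
mu, sigma = math.log(15.0), math.log(10.0)
z_hi = (math.log(15.0 * 100) - mu) / sigma   # = +2
z_lo = (math.log(15.0 / 100) - mu) / sigma   # = -2
coverage = normal_cdf(z_hi) - normal_cdf(z_lo)
# coverage is Phi(2) - Phi(-2), about 0.954
```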

Page 7

Concepts

I Wip instead of noninformative prior or informative prior

I You’re using wips already!

I Prior as a placeholder

I Model as a placeholder

I “Life is what happens to you while you’re busy making other plans”

I Full Bayes vs. Bayesian point estimates

I Hierarchical modeling as a unifying framework

Page 8

3. Logistic regression

[Figure: the inverse-logit curve y = logit⁻¹(x) for x from −6 to 6, rising from 0 to 1; slope = 1/4 at the center]
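The slope = 1/4 annotation is the basis of the divide-by-4 rule: the derivative of logit⁻¹ is p(1−p), which peaks at 1/4 when p = 1/2. A quick numerical check:

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# d/dx logit^-1(x) = p*(1 - p), maximized at p = 1/2 where it equals 1/4;
# check by central difference at x = 0
h = 1e-6
slope_at_zero = (inv_logit(h) - inv_logit(-h)) / (2.0 * h)
# so a logistic coefficient of 0.33 changes Pr(y=1) by at most ~0.33/4 per unit of x
```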

Page 9

A clean example

[Figure: binary data y against x with fitted curve, estimated Pr(y=1) = logit⁻¹(−1.40 + 0.33 x); slope = 0.33/4 at the center]

Page 10

The problem of separation

[Figure: separated binary data y against x, for x from −6 to 6; the fitted curve degenerates to a step, slope = infinity?]

Page 11

Separation is no joke!

glm (vote ~ female + black + income, family=binomial(link="logit"))

1960                                 1968
            coef.est coef.se                     coef.est coef.se
(Intercept)   -0.14    0.23          (Intercept)    0.47    0.24
female         0.24    0.14          female        -0.01    0.15
black         -1.03    0.36          black         -3.64    0.59
income         0.03    0.06          income        -0.03    0.07

1964                                 1972
            coef.est coef.se                     coef.est coef.se
(Intercept)   -1.15    0.22          (Intercept)    0.67    0.18
female        -0.09    0.14          female        -0.25    0.12
black        -16.83  420.40          black         -2.63    0.27
income         0.19    0.06          income         0.09    0.05
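The black coefficient for 1964 (−16.83 with standard error 420.40) is the signature of separation: the likelihood has no interior maximum, so the optimizer climbs until it gives up. A toy sketch of the mechanism, with a hypothetical four-point dataset and a slope-only model:

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical completely separated data: y = 1 exactly when x > 0
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

def fit_slope(n_steps, step=0.5):
    # gradient ascent on the log-likelihood of logit(Pr(y=1)) = b*x
    b = 0.0
    for _ in range(n_steps):
        b += step * sum(x * (y - inv_logit(b * x)) for x, y in zip(xs, ys))
    return b

b_short, b_long = fit_slope(100), fit_slope(10000)
# the gradient never reaches zero: b keeps climbing toward infinity
```

However long the fit runs, the slope grows; the reported estimate is just wherever the optimizer stopped.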

Page 12

Regularization in action!

Page 13

Weakly informative priors for logistic regression

I Separation in logistic regression

I Some prior info: logistic regression coefs are almost always between −5 and 5:
  I 5 on the logit scale takes you from 0.01 to 0.50, or from 0.50 to 0.99
  I Smoking and lung cancer

I Independent Cauchy prior dists with center 0 and scale 2.5

I Rescale each predictor to have mean 0 and sd 1/2

I Fast implementation using EM; easy adaptation of glm

Page 14

Expected predictive loss, avg over a corpus of datasets

[Figure: −log test likelihood (expected predictive loss) against scale of prior, 0 to 5; curves for t priors with df = 0.5, 1.0, 2.0, 4.0, 8.0 and for BBR(l) and BBR(g); GLM at 1.79, off the chart]

Page 15

What does this mean for YOU?

I Sparse data

I Big Data need Big Model

Page 16

Another example

Dose     #deaths/#animals
−0.86     0/5
−0.30     1/5
−0.05     3/5
 0.73     5/5

I Slope of a logistic regression of Pr(death) on dose:
  I Maximum likelihood est is 7.8 ± 4.9
  I With weakly-informative prior: Bayes est is 4.4 ± 1.9
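As a check on the quoted estimates, the maximum likelihood slope for this table can be recovered by simple gradient ascent (a stand-in for glm; the step size and iteration count are ad hoc choices, not from the talk):

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Dose-response data from the table: #deaths out of 5 animals per dose
dose   = [-0.86, -0.30, -0.05, 0.73]
deaths = [0, 1, 3, 5]
n = 5

# Gradient ascent on the log-likelihood of Pr(death) = logit^-1(a + b*dose)
a, b = 0.0, 0.0
for _ in range(50000):
    resid = [y - n * inv_logit(a + b * x) for x, y in zip(dose, deaths)]
    a += 0.05 * sum(resid)
    b += 0.05 * sum(x * r for x, r in zip(dose, resid))
# b should land near the quoted ML estimate of 7.8
```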

Page 17

Maximum likelihood and Bayesian estimates

[Figure: probability of death against dose, 0 to 20, with fitted curves from glm and bayesglm]

I Which is truly conservative?

I The sociology of shrinkage

Page 18

4. Inference for hierarchical variance parameters

[Figure: marginal likelihood for σα, for σα from 0 to 30]

Page 19

Hierarchical variance parameters: 1. Full Bayes

I What is a good “weakly informative prior”?
  I log σα ∼ Uniform(−∞,∞)
  I σα ∼ Uniform(0,∞)
  I σα ∼ Inverse-gamma(0.001, 0.001)
  I σα ∼ Cauchy⁺(0,A)

I Polson and Scott (2011): “The half-Cauchy occupies a sensible ‘middle ground’ . . . it performs very well near the origin, but does not lead to drastic compromises in other parts of the parameter space.”
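The “middle ground” behavior is visible in the density itself: Cauchy⁺(0, A) is essentially flat near the origin and decays only like 1/σ². A sketch with an arbitrary scale A = 5 (an illustrative choice, not from the talk):

```python
import math

def half_cauchy_pdf(sigma, A=5.0):
    # density of sigma ~ Cauchy+(0, A), i.e. |Cauchy(0, A)|, for sigma >= 0
    return 2.0 / (math.pi * A * (1.0 + (sigma / A) ** 2))

near_zero = half_cauchy_pdf(0.01)   # essentially the maximum, 2/(pi*A)
at_scale  = half_cauchy_pdf(5.0)    # exactly half the peak at sigma = A
far_out   = half_cauchy_pdf(50.0)   # tail decays like 1/sigma^2, never hits 0
```

So, unlike the inverse-gamma, nothing cuts off near zero, and, unlike the uniform, large values are steadily downweighted without being ruled out.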

Page 20

Problems with inverse-gamma prior

I Inv-gamma prior cuts off at 0

Page 21

Problems with uniform prior

I Uniform prior doesn’t cut off the long tail

Page 22

Hierarchical variance parameters: 2. Point estimation

I [Estimate ± standard error] as approximation to full Bayes

I Point estimation as goal in itself

I Problems with boundary estimate, σ̂α = 0

Page 23

The problem of boundary estimates: 8-schools example

Page 24

The problem of boundary estimates: simulation

Page 25

The problem of boundary estimates: simulation

I Box and Tiao (1973):

I All variance parameters want to become lost in the noise

I When does it hurt to estimate σ̂α = 0?

Page 26

Point estimate of a hierarchical variance parameter

I Desirable properties:
  I Point estimate should never be 0
  I But . . . no nonzero lower bound
  I Estimate should respect the likelihood
  I Bias and variance should be as good as the MLE
  I Should be easy to compute
  I Should be Bayesian

Page 27

Boundary-avoiding point estimate!

[Figure: posterior for σα with Gamma(2,∞) prior, σα from 0 to 30, showing likelihood, prior, and posterior]

Page 28

Boundary-avoiding weakly-informative point estimate

[Figure: posterior for σα with Gamma(2, 1/25) prior, σα from 0 to 30, showing likelihood, prior, and posterior]

Page 29

Gamma (not inverse-gamma) prior on σα

[Figure: three panels for σα from 0 to 30: posterior with uniform prior, with Gamma(2,∞) prior, and with Gamma(2, 1/25) prior, each showing likelihood, prior, and posterior]

The posterior mode is shifted at most one standard error from the boundary.
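The boundary-avoiding effect can be seen with a toy marginal likelihood whose maximum sits at σα = 0 (a hypothetical stand-in, not the 8-schools likelihood): multiplying by a Gamma(2, rate) prior, whose density vanishes linearly at zero, moves the mode into the interior.

```python
import math

# Toy marginal likelihood whose maximum is on the boundary sigma = 0
def log_lik(sigma):
    return -0.5 * sigma ** 2

# Gamma(shape=2, rate) log-density, up to a constant;
# rate = 0.04 = 1/25 matches the Gamma(2, 1/25) prior on the slide
def log_prior(sigma, rate=0.04):
    return math.log(sigma) - rate * sigma

grid = [i / 1000.0 for i in range(1, 5001)]          # sigma in (0, 5]
lik_mode  = max(grid, key=log_lik)                   # hugs the boundary
post_mode = max(grid, key=lambda s: log_lik(s) + log_prior(s))
# log(sigma) -> -inf at 0, so the posterior mode is pulled off the boundary
```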

Page 30

5. Boundary estimate of group-level correlation

Point estimates of correlation ρ̂ from a hierarchical varying-intercept, varying-slope model:

[Figure, left: sampling distribution of ρ̂, the maximum marginal likelihood estimate, over 1000 simulations with true value ρ = 0, for the hierarchical correlation parameter ρ from −1 to 1; right: profile marginal likelihood for ρ, 100 draws of the marginal likelihood p(y|ρ), each corresponding to a different random draw of y]

I Boundary-avoiding prior: ρ ∼ Beta(2, 2)

I Better statistical properties than mle
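Beta(2, 2) on (ρ+1)/2, viewed as a density on [−1, 1], is proportional to (1+ρ)(1−ρ): exactly zero at the boundaries, so a posterior mode at ρ̂ = ±1 is impossible, while the interior stays nearly flat. In code:

```python
# Beta(2, 2) prior on (rho + 1)/2, written as a density on [-1, 1]
# up to normalization: p(rho) proportional to (1 + rho) * (1 - rho)
def corr_prior(rho):
    return (1.0 + rho) * (1.0 - rho)

at_edges = (corr_prior(-1.0), corr_prior(1.0))  # exactly zero at the boundary
at_zero  = corr_prior(0.0)                      # maximal in the interior
```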

Page 31

Weakly informative priors for covariance matrix

I Boundary-avoiding prior:
  I Wishart (not inverse-Wishart) prior
  I Generalization of gamma

I Full Bayes:
  I Scaled prior on Diag*Q*Diag

I Connection to prior on regression coefficients

Page 32

6. Weakly informative priors for mixture models

I Well-known problem of fitting the mixture model likelihood

I The maximum likelihood fits are weird, with a single point taking half the mixture

I Bayes with flat prior is just as bad

I These solutions don’t “look” like mixtures

I There must be additional prior information, or, to put it another way, regularization

I Simple constraints, for example, a prior dist on the variance ratio

I Weakly informative prior: use a hierarchical model for the scale parameters

Page 33

General theory for wips

I (Unknown) true prior: ptrue(θ) = N(θ | µ0, σ0²)

I Your subjective prior: psubj(θ) = N(θ | µ1, σ1²)

I Weakly-informative prior: pwip(θ) = N(θ | µ1, (κσ1)²), with κ > 1

I Tradeoffs if κ is too low or too high
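One way to see the tradeoff (an illustration, not the talk's formal analysis) is the KL divergence from the true prior to the wip as a function of κ: a too-small κ concentrates away from the truth at great cost, a too-large κ pays a slowly growing log penalty. With hypothetical numbers µ0 = 0, σ0 = 1, µ1 = 2, σ1 = 1:

```python
import math

def kl_normal(mu0, s0, mu1, s1):
    # KL( N(mu0, s0^2) || N(mu1, s1^2) ), closed form for two normals
    return math.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2.0 * s1**2) - 0.5

# ptrue = N(0, 1); psubj centered off-target at mu1 = 2 with sigma1 = 1,
# then widened by the factor kappa to make the wip
def cost(kappa):
    return kl_normal(0.0, 1.0, 2.0, kappa * 1.0)

too_narrow, moderate, too_wide = cost(0.2), cost(3.0), cost(100.0)
# U-shaped: huge penalty for a too-narrow prior, slow log penalty for too wide
```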

Page 34

Specifying wips using nested models

I Thiago Martins, Daniel Simpson, Andrea Riebler, Håvard Rue, Sigrunn Sørbye

I Large model has parameter θ

I Define p(θ) with reference to some base model such as θ = 0

I Penalized complexity priors

I User-defined scaling

Page 35

What have we learned?

I Models need structure but not too much structure

I Conservatism in statistics

I Priors for full Bayes vs. priors for point estimation

I Formalize the losses in supplying too much or too little prior info
