bayesian auxiliary variable models for binary and …...bayesian auxiliary variable models for...
TRANSCRIPT
![Page 1: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/1.jpg)
Bayesian auxiliary variable models for binary andmultinomial regression
(Bayesian Analysis, 2006)
Authors: Chris HolmesLeonhard Held
As interpreted by: Rebecca Ferrell
UW Statistics 572, Final Talk
May 27, 2014
1
![Page 2: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/2.jpg)
Holmes & Held set out to take regression models for categoricaloutcomes and ...
2
![Page 3: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/3.jpg)
Holmes & Held set out to take regression models for categoricaloutcomes and ...
2
![Page 4: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/4.jpg)
Outline
I IntroductionI Intro to probit and logistic regression in Bayesian contextI Quick overview of the Gibbs sampler
I Probit regressionI Review popular way of doing Bayesian probit regression from
1993 by Albert & Chib (A&C)I Compare Holmes & Held (H&H) probit approach with A&C
I Logistic regressionI H&H’s modifications to make ideas work for logistic regressionI Empirical performance of sampling strategies
I DiscussionI Extension to model uncertainty (no time!)I Extension to multiple outcomes (no time!)I Concluding thoughts
3
![Page 5: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/5.jpg)
Binary data setup
Classical framework with n binary responses yi and covariates xi :
yi ∼ Bernoulli(pi )
pi = g−1(ηi ), g−1 : R→ (0, 1)
ηi = xiβ, i = 1, . . . , n
xi = ( xi1 . . . xip )
β = ( β1 . . . βp )T
Put a prior on the unknown coefficients:
β ∼ π(β)
Inferential goal: compute posterior π(β | y) ∝ p(y | β)π(β)
4
![Page 6: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/6.jpg)
Why are binary regression models hard to Bayesify?
I No conjugate priors – will need to use MCMC samplingI (Max. likelihood needs iterative methods, asymptotics)
I Previous approaches involve sampling from an approximationto the posterior, need tuning, or otherwise rely ondata-dependent accept-reject steps (e.g. Gamerman, 1997;Chen & Dey, 1998)
I Adaptive-rejection sampling (Dellaportas & Smith, 1993) onlyupdates individual coefficients, resulting in poor mixing whencoefficients are correlated
Wishlist for automatic and efficient Bayesian inference:
I MCMC samples from exact posterior distribution
I No tuning of proposal distributions or low accept-reject rates
I Reasonable mixing even with correlated parameters
5
![Page 7: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/7.jpg)
Intro to Gibbs sampling
Setup: we don’t know posterior distribution π(β | y), but do knoweach conditional posterior π(βi | β−i , y).
Gibbs sampling iterates over conditional posteriors to produce asample from a Markov chain with stationary distribution π(β | y):
(1) Initialize β(0) = (β(0)1 , . . . , β
(0)p )
(2) Draw β(1)1 ∼ π(β1 | β(0)
−1, y)
(3) Draw β(1)2 ∼ π(β2 | β(1)1 ,β
(0)−{1,2}, y) . . .
(4) . . . Draw β(1)p ∼ π(βp | β(1)
−p, y)
(5) Done with sample observation β(1), now repeat (2) - (4)
Combine Gibbs steps into blocks: e.g. if distribution ofπ(β1, β2 | β−{1,2}, y) is available, can use in place of the individualconditionals in (2) and (3).
6
![Page 8: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/8.jpg)
Probit regression
A&C auxiliary variable approach: introduce unobserved auxiliaryvariables zi and re-write the probit model as
yi = 1[zi>0]
zi= xiβ + εi
εi ∼ N(0, 1)
β ∼ N(b, v)
Equivalent to probit model with yi ∼ Bernoulli (pi = Φ(xiβ)):
pi = P(zi > 0 | β) = P(xiβ + εi > 0 | β)
= 1− Φ(−xiβ) = Φ(xiβ)
7
![Page 9: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/9.jpg)
Probit the A&C way: iterative Gibbs steps
From joint posterior, obtain nice block conditional distributions ofthe parameters to iterate through in Gibbs steps:
π(β, z | y) ∝ p(y | z)p(z | β)π(β), so:
(1) π(β | z, y) ∝ p(z | β)π(β) = π(β)︸ ︷︷ ︸N(b,v)
∏ni=1 p(zi | β)︸ ︷︷ ︸
N(xiβ,1)
= multivariate normal
(2) π(z | β, y) ∝ p(y | z)p(z | β) =∏n
i=1 p(yi | zi )p(zi | β)
=n∏
i=1
(1[zi>0]1[yi=1] + 1[zi≤0]1[yi=0]
)φ(zi − xiβ)︸ ︷︷ ︸
π(zi |β,yi )∼=truncated normal
= product of truncated univariate normals
8
![Page 10: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/10.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
z
β
9
![Page 11: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/11.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
Iterative steps
z
β
●
9
![Page 12: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/12.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
Iterative steps
z
β
●
9
![Page 13: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/13.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
Iterative steps
z
β
● ●
9
![Page 14: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/14.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
Iterative steps
z
β
●
9
![Page 15: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/15.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
Iterative steps
z
β
●
●
9
![Page 16: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/16.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
Iterative steps
z
β
●
●
●●
●●●
●
●
●
9
![Page 17: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/17.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
z
β
9
![Page 18: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/18.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
Joint steps
z
β
●
9
![Page 19: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/19.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
Joint steps
z
β
●
9
![Page 20: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/20.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
Joint steps
z
β
●
●
9
![Page 21: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/21.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
Joint steps
z
β
●
9
![Page 22: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/22.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
Joint steps
z
β
●
●
9
![Page 23: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/23.jpg)
Mixing demonstration
−6 −4 −2 0 2 4 6
−3
−2
−1
01
23
Joint steps
z
β
●
●
●
●
●
●
●
●
●
●
9
![Page 24: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/24.jpg)
Smarter Gibbs sampling for probit?
H&H improve mixing by updating (β, z) jointly: simulate fromπ(z | y), then from π(β | z, y). With π(β) normal:
π(β, z | y)︸ ︷︷ ︸(known form)
= π(β | z, y)︸ ︷︷ ︸normal
π(z | y) implies
π(z | y) ∼ truncated multivariate normal
Truncated multivariate normal very hard to sample from directly,but univariate conditionals can be Gibbsed:
π(zi | z−i , y) ∼=
{N (mi , vi ) 1[zi>0] if yi = 1
N (mi , vi ) 1[zi≤0] if yi = 0
where mi and vi are leave-one-out functions of z−i , data, and prior
10
![Page 25: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/25.jpg)
Probit sampler comparison
Iterative updates from A&C:
I Iterate between block Gibbs updates π(z | β, y), π(β | z, y)
I π(z | β, y) ∼ n independent truncated normals with variance 1
I Blocking, independence need just two matrix updates percycle, should run quickly — implementation in H&H paperappears not to have exploited this for π(z | β, y)
Joint updates from H&H:
I Iterate through n univariate Gibbs updates π(zi | z−i , y), thenone block Gibbs update π(β | z, y)
I π(zi | z−i , y) ∼ truncated normal with variance vi > 1
I Can’t do the zi ’s all at once, need n + 1 matrix calculationsper cycle — but maybe bigger variance can offset slownessthrough better mixing?
11
![Page 26: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/26.jpg)
Efficient Bayesian inference
How might we see if a MCMC sampling algorithm is efficient?
(1) Time elapsed to run M iterations
(2) Effective sample size (ESS) for a single parameter:
ESS =M
1 + 2∑∞
k=1 ρ(k)
where ρ(k) = monotone sample autocorrelation at lag k (Kasset al, 1998)
(3) Average update distance: measure mixing with
1
M − 1
M−1∑i=1
‖β(i+1) − β(i)‖
Testing procedure: compute these metrics on each of 10 runs ofM =10,000 iterations per run (discard 1,000 burn-in)
12
![Page 27: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/27.jpg)
Test data
H&H analyze several stock datasets with binary outcomes:
I Pima Indian data (n = 532, p = 8): outcome is diabetes;covariates include BMI, age, number of pregnancies
I Australian credit data (n = 690, p = 14): outcome is creditapproval; 14 generic covariates
I Heart disease data (n = 270, p = 13): outcome is heartdisease; covariates include age, sex, blood pressure, chest paintype
I German credit data (n = 1000, p = 24): outcome is goodvs. bad credit risk; covariates include checking account status,purpose of loan, gender and marital status
13
![Page 28: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/28.jpg)
Probit performance: median values in 10 runs
14
![Page 29: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/29.jpg)
Probit performance: relative
Standardize for run time: ESS/second, jointESS/second, iterative and Dist./second, joint
Dist./second, iterative
15
![Page 30: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/30.jpg)
From probit to logit
Extend auxiliary variables to logistic regression with another levelto model differing variances of the error terms:
yi = 1[zi>0]
zi= xiβ + εi
εi ∼ N(0, λi )
λi= (2ψi )2, ψi ∼ KS (Kolmogorov-Smirnov)
β ∼ N(b, v)
Equivalent to logit model with yi ∼ Bernoulli (pi = expit(xiβ))because εi has a logistic distribution (Andrews & Mallows, 1974)and CDF of logistic is expit function:
pi = P(zi > 0 | β) = P(εi > −xiβ | β)
= 1− expit(−xiβ) = expit(xiβ)
16
![Page 31: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/31.jpg)
Logistic Gibbs
In similar fashion to probit model, simulate from posteriorconditionals:
π(β, z,λ | y) ∝ p(y | z)︸ ︷︷ ︸truncators
p(z | β,λ)︸ ︷︷ ︸indep. normal
p(λ)︸︷︷︸KS2
π(β)︸ ︷︷ ︸normal
(1) π(β | z,λ, y) ∝ p(z | β,λ)π(β) ∼= normal
(2) π(z | β,λ, y) ∝ p(y | z)p(z | β,λ) ∼= ind. truncated normals
(3) π(λ | β, z, y) ∝ p(z | β,λ)p(λ) ∼= ind. normal× KS2
Last conditional distribution is non-standard, but can be simulatedusing rejection sampling with Generalized Inverse Gaussianproposals and alternating series representation (“squeezing”)
17
![Page 32: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/32.jpg)
Logistic sampler comparisonIterative updates (not analyzed in paper):
I Iterate block Gibbs updates π(β | z,λ, y), π(z | β,λ, y),π(λ | β, z, y)
I Variance of π(z | β,λ, y) is λi with expected value π2/3
Joint updating scheme (z,λ):
I Iterate block Gibbs updatesπ(z,λ | β, y) = π(z | β, y)︸ ︷︷ ︸
trunc ind logistic
π(λ | β, z)︸ ︷︷ ︸normal×KS2
, then π(β | z,λ)︸ ︷︷ ︸normal
I Variance of π(z | β, y) is π2/3 – little gain by marginalizing?
Joint updating scheme (z,β):
I Iterate Gibbs updatesπ(z,β | λ, y) = π(z | λ, y)︸ ︷︷ ︸
trunc mv normal
π(β | z,λ)︸ ︷︷ ︸normal
, then π(λ | β, z)︸ ︷︷ ︸normal×KS2
I Note that π(z | λ, y) will require Gibbsing throughπ(zi | z−i ,λ, y), but variance is > λi
18
![Page 33: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/33.jpg)
Logit performance: median values in 10 runs
19
![Page 34: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/34.jpg)
Logit performance: relative
Standardize for run time: ESS/second, jointESS/second, iterative and Dist./second, joint
Dist./second, iterative
20
![Page 35: Bayesian auxiliary variable models for binary and …...Bayesian auxiliary variable models for binary and multinomial regression (Bayesian Analysis, 2006) Authors: Chris Holmes Leonhard](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed841cf0fa3e705ec0e228b/html5/thumbnails/35.jpg)
Concluding thoughts
I Latent variables can induce convenient conditionaldistributions to make MCMC sampling tractable for Bayesianmodels of binary data
I In these cases, all conditionals can be sampled from withoutMetropolis-Hastings
I Joint updating to increase variance in Gibbs sampling mightmake sense theoretically. . .
I . . . but only the scheme updating (z,λ) jointly in logisticregression was competitive with blocked iterative updates
I Don’t replace independent truncated univariate distributionswith a truncated multivariate normal!
I Auxiliary variable technique H&H introduced for logisticregression extends straightforwardly to Bayesian modeluncertainty situations, polytomous outcomes
21