1 Introduction to MCMC methods, the Gibbs Sampler, and Data Augmentation


Page 1: Introduction to MCMC Methods, the Gibbs Sampler, and Data Augmentation

Page 2: Simulation Methods

The problem: Bayes theorem lets us write down an unnormalized density proportional to the posterior, $p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$, for virtually any model. The challenge is to construct a simulator from this density, often with $\dim(\theta) > 100$.

Solutions:

1. Direct iid simulation (use asymptotics)
2. Importance sampling
3. MCMC

Page 3: MCMC Methods

The "solution": exploit the special structure of the problem to

- formulate a Markov chain on the parameter space with the posterior $\pi$ as its long-run or "equilibrium" distribution,
- simulate from the Markov chain, starting from some initial point,
- use a sub-sequence of the draws as the simulator.

Page 4: MCMC Methods

Start from $\theta^0$ and construct a sequence of random variables $\theta^1, \theta^2, \ldots, \theta^r, \ldots$

Markovian property: $\theta^r \mid \theta^{r-1}, \ldots, \theta^0 \sim F(\theta^r \mid \theta^{r-1})$, i.e. the distribution of the next draw depends on the past only through the current draw.

Under some conditions on $F$, the distribution of $\theta^r$ given $\theta^0$ "converges" to $\pi$.

Page 5: Ergodicity

Denote the sequence of draws as $\theta^1, \ldots, \theta^r, \ldots$

Ergodic property (as $R \to \infty$):

i) $\hat p_A = \frac{1}{R}\sum_{r=1}^{R} I(\theta^r \in A) \;\to\; \Pr(\theta \in A) = \int_A \pi(\theta)\, d\theta$

ii) $\hat g = \frac{1}{R}\sum_{r=1}^{R} g(\theta^r) \;\to\; E_\pi[g(\theta)] = \int g(\theta)\, \pi(\theta)\, d\theta$

This means that we can estimate any aspect of the joint distribution using sequences of draws from the Markov chain.

Page 6: Practical Considerations

Effect of initial conditions: "burn-in" -- run for B iterations, discard them, and use only the last R - B draws (see the sketch below).

Non-iid simulator -- is this a problem?
- No: the LLN works for dependent sequences.
- Yes: the simulation error is larger than for an iid sequence.

Method for constructing the chain!
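A minimal R sketch of the burn-in step, assuming the draws are stored in a matrix (the numbers and the stand-in draws are illustrative only):

draws = matrix(rnorm(10000 * 2), ncol = 2)   # stand-in for R = 10000 MCMC draws of 2 parameters
B = 1000                                     # burn-in length (illustrative choice)
kept = draws[(B + 1):nrow(draws), ]          # discard the first B draws, keep the last R - B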

Page 7: Asymptotics

Any simulation-based method relies on asymptotics for its justification.

We have made fun of asymptotics for inference problems: classical econometrics gives an "approximate answer" to the wrong question.

Here, however, we are not using asymptotics to approximate for a fixed sample size. The simulation sample size is large and under our control!

Page 8: Simulating from the Bivariate Normal

$$\theta = \begin{pmatrix}\theta_1 \\ \theta_2\end{pmatrix} \sim N\!\left(0,\; \begin{pmatrix}1 & \rho \\ \rho & 1\end{pmatrix}\right)$$

so that $\theta_1 \sim N(0,1)$ and $\theta_2 \mid \theta_1 \sim N(\rho\theta_1,\; 1-\rho^2)$.

In R, we would use the Cholesky root to simulate:

$$\theta = Lz; \qquad z \sim N(0, I_2); \qquad L = \begin{pmatrix}1 & 0 \\ \rho & \sqrt{1-\rho^2}\end{pmatrix}$$

i.e. $\theta_1 = z_1$ and $\theta_2 = \rho z_1 + \sqrt{1-\rho^2}\, z_2$.
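A minimal R sketch of this Cholesky-based simulator (the value of rho and the number of draws are illustrative):

rho = 0.9                                        # illustrative correlation
Sigma = matrix(c(1, rho, rho, 1), ncol = 2)
L = t(chol(Sigma))                               # lower-triangular Cholesky root (chol() returns the upper root)
z = matrix(rnorm(2 * 1000), nrow = 2)            # 1000 columns of iid N(0,1) draws
theta = t(L %*% z)                               # each row of theta is one draw (theta1, theta2)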

Page 9: Gibbs Sampler

A joint distribution can always be factored into a marginal × a conditional. There is also a sense in which the conditional distributions fully summarize the joint.

$$\theta_2 \mid \theta_1 \sim N(\rho\theta_1,\; 1-\rho^2) \qquad \theta_1 \mid \theta_2 \sim N(\rho\theta_2,\; 1-\rho^2)$$

A simulator: start at a point $\theta^0 = (\theta_1^0, \theta_2^0)$ and draw in two steps:

1) $\theta_2^1 \sim N(\rho\theta_1^0,\; 1-\rho^2)$
2) $\theta_1^1 \sim N(\rho\theta_2^1,\; 1-\rho^2)$

Note: this is a Markov chain. The current point entirely summarizes the past.

Page 10: Gibbs Sampler

A simulator: start at a point $\theta^0 = (\theta_1^0, \theta_2^0)$ and draw in two steps:

1) $\theta_2^1 \sim N(\rho\theta_1^0,\; 1-\rho^2)$
2) $\theta_1^1 \sim N(\rho\theta_2^1,\; 1-\rho^2)$

Repeat!
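A minimal R sketch of this two-step Gibbs sampler (the function name, rho, and the starting value are illustrative; bayesm's rbiNormGibbs implements the same idea with diagnostic plots):

gibbsBiNorm = function(R, rho, theta2 = 0) {
  draws = matrix(0, nrow = R, ncol = 2)
  csd = sqrt(1 - rho^2)                                # conditional standard deviation
  for (r in 1:R) {
    theta1 = rnorm(1, mean = rho * theta2, sd = csd)   # draw theta1 | theta2
    theta2 = rnorm(1, mean = rho * theta1, sd = csd)   # draw theta2 | theta1
    draws[r, ] = c(theta1, theta2)
  }
  draws
}
out = gibbsBiNorm(R = 1000, rho = 0.9)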

Page 11: Hammersley-Clifford Theorem

Existence of the Gibbs Sampler for a bivariate distribution implies that the complete set of conditionals summarizes all of the information in the joint.

H-C construction:

$$p(\theta_1, \theta_2) = \frac{p(\theta_2 \mid \theta_1)}{\displaystyle\int \frac{p(\theta_2 \mid \theta_1)}{p(\theta_1 \mid \theta_2)}\, d\theta_2}$$

Why?

Page 12: Hammersley-Clifford Theorem

$$\int \frac{p(\theta_2 \mid \theta_1)}{p(\theta_1 \mid \theta_2)}\, d\theta_2 = \int \frac{p(\theta_1, \theta_2)/p(\theta_1)}{p(\theta_1, \theta_2)/p(\theta_2)}\, d\theta_2 = \int \frac{p(\theta_2)}{p(\theta_1)}\, d\theta_2 = \frac{1}{p(\theta_1)}$$

so that

$$\frac{p(\theta_2 \mid \theta_1)}{\displaystyle\int \frac{p(\theta_2 \mid \theta_1)}{p(\theta_1 \mid \theta_2)}\, d\theta_2} = p(\theta_2 \mid \theta_1)\, p(\theta_1) = p(\theta_1, \theta_2)$$

Page 13: rbiNormGibbs

[Figure: four stages of output from rbiNormGibbs, "Gibbs Sampler with Intermediate Moves: Rho = 0.9" -- the chain's path over the (theta1, theta2) plane starting from point B.]

Page 14: Intuition for Dependence

This is a Markov chain! The average step "size" is governed by the conditional variance $1 - \rho^2$: as $\rho \to 1$, the steps shrink and successive draws become more highly dependent.

Page 15: rbiNormGibbs

[Figure: auto- and cross-correlation functions of the Gibbs draws, panels "Series 1", "Series 1 & Series 2", "Series 2 & Series 1", and "Series 2", for lags 0-20.]

Non-iid draws! Who cares? Loss of efficiency.

Page 16: Ergodicity

[Figure: "ACF of Theta1" and "Convergence of Sample Correlation" over 1000 draws, for iid draws and Gibbs Sampler draws.]

The sample correlation computed from the first $r$ draws,

$$\hat\rho_r = \frac{\sum_{i=1}^{r}(\theta_1^i - \bar\theta_1)(\theta_2^i - \bar\theta_2)}{\sqrt{\sum_{i=1}^{r}(\theta_1^i - \bar\theta_1)^2\; \sum_{i=1}^{r}(\theta_2^i - \bar\theta_2)^2}},$$

converges to $\rho$ for both the iid draws and the Gibbs Sampler draws, but more slowly for the autocorrelated Gibbs draws.
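A minimal R sketch of this convergence check, assuming draws is an R x 2 matrix of sampler output (e.g. from the gibbsBiNorm sketch above; the stand-in draws here are iid):

draws = matrix(rnorm(2000), ncol = 2)       # stand-in; replace with Gibbs Sampler output
rho_hat = sapply(10:nrow(draws), function(r) cor(draws[1:r, 1], draws[1:r, 2]))
plot(rho_hat, type = "l", xlab = "r", ylab = "running sample correlation")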

Page 17: Relative Numerical Efficiency

Draws from the Gibbs Sampler come from a stationary but autocorrelated process. We can compute the sampling error of averages of these draws.

Assume we wish to estimate $E_\pi[g(\theta)]$. We would use

$$\hat g = \frac{1}{R}\sum_{r=1}^{R} g(\theta^r)$$

The variance of this average involves all of the variances and covariances of the draws:

$$\operatorname{var}(\hat g) = \frac{1}{R^2}\sum_{i=1}^{R}\sum_{j=1}^{R} \operatorname{cov}\!\big(g(\theta^i),\, g(\theta^j)\big)$$

Page 18: Relative Numerical Efficiency

$$\operatorname{var}(\hat g) = \frac{\operatorname{var}(g)}{R}\left[1 + 2\sum_{j=1}^{R-1}\frac{R-j}{R}\,\rho_j\right] = \frac{\operatorname{var}(g)}{R}\, f$$

$f$ is the ratio of the variance to the variance if the draws were iid.

$$\hat f = 1 + 2\sum_{j=1}^{m}\frac{m+1-j}{m+1}\,\hat\rho_j$$

where $\hat\rho_j$ is the estimated lag-$j$ autocorrelation of the $g(\theta^r)$ draws. Here we truncate the lag at $m$. Choice of $m$? See numEff in bayesm.
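A minimal R sketch of this truncated-lag efficiency factor (the function name and the default m are illustrative; bayesm's numEff has its own implementation and defaults):

numEffSketch = function(x, m = 20) {
  R = length(x)
  rho = acf(x, lag.max = m, plot = FALSE)$acf[-1]          # autocorrelations at lags 1..m
  f = 1 + 2 * sum(((m + 1 - (1:m)) / (m + 1)) * rho)       # variance inflation factor f
  list(f = f, effectiveSampleSize = R / f)
}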

Page 19: General Gibbs Sampler

Partition the parameter vector into $p$ blocks: $\theta' = (\theta_1', \theta_2', \ldots, \theta_p')$.

Sample from the full conditionals in turn:

$\theta_1^{1} \sim f_1(\theta_1 \mid \theta_2^{0}, \ldots, \theta_p^{0})$
$\theta_2^{1} \sim f_2(\theta_2 \mid \theta_1^{1}, \theta_3^{0}, \ldots, \theta_p^{0})$
$\;\;\vdots$
$\theta_p^{1} \sim f_p(\theta_p \mid \theta_1^{1}, \ldots, \theta_{p-1}^{1})$

to obtain the first iterate, where $f_i$ is the conditional of $\theta_i$ given $\theta_{-i} = (\theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_p)$, i.e. $f_i = \pi(\theta)\big/\!\int \pi(\theta)\, d\theta_i$.

Choosing the groups $\theta_i$ is called "blocking"; a generic sweep is sketched below.
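A generic R sketch of one sweep over the blocks (names are illustrative; each element of fullCond is assumed to be a user-supplied function that draws its block from the corresponding full conditional):

gibbsSweep = function(theta, fullCond) {
  # theta: a list with one element per block; fullCond: a list of sampling functions
  for (i in seq_along(theta)) {
    theta[[i]] = fullCond[[i]](theta)     # draw block i given the current values of all other blocks
  }
  theta
}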

Page 20: Different Prior for Bayes Regression

Suppose the prior for $\beta$ does not depend on $\sigma^2$: $p(\beta, \sigma^2) = p(\beta)\, p(\sigma^2)$. That is, prior beliefs about $\beta$ do not depend on $\sigma^2$. Why should views about $\beta$ depend on the scale of the error terms? That is only appropriate for data-based prior information, not for subject-matter information!

$$p(\beta) \propto \exp\!\left\{-\tfrac{1}{2}(\beta - \bar\beta)' A (\beta - \bar\beta)\right\}$$

$$p(\sigma^2) \propto (\sigma^2)^{-\left(\frac{\nu_0}{2} + 1\right)} \exp\!\left\{-\frac{\nu_0 s_0^2}{2\sigma^2}\right\}$$

Page 21: Different Posterior

The posterior for $\sigma^2$ now depends on $\beta$:

$$\beta \mid y, X, \sigma^2 \sim N\!\left(\tilde\beta,\; (\sigma^{-2}X'X + A)^{-1}\right)$$

with $\tilde\beta = (\sigma^{-2}X'X + A)^{-1}(\sigma^{-2}X'X\hat\beta + A\bar\beta)$ and $\hat\beta = (X'X)^{-1}X'y$.

$$\sigma^2 \mid y, X, \beta \sim \frac{\nu_1 s_1^2}{\chi^2_{\nu_1}} \qquad \text{with } \nu_1 = \nu_0 + n, \quad \nu_1 s_1^2 = \nu_0 s_0^2 + s, \quad s = (y - X\beta)'(y - X\beta)$$

This depends on $\beta$.

Page 22: Different Simulation Strategy

Scheme: $[y \mid X, \beta, \sigma^2]\, [\beta]\, [\sigma^2]$

1) Draw $[\beta \mid y, X, \sigma^2]$
2) Draw $[\sigma^2 \mid y, X, \beta]$ (conditional on $\beta$!)
3) Repeat

Page 23: runiregGibbs

runiregGibbs=function(Data,Prior,Mcmc){
#
# Purpose:
#   perform Gibbs iterations for Univ Regression Model using
#   prior with beta, sigma-sq indep
#
# Arguments:
#   Data -- list of data: y, X
#   Prior -- list of prior hyperparameters
#     betabar, A   prior mean, prior precision
#     nu, ssq      prior on sigmasq
#   Mcmc -- list of MCMC parms
#     sigmasq = initial value for sigmasq
#     R       number of draws
#     keep -- thinning parameter
#
# Output:
#   list of beta, sigmasq draws
#

Page 24: runiregGibbs (continued)

# Model:
#   y = Xbeta + e   e ~ N(0,sigmasq)
#   y is n x 1
#   X is n x k
#   beta is k x 1 vector of coefficients
#
# Priors:  beta ~ N(betabar,A^-1)
#          sigmasq ~ (nu*ssq)/chisq_nu
#
# check arguments
# ...
sigmasqdraw=double(floor(Mcmc$R/keep))
betadraw=matrix(double(floor(Mcmc$R*nvar/keep)),ncol=nvar)
XpX=crossprod(X)
Xpy=crossprod(X,y)
sigmasq=as.vector(sigmasq)

itime=proc.time()[3]
cat("MCMC Iteration (est time to end - min) ",fill=TRUE)
flush()

Page 25: runiregGibbs (continued)

for (rep in 1:Mcmc$R){
#
#   first draw beta | sigmasq
#
  IR=backsolve(chol(XpX/sigmasq+A),diag(nvar))
  btilde=crossprod(t(IR))%*%(Xpy/sigmasq+A%*%betabar)
  beta = btilde + IR%*%rnorm(nvar)
#
#   now draw sigmasq | beta
#
  res=y-X%*%beta
  s=t(res)%*%res
  sigmasq=(nu*ssq + s)/rchisq(1,nu+nobs)
  sigmasq=as.vector(sigmasq)

Page 26: runiregGibbs (continued)

#
#   print time to completion and draw # every 100th draw
#
  if(rep%%100 == 0)
    {ctime=proc.time()[3]
     timetoend=((ctime-itime)/rep)*(R-rep)
     cat(" ",rep," (",round(timetoend/60,1),")",fill=TRUE)
     flush()}

  if(rep%%keep == 0)
    {mkeep=rep/keep; betadraw[mkeep,]=beta; sigmasqdraw[mkeep]=sigmasq}
}
ctime = proc.time()[3]
cat(' Total Time Elapsed: ',round((ctime-itime)/60,2),'\n')

list(betadraw=betadraw,sigmasqdraw=sigmasqdraw)
}

Page 27: R session

set.seed(66)
n=100
X=cbind(rep(1,n),runif(n),runif(n),runif(n))
beta=c(1,2,3,4)
sigsq=1.0
y=X%*%beta+rnorm(n,sd=sqrt(sigsq))

A=diag(c(.05,.05,.05,.05))
betabar=c(0,0,0,0)
nu=3
ssq=1.0

R=1000

Data=list(y=y,X=X)
Prior=list(A=A,betabar=betabar,nu=nu,ssq=ssq)
Mcmc=list(R=R,keep=1)

out=runiregGibbs(Data=Data,Prior=Prior,Mcmc=Mcmc)

Page 28: R session (continued)

Starting Gibbs Sampler for Univariate Regression Model
  with 100 observations
Prior Parms:
betabar
[1] 0 0 0 0
A
     [,1] [,2] [,3] [,4]
[1,] 0.05 0.00 0.00 0.00
[2,] 0.00 0.05 0.00 0.00
[3,] 0.00 0.00 0.05 0.00
[4,] 0.00 0.00 0.00 0.05
nu =  3  ssq=  1
MCMC parms:
R=  1000  keep=  1

Page 29: R session (continued)

MCMC Iteration (est time to end - min)
  100 ( 0 )
  200 ( 0 )
  300 ( 0 )
  400 ( 0 )
  500 ( 0 )
  600 ( 0 )
  700 ( 0 )
  800 ( 0 )
  900 ( 0 )
 1000 ( 0 )
 Total Time Elapsed:  0.01

Page 30: [Figure: trace plot "Draws of Beta" -- out$betadraw over 1000 MCMC iterations.]

Page 31: [Figure: trace plot "Draws of Sigma Squared" -- out$sigmasqdraw over 1000 MCMC iterations.]

Page 32: Data Augmentation

The Gibbs Sampler is well-suited for linear models, and it extends to conditionally conjugate models, e.g. SUR.

Data Augmentation extends the class of models which can be analyzed via the Gibbs Sampler.

Origins: missing data. The traditional approach partitions the data as $y = (y_{obs}, y_{miss})$ and works with the observed-data likelihood and posterior:

$$p(y_{obs} \mid \theta) = \int p(y_{obs}, y_{miss} \mid \theta)\, dy_{miss}, \qquad p(\theta \mid y_{obs}) \propto p(y_{obs} \mid \theta)\, p(\theta)$$

Page 33: Data Augmentation

Solution: regard $y_{miss}$ as what it is -- an unobservable! (Tanner and Wong 1987)

Gibbs Sampler:

$$\theta \mid y_{obs}, y_{miss} \;\propto\; p(y_{obs}, y_{miss} \mid \theta)\, p(\theta) \qquad \text{the complete-data posterior!}$$

$$y_{miss} \mid y_{obs}, \theta$$

where the draw of $y_{miss}$ is made under an "ignorable" missing-data assumption.

Page 34: Data Augmentation -- Probit Example

Consider the binary probit model:

$$y_i = \begin{cases}1 & \text{if } z_i > 0\\ 0 & \text{otherwise}\end{cases} \qquad z_i = x_i'\beta + \varepsilon_i, \quad \varepsilon_i \sim N(0,1)$$

$z$ is a latent, unobserved variable. Integrate out $z$ to obtain the likelihood:

$$p(y \mid x, \beta) = \int p(y, z \mid x, \beta)\, dz = \int p(y \mid z)\, p(z \mid x, \beta)\, dz$$

$$\Pr(y = 1) = \int_0^\infty p(z \mid x, \beta)\, dz = \Pr(\varepsilon > -x'\beta) = \Phi(x'\beta) \qquad \Pr(y = 0) = \Phi(-x'\beta)$$
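A minimal R sketch of this integrated (observed-data) log-likelihood, assuming y is a 0/1 vector and X a design matrix:

probitLogLik = function(beta, y, X) {
  xb = as.vector(X %*% beta)
  # Pr(y=1) = pnorm(x'beta), Pr(y=0) = pnorm(-x'beta)
  sum(ifelse(y == 1, pnorm(xb, log.p = TRUE), pnorm(-xb, log.p = TRUE)))
}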

Page 35: Data Augmentation

All unobservables are objects of inference, including parameters and latent variables. Augment $\beta$ with $z$. For the probit, we want the joint posterior of the latents and $\beta$:

$$p(z, \beta \mid y) \;\propto\; p(y \mid z, \beta)\, p(z \mid \beta)\, p(\beta) = p(y \mid z)\, p(z \mid \beta)\, p(\beta)$$

(given $z$, $y$ is conditionally independent of $\beta$).

Gibbs Sampler:

$$[\,z \mid \beta, y\,] \qquad\qquad [\,\beta \mid z, y\,] = [\,\beta \mid z\,]$$

Page 36: Probit Conditional Distributions

$[z \mid \beta, y]$

This is a truncated normal distribution:

- if y = 1, truncation is from below at 0 ($z > 0$; since $z = x'\beta + \varepsilon$, this means $\varepsilon > -x'\beta$)
- if y = 0, truncation is from above at 0

How do we make these draws? We use the inverse CDF method.

Page 37: Inverse CDF

If $X \sim F$ and $U \sim \text{Uniform}[0,1]$, then $F^{-1}(U)$ has the same distribution as $X$.

Let $G$ be the cdf of $X$ truncated to $[a, b]$:

$$G(x) = \frac{F(x) - F(a)}{F(b) - F(a)}$$
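A minimal R sketch of the inverse-CDF idea for an untruncated case (the exponential distribution here is just an illustration):

u = runif(10000)          # uniform draws
x = qexp(u)               # F^-1(U): these have the same distribution as rexp(10000)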

Page 38: Inverse CDF

What is $G^{-1}$? Solve $G(x) = y$:

$$\frac{F(x) - F(a)}{F(b) - F(a)} = y \;\;\Rightarrow\;\; F(x) = y\,(F(b) - F(a)) + F(a) \;\;\Rightarrow\;\; x = F^{-1}\!\big(y\,(F(b) - F(a)) + F(a)\big)$$

Draw $u \sim U(0,1)$ and set

$$x = F^{-1}\!\big(u\,(F(b) - F(a)) + F(a)\big)$$

Page 39: rtrun

rtrun=function(mu,sigma,a,b){
#
# function to draw from univariate truncated norm
# a is vector of lower bounds for truncation
# b is vector of upper bounds for truncation
#
FA=pnorm(((a-mu)/sigma))
FB=pnorm(((b-mu)/sigma))
mu+sigma*qnorm(runif(length(mu))*(FB-FA)+FA)
}
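An illustrative use of rtrun for the probit latent draws (the mu and y values here are made up for the example):

mu = c(0.5, -1.2)                        # x_i'beta for two observations (illustrative values)
y  = c(1, 0)
a  = ifelse(y == 0, -Inf, 0)             # y = 1: truncate below at 0
b  = ifelse(y == 0, 0, Inf)              # y = 0: truncate above at 0
z  = rtrun(mu, sigma = rep(1, 2), a = a, b = b)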

Page 40: Probit Conditional Distributions

$$[\beta \mid z, X] \;\propto\; [z \mid X, \beta]\,[\beta]$$

With the prior $\beta \sim N(\bar\beta, A^{-1})$:

$$\beta \mid z, X \sim N\!\left(\tilde\beta,\; (X'X + A)^{-1}\right) \qquad \tilde\beta = (X'X + A)^{-1}(X'X\hat\beta + A\bar\beta) \qquad \hat\beta = (X'X)^{-1}X'z$$

Standard Bayes regression with unit error variance!

Page 41: rbprobitGibbs

rbprobitGibbs=function(Data,Prior,Mcmc){
#
# purpose:
#   draw from posterior for binary probit using Gibbs Sampler
#
# Arguments:
#   Data - list of X,y
#     X is nobs x nvar, y is nobs vector of 0,1
#   Prior - list of A, betabar
#     A is nvar x nvar prior preci matrix
#     betabar is nvar x 1 prior mean
#   Mcmc
#     R is number of draws
#     keep is thinning parameter
#
# Output:
#   list of betadraws
#
# Model: y = 1 if w=Xbeta + e > 0   e ~ N(0,1)
#
# Prior: beta ~ N(betabar,A^-1)

Page 42: rbprobitGibbs (continued)

# define functions needed
#
breg1=function(root,X,y,Abetabar) {
# Purpose: draw from posterior for linear regression, sigmasq=1.0
#
# Arguments:
#   root is chol((X'X+A)^-1)
#   Abetabar = A*betabar
#
# Output: draw from posterior
#
# Model: y = Xbeta + e   e ~ N(0,I)
#
# Prior: beta ~ N(betabar,A^-1)
#
cov=crossprod(root,root)
betatilde=cov%*%(crossprod(X,y)+Abetabar)
betatilde+t(root)%*%rnorm(length(betatilde))
}
.
. (error checking part of code)
.

Page 43: rbprobitGibbs (continued)

betadraw=matrix(double(floor(R/keep)*nvar),ncol=nvar)

beta=c(rep(0,nvar))

sigma=c(rep(1,nrow(X)))

root=chol(chol2inv(chol((crossprod(X,X)+A))))

Abetabar=crossprod(A,betabar)

a=ifelse(y == 0,-100, 0)

b=ifelse(y == 0, 0, 100)
#
# start main iteration loop
#

itime=proc.time()[3]

cat("MCMC Iteration (est time to end - min) ",fill=TRUE)

flush()

if y = 0, truncate to (-100,0)

if y = 1, truncate to (0, 100)

Page 44: rbprobitGibbs (continued)

for (rep in 1:R)

{

mu=X%*%beta

z=rtrun(mu,sigma,a,b)

beta=breg1(root,X,z,Abetabar)

}
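The slide shows only the two draws inside the loop; a sketch of the thinning/storage step that would typically also sit inside the loop (mirroring the pattern in runiregGibbs above, not necessarily the exact bayesm code):

  if(rep%%keep == 0)
    {mkeep=rep/keep; betadraw[mkeep,]=beta}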

Page 45: Binary probit example

## rbprobitGibbs example

##

set.seed(66)

simbprobit=

function(X,beta) {

## function to simulate from binary probit including x variable

y=ifelse((X%*%beta+rnorm(nrow(X)))<0,0,1)

list(X=X,y=y,beta=beta)

}

Page 46: Binary probit example

nobs=100
X=cbind(rep(1,nobs),runif(nobs),runif(nobs),runif(nobs))
beta=c(-2,-1,1,2)
nvar=ncol(X)
simout=simbprobit(X,beta)

Data=list(X=simout$X,y=simout$y)
Mcmc=list(R=2000,keep=1)

out=rbprobitGibbs(Data=Data,Mcmc=Mcmc)

cat(" Betadraws ",fill=TRUE)
mat=apply(out$betadraw,2,quantile,probs=c(.01,.05,.5,.95,.99))
mat=rbind(beta,mat); rownames(mat)[1]="beta"; print(mat)

Page 47: [Figure: trace plot "Probit Beta Draws" -- out$betadraw over 2000 MCMC iterations.]

Page 48: Summary statistics

Betadraws
           [,1]        [,2]        [,3]     [,4]
beta  -2.000000 -1.00000000  1.00000000 2.000000
1%    -4.113488 -2.69028853 -0.08326063 1.392206
5%    -3.588499 -2.19816304  0.20862118 1.867192
50%   -2.504669 -1.04634198  1.17242924 2.946999
95%   -1.556600 -0.06133085  2.08300392 4.166941
99%   -1.233392  0.34910141  2.43453863 4.680425

Page 49: Binary probit example

$$\Pr(y = 1 \mid x, \beta) = \Phi(x'\beta)$$

Example from BSM:

[Figure: posterior distributions of the choice probability, panels "Probability | x=(0,.1,0)" and "Probability | x=(0,4,0)".]

Page 50: Mixtures of Normals

$$y_i \sim N(\mu_{ind_i}, \Sigma_{ind_i}) \qquad ind_i \sim \text{Multinomial}(pvec)$$

A general flexible model, or a non-parametric method of density approximation?

$ind_i$ is an augmented (latent) variable that points to which normal distribution is associated with observation $i$; it classifies each observation into one of the length(pvec) components:

$$y_i \mid ind_i = k \;\sim\; N(\mu_k, \Sigma_k)$$
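A minimal R sketch of simulating from this model in the univariate case (all parameter values are illustrative):

pvec = c(.5, .3, .2); mu = c(0, 3, 6); sigma = c(1, 1, 2)   # illustrative values
n = 500
ind = sample(1:3, n, replace = TRUE, prob = pvec)           # latent component indicators
y = rnorm(n, mean = mu[ind], sd = sigma[ind])               # y_i | ind_i = k ~ N(mu_k, sigma_k^2)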

Page 51: Model Hierarchy

[Figure: DAG of the hierarchy -- pvec -> ind_i -> y_i, with (mu_k, Sigma_k) -> y_i.]

Model: $[pvec]\,[ind \mid pvec]\,[\Sigma_k]\,[\mu_k \mid \Sigma_k]\,[Y \mid ind, \{\mu_k, \Sigma_k\}]$

Conditionals: $[pvec \mid ind, \text{priors}]$, $[ind \mid pvec, \{\mu_k, \Sigma_k\}, y]$, $[\{\mu_k, \Sigma_k\} \mid ind, y, \text{priors}]$

Priors:

$$pvec \sim \text{Dirichlet}(\alpha) \qquad \Sigma_k \sim IW(\nu, V) \qquad \mu_k \mid \Sigma_k \sim N(\bar\mu,\; a^{-1}\Sigma_k) \qquad k = 1, \ldots, K$$

Page 52: Gibbs Sampler for Mixture of Normals

Conditionals:

$[pvec \mid ind, \text{priors}]$:

$$pvec \sim \text{Dirichlet}(\tilde\alpha) \qquad \tilde\alpha_k = \alpha_k + n_k \qquad n_k = \sum_{i=1}^{n} I(ind_i = k)$$

$[ind \mid pvec, \{\mu_k, \Sigma_k\}, y]$:

$$ind_i \sim \text{Multinomial}(\pi_i) \qquad \pi_i' = (\pi_{i,1}, \ldots, \pi_{i,K}) \qquad \pi_{i,k} = \frac{pvec_k\, \varphi(y_i \mid \mu_k, \Sigma_k)}{\sum_{k'} pvec_{k'}\, \varphi(y_i \mid \mu_{k'}, \Sigma_{k'})}$$

$\varphi(\cdot)$ is the multivariate normal density.
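A minimal R sketch of the ind draw for the univariate case (the function name is illustrative; bayesm's rnmixGibbs implements the full multivariate sampler):

drawInd = function(y, pvec, mu, sigma) {
  K = length(pvec)
  # n x K matrix of pvec_k * phi(y_i | mu_k, sigma_k^2)
  like = sapply(1:K, function(k) pvec[k] * dnorm(y, mean = mu[k], sd = sigma[k]))
  prob = like / rowSums(like)                              # normalize rows to get pi_i
  apply(prob, 1, function(p) sample(1:K, 1, prob = p))     # one multinomial draw per observation
}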

Page 53: Gibbs Sampler for Mixtures of Normals

$[\{\mu_k, \Sigma_k\} \mid ind, y, \text{priors}]$: given ind (the classification), this is just a multivariate regression model (MRM)!

$$Y_k = \iota\,\mu_k' + U_k \qquad u_i \sim N(0, \Sigma_k)$$

$$\Sigma_k \mid Y_k, \nu, V \;\sim\; IW\!\left(\nu + n_k,\; V + S_k\right) \qquad \mu_k \mid Y_k, \Sigma_k, \bar\mu, a \;\sim\; N\!\left(\tilde\mu_k,\; \frac{1}{n_k + a}\,\Sigma_k\right)$$

with

$$\tilde\mu_k = \frac{n_k \bar Y_k + a\,\bar\mu}{n_k + a} \qquad \bar Y_k = \frac{1}{n_k}\sum_{i:\, ind_i = k} y_i \qquad S_k = (Y_k - \iota\,\tilde\mu_k')'(Y_k - \iota\,\tilde\mu_k') + a\,(\tilde\mu_k - \bar\mu)(\tilde\mu_k - \bar\mu)'$$
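A sketch in R of one component's draw using these formulas (a sketch under the stated conjugate prior; yk is assumed to be the n_k x d matrix of observations currently assigned to component k, and stats::rWishart is used to obtain the inverted-Wishart draw):

drawCompK = function(yk, nu, V, mubar, a) {
  nk = nrow(yk); d = ncol(yk)
  ybark = colMeans(yk)
  mutilde = (nk * ybark + a * mubar) / (nk + a)
  resid = sweep(yk, 2, mutilde)                              # yk - iota * mutilde'
  Sk = crossprod(resid) + a * tcrossprod(mutilde - mubar)
  # Sigma_k ~ IW(nu + nk, V + Sk): X ~ IW(nu, S) iff X^-1 ~ Wishart(nu, S^-1)
  Sigmak = solve(rWishart(1, df = nu + nk, Sigma = solve(V + Sk))[, , 1])
  muk = mutilde + t(chol(Sigmak / (nk + a))) %*% rnorm(d)    # mu_k | Sigma_k
  list(mu = as.vector(muk), Sigma = Sigmak)
}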

Page 54: Identification for Mixtures of Normals

The likelihood for a mixture of K normals can have up to K! modes of equal height!

This is the so-called "label-switching" problem: we can permute the labels of the components without changing the likelihood.

This implies that the Gibbs Sampler may not navigate all of the modes! Who cares?

The joint density, or any function of it, is identified!

Page 55: Label-Switching Example

Consider a mixture of two univariate normals that are not very "separated", with a relatively small amount of data. The density of y is unimodal with mode at 1.5:

$$y \sim 0.5\, N(1,1) + 0.5\, N(2,1)$$

[Figure: trace plot of mudraw for components 1 and 2 over 100 Gibbs iterations; the two component means repeatedly swap -- label-switches.]

Page 56: Label-Switching Example

The density of y is identified. Using the Gibbs Sampler, we get R draws from the posterior of the joint density

$$p(y) = p\,\varphi(y \mid \mu_1, \sigma_1^2) + (1 - p)\,\varphi(y \mid \mu_2, \sigma_2^2)$$

[Figure: posterior draws of the mixture density of y.]

Page 57: Identification for Mixtures of Normals

We use the unconstrained Gibbs Sampler (rnmixGibbs). Others advocate restrictions or post-processing of the draws to identify the components.

Pros:
- superior mixing
- focuses attention on identified quantities

Cons:
- can't make inferences about component parms
- must summarize the posterior of the joint density!

Page 58: Identification for Mixtures of Normals

In practice, what is the implication of label-switching?

We can't use component-wise averages such as

$$\hat E[\mu_k] = \frac{1}{R}\sum_r \mu_k^r \qquad \hat E[\Sigma_k] = \frac{1}{R}\sum_r \Sigma_k^r$$

but we can use

$$\hat E[p(y)] = \frac{1}{R}\sum_r \sum_k pvec_k^r\, \varphi(y \mid \mu_k^r, \Sigma_k^r)$$
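A minimal R sketch of this identified quantity for the univariate case, assuming mudraw, sigmadraw, and pvecdraw are R x K matrices of posterior draws (all names are illustrative):

postMeanDensity = function(ygrid, mudraw, sigmadraw, pvecdraw) {
  R = nrow(mudraw); K = ncol(mudraw)
  dens = numeric(length(ygrid))
  for (r in 1:R) {
    for (k in 1:K) {
      dens = dens + pvecdraw[r, k] * dnorm(ygrid, mudraw[r, k], sigmadraw[r, k])
    }
  }
  dens / R        # posterior mean of the mixture density evaluated on ygrid
}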

Page 59: Multivariate Mix of Norms Example

Three-component example:

$$\mu_1 = (1, 2, 3, 4, 5)' \qquad \mu_2 = 2\mu_1 \qquad \mu_3 = 3\mu_1$$

$$\Sigma_k = \begin{pmatrix}1 & .5 & \cdots & .5\\ .5 & 1 & \cdots & .5\\ \vdots & \vdots & \ddots & \vdots\\ .5 & .5 & \cdots & 1\end{pmatrix} \qquad pvec = \begin{pmatrix}1/2\\ 1/3\\ 1/6\end{pmatrix}$$

[Figure: "Normal Component" (1-9) plotted against MCMC draw r (0-400).]

Page 60: Multivariate Mix of Norms Example

$$\hat p(y) = \frac{1}{R}\sum_{r=1}^{R} p\big(y \mid \{\mu_k^r, \Sigma_k^r\},\, pvec^r\big) = \frac{1}{R}\sum_{r=1}^{R}\sum_{k=1}^{K} pvec_k^r\, \varphi(y \mid \mu_k^r, \Sigma_k^r)$$

[Figure: estimated density at draw 100.]

Page 61: Bivariate Distributions and Marginals

[Figure: a univariate marginal density together with the "True Bivariate Marginal" and the "Posterior Mean of Bivariate Marginal" contour plots.]