double dirichlet process mixtures


Upload: morey

Post on 05-Feb-2016


Double Dirichlet Process Mixtures. Sanjib Basu, Northern Illinois University and Rush University Medical Center. Siddhartha Chib, Washington University, St. Louis.

TRANSCRIPT

Page 1: Double Dirichlet Process Mixtures

1

Page 2: Double Dirichlet Process Mixtures

2

Dirichlet process mixtures (DPMs) are an active research area. Dirichlet mixtures are it!

The flexibility of DPM models explains their huge popularity in a wide variety of application areas.

DPM models are general and can be argued to have less structure.

Double Dirichlet process mixtures add a degree of structure, possibly at the expense of some flexibility, but possibly with better interpretability in some cases.

We discuss applications (and limitations) of these semiparametric double mixtures.

We compare fit-prediction duality with competing models.

Page 3: Double Dirichlet Process Mixtures

3

Other DP extensions

Double Dirichlet process mixtures are a subclass of dependent Dirichlet process mixtures (MacEachern 1999, …).

Double DP mixtures are different from hierarchical Dirichlet processes (Teh et al. 2006).

A double DPM is simply a pair of independent DPMs.

Page 4: Double Dirichlet Process Mixtures

4

Motivating Example 1

Luminex measurements on two biomarker proteins from n = 156 patients: IL-1β protein and C-reactive protein.

The biological effects of these two proteins are thought not to overlap (totally).

$$ y_i = \begin{pmatrix} y_{1i} \\ y_{2i} \end{pmatrix} \sim N_2(\mu_i, \Sigma_i), \quad i = 1, \dots, n $$

Page 5: Double Dirichlet Process Mixtures

5

Two Biomarkers (y1 and y2)

Usual DP mixture of normals (Ferguson 1983,…..)

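$$ y_i = \begin{pmatrix} y_{1i} \\ y_{2i} \end{pmatrix} \sim N_2(\mu_i, \Sigma_i), \quad i = 1, \dots, n $$

$$ (\mu_i, \Sigma_i) \sim \text{i.i.d. } G, \quad i = 1, \dots, n $$

$$ G \sim DP(\alpha, G_0) $$

$$ G_0: \ \mu \sim N_2(\mu_0, \Sigma_0), \quad \Sigma \sim \text{Inv-Wishart} $$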

Questions: Should we model the two biomarkers jointly?

Should we cluster the patients based on both biomarkers jointly?

The biomarkers may operate somewhat independently.

Page 6: Double Dirichlet Process Mixtures

6

Double DP mixtures

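$$ \begin{pmatrix} y_{1i} \\ y_{2i} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \mu_{1i} \\ \mu_{2i} \end{pmatrix}, \begin{pmatrix} \sigma_{1i}^2 & \rho\,\sigma_{1i}\sigma_{2i} \\ \rho\,\sigma_{1i}\sigma_{2i} & \sigma_{2i}^2 \end{pmatrix} \right), \quad i = 1, \dots, n $$

$$ (\mu_{1i}, \sigma_{1i}^2) \sim \text{i.i.d. } G_1, \quad i = 1, \dots, n; \qquad G_1 \sim DP(\alpha_1, G_{01}); \qquad G_{01}: \ \mu_1 \sim N(\mu_{01}, \sigma_{01}^2), \ \sigma_1^2 \sim \text{Inv-Gamma} $$

$$ (\mu_{2i}, \sigma_{2i}^2) \sim \text{i.i.d. } G_2, \quad i = 1, \dots, n; \qquad G_2 \sim DP(\alpha_2, G_{02}); \qquad G_{02}: \ \mu_2 \sim N(\mu_{02}, \sigma_{02}^2), \ \sigma_2^2 \sim \text{Inv-Gamma} $$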

Equicorrelation: corr(y1i, y2i) is assumed to be the same for all i = 1,…,n.

Clustering based on biomarker 1 and based on biomarker 2 can be different

Page 7: Double Dirichlet Process Mixtures

7

Motivating Example 2: Interrater Agreement

Agreement between 2 Raters (Melia and Diener-West 1994)

Each rater provides an ordinal rating on a scale of 1-5 (lowest to highest invasion) of the extent to which the tumor has invaded the eye, n = 885.

                      Rater 1
               1     2     3     4     5
Rater 2   1   291    74     1     1     1
          2   186   256     7     7     3
          3     2     4     0     2     0
          4     3    10     1    14     2
          5     1     7     1     8     3

Page 8: Double Dirichlet Process Mixtures

8

Interrater agreement

Kottas, Muller, Quintana (2005) analyzed these data using a flexible DP mixture of bivariate ordinal probit models, which modeled the unstructured joint probabilities

prob(Rater 1 = i and Rater 2 = j), i=1,…,5, j=1,…,5

One way to quantify interrater agreement is to measure departure from the structured model of independence.

We consider a (mixture of) Double DP mixtures model here, which provides separate DP structures for the two raters. We then measure "agreement" from this model.

Page 9: Double Dirichlet Process Mixtures

9

Motivating Example 3

Mixed model for longitudinal data

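$$ y_{ij} = X_{ij}\beta + Z_{ij} b_i + \varepsilon_{ij}, \quad \text{or} \quad y_i = (y_{i1}, \dots, y_{ip}) \sim N(X_i \beta + Z_i b_i, \Sigma_i) $$

It is common to assume (Bush and MacEachern 1996)

$$ b_i \sim G, \quad G \sim DP $$

Modeling the error covariance $\Sigma_i$, or the error variance (if $\Sigma_i = \mathrm{diag}(\sigma_i^2)$), extends the normal distribution assumption to normal scale mixtures (t, logistic, …):

$$ \sigma_i^2 \sim G, \quad G \sim DP $$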

Page 10: Double Dirichlet Process Mixtures

10

Putting the two together

One way to combine these two structures is

$$ (b_i, \sigma_i^2) \sim G, \quad G \sim DP $$

Do we expect the random effects b_i appearing in the model for the mean and the error variances to cluster similarly?

The error variance model is often used to extend the distributional assumption.

$$ b_i \sim G_1, \quad G_1 \sim DP(\alpha_1, G_{01}) $$

$$ \sigma_i^2 \sim G_2, \quad G_2 \sim DP(\alpha_2, G_{02}) $$

Page 11: Double Dirichlet Process Mixtures

11

Double DPM

I will discuss the fitting, applicability, flexibility and limitations of such double semiparametric mixtures.

I will also compare these models with the usual DP models via predictive model comparison criteria.

Page 12: Double Dirichlet Process Mixtures

12

Dirichlet process

The Dirichlet process is a probability measure on the space of distributions (probability measures) G.

G ~ DP(α G0), where G0 is a probability measure and α > 0.

The Dirichlet process assigns positive mass to every open set of probabilities on support(G0).

Conjugacy: if Y1,…,Yn ~ i.i.d. G and π(G) = DP(α G0), then the posterior is (G | Y) ~ DP(α G0 + n Fn), where Fn is the empirical distribution.

Polya Urn Scheme
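
A minimal sketch of the Polya urn scheme, under assumptions that are not in the slides: a standard normal base measure G0, precision α = 1, and the hypothetical function name `polya_urn`. Each new value is a fresh draw from G0 with probability α/(α + i) and otherwise a copy of one of the i earlier values, which is what produces ties.

```python
import random

def polya_urn(n, alpha, base_draw, rng):
    """Draw theta_1..theta_n sequentially from the Polya urn of DP(alpha * G0)."""
    thetas = []
    for i in range(n):
        # With probability alpha / (alpha + i) draw a new value from G0;
        # otherwise copy one of the i earlier values, chosen uniformly.
        if rng.random() < alpha / (alpha + i):
            thetas.append(base_draw(rng))
        else:
            thetas.append(rng.choice(thetas))
    return thetas

rng = random.Random(0)
# Assumed base measure G0 = N(0, 1) and alpha = 1 (illustrative choices only).
draws = polya_urn(50, alpha=1.0, base_draw=lambda r: r.gauss(0.0, 1.0), rng=rng)
n_clusters = len(set(draws))  # number of distinct values among the 50 draws
```

Larger values of α make new draws from G0 more likely and so tend to give more distinct values.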

Page 13: Double Dirichlet Process Mixtures

13

Stick breaking and discreteness

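G ~ DP(α G0) implies G is almost surely discrete:

$$ G = \sum_{j=1}^{\infty} q_j\, \delta_{\{\vartheta_j\}} \quad \text{with prob. 1 (Sethuraman and Tiwari 1982)} $$

$$ \vartheta_1, \vartheta_2, \dots \sim \text{i.i.d. } G_0 $$

Stick-breaking representation:

$$ q_1 = w_1, \qquad w_1 \sim Beta(1, \alpha) $$

$$ q_2 = w_2 (1 - w_1), \qquad w_2 \sim Beta(1, \alpha) $$

$$ q_3 = w_3 (1 - w_2)(1 - w_1), \qquad w_3 \sim Beta(1, \alpha) $$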
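
The stick-breaking construction can be sketched in a few lines. This is an illustration only: the truncation level M, the standard normal base measure and the function name `stick_breaking` are assumptions, and the last stick fraction is set to 1 so that the M weights sum to one.

```python
import random

def stick_breaking(alpha, M, base_draw, rng):
    """Return (weights, atoms) of a stick-breaking draw truncated at M atoms."""
    weights, atoms = [], []
    remaining = 1.0  # length of the stick not yet broken off
    for j in range(M):
        # w_j ~ Beta(1, alpha); the final fraction is 1 so the weights sum to 1.
        w = 1.0 if j == M - 1 else rng.betavariate(1.0, alpha)
        weights.append(w * remaining)   # q_j = w_j * prod_{l<j} (1 - w_l)
        atoms.append(base_draw(rng))    # atom location drawn i.i.d. from G0
        remaining *= 1.0 - w
    return weights, atoms

rng = random.Random(0)
q, atoms = stick_breaking(alpha=1.0, M=25, base_draw=lambda r: r.gauss(0.0, 1.0), rng=rng)
```

With small α most of the mass falls on the first few atoms, which is the source of the clustering seen in DP mixtures.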

Page 14: Double Dirichlet Process Mixtures

14

Bayes estimate from DP

The discrete nature of a random G from a DP leads to some disturbing features, such as this result from Diaconis and Freedman (1986):

Location model: y_i = θ + ε_i, i = 1,…,n, where θ has prior π(θ), such as a normal prior, and ε_1,…,ε_n ~ i.i.d. G.

G ~ DP(α G0), symmetrized, with G0 = Cauchy or t-distribution.

Then the posterior mean is an inconsistent estimate of θ.

Page 15: Double Dirichlet Process Mixtures

15

Dirichlet process mixtures (DPM)

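$$ Y_i \sim f(y_i \mid \theta_i, x_i), \quad i = 1, \dots, n, \quad \text{such as a normal density} $$

where f is a known parametric distribution, the x_i are possible covariates, and f may involve other parameters.

$$ \theta_1, \dots, \theta_n \sim \text{i.i.d. } G, \qquad G \sim DP(\alpha G_0) $$

If we marginalize over θ_i, we obtain a semiparametric mixture

$$ Y_i \sim \int f(y_i \mid \theta, x_i)\, dG(\theta) $$

where the mixing distribution G is random and follows DP(α G0).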

Page 16: Double Dirichlet Process Mixtures

16

DPM - clusters

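Since G is almost surely discrete, θ_1,…,θ_n form clusters:

θ_1 = θ_5 = θ_8 = θ_1^unique

θ_2 = θ_3 = θ_4 = θ_6 = θ_7 = θ_2^unique

etc.

The number of clusters, and the clusters themselves, are random.

$$ Y_i \sim f(y_i \mid \theta_i, x_i), \qquad \theta_1, \dots, \theta_n \sim \text{i.i.d. } G, \qquad G \sim DP(\alpha G_0) $$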

Page 17: Double Dirichlet Process Mixtures

17

DPM – MCMC

The Polya urn/marginalized sampler (Escobar 1994, Escobar & West 1995) samples θ_i one at a time from

π(θ_i | θ_{-i}, data)

Improvements, known as collapsed samplers, are proposed in MacEachern (1994, 1998), where only the cluster memberships of the θ_i are sampled instead of the θ_i themselves.

For non-conjugate DPMs (the sampling density f(y_i | θ_i) and the base measure G0 are not conjugate), various algorithms have been proposed.

Page 18: Double Dirichlet Process Mixtures

18

Finite truncation and Blocked Gibbs

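$$ Y_i \sim f(y_i \mid \theta_i, x_i) $$

$$ \theta_1, \dots, \theta_n \sim G = \sum_{j=1}^{M} q_j\, \delta_{\{\vartheta_j\}} \quad \text{instead of the DP draw} \quad G = \sum_{j=1}^{\infty} q_j\, \delta_{\{\vartheta_j\}} $$

With this finite truncation, the model is now a finite mixture model with a stick-breaking structure on the q_j.

(θ_1,…,θ_n) and (q_1,…,q_M) can be updated in blocks (instead of one at a time as in the Polya urn sampler), which may provide better mixing.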

Page 19: Double Dirichlet Process Mixtures

19

Comments

In each iteration, the Polya urn/marginal sampler cycles through each observation and, for each, assigns its membership to either a new cluster or one of the existing clusters.

The Polya urn sampler is also not straightforward to implement in non-linear (non-conjugate) problems or when the sample size n is not fixed.

For the blocked sampler, on the other hand, the choice of the truncation level M is not well understood.

Page 20: Double Dirichlet Process Mixtures

20

Model comparison in DPM models

Basu and Chib (2003) developed Bayes factor/ marginal likelihood computation method for DPM.

This provided a framework for quantitative comparison of DPM with competing parametric and semi/nonparametric models.

Log-marginal likelihoods (diagonal) and log-Bayes factors (in parentheses, row model vs. column model) for the longitudinal AIDS trial (n=467), with random coefficients having distribution G:

              DPM      Student-t   Normal
DPM           -3477    (76)        (62)
Student-t              -3553       (-14)
Normal                             -3539
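
The log-Bayes factors in parentheses are differences of the log-marginal likelihoods. A short check using only the three log-marginals reported on this slide (the dictionary and function names are illustrative):

```python
# Log-marginal likelihoods as reported on the slide.
log_marginal = {"DPM": -3477, "Student-t": -3553, "Normal": -3539}

def log_bayes_factor(model_a, model_b):
    """log BF of model_a against model_b = difference of log-marginals."""
    return log_marginal[model_a] - log_marginal[model_b]

bf_dpm_t = log_bayes_factor("DPM", "Student-t")
bf_dpm_normal = log_bayes_factor("DPM", "Normal")
bf_t_normal = log_bayes_factor("Student-t", "Normal")
```

A positive value favors the first model, so both comparisons involving the DPM favor the DPM.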

Page 21: Double Dirichlet Process Mixtures

21

Marginal likelihood of DPM

Based on the basic marginal identity (Chib 1995):

log-posterior(θ) = log-likelihood(θ) + log-prior(θ) - log-marginal

log-marginal = log-likelihood(θ*) + log-prior(θ*) - log-posterior(θ*)

The posterior ordinate of the DPM is evaluated via prequential conditioning as in Chib (1995).

The likelihood ordinate of the DPM is evaluated from a (collapsed) sequential importance sampler.

Log-marginal likelihoods (diagonal) and log-Bayes factors (in parentheses):

              DPM      Student-t   Normal
DPM           -3477    (76)        (62)
Student-t              -3553       (-14)
Normal                             -3539

Page 22: Double Dirichlet Process Mixtures

22

Page 23: Double Dirichlet Process Mixtures

23

Page 24: Double Dirichlet Process Mixtures

24

Page 25: Double Dirichlet Process Mixtures

25

Double Dirichlet process mixtures (DDPM)

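$$ Y_i \sim f(y_i \mid \phi_i, \psi_i, x_i), \quad i = 1, \dots, n, \quad \text{such as a normal density} $$

where f is a known parametric distribution, the x_i are possible covariates, and f may involve other parameters.

$$ \phi_1, \dots, \phi_n \sim \text{i.i.d. } G_\phi, \qquad \psi_1, \dots, \psi_n \sim \text{i.i.d. } G_\psi $$

$$ G_\phi \sim DP(\alpha_\phi G_{0\phi}), \qquad G_\psi \sim DP(\alpha_\psi G_{0\psi}) $$

Marginalization yields a double semiparametric mixture

$$ Y_i \sim \int\!\!\int f(y_i \mid \phi, \psi, x_i)\, dG_\phi(\phi)\, dG_\psi(\psi) $$

where the mixing distributions $G_\phi$ and $G_\psi$ are random.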

Page 26: Double Dirichlet Process Mixtures

26

Two Biomarkers case: y1 and y2

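$$ \begin{pmatrix} y_{1i} \\ y_{2i} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \mu_{1i} \\ \mu_{2i} \end{pmatrix}, \begin{pmatrix} \sigma_{1i}^2 & \rho\,\sigma_{1i}\sigma_{2i} \\ \rho\,\sigma_{1i}\sigma_{2i} & \sigma_{2i}^2 \end{pmatrix} \right), \quad i = 1, \dots, n $$

$$ (\mu_{1i}, \sigma_{1i}^2) \sim \text{i.i.d. } G_1, \quad i = 1, \dots, n; \qquad G_1 \sim DP(\alpha_1, G_{01}); \qquad G_{01}: \ \mu_1 \sim N(\mu_{01}, \sigma_{01}^2), \ \sigma_1^2 \sim \text{Inv-Gamma} $$

$$ (\mu_{2i}, \sigma_{2i}^2) \sim \text{i.i.d. } G_2, \quad i = 1, \dots, n; \qquad G_2 \sim DP(\alpha_2, G_{02}); \qquad G_{02}: \ \mu_2 \sim N(\mu_{02}, \sigma_{02}^2), \ \sigma_2^2 \sim \text{Inv-Gamma} $$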

Page 27: Double Dirichlet Process Mixtures

27

A simpler model: normal means only

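We generate n = 50 means (φ_i, ψ_i) and then observations (y_i1, y_i2) from this Double DPM model:

$$ \begin{pmatrix} y_{i1} \\ y_{i2} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \phi_i \\ \psi_i \end{pmatrix}, \Sigma \right), \quad i = 1, \dots, n $$

$$ \phi_1, \dots, \phi_n \sim \text{i.i.d. } G_\phi, \qquad \psi_1, \dots, \psi_n \sim \text{i.i.d. } G_\psi $$

$$ G_\phi \sim DP(\alpha_\phi G_{0\phi}), \qquad G_\psi \sim DP(\alpha_\psi G_{0\psi}), \qquad \Sigma \text{ has a Wishart prior} $$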
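
A simulation of this kind can be sketched as follows. The specifics are assumptions made for illustration, not the settings used in the talk: both base measures are N(0, 3²), both precisions are 1, the covariance is fixed at the identity instead of being given a Wishart prior, and `polya_urn` is a hypothetical helper.

```python
import random

def polya_urn(n, alpha, base_draw, rng):
    """Draw n values whose ties follow the Polya urn of DP(alpha * G0)."""
    values = []
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            values.append(base_draw(rng))       # open a new cluster from G0
        else:
            values.append(rng.choice(values))   # join an existing cluster
    return values

rng = random.Random(1)
n = 50
base = lambda r: r.gauss(0.0, 3.0)   # assumed base measure N(0, 3^2)
phi = polya_urn(n, 1.0, base, rng)   # first mean component, its own DP
psi = polya_urn(n, 1.0, base, rng)   # second mean component, independent DP
# Assumption: identity covariance, so the coordinates are drawn independently.
y = [(rng.gauss(phi[i], 1.0), rng.gauss(psi[i], 1.0)) for i in range(n)]

k_phi, k_psi = len(set(phi)), len(set(psi))
k_joint = len(set(zip(phi, psi)))    # distinct (phi, psi) pairs: product clusters
```

Because the two components cluster independently, the number of distinct mean pairs lies between the larger of the two component cluster counts and their product.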

Page 28: Double Dirichlet Process Mixtures

28

Double DPM

[Figure: two scatter plots. Panel "mu": the generated means, with axes phi and psi. Panel "Observations y": y[,1] against y[,2].]

Page 29: Double Dirichlet Process Mixtures

29

[Figure: two panels, each titled "Plot of y and mu: Clusters in different symbols", plotting y[,1] against y[,2] with y and mu marked. Panel captions: "Single DPM in the bivariate mean vector" and "Double DPM in mean components".]

Page 30: Double Dirichlet Process Mixtures

30

Model fitting

We fitted the Double DPM and the Bivariate DPM models to these data.

The Double DPM model can be fit by a two-stage Polya urn sampler or a two-stage blocked Gibbs sampler.

“Collapsing” can become more difficult.

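$$ \text{Average MSE}_d = \frac{1}{n} \sum_{i=1}^{n} \left( E_{post}\,\mu_{id} - \mu_{id}^{true} \right)^2, \quad d = 1, 2 $$

$$ \text{Average MAD}_d = \frac{1}{n} \sum_{i=1}^{n} \left| E_{post}\,\mu_{id} - \mu_{id}^{true} \right|, \quad d = 1, 2 $$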
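
The two accuracy summaries compare the posterior mean of each latent mean with its true (simulated) value, separately for each component d. A minimal sketch with made-up numbers (the function names and inputs are illustrative):

```python
def average_mse(post_means, true_means):
    """Mean squared difference between posterior means and true means."""
    n = len(true_means)
    return sum((p - t) ** 2 for p, t in zip(post_means, true_means)) / n

def average_mad(post_means, true_means):
    """Mean absolute difference between posterior means and true means."""
    n = len(true_means)
    return sum(abs(p - t) for p, t in zip(post_means, true_means)) / n

# Toy inputs for one component d (not data from the talk).
post = [1.0, 2.0, 4.0]
true = [1.0, 3.0, 2.0]
mse_d = average_mse(post, true)
mad_d = average_mad(post, true)
```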

Page 31: Double Dirichlet Process Mixtures

31

                                      MSE             MAD
Fitted model                        y1     y2       y1     y2

Data generated from Double DP
  Double DP                        0.99   1.32     0.72   0.92
  Bivariate DP in (y1,y2)          1.02   4.29     0.83   1.73

Data generated from Bivariate DP
  Double DP                        0.98   1.29     0.81   0.94
  Bivariate DP in (y1,y2)          0.98   1.08     0.75   0.81

Page 32: Double Dirichlet Process Mixtures

32

Wallace (asymmetric) criterion for comparing two clusters/partitions

Let S be the number of pairs of means that are in the same cluster both in an MCMC posterior draw and in the true clustering.

Let n_k, k = 1,…,K, be the number of means in cluster C_k in the MCMC draw.

Then the Wallace asymmetric criterion for comparing these two clusterings is

$$ W = \frac{S}{\sum_{k=1}^{K} n_k (n_k - 1)/2} $$

Average Wallace

                                  Fitted: Double DP    Fitted: Bivariate DP
Data generated from                 y1      y2            y1      y2
  Double DP                        0.89    0.42          0.66    0.48
  Bivariate DP in (y1,y2)          0.72    0.24          0.62    0.62
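
The Wallace criterion divides the number of pairs that are clustered together in both partitions by the number of pairs clustered together in the MCMC draw, which is the sum over clusters of n_k(n_k - 1)/2. A small sketch, with illustrative function and variable names:

```python
from collections import Counter
from itertools import combinations

def wallace(draw_labels, true_labels):
    """Wallace asymmetric criterion of an MCMC draw against the true clustering."""
    idx_pairs = combinations(range(len(draw_labels)), 2)
    # S: pairs in the same cluster in the draw AND in the true clustering.
    s = sum(
        1
        for i, j in idx_pairs
        if draw_labels[i] == draw_labels[j] and true_labels[i] == true_labels[j]
    )
    # Denominator: pairs in the same cluster in the draw.
    denom = sum(n_k * (n_k - 1) // 2 for n_k in Counter(draw_labels).values())
    return s / denom if denom else float("nan")

w_partial = wallace([0, 0, 0, 1], [0, 0, 1, 1])   # 3 draw pairs, 1 also true
w_perfect = wallace([0, 0, 1, 1], [5, 5, 7, 7])   # same partition, other labels
```

The criterion depends only on the partitions, not on the label values, and it is not symmetric in its two arguments.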

Page 33: Double Dirichlet Process Mixtures

33

Measurements on two biomarker proteins by Luminex panels

[Figure: scatter plot of CRP against IL1-beta.]

• Frozen paraffin-embedded tissues, pre and post surgery

• Luminex panel

• Nodal involvement

Page 34: Double Dirichlet Process Mixtures

34

Two biomarker proteins

The bivariate DPM

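$$ y_i = \begin{pmatrix} y_{1i} \\ y_{2i} \end{pmatrix} \sim N_2(\mu_i, \Sigma_i), \quad i = 1, \dots, n $$

$$ (\mu_i, \Sigma_i) \sim \text{i.i.d. } G, \quad i = 1, \dots, n; \qquad G \sim DP(\alpha, G_0); \qquad G_0: \ \mu \sim N_2(\mu_0, \Sigma_0), \ \Sigma \sim \text{Inv-Wishart} $$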

vs the Double DPM

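$$ \begin{pmatrix} y_{1i} \\ y_{2i} \end{pmatrix} \sim N_2\!\left( \begin{pmatrix} \mu_{1i} \\ \mu_{2i} \end{pmatrix}, \begin{pmatrix} \sigma_{1i}^2 & \rho\,\sigma_{1i}\sigma_{2i} \\ \rho\,\sigma_{1i}\sigma_{2i} & \sigma_{2i}^2 \end{pmatrix} \right), \quad i = 1, \dots, n $$

$$ (\mu_{1i}, \sigma_{1i}^2) \sim \text{i.i.d. } G_1, \quad i = 1, \dots, n; \qquad G_1 \sim DP(\alpha_1, G_{01}); \qquad G_{01}: \ \mu_1 \sim N(\mu_{01}, \sigma_{01}^2), \ \sigma_1^2 \sim \text{Inv-Gamma} $$

$$ (\mu_{2i}, \sigma_{2i}^2) \sim \text{i.i.d. } G_2, \quad i = 1, \dots, n; \qquad G_2 \sim DP(\alpha_2, G_{02}); \qquad G_{02}: \ \mu_2 \sim N(\mu_{02}, \sigma_{02}^2), \ \sigma_2^2 \sim \text{Inv-Gamma} $$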

Page 35: Double Dirichlet Process Mixtures

35

[Figure: µpred. Four trace plots over iterations 0 to 10000: mu.pred[1] and mu.pred[2] for the Double DP, and mu.pred[1] and mu.pred[2] for the Bivariate DP.]

Page 36: Double Dirichlet Process Mixtures

36

[Figure: ypred. Four predictive density plots of y.pred: two panels for the Double DP and two for the Bivariate DP, one on the 0 to 40 scale and one on the 14000 to 22000 scale for each model.]

Page 37: Double Dirichlet Process Mixtures

37

[Figure: ypred. Bivariate predictive density of (y.pred[1], y.pred[2]) for the Double DP and for the Bivariate DP; axes y.pred[1] (0 to 60), y.pred[2] (10000 to 25000) and density.]

Page 38: Double Dirichlet Process Mixtures

38

[Figure: log CPO plotted against observation index for the Double DP and the Bivariate DP.]

$$ \log CPO_i = \log f(y_i \mid y_{-i}) $$

$$ LPML = \sum_{i=1}^{n} \log f(y_i \mid y_{-i}) $$

LPML: Double DP = -1498.67, Bivariate DP = -1533.01
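
One common way to estimate CPO from MCMC output is the harmonic mean of the per-observation likelihood over posterior draws. The slides do not say which estimator was used, so the following is only a sketch of that standard approach; the function names and the layout of the input (one list of log-likelihood values per observation) are assumptions.

```python
import math

def log_cpo(loglik_draws):
    """log CPO_i from log f(y_i | params_t), t = 1..T, via the harmonic mean."""
    neg = [-v for v in loglik_draws]
    m = max(neg)
    # log of the average of 1 / f(y_i | params_t), computed with log-sum-exp.
    log_mean_inverse = m + math.log(sum(math.exp(v - m) for v in neg) / len(neg))
    return -log_mean_inverse

def lpml(loglik_by_obs):
    """LPML = sum over observations of log CPO_i."""
    return sum(log_cpo(draws) for draws in loglik_by_obs)

# Toy check: likelihoods 0.5 and 0.25 have harmonic mean 1/3.
example = log_cpo([math.log(0.5), math.log(0.25)])
total = lpml([[-2.0, -2.0, -2.0], [-1.0, -1.0, -1.0]])
```

A larger LPML indicates better predictive fit, which is how the two reported values are compared.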

Page 39: Double Dirichlet Process Mixtures

39

Model comparison

I prefer to use marginal likelihood/ Bayes factor for model comparison.

The DIC (Deviance Information Criterion) , as proposed in Spiegelhalter et al. (2002) can be problematic for missing data/random-effects/mixture models.

Celeux et al. (2006) proposed many different DICs for missing data models

Page 40: Double Dirichlet Process Mixtures

40

DIC3

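I have earlier considered DIC3 (Celeux et al. 2006, Richardson 2002) in missing data and random effects models. It is based on the observed likelihood:

$$ DIC_3 = -4\, E_\theta\!\left[ \log p(y_{obs} \mid \theta) \mid y_{obs} \right] + 2 \log E_\theta\!\left[ p(y_{obs} \mid \theta) \mid y_{obs} \right] $$

$$ \text{where} \quad p(y_{obs} \mid \theta) = \int p(y_{obs}, \phi, \psi \mid \theta)\, d(\phi, \psi) $$

The integration over the latent parameters often has to be obtained numerically.

This is difficult in the present problem.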

Page 41: Double Dirichlet Process Mixtures

41

DIC9

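I am proposing to use DIC9, which is similar to DIC3 but is based on the conditional likelihood $p(y_{obs} \mid \phi, \psi, \theta)$:

$$ DIC_9 = -4\, E_{\phi,\psi,\theta}\!\left[ \log p(y_{obs} \mid \phi, \psi, \theta) \mid y_{obs} \right] + 2 \log E_{\phi,\psi,\theta}\!\left[ p(y_{obs} \mid \phi, \psi, \theta) \mid y_{obs} \right] $$

DIC9: Double DP 3018.3, Bivariate DP 3050.3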
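
Criteria of this form, -4 E[log likelihood | data] + 2 log E[likelihood | data], can be estimated from the log-likelihood evaluated at each posterior draw. The sketch below assumes such a list is available; whether it corresponds to the observed or the conditional likelihood depends on which log-likelihood values are passed in. The function name is illustrative.

```python
import math

def dic_from_loglik(loglik_draws):
    """-4 * mean(log lik) + 2 * log(mean(lik)) over posterior draws."""
    T = len(loglik_draws)
    mean_loglik = sum(loglik_draws) / T
    m = max(loglik_draws)
    # log of the posterior mean of the likelihood, via log-sum-exp for stability.
    log_mean_lik = m + math.log(sum(math.exp(v - m) for v in loglik_draws) / T)
    return -4.0 * mean_loglik + 2.0 * log_mean_lik

constant_case = dic_from_loglik([-10.0, -10.0, -10.0])   # equals -2 * (-10)
varying_case = dic_from_loglik([-1500.0, -1510.0, -1505.0])
```

Smaller values are preferred, as with other deviance-based criteria.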

Page 42: Double Dirichlet Process Mixtures

42

Convergence rate results: Ghosal and Van Der Vaart (2001)

Normal location mixtures

Model: $Y_i \sim$ i.i.d. $p(y) = \int \phi(y - \mu)\, dG(\mu)$, i = 1,…,n, where $\phi$ is the normal density

G ~ DP(α G0), G0 is Normal

Truth: $p_0(y) = \int \phi(y - \mu)\, dF(\mu)$

Ghosal and Van Der Vaart (2001): Under some regularity conditions,

Hellinger distance (p, p0) → 0 “almost surely”

at the rate of $(\log n)^{3/2} / \sqrt{n}$

Page 43: Double Dirichlet Process Mixtures

43

Ghosal and Van Der Vaart (2001): results contd.

Bivariate DP location-scale mixture of normals

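$Y_i \sim$ i.i.d. $p(y) = \int \phi_\sigma(y - \mu)\, dH(\mu, \sigma)$, i = 1,…,n, with $H \sim DP(\alpha H_0)$

Ghosal and Van Der Vaart (2001): If H0 is Normal × {a compactly supported distn}, then the convergence rate is

$(\log n)^{7/2} / \sqrt{n}$

Double DP location-scale mixture of normals

$Y_i \sim$ i.i.d. $p(y) = \int\!\!\int \phi_\sigma(y - \mu)\, dG_\mu(\mu)\, dG_\sigma(\sigma)$, i = 1,…,n

$G_\mu \sim DP(\alpha_\mu G_{0\mu})$, $G_\sigma \sim DP(\alpha_\sigma G_{0\sigma})$

Ghosal and Van Der Vaart (2001): If $G_{0\mu}$ is Normal, $G_{0\sigma}$ is compactly supported and the true density

$p_0(y) = \int\!\!\int \phi_\sigma(y - \mu)\, dF_1(\mu)\, dF_2(\sigma)$ is also a double mixture, then

Hellinger distance (p, p0) → 0 at the rate of $(\log n)^{3/2} / \sqrt{n}$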

Page 44: Double Dirichlet Process Mixtures

44

Interrater data

Agreement between 2 Raters (Melia and Diener-West 1994)

Each rater provides an ordinal rating on a scale of 1-5 (lowest to highest invasion) of the extent to which the tumor has invaded the eye, n = 885.

                      Rater 1
               1     2     3     4     5
Rater 2   1   291    74     1     1     1
          2   186   256     7     7     3
          3     2     4     0     2     0
          4     3    10     1    14     2
          5     1     7     1     8     3

Page 45: Double Dirichlet Process Mixtures

45

DPM multivariate ordinal model

Kottas, Muller and Quintana (2005)

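$$ P(\text{Rater 1} = j \text{ and Rater 2} = k \text{ for subject } i) = P(\gamma_{1,j-1} < Z_{i1} \le \gamma_{1,j},\ \gamma_{2,k-1} < Z_{i2} \le \gamma_{2,k}) $$

$$ Z_i = (Z_{i1}, Z_{i2}) \sim \text{i.i.d. } \int N_2(z \mid \mu, \Sigma)\, dG(\mu, \Sigma) $$

$$ G \sim DP(\alpha G_0) $$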

Page 46: Double Dirichlet Process Mixtures

46

Interrater agreement

The objective is to measure agreement between raters beyond what is possible by chance.

This is often measured by departure from independence, often specifically in the diagonals

Polychoric correlation of the latent bivariate normal Z has been used as a measure of association.

………………… of the latent bivariate normal mixtures???


Page 47: Double Dirichlet Process Mixtures

47

Latent class model (Agresti & Lang 1993)

C latent classes

$$ p_{ijk} = P(\text{Rater 1} = j \text{ and Rater 2} = k \text{ for subject } i) = p_{cjk} \quad \text{if subject } i \text{ belongs to class } c, \ c = 1, \dots, C $$

Ratings of the two raters within a class are independent:

$$ p_{cjk} = p_{c1j}\, p_{c2k} $$

$$ p_{ijk} = \sum_{c=1}^{C} I(\xi_i = c)\, p_{c1j}\, p_{c2k} $$

where $\xi_i$ denotes the latent class membership of subject i.

Page 48: Double Dirichlet Process Mixtures

48

Mixtures of Double DPMs

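For each latent class, we model p_c1j and p_c2k by two separate univariate ordinal probit DPM models:

$$ p_{ijk} = \sum_{c=1}^{C} I(\xi_i = c)\, p_{c1j}\, p_{c2k} $$

$$ p_{clj} = P(\gamma_{cl,j-1} < Z_{cl} \le \gamma_{cl,j}), \quad j = 1, \dots, 5, \ l = 1, 2, \ c = 1, \dots, C $$

$$ Z_{i,cl} \sim \text{i.i.d. } \int N(z \mid \mu, \sigma^2)\, dG_{cl}(\mu, \sigma^2) $$

$$ G_{cl} \sim DP(\alpha_{cl} G_{0}) $$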
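
For a single normal component, the ordinal probit probabilities are differences of the normal CDF at consecutive cutoffs; the DPM then mixes such probabilities over the components. A minimal sketch for one component, in which the cutoff values, the function names and the use of `math.erf` for the normal CDF are all illustrative choices:

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ordinal_probs(cutoffs, mu, sigma):
    """P(cut_{j-1} < Z <= cut_j) for j = 1..len(cutoffs)+1, with Z ~ N(mu, sigma^2)."""
    edges = [-math.inf] + list(cutoffs) + [math.inf]   # outer cutoffs at +/- infinity
    cdf = [normal_cdf(e, mu, sigma) for e in edges]
    return [cdf[j + 1] - cdf[j] for j in range(len(edges) - 1)]

# Four hypothetical interior cutoffs give five rating categories.
probs = ordinal_probs([-1.5, -0.5, 0.5, 1.5], mu=0.0, sigma=1.0)
# Within a latent class, the joint cell probability is the product of the
# two raters' category probabilities (independence within class).
joint = [[p1 * p2 for p2 in probs] for p1 in probs]
```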

Page 49: Double Dirichlet Process Mixtures

49

Computational issue

The "sample size" n_c in latent group c is not fixed. This causes a problem for the Polya urn/marginal sampler, which works with a fixed sample size.

Do, Muller, Tang (2005) suggested a solution to this problem by jointly sampling the latent $\theta_{il} = (\mu_{il}, \sigma_{il}^2)$ and the latent rating class membership $\xi_i$.

$$ p_{ijk} = \sum_{c=1}^{C} I(\xi_i = c)\, p_{c1j}\, p_{c2k} $$

Page 50: Double Dirichlet Process Mixtures

50

Estimated cell probabilities. For each cell, the first line is the observed proportion, followed by two sets of model estimates with interval limits in parentheses. Rows and columns are the rating categories 1 to 5.

     1                     2                     3                     4                     5
1    .3288                 .0836                 .0011                 .0011                 .0011
     .3261 (.2945,.3590)   .0869 (.0696,.1078)   .0013 (.0001,.0044)   .0020 (.0003,.0056)   .0007 (0,.0027)
     .3286 (.3060,.3524)   .0821 (.0701,.09510)  .003 (.0008,.007)     .0022 (.0007,.0047)   .0009 (.0001,.0023)
2    .2102                 .2893                 .0079                 .0079                 .0034
     .2135 (.1858,.2428)   .2826 (.2521,.3141)   .0082 (.0031,.0152)   .0069 (.0022,.0143)   .0031 (.0007,.0075)
     .2098 (.1856,.2334)   .2853 (.2700,.2997)   .0076 (.0036,.0129)   .0103 (.0074,.0138)   .0033 (.0017,.0053)
3    .0023                 .0045                 0                     .0023                 0
     .0023 (.004,.0065)    .0055 (.0017,.0107)   .0016 (.0003,.0038)   .0022 (.004,.0062)    .0008 (.0,.0032)
     .0029 (.0009,.0068)   .006 (.0025,.0114)    .0003 (0,.0009)       .0015 (.0002,.0039)   .0004 (0,.0011)
4    .0034                 .0113                 .0011                 .0158                 .0023
     .0042 (.0012,.0094)   .0102 (.0042,.0187)   .0023 (.0004,.006)    .0143 (.0066,.024)    .0028 (.0006,.0069)
     .0031 (.001,.0059)    .0124 (.0092,.0157)   .0016 (.0002,.0044)   .0127 (.0089,.0168)   .0033 (.0014,.0057)
5    .0011                 .0079                 .0011                 .009                  .0034
     .0012 (.0001,.0041)   .0071 (.0026,.0140)   .0019 (.0003,.0054)   .0083 (.0034,.0153)   .0039 (.0009,.009)
     .0019 (.0006,.004)    .0086 (.0055,.0129)   .0011 (.0002,.0032)   .0086 (.0053,.0118)   .0026 (.0011,.0045)

Page 51: Double Dirichlet Process Mixtures

51

Marginal probability estimates

Latent Group 1

             1        2        3        4        5
Rater B      0.9374   .0559    .0049    .0011    .0008
Rater A      0.6691   .3246    .0036    .0014    .0012

Joint probabilities (rows: Rater A, columns: Rater B):

     1          2          3          4          5
1    0.6261     0.03859    0.003273   0.00068    0.000473
2    0.3058     0.01675    0.001511   0.000326   0.000225
3    0.003405   0.000194   2.38E-05   6.81E-06   5.06E-06
4    0.001237   0.000121   4.37E-05   1.86E-05   1.79E-05
5    0.000882   0.000212   7.31E-05   3.12E-05   3.10E-05

Page 52: Double Dirichlet Process Mixtures

52

Marginal probability estimates

Latent Group 2

             1        2        3        4        5
Rater B      0.1432   .7432    .0226    .0706    .0205
Rater A      0.1562   .7139    .0188    .0661    .0451

Joint probabilities (rows: Rater A, columns: Rater B):

     1          2          3          4          5
1    0.02196    0.1264     0.002716   0.003709   0.001382
2    0.1107     0.5625     0.0138     0.02051    0.006414
3    0.002445   0.01194    0.000555   0.003084   0.000764
4    0.005113   0.02517    0.003294   0.02575    0.006747
5    0.002968   0.01718    0.002219   0.01753    0.005212

Page 53: Double Dirichlet Process Mixtures

53

Joint mean covariance modeling

Trial with n=200 patients who had acute MI within 28 days of baseline and are depressed/low social support

Underwent 6 months of usual care (control) or individual and/or group-based cognitive behavioral counseling (treatment).

Response y = depression (Beck Depression Inventory) measured at 0,182,365,548, 913, 1278 days (but actually at irregular intervals)

Covariate: Treatment, Family history, Age, Sex, BMI,……

Intermittent missing response, missing covariate….

Page 54: Double Dirichlet Process Mixtures

54

[Figure: Beck Depression Inventory plotted against visit days (0 to 1200).]

Page 55: Double Dirichlet Process Mixtures

55

Model

Model for the mean

$$ E(y_{ij}) = X_{ij}\beta + Z_{ij} b_i $$

$$ Z_{ij} b_i = b_{0i} + b_{1i}\, t_{ij} + b_{2i}\, (t_{ij} - \dots) $$

Model for the covariance

$$ (y_{i1}, \dots, y_{i6}) \sim N_6(\mu_i, \Sigma_i) $$

Pourahmadi (1999) and Pourahmadi and Daniels (2002) use a Cholesky decomposition of the covariance, which allows one to use a log-linear model for the variances and “linear regression” for the off-diagonal terms.

Page 56: Double Dirichlet Process Mixtures

56

Modeling the covariance

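We assume $\Sigma_i = \mathrm{diag}(\sigma_{i1}^2, \dots, \sigma_{i6}^2)$

$$ \log \sigma_{ij}^2 = X_{ij}\beta^{\sigma} + Z_{ij} b_i^{\sigma} $$

$$ Z_{ij} b_i^{\sigma} = b_{0i}^{\sigma} + b_{1i}^{\sigma}\, t_{ij} + b_{2i}^{\sigma}\, t_{ij}^2 $$

[Figure: variance plotted against visits 1 to 6.]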

Page 57: Double Dirichlet Process Mixtures

57

Mean and variance level random effects

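$$ b_i^{\mu} = (b_{0i}^{\mu}, b_{1i}^{\mu}, b_{2i}^{\mu}) \quad \text{and} \quad b_i^{\sigma} = (b_{0i}^{\sigma}, b_{1i}^{\sigma}, b_{2i}^{\sigma}), \quad i = 1, \dots, n $$

A joint DPM for the two random effects together, which allows clustering at the patient level?

$$ (b_i^{\mu}, b_i^{\sigma}) \sim \text{i.i.d. } G_{\mu,\sigma}, \qquad G_{\mu,\sigma} \sim DP $$

Or a Double DPM, that is, independent DPMs separately for each of the two random effects, which allows separate clustering at the mean and variance level?

$$ b_i^{\mu} \sim \text{i.i.d. } G_{\mu}, \quad G_{\mu} \sim DP; \qquad b_i^{\sigma} \sim \text{i.i.d. } G_{\sigma}, \quad G_{\sigma} \sim DP $$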

Most frequentist and parametric Bayesian analyses use the latter independence among the mean and variance level random effects.

Page 58: Double Dirichlet Process Mixtures

58

[Figure: trace plots of the log-likelihood against iterations for the Double DP and the Bivariate DP, and a "Log CPO" panel plotting log CPO against observation index (0 to 200) for the Double DP and the Bivariate DP.]

Page 59: Double Dirichlet Process Mixtures

59

Fixed effects estimates

            Double DP                  Bivariate DP              Normal
.slope      -1.12 (-2.39,-0.16)        -0.89 (-1.97,0.24)        -0.26 (-.65,.28)
.change     -3.29 (-4.36,-1.77)        -2.98 (-4.4,-1.11)        -4.057 (-4.72,-3.43)

.slope      0.18 (-.006,.38)           0.19 (.014,.382)          -0.16 (-0.32,0.05)
.quad       -0.005 (-0.027,0.021)      -.006 (-.027,.013)        0.033 (0.009,0.051)

Page 60: Double Dirichlet Process Mixtures

60

Pseudo marginal likelihood

$\sum_i \log f(y_i \mid y_{-i})$:

Double DP      -2495.290
Bivariate DP   -2503.916
Normal         -2530.92

Page 61: Double Dirichlet Process Mixtures

61

Summary

Double DP mixtures may add a level of structure to mixture modeling with DP.

They produce interesting “product-clustering”

They are applicable to specific problems that may benefit from this structure