Bayesian Generalized Product Partition Model

By David Dunson and Ju-Hyun Park

Presentation by Eric Wang 2/15/08

Outline

• Introduce Product Partition Models (PPM).

• Relate PPM to DP via the Blackwell-MacQueen Polya Urn scheme.

• Introduce predictor dependence into PPM to form Generalized PPM (GPPM).

• Discussion and Results

• Conclusion

Product Partition Model

• A PPM is formally defined as

$$f(y \mid S^*) = \prod_{h=1}^{k} f_h(y_h^*), \qquad \pi(S^*) = c_0 \prod_{h=1}^{k} c(S_h^*) \tag{1}$$

– Where $S^* = (S_1^*, \dots, S_k^*)$ is a partition of $\{1, \dots, n\}$.
– Let $y_h^* = \{y_i : i \in S_h^*\}$ denote the data for subjects in cluster h, h = 1,…,k.
– The probability of the partition $S^*$ is therefore the product of the cohesions $c(S_h^*)$ of its subsets, up to the normalizing constant $c_0$.
– The posterior distribution of $S^*$ after seeing the data $y$ is also a PPM, with updated cohesion $c(S_h^*)\, f_h(y_h^*)$.
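To make the product form of a PPM prior concrete, here is a minimal Python sketch that enumerates every partition of a small index set and normalizes the product of cohesions. The helper names (`set_partitions`, `ppm_prior`) and the particular cohesion used at the bottom are illustrative assumptions, not part of the slides.

```python
import math
from typing import Callable, Dict, Iterator, List, Tuple

Partition = List[List[int]]

def set_partitions(items: List[int]) -> Iterator[Partition]:
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for idx in range(len(smaller)):
            yield smaller[:idx] + [[first] + smaller[idx]] + smaller[idx + 1:]
        # ...or into a new block of its own.
        yield [[first]] + smaller

def ppm_prior(n: int, cohesion: Callable[[List[int]], float]
              ) -> Dict[Tuple[Tuple[int, ...], ...], float]:
    """Return {partition: probability} with pi(S*) = c_0 * prod_h c(S_h*)."""
    weights = {}
    for part in set_partitions(list(range(n))):
        key = tuple(sorted(tuple(sorted(block)) for block in part))
        weights[key] = math.prod(cohesion(block) for block in part)
    c0 = 1.0 / sum(weights.values())  # normalizing constant c_0
    return {key: c0 * w for key, w in weights.items()}

# Hypothetical cohesion chosen for illustration: alpha * (|S_h| - 1)!
alpha = 1.0
prior = ppm_prior(3, lambda block: alpha * math.factorial(len(block) - 1))
```

Enumeration is only feasible for very small n, since the number of partitions grows as the Bell numbers; it is used here purely to show that $c_0$ is whatever makes the products of cohesions sum to one.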

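Product Partition Model

• A PPM can also be induced hierarchically

$$y_i \mid \theta, S \overset{ind}{\sim} f(\theta_{S_i}), \qquad S_i \overset{ind}{\sim} \sum_{h=1}^{k} \pi_h \delta_h, \qquad \theta_h \overset{ind}{\sim} G_0$$

– Where $S_i = h$ if $i \in S_h^*$, $S = (S_1, \dots, S_n)'$.

• Taking $k \to \infty$ induces a nonparametric PPM.

• A prior on the weights $\pi = (\pi_1, \dots, \pi_k)'$ imposes a particular form on the cohesion: a convenient choice corresponds to the Dirichlet Process.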
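The hierarchical construction can be read as a three-step simulation recipe. The sketch below assumes a Gaussian base measure and a Gaussian kernel purely for illustration; the slides do not fix either choice, and the function name `simulate_finite_ppm` is hypothetical.

```python
import random
from typing import List, Tuple

def simulate_finite_ppm(n: int, weights: List[float], seed: int = 0
                        ) -> Tuple[List[int], List[float], List[float]]:
    """Simulate labels, cluster parameters and data from the hierarchy."""
    rng = random.Random(seed)
    num_clusters = len(weights)
    # theta_h ~ G_0; here G_0 = N(0, 3^2) is an assumed base measure.
    theta = [rng.gauss(0.0, 3.0) for _ in range(num_clusters)]
    # S_i ~ sum_h pi_h * delta_h: a categorical draw over cluster labels.
    labels = rng.choices(range(num_clusters), weights=weights, k=n)
    # y_i | theta, S ~ f(theta_{S_i}); here f is an assumed N(theta, 1) kernel.
    y = [rng.gauss(theta[s], 1.0) for s in labels]
    return labels, theta, y

labels, theta, y = simulate_finite_ppm(n=10, weights=[0.5, 0.3, 0.2])
# The partition S* is recovered by grouping subjects that share a label.
partition = {h: [i for i, s in enumerate(labels) if s == h] for h in set(labels)}
```

Grouping subjects by label is exactly the map from the indicator vector S to the partition S* described in the sub-bullet above.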

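Relating DP and PPM

• In DP, $G \sim DP(\alpha G_0)$.
– G is seen in stick breaking. If it is marginalized out, it yields the Blackwell-MacQueen (1973) formulation:

$$(\phi_i \mid \phi_1, \dots, \phi_{i-1}) \sim \frac{\alpha}{\alpha + i - 1}\, G_0 + \sum_{j=1}^{i-1} \frac{1}{\alpha + i - 1}\, \delta_{\phi_j}$$

– Where $\phi_i$ is the value taken by the ith data point.
– The joint distribution of a particular set $(\phi_1, \dots, \phi_n)$ is therefore the product of these conditionals over $i = 1, \dots, n$.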
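As an illustration of the Blackwell-MacQueen urn, the following sketch draws cluster labels sequentially: each subject joins an existing cluster with probability proportional to its current size, or opens a new one with probability proportional to the concentration parameter. The function name and the use of integer labels in place of parameter values are assumptions made for brevity.

```python
import random
from typing import List

def polya_urn_labels(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Sample cluster labels for n subjects from the Blackwell-MacQueen urn."""
    rng = random.Random(seed)
    labels: List[int] = []
    counts: List[int] = []  # counts[h] = number of subjects already in cluster h
    for i in range(n):
        # Subject i (0-based) joins cluster h with prob counts[h] / (alpha + i)
        # and opens a new cluster with prob alpha / (alpha + i).
        weights = counts + [alpha]
        h = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if h == len(counts):
            counts.append(1)
        else:
            counts[h] += 1
        labels.append(h)
    return labels

labels = polya_urn_labels(n=20, alpha=1.0)
```

Larger values of `alpha` open new clusters more often; the labels are numbered in order of first appearance.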

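Relating DP and PPM

• It can be shown directly that the Blackwell-MacQueen formulation leads to equation (2).

• Where $k_h$ is the number of data taking unique value $\theta_h$.
• $\theta_{h,l}$ is the unique value of the $l^{th}$ subject in cluster h, re-sorted by their ids:

$$\{\theta_{1,1}, \dots, \theta_{1,k_1},\ \theta_{2,1}, \dots, \theta_{2,k_2},\ \dots,\ \theta_{k,1}, \dots, \theta_{k,k_k}\}$$

• Also, $c_0 = \Gamma(\alpha)/\Gamma(\alpha + n)$ is a normalizing constant and the cohesion is $c(S_h^*) = \alpha\,\Gamma(k_h)$. Then equation (3) follows.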
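A hedged sketch of the algebra behind this step, using the standard Dirichlet process result rather than the slide's own typesetting: for a fixed partition, multiplying the urn conditionals gives one factor $\alpha$ each time a cluster is opened, factors $1, 2, \dots, k_h - 1$ as later members join cluster h, and a denominator $\alpha + i - 1$ for every subject i.

```latex
\pi(S^*)
  = \frac{\prod_{h=1}^{k} \alpha \,(k_h - 1)!}{\prod_{i=1}^{n} (\alpha + i - 1)}
  = \frac{\Gamma(\alpha)}{\Gamma(\alpha + n)} \prod_{h=1}^{k} \alpha\,\Gamma(k_h)
```

The second equality uses $\prod_{i=1}^{n} (\alpha + i - 1) = \Gamma(\alpha + n)/\Gamma(\alpha)$ and $(k_h - 1)! = \Gamma(k_h)$, which is what puts the result in the PPM form $c_0 \prod_h c(S_h^*)$.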

Relating DP and PPM

• From slide 3, writing the prior and likelihood together:

• Notice that from (1), G can be marginalized out to get the same form

• Specifically, integrate over all possible unique values which can be taken by $\theta_h$ for subset h.

(4)

Relating DP and PPM

• Therefore, DP is a special case of PPM with cohesion $c(S_h^*) = \alpha\,\Gamma(k_h)$ and normalizing constant $c_0 = \Gamma(\alpha)/\Gamma(\alpha + n)$.

• However, (2) follows the premise of DP that the data are exchangeable and does not incorporate dependence on predictors.
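The equivalence between the urn and the PPM form can be checked numerically. The sketch below assumes the standard Dirichlet process values, cohesion $\alpha\,\Gamma(k_h)$ and normalizing constant $\Gamma(\alpha)/\Gamma(\alpha + n)$, and compares them with the product of sequential urn probabilities for one hypothetical allocation; both function names are illustrative.

```python
import math
from collections import Counter
from typing import List

def urn_probability(labels: List[int], alpha: float) -> float:
    """Probability of a label sequence under the Blackwell-MacQueen urn."""
    counts: Counter = Counter()
    prob = 1.0
    for i, h in enumerate(labels):
        # New cluster: alpha / (alpha + i); existing cluster: counts[h] / (alpha + i).
        numerator = counts[h] if counts[h] > 0 else alpha
        prob *= numerator / (alpha + i)
        counts[h] += 1
    return prob

def ppm_probability(labels: List[int], alpha: float) -> float:
    """c_0 * prod_h c(S_h*) with c(S_h*) = alpha * Gamma(k_h)."""
    n = len(labels)
    log_c0 = math.lgamma(alpha) - math.lgamma(alpha + n)
    log_cohesions = sum(math.log(alpha) + math.lgamma(k_h)
                        for k_h in Counter(labels).values())
    return math.exp(log_c0 + log_cohesions)

labels = [0, 0, 1, 0, 2, 1]  # hypothetical allocation of six subjects
alpha = 0.7
urn_value = urn_probability(labels, alpha)
ppm_value = ppm_probability(labels, alpha)
```

The two values agree for any ordering of the same partition, which is the exchangeability property noted in the slide.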

• Next, PPMs will be generalized such that predictor dependence is incorporated.

Generalized PPM

• The goal of the paper is to formulate (1) such that the cohesion depends on the subjects' predictors.

• This can be done following a process very similar to the non-predictor case above.

• Once again, the connection between DP and PPM will be used; the resulting model will henceforth be referred to as GPPM.

• The formulation is interesting because the predictors will be treated as random variables rather than known fixed values (as in the kernel stick-breaking process, KSBP).

GPPM

• Consider the following hierarchical model

– Where the base measure is defined on both sets of parameters, those of the data and those of the predictor.

– This model will segment data {1,…,n} into k clusters. As before, $i \in S_h^*$ denotes that subject i belongs to cluster h.

– Each cluster has its own unique values of the parameters associated with the subject and with its predictor.

GPPM

• The joint distribution, equation (5), can be developed in a similar manner to (2).

• The conditional distribution of the partition given the predictors is equation (6), which can be compared directly with (2).

• The cohesion in (6) is equation (7).

• (7) meets the criteria originally set out.

GPPM

• Some thoughts on GPPM so far:

– As noted earlier, the posterior distribution of a PPM is still in the class of PPMs, but with updated cohesion $c(S_h^*)\, f_h(y_h^*)$.

– Similarly, the posterior of a GPPM will also take the form of a GPPM.

– (2) and (6) are quite similar. The extra portion of (6) is the marginalized probability of the predictor.

– In a special case, the GPPM reverts to the Blackwell-MacQueen formulation, seen clearly in the following theorem.

Generalized Polya Urn Scheme

• The following theorem shows that the GPPM can induce a Blackwell-MacQueen Polya Urn scheme, generalized for predictor dependence:

Generalized Polya Urn Scheme

• By the above theorem, subject i will do either 1) or 2):

– 1) Draw a previously unseen unique value, with probability proportional to the concentration parameter times the marginal likelihood of its predictor under the base measure.

– 2) Draw a previously used unique value equal to the parameters of cluster h, with probability proportional to the number of subjects that have previously chosen that value times the marginal likelihood of its predictor value under that cluster.

• Further, since the predictors are treated as random variables, updating the posteriors on each cluster’s predictor parameters means that GPPM is a flexible, non-parametric way to adapt the distance measure in predictor space.

• In this paper G is always integrated out; however, Dunson alludes to variational techniques which could still be developed in similar fashion following the fast Variational DP proposed by Kurihara et al. (2006).
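The two kinds of urn weights described above can be sketched for a deliberately simplified case. The code assumes a scalar predictor, a Normal kernel with known variance, and a Normal base measure on each cluster mean, so that every marginal likelihood is Normal in closed form; this replaces the Normal-Wishart setup of the paper, and all names and default values are hypothetical.

```python
import math
from typing import List

def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def predictor_urn_weights(x_new: float, clusters: List[List[float]], alpha: float,
                          m0: float = 0.0, s0_sq: float = 1.0,
                          sigma_sq: float = 0.05) -> List[float]:
    """Normalized weights [new cluster, cluster 0, cluster 1, ...] for x_new.

    Assumed simplification: scalar predictor, N(mu_h, sigma_sq) kernel with known
    variance, and N(m0, s0_sq) base measure on mu_h (not the Normal-Wishart case).
    """
    # 1) New value: concentration alpha times the marginal under the base measure.
    weights = [alpha * normal_pdf(x_new, m0, s0_sq + sigma_sq)]
    for xs in clusters:
        n_h = len(xs)
        # Conjugate update of mu_h given the predictors already in cluster h.
        post_prec = 1.0 / s0_sq + n_h / sigma_sq
        post_mean = (m0 / s0_sq + sum(xs) / sigma_sq) / post_prec
        # 2) Existing value: cluster size times the predictive density of x_new.
        weights.append(n_h * normal_pdf(x_new, post_mean, 1.0 / post_prec + sigma_sq))
    total = sum(weights)
    return [w / total for w in weights]

# Two hypothetical clusters of equal size, located near 0.2 and 0.8.
w = predictor_urn_weights(0.75, [[0.18, 0.22, 0.2], [0.79, 0.82, 0.8]], alpha=1.0)
```

Because the two clusters have the same size, a plain Polya urn would weight them equally; here the cluster whose predictors lie near 0.75 receives most of the mass, which is the predictor dependence the theorem introduces.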

Generalized Polya Urn Scheme

• Consider, for example, a Normal-Wishart prior on the predictor as follows

• Where the prior involves two multiplicative constants and a Wishart distribution specified by its degrees of freedom and mean.

• Notice that this formulation adds another multiplier to the precision of the predictor distribution. This is analogous to the kernel width in KSBP and encourages tight local clustering in predictor space.

• The marginal distributions on the predictors from Theorem 1 take the forms shown on the next slide.

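Generalized Polya Urn Scheme

• The marginal distribution of the predictor in the first weight is a non-central multivariate t-distribution, characterized by its degrees of freedom, mean, and scale.

• The marginal distribution of the predictor in the second weight has the same functional form but with updated hyperparameters:

$$f(x \mid \mu_0^*, \Sigma_0^*, \nu_0^*) = \frac{\Gamma\big((\nu_0^* + p)/2\big)}{\Gamma(\nu_0^*/2)\,(\nu_0^* \pi)^{p/2}\,|\Sigma_0^*|^{1/2}} \left[1 + \frac{1}{\nu_0^*}\,(x - \mu_0^*)'\,\Sigma_0^{*-1}\,(x - \mu_0^*)\right]^{-(\nu_0^* + p)/2}$$

where the updated hyperparameters for cluster h depend on $\bar{x}_h$, the empirical mean of the predictors in cluster h, without predictor i.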
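For readers who want to evaluate such a marginal numerically, here is a small sketch of the multivariate t log-density. It assumes a diagonal scale matrix so that it needs only the standard library; the function name and arguments are illustrative, and the hyperparameter updates themselves are not reproduced because they did not survive in the slides.

```python
import math
from typing import Sequence

def mvt_logpdf_diag(x: Sequence[float], mean: Sequence[float],
                    scale_diag: Sequence[float], df: float) -> float:
    """Log-density of a p-dimensional t distribution with diagonal scale matrix."""
    p = len(x)
    # Quadratic form (x - mean)' Sigma^{-1} (x - mean) for diagonal Sigma.
    quad = sum((xi - mi) ** 2 / si for xi, mi, si in zip(x, mean, scale_diag))
    log_det = sum(math.log(si) for si in scale_diag)
    log_norm = (math.lgamma((df + p) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * p * math.log(df * math.pi) - 0.5 * log_det)
    return log_norm - 0.5 * (df + p) * math.log1p(quad / df)

# Sanity check: with p = 1 and one degree of freedom this is the Cauchy density.
cauchy_at_zero = math.exp(mvt_logpdf_diag([0.0], [0.0], [1.0], df=1.0))
```

Working on the log scale avoids underflow when the predictor is far from a cluster, which matters once these densities are multiplied by cluster sizes to form urn weights.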

Generalized Polya Urn Scheme

• Posterior updating in this model is straightforward using MCMC. The conditional posterior of the parameters combines the base prior updated with the data likelihood and the weights from Theorem 1.

• The indicators are updated separately from the cluster parameters. The membership indicators are sampled from their multinomial posterior.

• Next, update the cluster parameters conditioned on the membership indicators and the number of clusters k.
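The indicator update amounts to normalizing a vector of unnormalized weights and drawing one categorical sample. The sketch below shows only that step, working with log-weights for numerical stability; how the log-weights are computed (urn weight times data likelihood) is left to the caller, and the function names and example values are hypothetical.

```python
import math
import random
from typing import List

def normalize_log_weights(log_w: List[float]) -> List[float]:
    """Turn unnormalized log-weights into probabilities (log-sum-exp trick)."""
    m = max(log_w)
    exps = [math.exp(lw - m) for lw in log_w]
    total = sum(exps)
    return [e / total for e in exps]

def sample_indicator(log_w: List[float], rng: random.Random) -> int:
    """Draw one membership indicator; index 0 is taken to mean 'new cluster'."""
    probs = normalize_log_weights(log_w)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(1)
# Hypothetical log-weights for [new cluster, cluster 1, cluster 2]; exponentiating
# them directly would underflow to zero, which is why the maximum is subtracted.
log_w = [-1002.0, -1000.0, -1001.0]
probs = normalize_log_weights(log_w)
draw = sample_indicator(log_w, rng)
```

In a full sampler this draw would be repeated for every subject in each sweep, followed by the update of the cluster parameters given the new indicators.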

Results

• Dunson and Park demonstrate results on conditional density regression problems. The model on the slide is annotated with three ingredients: the p-dimensional predictor, the data likelihood, and the parameters of cluster h.

• Demonstrate results on 3 datasets:
– Simulated single Gaussian (p = 2)
– Simulated mixture of two Gaussians (p = 2)
– Epidemiology data (p = 3)

Results

• Simulated single Gaussian data, 500 data points
– The predictor $x_i$ is generated iid from a uniform distribution over (0,1).
– The data $y_i$ were then simulated from a single Gaussian given $x_i$.

• Algorithm was run for 10,000 iterations with 1,000 iteration burn-in. Fast mixing and good estimates.

[Figure: raw data, axes x and y.]

[Figure: conditional distributions of y for two different values of x. The dotted line is the truth, the solid line is the estimate, and the dashed lines are 99% credibility intervals.]

Results

• Simulated 2 Gaussian results, 500 data points
– The predictor $x_i$ is generated iid from a uniform distribution over (0,1).
– The data $y_i$ were then simulated from a mixture of two Gaussians given $x_i$.

[Figure: two columns of plots, PPM (left) and GPPM (right).]

Here, the left column of plots is for a PPM (non-generalized), while the right column is the GPPM on the same dataset. Notice the much better fit in the bottom plots, and that the GPPM is not dragged toward 0 as the second peak appears when the predictor approaches 0.

Results

• Epidemiologic Application:

• DDE is shown to increase the rate of pre-term birth. The two predictors correspond to the DDE dose for child i and the mother’s age after normalization, respectively.

• Dataset size was 2,313 subjects.

• MCMC GPPM was run for 30,000 iterations with 10,000 iteration burn-in.

• The results confirmed earlier findings that the response shows a slightly decreasing trend as DDE level rises.

• These findings are similar to previous KSBP work on the same dataset, but the implementation was simpler.

Results

[Figure: raw data and estimates. Dashed lines indicate 99% credibility intervals.]

Conclusion

• A GPPM was formulated beginning with the Blackwell-MacQueen Polya Urn scheme.

• The GPPM incorporates predictor dependence by treating the predictor as a random variable.
– It is similar in spirit to the KSBP, but is able to bypass issues such as kernel width selection and the inability to implement a continuous distribution in predictor space.

• Future research directions could explore Dunson’s mention of a variational method similar to the formulation proposed in this paper.
