![Page 1: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/1.jpg)
Bayesian Generalized Product Partition Model
By David Dunson and Ju-Hyun Park
Presentation by Eric Wang 2/15/08
![Page 2: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/2.jpg)
Outline
• Introduce Product Partition Models (PPM).
• Relate PPM to the Dirichlet process (DP) via the Blackwell-MacQueen Polya Urn scheme.
• Introduce predictor dependence into PPM to form Generalized PPM (GPPM).
• Discussion and Results
• Conclusion
![Page 3: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/3.jpg)
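Product Partition Model
• A PPM is formally defined as
$$f(\mathbf{y} \mid S^*) = \prod_{h=1}^{k} f_h(\mathbf{y}^*_h), \qquad \pi(S^*) = c_0 \prod_{h=1}^{k} c(S^*_h) \qquad (1)$$
– where $S^* = (S^*_1,\dots,S^*_k)$ is a partition of $\{1,\dots,n\}$.
– Let $\mathbf{y}^*_h = \{y_i : i \in S^*_h\}$ denote the data for subjects in cluster $h$, $h = 1,\dots,k$.
– The prior probability of partition $S^*$ is therefore proportional to the product of the cohesions $c(S^*_h)$ of its subsets, and the data in different subsets are independent given $S^*$.
– The posterior on $S^*$ after seeing data $\mathbf{y}$ is also a PPM, with updated cohesion $c(S^*_h)\, f_h(\mathbf{y}^*_h)$.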
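As a minimal illustration of (1), the sketch below enumerates every partition of a tiny index set, scores each one by the product of cohesions and cluster marginals, and normalizes; this is the posterior PPM described above, with $c_0$ cancelling. The functions `cohesion` and `cluster_marginal` are hypothetical placeholders chosen only for illustration. Brute force is feasible only for very small n, since the number of partitions grows as the Bell numbers.

```python
import math


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition


def ppm_posterior(y, cohesion, cluster_marginal):
    """Posterior over partitions: pi(S | y) proportional to prod_h c(S_h) * f_h(y_h)."""
    scores = {}
    for partition in set_partitions(list(range(len(y)))):
        score = 1.0
        for block in partition:
            score *= cohesion(block) * cluster_marginal([y[i] for i in block])
        # Canonical key: sorted tuple of sorted blocks.
        key = tuple(sorted(tuple(sorted(block)) for block in partition))
        scores[key] = score
    total = sum(scores.values())  # c_0 would cancel here anyway
    return {key: s / total for key, s in scores.items()}


# Hypothetical choices, for illustration only.
def cohesion(block):
    return 0.5  # every extra cluster costs a factor of 0.5


def cluster_marginal(ys):
    return math.exp(-(max(ys) - min(ys)))  # favours tight clusters


post = ppm_posterior([0.0, 0.1, 5.0], cohesion, cluster_marginal)
best = max(post, key=post.get)  # ((0, 1), (2,))
```

The most probable partition groups the two nearby observations and leaves the distant one alone, because the cohesion penalty for an extra cluster is outweighed by the gain in cluster fit.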
![Page 4: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/4.jpg)
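Product Partition Model
• A PPM can also be induced hierarchically:
$$y_i \mid S, \boldsymbol\theta \overset{\text{ind}}{\sim} f(\theta_{S_i}), \qquad S_i \overset{\text{ind}}{\sim} \sum_{h=1}^{k} \pi_h \delta_h, \qquad \theta_h \overset{\text{ind}}{\sim} G_0$$
– where $S_i = h$ if $i \in S^*_h$, and $S = (S_1,\dots,S_n)'$.
• Taking $k \to \infty$ induces a nonparametric PPM.
• A prior on the weights $\boldsymbol\pi = (\pi_1,\dots,\pi_k)'$ imposes a particular form on the cohesion: a convenient choice corresponds to the Dirichlet Process.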
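One way to picture the "convenient choice" of weights is a truncated stick-breaking sketch: $\pi_h = V_h \prod_{l<h}(1 - V_l)$ with $V_h \sim \text{Beta}(1, \alpha)$, the construction that yields the Dirichlet process as $k \to \infty$. The truncation level `k_max` and the function names are assumptions of this sketch, not part of the slides; the last weight absorbs the leftover stick so the weights sum to one.

```python
import random


def stick_breaking_weights(alpha, k_max, rng):
    """Truncated stick-breaking: pi_h = V_h * prod_{l<h} (1 - V_l), V_h ~ Beta(1, alpha).

    The final weight takes whatever stick remains, so the k_max weights sum to 1.
    """
    weights, remaining = [], 1.0
    for _ in range(k_max - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def sample_memberships(weights, n, rng):
    """Draw S_i ~ sum_h pi_h delta_h independently for n subjects (0-based labels)."""
    return rng.choices(range(len(weights)), weights=weights, k=n)


rng = random.Random(0)
pi = stick_breaking_weights(alpha=1.0, k_max=50, rng=rng)
S = sample_memberships(pi, n=20, rng=rng)
```

Subjects that share a label in `S` form the blocks $S^*_h$ of the induced partition; smaller $\alpha$ concentrates the weight on the first few sticks and so produces fewer clusters.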
![Page 5: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/5.jpg)
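Relating DP and PPM
• In the DP, $\theta_i \sim G$ with $G \sim DP(\alpha G_0)$.
– G can be seen explicitly in the stick-breaking construction. If G is marginalized out, the result is the Blackwell-MacQueen (1973) formulation:
$$\theta_i \mid \theta_1,\dots,\theta_{i-1} \sim \frac{\alpha}{\alpha+i-1}\, G_0 + \frac{1}{\alpha+i-1} \sum_{j=1}^{i-1} \delta_{\theta_j}$$
– where $\theta_i$ is the value taken by the $i$th observation.
– The joint distribution of a particular set $(\theta_1,\dots,\theta_n)$ is therefore the product of these conditionals, $p(\theta_1,\dots,\theta_n) = \prod_{i=1}^{n} p(\theta_i \mid \theta_1,\dots,\theta_{i-1})$, by the chain rule of probability.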
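The Blackwell-MacQueen urn can be simulated directly. This sketch assumes $G_0 = N(0, 1)$ purely for illustration; ties among the draws define the clusters, and their number grows roughly like $\alpha \log n$.

```python
import random


def polya_urn(n, alpha, base_draw, rng):
    """Draw theta_1..theta_n sequentially from the Blackwell-MacQueen urn.

    With i earlier draws, the next theta is a fresh draw from G_0 with probability
    alpha / (alpha + i); otherwise it copies one earlier draw chosen uniformly,
    so each earlier theta_j has probability 1 / (alpha + i) of being copied.
    """
    thetas = []
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            thetas.append(base_draw(rng))  # new cluster (always taken when i == 0)
        else:
            thetas.append(rng.choice(thetas))  # join an existing value
    return thetas


rng = random.Random(1)
draws = polya_urn(100, alpha=2.0, base_draw=lambda r: r.gauss(0.0, 1.0), rng=rng)
n_clusters = len(set(draws))  # ties define the clusters
```

Copying a uniformly chosen earlier draw is what gives a value already shared by $m$ subjects total weight $m/(\alpha + i)$, the "rich get richer" behaviour of the urn.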
![Page 6: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/6.jpg)
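Relating DP and PPM
• It can be shown directly that the Blackwell-MacQueen formulation leads to
$$p(\theta_{1,1},\dots,\theta_{k,n_k}) = \frac{\Gamma(\alpha)}{\Gamma(\alpha+n)} \prod_{h=1}^{k} \alpha\,\Gamma(n_h)\, G_0(\theta^*_h) \qquad (3)$$
• where $n_h$ is the number of data taking the unique value $\theta^*_h$, and $\theta_{h,l} = \theta^*_h$ is the value of the $l$th subject in cluster $h$, with subjects re-sorted by their ids within clusters: $\{\theta_{1,1},\dots,\theta_{1,n_1},\dots,\theta_{h,1},\dots,\theta_{h,n_h},\dots,\theta_{k,1},\dots,\theta_{k,n_k}\}$.
• Also, $c_0 = \Gamma(\alpha)/\Gamma(\alpha+n)$ is a normalizing constant and the cohesion is $c(S^*_h) = \alpha\,\Gamma(n_h)$. Then:
$$\pi(S^*) = c_0 \prod_{h=1}^{k} c(S^*_h) = \frac{\Gamma(\alpha)}{\Gamma(\alpha+n)} \prod_{h=1}^{k} \alpha\,\Gamma(n_h) \qquad (2)$$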
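As a numerical sanity check of the cohesion $\alpha\Gamma(n_h)$ and normalizing constant $\Gamma(\alpha)/\Gamma(\alpha+n)$, the sketch below multiplies the urn's sequential probabilities for every partition of five subjects and compares the result with the closed form; the closed-form probabilities should also sum to one over all partitions. Helper names are hypothetical and $\alpha = 1.7$ is arbitrary.

```python
import math


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def dp_partition_prob(block_sizes, alpha):
    """Closed form: Gamma(alpha)/Gamma(alpha+n) * prod_h alpha * Gamma(n_h)."""
    n = sum(block_sizes)
    log_c0 = math.lgamma(alpha) - math.lgamma(alpha + n)
    log_cohesion = sum(math.log(alpha) + math.lgamma(nh) for nh in block_sizes)
    return math.exp(log_c0 + log_cohesion)


def urn_partition_prob(partition, alpha):
    """Product of Blackwell-MacQueen predictive probabilities, visiting subjects 0..n-1 in order."""
    label = {i: h for h, block in enumerate(partition) for i in block}
    counts, prob = {}, 1.0
    for i in range(len(label)):
        h = label[i]
        m = counts.get(h, 0)  # subjects already seated in cluster h
        prob *= (alpha if m == 0 else m) / (alpha + i)
        counts[h] = m + 1
    return prob


alpha = 1.7
partitions = list(set_partitions(list(range(5))))
closed = [dp_partition_prob([len(b) for b in p], alpha) for p in partitions]
sequential = [urn_partition_prob(p, alpha) for p in partitions]
```

The first subject in a cluster contributes $\alpha$ and the $l$th contributes $l - 1$, which is where $\alpha\,\Gamma(n_h) = \alpha (n_h - 1)!$ comes from; the denominators multiply to $\Gamma(\alpha+n)/\Gamma(\alpha)$.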
![Page 7: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/7.jpg)
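Relating DP and PPM
• From slide 3, writing the prior and likelihood together:
$$\pi(S^*)\, f(\mathbf{y} \mid S^*) = c_0 \prod_{h=1}^{k} c(S^*_h)\, f_h(\mathbf{y}^*_h) \qquad (4)$$
• Notice that in the DP mixture, G can be marginalized out to obtain the same form, with $f_h(\mathbf{y}^*_h) = \int \prod_{i \in S^*_h} f(y_i \mid \theta)\, dG_0(\theta)$.
• Specifically, integrate over all possible unique values $\theta^*_h$ which can be taken by the subjects in subset $h$.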
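For a conjugate choice, the integral over $\theta$ in $f_h(\mathbf{y}^*_h)$ has a closed form. Assuming, for illustration only and not from the slides, $f(y \mid \theta) = N(\theta, \sigma^2)$ with known $\sigma^2$ and $G_0 = N(\mu_0, \tau_0^2)$, the sketch computes $\log f_h(\mathbf{y}^*_h)$ in closed form and checks it against the chain-rule product of one-step posterior predictives.

```python
import math


def log_marginal_closed_form(ys, sigma2, mu0, tau2):
    """log of the integral of prod_i N(y_i | theta, sigma2) N(theta | mu0, tau2) dtheta.

    Marginally y ~ N(mu0 * 1, sigma2 * I + tau2 * 11'); Sherman-Morrison gives the
    inverse and determinant of that covariance in O(n).
    """
    n = len(ys)
    r = [y - mu0 for y in ys]
    s1, s2 = sum(r), sum(v * v for v in r)
    denom = sigma2 + n * tau2
    quad = (s2 - tau2 * s1 * s1 / denom) / sigma2
    logdet = (n - 1) * math.log(sigma2) + math.log(denom)
    return -0.5 * (n * math.log(2 * math.pi) + logdet + quad)


def log_marginal_sequential(ys, sigma2, mu0, tau2):
    """Same quantity via the chain rule: sum of one-step posterior predictive log densities."""
    total, prec, mean = 0.0, 1.0 / tau2, mu0
    for y in ys:
        var = sigma2 + 1.0 / prec  # predictive variance given the data seen so far
        total += -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)
        new_prec = prec + 1.0 / sigma2
        mean = (prec * mean + y / sigma2) / new_prec
        prec = new_prec
    return total


ys = [0.3, -1.2, 2.5, 0.9]
a = log_marginal_closed_form(ys, sigma2=1.0, mu0=0.0, tau2=4.0)
b = log_marginal_sequential(ys, sigma2=1.0, mu0=0.0, tau2=4.0)
```

The two routes agree because integrating out $\theta$ once is the same as updating its posterior one observation at a time.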
![Page 8: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/8.jpg)
Relating DP and PPM
• Therefore, the DP is a special case of the PPM with cohesion $c(S^*_h) = \alpha\,\Gamma(n_h)$ and normalizing constant $c_0 = \Gamma(\alpha)/\Gamma(\alpha+n)$.
• However, (2) follows the DP premise that the data are exchangeable, and it does not incorporate dependence on predictors.
• Next, PPMs will be generalized so that predictor dependence is incorporated.
![Page 9: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/9.jpg)
Generalized PPM
• The goal of the paper is to formulate (1) such that the cohesion depends on the subjects' predictors:
$$\pi(S^* \mid \mathbf{x}) \propto \prod_{h=1}^{k} c(S^*_h, \mathbf{x}^*_h), \qquad \mathbf{x}^*_h = \{x_i : i \in S^*_h\}$$
• This can be done by following a process very similar to the non-predictor case above.
• Once again, the connection between DP and PPM will be used; the resulting model will henceforth be referred to as the GPPM.
• The formulation is interesting because the predictors are treated as random variables rather than as known fixed values (as in the kernel stick-breaking process, KSBP).
![Page 10: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/10.jpg)
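GPPM
• Consider the following hierarchical model:
$$y_i \mid \theta_i \overset{\text{ind}}{\sim} f(y_i \mid \theta_i), \qquad x_i \mid \gamma_i \overset{\text{ind}}{\sim} f(x_i \mid \gamma_i), \qquad (\theta_i, \gamma_i) \sim G, \qquad G \sim DP(\alpha G_0)$$
– where $G_0$ constitutes a base measure on $\theta$ and $\gamma$, the parameters of the data and predictor, respectively.
– This model will segment the subjects {1,…,n} into k clusters. As before, $S_i = h$ denotes that subject i belongs to cluster h.
– $\theta^*_h$ and $\gamma^*_h$ denote the unique values of the parameters associated with the subjects in cluster h and with their predictors, so that $(\theta_i, \gamma_i) = (\theta^*_h, \gamma^*_h)$ whenever $S_i = h$.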
![Page 11: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/11.jpg)
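GPPM
• The joint distribution of $(\theta_1,\gamma_1),\dots,(\theta_n,\gamma_n)$ can be developed in a similar manner to (2):
$$p\{(\theta_1,\gamma_1),\dots,(\theta_n,\gamma_n)\} = \frac{\Gamma(\alpha)}{\Gamma(\alpha+n)} \prod_{h=1}^{k} \alpha\,\Gamma(n_h)\, G_0(\theta^*_h, \gamma^*_h) \qquad (5)$$
• The conditional distribution of S given the predictors is
$$\pi(S^* \mid \mathbf{x}) \propto \prod_{h=1}^{k} \alpha\,\Gamma(n_h) \int \prod_{i \in S^*_h} f(x_i \mid \gamma)\, dG_{0\gamma}(\gamma) \qquad (6)$$
where $G_{0\gamma}$ is the marginal of $G_0$ on $\gamma$.
• For comparison, (2) is
$$\pi(S^*) = \frac{\Gamma(\alpha)}{\Gamma(\alpha+n)} \prod_{h=1}^{k} \alpha\,\Gamma(n_h) \qquad (2)$$
• The cohesion in (6) is
$$c(S^*_h, \mathbf{x}^*_h) = \alpha\,\Gamma(n_h) \int \prod_{i \in S^*_h} f(x_i \mid \gamma)\, dG_{0\gamma}(\gamma) \qquad (7)$$
• (7) meets the criterion originally set out: the cohesion depends on the subjects' predictors.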
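The cohesion (7) can be evaluated in a simple conjugate setting. This sketch assumes a scalar predictor with $f(x \mid \gamma) = N(\gamma, \sigma^2)$, known $\sigma^2$, and a $N(\mu_0, \tau_0^2)$ marginal base measure on $\gamma$; these choices and the default hyperparameter values are illustrative assumptions, not the paper's. It compares two partitions with identical block sizes: with the predictor term included, grouping nearby predictors scores higher, while without it the scores coincide, as they must under (2).

```python
import math


def log_marginal_normal(xs, sigma2, mu0, tau2):
    """log of the integral of prod_i N(x_i | gamma, sigma2) N(gamma | mu0, tau2) dgamma."""
    n = len(xs)
    r = [v - mu0 for v in xs]
    s1, s2 = sum(r), sum(v * v for v in r)
    denom = sigma2 + n * tau2
    quad = (s2 - tau2 * s1 * s1 / denom) / sigma2
    logdet = (n - 1) * math.log(sigma2) + math.log(denom)
    return -0.5 * (n * math.log(2 * math.pi) + logdet + quad)


def log_gppm_cohesion(xs, alpha, sigma2=0.1, mu0=0.0, tau2=10.0):
    """log c(S_h, x_h) as in (7): log(alpha) + log Gamma(n_h) + log marginal of x_h."""
    return math.log(alpha) + math.lgamma(len(xs)) + log_marginal_normal(xs, sigma2, mu0, tau2)


def log_partition_score(partition, x, alpha, use_predictors=True):
    """Unnormalised log pi(S | x) as in (6).

    With use_predictors=False only alpha * Gamma(n_h) remains, i.e. the DP prior (2) without c_0.
    """
    total = 0.0
    for block in partition:
        xs = [x[i] for i in block]
        if use_predictors:
            total += log_gppm_cohesion(xs, alpha)
        else:
            total += math.log(alpha) + math.lgamma(len(xs))
    return total


x = [0.0, 0.1, 3.0, 3.1]
near = [[0, 1], [2, 3]]  # groups nearby predictors
far = [[0, 2], [1, 3]]   # same block sizes, mixed predictors
```

Comparing `log_partition_score(near, x, 1.0)` with `log_partition_score(far, x, 1.0)` shows the predictor term doing the work: the DP part is identical for both partitions.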
![Page 12: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/12.jpg)
GPPM
• Some thoughts on the GPPM so far:
– As noted earlier, the posterior distribution of a PPM is still in the class of PPMs, but with updated cohesion $c(S^*_h)\, f_h(\mathbf{y}^*_h)$.
– Similarly, the posterior of a GPPM also takes the form of a GPPM.
– (2) and (6) are quite similar. The extra portion of (6) is the marginalized probability of the predictors $\mathbf{x}^*_h$.
– If the predictor density does not depend on $\gamma$, that extra portion factors out of (6) and the GPPM reverts to the Blackwell-MacQueen formulation, as the following theorem shows clearly.
![Page 13: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/13.jpg)
Generalized Polya Urn Scheme
• The following theorem shows that the GPPM can induce a Blackwell-MacQueen Polya Urn scheme, generalized for predictor dependence:
![Page 14: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/14.jpg)
Generalized Polya Urn Scheme
• By the above theorem, subject i does one of two things:
– 1) Draw a previously unseen unique value, with probability proportional to the concentration parameter α times the marginal likelihood of $x_i$ under the base measure on the predictor.
– 2) Take the previously used unique value $(\theta^*_h, \gamma^*_h)$ of cluster h, with probability proportional to the number of other subjects that have already chosen that value times the marginal likelihood of $x_i$ given the predictors already in cluster h.
• Further, since the predictors are treated as random variables, the posteriors on each cluster's predictor parameters are updated as subjects are assigned, so the GPPM provides a flexible, nonparametric way to adapt the distance measure in predictor space.
• In this paper G is always integrated out; however, Dunson alludes to variational techniques that could be developed in a similar fashion, following the fast variational DP proposed by Kurihara et al. (2006).
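The two options in the generalized urn can be turned into normalized probabilities. The sketch below assumes a scalar normal predictor model with a normal prior on its mean as a stand-in for the Normal-Wishart prior (hyperparameter values arbitrary) and computes only the predictor-based prior weights of Theorem 1; in a sampler they would also be multiplied by the data likelihood. `urn_weights` and its arguments are hypothetical names.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def predictive(x_new, xs, sigma2, mu0, tau2):
    """p(x_new | xs) for x ~ N(gamma, sigma2), gamma ~ N(mu0, tau2); empty xs gives the prior predictive."""
    prec = 1.0 / tau2 + len(xs) / sigma2
    mean = (mu0 / tau2 + sum(xs) / sigma2) / prec
    return normal_pdf(x_new, mean, sigma2 + 1.0 / prec)


def urn_weights(i, x, labels, alpha, sigma2=0.1, mu0=0.0, tau2=10.0):
    """Normalised probabilities that subject i opens a new cluster ('new') or joins cluster h.

    new: alpha * integral of f(x_i | gamma) dG_0(gamma)
    h:   n_h^(-i) * p(x_i | predictors of the other members of h)
    """
    members = {}
    for j, h in enumerate(labels):
        if j != i:
            members.setdefault(h, []).append(x[j])
    w = {"new": alpha * predictive(x[i], [], sigma2, mu0, tau2)}
    for h, xs in members.items():
        w[h] = len(xs) * predictive(x[i], xs, sigma2, mu0, tau2)
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}


x = [0.0, 0.1, 3.0, 3.1, 0.05]
labels = [0, 0, 1, 1, 0]
w = urn_weights(4, x, labels, alpha=1.0)
```

Here subject 4 sits next to cluster 0 in predictor space, so almost all of its probability goes to that cluster even though both existing clusters have the same size.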
![Page 15: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/15.jpg)
Generalized Polya Urn Scheme
• Consider, for example, a Normal-Wishart prior on the predictor as follows
• Here, two multiplicative constants scale the precision, and the Wishart distribution is specified through its degrees of freedom and its mean.
• Notice that this formulation adds another multiplier to the precision of the predictor distribution. By analogy, this corresponds to the kernel width in the KSBP, and it encourages tight local clustering in predictor space.
• The marginal distributions on the predictors from Theorem 1 take the forms shown on the next slide.
![Page 16: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/16.jpg)
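Generalized Polya Urn Scheme
• The marginal distribution of the predictor in the first weight is
$$p(x_i \mid \nu^*_{0x}, \mu^*_{0x}, \Sigma^*_{0x}) = \frac{\Gamma\{(\nu^*_{0x}+p)/2\}}{\Gamma(\nu^*_{0x}/2)\,(\nu^*_{0x}\pi)^{p/2}\,|\Sigma^*_{0x}|^{1/2}} \left\{1 + \frac{1}{\nu^*_{0x}}(x_i - \mu^*_{0x})'\,\Sigma^{*-1}_{0x}\,(x_i - \mu^*_{0x})\right\}^{-(\nu^*_{0x}+p)/2}$$
that is, a multivariate t-distribution with $\nu^*_{0x}$ degrees of freedom, location $\mu^*_{0x}$ and scale $\Sigma^*_{0x}$.
• The marginal distribution of the predictor in the second weight has the same functional form but with updated hyperparameters, where $\bar{x}^{(-i)}_h$ is the empirical mean of the predictors in cluster h, excluding predictor i.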
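The multivariate t marginals can be computed with the standard Normal-Inverse-Wishart conjugate update, which is equivalent to a Normal-Wishart prior on the precision. This sketch assumes $\mu \mid \Sigma \sim N(\mu_0, \Sigma/\kappa_0)$ and $\Sigma \sim \text{Inv-Wishart}(\nu_0, \Psi_0)$; it omits the extra precision multiplier described on slide 15, so its hyperparameter formulas need not match the paper's. With no cluster data it returns prior-predictive (first-weight) parameters; given a cluster's other predictors it returns updated (second-weight) parameters. Function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive-definite list-of-lists a."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mvt_logpdf(x, nu, loc, scale):
    """log density of a p-variate Student t with nu d.o.f., location loc and scale matrix scale."""
    p = len(x)
    L = cholesky(scale)
    d = [xi - mi for xi, mi in zip(x, loc)]
    z = []  # forward substitution: L z = d, so z'z = d' scale^{-1} d
    for i in range(p):
        z.append((d[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(p))
    return (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
            - 0.5 * p * math.log(nu * math.pi) - 0.5 * log_det
            - 0.5 * (nu + p) * math.log1p(quad / nu))


def niw_predictive_params(xs, mu0, kappa0, nu0, psi0):
    """Posterior-predictive t parameters (nu, loc, scale) under the Normal-Inverse-Wishart prior.

    nu = nu_n - p + 1, loc = mu_n, scale = psi_n * (kappa_n + 1) / (kappa_n * nu).
    """
    n, p = len(xs), len(mu0)
    xbar = [sum(x[j] for x in xs) / n for j in range(p)] if n else list(mu0)
    kappa_n, nu_n = kappa0 + n, nu0 + n
    mu_n = [(kappa0 * mu0[j] + n * xbar[j]) / kappa_n for j in range(p)]
    psi_n = [row[:] for row in psi0]
    for j in range(p):
        for k in range(p):
            scatter = sum((x[j] - xbar[j]) * (x[k] - xbar[k]) for x in xs)
            shrink = kappa0 * n / kappa_n * (xbar[j] - mu0[j]) * (xbar[k] - mu0[k])
            psi_n[j][k] += scatter + shrink
    nu = nu_n - p + 1
    factor = (kappa_n + 1) / (kappa_n * nu)
    return nu, mu_n, [[factor * v for v in row] for row in psi_n]


nu, loc, scale = niw_predictive_params([[0.2], [0.4], [-0.1]], mu0=[0.0], kappa0=1.0, nu0=3.0, psi0=[[1.0]])
```

Passing the cluster's predictors with subject i removed gives the second-weight marginal, whose location is a weighted average of $\mu_0$ and the cluster's empirical mean.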
![Page 17: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/17.jpg)
Generalized Polya Urn Scheme
• Posterior updating in this model is straightforward using MCMC. The conditional posterior of the parameters of subject i mixes the base prior updated with the data likelihood and point masses at the existing cluster values, with mixing weights derived from the weights in Theorem 1.
• The membership indicators S are updated separately from the cluster parameters $(\theta^*_h, \gamma^*_h)$; they are sampled from their multinomial conditional posterior.
• Next, the cluster parameters are updated conditional on S and the number of clusters k.
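A minimal sampler sketch for the membership indicators follows. It swaps in a collapsed Gibbs step, in which the cluster means are integrated out rather than sampled, and it assumes scalar normal models for both the response and the predictor instead of the regression model used in the paper's experiments. Each weight is the Theorem 1 weight multiplied by the response's predictive likelihood; the cluster parameters could then be drawn given S and k as described above. Function names and hyperparameters are hypothetical.

```python
import math
import random


def log_predictive(v, vs, sigma2, mu0, tau2):
    """log posterior-predictive density of v given cluster values vs.

    Assumed model for this sketch: v ~ N(m, sigma2), m ~ N(mu0, tau2); vs may be empty.
    """
    prec = 1.0 / tau2 + len(vs) / sigma2
    mean = (mu0 / tau2 + sum(vs) / sigma2) / prec
    var = sigma2 + 1.0 / prec
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)


def gibbs_sweep(y, x, labels, alpha, rng, hy=(0.25, 0.0, 10.0), hx=(0.1, 0.0, 10.0)):
    """One collapsed Gibbs sweep over the membership indicators (cluster means integrated out).

    hy and hx are (sigma2, mu0, tau2) for the response and the predictor.
    """
    for i in range(len(y)):
        clusters = {}
        for j, h in enumerate(labels):
            if j != i:
                clusters.setdefault(h, []).append(j)
        options, log_w = [], []
        for h, idx in clusters.items():
            # Existing cluster: n_h^(-i) * p(x_i | x_h^(-i)) * p(y_i | y_h^(-i)).
            options.append(h)
            log_w.append(math.log(len(idx))
                         + log_predictive(x[i], [x[j] for j in idx], *hx)
                         + log_predictive(y[i], [y[j] for j in idx], *hy))
        # New cluster with a fresh label: alpha * prior predictives of x_i and y_i.
        options.append(max(labels) + 1)
        log_w.append(math.log(alpha)
                     + log_predictive(x[i], [], *hx)
                     + log_predictive(y[i], [], *hy))
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]  # numerically stable exponentiation
        labels[i] = rng.choices(options, weights=weights)[0]
    return labels


rng = random.Random(0)
x = [0.0, 0.1, 0.2, 3.0, 3.1, 3.2]
y = [1.0, 1.1, 0.9, -2.0, -2.1, -1.9]
labels = [0] * len(y)  # start with everyone in one cluster
for _ in range(5):
    labels = gibbs_sweep(y, x, labels, alpha=0.01, rng=rng)
```

Integrating out the cluster means keeps the sketch short; a sampler closer to the slides would alternate this indicator step with explicit draws of $(\theta^*_h, \gamma^*_h)$.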
![Page 18: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/18.jpg)
Results
• Dunson and Park demonstrate results on conditional density regression problems using a model whose components are a p-dimensional predictor, the data likelihood, and the parameters of cluster h.
• Results are demonstrated on 3 datasets:
– Simulated single Gaussian (p = 2)
– Simulated mixture of two Gaussians (p = 2)
– Epidemiology data (p = 3)
![Page 19: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/19.jpg)
Results
• Simulated single Gaussian data, 500 data points
– The predictor x is generated iid from a uniform distribution over (0,1).
– The response data were then simulated from a single Gaussian model conditional on x.
• The algorithm was run for 10,000 iterations with a 1,000-iteration burn-in. Mixing was fast and the estimates were good.
Figure: raw data (y versus x).
Below are the conditional distributions of y for two different values of x. The dotted lines are the truth, the solid lines are the estimates, and the dashed lines are 99% credible intervals.
![Page 20: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/20.jpg)
Results
• Simulated mixture of 2 Gaussians, 500 data points
– The predictor x is generated iid from a uniform distribution over (0,1).
– The response data were then simulated from a mixture of two Gaussians conditional on x.
Here, the left column of plots is for a (non-generalized) PPM, while the right column is for the GPPM on the same dataset. Notice the much better fit in the bottom plots, and that the GPPM is not dragged toward 0 as the second peak appears when x approaches 0.
Figure columns: PPM (left), GPPM (right).
![Page 21: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/21.jpg)
Results
• Epidemiologic application:
• DDE has been shown to increase the rate of pre-term birth. The two predictors correspond to the DDE dose for child i and the mother's age (after normalization), respectively.
• The dataset contained 2,313 subjects.
• The MCMC for the GPPM was run for 30,000 iterations with a 10,000-iteration burn-in.
• The results confirmed earlier findings of a slightly decreasing trend in the response as the DDE level rises.
• These findings are similar to previous KSBP work on the same dataset, but the implementation was simpler.
![Page 22: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/22.jpg)
Results
Dashed lines indicate 99% credible intervals.
Figure panel: raw data.
![Page 23: Bayesian Generalized Product Partition Model](https://reader036.vdocuments.net/reader036/viewer/2022062323/56816060550346895dcf89e7/html5/thumbnails/23.jpg)
Conclusion
• A GPPM was formulated, beginning with the Blackwell-MacQueen Polya Urn scheme.
• The GPPM incorporates predictor dependence by treating the predictor as a random variable.
– It is similar in spirit to the KSBP, but it bypasses issues such as kernel width selection and the inability to place a continuous distribution on the predictor space.
• Future research could explore Dunson's suggestion of a variational method for the formulation proposed in this paper.