nonparametric bayesian statistics - mit · 2016. 5. 16. · •bayesian statistics that is not...
Nonparametric Bayesian Statistics
Tamara Broderick
ITT Career Development Assistant Professor, Electrical Engineering & Computer Science
MIT

Nonparametric Bayes

• Bayesian statistics that is not parametric
  • Bayesian:
    $$P(\text{parameters} \mid \text{data}) \propto P(\text{data} \mid \text{parameters})\, P(\text{parameters})$$
  • Not parametric (i.e. not finite parameter, unbounded/growing/infinite number of parameters)

Examples (figures omitted): [wikipedia.org] (“Wikipedia phenomenon”); [Ed Bowlby, NOAA]; [Sudderth, Jordan 2009]; [Lloyd et al 2012; Miller et al 2010]; [Arjas, Gasbarra 1994]; [Fox et al 2014]; [Escobar, West 1995; Ghosal et al 1999]; [Saria et al 2010]; [Ewens 1972; Hartl, Clark 2003]
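
The proportionality P(parameters|data) ∝ P(data|parameters) P(parameters) can be made concrete with a small grid computation. The sketch below is an illustration only, not taken from the slides: it assumes a single coin-bias parameter, a uniform prior over a grid, and a hypothetical helper named `grid_posterior`.

```python
def grid_posterior(heads, tails, grid_size=101):
    """Posterior over a coin bias theta on an evenly spaced grid.

    Assumptions (illustrative, not from the slides): a uniform prior
    over the grid and independent Bernoulli(theta) observations.
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size
    # Unnormalized posterior = likelihood * prior at each grid point.
    unnormalized = [
        (theta ** heads) * ((1.0 - theta) ** tails) * p
        for theta, p in zip(thetas, prior)
    ]
    total = sum(unnormalized)
    # Normalizing turns the proportionality into an equality.
    posterior = [u / total for u in unnormalized]
    return thetas, posterior


thetas, posterior = grid_posterior(heads=7, tails=3)
best = thetas[posterior.index(max(posterior))]
print("grid point with highest posterior mass:", best)
```

The normalizing constant is just the sum of the unnormalized values, which is why the slide can state Bayes' rule up to proportionality.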

Nonparametric Bayes

• A theoretical motivation: De Finetti’s Theorem
  • A data sequence $X_1, X_2, \ldots$ is infinitely exchangeable if the distribution of any N data points doesn’t change when permuted:
    $$p(X_1, \ldots, X_N) = p(X_{\sigma(1)}, \ldots, X_{\sigma(N)})$$
  • De Finetti’s Theorem (roughly): A sequence is infinitely exchangeable if and only if, for all N and some distribution P:
    $$p(X_1, \ldots, X_N) = \int_{\theta} \prod_{n=1}^{N} p(X_n \mid \theta)\, P(d\theta)$$
    [Hewitt, Savage 1955; Aldous 1983]
  • Motivates:
    • Parameters and likelihoods
    • Priors
    • “Nonparametric Bayesian” priors
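
Exchangeability can be checked numerically on a small case. The following sketch is an illustration under stated assumptions, not part of the slides: it takes binary data, a Beta(a, b) mixing distribution P with integer a and b, and a hypothetical function `sequence_probability` that multiplies one-step-ahead predictive probabilities. The computation walks through the data in order, yet every ordering of the same values should give the same joint probability.

```python
from fractions import Fraction
from itertools import permutations


def sequence_probability(xs, a=1, b=1):
    """Joint probability of a binary sequence under a Beta(a, b) mixture
    of i.i.d. Bernoulli(theta) draws, via the chain rule.

    The predictive rule p(x_next = 1 | past) = (a + #ones) / (a + b + n)
    is the standard Beta-Bernoulli predictive; a and b are assumed to be
    positive integers so that exact fractions can be used.
    """
    prob = Fraction(1)
    ones = 0
    zeros = 0
    for x in xs:
        p_one = Fraction(a + ones, a + b + ones + zeros)
        prob *= p_one if x == 1 else 1 - p_one
        ones += x
        zeros += 1 - x
    return prob


sequence = (1, 1, 0, 1, 0)
# Collect the joint probability of every reordering of the same data.
distinct_probabilities = {
    sequence_probability(p) for p in permutations(sequence)
}
print(distinct_probabilities)  # a single distinct value is expected
```

Exact fractions are used so that equality across permutations is not blurred by floating-point rounding.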

Outline

• Dirichlet process
• Background for intuition
• Generative model
• What does a growing/infinite number of parameters really mean (in Nonparametric Bayes)?
• Chinese restaurant process
• Inference
• Venture further into the wild world of Nonparametric Bayesian statistics
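
Generative model

$$P(\text{parameters} \mid \text{data}) \propto P(\text{data} \mid \text{parameters})\, P(\text{parameters})$$

• Finite Gaussian mixture model (K=2 clusters)
  $$\rho_1 \sim \mathrm{Beta}(a_1, a_2), \qquad \rho_2 = 1 - \rho_1$$
  $$\mu_k \overset{iid}{\sim} \mathcal{N}(\mu_0, \Sigma_0)$$
  $$z_n \overset{iid}{\sim} \mathrm{Categorical}(\rho_1, \rho_2)$$
  $$x_n \overset{indep}{\sim} \mathcal{N}(\mu_{z_n}, \Sigma)$$
• Don’t know $\mu_1, \mu_2$
• Don’t know $\rho_1, \rho_2$
• Inference goal: assignments of data points to clusters, cluster parameters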
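
The generative model above can be simulated directly, drawing each quantity in the order it is defined. This is a minimal sketch under simplifying assumptions that are not in the slides: the data are one-dimensional, so the covariances $\Sigma_0$ and $\Sigma$ become scalar standard deviations `sigma0` and `sigma`, and all default hyperparameter values and the function name are hypothetical.

```python
import random


def sample_two_cluster_mixture(n_points, a1=1.0, a2=1.0, mu0=0.0,
                               sigma0=5.0, sigma=1.0, seed=0):
    """Draw from the K=2 finite Gaussian mixture, in one dimension.

    Hyperparameter defaults are illustrative assumptions. Clusters are
    indexed 0 and 1 here, standing in for clusters 1 and 2 on the slide.
    """
    rng = random.Random(seed)
    # rho_1 ~ Beta(a1, a2), rho_2 = 1 - rho_1
    rho1 = rng.betavariate(a1, a2)
    rho = (rho1, 1.0 - rho1)
    # mu_k ~ N(mu0, sigma0^2), independently for each cluster
    mu = [rng.gauss(mu0, sigma0) for _ in range(2)]
    # z_n ~ Categorical(rho_1, rho_2)
    z = [0 if rng.random() < rho1 else 1 for _ in range(n_points)]
    # x_n ~ N(mu_{z_n}, sigma^2)
    x = [rng.gauss(mu[zn], sigma) for zn in z]
    return rho, mu, z, x


rho, mu, z, x = sample_two_cluster_mixture(n_points=10)
print("cluster proportions:", rho)
print("cluster means:", mu)
print("assignments:", z)
```

Inference runs this process in reverse: only `x` is observed, and the goal is the posterior over `z`, `mu`, and `rho`.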
Beta distribution reviewBeta(⇢1|a1, a2) =
�(a1 + a2)
�(a1)�(a2)⇢a1�11 (1� ⇢1)
a2�1 a1, a2 > 0
• Gamma function • integer m: • for x > 0:
• What happens? • • • [R demo]
• Beta is conjugate to Cat
a = a1 = a2 ! 0a = a1 = a2 ! 1
�
⇢1 2 (0, 1)
5
�(m+ 1) = m!�(x+ 1) = x�(x)
![Page 55: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/55.jpg)
Beta distribution reviewBeta(⇢1|a1, a2) =
�(a1 + a2)
�(a1)�(a2)⇢a1�11 (1� ⇢1)
a2�1 a1, a2 > 0
• Gamma function • integer m: • for x > 0:
• What happens? • • • [R demo]
• Beta is conjugate to Cat
a = a1 = a2 ! 0a = a1 = a2 ! 1
�
ρ1
dens
ity
⇢1 2 (0, 1)
5
�(m+ 1) = m!�(x+ 1) = x�(x)
![Page 56: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/56.jpg)
Beta distribution reviewBeta(⇢1|a1, a2) =
�(a1 + a2)
�(a1)�(a2)⇢a1�11 (1� ⇢1)
a2�1 a1, a2 > 0
• Gamma function • integer m: • for x > 0:
• What happens? • • • [R demo]
• Beta is conjugate to Cat
a = a1 = a2 ! 0a = a1 = a2 ! 1
�
ρ1
dens
ity
⇢1 2 (0, 1)
5
�(m+ 1) = m!�(x+ 1) = x�(x)
![Page 57: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/57.jpg)
Beta distribution reviewBeta(⇢1|a1, a2) =
�(a1 + a2)
�(a1)�(a2)⇢a1�11 (1� ⇢1)
a2�1 a1, a2 > 0
• Gamma function • integer m: • for x > 0:
• What happens? • • • [R demo]
• Beta is conjugate to Cat
a = a1 = a2 ! 0a = a1 = a2 ! 1
�
ρ1
dens
ity
⇢1 2 (0, 1)
5
�(m+ 1) = m!�(x+ 1) = x�(x)
![Page 58: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/58.jpg)
Beta distribution reviewBeta(⇢1|a1, a2) =
�(a1 + a2)
�(a1)�(a2)⇢a1�11 (1� ⇢1)
a2�1 a1, a2 > 0
• Gamma function • integer m: • for x > 0:
• What happens? • • •
• Beta is conjugate to Cat
a = a1 = a2 ! 0a = a1 = a2 ! 1
�
ρ1
dens
ity
a1 > a2
⇢1 2 (0, 1)
5
�(m+ 1) = m!�(x+ 1) = x�(x)
![Page 59: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/59.jpg)
Beta distribution reviewBeta(⇢1|a1, a2) =
�(a1 + a2)
�(a1)�(a2)⇢a1�11 (1� ⇢1)
a2�1 a1, a2 > 0
• Gamma function • integer m: • for x > 0:
• What happens? • • •
• Beta is conjugate to Cat
a = a1 = a2 ! 0a = a1 = a2 ! 1
�
ρ1
dens
ity
a1 > a2
⇢1 2 (0, 1)
5
[demo]
�(m+ 1) = m!�(x+ 1) = x�(x)
![Page 60: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/60.jpg)
Beta distribution reviewBeta(⇢1|a1, a2) =
�(a1 + a2)
�(a1)�(a2)⇢a1�11 (1� ⇢1)
a2�1 a1, a2 > 0
• Gamma function • integer m: • for x > 0:
• What happens? • • •
• Beta is conjugate to Cat
a = a1 = a2 ! 0a = a1 = a2 ! 1
�
ρ1
dens
ity
a1 > a2
⇢1 2 (0, 1)
5
[demo]
�(m+ 1) = m!�(x+ 1) = x�(x)
![Page 61: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/61.jpg)
Beta distribution reviewBeta(⇢1|a1, a2) =
�(a1 + a2)
�(a1)�(a2)⇢a1�11 (1� ⇢1)
a2�1 a1, a2 > 0
• Gamma function • integer m: • for x > 0:
• What happens? • • •
• Beta is conjugate to Cat
a = a1 = a2 ! 0a = a1 = a2 ! 1
�
⇢1 ⇠ Beta(a1, a2), z ⇠ Cat(⇢1, ⇢2)ρ1
dens
ity
a1 > a2
⇢1 2 (0, 1)
5
[demo]
�(m+ 1) = m!�(x+ 1) = x�(x)
![Page 62: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/62.jpg)
Beta distribution reviewBeta(⇢1|a1, a2) =
�(a1 + a2)
�(a1)�(a2)⇢a1�11 (1� ⇢1)
a2�1 a1, a2 > 0
• Gamma function • integer m: • for x > 0:
• What happens? • • •
• Beta is conjugate to Cat
a = a1 = a2 ! 0a = a1 = a2 ! 1
�
⇢1 ⇠ Beta(a1, a2), z ⇠ Cat(⇢1, ⇢2)ρ1
dens
ity
a1 > a2
p(⇢1, z) / ⇢1{z=1}1 (1� ⇢1)
1{z=2}⇢a1�11 (1� ⇢1)
a2�1
⇢1 2 (0, 1)
5
[demo]
�(m+ 1) = m!�(x+ 1) = x�(x)
![Page 63: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/63.jpg)
Beta distribution reviewBeta(⇢1|a1, a2) =
�(a1 + a2)
�(a1)�(a2)⇢a1�11 (1� ⇢1)
a2�1 a1, a2 > 0
• Gamma function • integer m: • for x > 0:
• What happens? • • •
• Beta is conjugate to Cat
a = a1 = a2 ! 0a = a1 = a2 ! 1
�
⇢1 ⇠ Beta(a1, a2), z ⇠ Cat(⇢1, ⇢2)ρ1
dens
ity
a1 > a2
p(⇢1, z) / ⇢1{z=1}1 (1� ⇢1)
1{z=2}⇢a1�11 (1� ⇢1)
a2�1
⇢1 2 (0, 1)
5
[demo]
�(m+ 1) = m!�(x+ 1) = x�(x)
![Page 64: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/64.jpg)
Beta distribution reviewBeta(⇢1|a1, a2) =
�(a1 + a2)
�(a1)�(a2)⇢a1�11 (1� ⇢1)
a2�1 a1, a2 > 0
• Gamma function • integer m: • for x > 0:
• What happens? • • •
• Beta is conjugate to Cat
a = a1 = a2 ! 0a = a1 = a2 ! 1
�
⇢1 ⇠ Beta(a1, a2), z ⇠ Cat(⇢1, ⇢2)ρ1
dens
ity
a1 > a2
p(⇢1, z) / ⇢1{z=1}1 (1� ⇢1)
1{z=2}⇢a1�11 (1� ⇢1)
a2�1
⇢1 2 (0, 1)
5
[demo]
�(m+ 1) = m!�(x+ 1) = x�(x)
![Page 65: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/65.jpg)
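Beta distribution review
$$\mathrm{Beta}(\rho_1 \mid a_1, a_2) = \frac{\Gamma(a_1 + a_2)}{\Gamma(a_1)\,\Gamma(a_2)}\, \rho_1^{a_1 - 1} (1 - \rho_1)^{a_2 - 1}, \qquad a_1, a_2 > 0, \quad \rho_1 \in (0, 1)$$
• Gamma function $\Gamma$
  • integer m: $\Gamma(m + 1) = m!$
  • for x > 0: $\Gamma(x + 1) = x\,\Gamma(x)$
• What happens?
  • $a = a_1 = a_2 \to 0$
  • $a = a_1 = a_2 \to \infty$
  • $a_1 > a_2$
  [demo]
[Figure: density against $\rho_1$]
• Beta is conjugate to Cat
$\rho_1 \sim \mathrm{Beta}(a_1, a_2)$, $z \sim \mathrm{Cat}(\rho_1, \rho_2)$
$$p(\rho_1, z) \propto \rho_1^{\mathbb{1}\{z = 1\}} (1 - \rho_1)^{\mathbb{1}\{z = 2\}}\, \rho_1^{a_1 - 1} (1 - \rho_1)^{a_2 - 1}$$
$$p(\rho_1 \mid z) \propto \rho_1^{a_1 + \mathbb{1}\{z = 1\} - 1} (1 - \rho_1)^{a_2 + \mathbb{1}\{z = 2\} - 1} \propto \mathrm{Beta}(\rho_1 \mid a_1 + \mathbb{1}\{z = 1\},\ a_2 + \mathbb{1}\{z = 2\})$$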
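As a stand-in for the demo, here is a small sketch that evaluates the Beta density, checks the two Gamma function identities numerically, and applies the conjugate update for one observed label. The helper names `beta_pdf` and `beta_posterior` are hypothetical, and the test values are arbitrary.

```python
import math

def beta_pdf(rho1, a1, a2):
    """Beta(rho1 | a1, a2) density on (0, 1), written with the Gamma function."""
    norm = math.gamma(a1 + a2) / (math.gamma(a1) * math.gamma(a2))
    return norm * rho1 ** (a1 - 1) * (1 - rho1) ** (a2 - 1)

def beta_posterior(a1, a2, z):
    """Posterior Beta parameters after observing one label z in {1, 2}."""
    return a1 + (z == 1), a2 + (z == 2)

# Gamma function identities from the slide.
assert math.isclose(math.gamma(5), math.factorial(4))         # Gamma(m + 1) = m!
assert math.isclose(math.gamma(3.5), 2.5 * math.gamma(2.5))   # Gamma(x + 1) = x Gamma(x)

# Compare the density at the center (0.5) and near the edge (0.01) for several a = a1 = a2.
for a in (0.1, 1.0, 50.0):
    print(a, beta_pdf(0.5, a, a), beta_pdf(0.01, a, a))

print(beta_posterior(2.0, 3.0, z=1))
```

For small `a` the density is larger near the edge than at the center, and for large `a` the reverse holds, which is the behavior the "What happens?" bullets ask about.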
![Page 73: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/73.jpg)
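Generative model
$P(\text{parameters} \mid \text{data}) \propto P(\text{data} \mid \text{parameters})\, P(\text{parameters})$
• Finite Gaussian mixture model (K clusters)
$\rho_{1:K} \sim \mathrm{Dirichlet}(a_{1:K})$
$\mu_k \overset{iid}{\sim} \mathcal{N}(\mu_0, \Sigma_0)$
$z_n \overset{iid}{\sim} \mathrm{Categorical}(\rho_{1:K})$
$x_n \overset{indep}{\sim} \mathcal{N}(\mu_{z_n}, \Sigma)$
[Figure labels: $\rho_1$, $\rho_2$, $\rho_3$]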
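The K-cluster model can be sampled in the same way as the two-cluster one. This sketch again assumes one-dimensional data, and it draws the Dirichlet vector by normalizing independent Gamma draws, a standard construction that the slides do not themselves state. Function names and defaults are illustrative.

```python
import random

def sample_dirichlet(a, rng):
    """Draw rho_{1:K} ~ Dirichlet(a) by normalizing independent Gamma(a_k, 1) draws."""
    g = [rng.gammavariate(ak, 1.0) for ak in a]
    total = sum(g)
    return [gk / total for gk in g]

def sample_gmm(n, a, mu0=0.0, sigma0=5.0, sigma=1.0, seed=0):
    """Draw n points from a 1-D Gaussian mixture with K = len(a) components."""
    rng = random.Random(seed)
    K = len(a)
    rho = sample_dirichlet(a, rng)                    # rho_{1:K} ~ Dirichlet(a_{1:K})
    mu = [rng.gauss(mu0, sigma0) for _ in range(K)]   # mu_k iid ~ N(mu0, sigma0^2)
    z = rng.choices(range(K), weights=rho, k=n)       # z_n iid ~ Categorical(rho_{1:K}), 0-based
    x = [rng.gauss(mu[zn], sigma) for zn in z]        # x_n ~ N(mu_{z_n}, sigma^2)
    return rho, mu, z, x

rho, mu, z, x = sample_gmm(n=20, a=[1.0, 1.0, 1.0])
print(rho)
print(sorted(set(z)))
```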
![Page 75: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/75.jpg)
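Dirichlet distribution review
$$\mathrm{Dirichlet}(\rho_{1:K} \mid a_{1:K}) = \frac{\Gamma\left(\sum_{k=1}^{K} a_k\right)}{\prod_{k=1}^{K} \Gamma(a_k)} \prod_{k=1}^{K} \rho_k^{a_k - 1}, \qquad a_k > 0$$
$\rho_k \in (0, 1)$, $\sum_k \rho_k = 1$
$a = a_k \to 0$, $a = a_k \to \infty$, $a = a_k = 1$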
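The density above translates directly into code. The sketch below works on the log scale with `math.lgamma` to avoid overflow for large $a_k$; the function name and the tolerance used for the sum-to-one check are assumptions made for illustration.

```python
import math

def dirichlet_logpdf(rho, a):
    """Log of Dirichlet(rho | a); returns -inf outside the open simplex."""
    if any(r <= 0 for r in rho) or abs(sum(rho) - 1.0) > 1e-9:
        return -math.inf
    # log Gamma(sum_k a_k) - sum_k log Gamma(a_k)
    log_norm = math.lgamma(sum(a)) - sum(math.lgamma(ak) for ak in a)
    # sum_k (a_k - 1) log rho_k
    return log_norm + sum((ak - 1) * math.log(rk) for ak, rk in zip(a, rho))

# With a_k = 1 the density is constant over the simplex.
print(math.exp(dirichlet_logpdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])))
# With K = 2 the Dirichlet reduces to the Beta density.
print(math.exp(dirichlet_logpdf([0.3, 0.7], [2.0, 3.0])))
```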
![Page 85: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/85.jpg)
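Dirichlet distribution review
$$\mathrm{Dirichlet}(\rho_{1:K} \mid a_{1:K}) = \frac{\Gamma\left(\sum_{k=1}^{K} a_k\right)}{\prod_{k=1}^{K} \Gamma(a_k)} \prod_{k=1}^{K} \rho_k^{a_k - 1}, \qquad a_k > 0$$
• What happens?
  • $a = a_k \to 0$
  • $a = a_k \to \infty$
  • $a = a_k = 1$
  [demo]
[Figure: density over ($\rho_1$, $\rho_2$); panels a = (0.5, 0.5, 0.5), a = (5, 5, 5), a = (40, 10, 10)]
• Dirichlet is conjugate to Categorical
$\rho_{1:K} \sim \mathrm{Dirichlet}(a_{1:K})$, $z \sim \mathrm{Cat}(\rho_{1:K})$
$$\rho_{1:K} \mid z \overset{d}{=} \mathrm{Dirichlet}(a'_{1:K}), \qquad a'_k = a_k + \mathbb{1}\{z = k\}$$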
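The conjugate update amounts to adding one to the parameter of the observed category. The sketch below applies that rule once per label; handling several labels by repeated application is an extension beyond the single-observation statement on the slide, and the function name is hypothetical.

```python
def dirichlet_posterior(a, labels):
    """Posterior Dirichlet parameters after observing labels z in {1, ..., K}."""
    a_post = list(a)
    for z in labels:
        a_post[z - 1] += 1   # a'_k = a_k + 1{z = k}
    return a_post

a_post = dirichlet_posterior([0.5, 0.5, 0.5], labels=[1, 3, 3])
posterior_mean = [ak / sum(a_post) for ak in a_post]
print(a_post)
print(posterior_mean)
```

The posterior mean shifts toward the categories that were observed most often, with the prior parameters acting like pseudo-counts.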
![Page 91: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/91.jpg)
What if K > N?
• e.g. species sampling, topic modeling, groups on a social network, etc.
[Figure labels: $\rho_1$, $\rho_2$, $\rho_3$, …, $\rho_{1000}$]
• Components: number of latent groups
• Clusters: number of components represented in the data
• [demo 1, demo 2]
• Number of clusters for N data points is < K and random
• Number of clusters grows with N
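The distinction between components and clusters can be seen in a short simulation. This sketch assumes K = 1000 components (matching the $\rho_{1000}$ label) with a symmetric Dirichlet(a, …, a) prior, a = 1; both the symmetric prior and the function name are illustrative choices, not taken from the demos.

```python
import random

def count_clusters(n, K=1000, a=1.0, seed=0):
    """Number of distinct components represented among n draws."""
    rng = random.Random(seed)
    # Unnormalized Gamma draws define a symmetric Dirichlet(a, ..., a) sample;
    # random.choices treats weights as relative, so no explicit normalization is needed.
    g = [rng.gammavariate(a, 1.0) for _ in range(K)]
    z = rng.choices(range(K), weights=g, k=n)
    return len(set(z))

for n in (10, 100, 1000):
    print(n, count_clusters(n))
```

The printed count is the number of clusters for each N. It can never exceed either N or K, and it varies with the random seed.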
![Page 108: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/108.jpg)
Choosing K = ∞• Here, difficult to choose finite K in advance (contrast with
small K): don’t know K, difficult to infer, streaming data • How to generate K = ∞ strictly positive frequencies that
sum to one? • Observation: ⇢1:K ⇠ Dirichlet(a1:K)
?? (⇢2,...,⇢K)1�⇢1
d= Dirichlet(a2, . . . , aK)) ⇢1
d= Beta(a1,
KX
k=1
ak � a1)
V1 ⇠ Beta(a1, a2 + a3 + a4) ⇢1 = V1
V2 ⇠ Beta(a2, a3 + a4)
• “Stick breaking”
⇢2 = (1� V1)V2
9
![Page 109: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/109.jpg)
Choosing K = ∞• Here, difficult to choose finite K in advance (contrast with
small K): don’t know K, difficult to infer, streaming data • How to generate K = ∞ strictly positive frequencies that
sum to one? • Observation: ⇢1:K ⇠ Dirichlet(a1:K)
?? (⇢2,...,⇢K)1�⇢1
d= Dirichlet(a2, . . . , aK)) ⇢1
d= Beta(a1,
KX
k=1
ak � a1)
V1 ⇠ Beta(a1, a2 + a3 + a4) ⇢1 = V1
V2 ⇠ Beta(a2, a3 + a4) ⇢2 = (1� V1)V2
V3 ⇠ Beta(a3, a4)
• “Stick breaking”
9
![Page 110: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/110.jpg)
Choosing K = ∞• Here, difficult to choose finite K in advance (contrast with
small K): don’t know K, difficult to infer, streaming data • How to generate K = ∞ strictly positive frequencies that
sum to one? • Observation: ⇢1:K ⇠ Dirichlet(a1:K)
?? (⇢2,...,⇢K)1�⇢1
d= Dirichlet(a2, . . . , aK)) ⇢1
d= Beta(a1,
KX
k=1
ak � a1)
V1 ⇠ Beta(a1, a2 + a3 + a4) ⇢1 = V1
V2 ⇠ Beta(a2, a3 + a4) ⇢2 = (1� V1)V2
V3 ⇠ Beta(a3, a4) ⇢3 = (1� V1)(1� V2)V3
• “Stick breaking”
9
![Page 111: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/111.jpg)
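Choosing K = ∞
• Here, difficult to choose finite K in advance (contrast with small K): don’t know K, difficult to infer, streaming data
• How to generate K = ∞ strictly positive frequencies that sum to one?
• Observation: $\rho_{1:K} \sim \mathrm{Dirichlet}(a_{1:K})$
$$\Rightarrow \quad \rho_1 \overset{d}{=} \mathrm{Beta}\Big(a_1, \sum_{k=1}^{K} a_k - a_1\Big) \;\perp\!\!\!\perp\; \frac{(\rho_2, \ldots, \rho_K)}{1 - \rho_1} \overset{d}{=} \mathrm{Dirichlet}(a_2, \ldots, a_K)$$
• “Stick breaking”
$$\begin{aligned}
V_1 &\sim \mathrm{Beta}(a_1, a_2 + a_3 + a_4), & \rho_1 &= V_1 \\
V_2 &\sim \mathrm{Beta}(a_2, a_3 + a_4), & \rho_2 &= (1 - V_1) V_2 \\
V_3 &\sim \mathrm{Beta}(a_3, a_4), & \rho_3 &= (1 - V_1)(1 - V_2) V_3 \\
& & \rho_4 &= 1 - \sum_{k=1}^{3} \rho_k
\end{aligned}$$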
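The finite stick-breaking recipe on this slide can be sketched in a few lines. This is only an illustrative sketch: the function name `dirichlet_by_stick_breaking`, the seed, and the example parameters `a = [2, 1, 3, 4]` are assumptions, not part of the slides. It draws each $V_k \sim \mathrm{Beta}(a_k, \sum_{j>k} a_j)$, breaks that fraction off the remaining stick, and gives the last component whatever is left.

```python
import random


def dirichlet_by_stick_breaking(a, rng):
    """Draw rho ~ Dirichlet(a) by breaking a unit-length stick K-1 times."""
    rho = []
    remaining = 1.0   # length of the stick not yet assigned
    tail = sum(a)     # running sum a_k + ... + a_K
    for k in range(len(a) - 1):
        tail -= a[k]                     # now tail = a_{k+1} + ... + a_K
        v = rng.betavariate(a[k], tail)  # V_k ~ Beta(a_k, sum_{j>k} a_j)
        rho.append(remaining * v)        # rho_k = prod_{j<k} (1 - V_j) * V_k
        remaining *= 1.0 - v
    rho.append(remaining)                # rho_K = 1 - sum_{k<K} rho_k
    return rho


rng = random.Random(0)            # hypothetical seed, for reproducibility
a = [2.0, 1.0, 3.0, 4.0]          # hypothetical Dirichlet parameters (K = 4)
draws = [dirichlet_by_stick_breaking(a, rng) for _ in range(20000)]
means = [sum(d[k] for d in draws) / len(draws) for k in range(len(a))]
print(means)  # should be close to a_k / sum(a) = [0.2, 0.1, 0.3, 0.4]
```

Comparing the empirical means with $a_k / \sum_j a_j$ is a quick sanity check that the construction matches the Dirichlet, not a proof of the distributional identity.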
![Page 125: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/125.jpg)
Choosing K = ∞
• Here, difficult to choose finite K in advance (contrast with small K): don’t know K, difficult to infer, streaming data
• How to generate K = ∞ strictly positive frequencies that sum to one?
• Dirichlet process stick-breaking:
$$\begin{aligned}
V_1 &\sim \mathrm{Beta}(a_1, b_1), & \rho_1 &= V_1 \\
V_2 &\sim \mathrm{Beta}(a_2, b_2), & \rho_2 &= (1 - V_1) V_2 \\
&\;\;\vdots \\
V_k &\sim \mathrm{Beta}(a_k, b_k), & \rho_k &= \Big[\prod_{j=1}^{k-1} (1 - V_j)\Big] V_k
\end{aligned}$$
with $a_k = 1$, $b_k = \alpha > 0$
• Griffiths-Engen-McCloskey (GEM) distribution: $\rho = (\rho_1, \rho_2, \ldots) \sim \mathrm{GEM}(\alpha)$
[McCloskey 1965; Engen 1975; Patil and Taillie 1977; Ewens 1987; Sethuraman 1994; Ishwaran, James 2001]
![Page 126: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/126.jpg)
Choosing K = ∞
• Here, difficult to choose finite K in advance (contrast with small K): don’t know K, difficult to infer, streaming data
• How to generate K = ∞ strictly positive frequencies that sum to one?
• Dirichlet process stick-breaking: $a_k = 1$, $b_k = \alpha > 0$
$$V_k \overset{iid}{\sim} \mathrm{Beta}(1, \alpha), \qquad \rho_k = \Big[\prod_{j=1}^{k-1} (1 - V_j)\Big] V_k$$
• Griffiths-Engen-McCloskey (GEM) distribution: $\rho = (\rho_1, \rho_2, \ldots) \sim \mathrm{GEM}(\alpha)$
[McCloskey 1965; Engen 1975; Patil and Taillie 1977; Ewens 1987; Sethuraman 1994; Ishwaran, James 2001]
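The GEM construction above has infinitely many sticks, so any simulation has to stop somewhere. The sketch below is one possible workaround, not the lecture's own code: it keeps breaking sticks until the unassigned mass drops below a tolerance. The function name `gem_truncated`, the tolerance `tol`, the seed, and the value `alpha = 2.0` are all assumptions chosen for illustration.

```python
import random


def gem_truncated(alpha, rng, tol=1e-8):
    """Return the leading weights of rho ~ GEM(alpha) and the leftover mass.

    Sticks are broken until the unassigned mass is at most `tol`, so the
    result is a truncation of the infinite sequence, not the full draw.
    """
    rho = []
    remaining = 1.0
    while remaining > tol:
        v = rng.betavariate(1.0, alpha)  # V_k iid Beta(1, alpha)
        rho.append(remaining * v)        # rho_k = prod_{j<k} (1 - V_j) * V_k
        remaining *= 1.0 - v
    return rho, remaining


rng = random.Random(1)  # hypothetical seed
rho, leftover = gem_truncated(alpha=2.0, rng=rng)
print(len(rho), sum(rho) + leftover)  # number of sticks kept; total mass
```

The instantiated weights plus the leftover mass always account for the whole stick, and the number of sticks needed to reach a given tolerance grows with α, since larger α makes each break smaller on average.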
![Page 134: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/134.jpg)
Distributions
• Beta → random distribution over 1, 2
• Dirichlet → random distribution over 1, 2, . . . , K
• GEM / Dirichlet process stick-breaking → random distribution over 1, 2, . . .
• Dirichlet process → random distribution over �:
[Figure: sticks divided into segments labeled 1 2; 1 2 3 4; and 1 2 3 4 …]
• Infinity of parameters: components
• Growing number of parameters: clusters
![Page 142: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/142.jpg)
Exercises
• Prove the beta (Dirichlet) is conjugate to the categorical
• What is the posterior after N data points?
• Suppose $\rho_{1:K} \sim \mathrm{Dirichlet}(a_{1:K})$; prove that
$$\Rightarrow \quad \rho_1 \overset{d}{=} \mathrm{Beta}\Big(a_1, \sum_{k=1}^{K} a_k - a_1\Big) \;\perp\!\!\!\perp\; \frac{(\rho_2, \ldots, \rho_K)}{1 - \rho_1} \overset{d}{=} \mathrm{Dirichlet}(a_2, \ldots, a_K)$$
• Code your own GEM simulator for ρ; why is this hard?
• Simulate drawing cluster indicators (z) from your ρ
• Compare the number of clusters as N changes in the GEM case with the growth in the K=1000 case
• How does the growth in N change when you change α?
[Figure: stick divided into segments labeled 1 2 3 4 …]
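One possible starting point for the simulation exercises is sketched below. It draws cluster indicators z from a lazily instantiated GEM(α) draw (sticks are only broken when a sample lands in the unbroken part) and from a finite Dirichlet with K = 1000, then records the number of distinct clusters at a few values of N. Several choices here are assumptions rather than anything stated on the slide: the symmetric Dirichlet(α/K, …, α/K) prior for the finite case, the checkpoint values, the seed, and all function names.

```python
import random


def gem_indicators(alpha, n, rng):
    """Draw z_1..z_n from a single rho ~ GEM(alpha), breaking sticks lazily."""
    rho, remaining, z = [], 1.0, []
    for _ in range(n):
        u = rng.random()
        while u >= 1.0 - remaining:          # u falls in the unbroken part
            v = rng.betavariate(1.0, alpha)  # V_k iid Beta(1, alpha)
            rho.append(remaining * v)
            remaining *= 1.0 - v
        cum, k = 0.0, 0
        for k, w in enumerate(rho):          # invert the CDF of rho
            cum += w
            if u < cum:
                break
        z.append(k)                          # last index guards against rounding
    return z


def dirichlet_indicators(alpha, K, n, rng):
    """Draw z_1..z_n from rho ~ Dirichlet(alpha/K, ..., alpha/K) (assumed prior)."""
    g = [rng.gammavariate(alpha / K, 1.0) for _ in range(K)]  # unnormalized rho
    return rng.choices(range(K), weights=g, k=n)


def clusters_so_far(z, checkpoints):
    """Number of distinct clusters among the first N indicators, per checkpoint N."""
    seen, out = set(), {}
    for i, zi in enumerate(z, start=1):
        seen.add(zi)
        if i in checkpoints:
            out[i] = len(seen)
    return out


rng = random.Random(2)                   # hypothetical seed
checkpoints = {10, 100, 1000, 10000}     # hypothetical values of N
for alpha in (1.0, 10.0):
    gem = clusters_so_far(gem_indicators(alpha, 10000, rng), checkpoints)
    fin = clusters_so_far(dirichlet_indicators(alpha, 1000, 10000, rng), checkpoints)
    print(alpha, gem, fin)
```

The lazy construction sidesteps the difficulty raised in the exercise: the full infinite ρ is never stored, only the sticks that the observed data actually require.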
![Page 143: Nonparametric Bayesian Statistics - MIT · 2016. 5. 16. · •Bayesian statistics that is not parametric • Bayesian • Not parametric (i.e. not finite parameter, unbounded/ growing/infinite](https://reader034.vdocuments.net/reader034/viewer/2022051912/6002dec34c1d1734821b4410/html5/thumbnails/143.jpg)
References
A full reference list is provided at the end of the “Part III” slides.