Hierarchical Topic Models and the Nested Chinese Restaurant Process
Blei, Griffiths, Jordan, Tenenbaum
presented by Rodrigo de Salvo Braz
Document classification
• One-class approach: one topic per document, with words generated according to the topic.
• For example, a Naive Bayes model.
Document classification
• It is more realistic to assume more than one topic per document.
• Generative model: pick a mixture distribution over K topics and generate words from it.
Document classification
• Even more realistic: topics may be organized in a hierarchy (not independent);
• Pick a path from root to leaf in a tree; each node is a topic; sample from the mixture.
Dirichlet distribution (DD)
• Distribution over distribution vectors of dimension K: $P(p; u) = \frac{1}{Z(u)} \prod_i p_i^{u_i}$;
• Parameters are a prior distribution (“previous observations”);
• A symmetric Dirichlet distribution assumes a uniform prior distribution ($u_i = u_j$ for all $i, j$).
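To make the role of the parameters concrete, here is a minimal sketch that draws probability vectors from a Dirichlet distribution by normalizing independent Gamma draws. It assumes the standard parameterization with concentration parameters `alpha` (density proportional to the product of `p_i ** (alpha_i - 1)`), which may differ from the slide's `u` by a constant shift. The function name `sample_dirichlet` and the parameter values are illustrative, not from the slides.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)

# Symmetric case: all parameters equal, so no component is favored a priori.
symmetric = sample_dirichlet([1.0] * 4, rng)

# Asymmetric case (assumed values): many "previous observations" of component 0.
skewed = sample_dirichlet([50.0, 1.0, 1.0, 1.0], rng)
```

Larger parameter values behave like more pseudo-observations, so samples concentrate more tightly around the prior proportions.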
Latent Dirichlet Allocation (LDA)
• Generative model of multiple-topic documents;
• Generate a mixture distribution on topics using a Dirichlet distribution;
• For each word, pick a topic according to this distribution and generate the word according to the word distribution for that topic.
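The generative steps above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: the helper names (`sample_dirichlet`, `generate_lda_document`), the symmetric parameter `alpha`, and the 3-topic, 5-word setup are all hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_lda_document(topic_word, alpha, n_words, rng):
    """Generate one document; returns (topic mixture, topic per word, word ids)."""
    k = len(topic_word)
    # Per-document mixture over the K topics, drawn from a symmetric Dirichlet.
    theta = sample_dirichlet([alpha] * k, rng)
    topics, words = [], []
    for _ in range(n_words):
        # Pick a topic for this word, then a word from that topic's distribution.
        z = rng.choices(range(k), weights=theta)[0]
        w = rng.choices(range(len(topic_word[z])), weights=topic_word[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words

rng = random.Random(1)
# Hypothetical toy setup: 3 topics over a 5-word vocabulary.
topic_word = [sample_dirichlet([0.5] * 5, rng) for _ in range(3)]
theta, topics, words = generate_lda_document(topic_word, alpha=1.0, n_words=20, rng=rng)
```

Because `theta` is drawn once per document while the topic is drawn once per word, a single document can mix several topics.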
Latent Dirichlet Allocation (LDA)
[Figure: graphical model with plates K and W; labels: words w, topics, topic distribution, DD hyperparameter.]
Chinese Restaurant Process (CRP)
[Figure sequence: customers 1 through 9 (of 9) are seated one at a time, one slide per customer.]
Data point (a distribution itself) sampled
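For readers who have not seen the seating rule, here is a minimal sketch of the standard CRP: with n customers already seated, the next customer joins existing table k with probability n_k / (n + gamma) and opens a new table with probability gamma / (n + gamma). The function name `crp_seating` and the value `gamma=1.0` are assumptions for illustration.

```python
import random

def crp_seating(n_customers, gamma, rng):
    """Seat customers one at a time; return (table per customer, table sizes)."""
    table_counts = []   # table_counts[k] = customers currently at table k
    assignments = []
    for _ in range(n_customers):
        # Existing table k has weight n_k; a new table has weight gamma.
        weights = table_counts + [gamma]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_counts):
            table_counts.append(1)   # open a new table
        else:
            table_counts[k] += 1     # join an existing table
        assignments.append(k)
    return assignments, table_counts

rng = random.Random(0)
# Nine customers, as in the slide sequence.
assignments, table_counts = crp_seating(9, gamma=1.0, rng=rng)
```

The number of tables is not fixed in advance; it grows with the number of customers, which is what makes the CRP usable as a prior over an unbounded number of mixture components.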
Species Sampling Mixture
• Generative model of multiple-topic documents;
• Generate a mixture distribution on topics using a CRP prior;
• For each word, pick a topic according to this distribution and generate the word according to the word distribution for that topic.
Species Sampling Mixture
[Figure: graphical model with plates K and W; labels: words w, topics, topic distribution, CRP hyperparameter.]
Nested CRP
[Figure: tree of nested restaurants; each of customers 1 through 6 appears three times.]
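A minimal sketch of how a nested CRP assigns each customer a root-to-leaf path: at every restaurant on the way down, the customer picks a table by the ordinary CRP rule, and the chosen table points to a child restaurant. The tree representation (a dict keyed by tuples of child indices), the function name `ncrp_sample_path`, and the values `depth=3` and `gamma=1.0` are assumptions for illustration.

```python
import random

def ncrp_sample_path(tree, depth, gamma, rng):
    """Sample a root-to-leaf path with `depth` nodes, updating `tree` in place.

    `tree` maps a node id (tuple of child indices from the root) to a list of
    customer counts, one entry per child table of that restaurant.
    """
    node = ()            # the root restaurant
    path = [node]
    for _ in range(depth - 1):
        counts = tree.setdefault(node, [])
        # Ordinary CRP choice among this restaurant's tables.
        weights = counts + [gamma]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)     # new table, i.e. a new child restaurant
        else:
            counts[k] += 1
        node = node + (k,)       # move to the child restaurant
        path.append(node)
    return path

rng = random.Random(0)
tree = {}
# Six customers, each following a path through three levels.
paths = [ncrp_sample_path(tree, depth=3, gamma=1.0, rng=rng) for _ in range(6)]
```

Customers who share a prefix of their paths share the restaurants along that prefix, so the set of sampled paths defines a tree whose branching is not fixed in advance.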
Hierarchical LDA (hLDA)
• Generative model of multiple-topic documents;
• Generate a mixture distribution on topics using a Nested CRP prior;
• For each word, pick a topic according to this distribution and generate the word according to the word distribution for that topic.
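The steps above can be sketched as a toy generator. It assumes the usual reading of hLDA: each document picks a path through the tree with the nested CRP, draws mixing proportions over the levels of that path from a Dirichlet, and then generates each word from the topic at a chosen level. All names and parameter values (`generate_hlda_document`, `gamma`, `alpha`, `eta`, the 3-level depth) are hypothetical, and no inference is performed.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def crp_choice(counts, gamma, rng):
    """Pick a table by the CRP rule and update `counts` in place."""
    weights = counts + [gamma]
    k = rng.choices(range(len(weights)), weights=weights)[0]
    if k == len(counts):
        counts.append(1)
    else:
        counts[k] += 1
    return k

def generate_hlda_document(tree, topics, depth, vocab_size, n_words,
                           gamma, alpha, eta, rng):
    # 1. Pick a root-to-leaf path with the nested CRP.
    node, path = (), [()]
    for _ in range(depth - 1):
        k = crp_choice(tree.setdefault(node, []), gamma, rng)
        node = node + (k,)
        path.append(node)
    # 2. Each node owns a topic (word distribution), created on first visit.
    for n in path:
        if n not in topics:
            topics[n] = sample_dirichlet([eta] * vocab_size, rng)
    # 3. Draw mixing proportions over the levels of the path.
    theta = sample_dirichlet([alpha] * depth, rng)
    # 4. For each word, pick a level, then a word from that level's topic.
    words = []
    for _ in range(n_words):
        level = rng.choices(range(depth), weights=theta)[0]
        words.append(rng.choices(range(vocab_size), weights=topics[path[level]])[0])
    return path, words

rng = random.Random(0)
tree, topics = {}, {}
docs = [generate_hlda_document(tree, topics, depth=3, vocab_size=25, n_words=50,
                               gamma=1.0, alpha=1.0, eta=0.5, rng=rng)
        for _ in range(10)]
```

Every document's path starts at the root, so the root topic is shared by all documents, while topics deeper in the tree are shared only by documents whose paths pass through them.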
hLDA graphical model
Artificial data experiment
100 documents of 1,000 words each, over a 25-term vocabulary
Each vertical bar is a topic
CRP prior vs. Bayes Factors
Predicting the structure
NIPS abstracts
Comments
• Accommodates growing collections of data;
• Hierarchical organization makes sense, but it is not clear to me why the CRP prior is the best prior for that;
• No mention of running time; maybe it takes a very long time.