Technical Foundations and Inference
Topic Model Tutorial - Part 2
Hannover, 2016

Arnim Bleier
[email protected]


Why should we care?

● Probabilistic Graphical Models are a general framework for representing assumptions about the (in)dependence between random variables.

● Knowing the inner workings of Topic Models helps us better interpret their results.

Outline

● Generative storylines & Plates

● Gibbs sampling

● Simple Topic Model

● Latent Dirichlet Allocation

Recap: Conference dinner

[Figure: guests seated at tables k1, k2, and k3; a new guest ("?") chooses a table]

Probabilities: 5/10 for k1, 2/10 for k2, 3/10 for k3

Recap: Conference dinner

General case:

p(k) = n_k / N

where n_k is the number of observations in k and N is the normalizing constant.

Generative Storyline

p(z_{N+1} = k | z_1, …, z_N) = n_k / N

The probability that observation N+1 belongs to group k is proportional to the number of previous observations in k.

Generative Storyline

p(z_{N+1} = k | z_1, …, z_N) = (n_k + α) / (N + Kα)

where α is the prior pseudo-count added to each of the K groups.
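As a numeric sketch of this update (the symmetric prior α and the function name are my choices, not fixed by the slides):

```python
# Predictive probability that observation N+1 joins group k,
# given counts n_k of previous observations:
#   p(z_{N+1} = k) = (n_k + alpha) / (N + K * alpha)
def predictive(counts, alpha=0.0):
    K = len(counts)
    N = sum(counts)
    return [(n_k + alpha) / (N + K * alpha) for n_k in counts]

# Conference-dinner counts: 5, 2, and 3 guests at tables k1, k2, k3.
print(predictive([5, 2, 3]))             # no prior: [0.5, 0.2, 0.3]
print(predictive([5, 2, 3], alpha=1.0))  # with prior: [6/13, 3/13, 4/13]
```

With α = 0 this reduces to the empirical proportions from the recap; the prior smooths them toward uniform.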

Plate Notation

[Figure: plate diagram — a rectangle (plate) indicates random variables repeated over the index i]

Gibbs sampling

Iteratively sample each variable X_j conditioned on all other variables:

X_j ~ p(X_j | X_1, …, X_{j-1}, X_{j+1}, …, X_n)

Gibbs sampling

[Figure: sampled values of X over iterations, starting from the prior and converging to the stationary distribution]
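A minimal sketch on a toy target — a bivariate normal with correlation ρ, my example rather than the slides' — where each iteration resamples each variable from its conditional given the other; after a burn-in the draws follow the stationary joint distribution:

```python
import random

def gibbs_bivariate_normal(rho, iters=20000, burn_in=1000, seed=0):
    """Gibbs sampling for a standard bivariate normal with correlation rho,
    alternating the two full conditionals:
        x | y ~ N(rho * y, 1 - rho^2)
        y | x ~ N(rho * x, 1 - rho^2)
    """
    rng = random.Random(seed)
    sd = (1 - rho ** 2) ** 0.5
    x, y = 0.0, 0.0              # arbitrary starting point
    xs, ys = [], []
    for t in range(iters):
        x = rng.gauss(rho * y, sd)   # resample x given current y
        y = rng.gauss(rho * x, sd)   # resample y given current x
        if t >= burn_in:             # discard pre-convergence draws
            xs.append(x)
            ys.append(y)
    return xs, ys

xs, ys = gibbs_bivariate_normal(rho=0.8)
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
r = cov / (sum((a - mx) ** 2 for a in xs) *
           sum((b - my) ** 2 for b in ys)) ** 0.5
print(round(r, 2))  # close to the target correlation 0.8
```

No individual conditional knows the joint, yet the chain's draws recover its correlation — the property the convergence figure illustrates.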

Simple Topic Model

Generative Storyline:

● Draw a global distribution over topics.

● For each document d, draw a topic z_d.

Simple Topic Model

Generative Storyline:*

● Draw a global distribution over topics.

● For each topic k, draw a distribution over the vocabulary.

● For each document d, draw a topic z_d, then draw the words w_d from the topic indexed by z_d.

* Mixture of Unigrams
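The storyline can be sketched as ancestral sampling; the vocabulary size, number of topics, and symmetric Dirichlet priors below are arbitrary illustration choices, not values from the slides:

```python
import random

def dirichlet(alpha, dim, rng):
    # Symmetric Dirichlet sample via normalized Gamma draws.
    g = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
    s = sum(g)
    return [x / s for x in g]

def sample_category(probs, rng):
    # Draw an index according to the given probability vector.
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1

def mixture_of_unigrams(num_docs, doc_len, K, V, alpha=1.0, beta=1.0, seed=0):
    rng = random.Random(seed)
    theta = dirichlet(alpha, K, rng)                   # global topic distribution
    phi = [dirichlet(beta, V, rng) for _ in range(K)]  # per-topic word distributions
    docs = []
    for _ in range(num_docs):
        z = sample_category(theta, rng)                # one topic per document
        docs.append([sample_category(phi[z], rng) for _ in range(doc_len)])
    return theta, phi, docs
```

Note that every word in a document is drawn from the single topic z_d — exactly the "mixture of unigrams" assumption.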

Simple Topic Model

Likelihood of document d being generated from topic k:*

p(w_d | z_d = k) = ∏_{i=1}^{N_d} p(w_{di} | z_d = k)

* Approximation not considering the dependence of words within documents.
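This product over word probabilities is conveniently computed in log space to avoid underflow (a sketch; the per-topic word distribution phi_k is assumed given, and the function name is mine):

```python
import math

def log_likelihood(doc, phi_k):
    """log p(w_d | z_d = k) = sum_i log phi_k[w_di]
    (independent-words approximation from the slide)."""
    return sum(math.log(phi_k[w]) for w in doc)

# Toy check: vocabulary of 3 word types, doc = word 0 followed by word 1.
phi_k = [0.5, 0.3, 0.2]
print(math.exp(log_likelihood([0, 1], phi_k)))  # ≈ 0.15 (= 0.5 * 0.3)
```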

Simple Topic Model

p(z_d = k | w_d) = ?

We need to know from which topic k document d was generated. The answer combines the global distribution over topics with the likelihood of the document under each topic.

Simple Topic Model

p(z_d = k | w_d) ∝ p(z_d = k) ∏_{i=1}^{N_d} p(w_{di} | z_d = k)


We can now sample the membership for document d and update the model.
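A sketch of that sampling step, assuming the model is parameterized by a global topic distribution theta and per-topic word distributions phi (the names and log-space normalization are my choices):

```python
import math
import random

def sample_membership(doc, theta, phi, rng=random):
    """Sample z_d from p(z_d = k | w_d) ∝ theta[k] * prod_i phi[k][w_di],
    computed in log space and normalized."""
    K = len(theta)
    logs = [math.log(theta[k]) + sum(math.log(phi[k][w]) for w in doc)
            for k in range(K)]
    m = max(logs)                          # stabilize before exponentiating
    weights = [math.exp(l - m) for l in logs]
    total = sum(weights)
    post = [w / total for w in weights]    # posterior over topics for doc d
    u, acc = rng.random(), 0.0
    for k, p in enumerate(post):
        acc += p
        if u < acc:
            return k, post
    return K - 1, post

theta = [0.5, 0.5]
phi = [[0.9, 0.1], [0.1, 0.9]]
z, post = sample_membership([0, 0], theta, phi)
print(post)  # topic 0 dominates: roughly [0.988, 0.012]
```

Repeating this draw for every document, then re-estimating theta and phi from the sampled memberships, gives one sweep of the sampler.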

Latent Dirichlet Allocation

Generative Storyline:

[Figure: plate diagram of the LDA generative process]

Latent Dirichlet Allocation

Generative Storyline:

In contrast to the simple model, each document d now has its own document-specific distribution over topics.


Latent Dirichlet Allocation

Likelihood of word i in document d being generated from topic k.
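A per-word sketch in the spirit of the collapsed Gibbs sampler for LDA (cf. the Steyvers & Griffiths reference); the count-array names and the symmetric priors α and β are my notation, not taken from the slides:

```python
def lda_word_posterior(n_dk, n_kw, n_k, alpha, beta, V):
    """Posterior p(z_di = k | rest) for each topic k, proportional to
        (n_dk[k] + alpha) * (n_kw[k] + beta) / (n_k[k] + V * beta)
    where n_dk = topic counts in document d (current word excluded),
          n_kw = counts of the word w_di in each topic,
          n_k  = total words assigned to each topic,
          V    = vocabulary size.
    """
    weights = [(n_dk[k] + alpha) * (n_kw[k] + beta) / (n_k[k] + V * beta)
               for k in range(len(n_dk))]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example: 2 topics; word w_di was seen 8 times in topic 0, once in topic 1.
post = lda_word_posterior(n_dk=[3, 1], n_kw=[8, 1], n_k=[50, 40],
                          alpha=0.1, beta=0.01, V=1000)
print(post)  # topic 0 is far more likely
```

The first factor is the document's preference for topic k; the second is topic k's preference for this word — the word-level analogue of the document-level update in the simple model.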

Associated Press Topics (Simple Topic Model)

Associated Press Topics (LDA)

Conclusions

● Topic Models can be formulated within the wider framework of Probabilistic Graphical Models.

● Different versions of Topic Models can be formulated.

● More complex models are not necessarily better; however, they can help to express assumptions about the dataset.

Thank you!


References

● M. Steyvers, T. Griffiths. Probabilistic topic models. In: Latent Semantic Analysis: A Road to Meaning, 2007.

● G. Heinrich. Parameter estimation for text analysis, 2008.

● P. Resnik, E. Hardisty. Gibbs sampling for the uninitiated, 2010.

● M. D. Lee, E. J. Wagenmakers. Bayesian cognitive modeling: A practical course, 2014.

● S. Jackman. Bayesian analysis for the social sciences, 2009.