
Page 1:

Department of Computer Science, CSCI 5622: Machine Learning

Chenhao Tan
Lecture 20: Topic modeling and variational inference

Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

1

Page 2:

Administrivia

• Poster printing (stay tuned!)
• HW 5 (final homework) is due next Friday!
• Midpoint feedback

2

Page 3:

Learning Objectives

• Learn about latent Dirichlet allocation
• Understand the intuition behind variational inference

3

Page 4:

Topic models

• Discrete count data

4

Page 5:

Topic models

5

• Suppose you have a huge number of documents
• Want to know what's going on
• Can't read them all (e.g. every New York Times article from the 90's)
• Topic models offer a way to get a corpus-level view of major themes
• Unsupervised

Page 6:

Why should you care?

• Neat way to explore/understand corpus collections
  • E-discovery
  • Social media
  • Scientific data
• NLP applications
  • Word sense disambiguation
  • Discourse segmentation
• Psychology: word meaning, polysemy
• A general way to model count data and a general inference algorithm

6

Page 7:

Conceptual approach

• Input: a text corpus and number of topics K
• Output:
  • K topics, each topic is a list of words
  • Topic assignment for each document

7

[Figure: example corpus of headlines, e.g. "Forget the Bootleg, Just Download the Movie Legally", "Multiplex Heralded As Linchpin To Growth", "The Shape of Cinema, Transformed At the Click of a Mouse", "A Peaceful Crew Puts Muppets Where Its Mouth Is", "Stock Trades: A Better Deal For Investors Isn't Simple", "The three big Internet portals begin to distinguish among themselves as shopping malls", "Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens"]

Page 8:

Conceptual approach

8

• K topics, each topic is a list of words

TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer

Page 9:

Conceptual approach

9

• Topic assignment for each document

[Figure: each headline in the example corpus is labeled with its dominant topic: TOPIC 1 "TECHNOLOGY", TOPIC 2 "BUSINESS", TOPIC 3 "ENTERTAINMENT"]

Page 10:

Topics from Science

10

Page 11:

Topic models

• Discrete count data
• Gaussian distributions are not appropriate

11

Page 12:

Generative model: Latent Dirichlet Allocation

• Generate a document, or a bag of words
• Blei, Ng, Jordan. Latent Dirichlet Allocation. JMLR, 2003.

12

Page 13:

Generative model: Latent Dirichlet Allocation

• Generate a document, or a bag of words
• Multinomial distribution
  • Distribution over discrete outcomes
  • Represented by a non-negative vector that sums to one
  • Picture representation

13

[Figure: points on the probability simplex, e.g. (1,0,0), (0,1,0), (0,0,1), (1/2,1/2,0), (1/3,1/3,1/3), (1/4,1/4,1/2)]

Page 14:

Generative model: Latent Dirichlet Allocation

• Generate a document, or a bag of words
• Multinomial distribution
  • Distribution over discrete outcomes
  • Represented by a non-negative vector that sums to one
  • Picture representation
  • Comes from a Dirichlet distribution

14

[Figure: points on the probability simplex, e.g. (1,0,0), (0,1,0), (0,0,1), (1/2,1/2,0), (1/3,1/3,1/3), (1/4,1/4,1/2)]
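This is easy to see in code. A minimal numpy sketch (my own example, not from the slides): parameter vectors drawn from a Dirichlet are non-negative and sum to one, so they are points on the simplex, and each one parameterizes a multinomial.

import numpy as np

rng = np.random.default_rng(0)

# Draw a few multinomial parameter vectors (points on the simplex)
# from a symmetric Dirichlet over three outcomes.
alpha = np.array([2.0, 2.0, 2.0])
thetas = rng.dirichlet(alpha, size=5)

print(thetas)              # each row is non-negative ...
print(thetas.sum(axis=1))  # ... and sums to one

# One draw from the multinomial parameterized by the first theta:
counts = rng.multinomial(10, thetas[0])
print(counts)              # word counts for a 10-word "document"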

Page 15:

Generative story

15

TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer

Page 16:

Generative story

16

[Figure: the example headlines again, each now associated with a topic (TOPIC 1, TOPIC 2, or TOPIC 3)]

Page 17:

Generative story

17

Example document: "Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's iTunes music store and other online services ..."

[Figure: the three topic word lists (computer, technology, system, ...; play, film, movie, ...; sell, sale, store, ...) shown alongside the document; each word in the document is generated by one of the topics]


Page 21:

Missing component: how to generate a multinomial distribution

21

Page 22:

Missing component: how to generate a multinomial distribution

22

Dirichlet Distribution

Page 23:

Missing component: how to generate a multinomial distribution

23

Page 24:

Conjugacy of Dirichlet and Multinomial

24

• If $\theta \sim \text{Dir}(\alpha)$, $w \sim \text{Mult}(\theta)$, and $n_k = |\{w_i : w_i = k\}|$, then

$$p(\theta \mid \alpha, w) \propto p(w \mid \theta)\, p(\theta \mid \alpha) \propto \prod_k \theta_k^{n_k} \prod_k \theta_k^{\alpha_k - 1} \propto \prod_k \theta_k^{\alpha_k + n_k - 1}$$

• Conjugacy: this posterior has the same form as the prior
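The algebra says the posterior is Dir(α + n): conditioning on counts just adds them to the Dirichlet parameter. A quick numerical illustration (my own sketch in numpy, not from the slides):

import numpy as np

rng = np.random.default_rng(0)

alpha = np.array([1.0, 1.0, 1.0])     # Dirichlet prior parameter
theta = rng.dirichlet(alpha)          # theta ~ Dir(alpha)
counts = rng.multinomial(100, theta)  # n_k: how often each outcome occurred

# Conjugacy: the posterior over theta is again Dirichlet,
# with parameter alpha_k + n_k.
posterior_alpha = alpha + counts
print(posterior_alpha)
print(posterior_alpha / posterior_alpha.sum())  # posterior mean, close to theta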


Page 26:

Making the generative story formal

26

[Plate diagram: $\lambda \to \beta_k$ (plate over $K$ topics); $\alpha \to \theta_d \to z_n \to w_n \leftarrow \beta_{z_n}$ (plates over $N$ word positions and $M$ documents)]

• For each topic $k \in \{1, \ldots, K\}$, draw a multinomial distribution $\beta_k$ from a Dirichlet distribution with parameter $\lambda$
• For each document $d \in \{1, \ldots, M\}$, draw a multinomial distribution $\theta_d$ from a Dirichlet distribution with parameter $\alpha$
• For each word position $n \in \{1, \ldots, N\}$, select a hidden topic $z_n$ from the multinomial distribution parameterized by $\theta_d$
• Choose the observed word $w_n$ from the distribution $\beta_{z_n}$
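The four steps translate almost line for line into code. A toy sketch of the generative process (my own, with made-up values for K, M, N and the vocabulary size V; it only samples data, it does not fit anything):

import numpy as np

rng = np.random.default_rng(0)

K, M, N, V = 3, 4, 10, 8       # topics, documents, words per document, vocabulary size
lam = np.full(V, 0.1)          # Dirichlet parameter for topics (lambda)
alpha = np.full(K, 0.5)        # Dirichlet parameter for document proportions

beta = rng.dirichlet(lam, size=K)       # beta_k: one distribution over words per topic

docs = []
for d in range(M):
    theta_d = rng.dirichlet(alpha)      # per-document topic proportions
    words = []
    for n in range(N):
        z_n = rng.choice(K, p=theta_d)   # hidden topic for position n
        w_n = rng.choice(V, p=beta[z_n]) # observed word drawn from beta_{z_n}
        words.append(int(w_n))
    docs.append(words)

print(docs)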

Page 27: Department of Computer Science CSCI 5622: Machine Learning ... · •A general way to model count data and a general inference algorithm 6. Conceptual approach •Input: a text corpus

27

What are Topic Models?

Generative Model Approach

MNθd zn wn

Kβk

α

λ

• For each topic k 2 {1, . . . ,K}, draw a multinomial distribution �kfrom a Dirichlet distribution with parameter �

• For each document d 2 {1, . . . ,M}, draw a multinomialdistribution ✓d from a Dirichlet distribution with parameter ↵

• For each word position n 2 {1, . . . ,N}, select a hidden topic znfrom the multinomial distribution parameterized by ✓.

• Choose the observed word wn from the distribution �zn .

Machine Learning: Jordan Boyd-Graber | Boulder Topic Models | 16 of 26

Making the generative story formal

Page 28: Department of Computer Science CSCI 5622: Machine Learning ... · •A general way to model count data and a general inference algorithm 6. Conceptual approach •Input: a text corpus

28

Making the generative story formalWhat are Topic Models?

Generative Model Approach

MNθd zn wn

Kβk

α

λ

• For each topic k 2 {1, . . . ,K}, draw a multinomial distribution �kfrom a Dirichlet distribution with parameter �

• For each document d 2 {1, . . . ,M}, draw a multinomialdistribution ✓d from a Dirichlet distribution with parameter ↵

• For each word position n 2 {1, . . . ,N}, select a hidden topic znfrom the multinomial distribution parameterized by ✓.

• Choose the observed word wn from the distribution �zn .

Machine Learning: Jordan Boyd-Graber | Boulder Topic Models | 16 of 26

Page 29: Department of Computer Science CSCI 5622: Machine Learning ... · •A general way to model count data and a general inference algorithm 6. Conceptual approach •Input: a text corpus

29

Making the generative story formal

What are Topic Models?

Generative Model Approach

MNθd zn wn

Kβk

α

λ

• For each topic k 2 {1, . . . ,K}, draw a multinomial distribution �kfrom a Dirichlet distribution with parameter �

• For each document d 2 {1, . . . ,M}, draw a multinomialdistribution ✓d from a Dirichlet distribution with parameter ↵

• For each word position n 2 {1, . . . ,N}, select a hidden topic znfrom the multinomial distribution parameterized by ✓.

• Choose the observed word wn from the distribution �zn .Machine Learning: Jordan Boyd-Graber | Boulder Topic Models | 16 of 26

Page 30:

Topic models: What's important

• Topic models (latent variables)
  • Topics to word types: multinomial distribution
  • Documents to topics: multinomial distribution
• Modeling & algorithm
  • Model: a story of how your data came to be
  • Latent variables: the missing pieces of your story
  • Statistical inference: filling in those missing pieces
• We use latent Dirichlet allocation (LDA), a fully Bayesian version of pLSI (itself a probabilistic version of LSA)

30

Page 31:

31

Which variables are hidden?

Page 32:

32

Size of Variable

Page 33:

Joint distribution

33
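For reference, the generative story above implies the joint distribution (in the form given by Blei, Ng, and Jordan, 2003):

$$p(\beta, \theta, z, w \mid \lambda, \alpha) = \prod_{k=1}^{K} p(\beta_k \mid \lambda)\; \prod_{d=1}^{M} \Big( p(\theta_d \mid \alpha) \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{z_{d,n}}) \Big)$$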

Page 34:

Joint distribution

34

Page 35:

Posterior distribution

35
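What inference needs is the posterior over the hidden variables given the observed words. In the notation above,

$$p(\beta, \theta, z \mid w, \lambda, \alpha) = \frac{p(\beta, \theta, z, w \mid \lambda, \alpha)}{\int_{\beta} \int_{\theta} \sum_{z} p(\beta, \theta, z, w \mid \lambda, \alpha)}$$

The normalizer (the evidence) couples all of the hidden variables and cannot be computed exactly for realistic corpora, which is what motivates the approximate inference that follows.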

Page 36:

Variational inference

36

Page 37:

KL divergence and evidence lower bound

37
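The identity connecting the two quantities in the title: for any variational distribution $q$ over the hidden variables (abbreviated $z$ here),

$$\log p(w) = \underbrace{\mathbb{E}_q[\log p(w, z)] - \mathbb{E}_q[\log q(z)]}_{\text{ELBO}} + \mathrm{KL}\big(q(z)\,\|\,p(z \mid w)\big)$$

Because the KL term is non-negative and $\log p(w)$ does not depend on $q$, maximizing the ELBO over $q$ is equivalent to minimizing the KL divergence between $q$ and the true posterior.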

Page 38:

KL divergence and evidence lower bound

38

Page 39:

A different way to get ELBO

• Jensen’s inequality

39
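Written out, the Jensen's inequality route to the same bound uses the concavity of $\log$:

$$\log p(w) = \log \int p(w, z)\, dz = \log \mathbb{E}_q\!\left[\frac{p(w, z)}{q(z)}\right] \;\ge\; \mathbb{E}_q\!\left[\log \frac{p(w, z)}{q(z)}\right] = \text{ELBO}$$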

Page 40:

Evidence Lower Bound

40

Page 41:

Evidence Lower Bound

41

Page 42:

Variational inference

• Propose variational distribution q
• Find ELBO (evidence lower bound) using q
• Set derivatives to 0 and update variables

42

Page 43:

Variational distribution for LDA

43
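A standard mean-field choice for LDA (following Blei et al., 2003), consistent with the $\gamma$ and $\phi$ used in the updates later in the deck: each document gets a free Dirichlet parameter $\gamma_d$, each word position a free multinomial parameter $\phi_{d,n}$, and the hidden variables are independent under $q$:

$$q(\theta, z \mid \gamma, \phi) = \prod_{d=1}^{M} q(\theta_d \mid \gamma_d) \prod_{n=1}^{N} q(z_{d,n} \mid \phi_{d,n})$$

Here $\beta$ is treated as a parameter to be re-estimated (matching the "Update β" slide at the end) rather than given its own variational factor.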

Page 44:

Overall Algorithm

44
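A rough sketch of the overall loop (variational EM for LDA), consistent with the update equations on the following slides. The function and variable names are my own, and this is a sketch under simplifying assumptions, not a reference implementation for HW 5:

import numpy as np
from scipy.special import digamma

def variational_em(docs, V, K, alpha, n_iters=50):
    # docs: list of documents, each a list of word ids in {0, ..., V-1}
    rng = np.random.default_rng(0)
    beta = rng.dirichlet(np.ones(V), size=K)          # K x V topic-word distributions
    for _ in range(n_iters):
        expected_counts = np.zeros((K, V))
        for doc in docs:
            gamma = np.full(K, alpha + len(doc) / K)  # initialize gamma for this document
            for _ in range(20):                       # E-step: alternate phi and gamma updates
                # phi_ni proportional to beta_{i,w_n} * exp(digamma(gamma_i) - digamma(sum_j gamma_j))
                weights = np.exp(digamma(gamma) - digamma(gamma.sum()))
                phi = beta[:, doc].T * weights        # len(doc) x K, unnormalized
                phi /= phi.sum(axis=1, keepdims=True)
                gamma = alpha + phi.sum(axis=0)       # gamma_i = alpha_i + sum_n phi_ni
            for n, w in enumerate(doc):
                expected_counts[:, w] += phi[n]
        # M-step: maximum likelihood from expected counts (no Dirichlet prior on beta)
        beta = expected_counts / expected_counts.sum(axis=1, keepdims=True)
    return beta

# Toy usage: five word types, two topics
print(variational_em([[0, 0, 1, 4], [2, 3, 3, 4]], V=5, K=2, alpha=0.1))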

Page 45:

Updates to Maximize ELBO

45
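In the form used in the worked example on the next slides ($\Psi$ is the digamma function and $v$ is the word type at position $n$), the per-document coordinate-ascent updates are

$$\phi_{ni} \propto \beta_{iv}\, \exp\!\Big(\Psi(\gamma_i) - \Psi\big(\textstyle\sum_j \gamma_j\big)\Big), \qquad \gamma_i = \alpha_i + \sum_n \phi_{ni}$$

and, across documents, $\beta$ is re-estimated from the expected counts of $\phi$ (see the "Update β" slide).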

Page 46:

Homework 5

• The original algorithm also updates the alphas
• Not required for the homework

46

Page 47:

Example

• Three topics:

$$\beta = \begin{bmatrix} .26 & .185 & .185 & .185 & .185 \\ .185 & .185 & .26 & .185 & .185 \\ .185 & .185 & .185 & .26 & .185 \end{bmatrix} \qquad \text{(columns: cat, dog, hamburger, iron, pig)}$$

• Assume uniform $\gamma$: (2.0, 2.0, 2.0)

• Compute the update for $\phi$:

$$\phi_{ni} \propto \beta_{iv}\, \exp\!\Big(\Psi(\gamma_i) - \Psi\big(\textstyle\sum_j \gamma_j\big)\Big)$$

• For the first word (dog) in the document: dog cat cat pig

47
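The numbers on the next few slides can be checked directly. A small verification script (my own, using scipy's digamma for $\Psi$); it reproduces the $\phi$ updates for dog, cat, and pig and the final $\gamma$:

import numpy as np
from scipy.special import digamma

beta = np.array([[.26, .185, .185, .185, .185],   # rows: topics 0, 1, 2
                 [.185, .185, .26, .185, .185],   # columns: cat, dog, hamburger, iron, pig
                 [.185, .185, .185, .26, .185]])
vocab = ["cat", "dog", "hamburger", "iron", "pig"]
gamma = np.array([2.0, 2.0, 2.0])
alpha = np.array([.1, .1, .1])

doc = ["dog", "cat", "cat", "pig"]
phis = []
for word in doc:
    v = vocab.index(word)
    phi = beta[:, v] * np.exp(digamma(gamma) - digamma(gamma.sum()))
    print(word, np.round(phi, 3), np.round(phi / phi.sum(), 3))
    phis.append(phi / phi.sum())

# gamma update: gamma_i = alpha_i + sum_n phi_ni  (not normalized)
print(np.round(alpha + np.sum(phis, axis=0), 3))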

Page 48:

48

Update $\phi$ for dog

Using $\beta$ from the previous slide and the update

$$\phi_{ni} \propto \beta_{iv}\, \exp\!\Big(\Psi(\gamma_i) - \Psi\big(\textstyle\sum_j \gamma_j\big)\Big):$$

• $\gamma = (2.000, 2.000, 2.000)$
• $\phi(0) \propto 0.185 \times \exp\big(\Psi(2.000) - \Psi(2.000 + 2.000 + 2.000)\big) = 0.051$
• $\phi(1) \propto 0.185 \times \exp\big(\Psi(2.000) - \Psi(2.000 + 2.000 + 2.000)\big) = 0.051$
• $\phi(2) \propto 0.185 \times \exp\big(\Psi(2.000) - \Psi(2.000 + 2.000 + 2.000)\big) = 0.051$
• After normalization: {0.333, 0.333, 0.333}

Page 49:

49

Update $\phi$ for pig

Using the same $\beta$ and update equation:

• $\gamma = (2.000, 2.000, 2.000)$
• $\phi(0) \propto 0.185 \times \exp\big(\Psi(2.000) - \Psi(2.000 + 2.000 + 2.000)\big) = 0.051$
• $\phi(1) \propto 0.185 \times \exp\big(\Psi(2.000) - \Psi(2.000 + 2.000 + 2.000)\big) = 0.051$
• $\phi(2) \propto 0.185 \times \exp\big(\Psi(2.000) - \Psi(2.000 + 2.000 + 2.000)\big) = 0.051$
• After normalization: {0.333, 0.333, 0.333}

Page 50:

Update $\phi$ for cat

Using the same $\beta$ and update equation:

• $\gamma = (2.000, 2.000, 2.000)$
• $\phi(0) \propto 0.260 \times \exp\big(\Psi(2.000) - \Psi(2.000 + 2.000 + 2.000)\big) = 0.072$
• $\phi(1) \propto 0.185 \times \exp\big(\Psi(2.000) - \Psi(2.000 + 2.000 + 2.000)\big) = 0.051$
• $\phi(2) \propto 0.185 \times \exp\big(\Psi(2.000) - \Psi(2.000 + 2.000 + 2.000)\big) = 0.051$
• After normalization: {0.413, 0.294, 0.294}

50

Page 51:

Update $\gamma$

• Document: dog cat cat pig
• Update equation:

$$\gamma_i = \alpha_i + \sum_n \phi_{ni}$$

• Assume $\alpha = (.1, .1, .1)$

        γ_0     γ_1     γ_2
dog     .333    .333    .333
cat     .413    .294    .294    (×2)
pig     .333    .333    .333
α       0.1     0.1     0.1
sum     1.592   1.354   1.354

• Note: do not normalize!

51

Page 52:

Update $\beta$

• Count up all of the $\phi$ across all documents
• For each topic, divide by the total
• Corresponds to maximum likelihood from expected counts
• Unlike Gibbs sampling, no Dirichlet prior

52
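In symbols, the update described above is

$$\beta_{kv} = \frac{\sum_{d}\sum_{n} \phi_{dnk}\, \mathbb{1}[w_{dn} = v]}{\sum_{v'}\sum_{d}\sum_{n} \phi_{dnk}\, \mathbb{1}[w_{dn} = v']}$$

i.e. the maximum likelihood estimate from the expected counts, with no Dirichlet smoothing on $\beta$.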

Page 53:

Recap

• Topic models: a neat way to model discrete count data
• Variational inference converts intractable posterior inference into an optimization problem: maximizing the ELBO

53