
Department of Computer Science
CSCI 5622: Machine Learning

Chenhao Tan
Lecture 20: Topic modeling and variational inference

Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

1

Administrivia

• Poster printing (stay tuned!)
• HW 5 (final homework) is due next Friday!
• Midpoint feedback

2

Learning Objectives

• Learn about latent Dirichlet allocation

• Understand the intuition behind variational inference

3

Topic models

• Discrete count data

4

Topic models

5

• Suppose you have a huge number of documents
• Want to know what's going on
• Can't read them all (e.g., every New York Times article from the 90s)
• Topic models offer a way to get a corpus-level view of major themes
• Unsupervised

Why should you care?

• Neat way to explore/understand corpus collections
  • E-discovery
  • Social media
  • Scientific data
• NLP applications
  • Word sense disambiguation
  • Discourse segmentation
• Psychology: word meaning, polysemy
• A general way to model count data and a general inference algorithm

6

Conceptual approach

• Input: a text corpus and the number of topics K
• Output:
  • K topics, each topic is a list of words
  • A topic assignment for each document

7

Corpus (example headlines):

• Forget the Bootleg, Just Download the Movie Legally
• Multiplex Heralded As Linchpin To Growth
• The Shape of Cinema, Transformed At the Click of a Mouse
• A Peaceful Crew Puts Muppets Where Its Mouth Is
• Stock Trades: A Better Deal For Investors Isn't Simple
• The three big Internet portals begin to distinguish among themselves as shopping malls
• Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens

Conceptual approach

8

• K topics, each topic is a list of words

TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer

Conceptual approach

9

• Topic assignment for each document

• Forget the Bootleg, Just Download the Movie Legally
• Multiplex Heralded As Linchpin To Growth
• The Shape of Cinema, Transformed At the Click of a Mouse
• A Peaceful Crew Puts Muppets Where Its Mouth Is
• Stock Trades: A Better Deal For Investors Isn't Simple
• Internet portals begin to distinguish among themselves as shopping malls
• Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens

Each document is assigned to one of TOPIC 1 "TECHNOLOGY", TOPIC 2 "BUSINESS", or TOPIC 3 "ENTERTAINMENT".

Topics from Science

10

Topic models

• Discrete count data
• Gaussian distributions are not appropriate

11

Generative model: Latent Dirichlet Allocation

• Generate a document, or a bag of words
• Blei, Ng, Jordan. Latent Dirichlet Allocation. JMLR, 2003.

12

Generative model: Latent Dirichlet Allocation

• Generate a document, or a bag of words
• Multinomial distribution
  • Distribution over discrete outcomes
  • Represented by a non-negative vector that sums to one
  • Picture representation: points on the probability simplex, e.g. (1,0,0), (0,1,0), (0,0,1), (1/2,1/2,0), (1/3,1/3,1/3), (1/4,1/4,1/2)

13

(0,1,0)

Generative model: Latent Dirichlet Allocation

• Generate a document, or a bag of words
• Multinomial distribution
  • Distribution over discrete outcomes
  • Represented by a non-negative vector that sums to one
  • Picture representation
  • Comes from a Dirichlet distribution

14
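A minimal numpy sketch of this idea (the parameter value and seed are just for illustration): each draw from a Dirichlet is a non-negative vector that sums to one, i.e., the parameters of one multinomial.

import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 2.0, 2.0])       # Dirichlet parameter (illustrative choice)
samples = rng.dirichlet(alpha, size=5)  # each row is one multinomial parameter vector
print(samples)
print(samples.sum(axis=1))              # every row sums to 1: a point on the simplex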

Generative story

15

TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer

Generative story

16

• Forget the Bootleg, Just Download the Movie Legally
• Multiplex Heralded As Linchpin To Growth
• The Shape of Cinema, Transformed At the Click of a Mouse
• A Peaceful Crew Puts Muppets Where Its Mouth Is
• Stock Trades: A Better Deal For Investors Isn't Simple
• The three big Internet portals begin to distinguish among themselves as shopping malls
• Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens

TOPIC 1   TOPIC 2   TOPIC 3

Generative story

17

Example document: "Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's iTunes music store and other online services ..."

• computer, technology, system, service, site, phone, internet, machine
• play, film, movie, theater, production, star, director, stage
• sell, sale, store, product, business, advertising, market, consumer


Missing component: how to generate a multinomial distribution

21


Conjugacy of Dirichlet and Multinomial

24

• If θ ∼ Dir(α), w ∼ Mult(θ), and n_k = |{w_i : w_i = k}|, then

p(\theta \mid \alpha, w) \propto p(w \mid \theta)\, p(\theta \mid \alpha)
                         \propto \prod_k \theta_k^{n_k} \prod_k \theta_k^{\alpha_k - 1}
                         \propto \prod_k \theta_k^{\alpha_k + n_k - 1}

• Conjugacy: this posterior has the same form as the prior

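A small numpy sketch of the conjugacy statement above (the prior parameter and the observed draws are made up for illustration): the posterior over θ is again a Dirichlet, with the observed counts added to the prior parameter.

import numpy as np

alpha = np.array([2.0, 2.0, 2.0])   # Dirichlet prior parameter (illustrative)
words = np.array([0, 1, 1, 2, 1])   # observed multinomial draws, as outcome indices
n = np.bincount(words, minlength=len(alpha))
posterior_param = alpha + n         # posterior is Dir(alpha + n): same family as the prior
print(posterior_param)              # [3. 5. 3.]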

Making the generative story formal

26

Plate diagram: α → θ_d → z_n → w_n (N word positions in each of M documents); λ → β_k (K topics)

• For each topic k ∈ {1, ..., K}, draw a multinomial distribution β_k from a Dirichlet distribution with parameter λ
• For each document d ∈ {1, ..., M}, draw a multinomial distribution θ_d from a Dirichlet distribution with parameter α
• For each word position n ∈ {1, ..., N}, select a hidden topic z_n from the multinomial distribution parameterized by θ_d
• Choose the observed word w_n from the distribution β_{z_n}

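A compact numpy sketch of this generative story; the sizes K, M, N, V and the Dirichlet parameters below are toy values assumed for illustration, not values from the lecture.

import numpy as np

rng = np.random.default_rng(0)
K, M, N, V = 3, 4, 10, 8           # topics, documents, words per document, vocabulary size
lam, alpha = 0.1, 0.5              # Dirichlet parameters for topics and documents

beta = rng.dirichlet(np.full(V, lam), size=K)   # one word distribution per topic
docs = []
for d in range(M):
    theta = rng.dirichlet(np.full(K, alpha))    # topic proportions for document d
    z = rng.choice(K, size=N, p=theta)          # hidden topic for each word position
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # observed words
    docs.append(w)
print(docs[0])                                  # word indices of the first document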

Topic models: What’s important

• Topic models (latent variables)• Topics to word types—multinomial distribution• Documents to topics—multinomial distribution

• Modeling & Algorithm• Model: story of how your data came to be• Latent variables: missing pieces of your story• Statistical inference: filling in those missing pieces

• We use latent Dirichlet allocation (LDA), a fully Bayesian version of pLSI, probabilistic version of LSA

30

31

Which variables are hidden?

32

Size of Variable
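Reading off the generative story, with K topics, M documents, N word positions per document, and V word types, the hidden variables and their rough sizes are:

• Hidden: the topics β_k (K × V values), the document proportions θ_d (M × K values), and a topic assignment z_{d,n} for every word position (M × N values)
• Observed: only the words w_{d,n}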

Joint distribution

33

Joint distribution

34
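Following the generative story above, one standard way to write the joint distribution is:

p(\beta, \theta, z, w \mid \lambda, \alpha) = \prod_{k=1}^{K} p(\beta_k \mid \lambda) \prod_{d=1}^{M} \Big[ p(\theta_d \mid \alpha) \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{z_{d,n}}) \Big]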

Posterior distribution

35
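The posterior we would like is the joint divided by the evidence,

p(\beta, \theta, z \mid w, \lambda, \alpha) = \frac{p(\beta, \theta, z, w \mid \lambda, \alpha)}{p(w \mid \lambda, \alpha)},

and the denominator p(w | λ, α) requires summing over every possible topic assignment and integrating over θ and β, which is intractable. This is what motivates the variational approach on the following slides.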

Variational inference

36

KL divergence and evidence lower bound

37

KL divergence and evidence lower bound

38
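In symbols, the standard decomposition behind these slides (writing h for all hidden variables and q for the variational distribution) is:

\log p(w) = \underbrace{\mathbb{E}_q[\log p(w, h)] - \mathbb{E}_q[\log q(h)]}_{\mathrm{ELBO}(q)} + \mathrm{KL}\big(q(h)\,\|\,p(h \mid w)\big)

Since the KL term is non-negative, the ELBO is a lower bound on the evidence, and maximizing it over q is the same as minimizing the KL divergence to the true posterior.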

A different way to get ELBO

• Jensen’s inequality

39
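The Jensen's inequality route to the same bound, in sketch form:

\log p(w) = \log \int p(w, h)\, dh = \log \mathbb{E}_q\!\left[\frac{p(w, h)}{q(h)}\right] \ge \mathbb{E}_q\!\left[\log \frac{p(w, h)}{q(h)}\right] = \mathrm{ELBO}(q)

using that log is concave, so the log of an expectation is at least the expectation of the log.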

Evidence Lower Bound

40

Evidence Lower Bound

41

Variational inference

• Propose variational distribution q
• Find ELBO (evidence lower bound) using q
• Set derivatives to 0 and update variables

42

Variational distribution for LDA

43
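A common mean-field choice, consistent with the γ and φ updates used below (a sketch, with β treated as a point estimate rather than given its own variational factor):

q(\theta_d, z_d \mid \gamma_d, \phi_d) = q(\theta_d \mid \gamma_d) \prod_{n=1}^{N} q(z_{d,n} \mid \phi_{d,n})

where q(θ_d | γ_d) is a Dirichlet with parameter γ_d and each q(z_{d,n} | φ_{d,n}) is a multinomial over the K topics.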

Overall Algorithm

44

Updates to Maximize ELBO

45
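A rough sketch of the overall loop, wired together from the update equations worked through later in these slides (φ update, γ update, then β re-estimated from expected counts). The function name, interface, smoothing constant, and inner-loop count are assumptions for illustration, not the homework's required code.

import numpy as np
from scipy.special import digamma

def variational_em(docs, V, K, alpha=0.1, iters=20):
    """docs: list of arrays of word indices; V: vocabulary size; K: number of topics."""
    rng = np.random.default_rng(0)
    beta = rng.dirichlet(np.ones(V), size=K)          # K x V topic-word probabilities
    for _ in range(iters):
        counts = np.zeros((K, V))
        for doc in docs:
            gamma = np.full(K, alpha + len(doc) / K)  # per-document Dirichlet parameter
            for _ in range(10):                       # alternate phi and gamma updates
                expElogtheta = np.exp(digamma(gamma) - digamma(gamma.sum()))
                phi = beta[:, doc].T * expElogtheta   # phi_ni ∝ beta_iv exp(Ψ(γ_i) - Ψ(Σ_j γ_j))
                phi /= phi.sum(axis=1, keepdims=True)
                gamma = alpha + phi.sum(axis=0)       # γ_i = α_i + Σ_n φ_ni
            for n, w in enumerate(doc):
                counts[:, w] += phi[n]                # expected counts for the beta update
        beta = (counts + 1e-12) / (counts + 1e-12).sum(axis=1, keepdims=True)  # normalize per topic
    return beta

docs = [np.array([1, 0, 0, 4]), np.array([2, 2, 3])]  # toy corpus of word indices
print(variational_em(docs, V=5, K=3).round(3))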

Homework 5

• The original algorithm also updates the α values
• Not required for the homework

46

Evaluation

Example

• Three topics

β =
           cat    dog    hamburger   iron   pig
topic 1    .26    .185   .185        .185   .185
topic 2    .185   .185   .26         .185   .185
topic 3    .185   .185   .185        .26    .185

• Assume uniform γ: (2.0, 2.0, 2.0)
• Compute the update for φ:

\phi_{ni} \propto \beta_{iv} \exp\!\left( \Psi(\gamma_i) - \Psi\!\Big(\sum_j \gamma_j\Big) \right)

• For the first word (dog) in the document: dog cat cat pig

47

48

Evaluation

Update φ for dog

(β and the update equation as above)

• γ = (2.000, 2.000, 2.000)
• φ(0) ∝ 0.185 × exp(Ψ(2.000) - Ψ(2.000 + 2.000 + 2.000)) = 0.051
• φ(1) ∝ 0.185 × exp(Ψ(2.000) - Ψ(2.000 + 2.000 + 2.000)) = 0.051
• φ(2) ∝ 0.185 × exp(Ψ(2.000) - Ψ(2.000 + 2.000 + 2.000)) = 0.051
• After normalization: {0.333, 0.333, 0.333}

49

Evaluation

Update φ for pig

(β and the update equation as above)

• γ = (2.000, 2.000, 2.000)
• φ(0) ∝ 0.185 × exp(Ψ(2.000) - Ψ(2.000 + 2.000 + 2.000)) = 0.051
• φ(1) ∝ 0.185 × exp(Ψ(2.000) - Ψ(2.000 + 2.000 + 2.000)) = 0.051
• φ(2) ∝ 0.185 × exp(Ψ(2.000) - Ψ(2.000 + 2.000 + 2.000)) = 0.051
• After normalization: {0.333, 0.333, 0.333}

Evaluation

Update φ for cat

(β and the update equation as above)

• γ = (2.000, 2.000, 2.000)
• φ(0) ∝ 0.260 × exp(Ψ(2.000) - Ψ(2.000 + 2.000 + 2.000)) = 0.072
• φ(1) ∝ 0.185 × exp(Ψ(2.000) - Ψ(2.000 + 2.000 + 2.000)) = 0.051
• φ(2) ∝ 0.185 × exp(Ψ(2.000) - Ψ(2.000 + 2.000 + 2.000)) = 0.051
• After normalization: {0.413, 0.294, 0.294}
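A quick numpy/scipy check of these three updates (a sketch; the helper name and layout are just for illustration):

import numpy as np
from scipy.special import digamma

# Rows: topics 1-3; columns: cat, dog, hamburger, iron, pig
beta = np.array([
    [0.260, 0.185, 0.185, 0.185, 0.185],
    [0.185, 0.185, 0.260, 0.185, 0.185],
    [0.185, 0.185, 0.185, 0.260, 0.185],
])
vocab = ["cat", "dog", "hamburger", "iron", "pig"]
gamma = np.array([2.0, 2.0, 2.0])

def phi_update(word):
    """phi_i ∝ beta_{i,v} * exp(Ψ(γ_i) - Ψ(Σ_j γ_j)), then normalize."""
    v = vocab.index(word)
    unnorm = beta[:, v] * np.exp(digamma(gamma) - digamma(gamma.sum()))
    return unnorm / unnorm.sum()

for w in ["dog", "pig", "cat"]:
    print(w, phi_update(w).round(3))
# dog [0.333 0.333 0.333]; pig [0.333 0.333 0.333]; cat [0.413 0.294 0.294]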

50

Evaluation

Update γ

• Document: dog cat cat pig
• Update equation:

\gamma_i = \alpha_i + \sum_n \phi_{ni}

• Assume α = (.1, .1, .1)

        γ_0     γ_1     γ_2
dog     .333    .333    .333
cat     .413    .294    .294   (×2)
pig     .333    .333    .333
α       0.1     0.1     0.1
sum     1.592   1.354   1.354

• Note: do not normalize!
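And the same check for γ (again just an illustrative sketch):

import numpy as np

# phi for each token of "dog cat cat pig" (rows) over the three topics (columns)
phi = np.array([
    [0.333, 0.333, 0.333],   # dog
    [0.413, 0.294, 0.294],   # cat
    [0.413, 0.294, 0.294],   # cat
    [0.333, 0.333, 0.333],   # pig
])
alpha = np.array([0.1, 0.1, 0.1])
gamma = alpha + phi.sum(axis=0)   # gamma_i = alpha_i + sum_n phi_ni (no normalization)
print(gamma)                      # [1.592 1.354 1.354]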

51

Evaluation

Update β

• Count up all of the φ across all documents
• For each topic, divide by the total
• Corresponds to maximum likelihood of expected counts
• Unlike Gibbs sampling, no Dirichlet prior

52

Recap

• Topic models: a neat way to model discrete count data
• Variational inference converts intractable posterior inference into an optimization problem: maximizing the ELBO

53
