Distance-Dependent Chinese Restaurant Franchise Dongwoo Kim Computer Science Department Thesis Presentation Tuesday, December 21, 2010


Distance-Dependent Chinese Restaurant Franchise

Dongwoo Kim

Computer Science Department

Thesis Presentation

Tuesday, December 21, 2010


Presentation Objectives

• Introduce the concept and problems of Bayesian parametric / non-parametric topic models

• Propose the distance-dependent Chinese restaurant franchise

• Present the experimental results

• Based on four different time-varying corpora

• NIPS, SIGIR, SIGMOD, SIGGRAPH


LATENT DIRICHLET ALLOCATION

[Figure 1: Graphical model representation of LDA. The boxes are “plates” representing replicates. The outer plate represents documents, while the inner plate represents the repeated choice of topics and words within a document.]

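where $p(z_n \mid \theta)$ is simply $\theta_i$ for the unique $i$ such that $z_n^i = 1$. Integrating over $\theta$ and summing over $z$, we obtain the marginal distribution of a document:

\[
p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta. \tag{3}
\]

Finally, taking the product of the marginal probabilities of single documents, we obtain the probability of a corpus:

\[
p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha) \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta) \right) d\theta_d.
\]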

The LDA model is represented as a probabilistic graphical model in Figure 1. As the figure makes clear, there are three levels to the LDA representation. The parameters $\alpha$ and $\beta$ are corpus-level parameters, assumed to be sampled once in the process of generating a corpus. The variables $\theta_d$ are document-level variables, sampled once per document. Finally, the variables $z_{dn}$ and $w_{dn}$ are word-level variables and are sampled once for each word in each document.

It is important to distinguish LDA from a simple Dirichlet-multinomial clustering model. A classical clustering model would involve a two-level model in which a Dirichlet is sampled once for a corpus, a multinomial clustering variable is selected once for each document in the corpus, and a set of words are selected for the document conditional on the cluster variable. As with many clustering models, such a model restricts a document to being associated with a single topic. LDA, on the other hand, involves three levels, and notably the topic node is sampled repeatedly within the document. Under this model, documents can be associated with multiple topics.

Structures similar to that shown in Figure 1 are often studied in Bayesian statistical modeling, where they are referred to as hierarchical models (Gelman et al., 1995), or more precisely as conditionally independent hierarchical models (Kass and Steffey, 1989). Such models are also often referred to as parametric empirical Bayes models, a term that refers not only to a particular model structure, but also to the methods used for estimating parameters in the model (Morris, 1983). Indeed, as we discuss in Section 5, we adopt the empirical Bayes approach to estimating parameters such as $\alpha$ and $\beta$ in simple implementations of LDA, but we also consider fuller Bayesian approaches as well.
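The three levels described in the excerpt can be sketched as a small generative simulation. This is only an illustrative sketch using the Python standard library: the function names (`sample_dirichlet`, `sample_categorical`, `generate_corpus`) and the toy values of alpha and beta are assumptions made for the example, not part of the thesis or of the LDA paper.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def sample_categorical(probs, rng):
    """Return index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(alpha, beta, doc_lengths, seed=0):
    """Generate (topic, word) pairs following the corpus/document/word levels.

    alpha: Dirichlet parameter of length K (corpus level)
    beta:  K x V matrix of topic-word probabilities (corpus level)
    """
    rng = random.Random(seed)
    corpus = []
    for n_words in doc_lengths:
        theta = sample_dirichlet(alpha, rng)        # document level: theta_d
        doc = []
        for _ in range(n_words):
            z = sample_categorical(theta, rng)      # word level: topic z_dn
            w = sample_categorical(beta[z], rng)    # word level: word w_dn
            doc.append((z, w))
        corpus.append(doc)
    return corpus


# Toy example: 2 topics over a 3-word vocabulary (hypothetical values).
corpus = generate_corpus([0.5, 0.5], [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]], [5, 7])
```

Because the topic is redrawn for every word, a single document can mix several topics, which is the distinction from the Dirichlet-multinomial clustering model made above.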


Topic Model: An approach to analyzing large volumes of unlabeled documents


Concept of Topic Model

SIGIR corpus


Concept of Topic Model

• Every word has a latent (unknown) topic

[Figure: words of a document labeled with example topics: spam filtering, experiment, ML]


Concept of Topic Model

• The topic modeling task can be viewed as assigning a topic to every word


Output of Topic Model

Topics: multinomial over vocabulary

word       prob.  | word        prob.  | word         prob.
spam       0.12   | machine     0.10   | experiment   0.15
email      0.10   | learning    0.09   | validation   0.11
filtering  0.08   | model       0.08   | result       0.09
filters    0.06   | likelihood  0.08   | performance  0.08
filter     0.06   | class       0.07   | test         0.07
messages   0.05   | variable    0.06   | perform      0.06
:          :      | :           :      | :            :
Bayesian   0.001  | spam        0.001  | email        0.001


Parametric Topic Models

• The representative model is LDA (Latent Dirichlet Allocation) (Blei et al., 2003)

• The number of topics must be fixed in advance for the corpus

• As with K-means, the user must choose an appropriate number of topics K before training the model


A Limitation of Parametric Topic Models

• It is difficult to determine the number of topics for a corpus

• Computing the optimal number of topics is very time-consuming

• The optimal number of topics differs from corpus to corpus


Non-Parametric Topic Models

• The representative model is HDP-LDA (Hierarchical Dirichlet Process) (Teh, 2006)

• Assumes an infinite number of topics

• The model automatically infers an appropriate number of topics


A Limitation of HDP

• HDP does not consider the relationships among the documents

• However, some document collections exhibit patterns arising from the relationships among the documents

• For example, articles from conference proceedings exhibit a temporal pattern of topics

• When assigning topics to a document, topics used by nearby documents should receive higher probability than topics used only by documents that are far away


Proposed Model

• This thesis proposes a variant of HDP called the distance-dependent Chinese restaurant franchise (ddCRF)

• ddCRF considers the relationships among the documents in the corpus

• ddCRF captures the temporal patterns of topics within conference proceedings


Distance-Dependent Chinese Restaurant Franchise: Variation of Bayesian non-parametric topic model

[Figure 1: (Left) A representation of a Dirichlet process mixture model as a graphical model. (Right) A hierarchical Dirichlet process mixture model. In the graphical model formalism, each node in the graph is associated with a random variable, where shading denotes an observed variable. Rectangles denote replication of the model within the rectangle. Sometimes the number of replicates is given in the bottom right corner of the rectangle.]

Using a somewhat different metaphor, the Polya urn scheme is closely related to a distribution on partitions known as the Chinese restaurant process (Aldous 1985). This metaphor has turned out to be useful in considering various generalizations of the Dirichlet process (Pitman 2002a), and it will be useful in this paper. The metaphor is as follows. Consider a Chinese restaurant with an unbounded number of tables. Each $\theta_i$ corresponds to a customer who enters the restaurant, while the distinct values $\phi_k$ correspond to the tables at which the customers sit. The $i$th customer sits at the table indexed by $\phi_k$, with probability proportional to the number of customers $m_k$ already seated there (in which case we set $\theta_i = \phi_k$), and sits at a new table with probability proportional to $\alpha_0$ (increment $K$, draw $\phi_K \sim G_0$ and set $\theta_i = \phi_K$).

3.3 Dirichlet process mixture models

One of the most important applications of the Dirichlet process is as a nonparametric prior on the parameters of a mixture model. In particular, suppose that observations $x_i$ arise as follows:

\[
\begin{aligned}
\theta_i \mid G &\sim G \\
x_i \mid \theta_i &\sim F(\theta_i),
\end{aligned} \tag{9}
\]

where $F(\theta_i)$ denotes the distribution of the observation $x_i$ given $\theta_i$. The factors $\theta_i$ are conditionally independent given $G$, and the observation $x_i$ is conditionally independent of the other observations given the factor $\theta_i$. When $G$ is distributed according to a Dirichlet process, this model is referred to as a Dirichlet process mixture model. A graphical model representation of a Dirichlet process mixture model is shown in Figure 1 (Left).

Since $G$ can be represented using a stick-breaking construction (6), the factors $\theta_i$ take on values $\phi_k$ with probability $\pi_k$. We may denote this using an indicator variable $z_i$ which takes on positive integral values and is distributed according to $\boldsymbol{\pi}$ (interpreting $\boldsymbol{\pi}$ as a random probability measure on …
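The stick-breaking construction referred to as equation (6) is not reproduced in this excerpt. As a rough illustration, the sketch below assumes the standard form, in which each mixing weight is a Beta(1, alpha0) fraction of the stick left over by the earlier weights, and truncates it at a finite number of components. The function names, the truncation level, and the renormalized sampling of the indicators are assumptions made for this example only.

```python
import random


def stick_breaking_weights(alpha0, truncation, seed=0):
    """Truncated stick-breaking: w_k = v_k * prod_{l<k} (1 - v_l), v_k ~ Beta(1, alpha0)."""
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha0)
        weights.append(v * remaining)   # break off a fraction v of what is left
        remaining *= 1.0 - v
    return weights, remaining           # remaining = mass of the components not represented


def sample_indicators(weights, n, seed=1):
    """Draw n indicators z_i with probability proportional to the truncated weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=n)


weights, leftover = stick_breaking_weights(alpha0=1.0, truncation=20)
z = sample_indicators(weights, n=50)
```

Only a few of the 20 weights are typically large, so the 50 indicators concentrate on a handful of components even though the underlying model places no bound on their number.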


Chinese Restaurant Franchise (CRF)

• The HDP is hard to visualize and understand

• The CRF was introduced as a metaphor to explain the HDP (Teh, 2006)

• A two-level hierarchy of Chinese restaurant processes (CRPs)


CRF Metaphor

• Each restaurant has an infinite number of tables

• N customers sit down at the tables one after another

• Each table serves one dish


CRF to Topic Model

• Explain topic modeling using the CRF metaphor

• Consider a restaurant as a document

• Consider a customer as a word

• Consider a table as a topic

[Figure: Restaurant (Document 1) with tables T1-1, T1-2, T1-3, T1-4, ...]


CRF to Topic Model

[Figure: Restaurant (Document 1) with tables T1-1, T1-2, T1-3, T1-4, ...; ‘spam’ is seated at T1-1]

The first customer, ‘spam’, enters the ‘Document 1’ restaurant and sits at table T1-1


CRF to Topic Model

[Figure: Restaurant (Document 1) with tables T1-1, T1-2, T1-3, T1-4, ...; ‘spam’ is seated and ‘email’ is arriving]

The second customer, ‘email’, enters the ‘Document 1’ restaurant and considers where to sit


CRF to Topic Model

[Figure: Restaurant (Document 1) with tables T1-1, T1-2, T1-3, T1-4, ...; ‘spam’ is seated and ‘email’ is arriving]

• The probability of ‘email’ sitting at

➡ an occupied table is proportional to the number of customers already sitting at that table

➡ a new table is proportional to a constant γ


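CRF to Topic Model

[Figure: Restaurant (Document 1) with tables T1-1, T1-2, T1-3, T1-4, ...; ‘spam’ is seated at T1-1]

• Formally, the probability of ‘email’ sitting at an occupied table k or at a new table is defined using

$z_i$ = table number of the ith customer

$n_k$ = number of customers already sitting at table k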

3. Chinese Restaurant Process

In this chapter, we introduce the Chinese restaurant process, used as a prior for Bayesian nonparametric methods, and its extension to the distance-dependent Chinese restaurant process.

3.1 Chinese restaurant process.

The Chinese restaurant process (CRP) is a probability distribution on partitions. The distribution is obtained by a process in which N customers sit down in a Chinese restaurant that has an infinite number of tables. The basic CRP is a sequential process in which each customer sits down at a table chosen at random according to a probability distribution over tables. After N customers have sat down, their configuration represents a random partition.

In the CRP, the probability of a subsequent customer sitting at a table is computed from the preceding customers already sitting at that table. Let K be the number of occupied tables, $z_i$ the table assignment of the ith customer, and $n_k$ the number of customers sitting at table k. The probability of each table for the ith customer is specified as follows:

\[
p(z_i = k \mid z_{1:(i-1)}, \gamma) = \frac{n_k}{\gamma + i - 1}
\]

\[
p(z_i = K + 1 \mid z_{1:(i-1)}, \gamma) = \frac{\gamma}{\gamma + i - 1}
\]

where γ is a parameter. The probability of a customer sitting at a new table is proportional to γ, and the probability of a customer sitting at table k is proportional to the number of customers sitting at table k.

3.2 Distance dependent CRP.

Blei and Frazier [1] introduced the distance dependent Chinese restaurant process (ddCRP), in which the probability of a customer sitting at a table depends on the distances between customers. They also introduce the customer-based CRP as an alternative representation of …

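γ = parameter

In the two formulas above, the first gives the probability that ‘email’ sits at an occupied table k, and the second the probability that it sits at a new table.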
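The seating rule above can be written out directly. The following is a minimal sketch in plain Python; the function names `crp_seating_probs` and `sample_crp` and the value γ = 1 are chosen for illustration and do not come from the thesis.

```python
import random


def crp_seating_probs(table_counts, gamma):
    """Probabilities for the next customer: one per occupied table, then a new table."""
    seated = sum(table_counts)                 # i - 1 customers are already seated
    denom = gamma + seated
    probs = [n_k / denom for n_k in table_counts]
    probs.append(gamma / denom)                # new table K + 1
    return probs


def sample_crp(n_customers, gamma, seed=0):
    """Seat customers one at a time; return table assignments and table sizes."""
    rng = random.Random(seed)
    assignments, counts = [], []
    for _ in range(n_customers):
        probs = crp_seating_probs(counts, gamma)
        k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        if k == len(counts):
            counts.append(1)                   # open a new table
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


# 'spam' already sits alone at T1-1; with gamma = 1, 'email' joins it or opens a new table.
print(crp_seating_probs([1], gamma=1.0))       # [0.5, 0.5]
```

Larger tables attract more customers, so a few tables tend to hold most of the words, while γ controls how often a new table is opened.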


CRF to Topic Model

[Figure: Restaurant (Document 1); the words spam, email, filter, learning, likelihood, model, experiment, and validation are seated at tables T1-1, T1-2, T1-3, T1-4, ...]

• The figure above shows the configuration after N customers have sat down at the tables

• This process illustrates how the CRP works


CRF to Topic Model

[Figure: Restaurant (Document 1); the words spam, email, filter, learning, likelihood, model, experiment, and validation are seated at tables T1-1, T1-2, T1-3, T1-4, ...]

[Figure: Restaurant (Document 2); the words class, learning, variable, email, filtering, spams, and likelihood are seated at tables T2-1, T2-2, T2-3, T2-4, ...]

There are many restaurants in the world!


CRF to Topic Model

[Figure: Restaurant (Document 1); the words spam, email, filter, learning, likelihood, model, experiment, and validation are seated at tables T1-1, T1-2, T1-3, T1-4, ...]

[Figure: Restaurant (Document 2); the words class, learning, variable, email, filtering, spams, and likelihood are seated at tables T2-1, T2-2, T2-3, T2-4, ...]

However, how do we know that T1-1 and T2-2 are the same topic?


Introduce Menu Level CRP

[Figure: two-level structure. Menu Level CRP: tables T1-1, T2-2, T1-2, T2-1, and T1-3 are seated at dishes Topic1, Topic2, Topic3, Topic4, ... Customer Level CRP: the words of Document 1 are seated at tables T1-1, T1-2, T1-3, T1-4, ... and the words of Document 2 at tables T2-1, T2-2, T2-3, T2-4, ...]


Introduce Menu Level CRP

[Figure: Menu Level CRP; tables T1-1, T2-2, T1-2, T2-1, and T1-3 are seated at dishes Topic1, Topic2, Topic3, Topic4, ...]

• The menu-level CRP decides which dish is served at each table

• Customers at tables ‘T1-1’ and ‘T2-2’ eat dish ‘Topic1’

• Customers at tables ‘T1-2’ and ‘T2-1’ eat dish ‘Topic2’
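The two levels can be sketched together: words choose tables inside their own restaurant, and each newly opened table chooses a dish from a menu shared by all restaurants. This is an illustrative simulation of the prior only, not the inference procedure of the thesis; the names `crp_choice`, `sample_crf`, `gamma_table`, and `gamma_dish` are hypothetical, and the slides name only a single constant γ.

```python
import random


def crp_choice(counts, concentration, rng):
    """Pick an existing index with probability proportional to its count, or len(counts) for a new one."""
    weights = list(counts) + [concentration]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def sample_crf(doc_lengths, gamma_table, gamma_dish, seed=0):
    """Simulate the Chinese restaurant franchise prior for a list of document lengths."""
    rng = random.Random(seed)
    dish_counts = []                      # menu level: number of tables serving each dish
    franchise = []
    for n_words in doc_lengths:
        table_counts, table_dish, word_topics = [], [], []
        for _ in range(n_words):
            t = crp_choice(table_counts, gamma_table, rng)      # customer-level CRP
            if t == len(table_counts):                          # a new table is opened
                d = crp_choice(dish_counts, gamma_dish, rng)    # menu-level CRP
                if d == len(dish_counts):
                    dish_counts.append(0)                       # a brand-new dish (topic)
                dish_counts[d] += 1
                table_counts.append(0)
                table_dish.append(d)
            table_counts[t] += 1
            word_topics.append(table_dish[t])                   # the word inherits its table's dish
        franchise.append({"table_dish": table_dish, "word_topics": word_topics})
    return franchise, dish_counts


franchise, dish_counts = sample_crf([8, 6], gamma_table=1.0, gamma_dish=1.0)
```

Because `dish_counts` is shared across restaurants, tables in different documents can end up serving the same dish, which is how tables such as T1-1 and T2-2 come to represent the same topic.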


Limitation of Menu Level CRP

[Figure: Menu Level CRP; tables T1-1, T2-2, T1-2, T2-1, T1-3, and T3-1 are seated at dishes Topic1, Topic2, Topic3, Topic4, ...]

• New assumptions

• Document 1 and Document 2 were written in 2000

• Document 3 was written in 1978


Limitation of Menu Level CRP

[Figure: Menu Level CRP; tables T1-1, T2-2, T1-2, T2-1, T1-3, and T3-1 are seated at dishes Topic1, Topic2, Topic3, Topic4, ...; a new table T4-1 is arriving]

• Topic2 is about ‘spam filtering’

• Now a new table, ‘T4-1’, arrives and chooses a dish to serve

• What if table ‘T4-1’ belongs to a document written in 1979?


Limitation of Menu Level CRP

[Figure: Menu Level CRP; tables T1-1, T2-2, T1-2, T2-1, T1-3, and T3-1 are seated at dishes Topic1, Topic2, Topic3, Topic4, ...; a new table T4-1 is arriving]

• The original CRF does not consider relationships between tables

• Topic2 (spam filtering) can be served to table T4-1 (a 1979 document)

• This is not an appropriate model


Introduce ddCRF Metaphor

• The selection of a dish can be influenced by nearby restaurants

• If a dish is famous in a specific region, we probably want to eat that dish in that region

• If a document was written in 1979, its topics are more likely to match those of documents written in 1978 and less likely to match those of documents written in 2000


Consider Relationship

[Figure: Menu Level CRP; tables T1-1, T2-2, T1-2, T2-1, T1-3, and T3-1 are seated at dishes Topic1, Topic2, Topic3, Topic4, ...; a new table T4-1 is arriving]

• To choose a dish for table ‘T4-1’, we compare the relationship between ‘T4-1’ and the other tables already seated at the menu level
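The dish choice for ‘T4-1’ can be illustrated numerically. The sketch below weights every table already seated at the menu level by a decay of the distance, in years, between its document and the new table's document. Several things here are assumptions made for the example: the exponential form of the decay, the values γ = 1 and a = 5, and the dishes served at T1-3 and T3-1, which cannot be recovered from the slides.

```python
import math


def exponential_decay(distance, a):
    """Assumed decay f(d) = exp(-d / a): equals 1 at distance 0 and approaches 0 far away."""
    return math.exp(-distance / a)


def dish_probabilities(new_year, seated, gamma, a):
    """seated: one (dish, year) pair per table already seated at the menu level."""
    weights = {}
    for dish, year in seated:
        weights[dish] = weights.get(dish, 0.0) + exponential_decay(abs(new_year - year), a)
    total = gamma + sum(weights.values())
    probs = {dish: w / total for dish, w in weights.items()}
    probs["new"] = gamma / total               # probability of ordering a brand-new dish
    return probs


# T1-1, T2-2 -> Topic1 and T1-2, T2-1 -> Topic2 follow the slides (documents from 2000).
# T1-3 -> Topic3 and T3-1 -> Topic4 (document from 1978) are hypothetical placements.
seated = [("Topic1", 2000), ("Topic1", 2000), ("Topic2", 2000),
          ("Topic2", 2000), ("Topic3", 2000), ("Topic4", 1978)]
probs = dish_probabilities(1979, seated, gamma=1.0, a=5.0)
```

Under these assumptions the single 1978 table outweighs the two 2000 tables serving Topic2, so the 1979 document is unlikely to be served the spam filtering dish, which is the behavior the metaphor asks for.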


Distance-Dependent CRP

… obtain the special case of sequential CRPs. When we define the distance measure such that $d_{ij} = \infty$ for those $j > i$, using either the logistic or the exponential decay function gives $f(\infty) = 0$, and this results in a sequential CRP in which no customer can be assigned to a later customer.

Explicit table assignments do not occur in the customer-based CRP, but the connected components implicitly exhibit a clustering property. Moreover, a customer-based CRP can be converted to a table-based CRP by summing over the distances within the same component of the customers' link structure. Let K be an imaginary number of tables, which is the same as the number of connected components of customers, and let $z_i$ denote the index of the imaginary table of the ith customer. The probability of each table for the ith customer is specified as follows:

\[
p(z_i = k \mid D, z_{1:(i-1)}, \gamma) = \frac{\sum_{z_j = k} f(d_{ij})}{\gamma + \sum_{j \neq i} f(d_{ij})}
\]

\[
p(z_i = K + 1 \mid D, z_{1:(i-1)}, \gamma) = \frac{\gamma}{\gamma + \sum_{j \neq i} f(d_{ij})}.
\]

The partition probability over customers can be computed in the customer-based ddCRP simply as follows:

\[
p(c_{1:N} \mid D, f, \gamma) = \prod_{i=1}^{N} \frac{\mathbb{1}[c_i = i]\,\gamma + \mathbb{1}[c_i \neq i]\, f(d_{i c_i})}{\gamma + \sum_{j \neq i} f(d_{ij})}.
\]

However, in the table-based ddCRP, to compute the partition probability over customers, we must consider all combinations of table assignments, and the number of combinations increases factorially as the number of customers increases. If we make the assumption of sequential non-exchangeability of data, so that the model is the sequential ddCRP, the partition probability can be computed by

\[
p(z_{1:N} \mid D, f, \gamma) = \prod_{i=1}^{N} \frac{\mathbb{1}[z_i = K_{i-1} + 1]\,\gamma + \mathbb{1}[z_i \neq K_{i-1} + 1] \sum_{z_j = z_i,\, j < i} f(d_{ij})}{\sum_{j < i} f(d_{ij}) + \gamma},
\]

where $K_i$ is the number of tables allocated by the time the ith customer has sat down.

Although this commitment to a sequential table-based ddCRP would lose the relative advantages of the customer-based ddCRP in the efficiency of sampling [1], we use the table-based distance dependent CRP in the rest of this paper. We chose to do so because in the hierarchical model presented in the next section, it is relatively easy to implement and compute the conditional probabilities. We discuss this in more detail in Section 4.
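The sequential partition probability can be evaluated with a single pass over the customers. The sketch below is a direct transcription of that product into log space; the function name, the 0-based table labels, and the distance-matrix argument are conventions chosen for this example, and it assumes the decay function is strictly positive for every pair that shares a table.

```python
import math


def sequential_ddcrp_log_prob(z, dist, f, gamma):
    """log p(z_{1:N} | D, f, gamma) for the sequential table-based ddCRP.

    z:    table labels; each newly opened table takes the next unused label 0, 1, 2, ...
    dist: dist[i][j] is the distance d_ij between customers i and j
    f:    decay function applied to distances
    """
    log_p, n_tables = 0.0, 0
    for i, z_i in enumerate(z):
        denom = gamma + sum(f(dist[i][j]) for j in range(i))
        if z_i == n_tables:                  # z_i = K_{i-1} + 1: a new table
            numer = gamma
            n_tables += 1
        else:                                # existing table: earlier customers seated there
            numer = sum(f(dist[i][j]) for j in range(i) if z[j] == z_i)
        log_p += math.log(numer) - math.log(denom)
    return log_p


# With a constant decay f(d) = 1 the formula reduces to the ordinary CRP.
dist = [[0.0] * 3 for _ in range(3)]
log_p = sequential_ddcrp_log_prob([0, 0, 1], dist, lambda d: 1.0, gamma=1.0)
```

In the constant-decay check the three factors are 1, 1/2, and 1/3, so the partition probability is 1/6, matching the CRP formulas of Section 3.1.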

$f$ = decay function

$d_{ij}$ = distance between i and j

• We model the relationship as a distance between documents

• The distance metric should be defined so that the distance between two closely related documents is small

• Decay function

• Ranges from 0 (long distance) to 1 (short distance)
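A decay function and the sequential seating rule can be combined into a small sampler. The exponential and logistic forms below are the ones commonly associated with the ddCRP of Blei and Frazier; the exact parameterization used in the thesis is not shown on the slides, so the formulas, the decay parameter `a`, and the use of absolute time difference as the distance are assumptions for illustration.

```python
import math
import random


def exponential_decay(d, a):
    """Assumed form f(d) = exp(-d / a)."""
    return math.exp(-d / a)


def logistic_decay(d, a):
    """Assumed form f(d) = exp(-d + a) / (1 + exp(-d + a)), written as 1 / (1 + exp(d - a))."""
    return 1.0 / (1.0 + math.exp(d - a))


def sample_sequential_ddcrp(times, f, gamma, seed=0):
    """Assign each customer to a table using only earlier customers (j < i)."""
    rng = random.Random(seed)
    z, n_tables = [], 0
    for i, t_i in enumerate(times):
        weights = [0.0] * n_tables
        for j in range(i):
            weights[z[j]] += f(abs(t_i - times[j]))   # decayed weight per existing table
        weights.append(gamma)                          # weight of opening a new table
        k = rng.choices(range(n_tables + 1), weights=weights, k=1)[0]
        if k == n_tables:
            n_tables += 1
        z.append(k)
    return z


z = sample_sequential_ddcrp([1978, 1979, 2000, 2000],
                            lambda d: exponential_decay(d, 5.0), gamma=1.0)
```

Both decay functions shrink toward 0 as the distance grows, so customers that are far apart in time contribute almost nothing to each other's table weights.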


Distance-Dependent CRF

[Figure: two-level structure. Menu Level ddCRP: tables T1-1, T2-2, T1-2, T2-1, T1-3, and T3-1 are seated at dishes Topic1, Topic2, Topic3, Topic4, ... Customer Level CRP: the words of Document 1 are seated at tables T1-1, T1-2, T1-3, T1-4, ... and the words of Document 2 at tables T2-1, T2-2, T2-3, T2-4, ...]


Experiments: Comparisons with other topic models


Experiments

• Infer topics from four different corpora (SIGIR, SIGMOD, SIGGRAPH, NIPS)

• Use MCMC (Gibbs sampling) for inference

• Use two kinds of decay functions

• Logistic decay function, exponential decay function

• Compare with other topic models

• LDA, HDP


Evaluation Metrics

•Qualitative Evaluation

• Emergence and disappearance of topics

•Quantitative Evaluation

• Held-out likelihood

• Complexity


Topic Emergence

[Figure 6.4: Topic proportion over time. Identified from SIGIR. Topic words in the legend: news, spam, messages, filtering, filters, articles, filter, email, events.]


Topic Emergence

[Figure 6.4: Topic proportion over time. Identified from SIGIR. Topic words in the legend: spam, email, filtering, filters, filter, messages, threshold, error, trained.]


Topic Emergence

[Figure 6.4: Topic proportion over time. Identified from SIGIR. Topic words in the legend: filtering, users, collaborative, user, items, data, ratings, based, algorithms.]


Topic Emergence

[Figure 6.4: Topic proportion over time. Identified from SIGIR. Topic words in the legend: filtering, users, collaborative, user, data, ratings, items, prediction, method.]


Held-out Likelihood

[Figure: held-out likelihood (y-axis) versus decay parameter (x-axis), one panel each for SIGIR, SIGMOD, SIGGRAPH, and NIPS; series: logistic, exponential, CRF.]


Complexity

[Figure: complexity (y-axis) versus decay parameter (x-axis), one panel each for SIGIR, SIGMOD, SIGGRAPH, and NIPS; series: logistic, exponential, HDP.]


Contributions & Future Work

• Designed and implemented distance-dependent CRF

• A variant of Chinese restaurant franchise

• For a corpus where the relationships among documents are important

• Modeled topics from four different corpora to capture temporal patterns of topics

• Quantitative evaluation shows improved performance over LDA(parametric topic model) and HDP(non-parametric topic model)

• Qualitative evaluation shows interesting temporal patterns of topic emergence

• Future work will explore various definitions of distance: time dimension, spatial dimension, or some other dimension

• The ddCRF can be applied to various other problems where other topic models have been successfully applied

• Cognitive science, computational biology, multimedia (image, music, video) analysis ...


Q&A
