A generic approach to topic models and its application to virtual communities
DESCRIPTION
PhD defense slides, translated to English.
TRANSCRIPT
-
A generic approach to topic models and its application to virtual communities
Gregor Heinrich
PhD presentation (English translation incl. backup slides, 45 min)
Faculty of Mathematics and Computer Science
University of Leipzig
28 November 2012, Version 2.9 EN BU
Gregor Heinrich A generic approach to topic models 1 / 35
-
Overview
Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook
Gregor Heinrich A generic approach to topic models 2 / 35
-
Motivation: Virtual communities
[Figure: network of persons and documents connected by cooperation, authorship, citation, recommendation, annotation and similarity relations; the document icons carry placeholder text on retrieval performance (precision and recall)]
Virtual communities = groups of persons who exchange information and knowledge electronically
Examples: organisations, digital libraries, Web 2.0 applications incl. social networks
Data are multimodal: text content; authorship, citation, annotations and recommendations; cooperation and other social relations
Typical case: discrete data with high dynamics and large volumes
Gregor Heinrich A generic approach to topic models 3 / 35
-
Motivation: Unsupervised mining of discrete data
Identification of relationships in large data volumes
Only data (and possibly a model) required (information retrieval, network analysis, clustering, NLP methods)
Density problem: features too sparse for analysis in a high-dimensional feature space
Vocabulary problem: semantic similarity ≠ lexical similarity (polysemy, synonymy, etc.)
[Figure: word-sense graph for the ambiguous term "bar": sense clusters such as judicial assembly (court), location (bank, restaurant, yard), long object (stick), pressure unit, furniture (counter, table), verb (prevent, adhere, glue), people (staff, personnel, employees, teller), computer (atm, network)]
Gregor Heinrich A generic approach to topic models 4 / 35
-
Topic models as approach
[Figure: words - topics - documents: topics connect words (bar, restaurant, wine, rhythm, drum) to documents ("Rhythm & Spice Jamaican Grill", "Playing Drums for Beginners", "Leipzig's Bars and Restaurants")]
Probabilistic representations of grouped discrete data; illustrative for text: words grouped in documents
Latent topics = probability distributions over the vocabulary; the dominant terms of a topic are semantically similar
Language = mixture of topics (latent semantic structure)
Reduces the vocabulary problem: finds semantic relations
Reduces the density problem: dimensionality reduction
Gregor Heinrich A generic approach to topic models 5 / 35
-
Language models: Unigram model
[Figure: graphical model and example: all words w_{m,n} in all documents are drawn from a single distribution p(w | z)]
One distribution for all data
Gregor Heinrich A generic approach to topic models 6 / 35
-
Language models: Unigram mixture model
[Figure: graphical model and example: the words of document m are drawn from a document-specific distribution p(w | z_m)]
One distribution per document
Gregor Heinrich A generic approach to topic models 6 / 35
-
Language models: Unigram admixture model
[Figure: graphical model and example: each word w_{m,n} has its own topic index z_{m,n}, selecting its distribution p(w | z_{m,n})]
One distribution per word → basic topic model
Gregor Heinrich A generic approach to topic models 6 / 35
-
Bayesian topic models: The Dirichlet distribution
[Figure: Dirichlet densities Dir(α) on the 3-simplex, e.g. α = (4, 4, 2) and α = (1, 1, 1), with sampled discrete distributions p(w | z)]
Bayesian methodology: distributions are generated from prior distributions
For language and other discrete data the Dirichlet distribution is an important prior:
Defined on the simplex: the surface containing all discrete distributions
Parameter vector α controls its behaviour
Bayesian topic model: Latent Dirichlet Allocation (LDA) (Blei et al. 2003)
Gregor Heinrich A generic approach to topic models 7 / 35
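The role of the Dirichlet as a prior on the simplex can be checked with a short sketch (illustrative helper code, not part of the thesis implementation; `sample_dirichlet` is an invented name): every draw from Dir(α) is itself a discrete distribution.

```python
import random

def sample_dirichlet(alpha):
    """Draw one sample from Dir(alpha) via normalised Gamma draws."""
    draws = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

# alpha = (4, 4, 2) as on the slide: a discrete distribution over 3 terms
p = sample_dirichlet([4.0, 4.0, 2.0])

# every sample is a valid discrete distribution, i.e. a point on the simplex
assert abs(sum(p) - 1.0) < 1e-9
assert all(x >= 0.0 for x in p)
```

Larger α values concentrate the density; α = (1, 1, 1) gives the uniform density over the simplex.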
-
Latent Dirichlet Allocation
[Figure: graphical model of LDA: topic-word distributions φ_k ~ Dir(β) (plate over topics k), document-topic distributions θ_m ~ Dir(α) (plate over documents m), topic index z_{m,n} and word w_{m,n} (plate over words n)]
Latent Dirichlet Allocation (Blei et al. 2003)
Example document: "Concert tonight at Rhythm and Spice Restaurant . . ."
Generative process:
1. Generate word distributions φ_k for all topics (e.g. topic 1: restaurant, bar, food, grill; topic 2: concert, music, rhythm, bar)
2. Generate the topic distribution θ_1 for the document
3. Sample the topic index for the first word: z = 2
4. Sample a word from the term distribution of topic 2: "concert"
Gregor Heinrich A generic approach to topic models 8 / 35
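The generative steps of LDA sketched on this slide can be written as a toy simulation (vocabulary, sizes and hyperparameters are made up for illustration; this is not the thesis code):

```python
import random

random.seed(42)

def dirichlet(alpha):
    """Sample a discrete distribution from Dir(alpha)."""
    draws = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(draws)
    return [d / s for d in draws]

def categorical(p):
    """Sample an index from a discrete distribution p."""
    r, acc = random.random(), 0.0
    for i, pi in enumerate(p):
        acc += pi
        if r < acc:
            return i
    return len(p) - 1

vocab = ["restaurant", "bar", "food", "grill", "concert", "music", "rhythm"]
K, V, N = 2, len(vocab), 5  # topics, vocabulary size, words in the document

# step 1: word distribution phi_k ~ Dir(beta) for every topic
phi = [dirichlet([0.1] * V) for _ in range(K)]
# step 2: topic distribution theta_m ~ Dir(alpha) for the document
theta = dirichlet([1.0] * K)
# steps 3-4: for each word, sample a topic index z, then a word from phi_z
doc = []
for _ in range(N):
    z = categorical(theta)
    w = categorical(phi[z])
    doc.append(vocab[w])
```

With a small β the sampled topics are sparse, so each topic favours a few vocabulary terms, mirroring the "restaurant" vs. "music" topics of the example.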
-
State of the art
Large number of published models that extend LDA:
Authors (Rosen-Zvi et al. 2004), citations (Dietz et al. 2007), hierarchy (Li and McCallum 2006; Li et al. 2007), image features and captions (Barnard et al. 2003), etc.
Search results for "topic model" (title + abstract) since 2012 alone: ACM >400, Google Scholar >1300
Expanding research area with practical relevance
But: no existing analysis as a generic model class; partly tedious derivations, especially for inference algorithms
Conjecture: important properties are generic across models, enabling simplifications in deriving concrete model properties, inference algorithms and design methods
[Figure: graphical model of the expert-tag-topic model (Heinrich 2011), combining words w_{m,n}, topics z_{m,n}, tags c_{m,j}, y_{m,j} and authors a_m]
Gregor Heinrich A generic approach to topic models 9 / 35
-
Research questions
How can topic models be described in a generic way in order to use their properties across particular applications?
Can generic topic models be implemented generically and, if so, can repeated structures be exploited for optimisations?
How can generic models be applied to data in virtual communities?
Gregor Heinrich A generic approach to topic models 10 / 35
-
Overview
Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook
Gregor Heinrich A generic approach to topic models 11 / 35
-
How can topic models be described in a generic way in order to use their properties across particular applications?
Gregor Heinrich A generic approach to topic models 12 / 35
-
Topic models: Example structures
[Figure: graphical models of (a) latent Dirichlet allocation (LDA), (b) author-topic model (ATM), (c) pachinko allocation model (PAM4), (d) hierarchical PAM (hPAM)]
(Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007)
Gregor Heinrich A generic approach to topic models 13 / 35
-
Generic topic models → NoMMs
[Figure: one NoMM level: mixture components φ_k ~ Dir(φ_k | α); the incoming value x^in selects component k = f(x^in), which emits x^out]
Generic characteristics of topic models:
Levels with discrete components φ_k, generated from Dirichlet distributions
Coupling via the values of discrete variables x
Network of mixed membership (NoMM):
Compact representation for topic models
Directed acyclic graph
Node: sample from a mixture component, selected via incoming edges; terminal node: observation
Edge: propagation of discrete values to child nodes
Gregor Heinrich A generic approach to topic models 14 / 35
-
Generic topic models → NoMMs
[Figure: LDA as a NoMM: document m selects θ_m | α and emits the topic index z_{m,n}; z_{m,n} selects φ_k | β, which emits the word w_{m,n}. Walkthrough: for m = 1, select θ_1; sample z_{1,1} = 3, selecting φ_3; sample the word w_{1,1} = 2]
Gregor Heinrich A generic approach to topic models 14 / 35
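The node/edge semantics of a NoMM can be mimicked in a few lines (an illustrative sketch; the `Level` class and its names are invented for this example, and the components are uniform rather than Dirichlet draws):

```python
import random

random.seed(1)

class Level:
    """One NoMM level: K mixture components over T outcomes, selected by f(x_in)."""
    def __init__(self, K, T, f):
        self.f = f
        # toy components: uniform distributions (in the model they are Dirichlet samples)
        self.components = [[1.0 / T] * T for _ in range(K)]

    def sample(self, *x_in):
        """Select a component via the incoming values, then sample an outcome."""
        k = self.f(*x_in)
        p = self.components[k]
        r, acc = random.random(), 0.0
        for t, pt in enumerate(p):
            acc += pt
            if r < acc:
                return t
        return len(p) - 1

M, K, V = 3, 4, 10
theta = Level(M, K, lambda m: m)   # document m selects theta_m, emits topic z
phi = Level(K, V, lambda z: z)     # topic z selects phi_z, emits the observed word w

m = 1
z = theta.sample(m)
w = phi.sample(z)
```

More complex models only change the selector functions f and the wiring between levels, which is what makes a generic implementation feasible.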
-
Topic models as NoMMs
[Figure: NoMM representations alongside the graphical models of (a) latent Dirichlet allocation (LDA), (b) author-topic model (ATM), (c) pachinko allocation model (PAM), (d) hierarchical pachinko allocation model (hPAM)]
(Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007)
Gregor Heinrich A generic approach to topic models 15 / 35
-
Overview
Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook
Gregor Heinrich A generic approach to topic models 16 / 35
-
Can generic topic models be implemented generically. . . ?
Gregor Heinrich A generic approach to topic models 17 / 35
-
Bayesian inference problem and Gibbs sampler
Bayesian inference: inversion of the generative process:
Find distributions over parameters Θ and latent variables/topics H, given observations V and Dirichlet parameters A
⇒ Determine the posterior distribution p(H, Θ | V, A)
Intractability → approximate approaches
Gibbs sampling: a variant of Markov chain Monte Carlo (MCMC)
In topic models: marginalise out the parameters (collapsed Gibbs sampler)
Sample the topics H_i for each data point i in turn: H_i ~ p(H_i | H_¬i, V, A)
[Figure: example NoMM: levels θ^r_m | α^r → x (parameters Θ_1), θ_{m,x} | α → y (Θ_2), φ_y | β → w (Θ_3); hidden variables H_1, H_2 and observations V]
Gregor Heinrich A generic approach to topic models 18 / 35
-
Sampling distribution for NoMMs
[Figure: example NoMM: θ^r_m | α^r → x, θ_{m,x} | α → y, φ_y | β → w, contributing the factors q(m, x), q((m, x), y), q(y, w)]
Gibbs sampler can be generically derived (Heinrich 2009)
Typical case: quotients of factors over the levels ℓ:

p(H_i | H_¬i, V, A) ∝ ∏_ℓ [ (n^¬i_{k,t} + α_t) / ∑_t (n^¬i_{k,t} + α_t) ]^[ℓ] = ∏_ℓ [ q(k, t) ]^[ℓ]

where n^[ℓ]_{k,t} counts the co-occurrences between input and output values of level ℓ (components and samples)
More complex variants are covered by q(k, t) ≔ beta({n_{k,t}}_{t=1}^T + α) / beta({n^¬i_{k,t}}_{t=1}^T + α), a ratio of multivariate beta functions
Gregor Heinrich A generic approach to topic models 19 / 35
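For a two-level NoMM such as LDA, the product of q-factors reduces to the familiar collapsed Gibbs update; a minimal sketch with illustrative count arrays (function and variable names are invented, not the thesis code):

```python
def q(n_k, alpha):
    """One level's factor q(k, t): excluded counts n_k plus pseudo-counts alpha,
    normalised over the outcomes t."""
    total = sum(n_k) + sum(alpha)
    return [(n + a) / total for n, a in zip(n_k, alpha)]

def lda_full_conditional(nmk_m, nkt, k_range, t, alpha, beta):
    """p(z_i = k | z_-i, w) ∝ q_theta(m, k) * q_phi(k, t) for each topic k
    (all counts must already exclude data point i)."""
    q_theta = q(nmk_m, alpha)          # document-topic level
    p = []
    for k in k_range:
        q_phi = q(nkt[k], beta)        # topic-word level
        p.append(q_theta[k] * q_phi[t])
    s = sum(p)
    return [x / s for x in p]

# toy counts: 2 topics, 3 terms; current word is term t=0 in document m
nmk_m = [3, 1]                 # topic counts in document m (excluding word i)
nkt = [[5, 1, 1], [1, 4, 2]]   # topic-term counts (excluding word i)
p = lda_full_conditional(nmk_m, nkt, range(2), 0, [0.5, 0.5], [0.1] * 3)
```

Each level contributes one factor of the product; further levels (e.g. in PAM) simply multiply in additional q-functions.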
-
Typology of NoMM substructures
[Figure: typology of NoMM substructures (nodes, edges, component indices): N1. Dirichlet-multinomial; N2. observed parameters; E2. autonomous edges; E3. coupled edges; C2. combined indices; C3. interleaved indices, each represented by q-function factors such as q(a, z) q(z, b), c_{a,z} q(z, b), q(a, x ∪ y) q(x, b) q(y, c), q(a, z) q(z, b) q(z, c), and q(a, x) q(b, y) q(k, c) with k = f(x, y)]
NoMM substructures: nodes, edges/branches, component indices / merging of edges
Representation via q-functions and the likelihood
Multiple samples per data point: q(a, x ∪ y) for the respective level
Library incl. additional structures: alternative distributions, regression, aggregation etc. → q-functions + other factors
Gregor Heinrich A generic approach to topic models 20 / 35
-
Implementation: Gibbs meta-sampler
[Figure: meta-sampler pipeline: a topic model specification is read by the NoMM code generator, which uses code templates to generate a Java model instance (validated against data on the Java VM) and an optimised C/Java code module (compiled and deployed with data on the native platform)]
Code generator for topic models in Java and C
Separation of knowledge domains: topic model applications vs. machine learning vs. computing architecture
Gregor Heinrich A generic approach to topic models 21 / 35
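The generator idea can be illustrated with a toy template expansion (entirely hypothetical names and template; the actual meta-sampler additionally handles selectors, hyperparameter groups and optimisation):

```python
# toy illustration of template-based code generation for NoMM levels
LEVEL_TEMPLATE = (
    "pp *= (n_{name}[{sel}][t] + {hyper}[t])"
    " / (n_{name}sum[{sel}] + {hyper}sum);"
)

def generate_kernel(levels):
    """Emit one multiplicative Gibbs factor per level of the model spec."""
    lines = ["pp = 1.0;"]
    for lv in levels:
        lines.append(LEVEL_TEMPLATE.format(**lv))
    return "\n".join(lines)

# spec in the spirit of 'm >> theta | alpha >> x' on the next slide
spec = [
    {"name": "theta", "sel": "m", "hyper": "alpha"},
    {"name": "phi", "sel": "k", "hyper": "beta"},
]
kernel = generate_kernel(spec)
```

Because every level contributes a factor of the same shape, the generator can assemble a sampler kernel for any NoMM from a handful of templates.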
-
Example NoMM script and generated kernel: hPAM2
[Figure: hPAM2 as a NoMM: document m selects θ_m | α and emits x_{m,n}; (m, x) selects θ^x_{m,x} | α^x and emits y_{m,n}; (x, y) selects φ_k | β, emitting the word w_{m,n}, with component index k: x = 0 ⇒ k = 0; x ≠ 0, y = 0 ⇒ k = 1 + x; x, y ≠ 0 ⇒ k = 1 + X + y]
model = HPAM2
description:
Hierarchical PAM model 2 (HPAM2)
sequences:
# variables sampled for each (m,n)
w, x, y : m, n
network:
# each line one NoMM node
m >> theta | alpha >> x
m,x >> thetax | alphax[x] >> y
x,y >> phi[k] >> w
# java code to assign k
k : {
if (x==0) { k = 0; }
else if (y==0) k = 1 + x;
else k = 1 + X + y;
}.
// generated Gibbs kernel: enumerate the hidden variables x and y
for (hx = 0; hx < X; hx++) {
  for (hy = 0; hy < Y; hy++) {
    mxsel = X * m + hx;   // row index of the (m, x) count table
    mxjsel = hx;          // hyperparameter group selected by x
    // component index k as defined in the model specification
    if (hx == 0)
      ksel = 0;
    else if (hy == 0)
      ksel = 1 + hx;
    else
      ksel = 1 + X + hy;
    // product of the level factors (the constant document-level
    // denominator is omitted)
    pp[hx][hy] = (nmx[m][hx] + alpha[hx])
        * (nmxy[mxsel][hy] + alphax[mxjsel][hy])
        / (nmxysum[mxsel] + alphaxsum[mxjsel])
        * (nkw[ksel][w[m][n]] + beta)
        / (nkwsum[ksel] + betasum);
    psum += pp[hx][hy];
  } // for hy
} // for hx
[Figure: resulting topic hierarchy: a root topic, super-topics and sub-topics, each with its own word distribution]
Gregor Heinrich A generic approach to topic models 22 / 35
-
Document-topic distribution in the Gibbs sampler
Iteration 1
Document-topic matrix (200 documents, 50 topics)
Gregor Heinrich A generic approach to topic models 23 / 35
-
DocumentTopic distribution in Gibbs sampler
Iteration 500, converged stationary state
Documenttopic matrix (200 documents, 50 topics)
Gregor Heinrich A generic approach to topic models 23 / 35
-
Fast sampling: Hybrid scaling methods
Figure: Perplexity over iterations (1 to 1000), dependent vs. independent samplers (PAM4)

model          dim.     speedup
LDA            500      30.2
PAM4 (ser.)    40 x 40   7.4
PAM4 (par.)    40 x 40  24.1
PAM4 (indep.)  40 x 40  49.8
Serial and parallel scaling methods: generalised results for LDA to generic NoMMs, specifically (Porteous et al. 2008; Newman et al. 2009) + novel approach
Problem: sampling space for statistically dependent variables has K * L * ... states
Independence assumption: separate samplers with dimensions K + L + ... << K * L * ...
Empirical result: more iterations needed, but topic quality comparable
Hybrid approaches with independent samplers highly effective
Implementation: complexity covered by meta-sampler
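The K * L vs. K + L trade-off above can be illustrated by counting the states each sampler must evaluate per token (a toy sketch of the independence assumption only, not the thesis samplers themselves):

```python
from itertools import product

def joint_states(dims):
    # Statistically dependent hidden variables: one sampler over the full
    # product space, i.e. K * L * ... states to evaluate per token.
    return list(product(*(range(d) for d in dims)))

def independent_states(dims):
    # Independence assumption: one small sampler per variable,
    # only K + L + ... states in total per token.
    return [list(range(d)) for d in dims]
```

For two hidden levels of dimension 40 (as in the PAM4 example), the joint sampler evaluates 1600 states per token while the independent samplers together evaluate only 80.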
Gregor Heinrich A generic approach to topic models 24 / 35
-
Overview
Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook
Gregor Heinrich A generic approach to topic models 25 / 35
-
How can generic models be applied to data in virtual communities?
Gregor Heinrich A generic approach to topic models 26 / 35
-
NoMM design process
Figure: Typology / library of NoMM substructures, e.g.:
N1 Dirichlet-multinomial: q(a, z) q(z, b); N2 observed parameters: c_{a,z} q(z, b)
E2 autonomous edges: q(a, z) q(z, b) q(z, c); E3 coupled edges: q(a, x y) q(x, b) q(y, c)
C2 combined indices: q(a, z1) q(b, z2) q(z1, c) q(z2, c); C3 interleaved indices: q(a, x) q(b, y) q(k, c), k = f(x, y)
Typology / library of NoMM substructures. Idea: construct models from simple substructures that connect terminal nodes:
Terminal nodes = multimodal data (virtual communities...)
Substructures = relationships in data; latent semantics
Process:
Assumptions on dependencies in data
Iterative association to structures in model (usage of typology)
Gibbs distribution known, so model behaviour is known: q(x, y) = "rich get richer"
Implementation and test with Gibbs meta-sampler; possibly iteration
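The "rich get richer" behaviour of a Dirichlet-multinomial edge q(x, y) can be sketched via its collapsed predictive probability (an illustrative sketch; the count array `n_xy`, total `n_x` and concentration `alpha` are generic names, not the thesis notation):

```python
def predictive(n_xy, n_x, y, alpha):
    # Collapsed Dirichlet-multinomial predictive for an edge q(x, y):
    #   q(x, y) = (n_{x,y} + alpha) / (n_x + Y * alpha)
    # Outcomes observed more often become more probable ("rich get richer").
    Y = len(n_xy)
    return (n_xy[y] + alpha) / (n_x + Y * alpha)
```

Each additional observation of an outcome raises its predictive probability, which is the clustering behaviour the substructure typology exploits.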
Gregor Heinrich A generic approach to topic models 27 / 35
-
NoMM design process
1. Define modelling task
2. Define metrics
3. Create evidence model / terminals
4. Formulate model assumptions
5. Compose model and predict properties
6. Write and adapt NoMM script
7. Generate Gibbs sampler
8. Implement target metric
9. Evaluate based on test corpus
10. Optimise and integrate for target platform
(Typology / library of NoMM substructures)
Idea: construct models from simple substructures that connect terminal nodes:
Terminal nodes = multimodal data (virtual communities...)
Substructures = relationships in data; latent semantics
Process:
Assumptions on dependencies in data
Iterative association to structures in model (usage of typology)
Gibbs distribution known, so model behaviour is known: q(x, y) = "rich get richer"
Implementation and test with Gibbs meta-sampler; possibly iteration
Gregor Heinrich A generic approach to topic models 27 / 35
-
Application: Expert finding with tag annotations
authorship
annotation
authorship
Scenario: expert finding via documents with tag annotations
Authors of relevant documents = experts
Frequently, documents carry additional annotations, here: tags
Goal: enable tag queries, improve quality of text queries
Problem: tags often incomplete, partly wrong
Connection of tags and experts via topics:
(1) Data: for each document m: text ~wm, authors ~am, tags ~cm
(2) Goal: tag query ~c: p(~c | a) = max; word query ~w: p(~w | a) = max
(3) Terminal nodes: authors in input, words and tags in output
Gregor Heinrich A generic approach to topic models 28 / 35
-
Application: Expert finding with tag annotations
Figure: Document m with words ~wm, authors ~am (e.g., AB, TH) and tags ~cm (e.g., 5, 33)
Scenario: expert finding via documents with tag annotations
Authors of relevant documents = experts
Frequently, documents carry additional annotations, here: tags
Goal: enable tag queries, improve quality of text queries
Problem: tags often incomplete, partly wrong
Connection of tags and experts via topics:
(1) Data: for each document m: text ~wm, authors ~am, tags ~cm
(2) Goal: tag query ~c: p(~c | a) = max; word query ~w: p(~w | a) = max
(3) Terminal nodes: authors in input, words and tags in output
Gregor Heinrich A generic approach to topic models 28 / 35
-
Model assumptions
Figure: Document m with words ~wm, authors ~am and tags ~cm
(4) Model assumptions:
(a) Expertise of an author is weighted with the portion of authorship
(b) Semantics of expertise expressed by topics z; each author has a single field of expertise (topic distribution)
(c) Semantics of tags expressed by topics y
Gregor Heinrich A generic approach to topic models 29 / 35
-
Model construction
Terminal nodes: authors ~am, words wm,n [M, Nm], tags ~cm
p(... | ~a, ~w, ~c) = ...
(5) Model construction: (a) start with terminal nodes (from step 3)
Gregor Heinrich A generic approach to topic models 30 / 35
-
Model construction
Nodes: document m, author xm,n, word wm,n [M, Nm]; tags ~cm
p(x, ... | ·) proportional to am,x · q(x, ...) · ...
(b) Authorship ~am given as observed distribution; node samples author x of a word
Gregor Heinrich A generic approach to topic models 30 / 35
-
Model construction
Nodes: document m, author xm,n, word topic zm,n, word wm,n [M, Nm]; tags ~cm
p(x, z, ... | ·) proportional to am,x · q(x, z) · ...
(c) Each author has only a single field of expertise (topic distribution); q(x, z) associates (word-)topics with sampled authors x (cf. ATM)
Gregor Heinrich A generic approach to topic models 30 / 35
-
Model construction
Nodes: document m, author xm,n, word topic zm,n, word wm,n [M, Nm]; tags ~cm
p(x, z, ... | ·) proportional to am,x · q(x, z) · q(z, w) · ...
(d) Topic distribution over terms: connect z and w via q(z, w)
Gregor Heinrich A generic approach to topic models 30 / 35
-
Model construction
Nodes: document m, author xm,n / xm,j, word topic zm,n, word wm,n [M, Nm], tag topic ym,j, tag cm,j [M, Jm]
p(x, z, y | ·) proportional to am,x · q(x, z∪y) · q(z, w) · q(y, c)
(e) Introduce tag topics ym,j for cm,j as distributions over tags; q(x, z∪y) overlays values for z and y
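The word-side factor of this full conditional can be sketched with the collapsed predictive form of the q(·,·) edges (an illustrative sketch only: it omits the q(x, z∪y) overlay and the tag factor q(y, c); all array names are assumptions):

```python
def q(n_cond, n_total, alpha, dim):
    # collapsed Dirichlet-multinomial predictive for a generic NoMM edge
    return (n_cond + alpha) / (n_total + dim * alpha)

def ett_word_conditional(x, z, w, a_mx, nxz, nxzsum, nzw, nzwsum,
                         alpha, beta, K, V):
    # Word-side factor of the full conditional:
    #   p(x, z | ...) proportional to a_{m,x} * q(x, z) * q(z, w)
    # (the tag side, i.e. the z/y overlay and q(y, c), is omitted here)
    return (a_mx[x]
            * q(nxz[x][z], nxzsum[x], alpha, K)
            * q(nzw[z][w], nzwsum[z], beta, V))
```

The observed authorship weight a_{m,x} enters as a plain factor, while the two q edges contribute count-based predictives.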
Gregor Heinrich A generic approach to topic models 30 / 35
-
Model construction: conventional approach
Figure: Expert-tag-topic model (Heinrich 2011) in conventional plate notation (variables wm,n, zm,n, cm,j, ym,j, xm,n, xm,j; observed ~am; m in [1, M], n in [1, Nm], j in [1, Jm], k in [1, K], x in [1, A])
Gregor Heinrich A generic approach to topic models 30 / 35
-
Expert-tag-topic model: Evaluation
Figure: AP@10 for word queries and tag queries (ATM vs. ETT); topic coherence scores (ATM vs. ETT)
NIPS corpus: 2.3 million words, 2037 authors, 165 tags
Retrieval (Average Precision @10):
Term queries: ETT > ATM
Tag queries: similarly good AP values
Topic coherence (Mimno et al. 2011): ETT > ATM
Semi-supervised learning: tag queries retrieve items without tags
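The AP@10 metric used above can be sketched as follows (one common variant of average precision at cutoff k; definitions differ in the normalising denominator, so this is an illustrative sketch rather than the exact thesis implementation):

```python
def average_precision_at_k(ranked, relevant, k=10):
    # AP@k: average of precision@i over the ranks i <= k at which a relevant
    # item appears, normalised by min(|relevant|, k).
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0
```

For expert retrieval, `ranked` would be the authors ordered by query likelihood and `relevant` the ground-truth experts for the query.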
Gregor Heinrich A generic approach to topic models 31 / 35
-
Overview
Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook
Gregor Heinrich A generic approach to topic models 32 / 35
-
Conclusions: Research contributions
Networks of Mixed Membership: generic model and domain-specific compact representation of topic models
Inference algorithms: generic Gibbs sampler
Fast sampling methods (serial, parallel, independent)
Implementation in Gibbs meta-sampler
+ Variational inference for NoMMs (Heinrich and Goesele 2009)
Design process based on typology of NoMM substructures
+ AMQ model: meta-model for virtual communities as formal basis for scenario modelling (Heinrich 2010)
Application to virtual communities: expert-tag-topic model for expert finding with annotated documents
+ Models ETT2 and ETT3 incl. novel NoMM structure; retrieval approaches (Heinrich 2011b)
+ Graph-based expert search using ETT models: integration of explorative search/browsing and distributions from topic models
Contribution to facilitated model-based construction of topic models, specifically for virtual communities and other multimodal scenarios
Gregor Heinrich A generic approach to topic models 33 / 35
-
Conclusions: Research contributions
Screenshot: community browser with an author-citation graph for NIPS papers on [blind source separation] / [independent component analysis] (authors incl. Cichocki_A, Bell_A2, Hyvarinen_A, Oja_E1; papers incl. "A New Learning Algorithm for Blind Signal Separation")
Figure: ETT1: Expert search in community browser
Gregor Heinrich A generic approach to topic models 33 / 35
-
Outlook
New applications and NoMM structures, e.g., time as variable
Alternative inference methods:
Generic collapsed variational Bayes (Teh et al. 2007): structure similar to collapsed Gibbs sampler
Non-parametric methods: learning model dimensions using Dirichlet or Pitman-Yor process priors (Teh et al. 2004; Buntine and Hutter 2010), NoMM polymorphism (Heinrich 2011a)
Improved support in design process:
Data-driven design: search over model structures to obtain best model for data set
Architecture-specific Gibbs meta-sampler, e.g., massively parallel or FPGA, cf. (Heinrich et al. 2011)
Integration with interactive user interfaces: models can be created on the fly, e.g., for visual analytics
Gregor Heinrich A generic approach to topic models 34 / 35
-
Thank you!
Q + A
Gregor Heinrich A generic approach to topic models 35 / 35
-
References I
References
Barnard, K., P. Duygulu, D. Forsyth, N. de Freitas, D. Blei, and M. Jordan (2003, August). Matching words and pictures. JMLR Special Issue on Machine Learning Methods for Text and Images 3(6), 1107-1136.
Bellegarda, J. (2000, August). Exploiting latent semantic information in statistical language modeling. Proc. IEEE 88(8), 1279-1296.
Blei, D., A. Ng, and M. Jordan (2003, January). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.
Buntine, W. and M. Hutter (2010). A Bayesian review of the Poisson-Dirichlet process. arXiv:1007.0296v1 [math.ST].
Chang, J., J. Boyd-Graber, S. Gerrish, C. Wang, and D. Blei (2009). Reading tea leaves: How humans interpret topic models. In Proc. Neural Information Processing Systems (NIPS).
Gregor Heinrich A generic approach to topic models 36 / 35
-
References II
Dietz, L., S. Bickel, and T. Scheffer (2007, June). Unsupervised prediction of citation influences. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, Oregon, USA.
Heinrich, G. (2009). A generic approach to topic models. In Proc. European Conf. on Mach. Learn. / Principles and Pract. of Know. Discov. in Databases (ECML/PKDD), Part 1, pp. 517-532.
Heinrich, G. (2010). Actors-media-qualities: a generic model for information retrieval in virtual communities. In Proc. 7th International Workshop on Innovative Internet Community Systems (I2CS 2007), part of I2CS Jubilee proceedings, Lecture Notes in Informatics, GI.
Heinrich, G. (2011a, March). Infinite LDA: Implementing the HDP with minimum code complexity. Technical note TN2011/1, arbylon.net.
Heinrich, G. (2011b). Typology of mixed-membership models: Towards a design method. In Proc. European Conf. on Mach. Learn. / Principles and Pract. of Know. Discov. in Databases (ECML/PKDD).
Gregor Heinrich A generic approach to topic models 37 / 35
-
References III
Heinrich, G. and M. Goesele (2009). Variational Bayes for generic topic models. In Proc. 32nd Annual German Conference on Artificial Intelligence (KI 2009).
Heinrich, G., J. Kindermann, C. Lauth, G. Paaß, and J. Sanchez-Monzon (2005). Investigating word correlation at different scopes: a latent concept approach. In Workshop Lexical Ontology Learning at Int. Conf. Mach. Learning.
Heinrich, G., F. Logemann, V. Hahn, C. Jung, G. Figueiredo, and W. Luk (2011). HW/SW co-design for heterogeneous multi-core platforms: The hArtes toolchain, Chapter Audio array processing for telepresence, pp. 173-207. Springer.
Li, W., D. Blei, and A. McCallum (2007). Mixtures of hierarchical topics with pachinko allocation. In International Conference on Machine Learning.
Li, W. and A. McCallum (2006). Pachinko allocation: DAG-structured mixture models of topic correlations. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, New York, NY, USA, pp. 577-584. ACM.
Gregor Heinrich A generic approach to topic models 38 / 35
-
References IV
Mimno, D., H. M. Wallach, E. Talley, M. Leenders, and A. McCallum (2011, July). Optimizing semantic coherence in topic models. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK, pp. 262-272.
Newman, D., A. Asuncion, P. Smyth, and M. Welling (2009, August). Distributed algorithms for topic models. JMLR 10, 1801-1828.
Porteous, I., D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling (2008). Fast collapsed Gibbs sampling for latent Dirichlet allocation. In KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, pp. 569-577. ACM.
Rosen-Zvi, M., T. Griffiths, M. Steyvers, and P. Smyth (2004). The author-topic model for authors and documents. In Proc. 20th Conference on Uncertainty in Artificial Intelligence (UAI).
Teh, Y., M. Jordan, M. Beal, and D. Blei (2004). Hierarchical Dirichlet processes. Technical Report 653, Department of Statistics, University of California at Berkeley.
Teh, Y. W., D. Newman, and M. Welling (2007). A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems, Volume 19.
Gregor Heinrich A generic approach to topic models 39 / 35
-
Appendix
Gregor Heinrich A generic approach to topic models 40 / 35
-
Example: Text mining for semantic clusters
Topic label          Dominant terms according to p(term | topic)
Bundesliga           FC SC München Borussia SV VfL Kickers SpVgg Uhr Köln Bochum Freiburg VfB Eintracht Bayern Hamburger Bayern+München
Polizei / Unfall     Polizei verletzt schwer Auto Unfall Fahrer Angaben schwer+verletzt Menschen Wagen Verletzungen Lawine Mann vier Meter Straße
Tschetschenien       Rebellen russischen Grosny russische Tschetschenien Truppen Kaukasus Moskau Angaben Interfax tschetschenischen Agentur
Politik / Hessen     FDP Koch Hessen CDU Koalition Gerhardt Wagner Liberalen hessischen Westerwelle Wolfgang Roland+Koch Wolfgang+Gerhardt
Wetter               Grad Temperaturen Regen Schnee Süden Norden Sonne Wetter Wolken Deutschland zwischen Nacht Wetterdienst Wind
Politik / Kroatien   Parlament Partei Stimmen Mehrheit Wahlen Wahl Opposition Kroatien Präsident Parlamentswahlen Mesic Abstimmung HDZ
Die Grünen           Grünen Parteitag Atomausstieg Trittin Grüne Partei Trennung Mandat Ausstieg Amt Roestel Jahren Müller Radcke Koalition
Russische Politik    Russland Putin Moskau russischen russische Jelzin Wladimir Tschetschenien Russlands Wladimir+Putin Kreml Boris Präsidenten
Polizei / Schulen    Polizei Schulen Schüler Täter Polizisten Schule Tat Lehrer erschossen Beamten Mann Polizist Beamte verletzt Waffe

Bigram-LDA: Topics from 18,400 dpa news messages, Jan. 2000 (Heinrich et al. 2005)
Gregor Heinrich A generic approach to topic models 41 / 35
-
Notation: Bayesian network vs. NoMM levels
Figure: The same model fragment in Bayesian network notation (plates) and NoMM notation (levels)

Bayesian network                        NoMM
parameters + hyperparameters            nodes (parameter | hyperparameter)
variables ki, xi                        edges ki, xi
plates (i.i.d. repetitions) over i, k   indexes i + dimensions [K]
Gregor Heinrich A generic approach to topic models 42 / 35
-
NoMM representation: Variable dependencies
Figure: NoMM mixture levels l = 1 ... 8: hidden variables hi, visible variables vi, parameter nodes per level; levels may share or split indices (e.g., k1i, k2i) and connect via edges between levels
Gregor Heinrich A generic approach to topic models 43 / 35
-
Collapsed Gibbs sampler
~x (0)
x2
x1
~x
p(~x |V)
Collapsed Gibbs sampler: Stochastic EM / MCMC:NoMMs: parameters correlate with H marginaliseFor each data point, i: draw latent variables, Hi = (yi, zi, . . . ), given allother data, latent, Hi, and observed, V:
Hi p(Hi |Hi,V ,A) . (1)Stationary state: full conditional distribution (1) simulates posterior
Faster absolute convergence for NoMMs than, e.g., variationalinference (Heinrich and Goesele 2009)
Gregor Heinrich A generic approach to topic models 44 / 35
-
Collapsed Gibbs sampler
~x (0)
x2
x1
~x ~x continuous, i [1, 2]H discrete, i [1,W]
l
Collapsed Gibbs sampler: Stochastic EM / MCMC:NoMMs: parameters correlate with H marginaliseFor each data point, i: draw latent variables, Hi = (yi, zi, . . . ), given allother data, latent, Hi, and observed, V:
Hi p(Hi |Hi,V ,A) . (1)Stationary state: full conditional distribution (1) simulates posterior
Faster absolute convergence for NoMMs than, e.g., variationalinference (Heinrich and Goesele 2009)
Gregor Heinrich A generic approach to topic models 44 / 35
-
Collapsed Gibbs sampler
x2
x1i = 1
~xp(x1 | x2)
Collapsed Gibbs sampler: Stochastic EM / MCMC:NoMMs: parameters correlate with H marginaliseFor each data point, i: draw latent variables, Hi = (yi, zi, . . . ), given allother data, latent, Hi, and observed, V:
Hi p(Hi |Hi,V ,A) . (1)Stationary state: full conditional distribution (1) simulates posterior
Faster absolute convergence for NoMMs than, e.g., variationalinference (Heinrich and Goesele 2009)
Gregor Heinrich A generic approach to topic models 44 / 35
-
Collapsed Gibbs sampler
x2
x1i = 1
~x
Collapsed Gibbs sampler: Stochastic EM / MCMC:NoMMs: parameters correlate with H marginaliseFor each data point, i: draw latent variables, Hi = (yi, zi, . . . ), given allother data, latent, Hi, and observed, V:
Hi p(Hi |Hi,V ,A) . (1)Stationary state: full conditional distribution (1) simulates posterior
Faster absolute convergence for NoMMs than, e.g., variationalinference (Heinrich and Goesele 2009)
Gregor Heinrich A generic approach to topic models 44 / 35
-
Collapsed Gibbs sampler
x2
x1i = 2
~x
p(x2 | x1)
Collapsed Gibbs sampler: Stochastic EM / MCMC:NoMMs: parameters correlate with H marginaliseFor each data point, i: draw latent variables, Hi = (yi, zi, . . . ), given allother data, latent, Hi, and observed, V:
Hi p(Hi |Hi,V ,A) . (1)Stationary state: full conditional distribution (1) simulates posterior
Faster absolute convergence for NoMMs than, e.g., variationalinference (Heinrich and Goesele 2009)
Gregor Heinrich A generic approach to topic models 44 / 35
-
Collapsed Gibbs sampler
x2
x1i = 2
~x
Collapsed Gibbs sampler: Stochastic EM / MCMC:NoMMs: parameters correlate with H marginaliseFor each data point, i: draw latent variables, Hi = (yi, zi, . . . ), given allother data, latent, Hi, and observed, V:
Hi p(Hi |Hi,V ,A) . (1)Stationary state: full conditional distribution (1) simulates posterior
Faster absolute convergence for NoMMs than, e.g., variationalinference (Heinrich and Goesele 2009)
Gregor Heinrich A generic approach to topic models 44 / 35
-
Generic topic models: Generative process
[Figure: plate diagram of one NoMM level — hyperparameter groups ~α_j (plate J), components ~θ_k (plate K), data items x_i with parent edges]

Generative process on level ℓ:

    x_i ∼ Mult(x_i | ~θ_k)        ~θ_k ∼ Dir(~θ_k | ~α_j)            (2)
    k = f_k(parents(x_i), i)      j = f_j(known parents(x_i), i) .   (3)
Gregor Heinrich A generic approach to topic models 45 / 35
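The two-line generative process (2)-(3) can be simulated directly. Below is a sketch for a single level; `component_of` is a hypothetical stand-in for the mapping k = f_k(parents(x_i), i), and a single symmetric hyperparameter value is assumed in place of the groups ~α_j:

```python
import numpy as np

def generate_level(N, K, T, alpha, component_of, seed=0):
    """Sample one NoMM level per Eqs. (2)-(3): draw each component
    theta_k ~ Dir(alpha), then each value x_i ~ Mult(theta_{k_i}),
    where k_i = component_of(i) plays the role of f_k(parents, i)."""
    rng = np.random.default_rng(seed)
    # theta_k ~ Dir(alpha) for each of the K components (Eq. 2, right)
    theta = rng.dirichlet(alpha * np.ones(T), size=K)
    # x_i ~ Mult(theta_{k_i}) with the component chosen per token (Eq. 3)
    x = np.array([rng.choice(T, p=theta[component_of(i)]) for i in range(N)])
    return theta, x
```

With `component_of = lambda i: i % K` this reduces to a fixed partition of the data; in a real NoMM the mapping follows the parent edges of the model graph.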
-
Generic topic models: Complete-data likelihood
Likelihood of all hidden and visible data X = {H, V} and parameters Θ:

    p(X, Θ | A) = ∏_{ℓ∈L} [ ∏_i p(x_{i,out} | ~x_{i,in})  ∏_k p(~θ_k | ~α) ]^[ℓ]        (data items: discrete; components: Dirichlet; product over levels)
                = ∏_{ℓ∈L} [ ∏_k f(~θ_k, ~n_k, ~α) ]^[ℓ] ,    ~n_k = (n_{k,1}, n_{k,2}, …)    (4)

The product depends on the co-occurrence counts n_{k,t} between input and output values, x_{i,in} = k and x_{i,out} = t, on each level ℓ
There are variants of component selection x_{i,in} = k
There are mixture-node variants, e.g., observed components
Gregor Heinrich A generic approach to topic models 46 / 35
-
Generic topic models: Complete-data likelihood
The conjugacy between the multinomial and Dirichlet distributions of the model levels leads to a simple complete-data likelihood:

    p(X, Θ | A) = ∏_ℓ [ ∏_i Mult(x_i | Θ, k_i) ∏_k Dir(~θ_k | ~α_j) ]^ℓ                  (5)
                = ∏_ℓ [ ∏_i θ_{k_i, x_i} · ∏_k 1/B(~α_j) ∏_t θ_{k,t}^{α_t − 1} ]^ℓ       (6)
                = ∏_ℓ [ ∏_k 1/B(~α_j) ∏_t θ_{k,t}^{α_t + n_{k,t} − 1} ]^ℓ                (7)
                = ∏_ℓ [ ∏_k B(~n_k + ~α_j)/B(~α_j) · Dir(~θ_k | ~n_k + ~α_j) ]^ℓ         (8)

where the brackets [·]^ℓ enclose a particular level ℓ and n_{k,t} counts how often k and t co-occur.
Gregor Heinrich A generic approach to topic models 47 / 35
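The Beta-function ratio in (8) is the Dirichlet-multinomial (Polya) marginal of one component. As a quick numerical check, sketched here for a single component (helper names are illustrative): the ratio B(~n + ~α)/B(~α) must equal the product of sequential predictive probabilities.

```python
from math import lgamma, exp

def log_B(v):
    """log of the multivariate Beta function B(v) = prod Gamma(v_t) / Gamma(sum v_t)."""
    return sum(lgamma(a) for a in v) - lgamma(sum(v))

def polya_marginal(counts, alpha):
    """Marginal likelihood B(n + alpha) / B(alpha) of Eq. (8), theta integrated out."""
    n_plus_a = [n + a for n, a in zip(counts, alpha)]
    return exp(log_B(n_plus_a) - log_B(alpha))

def sequential_marginal(seq, alpha):
    """Same quantity via the chain rule: product of smoothed predictive probabilities."""
    counts = [0.0] * len(alpha)
    p = 1.0
    for t in seq:
        p *= (counts[t] + alpha[t]) / (sum(counts) + sum(alpha))
        counts[t] += 1
    return p
```

For the sequence [0, 1, 0, 0] with α = (0.5, 0.5), both routes give 0.0390625; this chain-rule view is exactly what the q-functions on the following slides exploit.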
-
Inference: Generic full conditionals
Gibbs full conditionals are derived for groups of dependent hidden edges, H_i^d ⊂ H^d ⊂ X, with the surrounding edges S_i^d ⊂ S^d considered observed. All tokens co-located with a particular observation: X_i^d = {H_i^d, S_i^d}.
Full conditional via the chain rule applied to (8), with Θ integrated out:

    p(H_i^d | X∖H_i^d, A) = p(H_i^d, S_i^d | X∖{H_i^d, S_i^d}, A) / p(S_i^d | X∖{H_i^d, S_i^d}, A)    (9)
                          ∝ p(X_i^d | X∖X_i^d, A) = p(X | A) / p(X∖X_i^d | A)                         (10)
                          = ∏_ℓ [ ∏_k B(~n_k + ~α_j) / B(~n_k∖X_i^d + ~α_j) ]^ℓ                       (11)
                          ∝ ∏_{ℓ∈{H^d,S^d}} [ B(~n_k + ~α_j) / B(~n_k∖X_i^d + ~α_j) ]^ℓ               (12)

[Figure: hidden edges H_1 … H^d per token across levels, grouped per token i into H_i^d]
Gregor Heinrich A generic approach to topic models 48 / 35
-
Inference: q-functions
    q(k, t) ≜ B(~n_k + ~α) / B(~n_k∖x_i^d + ~α)

    |x_i^d| = 1:   = (n_{k,t}^{¬i} + α_t) / (Σ_t n_{k,t}^{¬i} + Σ_t α_t)

    |x_i^d| = 2:   = (n_{k,u}^{¬i} + α_u) / (Σ_t n_{k,t}^{¬i} + Σ_t α_t)
                   · (n_{k,v}^{¬i} + α_v + δ(u = v)) / (Σ_t n_{k,t}^{¬i} + Σ_t α_t + 1),   x_i^d = {u, v}, …
Gregor Heinrich A generic approach to topic models 49 / 35
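The closed forms above can be checked against the defining Beta-function ratio; a sketch with `n_minus` the component's counts excluding token i (names are illustrative):

```python
from math import lgamma, exp

def log_B(v):
    """log of the multivariate Beta function B(v)."""
    return sum(lgamma(a) for a in v) - lgamma(sum(v))

def q_beta_ratio(n_minus, alpha, new_values):
    """q(k, t) = B(n_k + alpha) / B(n_k \\ x_i^d + alpha): n_minus are the
    counts with token i removed, new_values the |x_i^d| values it adds back."""
    n_plus = list(n_minus)
    for t in new_values:
        n_plus[t] += 1
    return exp(log_B([n + a for n, a in zip(n_plus, alpha)])
               - log_B([n + a for n, a in zip(n_minus, alpha)]))

def q_closed_form(n_minus, alpha, new_values):
    """Closed forms from the slide: |x_i^d|=1 is a smoothed count ratio;
    |x_i^d|=2 adds a second factor with +delta(u=v) in the numerator and
    +1 in the denominator."""
    S = sum(n_minus) + sum(alpha)
    if len(new_values) == 1:
        t, = new_values
        return (n_minus[t] + alpha[t]) / S
    u, v = new_values
    return ((n_minus[u] + alpha[u]) / S
            * (n_minus[v] + alpha[v] + (u == v)) / (S + 1))
```

The Gamma functions telescope, which is why the ratio collapses to at most |x_i^d| smoothed count factors and never requires evaluating a full Beta function at sampling time.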
-
q-functions: Polya urn and sampling weights
[Figure: Polya urn and expected component E{~θ_k} on the simplex]

Figure: Polya urn: sampling with over-replacement.

    q(k, t) ≜ B(~n_k + ~α) / B(~n_k^{¬i} + ~α)

    |t| = 1:      = (n_{k,t}^{¬i} + α) / (n_k^{¬i} + Tα)  =  smoothed ratio of occurrences

    t = {u, v}:   = (n_{k,u}^{¬i} + α) / (n_k^{¬i} + Tα)
                  · (n_{k,v}^{¬i} + α + δ(u = v)) / (n_k^{¬i} + Tα + 1)  ≜  q(k, u ∘ v), …
Gregor Heinrich A generic approach to topic models 50 / 35
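The urn metaphor can be simulated in a few lines: draw a colour with probability proportional to its current (pseudo-)count, then return it together with an extra copy. This is a generic illustration of the mechanism, not code from the thesis:

```python
import random

def polya_urn(alpha, steps, seed=0):
    """Polya urn with over-replacement: each draw reinforces its own colour,
    so successive draws follow the same predictive weights as q(k, t)."""
    rng = random.Random(seed)
    counts = list(alpha)                  # pseudo-counts as the initial urn
    for _ in range(steps):
        t = rng.choices(range(len(counts)), weights=counts)[0]
        counts[t] += 1                    # put the ball back plus one copy
    return counts
```

In the long run the normalised counts converge to a Dirichlet-distributed proportion vector; this "rich get richer" reinforcement is what the sampling weights on this slide encode.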
-
q-functions: Polya urn and sampling weights
[Figure: Polya urn and expected component E{~θ_k} on the simplex]

Figure: Polya urn and discrete parameters.

    q(k, t) ≜ B(~n_k + ~α) / B(~n_k^{¬i} + ~α)

    |t| = 1:      = (n_{k,t}^{¬i} + α) / (n_k^{¬i} + Tα)  =  smoothed ratio of occurrences

    t = {u, v}:   = (n_{k,u}^{¬i} + α) / (n_k^{¬i} + Tα)
                  · (n_{k,v}^{¬i} + α + δ(u = v)) / (n_k^{¬i} + Tα + 1)  ≜  q(k, u ∘ v), …
Gregor Heinrich A generic approach to topic models 51 / 35
-
NoMM substructure library: Gibbs weights and likelihood (thesis Sec. 9.3, p. 171)

For each sub-structure: ID and name, the Gibbs sampler weight w and the likelihood p for a single token i, and the modelled aspect with example models (structure diagrams omitted).

N1, E1, C1. DirMult nodes, unbranched. Variants: N1A: j ≡ 1; N1B: j = f(a_i, i); C1A: k = i; C1B: k = z_i; E1S(z_i): n → i.
    w(z | ·) = q(a, z) q(z, b);  E1S(z_i): q(a, z) q(z, b_1 ∘ b_2 ∘ … ∘ b_{N_i})
    p(b | a) = Σ_z θ_{a,z} φ_{z,b}
    Mixture/admixture: LDA [Blei et al. 2003b], PAM [Li & McCallum 2006]; LDCC [Shafiei & Milios 2006] (E1S)

N2. Observed parameters.
    w(z | c_a, ·) = c_{a,z} q(z, b);  p(b | a) = Σ_z c_{a,z} φ_{z,b}
    Label distribution: ATM [Rosen-Zvi et al. 2004]

N3. Non-Dirichlet prior, θ_a ∼ p(θ_a | Λ).
    w(z | θ_a, ·) = p(z_i | a_i, θ_a) q(z, b);  M-step: estimate θ_a [Blei & Lafferty 2007]
    p(b | a) = Σ_z θ_{a,z} φ_{z,b}
    Alternative distributions on the simplex: CTM [Blei & Lafferty 2007]: θ_a ∼ exp N(μ, Σ); TLM [Wallach 2008]: hierarchy of Dirichlet priors

N4. Non-discrete output, ϑ_z ∼ p(ϑ_z | Λ).
    w(z | ϑ, ·) = q(a, z) p(v_i | ϑ_z);  M-step: estimate ϑ_z
    p(v | a) = Σ_z θ_{a,z} p(v | ϑ_z)
    Non-multinomial observations: Corr-LDA [Barnard et al. 2003], GMM [McLachlan & Peel 2000]: p(v | ϑ) = N(x | μ, Σ)

N5+E4. Aggregation, z̄_m ∝ Σ_j δ(z − z_j), regression.
    w(z | z̄_m, v_m, ·) = q(a, z) q(z, w) N(v_m | v̂_m, σ²);  M-step: estimate η, σ² | z̄, v (for linear regression, N5B);  prediction: v̂_m = η^T z̄_m
    Regression/supervised learning: Supervised LDA [Blei & McAuliffe 2007], Relational topic model [Chang & Blei 2009]

E2. Autonomous edges. E2A: i ≠ j; E2B: i = j.
    w(x, y | ·) = q(a, x ∘ y) q(x, b) q(y, c);  E2A: q(a_ij, x_i ∘ y_j) = q(a_i, x_i ∘ y_j) q(a_j, x_i ∘ y_j)
    p(b, c | a) = Σ_x θ_{a,x} φ_{x,b} Σ_y θ_{a,y} ψ_{y,c}
    Common mixture of causes: Multimodal LDA [Ramage et al. 2009]

E3. Coupled edges.
    w(z | ·) = q(a, z) q(z, b) q(z, c);  p(b, c | a) = Σ_z θ_{a,z} φ_{z,b} ψ_{z,c}
    Common cause for observations: Hidden relational model (HRM) [Xu et al. 2006], Link-LDA [Erosheva et al. 2004]

C2. Combined indices. C2A: k = (i, x_i); C2B: k = (x_i, y_j); C2C: k = g(i, j, x_i, y_j).
    w(x, y | ·) = q(a, x) q(b, y) q(k, c)
    p(c | a, b) = Σ_x [θ_{a,x} Σ_y ψ_{b,y} φ_{k,c}],  k = g(x, y, i, j)
    Different dependent causes, relation: hPAM [Li et al. 2007a], HRM [Xu et al. 2006], Multi-LDA [Porteous et al. 2008a]

C3. Interleaved indices. C3A: i ≠ j; C3B: i = j.
    C3A: w(x_i, y_j | ·) = q(a_i, x_i) q(b_j, y_j) q(x_i, c_i ∘ c_j) q(y_j, c_i ∘ c_j)
    C3B: w(x, y | ·) = q(a, x) q(b, y) [q(x, c ∘ c)]^δ(x=y) [q(x, c) q(y, c)]^(1−δ(x=y))
    C3A: p(c | a) = Σ_x θ_{a,x} φ_{x,c}, p(c | b) similarly;  C3B: p(c | a, b) = Σ_x θ_{a,x} φ_{x,c} Σ_y ψ_{b,y} φ_{y,c}
    Different causes, same effect: proposed here

C4. Switch, s ∈ {0, 1}.
    w(z, s | ·) = q(a, z) [q(b, 1) q(z, c)]^δ(s=1) [q(b, 2) q(z, d)]^δ(s=2)
    p(c, d | a, b) = Σ_z θ_{a,z} [ψ_{b,0} φ_{z,c} + ψ_{b,1} φ_{z,d}]
    Select complex sub-models: Multi-grain LDA [Titov & McDonald 2008], entity-topic model