[Yang, Downey and Boyd-Graber 2015] Efficient Methods for Incorporating Knowledge into Topic Models
Efficient Methods for Incorporating Knowledge into Topic Models
[Yang, Downey and Boyd-Graber 2015]
2015/10/24
EMNLP 2015 Reading
@shuyo
Large-scale Topic Model
• In academic papers
– Up to 10^3 topics
• Industrial applications
– 10^5~10^6 topics!
– Search engines, online ads, and so on
– To capture infrequent topics
• This paper handles up to 500 topics...
really?
(Standard) LDA [Blei+ 2003, Griffiths+ 2004]
• "Conventional" Gibbs sampling
P(z = t | z⁻, w) ∝ q_t := (n_{d,t} + α) · (n_{w,t} + β) / (n_t + Vβ)
– T : number of topics
– Draw U ~ 𝒰(0, Σ_{z=1..T} q_z) and find t s.t. Σ_{z=1..t−1} q_z < U < Σ_{z=1..t} q_z
• For large T, it is computationally intensive
– 𝑛𝑤,𝑡 is sparse
– When T is very large, n_{d,t} is sparse too, e.g. T = 10^6 > n_d
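The O(T) linear scan described above can be sketched as follows; the function and variable names are illustrative, not from the paper:

```python
import random

def sample_topic_naive(n_dt, n_wt, n_t, alpha, beta, V):
    """One collapsed-Gibbs draw for a single token: an O(T) dense scan.

    n_dt[t] : count of topic t in the current document
    n_wt[t] : count of topic t for the current word type
    n_t[t]  : total count of topic t over the corpus
    """
    T = len(n_t)
    # q_t := (n_{d,t} + alpha) * (n_{w,t} + beta) / (n_t + V*beta)
    q = [(n_dt[t] + alpha) * (n_wt[t] + beta) / (n_t[t] + V * beta)
         for t in range(T)]
    u = random.uniform(0.0, sum(q))  # U ~ Uniform(0, sum_z q_z)
    acc = 0.0
    for t in range(T):               # find t where the cumulative sum crosses u
        acc += q[t]
        if u < acc:
            return t
    return T - 1                     # guard against floating-point round-off
```

For T = 10^6 topics, both building `q` and scanning it cost a million operations per token, which is the bottleneck SparseLDA attacks.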
SparseLDA [Yao+ 2009]
P(z = t | z⁻, w) ∝ s_t + r_t + q_t, where
  s_t = αβ / (n_t + Vβ)   (independent of w and d)
  r_t = n_{d,t}·β / (n_t + Vβ)   (depends on d only)
  q_t = (n_{d,t} + α)·n_{w,t} / (n_t + Vβ)   (depends on w)
• s = Σ_t s_t, r = Σ_t r_t, q = Σ_t q_t
• For U ~ 𝒰(0, s + r + q),
– If 0 < U < s, find t s.t. Σ_{z=1..t−1} s_z < U < Σ_{z=1..t} s_z
– If s < U < s + r, find t with n_{d,t} > 0 s.t. Σ_{z=1..t−1} r_z < U − s < Σ_{z=1..t} r_z
– If s + r < U < s + r + q, find t with n_{w,t} > 0 s.t. Σ_{z=1..t−1} q_z < U − s − r < Σ_{z=1..t} q_z
• Faster because n_{w,t} and n_{d,t} are sparse
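A minimal sketch of the three-bucket draw, assuming the document and word topic counts are stored as sparse dicts (names are illustrative, not from [Yao+ 2009]):

```python
import random

def sample_topic_sparse(doc_topics, word_topics, n_t, alpha, beta, V):
    """SparseLDA-style draw using the s/r/q decomposition.

    doc_topics  : dict {t: n_dt} holding only nonzero entries (sparse)
    word_topics : dict {t: n_wt} holding only nonzero entries (sparse)
    n_t         : list of total topic counts
    """
    T = len(n_t)
    # s bucket: smoothing mass, independent of w and d (cacheable in practice)
    s_t = [alpha * beta / (n_t[t] + V * beta) for t in range(T)]
    s = sum(s_t)
    # r bucket: touches only topics present in the document
    r_t = {t: n * beta / (n_t[t] + V * beta) for t, n in doc_topics.items()}
    r = sum(r_t.values())
    # q bucket: touches only topics where the word occurs
    q_t = {t: (doc_topics.get(t, 0) + alpha) * n / (n_t[t] + V * beta)
           for t, n in word_topics.items()}
    q = sum(q_t.values())

    u = random.uniform(0.0, s + r + q)
    if u < s:                        # dense scan, but s carries little mass
        for t in range(T):
            u -= s_t[t]
            if u < 0:
                return t
    elif u < s + r:                  # scan only the document's topics
        u -= s
        for t, v in r_t.items():
            u -= v
            if u < 0:
                return t
    else:                            # scan only the word's topics
        u -= s + r
        for t, v in q_t.items():
            u -= v
            if u < 0:
                return t
    return T - 1                     # floating-point round-off guard
```

Most of the probability mass typically sits in the q bucket, so the common case scans only the (few) topics the current word actually occurs in.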
Leveraging Prior Knowledge
• The objective function of topic models
does not correlate with human
judgements
Word correlation prior knowledge
• Must-link
– “quarterback” and “fumble” are both
related to American football
• Cannot-link
– “fumble” and “bank” imply two different
topics
SC-LDA [Yang+ 2015]
• m ∈ M : prior knowledge
• f_m(z, w, d) : potential function of prior knowledge m about word w with topic z in document d
• ψ(z, M) = Π_{z∈z} exp(f_m(z, w, d))
• P(w, z | α, β, M) = P(w | z, β) · P(z | α) · ψ(z, M)
– (presenter's note: "=" should maybe be "∝", and the product should maybe run over m ∈ M and over all w with z in all d)
Sparse Constrained Inference for SC-LDA
Word correlation prior knowledge for SC-LDA
• f_m(z, w, d) = Σ_{u ∈ M_w^m} log max(λ, n_{u,z}) + Σ_{v ∈ M_w^c} log(1 / max(λ, n_{v,z}))
– where M_w^m : must-links of w, M_w^c : cannot-links of w
• P(z = t | z⁻, w, M) ∝ [ αβ/(n_t + Vβ) + n_{d,t}·β/(n_t + Vβ) + (n_{d,t} + α)·n_{w,t}/(n_t + Vβ) ] · Π_{u ∈ M_w^m} max(λ, n_{u,t}) · Π_{v ∈ M_w^c} 1/max(λ, n_{v,t})
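The extra multiplicative term can be sketched as a standalone factor; this is one illustrative reading of the formula above, not the authors' code, and the λ value is assumed:

```python
def knowledge_factor(t, must_links, cannot_links, topic_word_counts, lam=1.0):
    """Multiplicative factor exp(f_m(z=t, w, d)) for the word-correlation potential.

    must_links / cannot_links : word ids linked to the word being sampled
    topic_word_counts[u][t]   : n_{u,t}, count of word u assigned to topic t
    lam                       : floor lambda keeping each factor finite and positive
    """
    factor = 1.0
    for u in must_links:      # boost topics where must-linked words are frequent
        factor *= max(lam, topic_word_counts[u][t])
    for v in cannot_links:    # damp topics where cannot-linked words are frequent
        factor /= max(lam, topic_word_counts[v][t])
    return factor
```

The sampler would multiply (s_t + r_t + q_t) by this factor before drawing, which is why the must/cannot-link structure folds directly into the SparseLDA buckets.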
Factor Graph
• They state that prior knowledge is incorporated "by adding a factor graph to encode prior knowledge," but the factor graph itself is never drawn.
• The potential function f_m(z, w, d) contains n_{w,z}, and φ_{w,z} ∝ n_{w,z} + β.
• So the model seems to correspond to Fig. b:
[Figure: Fig. a vs. Fig. b — two candidate graphical models, not reproduced in the transcript]
[Ramage+ 2009] Labeled LDA
• Supervised LDA for labeled documents
– It is equivalent to SC-LDA with the following potential function:
  f_m(z, w, d) = 1 if z ∈ m_d, −∞ otherwise
  where m_d is the label set of document d
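A toy sketch of how this potential masks the proposal distribution (exp(−∞) evaluates to 0, zeroing out topics outside the label set); names are hypothetical:

```python
import math

def labeled_lda_potential(t, doc_labels):
    """f_m(z=t, w, d): 1 if topic t is one of the document's labels, else -inf."""
    return 1.0 if t in doc_labels else -math.inf

def allowed_mass(q, doc_labels):
    """Masked proposal: exp(-inf) = 0 removes disallowed topics, while the
    exp(1) on allowed topics rescales them all uniformly (relative mass kept)."""
    return [qt * math.exp(labeled_lda_potential(t, doc_labels))
            for t, qt in enumerate(q)]
```

After masking, a normal categorical draw over the remaining mass recovers Labeled LDA's restriction of each document to its own label set.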
Experiments
• Baselines
– Dirichlet Forest-LDA [Andrzejewski+ 2009]
– Logic-LDA [Andrzejewski+ 2011]
– MRF-LDA [Xie+ 2015]
• Encodes word correlations in LDA as MRF
– SparseLDA
DATASET    DOCS       TYPES    TOKENS (APPROX)  EXPERIMENT
NIPS       1,500      12,419   1,900,000        word correlation
NYT-NEWS   3,000,000  102,660  100,000,000      word correlation
20NG       18,828     21,514   1,946,000        labeled documents
Generate Word Correlation
• Must-link
– Obtain synsets from WordNet 3.0
– Keep a pair as a must-link when the word2vec embedding similarity between the word and a member of its synset exceeds the threshold 0.2
• Cannot-link
– Nothing?
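The must-link filter described above might look like the sketch below with toy vectors; the real pipeline uses WordNet 3.0 synsets and word2vec embeddings, so the names and vectors here are made up:

```python
def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def must_links(word, synonyms, embeddings, threshold=0.2):
    """Keep synset members whose embedding similarity to the word clears the threshold."""
    return [s for s in synonyms
            if s in embeddings and cosine(embeddings[word], embeddings[s]) > threshold]
```

The threshold filters out WordNet synonyms that the corpus-trained embeddings consider unrelated, e.g. rare senses of a polysemous word.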
Convergence Speed
The average running time per iteration over 100 iterations, averaged over 5 seeds, on the 20NG dataset.
Coherence [Mimno+ 2011]
• C(t; V^(t)) = Σ_{m=2..M} Σ_{l=1..m−1} log [ (F(v_m^(t), v_l^(t)) + ε) / F(v_l^(t)) ]
– F(v) : document frequency of word type v
– F(v, v′) : co-document frequency of word types v and v′
(presenter's note: does "co-document frequency" mean documents that include both?)
– ε is very small, e.g. 10⁻¹² [Röder+ 2015]
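A direct transcription of the coherence formula above, assuming plain dicts for the document and co-document frequencies:

```python
import math

def coherence(top_words, doc_freq, co_doc_freq, eps=1e-12):
    """Mimno et al. (2011) coherence for one topic's top-word list V^(t).

    top_words          : ranked list [v_1, ..., v_M] of the topic's top words
    doc_freq[v]        : F(v), number of documents containing word v
    co_doc_freq[(v,v')]: F(v, v'), number of documents containing both
    """
    score = 0.0
    M = len(top_words)
    for m in range(1, M):            # sum over m = 2..M (0-indexed here)
        for l in range(m):           # sum over l = 1..m-1
            vm, vl = top_words[m], top_words[l]
            # co-doc counts are symmetric; try both key orders
            pair = co_doc_freq.get((vm, vl), co_doc_freq.get((vl, vm), 0))
            score += math.log((pair + eps) / doc_freq[vl])
    return score
```

The ε only guards against log(0) for word pairs that never co-occur; it leaves the score of co-occurring pairs essentially unchanged.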
[Figure: coherence comparison; values −39.1 and −36.6 shown]
References
• [Yang+ 2015] Efficient Methods for Incorporating Knowledge into Topic Models
• [Blei+ 2003] Latent Dirichlet allocation.
• [Griffiths+ 2004] Finding scientific topics.
• [Yao+ 2009] Efficient methods for topic model inference on streaming document collections.
• [Ramage+ 2009] Labeled LDA: A supervised topic model for credit attribution in multilabeled corpora.
• [Andrzejewski+ 2009] Incorporating domain knowledge into topic modeling via Dirichlet forest priors.
• [Andrzejewski+ 2011] A framework for incorporating general domain knowledge into latent Dirichlet allocation using first-order logic.
• [Xie+ 2015] Incorporating word correlation knowledge into topic modeling.
• [Mimno+ 2011] Optimizing semantic coherence in topic models.
• [Röder+ 2015] Exploring the space of topic coherence measures.