finding bursty topics from microblogs
![Page 1: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/1.jpg)
FINDING BURSTY TOPICS FROM MICROBLOGS
Qiming Diao, Jing Jiang, Feida Zhu, Ee-Peng Lim
Living Analytics Research Centre, School of Information Systems, Singapore Management University
![Page 2: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/2.jpg)
Abstract
Goal: to find topics that have bursty patterns on microblogs.
Two observations:
1. Posts published around the same time are more likely to have the same topic.
2. Posts published by the same user are more likely to have the same topic.
![Page 3: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/3.jpg)
Introduction
Retrospective bursty event detection:
Burst detection: state machine
Topic discovery: LDA
Two assumptions:
1. If a post is about a global event, it is likely to follow a global topic distribution that is time-dependent.
2. If a post is about a personal topic, it is likely to follow a personal topic distribution that is more or less stable over time.
![Page 4: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/4.jpg)
Method
Preliminaries
Each post d_i has an author u_i, a timestamp t_i, and words w_{i,j}.
A bursty topic b is a word distribution coupled with a bursty interval, denoted as (ϕ^b, t_s^b, t_e^b).
Our task: to find meaningful bursty topics from the input text stream.
Our method: a topic discovery step and a burst detection step.
![Page 5: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/5.jpg)
Our Topic Model
Assume:
1. C (latent) topics in the text stream, where each topic c has a word distribution ϕ_c.
2. A background word distribution ϕ_B.
3. A single post is most likely to be about a single topic.
4. A global topic distribution θ_t for each time point t.
![Page 6: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/6.jpg)
Because our focus is to find popular global events, we need to separate out "personal" posts.
A time-independent topic distribution η_u for each user u captures her long-term topical interests.
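The assumptions above can be sketched as a small generative routine. This is only an illustration: the two switch probabilities `p_global` and `p_background`, the fixed post length, and the function name are hypothetical and do not come from the slides. `theta`, `eta`, `phi` and `phi_bg` stand for θ_t, η_u, ϕ_c and ϕ_B.

```python
import random

def sample_post(user, t, theta, eta, phi, phi_bg, n_words=8,
                p_global=0.5, p_background=0.3, rng=random):
    """Draw one post: a single topic for the whole post, then its words.

    theta[t]   : global topic distribution at time point t (list of C weights)
    eta[user]  : time-independent topic distribution of the user
    phi[c]     : word distribution of topic c (dict word -> probability)
    phi_bg     : background word distribution (dict word -> probability)
    """
    # Hypothetical switch: is the post about a global event or a personal interest?
    is_global = rng.random() < p_global
    topic_dist = theta[t] if is_global else eta[user]
    # One topic per post (assumption 3 on the slide).
    topic = rng.choices(range(len(topic_dist)), weights=topic_dist)[0]
    words = []
    for _ in range(n_words):
        # Hypothetical switch: background word or topic word.
        dist = phi_bg if rng.random() < p_background else phi[topic]
        vocab = list(dist)
        words.append(rng.choices(vocab, weights=[dist[w] for w in vocab])[0])
    return is_global, topic, words
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible.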
![Page 7: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/7.jpg)
![Page 8: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/8.jpg)
![Page 9: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/9.jpg)
Learning
Gibbs sampling: the sampling equations use the counts M(0), M(1), M(.), M(c), E(v), E(.) and M(v).
![Page 10: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/10.jpg)
Learning
The sampling equations on this slide use the counts M(w_{i,j}) and M(.).
![Page 11: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/11.jpg)
Burst Detection
Assume: a series of counts (m_1^c, m_2^c, ..., m_T^c) representing the intensity of topic c at different time points.
These counts are generated by two Poisson distributions corresponding to a bursty state and a normal state.
![Page 12: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/12.jpg)
Burst Detection
σ_0 = 0.9 and σ_1 = 0.6 for all topics.
Finally, a burst is marked by a consecutive subsequence of bursty states.
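A minimal sketch of a two-state Poisson state machine decoded with the Viterbi algorithm. Several choices here are assumptions and are not stated on the slides: σ_0 and σ_1 are read as the probabilities of staying in the normal and the bursty state, the normal-state rate is taken as the mean count, the bursty rate as `burst_ratio` times that mean, and both states are equally likely at the first time point.

```python
import math

def poisson_logpmf(k, mu):
    """Log-probability of observing count k under Poisson(mu)."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def detect_bursts(counts, sigma=(0.9, 0.6), burst_ratio=3.0):
    """Return (start, end) index pairs of maximal runs of the bursty state."""
    T = len(counts)
    mu0 = max(sum(counts) / T, 1e-9)      # assumed normal-state rate: mean count
    mu = (mu0, burst_ratio * mu0)         # assumed bursty rate: multiple of the mean
    # log_trans[a][b] = log P(next state = b | current state = a); 0 normal, 1 bursty
    log_trans = [[math.log(sigma[0]), math.log(1 - sigma[0])],
                 [math.log(1 - sigma[1]), math.log(sigma[1])]]
    # Viterbi forward pass, keeping back-pointers for each time step.
    score = [math.log(0.5) + poisson_logpmf(counts[0], mu[s]) for s in (0, 1)]
    back = []
    for t in range(1, T):
        new_score, ptr = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda a: score[a] + log_trans[a][s])
            ptr.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + poisson_logpmf(counts[t], mu[s]))
        score = new_score
        back.append(ptr)
    # Backtrack the most likely state sequence.
    state = max((0, 1), key=lambda s: score[s])
    states = [state]
    for ptr in reversed(back):
        state = ptr[state]
        states.append(state)
    states.reverse()
    # A burst is a maximal consecutive run of bursty states.
    bursts, start = [], None
    for t, s in enumerate(states):
        if s == 1 and start is None:
            start = t
        elif s == 0 and start is not None:
            bursts.append((start, t - 1))
            start = None
    if start is not None:
        bursts.append((start, T - 1))
    return bursts
```

For example, `detect_bursts([2, 3, 2, 2, 30, 35, 28, 3, 2, 2])` marks the three high counts as one burst, while a flat series yields no burst.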
![Page 13: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/13.jpg)
Experiments
Data Set
2,892 users were sampled from a larger dataset, and their tweets between September 1 and November 30, 2011 (91 days in total) were extracted.
The final dataset has 3,967,927 tweets and 24,280,638 tokens.
![Page 14: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/14.jpg)
Ground Truth Generation
The top-30 bursty topics were taken from each model, and two human judges rated their quality by assigning a score of either 0 or 1.
Evaluation
We set the number of topics C to 80, α to 50/C and β to 0.01. Each model was run for 500 iterations of Gibbs sampling.
![Page 15: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/15.jpg)
![Page 16: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/16.jpg)
Sample Results and Discussions
![Page 17: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/17.jpg)
Sample Results and Discussions
![Page 18: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/18.jpg)
Two case studies to demonstrate the effectiveness of our model
Effectiveness of Temporal Models: Both TimeLDA and TimeUserLDA tend to group posts published on the same day into the same topic.
![Page 19: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/19.jpg)
Two case studies to demonstrate the effectiveness of our model
Effectiveness of User Models: it is important to filter out users’ “personal” posts in order to find meaningful global events.
![Page 20: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/20.jpg)
Conclusions
A new topic model that considers both the temporal information of microblog posts and users’ personal interests.
A Poisson-based state machine to identify bursty periods from the topics discovered by our model.
![Page 21: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/21.jpg)
TM-LDA: EFFICIENT ONLINE MODELING OF THE LATENT TOPIC TRANSITIONS IN SOCIAL MEDIA
![Page 22: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/22.jpg)
ABSTRACT
TM-LDA learns the transition parameters among topics by minimizing the prediction error on topic distribution in subsequent postings.
We develop an efficient updating algorithm to adjust transition parameters, as new documents stream in.
![Page 23: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/23.jpg)
Challenges:
1. to model and analyze latent topics in social textual data;
2. to adaptively update the models as the massive social content streams in;
3. to facilitate temporal-aware applications of social media.
![Page 24: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/24.jpg)
Contributions
First, we propose a novel temporally-aware topic language model, TM-LDA, which captures the latent topic transitions in temporally-sequenced documents.
Second, we design an efficient algorithm to update TM-LDA which enables it to be performed on large scale data.
Finally, we evaluate TM-LDA against the static topic modeling method (LDA).
![Page 25: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/25.jpg)
METHODOLOGY
TM-LDA Algorithm
If we define the space of topic distributions as X = { x ∈ R^n_+ : ||x||_1 = 1 }, TM-LDA can be considered as a function f : X → X.
The prediction error
TM-LDA is modeled as a non-linear mapping:
![Page 26: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/26.jpg)
Error Function of TM-LDA:
![Page 27: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/27.jpg)
Iterative Minimization of the Error Function
![Page 28: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/28.jpg)
Direct Minimization of the Error Function
![Page 29: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/29.jpg)
![Page 30: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/30.jpg)
TM-LDA for Twitter Stream
![Page 31: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/31.jpg)
TM-LDA for Twitter Stream
Let A = D^(1,m) and B = D^(2,m+1).
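A minimal sketch of how a transition matrix T could be estimated from A and B. It assumes, which the transcript does not state explicitly, that each row of D is the LDA topic distribution of one tweet in time order, that A holds rows 1..m and B rows 2..m+1, and that T is the least-squares solution of A T ≈ B obtained from the normal equations A′A T = A′B. The prediction step (x T followed by L1 renormalisation) is likewise an assumed form, and all function names are hypothetical.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def solve(M, R):
    """Solve M Z = R for Z by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(R[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("A'A is singular; the topic vectors do not span all topics")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_transition(D):
    """D: topic-distribution rows for documents 1..m+1, in time order."""
    A, B = D[:-1], D[1:]                        # A = D^(1,m), B = D^(2,m+1)
    At = transpose(A)
    return solve(matmul(At, A), matmul(At, B))  # normal equations A'A T = A'B

def predict_next(x, T):
    """Assumed mapping: x T, renormalised so the absolute values sum to 1."""
    y = matmul([x], T)[0]
    s = sum(abs(v) for v in y)
    return [v / s for v in y]
```

Solving the normal equations directly keeps the sketch dependency-free; for many topics a library least-squares routine would be the more robust choice.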
![Page 32: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/32.jpg)
UPDATING TRANSITION PARAMETERS
Updating Transition Parameters with Sherman-Morrison-Woodbury Formula
![Page 33: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/33.jpg)
Updating Transition Parameters with QR-factorization
Suppose the QR-factorization of matrix A is A = QR, where Q′Q = I and R is an upper triangular matrix. Then R T = Q′B.
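The step from the factorization to R T = Q′B can be worked out as follows, assuming T is the least-squares solution given by the normal equations A′A T = A′B and that R is invertible (A has full column rank). Neither assumption is spelled out in the transcript.

```latex
\begin{aligned}
A' A \, T &= A' B \\
(QR)'(QR) \, T &= (QR)' B \\
R' (Q'Q) R \, T &= R' Q' B \\
R' R \, T &= R' Q' B \qquad \text{since } Q'Q = I \\
R \, T &= Q' B \qquad \text{after cancelling the invertible } R'
\end{aligned}
```

Because R is upper triangular, the last system can be solved for T by back-substitution without forming A′A.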
![Page 34: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/34.jpg)
EXPERIMENTS
Dataset
Using Perplexity as Evaluation Metric
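As a reminder of what the metric measures, here is a small sketch of the standard perplexity computation for one tweet under a topic model. The function name and argument layout are hypothetical, and it assumes every word has non-zero probability under the mixture.

```python
import math

def perplexity(theta, phi, words):
    """Perplexity of a word sequence under a topic mixture.

    theta : topic distribution of the tweet (list of weights)
    phi   : phi[k][w] = P(word w | topic k), as a list of dicts
    words : the tweet's tokens
    """
    log_lik = 0.0
    for w in words:
        # Mixture probability of the word: sum over topics of P(k) * P(w | k).
        p = sum(theta[k] * phi[k].get(w, 0.0) for k in range(len(theta)))
        log_lik += math.log(p)
    # Exponentiated negative average log-likelihood per token.
    return math.exp(-log_lik / len(words))
```

Lower values mean the topic distribution explains the observed words better.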
![Page 35: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/35.jpg)
Predicting Future Tweets
TM-LDA first trains LDA on 7 days of historical tweets and computes the transition parameter matrix accordingly. Then, for each new tweet generated on the 8th day, it predicts the topic distribution of the following tweet.
![Page 36: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/36.jpg)
Estimated Topic Distributions of "Future" Tweets: the topic distribution of the tweet b.
LDA Topic Distributions of "Future" Tweets: the inferred topic distribution of the tweet b.
LDA Topic Distributions of "Previous" Tweets: the inferred topic distribution of the tweet a.
![Page 37: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/37.jpg)
Efficiency of Updating Transition Parameters
![Page 38: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/38.jpg)
Properties of Transition Parameters
T is a square matrix where the size of T is determined by the number of topics trained in LDA.
The row sum of T is always 1, which means that the overall weight emitted from a topic is 1.
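One way to see why the rows sum to 1, assuming T is the least-squares solution T = (A′A)⁻¹A′B with A′A invertible (an assumption, since the transcript does not give the closed form). Every row of A and B is a topic distribution, so A·1 and B·1 are both all-ones vectors:

```latex
\begin{aligned}
T \mathbf{1}_n &= (A'A)^{-1} A' B \, \mathbf{1}_n \\
               &= (A'A)^{-1} A' \, \mathbf{1}_m \qquad \text{since each row of } B \text{ sums to } 1 \\
               &= (A'A)^{-1} A' A \, \mathbf{1}_n \qquad \text{since each row of } A \text{ sums to } 1 \\
               &= \mathbf{1}_n
\end{aligned}
```

Here n is the number of topics and m the number of rows in A and B.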
![Page 39: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/39.jpg)
APPLYING TM-LDA FOR TREND ANALYSIS AND SENSEMAKING
![Page 40: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/40.jpg)
![Page 41: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/41.jpg)
Changing Topic Transitions over Time
![Page 42: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/42.jpg)
Various Topic Transition Patterns by Cities
![Page 43: Finding bursty topics from microblogs](https://reader030.vdocuments.net/reader030/viewer/2022020217/54c25e524a795995398b45f8/html5/thumbnails/43.jpg)
CONCLUSIONS
a novel temporally-aware language model, TM-LDA, for efficiently modeling streams of social text such as a Twitter stream for an author
an efficient model updating algorithm for TM-LDA