Event Detection via LDA for the MediaEval2012 SED Task
Thursday, 4 October 2012
MediaEval2012 Social Event Detection Task
Konstantinos N. Vavliakis, Fani A. Tzima
Pericles A. Mitkas
Event Detection via LDA for the MediaEval2012 SED Task
Information Technologies Institute
Centre for Research and Technology - Hellas
Electrical and Computer Engineering Department
Aristotle University of Thessaloniki
Intelligent Systems and Software Engineering Labgroup
http://issel.ee.auth.gr
Social Event Detection at MediaEval 2012
Goal: Discover social events
3 Challenges:
1. Find technical events in Germany
2. Find all soccer events in Hamburg (Germany) and Madrid (Spain)
3. Find demonstration and protest events of the Indignados movement in Madrid
Methodology
Pre-processing: Clean Text (remove stop words/HTML tags), Translate (using Google Translate), Stemming (Porter stemmer), City Classifier (tf-idf for each city)
Topic Identification: Identify Topics (per city, using LDA), Select Relevant Topics, Manually Create Topics
Event Detection: Identify Events (by detecting peaks)
Event Optimization: Merge Events (of consecutive days), Split Events (by location)
Preprocessing
Clean text by removing html tags and stop words
Translate non-English words
Perform stemming using the Porter Stemmer
E.g.:
Title | Cleaned Title | English Title | Stemmed
i-wall | wall | wall | wall
2009...Pallasso trist // Sad Clown | pallasso trist sad clown | clown sad sad clown | clown sad sad clown
Conjunt Monumental de Sant Pere de Terrassa | conjunt monumental sant pere terrassa | set monumental sant pere terrassa | set monument sant pere terrassa
Seagull in the port | seagull port | seagull port | seagul port
Winter doesn't affect the small land of the gnomes - 9/365 | winter doesn affect small land gnomes | winter doesn affect small land gnomes | winter doesn affect small land gnome
Jan-09 | january | january | januari
Tidy chaos - 3/365 | tidy chaos | tidy chaos | tidi chao
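The cleaning step can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the stop-word list is a tiny hypothetical one, the function name `clean_title` is made up for this example, and the translation and Porter stemming steps are left out because they rely on an external service and a stemmer library.

```python
import re

# Hypothetical, deliberately tiny stop-word list; the slides do not say which list was used.
STOP_WORDS = {"a", "an", "and", "de", "in", "of", "the", "to"}

TAG_RE = re.compile(r"<[^>]+>")      # crude HTML tag matcher
WORD_RE = re.compile(r"[^\W\d_]+")   # runs of Unicode letters only


def clean_title(title: str) -> str:
    """Strip HTML tags, lowercase, and drop stop words and one-letter tokens."""
    text = TAG_RE.sub(" ", title).lower()
    tokens = WORD_RE.findall(text)
    kept = [t for t in tokens if len(t) > 1 and t not in STOP_WORDS]
    return " ".join(kept)


print(clean_title("Seagull in the port"))
print(clean_title("Winter doesn't affect the small land of the gnomes - 9/365"))
```

Under these assumptions the two calls reproduce the "Cleaned Title" column for those rows; entries such as Jan-09 evidently needed extra handling that the slides do not describe.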
City Classification
5 cities
TF-IDF values of the terms for each city
Classified photos according to maximum TF-IDF aggregated value
Users:
Users cannot be in more than 2 cities on the same day
User statistics
Results:
4149 non-classified photos
Very good results for city classification, excellent at country level
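One way to read "maximum TF-IDF aggregated value" is sketched below. The slides do not give the exact TF-IDF variant or the aggregation, so this assumes each city is treated as one document, term weights are summed per photo, and a photo with no matching term stays unclassified. The names `city_tfidf` and `classify` are hypothetical.

```python
import math
from collections import Counter


def city_tfidf(city_terms):
    """city_terms: {city: [term, ...]} -> {city: {term: tf-idf weight}}."""
    n_cities = len(city_terms)
    df = Counter()
    for terms in city_terms.values():
        df.update(set(terms))  # number of cities each term occurs in
    weights = {}
    for city, terms in city_terms.items():
        counts = Counter(terms)
        total = len(terms)
        weights[city] = {
            term: (count / total) * math.log(n_cities / df[term])
            for term, count in counts.items()
        }
    return weights


def classify(photo_terms, weights):
    """Return the city with the highest summed TF-IDF, or None if nothing matches."""
    scores = {
        city: sum(w.get(term, 0.0) for term in photo_terms)
        for city, w in weights.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Toy data, invented for illustration only.
weights = city_tfidf({
    "madrid": ["sol", "puerta", "bernabeu", "sol"],
    "hamburg": ["hafen", "alster", "stadion"],
})
print(classify(["puerta", "sol", "clown"], weights))
print(classify(["clown"], weights))
```

Returning None for photos without any city-specific term corresponds to the "non-classified photos" bucket; the user-based rule (at most 2 cities per day) is not modelled here.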
Topic Identification
Photos of a City
Extract Topics using LDA with Gibbs Sampling
Select Relevant Topics
Manually Create Topics
Examples of LDA topics:
Concept | Participation in Topic
sol | 0.1544
spanish | 0.1116
revolution | 0.1050
acampada | 0.0983
puerta | 0.0262
mayo | 0.0243
manifestación | 0.0217
….
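For readers unfamiliar with LDA fitted by Gibbs sampling, a compact collapsed Gibbs sampler is sketched below. It is a textbook-style illustration rather than the implementation used for the task; the function name `lda_gibbs` and the hyperparameter defaults (`alpha`, `beta`, `n_iter`) are assumptions, since the slides only state the number of topics (50 or 100).

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (theta, phi, vocab), where theta[d][k] is
    the participation of document d in topic k and phi[k][v] is the weight of
    vocabulary word vocab[v] in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word v in topic k
    topic_total = [0] * n_topics                           # tokens in topic k
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi, vocab
```

Sorting a row of `phi` in descending order gives a concept list like the example above, and each row of `theta` gives one photo's participation in every topic.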
Topic Selection
Each photo belongs to many topics
Select photos containing “indignados” or “acampa” and sum their values per topic
E.g.:
PhotoID | Topic | Participation in Topic
5776147261 | 7 | 0.72
5776147261 | 14 | 0.12
5776147261 | 21 | 0.08
5776147261 | 6 | 0.02
5776147261 | 25 | 0.01
….
Topic | Sum
18 | 456.58
49 | 223.47
0 | 27.13
1 | 24.17
22 | 23.39
….
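The keyword-based scoring of topics might look like the sketch below. The data structures and the name `score_topics` are hypothetical, and how many of the top-ranked topics are finally selected is not stated on the slide, so the function only produces the ranking.

```python
from collections import defaultdict


def score_topics(photo_text, photo_topics, keywords):
    """Sum per-topic participation over the photos whose text contains a keyword.

    photo_text: {photo_id: text}
    photo_topics: {photo_id: {topic: participation}}
    Returns [(topic, summed participation), ...], highest sum first.
    """
    sums = defaultdict(float)
    for photo_id, text in photo_text.items():
        lowered = text.lower()
        if any(k in lowered for k in keywords):
            for topic, value in photo_topics.get(photo_id, {}).items():
                sums[topic] += value
    return sorted(sums.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration only.
photo_text = {1: "acampada sol", 2: "Indignados puerta", 3: "seagull port"}
photo_topics = {1: {7: 0.72, 14: 0.12}, 2: {7: 0.5, 21: 0.3}, 3: {14: 0.9}}
for topic, total in score_topics(photo_text, photo_topics, ["indignados", "acampa"]):
    print(topic, round(total, 2))
```

Substring matching is used so that "acampa" also matches "acampada", which appears to be the intent of the truncated keyword.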
Event Detection & Optimization
Event Detection
Find photos of selected topics
Count photos per day
If the count is higher than a threshold, add the photos to a new event
Event Optimization
Merge events happening on consecutive days
Split events by geolocation distance
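The detection and optimization steps can be sketched as below. This is an illustration under assumptions: the threshold value, the distance limit `max_km`, and the greedy grouping rule are all hypothetical, as the slides give neither the values nor the exact splitting procedure.

```python
import math
from collections import Counter
from datetime import date, timedelta


def detect_events(photo_days, threshold):
    """photo_days: one date per photo. Returns [(first_day, last_day), ...].

    Days with more than `threshold` photos become events; events on
    consecutive days are merged into one.
    """
    counts = Counter(photo_days)
    peak_days = sorted(day for day, n in counts.items() if n > threshold)
    events = []
    for day in peak_days:
        if events and day - events[-1][1] == timedelta(days=1):
            events[-1] = (events[-1][0], day)  # extend the previous event
        else:
            events.append((day, day))
    return events


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def split_by_location(points, max_km):
    """Greedy split: a point joins the first group whose first member is within max_km."""
    groups = []
    for p in points:
        for g in groups:
            if haversine_km(g[0], p) <= max_km:
                g.append(p)
                break
        else:
            groups.append([p])
    return groups


# Toy data, invented for illustration only.
days = [date(2011, 5, 15)] * 3 + [date(2011, 5, 16)] * 4 + [date(2011, 5, 20)] * 5
print(detect_events(days, threshold=2))
```

In this toy run, 15 and 16 May merge into one event and 20 May forms a second one; `split_by_location` would then be applied to the photo coordinates of each event.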
Results - C1: Technical events in Germany
[Bar chart: Precision, Recall, F-Measure and NMI per run; y-axis 0 to 100. Runs: Selected/Total Topics 2/50, 6/50, 8/50; Manual Topic (two runs). Data labels as extracted: 80.98, 40.52, 35.85, 76.29, 63.35, 31.1, 26.26, 25.31, 84.58, 0.16, 0.724, 0.578, 94.9, 50.98]
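As a reminder of how three of the reported measures relate, the standard set-based definitions of precision, recall, and F-measure are sketched below. This is a generic illustration with a hypothetical function name; the official task evaluation may compute the scores differently, and NMI is not covered here.

```python
def precision_recall_f1(retrieved, relevant):
    """Standard precision, recall and F-measure over two collections of photo ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy ids, invented for illustration only.
print(precision_recall_f1({1, 2, 3, 4}, {2, 3, 4, 5, 6, 7}))
```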
Results – C2: Soccer Events in Hamburg/Madrid
[Bar chart: Precision, Recall, F-Measure and NMI per run; y-axis 0 to 100. Runs: Selected/Total Topics 1/50, 1/100, 1/100; Manual Topic (two runs). Data labels as extracted: 75.72, 86.67, 91.21, 88.18, 88.18, 77.67, 81.78, 84, 90.76, 0.768, 0.85, 0.847, 93.49, 93.49]
Results – C3: Protest Events of Indignados
[Bar chart: Precision, Recall, F-Measure and NMI per run; y-axis 0 to 100. Runs: Selected/Total Topics 5/100, 5/100, 3/50; Manual Topic (two runs). Data labels as extracted: 88.53, 90.76, 86.59, 88.91, 88.91, 84.29, 86.11, 85.38, 89.83, 0.33, 73.8, 0.347, 90.78, 90.78]
Conclusions
Effective and generalized methodology
The selection of topics is the key
Topics created by LDA give results close to those of manual topics
Really good precision
Stemming may (slightly) improve the results
Problems with “vague” topics
Relevant and Future Work
Automatically detect all events from a dataset using detected topics
Dynamic merging of topics
The concept of an important event is socially defined -> personalized detection