CERTH @ MediaEval 2012 Social Event Detection Task

Symeon Papadopoulos, Georgios Petkos, Manos Schinas, Yiannis Kompatsiaris

Pisa, 4-5 October 2012

The problem

• Identify social events in tagged photo collections:
– Challenge 1: Indignados protest @ Madrid
– Challenge 2: Soccer matches @ Madrid, Hamburg
– Challenge 3: Technical events @ Germany

• Alternative formulation:
– Represent a collection of photos as a graph, where items with a high probability of belonging to the same event are connected.
– Each event forms a dense sub-graph in it.
– This points to community detection as a method to address the problem.

Approach

[Figure: overview of the approach in three steps (Step 1, Step 2, Step 3)]

Graph Creation (1)

• Graph creation is based on a “Same Class” model:
– A classifier that predicts whether two images belong to the same event or not
– Support Vector Machine classifier trained on the data of the 2011 challenge
– Input features: dissimilarities across user, title, tags, description, time taken, GIST, SURF/VLAD
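The input to the same class model can be pictured as one dissimilarity value per modality for a pair of images. The following is a minimal sketch under stated assumptions, not the implementation used in the runs: the `Photo` record and its field names are hypothetical, text fields are compared with a Jaccard distance over whitespace tokens, time is compared in hours, and the visual features (GIST, SURF/VLAD) are left out. Vectors of this kind, labelled "same event" or "different event", are what an SVM would be trained on.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Photo:
    """Hypothetical photo record; field names are illustrative."""
    user: str
    title: str
    tags: frozenset
    description: str
    taken: datetime


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; taken as 0.0 when both sets are empty (a choice)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def words(text):
    """Lower-cased whitespace tokens of a text field."""
    return set(text.lower().split())


def pair_features(p, q):
    """Dissimilarity vector for one image pair (visual features omitted)."""
    return [
        0.0 if p.user == q.user else 1.0,                              # user
        jaccard_distance(words(p.title), words(q.title)),              # title
        jaccard_distance(p.tags, q.tags),                              # tags
        jaccard_distance(words(p.description), words(q.description)),  # description
        abs((p.taken - q.taken).total_seconds()) / 3600.0,             # time taken (hours)
    ]
```

Each pair yields a fixed-length vector regardless of how long the titles or tag lists are, which is what makes a single pairwise classifier applicable to the whole collection.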

Graph Creation (2)

• Use the same class model to connect the items of the collection that belong to the same event

• Retrieve candidate neighbours (~350) to reduce computational cost:
– 50 with respect to textual features
– 150 with respect to time
– 50 with respect to location (when it exists)
– 100 with respect to visual features
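The candidate retrieval step can be sketched as follows. This is an illustration only: items are assumed to be dictionaries with hypothetical `time` and `tags` keys, neighbours are found by brute force rather than with an index, `same_event` stands in for the trained same class model, and only the textual and time modalities are shown (location and visual candidates would follow the same pattern).

```python
import heapq


def top_k(item_id, items, k, distance):
    """Ids of the k items closest to item_id under the given distance."""
    others = (j for j in items if j != item_id)
    return heapq.nsmallest(k, others, key=lambda j: distance(items[item_id], items[j]))


def time_distance(a, b):
    """Absolute difference of the capture times."""
    return abs(a["time"] - b["time"])


def tag_distance(a, b):
    """Jaccard distance between tag sets; 1.0 when both are empty."""
    union = a["tags"] | b["tags"]
    return 1.0 - len(a["tags"] & b["tags"]) / len(union) if union else 1.0


def build_graph(items, same_event, k_text=50, k_time=150):
    """Connect each item to those of its candidates that the predicate accepts."""
    edges = set()
    for i in items:
        # Union of the per-modality candidate lists for item i.
        candidates = set(top_k(i, items, k_text, tag_distance))
        candidates |= set(top_k(i, items, k_time, time_distance))
        for j in candidates:
            if same_event(items[i], items[j]):
                edges.add(frozenset((i, j)))  # undirected edge
    return edges
```

Restricting the classifier to a few hundred candidates per item keeps the number of pairwise predictions linear in the collection size instead of quadratic.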

Event Partitioning and Expansion (1)

• Event partitioning
– The nodes of the graph are clustered into candidate events by using the Structural Clustering Algorithm for Networks (SCAN).
– The items clustered together by SCAN are used to obtain an aggregate representation of each candidate social event.
– Split the candidate events that exceed a predefined time range into shorter events.
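For readers unfamiliar with SCAN, the clustering step can be sketched as below. SCAN scores an edge by the structural similarity σ(u, v) = |Γ(u) ∩ Γ(v)| / sqrt(|Γ(u)| · |Γ(v)|), where Γ is the neighbourhood of a node including the node itself, and grows clusters from "core" nodes that have at least μ nodes in their ε-neighbourhood. This is a simplified sketch: the values of `eps` and `mu` are illustrative assumptions, and the distinction SCAN makes between hubs and outliers among unclustered nodes is omitted.

```python
import math
from collections import deque


def structural_similarity(adj, u, v):
    """Overlap of the closed neighbourhoods of u and v, normalised."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))


def scan(adj, eps=0.7, mu=3):
    """Simplified SCAN. adj: dict node -> set of neighbours (undirected)."""

    def eps_neighbourhood(u):
        return {v for v in adj[u] | {u} if structural_similarity(adj, u, v) >= eps}

    labels = {}
    clusters = []
    for node in adj:
        if node in labels or len(eps_neighbourhood(node)) < mu:
            continue  # already clustered, or not a core node
        cid = len(clusters)
        labels[node] = cid
        members = {node}
        queue = deque([node])
        while queue:
            u = queue.popleft()
            hood = eps_neighbourhood(u)
            if len(hood) < mu:
                continue  # border node: joins the cluster but does not expand it
            for v in hood:
                if v not in labels:
                    labels[v] = cid
                    members.add(v)
                    queue.append(v)
        clusters.append(members)
    return clusters
```

Nodes that end up in no cluster are exactly the images that the expansion step later treats as single-item events.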

Event Partitioning and Expansion (2)

• Expansion of the candidate events set
– Each image that does not belong to any event forms a single-item event.
– Merge these single-item events into larger clusters by checking location and time.
– Add the new events to the set of candidate events.
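One way to picture the merging of single-item events is a grouping of photos that are close in both time and space. The sketch below is an assumption-laden illustration, not the procedure used in the runs: the thresholds `max_hours` and `max_km` are invented, every photo is assumed to carry a geotag, and grouping is transitive via a union-find structure.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def merge_singletons(photos, max_hours=12.0, max_km=1.0):
    """Group unclustered photos that are close in both time and space.

    photos: dict id -> (timestamp_in_hours, (lat, lon)).
    Returns a list of sets of ids; thresholds are illustrative assumptions.
    """
    parent = {p: p for p in photos}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    ids = sorted(photos, key=lambda p: photos[p][0])  # process in time order
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            if photos[q][0] - photos[p][0] > max_hours:
                break  # all later photos are even further away in time
            if haversine_km(photos[p][1], photos[q][1]) <= max_km:
                parent[find(q)] = find(p)

    groups = {}
    for p in photos:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())
```

Photos that match nothing stay as single-item groups, so the output can be appended to the candidate event set as it is.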

Event Filtering (1)

• Filter in two ways:
– By using geo-location (if it exists)
– By using tag-based models

• Geo-location filtering
– Discard events that are not contained in the bounding box of the specific challenge
– 30% of candidate events are discarded
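The geo-location filter amounts to a bounding-box test per event. The sketch below makes two assumptions that the slides do not state: an event counts as "contained" when all of its geotagged photos fall inside the box, and an event with no geotagged photo is kept so that the tag-based filter can decide. The box coordinates in the usage example are rough, made-up values.

```python
def in_bounding_box(point, box):
    """True if (lat, lon) lies inside box = (min_lat, min_lon, max_lat, max_lon)."""
    lat, lon = point
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def geo_filter(events, box):
    """Split events into (kept, discarded) using their geotagged photos.

    events: dict event_id -> list of (lat, lon) tuples, or None for a photo
    without a geotag.
    """
    kept, discarded = [], []
    for event_id, locations in events.items():
        known = [loc for loc in locations if loc is not None]
        # Assumption: events without any geotag are left for tag-based filtering.
        if not known or all(in_bounding_box(loc, box) for loc in known):
            kept.append(event_id)
        else:
            discarded.append(event_id)
    return kept, discarded
```

For example, with an illustrative box `(40.3, -3.9, 40.6, -3.5)`, an event photographed at `(40.42, -3.70)` is kept, while one photographed at `(53.55, 10.0)` is discarded.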

Event Filtering (2)

• Tag-based filtering
– Build term models by finding the 500 dominant terms for the specific locations and event types.
– We collect images from Flickr that are relevant to the location or the type of event of interest:
– Images for Madrid, Hamburg and Germany
– Images for indignados, soccer and technical events

Event Filtering (3)

• Tag-based filtering
– Probability of appearance
– We compute the ratio of the probability of appearance in the focus set over the probability of appearance in the reference set.
– Keep the 500 terms with the highest ratio
– Jaccard similarity between a tag model and an event's terms
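The term model and the comparison with an event can be sketched as follows. Several details are assumptions made for illustration: the probability of appearance of a term is taken to be the fraction of images in a set that carry it, a small `floor` value replaces a zero reference probability to avoid division by zero, and ties in the ratio are broken alphabetically.

```python
from collections import Counter


def appearance_probability(tagged_images):
    """Fraction of images in which each term appears at least once."""
    counts = Counter(term for tags in tagged_images for term in set(tags))
    n = len(tagged_images)
    return {term: c / n for term, c in counts.items()}


def build_term_model(focus, reference, size=500, floor=1e-6):
    """Top `size` terms by p_focus(term) / p_reference(term).

    `floor` stands in for a zero reference probability (an assumption).
    """
    p_focus = appearance_probability(focus)
    p_ref = appearance_probability(reference)
    ratio = {t: p / max(p_ref.get(t, 0.0), floor) for t, p in p_focus.items()}
    ranked = sorted(ratio, key=lambda t: (-ratio[t], t))
    return set(ranked[:size])


def jaccard(a, b):
    """Jaccard similarity of two term collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

An event would then be kept or discarded by comparing `jaccard(model, event_terms)` with a threshold; the slides do not give the threshold, so none is fixed here.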

Evaluation

Notation
– Run 1: Same class model trained with 10000 pairs of images.
– Run 2: Same class model trained with 30000 pairs of images.
– Run 3: Same class model of run 1 with a post-processing step.

Discussion (1)

• Moving from a smaller (run 1) to a larger (run 2) training dataset does not seem to improve most of the performance measures, which suggests overfitting

• The method fails in challenge 1 because its events are different from those in the training dataset

• A good tag model has to be used for classification in the post-filtering step

Discussion (2)

• Future actions:
– Train the same class model with a richer set of data
– Explore different graph construction strategies and community detection algorithms

• Ways to improve:
– Better topic classification methods
– More sophisticated methods for location estimation

Questions
