Hashtags as Milestones in Time

Hashtags as Milestones in Time Identifying the hashtags for meaningful events using Twitter search logs and Wikipedia data Stewart Whiting University of Glasgow Omar Alonso Microsoft/Bing Time Aware Information Access Workshop, SIGIR Oregon, 2012. (Work done while on internship at Microsoft)



TRANSCRIPT

Page 1: Hashtags as Milestones in Time

Hashtags as Milestones in Time
Identifying the hashtags for meaningful events using Twitter search logs and Wikipedia data

Stewart Whiting, University of Glasgow
Omar Alonso, Microsoft/Bing

Time Aware Information Access Workshop, SIGIR, Oregon, 2012
(Work done while on internship at Microsoft)

Page 2

Alright… Outline
1. Hashtags as milestones in time
2. Introduction
   1. Why milestones?
   2. Why hashtags? Can they be useful as milestones?
3. Motivation
4. Approach
   1. Data preparation
   2. Approach steps
5. Constructing a timeline – examples
6. Preliminary conclusions

Page 3

Abstract: Hashtags as milestones in time
What we want to do:
• Identify event-based hashtags for timeline creation
  – Currently using historic/past data
• Filter out junk
• Find the most temporally significant hashtags
  – Use multiple signals: Twitter search logs + related Wikipedia article popularity
• We are not doing topic detection/tracking!

Why?
• A good way to express (anchor) a topic on a timeline…
• Help users make sense of and navigate temporal information

#what?

Page 4

Introduction
• Hashtags are used by authors to explicitly denote the relevant topic(s) in a message
  – “Great passing, great game #euro2012”
• Used by authors and searchers
  – Broadcast and consume a specific topic
  – Especially useful in short-text retrieval, where bag-of-words/language modelling approaches are challenging
• Reflect mainstream events (or memes!) in real time
  – See trending topics right now
• Timelines are very good for displaying events
  – But you need to express each event as a meaningful marker, or milestone!
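Because hashtags explicitly mark a message's topic, they are easy to pull out of tweets or search queries. The following is a minimal Python sketch. It assumes a simplified rule: a `#` that is not preceded by a word character, followed by letters, digits or underscores. Twitter's real tokenisation is more involved, and this is not necessarily how the Bing logs were processed.

```python
import re

# Simplified assumption: a hashtag is '#' (not glued to a preceding word character)
# followed by one or more word characters (letters, digits, underscore).
HASHTAG_PATTERN = re.compile(r"(?<!\w)#(\w+)")

def extract_hashtags(text: str) -> list[str]:
    """Return the hashtags in text, lower-cased and without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(text)]

print(extract_hashtags("Great passing, great game #euro2012"))  # ['euro2012']
```

Lower-casing is a design choice so that variants such as #RIPWhitney and #ripwhitney are counted as the same hashtag.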

Page 5

Introduction to the data
• Two crowds of people
  – Authors/searchers on Twitter
  – Editors/browsers on Wikipedia
• Correlation between signals from the two crowds
  – People search for what is happening
  – People edit Wikipedia with what is happening
  – Two very distinctive signals!

Page 6

Twitter hashtag signals (in search logs)

• But plenty of memes too…
  – #20PeopleWhoIWantToMeet
  – #PresentingInTheBatCave
  – #whiteppldoitbutblackppldont

Page 7

Wikipedia signals

• Whitney Houston
• TV appearances
• Her death in February 2012
• Events were reflected by discussion with hashtags on Twitter, e.g.
  – #ripwhitney
  – #bgtwhitney (BGT = Britain’s Got Talent)

Page 8

Motivation
• Both signals have large coverage
  – Celebrities, news, weather, people, science, movies, etc.
• Two robust signals coming from large crowds
  – Difficult for individuals to influence (spam?)
  – Not so reliant on single-signal analysis (e.g. wavelets or burst detection)
• Discard memes by looking for associated Wikipedia articles
• Meaningful milestones in timelines provide strong features for navigating temporal content
  – Alonso et al. (2010), Matthews et al. (2010), From et al. (2003)

Page 9

Data Preparation – Hashtag Data
• Extracted from Bing Social and IE8 query logs
• Provides hashtag use, aggregated per day
• (Proprietary, but could be extracted from other sources)
• Hashtags are mostly a mix of unigrams and bigrams!
• We also want the words in the hashtag
• Need to use a word breaker…
  – We used Microsoft Web N-Gram Services
  – Breaks #crosstownshootout into ‘cross town shootout’ and #basketballwivesla into ‘basketball wives la’
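The slides rely on Microsoft Web N-Gram Services for word breaking. The sketch below is a self-contained stand-in that uses a different and much simpler technique: dynamic-programming segmentation, which picks the split with the highest sum of unigram log-probabilities. The probability table `UNIGRAM_PROB` and the unknown-word penalty are made up for illustration.

```python
import math
from functools import lru_cache

# Hypothetical unigram probabilities; a real word breaker (the slides used
# Microsoft Web N-Gram Services) would draw on web-scale n-gram statistics.
UNIGRAM_PROB = {
    "cross": 1e-4, "town": 2e-4, "shootout": 1e-5, "shoot": 3e-5, "out": 1e-3,
    "basketball": 5e-5, "wives": 2e-5, "la": 1e-4,
}

def word_log_prob(word: str) -> float:
    """log10 probability of a word; unseen words get a penalty that grows with length."""
    if word in UNIGRAM_PROB:
        return math.log10(UNIGRAM_PROB[word])
    return math.log10(1e-9) - len(word)  # assumption: simple length-based penalty

def break_words(hashtag: str, max_word_len: int = 20) -> list[str]:
    """Split a hashtag into its most probable word sequence under the unigram table."""
    text = hashtag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def best(start: int) -> tuple[float, tuple[str, ...]]:
        # Highest-scoring (log-prob, words) segmentation of the suffix text[start:].
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, min(len(text), start + max_word_len) + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((word_log_prob(word) + rest_score, (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])
```

With this toy table, `break_words("#crosstownshootout")` prefers ‘shootout’ over ‘shoot’ + ‘out’ and returns `['cross', 'town', 'shootout']`. A production word breaker would also use context, such as bigram probabilities, rather than unigrams alone.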

Page 10

Data Preparation – Wikipedia Data
• Created a Lucene index using the Wikipedia Extraction (WEX) data
• Wikipedia article viewing popularity statistics
  – Dump available for each hour since Dec 2007
  – Published in near real time, for the past hour (on the hour)
  – Huge number of data points!
  – So we sampled 8am/8pm each day
  – Transformed into a daily aggregated time series (therefore comparable with hashtag signals)
  – Smoothed with exponential smoothing (alpha = 0.2)
  – Over 2 billion data points!
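The Wikipedia preprocessing is sketched below. It keeps only the 8am and 8pm hourly samples, collapses them into one value per day, and applies simple exponential smoothing with alpha = 0.2. The slides do not say how the two daily samples are combined. Summing them is an assumption here, as is initialising the smoothed series with the first observation.

```python
from datetime import date, timedelta

ALPHA = 0.2              # smoothing factor quoted on the slide
SAMPLE_HOURS = (8, 20)   # the 8am and 8pm hourly dumps

def daily_series(hourly_views: dict[tuple[date, int], int],
                 start: date, days: int) -> list[float]:
    """One value per day: the sum of that day's 8am and 8pm samples (missing hours count as 0)."""
    series = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        series.append(float(sum(hourly_views.get((day, hour), 0) for hour in SAMPLE_HOURS)))
    return series

def exponential_smoothing(values: list[float], alpha: float = ALPHA) -> list[float]:
    """Simple exponential smoothing: s_0 = x_0 and s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed: list[float] = []
    for x in values:
        smoothed.append(x if not smoothed else alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Usage with made-up page-view counts for one article over three days.
views = {
    (date(2012, 1, 1), 8): 100, (date(2012, 1, 1), 20): 100,
    (date(2012, 1, 2), 8): 5000, (date(2012, 1, 2), 20): 7000,
}
daily = daily_series(views, date(2012, 1, 1), 3)
print(daily, exponential_smoothing(daily))
```

A small alpha such as 0.2 weights the history heavily. A one-day spike is therefore damped and then decays gradually instead of vanishing the next day.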

Page 11

Approach Outline

1. For each hashtag from the logs, use the word breaker service to extract the hashtag terms.
2. Use the separated terms to query the Wikipedia index, mapping each hashtag to a set of possibly associated articles.
3. For each article/hashtag pair, prepare same-length, comparable popularity time series:
   1. Frequency of the hashtag over time
   2. Popularity of the article over time
• Pearson correlation coefficient computed between the two series
  – Measures the association between the temporality of the hashtag's occurrence and the Wikipedia article's popularity.
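Steps 2 and 3 can be sketched as follows, under several assumptions:

- A tiny in-memory term index (`ARTICLE_TERMS`, hypothetical) stands in for the Lucene/WEX index.
- Article matching is plain term overlap rather than Lucene scoring.
- Both series are already aligned day by day.

The Pearson coefficient is the standard formula. Ranking the candidates by it is an illustrative choice, not necessarily the authors'.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two same-length series (0.0 if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2 points)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a flat series; treated here as no association
    return cov / math.sqrt(var_x * var_y)

# Hypothetical stand-in for the Lucene index over WEX: article title -> indexed terms.
ARTICLE_TERMS = {
    "Whitney Houston": {"whitney", "houston", "singer"},
    "Houston": {"houston", "texas", "city"},
    "Britain's Got Talent": {"britain", "got", "talent", "bgt"},
}

def candidate_articles(terms: list[str]) -> list[str]:
    """Step 2 (simplified): articles sharing at least one term with the broken-up hashtag."""
    wanted = set(terms)
    return [title for title, indexed in ARTICLE_TERMS.items() if indexed & wanted]

def rank_articles(terms: list[str], hashtag_freq: list[float],
                  article_views: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Step 3: correlate the hashtag's daily frequency with each candidate's daily views."""
    scored = [(title, pearson(hashtag_freq, article_views[title]))
              for title in candidate_articles(terms) if title in article_views]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Usage with made-up daily counts for a hashtag broken into ["whitney", "houston"].
ranking = rank_articles(
    ["whitney", "houston"],
    [0, 0, 50, 400, 120, 30],
    {"Whitney Houston": [10, 12, 300, 2500, 900, 200],
     "Houston": [100, 98, 101, 99, 102, 100]},
)
print(ranking)
```

In this toy data the biography's views rise and fall with the hashtag, so it ranks first. The city article matches on the term ‘houston’ but its flat popularity correlates weakly (here slightly negatively). This is the kind of spurious match that correlation helps to discard.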

Page 12

Example Correlations

Page 13

Constructing a Timeline

Page 14

Conclusions
• Early work, but correlating the signals does yield high-profile temporal events
  – Hashtags can therefore be used to anchor events on a timeline
• Occasional spurious correlations (need better hashtag frequency data to improve this)
  – Correlation does not imply causation!
• Future work…
  – Automatic construction of timelines
  – Improving correlation quality: examine time windows
  – Designing an evaluation framework to assess overall timeline quality
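Automatic timeline construction is listed as future work. One simple way to anchor hashtags on a timeline is sketched below. It keeps only hashtags whose best article correlation clears a threshold, which drops likely memes, and places each surviving hashtag on the day its frequency peaks. Both the threshold value (0.8) and peak-day anchoring are hypothetical choices that do not come from the slides.

```python
from datetime import date, timedelta

CORRELATION_THRESHOLD = 0.8  # hypothetical cut-off; the slides do not specify one

def build_timeline(start: date, hashtag_freq: dict[str, list[float]],
                   best_correlation: dict[str, float],
                   threshold: float = CORRELATION_THRESHOLD) -> list[tuple[date, str]]:
    """Place each sufficiently correlated hashtag on the day its daily frequency peaks."""
    milestones = []
    for tag, series in hashtag_freq.items():
        if not series or best_correlation.get(tag, 0.0) < threshold:
            continue  # no well-correlated Wikipedia article: likely a meme or spurious match
        peak_index = max(range(len(series)), key=series.__getitem__)  # first peak on ties
        milestones.append((start + timedelta(days=peak_index), "#" + tag))
    return sorted(milestones)  # chronological order

# Usage with made-up daily counts and correlations.
timeline = build_timeline(
    date(2012, 1, 1),
    {"ripwhitney": [0, 0, 400, 120, 30],
     "bgtwhitney": [30, 5, 2, 1, 0],
     "20peoplewhoiwanttomeet": [0, 900, 10, 0, 0]},
    {"ripwhitney": 0.97, "bgtwhitney": 0.85, "20peoplewhoiwanttomeet": 0.1},
)
print(timeline)
```

Here the meme hashtag is dropped because it has no well-correlated article. The other two hashtags become dated milestones. Examining time windows, as the future-work list suggests, could refine this by correlating only around each peak.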