![Page 1: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection](https://reader035.vdocuments.net/reader035/viewer/2022081403/55521541b4c90520548b48c9/html5/thumbnails/1.jpg)
MJ no more: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection
Thomas Steiner ([email protected], @tomayac)
Seth van Hooland ([email protected], @sethvanhooland)
Ed Summers ([email protected], @edsu)
![Page 2: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection](https://reader035.vdocuments.net/reader035/viewer/2022081403/55521541b4c90520548b48c9/html5/thumbnails/2.jpg)
News increasingly doesn't break on the newswire
![Page 3: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection](https://reader035.vdocuments.net/reader035/viewer/2022081403/55521541b4c90520548b48c9/html5/thumbnails/3.jpg)
First Story Detection on Realtime Social Networks
Typically based on Twitter because of its Streaming API [Twitter2012].
Tries to detect spikes in time, locality, and text (oftentimes in a restricted domain, e.g., earthquake prediction).
A typical representative of this kind of approach is [Petrović2010].
High recall
Low precision
[Twitter2012] https://dev.twitter.com/docs/streaming-apis/streams/public
[Petrović2010] Saša Petrović, Miles Osborne, and Victor Lavrenko. 2010. Streaming first story detection with application to Twitter. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT '10). Association for Computational Linguistics, Stroudsburg, PA, USA, 181–189.
![Page 4: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection](https://reader035.vdocuments.net/reader035/viewer/2022081403/55521541b4c90520548b48c9/html5/thumbnails/4.jpg)
Curation based on Wikipedia
Wikipedia page view logs are publicly available [Wikipedia2012] and updated on an hourly basis.
Osborne et al. have shown that there is a relation between Wikipedia page views and news events [Osborne2012].
Their work improves the approach of [Petrović2010] by using Wikipedia logs.
Key findings:
● Wikipedia lags about 2 hours behind the news.
● Newly created pages add noise.
[Wikipedia2012] http://dumps.wikimedia.org/other/pagecounts-raw/
[Osborne2012] M. Osborne, S. Petrović, R. McCreadie, C. Macdonald, I. Ounis. 2012. Bieber no more: First Story Detection using Twitter and Wikipedia. In SIGIR 2012 Workshop on Time-aware Information Access (#TAIA2012), Portland, Oregon, USA.
![Page 5: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection](https://reader035.vdocuments.net/reader035/viewer/2022081403/55521541b4c90520548b48c9/html5/thumbnails/5.jpg)
Key idea: invert the process
Use Wikipedia live IRC stream of recent changes [WikipediaIRC2012], then do a sanity check on social networks.
[WikipediaIRC2012] http://meta.wikimedia.org/wiki/IRC/Channels#Raw_feeds
![Page 6: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection](https://reader035.vdocuments.net/reader035/viewer/2022081403/55521541b4c90520548b48c9/html5/thumbnails/6.jpg)
Introducing Wikipedia Live Monitor
Hooks into the Wikipedia recent changes IRC channels for all Wikipedia locales.
Channel names follow the pattern #language.project, e.g., #de.wikipedia
When an article gets edited, retrieve all language versions and treat them as a cluster.
E.g., en:Albert_Einstein is in the same cluster as de:Albert_Einstein.
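The channel naming pattern and the clustering of language versions can be illustrated with a minimal sketch. It is not the Wikipedia Live Monitor code: the names `parse_channel`, `cluster_for`, and `LANGUAGE_LINKS` are hypothetical, the regular expression assumes lowercase language codes made of letters and hyphens, and the hand-written dictionary stands in for whatever language-link lookup the real system performs.

```python
import re
from typing import Dict, FrozenSet, List, Optional, Tuple

# Matches channel names that follow the pattern "#language.project".
# Assumption: language codes consist of lowercase letters and hyphens.
CHANNEL_RE = re.compile(r"^#(?P<language>[a-z-]+)\.(?P<project>[a-z]+)$")


def parse_channel(channel: str) -> Optional[Tuple[str, str]]:
    """Split a channel name into (language, project), or return None."""
    match = CHANNEL_RE.match(channel)
    if match is None:
        return None
    return match.group("language"), match.group("project")


# Hypothetical, hand-written language links used only for illustration.
LANGUAGE_LINKS: Dict[str, List[str]] = {
    "en:Albert_Einstein": ["de:Albert_Einstein", "fr:Albert_Einstein"],
}


def cluster_for(language: str, title: str,
                language_links: Dict[str, List[str]]) -> FrozenSet[str]:
    """Return the set of all known language versions of an edited article."""
    key = f"{language}:{title}"
    for source, targets in language_links.items():
        if key == source or key in targets:
            return frozenset([source, *targets])
    # No known language links: the article forms a cluster of its own.
    return frozenset([key])


language, project = parse_channel("#de.wikipedia")
cluster = cluster_for(language, "Albert_Einstein", LANGUAGE_LINKS)
```

With this lookup, an edit to de:Albert_Einstein and an edit to en:Albert_Einstein resolve to the same cluster, so their edits can be counted together.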
![Page 7: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection](https://reader035.vdocuments.net/reader035/viewer/2022081403/55521541b4c90520548b48c9/html5/thumbnails/7.jpg)
Breaking News Conditions
1) ≥5 Occurrences: An article cluster must have at least n edits before it is considered a breaking news candidate.
2) ≤60 Seconds Between Edits: An article cluster may have at most n seconds between edits in order to be regarded as a breaking news candidate.
3) ≥2 Concurrent Editors: An article cluster must be edited by at least n concurrent editors before it is considered a breaking news candidate.
4) ≤240 Seconds Since Last Edit: An article cluster is thrown out of the monitoring loop if its last edit was more than n seconds ago.
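The four conditions above can be sketched as a small check over an article cluster. This is an illustration under stated assumptions, not the authors' implementation: the class and function names are hypothetical, "concurrent editors" is approximated as the number of distinct editors seen in the cluster, and the 60-second rule is read as applying to the gaps between the most recent five edits.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Thresholds:
    # Default values are the ones shown on the slide.
    min_occurrences: int = 5
    max_seconds_between_edits: float = 60
    min_concurrent_editors: int = 2
    max_seconds_since_last_edit: float = 240


@dataclass
class ArticleCluster:
    edit_times: List[float] = field(default_factory=list)  # seconds, ascending
    editors: Set[str] = field(default_factory=set)

    def record_edit(self, timestamp: float, editor: str) -> None:
        self.edit_times.append(timestamp)
        self.editors.add(editor)


def is_stale(cluster: ArticleCluster, now: float,
             t: Thresholds = Thresholds()) -> bool:
    """Condition 4: drop the cluster if its last edit is too long ago."""
    if not cluster.edit_times:
        return False
    return now - cluster.edit_times[-1] > t.max_seconds_since_last_edit


def is_breaking_news_candidate(cluster: ArticleCluster, now: float,
                               t: Thresholds = Thresholds()) -> bool:
    if is_stale(cluster, now, t):
        return False
    # Condition 1: enough edits in the cluster.
    if len(cluster.edit_times) < t.min_occurrences:
        return False
    # Condition 3 (assumption: distinct editors stand in for concurrency).
    if len(cluster.editors) < t.min_concurrent_editors:
        return False
    # Condition 2 (assumption: checked on the most recent edits only).
    recent = cluster.edit_times[-t.min_occurrences:]
    gaps = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gap <= t.max_seconds_between_edits for gap in gaps)


cluster = ArticleCluster()
for i, editor in enumerate(["alice", "bob", "alice", "bob", "alice"]):
    cluster.record_edit(i * 30.0, editor)  # one edit every 30 seconds

candidate_now = is_breaking_news_candidate(cluster, now=130.0)
candidate_later = is_breaking_news_candidate(cluster, now=400.0)
```

In this example the cluster qualifies at `now=130.0` and no longer qualifies at `now=400.0`, because its last edit (at 120 seconds) is then more than 240 seconds old.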
![Page 8: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection](https://reader035.vdocuments.net/reader035/viewer/2022081403/55521541b4c90520548b48c9/html5/thumbnails/8.jpg)
Evaluation: Does it work at all?
Koninginnedag (http://twitpic.com/cn1vgf/full)
![Page 9: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection](https://reader035.vdocuments.net/reader035/viewer/2022081403/55521541b4c90520548b48c9/html5/thumbnails/9.jpg)
Evaluation: Does it work at all?
Champions League Semi Final BVB vs. RMD with Lewandowski (http://twitpic.com/clo0s0)
![Page 10: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection](https://reader035.vdocuments.net/reader035/viewer/2022081403/55521541b4c90520548b48c9/html5/thumbnails/10.jpg)
Evaluation: Does it work at all?
Boston Bombings (https://twitter.com/jason_koebler/statuses/323892465545388033, http://www.usnews.com/news/articles/2013/04/15/is-wikipedia-better-for-breaking-news-than-twitter)
![Page 11: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection](https://reader035.vdocuments.net/reader035/viewer/2022081403/55521541b4c90520548b48c9/html5/thumbnails/11.jpg)
Evaluation: How well does it work?
Lag time for global events: <5 min
Resignation of Pope Benedict XVI (http://en.wikipedia.org/wiki/Resignation_of_Pope_Benedict_XVI)
First three edit times (UTC) after the news broke on Feb 11, 2013:
● English Wikipedia article: 10:58, 10:59, 11:02
● French Wikipedia article: 11:00, 11:00, 11:01
This implies that, by looking at only two language versions of the Pope article (the actual number of monitored versions is 42), the system would have reported the news at 11:01.
The Twitter account of Reuters announced the news at 10:59.
Vatican Radio's announcement was made at 10:57:47.
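The 11:01 figure can be reproduced with a short calculation. The sketch below assumes that the fifth edit across the two language versions is what triggers a report (the "at least 5 occurrences" condition); it ignores the editor-count and edit-gap conditions, because the slide gives only minute-resolution times and no editor names. The helper `utc` and the variable names are hypothetical.

```python
from datetime import datetime, timezone


def utc(hms: str) -> datetime:
    """Parse 'HH:MM' or 'HH:MM:SS' as a UTC time on Feb 11, 2013."""
    fmt = "%H:%M:%S" if hms.count(":") == 2 else "%H:%M"
    parsed = datetime.strptime(hms, fmt)
    return parsed.replace(year=2013, month=2, day=11, tzinfo=timezone.utc)


# Edit times as listed on the slide.
english = [utc(t) for t in ("10:58", "10:59", "11:02")]
french = [utc(t) for t in ("11:00", "11:00", "11:01")]

MIN_OCCURRENCES = 5  # assumption: the fifth edit in the cluster triggers a report
merged = sorted(english + french)
reported_at = merged[MIN_OCCURRENCES - 1]

lag_vs_vatican_radio = (reported_at - utc("10:57:47")).total_seconds()
lag_vs_reuters = (reported_at - utc("10:59")).total_seconds()

print(reported_at.strftime("%H:%M"), lag_vs_vatican_radio, lag_vs_reuters)
```

Under these assumptions the fifth edit falls at 11:01, which is 193 seconds (3 min 13 s) after the Vatican Radio announcement and 120 seconds after the Reuters tweet, consistent with the stated lag of under 5 minutes.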
![Page 12: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection](https://reader035.vdocuments.net/reader035/viewer/2022081403/55521541b4c90520548b48c9/html5/thumbnails/12.jpg)
Future Work
Work with realtime page view logs in addition to page edit logs (API format currently being defined by Wikimedia)
News categorization and classification
E.g., removal of the category Living-Persons from a person's article implies (sad) news
Reduce the false-positive rate; strengthen the connection between social networks and actual article edits
Auto notification system upon breaking news candidates
Pre-announcement: follow @WikiLiveMon
![Page 13: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection](https://reader035.vdocuments.net/reader035/viewer/2022081403/55521541b4c90520548b48c9/html5/thumbnails/13.jpg)
Demo and thank you
Play with the system at http://wikipedia-irc.herokuapp.com/
Read the paper at http://arxiv.org/abs/1303.4702
Ask questions here or [email protected] & @tomayac