vladimir alexiev | semantic enrichment of twitter microposts helps understand post-brexit reactions
TRANSCRIPT
Semantic Enrichment of Twitter Microposts Helps Understand Post-Brexit Reactions
Dr. Laura Tolosi
Presented by: Dr. Vladimir Alexiev
About Pheme
● A pheme is a meme enhanced with truthfulness information
– … also the Greek goddess of fame and rumors
● Research project funded by the FP7-ICT Programme, now in its final year
● Concerned with veracity analysis in Social Media:
– Aspects of veracity: rumor, misinformation, disinformation, true/false information, support / deny attitude
● Consortium:
– University of Sheffield, UK
– Universitaet des Saarlandes, Germany
– MODUL University Vienna GMBH, Austria
– Ontotext AD, Bulgaria
– ATOS Spain SA, Spain
– King's College London, UK
– The University of Warwick, UK
– SwissInfo.ch, Switzerland
– ihub Ltd, Kenya
Semantic annotations for the journalism dashboard
● Streaming from Twitter on selected topics and adding semantic annotations in real time (>50 tweets/s)
● Metadata:
– Whatever comes from the Twitter API
– Entity tagging: person, location, organization, event, etc.
– LOD: DBpedia + Geonames
– Veracity (via ML inference):
● Rumor probability
● Controversiality: support / deny / question annotation for each tweet
– Clustering of tweets into stories
● The journalism dashboard use case:
– Provides journalists with a real-time Twitter monitoring platform
– An intuitive interface for viewing and filtering the above metadata
Semantic annotations flow in Pheme
● Tweet-processing flow for three languages: EN, DE and BG
– Diagram includes only Ontotext's components: multilingual rumor classification and concept tagging
Semantic database
● Pheme Ontology diagram
● GraphDB SPARQL endpoint :
Post-Brexit analysis
● Between June 6th and August 25th we annotated more than 800,000 tweets about Brexit
● Advantages of using LOD annotations in Twitter analysis demonstrated via:
– Showcase 1: How are the large UK administrative regions mentioned on Twitter after the Brexit decision?
– Showcase 2: Who are the people most mentioned on Twitter after the Brexit decision?
Showcase 1: Regions analysis
● SPARQL query for finding mentions of UK administrative regions: 19,674 records
Lines 8-9: date and text of tweet
Lines 10-13: tweet mentions location
Line 17: country via Geonames is UK
Line 18: location is an administrative unit of rank 1 in Geonames
Line 19: data channel is Brexit
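The query itself is not reproduced in this transcript, so the line numbers above refer to the original slide and not to anything shown here. As a rough sketch of the shape such a query might take, the Python snippet below builds a SPARQL string. The `gn:` terms (`gn:countryCode`, `gn:featureCode`, `gn:A.ADM1`) are from the Geonames ontology; the `ex:` namespace, every `ex:` property and the function name are hypothetical placeholders and are not the actual Pheme ontology.

```python
from string import Template

# Hypothetical sketch: all ex: terms are placeholders, NOT the real Pheme ontology.
# Only the gn: terms are taken from the Geonames ontology.
REGION_QUERY = Template("""\
PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX ex: <http://example.org/pheme-placeholder#>

SELECT ?date ?text ?region WHERE {
  ?tweet ex:createdAt ?date ;          # date of tweet
         ex:text ?text ;               # text of tweet
         ex:mentionsLocation ?region ; # tweet mentions location
         ex:dataChannel "$channel" .   # data channel, e.g. Brexit
  ?region gn:countryCode "GB" ;        # country via Geonames is UK
          gn:featureCode gn:A.ADM1 .   # first-order administrative unit
}
""")


def build_region_query(channel: str) -> str:
    """Return the query text for one data channel (name assumed alphanumeric)."""
    if not channel.isalnum():
        # Refuse anything that could break out of the string literal.
        raise ValueError("channel must be alphanumeric")
    return REGION_QUERY.substitute(channel=channel)


if __name__ == "__main__":
    print(build_region_query("Brexit"))
```

The resulting string would still have to be sent to the GraphDB SPARQL endpoint, whose address is not given here.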
Showcase 1: Regions analysis
● Regions by number of mentions in our dataset:
– The opposing vote of Scotland attracts much discussion on Twitter
Showcase 1: Regions analysis
● Alternative ways of mentioning Northern Ireland in tweets
Note the not so widespread references: Norn Iron, Six Counties, The Occupied Six Counties
Source: https://en.wikipedia.org/wiki/Alternative_names_for_Northern_Ireland
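Linking every surface form to one identifier is what allows these variants to be counted together. A minimal sketch of that idea, assuming a hand-written alias table: the dictionary, the function names and the plain-string identifier are illustrative only, and the real linking in Pheme is done by the concept tagger against LOD, not by string lookup.

```python
from collections import Counter

# Illustrative alias table; the surface forms are the ones named on the slide.
NORTHERN_IRELAND = "Northern Ireland"
ALIASES = {
    "northern ireland": NORTHERN_IRELAND,
    "norn iron": NORTHERN_IRELAND,
    "six counties": NORTHERN_IRELAND,
    "the occupied six counties": NORTHERN_IRELAND,
}


def canonical_region(surface_form):
    """Map a surface form to its canonical region, or None if unknown."""
    # Lower-case and collapse whitespace before the lookup.
    key = " ".join(surface_form.lower().split())
    return ALIASES.get(key)


def count_regions(surface_forms):
    """Count mentions per canonical region, ignoring unknown forms."""
    counts = Counter()
    for form in surface_forms:
        region = canonical_region(form)
        if region is not None:
            counts[region] += 1
    return counts
```

With this table, `count_regions(["Norn Iron", "Six Counties"])` credits both mentions to the same region.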
Showcase 1: Regions analysis
● NLP analysis of tweets and regions: key terms used most distinctively in tweets about Scotland versus England
– Significance of association between term presence and the mention of Scotland / England, measured by the p-value of Fisher's test
– E.g.: #indyref2 is mentioned significantly more often in tweets about Scotland than in tweets about England;
● it is a hashtag about a referendum for Scotland's independence.
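The test above works on a 2x2 table per term: tweets mentioning Scotland with and without the term, and tweets mentioning England with and without it. The sketch below is a plain-Python version of the two-sided Fisher's exact test under that assumption. The slides do not say whether a one-sided or two-sided test was used, so the two-sided choice and the function name are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    a: Scotland tweets containing the term   b: Scotland tweets without it
    c: England tweets containing the term    d: England tweets without it
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo = max(0, col1 - row2)   # smallest feasible value of the top-left cell
    hi = min(col1, row1)       # largest feasible value of the top-left cell

    def weight(x):
        # Unnormalised hypergeometric probability of top-left cell == x.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    # Sum every table that is at most as likely as the observed one.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(row1 + row2, col1)
```

The weights are compared as exact integers, which avoids floating-point ties. A small p-value for a term such as #indyref2 would indicate that its presence is associated with which region the tweet mentions.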
Showcase 2: People analysis
● SPARQL query for retrieving mentions of known people: 53,355 records
Lines 12-13: tweet mentions Person
Line 10: Person is known from DBpedia, not an ML inference
Lines 14-16: retrieve date of birth from DBpedia
Line 17: data channel is Brexit
Showcase 2: People analysis
● Year-of-birth distribution of people mentioned in post-Brexit tweets
– Note the long tail on the left: historical figures?
– Peak around 1950: people aged ~65 now
– The peak around age 65 relates to the frequent discussions about the different voting preferences of older and younger people
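A distribution like this can be obtained by binning the retrieved dates of birth. The sketch below is one way to do it, assuming the dates arrive as ISO strings with a four-digit positive year and that each person is counted once. Whether the original chart counts distinct people or individual mentions is not stated, and the function name is illustrative.

```python
from collections import Counter


def decade_histogram(birth_dates):
    """Count people per decade of birth from ISO dates such as '1942-01-08'."""
    counts = Counter()
    for date in birth_dates:
        year = int(date[:4])            # assumes a four-digit positive year
        counts[year // 10 * 10] += 1    # e.g. 1942 -> 1940
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Dates of birth of five people named in this deck.
    sample = [
        "1874-11-30",  # Winston Churchill
        "1942-01-08",  # Stephen Hawking
        "1954-07-17",  # Angela Merkel
        "1956-10-01",  # Theresa May
        "1964-06-19",  # Boris Johnson
    ]
    print(decade_histogram(sample))
```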
Showcase 2: People analysis
● Historical figures:
– Sir Winston Churchill, Henry VIII, Adam Smith, Adolf Hitler, Sir Arthur Harris, Ralph Vaughan Williams, George Santayana, Richard III, Aldous Huxley, Isaac Newton
– Tweets reveal insightful analogies with the past:
● Contemporary figures, most mentioned. Stephen Hawking stands out from the crowd of politicians
– Theresa May (11,452), Boris Johnson (3,961), Nigel Farage (2,926), David Cameron (2,868), Andrea Leadsom (1,388), Angela Merkel (922), Jeremy Corbyn (820), Nicola Sturgeon (780), Stephen Hawking (746).
● Support / deny / question tweets mentioning these personalities:
– Surprisingly many question-like tweets mention Angela Merkel
– Tweets mentioning Stephen Hawking have a deny attitude
Showcase 2: People analysis
"Our attitude towards wealth played a crucial role in Brexit. We need a rethink" - Stephen Hawking https://t.co/IA0tr0l8Jm #Brexit #UK
Showcase 2: People analysis
● Comparatively few young people are mentioned:
– 183 people born after 1975 were mentioned
– At a quick glance, most of them are sportsmen (mostly football players) or actors, not activists for Brexit
– Most mentioned:
● Ruth Davidson (leader of the Scottish Conservative and Unionist Party)
● Will Straw (British policy researcher and Labour Party politician)
● Paul Nuttall (Deputy Leader of the UK Independence Party)
● Max Schrems (Austrian lawyer, author and privacy activist)
● Tim Stanley (English blogger, journalist and historian)
● Tulip Siddiq (British Labour Party and Co-operative Party politician)
● Julia Reda (German politician and activist)
● Tom Cotton (American politician who is the junior United States Senator from Arkansas)
● Chuka Umunna (British Labour politician)
● Laura Kuenssberg (British journalist, currently the political editor of BBC News)
● etc.
Confirms the previous result, that Scotland is the hottest topic.
Independent analyses of regions and people support and reinforce each other via LOD enrichment
Conclusions
● For microposts, the enrichment with LOD is extremely helpful
– It provides the context that is necessary for understanding opinion / trend, etc.
– It makes the computer “read like a human”, by recalling and relating to external common knowledge
● Reasoning about political regions, the age of the people mentioned and their functions is possible only with semantic enrichment
– We can only imagine that historians and journalists would greatly benefit from the collection of quotes and analogies with the past that we discovered
Awards
● For its contribution to Pheme, Ontotext is nominated for the Innovation Radar Prize 2016 (among 40 of the best EU-funded innovators)
Ontotext's analysis of Twitter before the Brexit vote
● Ontotext's Twitter analysis before the polls speculated that, at least based on Social Media trends, the Brits want out
References
Read the complete post-Brexit vote analysis here: Deliverable D4.1.2. LOD-based reasoning about rumors