An open source GATE toolkit for social media analysis Diana Maynard University of Sheffield, UK

Post on 10-May-2018


GATE for text engineering

• Tool for the development and deployment of Text Mining technology
• http://gate.ac.uk
• 20 years old (but not past it yet!)
• Established large developer community, incl. industrial committers (Ontotext, Intellius, Text Mining Solutions)
• Used worldwide by many organisations to build bespoke solutions, e.g. TNA, Press Association, BBC, NHS
• A free open source framework (LGPL) and GUI
• Includes Information Extraction tools in many languages
• Research team of 15 people
• Lots of tools also for social media analysis

GATE open-source social media tools

Name | Description/Domain | License
GATE | Numerous tools for text analytics, including social media | LGPL + plugin specific
TwitIE | NER for tweets | LGPL
GATE Cloud | Various cloud-based GATE services, including social media analytics | SaaS model
Sentiment analysis | Generic twitter-based sentiment analysis | LGPL
Tweet Collector | Social media analytics | SaaS model
YODIE (in development) | Cloud-based named entity disambiguation and linking; version for social media | SaaS model

GATE Annotation

NER in French

NER in Arabic

Environmental term recognition

Hashtag analysis

• We need to chop up the hashtags into individual words, and find any terms within them
• #palmoilhumanrights --> palm oil + human rights
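One simple way to chop a hashtag into words is dictionary-based segmentation. The sketch below is an illustration only, not the GATE implementation: the function names, the tiny vocabulary and the term list are all assumptions made for the example.

```python
def segment_hashtag(tag, vocabulary):
    """Split a hashtag into vocabulary words, preferring the fewest words.

    Returns None when the hashtag cannot be covered by the vocabulary.
    """
    text = tag.lstrip("#").lower()
    # best[i] holds the shortest word list covering text[:i], or None.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in vocabulary:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


def find_terms(words, terms):
    """Return every known term that appears as a contiguous run of words."""
    found = []
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            phrase = " ".join(words[start:end])
            if phrase in terms:
                found.append(phrase)
    return found


# Hypothetical vocabulary and term list, for illustration only.
VOCABULARY = {"palm", "oil", "human", "rights", "climate", "change"}
TERMS = {"palm oil", "human rights", "climate change"}

words = segment_hashtag("#palmoilhumanrights", VOCABULARY)
print(words, find_terms(words, TERMS))
```

A real system would need a much larger vocabulary and some way to rank competing segmentations; preferring the fewest words is just one cheap heuristic.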

GATE Cloud Tools

• You can try various demos of our tools on the GATE Cloud
• http://cloud.gate.ac.uk
• You can process small amounts of text for free
• For larger volumes, you can set up an account and pay for the time you use

Collect your own Tweets!

• Twitter provides APIs to download tweets in real time based on various filtering conditions
• They can be challenging to set up if you're not a programmer
• We have tools on the GATE Cloud to let you set up a query to collect tweets by particular people, matching certain keywords, geo-tagged with particular locations, and so on.
• You don't need to install any software!
• More details available on GATE Cloud pages

Exploring the collected data

• You can also get 6 reports on your data collected so far, updated in real time.

1. View the top hashtags as a bar chart or tag cloud, and add hashtags to the tracking list in the collector.
2. View a line graph of tweet frequency per hour.
3. View a bar chart or cloud of the top topics according to lists of terms. The lists can be edited from a link on the topics graph page. Topics can be added as terms to the tracker.
4. View a bar chart or cloud of the most mentioned users. These can also be added for the collector to track.
5. View a bar chart or cloud of the frequency of the terms you are tracking.
6. View a bar chart or cloud of the top words of one grammatical type (adjective, verb, noun etc.) (only for English)
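The first two reports boil down to simple counting. The sketch below shows the idea on plain Python dictionaries; the record layout (`text`, `created_at`) and the function names are assumptions for illustration and do not reflect how the GATE Cloud collector stores or reports its data.

```python
import re
from collections import Counter
from datetime import datetime

HASHTAG = re.compile(r"#\w+")


def top_hashtags(tweets, n=10):
    """Count hashtags case-insensitively and return the n most frequent."""
    counts = Counter(
        tag.lower() for tweet in tweets for tag in HASHTAG.findall(tweet["text"])
    )
    return counts.most_common(n)


def tweets_per_hour(tweets):
    """Return (hour, count) pairs in time order, one bucket per clock hour."""
    counts = Counter(
        tweet["created_at"].replace(minute=0, second=0, microsecond=0)
        for tweet in tweets
    )
    return sorted(counts.items())


# Made-up sample records, for illustration only.
sample = [
    {"text": "Vote! #Brexit #EUref", "created_at": datetime(2016, 6, 23, 9, 15)},
    {"text": "#brexit day", "created_at": datetime(2016, 6, 23, 9, 45)},
    {"text": "Polls closing #EUref #Brexit", "created_at": datetime(2016, 6, 23, 21, 5)},
]
print(top_hashtags(sample, n=2))
print(tweets_per_hour(sample))
```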

Demo

Analysing Polarised Societal Debates

Motivation

• Public, polarised debates affect policy making, the course of a country, voting intents
• Historically, the public debates took place on TV and involved only politicians or media figures
• Today, polarised debates also take place publicly on the internet, and anyone can participate
• Analysing these public statements tells us:
  • What participants are thinking
  • What arguments they are making
  • Who is thinking what
• Gain a "big picture" view of the response to events or policies.

The Story

• Analyse social media surrounding polarised societal debates
• Index the results to understand, visualise and explore:
  • What topics are being discussed?
  • What sentiments are being expressed?
  • Who is participating in the debate?
  • How do debates evolve over time?

Real-time Opinion Monitoring

Replies to Trump re: climate change

Climate change, ISIS and Trump

Topic relations

Dynamics Over Time/Location

• European Research Infrastructure for Big Data and Social Mining
• 7 European countries, more than 100 researchers
• Collections of datasets, creation of tools
• Trans-national access programme (come and visit us!)
• Research organised in vertical themes (exploratories) on top of SBD infrastructure
• Perform cross-disciplinary cutting-edge social media mining research
  • City of Citizens
  • Well-being & Economy
  • Societal Debates
  • Migration Studies

www.sobigdata.eu  @sobigdata

Photo credit: https://www.flickr.com/photos/armydre2008

Brexit analyser

[Pipeline diagram, shown on each of the following slides: Tweets, Tokenization, Normalization, POS tagging, NE recognition, Leave/remain classification, Topic Detection, Tweet Geolocation, User classification, Semantic Search]

Tweet Collection

Tweets are tracked in real time using the streaming API

Tokenisation

Individual tokens (analogous to terms) are extracted

Normalization

Spelling and abbreviations are normalised to help linguistic processing tools
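To make the tokenisation and normalisation steps concrete, here is a minimal sketch. It assumes a single regular expression for tweet-aware tokenising and a tiny hand-made abbreviation table; the pattern, the `ABBREVIATIONS` table and both function names are illustrative assumptions, not the TwitIE components.

```python
import re

# Keep URLs, @mentions and #hashtags whole; split everything else into
# words (allowing an internal apostrophe) and single punctuation marks.
TOKEN_PATTERN = re.compile(r"https?://\S+|[@#]\w+|\w+(?:'\w+)?|[^\w\s]")

# Hypothetical lookup table; a real normaliser would use a far larger resource.
ABBREVIATIONS = {
    "u": "you",
    "pls": "please",
    "ppl": "people",
    "2day": "today",
    "gr8": "great",
}


def tokenise(text):
    """Split tweet text into tokens."""
    return TOKEN_PATTERN.findall(text)


def normalise(tokens):
    """Replace known abbreviations; leave every other token unchanged."""
    return [ABBREVIATIONS.get(token.lower(), token) for token in tokens]


tokens = tokenise("pls vote 2day ppl! #Brexit @BBC http://gate.ac.uk")
print(normalise(tokens))
```

Normalising before POS tagging matters because taggers trained on edited text tend to mislabel forms such as "2day" or "ppl".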

POS tagging

Parts-of-speech are identified. This is necessary to support later processing

Named Entities

We discover mentions of entities such as people, locations, organisations and products

Voting Intent

Intent to vote leave or remain classified by #hashtag
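A hashtag-based intent classifier can be sketched in a few lines. The hashtag sets below are assumptions chosen for illustration, and the rule for tweets carrying hashtags from both sides (treat them as unknown) is also an assumption; the slides do not say which hashtags or tie-breaking rules the actual pipeline uses.

```python
import re

# Hypothetical hashtag lists, for illustration only.
LEAVE_TAGS = {"#voteleave", "#leaveeu", "#takecontrol"}
REMAIN_TAGS = {"#voteremain", "#strongerin", "#remain"}


def classify_intent(text):
    """Return 'leave', 'remain' or 'unknown' based on the hashtags in a tweet."""
    tags = {tag.lower() for tag in re.findall(r"#\w+", text)}
    has_leave = bool(tags & LEAVE_TAGS)
    has_remain = bool(tags & REMAIN_TAGS)
    if has_leave and not has_remain:
        return "leave"
    if has_remain and not has_leave:
        return "remain"
    # No intent hashtag, or hashtags from both sides.
    return "unknown"


print(classify_intent("Time to #TakeControl"))
```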

Topic Detection

Tweets are matched against a detailed list of topics
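Matching tweets against topic term lists can be sketched as whole-word lookup. The topics and terms below are made up for the example, and the matching is deliberately naive (lower-cased exact phrases); the real topic lists are far more detailed.

```python
import re

# Hypothetical topic lists, for illustration only.
TOPIC_TERMS = {
    "economy": ["economy", "trade deal", "gdp"],
    "immigration": ["immigration", "borders"],
    "nhs": ["nhs", "hospitals"],
}


def detect_topics(text, topic_terms=TOPIC_TERMS):
    """Return the topics whose terms occur as whole words or phrases in the text."""
    lowered = text.lower()
    found = []
    for topic, terms in topic_terms.items():
        if any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms):
            found.append(topic)
    return found


print(detect_topics("A new trade deal will not fix the NHS"))
```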

Geolocation

Tweets are linked to NUTS regions based on place tags and user home locations

User Classification

We classify users into e.g. journalist, charity, member of public

Semantic Search

Tweet text and annotations are indexed in the semantic search engine Mímir for search and visualisation

Semantic search with Mímir

• Mímir: Multiparadigm Indexing and Retrieval
• Complex queries: can search over annotations like
  {DocumentTimestamp hour_timestamp >= 2016062223 hour_timestamp < 2016062323} OVER ( {DocumentKind tweet_kind = original} AND {NUTS2 NUTS2 = "UKE3"})
• Can also mix in full text and semantic queries. Very powerful!
• Which politicians from the North of England over the age of 40 talked most positively about climate change?
• Allows us to drill down and easily see tweets with certain properties
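The `hour_timestamp` values in the example query appear to be hours written as YYYYMMDDHH. Under that assumption, the sketch below assembles the same query text for an arbitrary time window and region. It only builds a string following the pattern shown on the slide; the helper names are hypothetical and nothing here talks to a Mímir server.

```python
from datetime import datetime, timedelta


def hour_stamp(moment):
    """Format a datetime as YYYYMMDDHH (assumed hour_timestamp layout)."""
    return moment.strftime("%Y%m%d%H")


def build_query(start, hours, nuts2_code):
    """Build a query for original tweets in one region within a time window."""
    end = start + timedelta(hours=hours)
    return (
        "{DocumentTimestamp hour_timestamp >= %s hour_timestamp < %s} "
        'OVER ( {DocumentKind tweet_kind = original} AND {NUTS2 NUTS2 = "%s"})'
        % (hour_stamp(start), hour_stamp(end), nuts2_code)
    )


# 24 hours starting at 23:00 on 22 June 2016, region code taken from the slide.
print(build_query(datetime(2016, 6, 22, 23), 24, "UKE3"))
```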

Voting intention per topic

Voting intention over time

Ego-network analysis

• Analysis using previous work on selected Brexit data
  • Valerio Arnaboldi
  • Institute of Informatics and Telematics (IIT)
  • Italian National Research Council (CNR)
• Understanding the quantity and quality of relationships of debate participants
• Remain, leave and neutral users were selected by Sheffield using their pipeline and processed by CNR-IIT
• Nice example of bringing several systems together

Remainers are more social

• They maintain larger ego networks
• Larger active network size
  • number of people actively contacted by the egos
  • with a frequency of at least one direct tweet per year
  • considering mentions, replies, and retweets
• Effect remains even after filtering out antisocial accounts

Active network size in users
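The active network size defined above can be computed directly from a list of interactions. The sketch below assumes each interaction is a `(source, target, kind)` tuple and that the observation period in years is known; the data layout and function name are illustrative assumptions, not the CNR-IIT code.

```python
from collections import Counter

# Interaction kinds counted as direct contact, as listed on the slide.
DIRECT_KINDS = {"mention", "reply", "retweet"}


def active_network_size(interactions, ego, years_observed, min_per_year=1.0):
    """Count the users an ego contacted at least min_per_year times per year."""
    contacts = Counter(
        target
        for source, target, kind in interactions
        if source == ego and target != ego and kind in DIRECT_KINDS
    )
    return sum(
        1 for count in contacts.values() if count / years_observed >= min_per_year
    )


# Made-up interactions, for illustration only.
interactions = [
    ("alice", "bob", "reply"),
    ("alice", "bob", "mention"),
    ("alice", "carol", "retweet"),
    ("alice", "dave", "like"),   # not a direct contact kind, so ignored
    ("bob", "alice", "reply"),   # incoming, so ignored for alice's ego network
]
print(active_network_size(interactions, "alice", years_observed=2))
```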

Leavers had fewer social circles

• Social circle:
  • Group of users in an ego-network
  • With similar levels of interaction to one another
• Most frequent interactions with a smaller social circle
• More occasional interactions with a larger circle

Average number of social circles

Many "leave" accounts didn't socialise at all

• A lot of accounts were not used socially
• This is reflected in the number of accounts with social circles
• The leave twitterers were well below the random sample.

% of accounts with at least 1 social circle

Brexit Demo

• http://demos.gate.ac.uk/sobigdata/brexit/
• More cool visualisations (work in progress)

GATE social media analysis toolkit: analysing the UK elections

Mímir query

• Dataset: every tweet by MP/Candidate/Party, plus all replies/retweets
• Find all tweets where a Conservative MP talked about the economy

Parties/themes co-occurrence

[Figure: co-occurrence of parties and themes.
Themes: UK economy; Europe; Tax and revenue; NHS; Borders and Immigration; Scotland; Employment; Community and society; Public health; Media and communications.
Party groups: Labour Party Candidate; Labour Party MP; Conservative Party Candidate; Conservative Party MP; UKIP Candidate; Other MP; SNP Candidate; Green Party Candidate; Liberal Democrats Candidate; SNP; Other.]

Semantic querying via DBpedia

• Querying for information not explicit in the text
• "Find all topics in tweets by the MP from Sheffield Hallam"
• We don't need to know who this MP is
• If the Sheffield Hallam MP changed at some point, this would be captured

Analysis of the Earth Hour campaign

• Analysis of hashtags and topics mentioned
• The main activities and themes of the campaign drove most of the social media conversations
• Users engaged in the campaign but did not necessarily engage with climate change and sustainability issues.
• Lack of correlation between Durex campaign and climate change engagement

Behaviour Analysis

• Based on the assumption that users in different behavioural stages communicate differently (different emotions, directives, etc.)

Pajarito @lindopajarito · 2h
Our building needs 40% of all energy consumed in Switzerland! :(
Desirability: Negative sentiment (expressing personal frustration - anger/sadness)

DJ Pajarito @DJPajaritoGenial · 12h
I'm so proud when I remember to save energy and I know however small it's helping.
Buzz: Positive sentiment (happiness/joy). I/we + present tense

Hotel Pajarito @HotelPajarito · 18h
Join us today to switch off a light for EH! :)
Invitation: Positive sentiment (happy) + use of vocatives
You are an organisation, not an individual!

What's next?

• A number of tools have already been brought together around Brexit and other debates
• More tools still being developed
• Tools available through the SoBigData VRE and GATE Cloud
• Focus on analysis rather than predicting results
• 2017 UK elections: imminent results coming this week on party manifestos!
• Other current work analysing hate speech around Brexit, and correlating social media and news media discussions

What are Labour talking about?

1. Economy
2. Business
3. Health
4. Employment
5. Schools

What are Lib Dems talking about?

1. Economy
2. Health
3. Schools
4. Business
5. Europe

What are Plaid Cymru talking about?

1. Wales
2. UK economy
3. Business
4. Environment
5. Health

What are Conservatives talking about?

1. Economy
2. Health
3. Business
4. Schools
5. Europe

Links for more info

• GATE at http://gate.ac.uk
• GATE Cloud at https://cloud.gate.ac.uk
• Come on a GATE training course in June!
• Brexit visualisations at http://demos.gate.ac.uk/sobigdata/brexit/
• Brexit study blog post from NESTA at http://www.nesta.org.uk/blog/network-analysis-top-eu-referendum-tweeters
• Brexit study blog posts from Sheffield at http://gate4ugc.blogspot.co.uk/search/label/Brexit
• UK elections monitor at http://gate.ac.uk/projects/pft

Publications

• Diana Maynard, Ian Roberts, Mark A. Greenwood, Dominic Rout and Kalina Bontcheva. A Framework for Real-time Semantic Social Media Analysis. Web Semantics: Science, Services and Agents on the World Wide Web, 2017.
• K. Bontcheva, L. Derczynski, A. Funk, M. A. Greenwood, D. Maynard, N. Aswani. TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2013).
• D. Maynard and K. Bontcheva. Understanding climate change tweets: an open source toolkit for social media analysis. In Proc. of EnviroInfo 2015, Copenhagen, Sep. 2015.
• Diana Maynard, Kalina Bontcheva, Isabelle Augenstein. Natural Language Processing for the Semantic Web. Morgan and Claypool, December 2016. ISBN: 9781627059091 (contains a chapter on social media analysis)
• Available (and more) at https://gate.ac.uk/gate/doc/papers.html

Acknowledgements

This work is supported by:

• the European Union/EU under the Information and Communication Technologies (ICT) theme of the 7th Framework and H2020 Programmes for R&D
  • KNOWMAK (726992) https://gate.ac.uk/projects/knowmak/
  • DecarboNet (610829) http://www.decarbonet.eu
  • Pheme (611233) http://www.pheme.eu
  • SoBigData (654024) http://www.sobigdata.eu
  • COMRADES (687847) http://www.comrades-project.eu
  • OpenMinted (654021) http://openminted.eu
  • Kconnect (644753) http://www.kconnect.eu/
• Nesta http://nesta.org.uk