profiling personality of social media...

41
Profiling Personality of Social Media Authors Walter Daelemans CLiPS Computational Linguistics Group [email protected] SLSP 2016, Pilsen October 11, 2016

Upload: others

Post on 21-Apr-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

ProfilingPersonalityofSocialMediaAuthors

WalterDaelemansCLiPS ComputationalLinguisticsGroup

[email protected],PilsenOctober11,2016

Page 2: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

www.analyzewords.com (Pennebaker)Seealso:https://personality-insights-livedemo.mybluemix.net/ (Watson)

Page 3: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Figure6.Words,phrases,andtopicsmostdistinguishingextraversionfromintroversionandneuroticismfromemotionalstability.

SchwartzHA,EichstaedtJC,KernML,DziurzynskiL,RamonesSM,etal.(2013)Personality,Gender,andAgeintheLanguageof SocialMedia:TheOpen-VocabularyApproach.PLoSONE8(9):e73791.doi:10.1371/journal.pone.0073791http://journals.plos.org/plosone/article?id=info:doi/10.1371/journal.pone.0073791

Page 4: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Whatwedon'twanttodo

• Horoscopes– E.g.theprofilingofPennebaker /IBMpersonality

• Weneedgoldstandarddata

• Content-basedassignment– E.g.emotionallystableAmericanpeopletalkmoreabout“sports”,“vacation”,“beach”,“church”,or“team”

• It’snotstyleandcanbefakedeasily

Page 5: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Contents

• Profiling andComputationalStylometry– MethodsandIssues– SocialMediaLanguage

• PersonalityProfilinginSocialMedia– Applications– StateoftheArt

• PAN• CGIandTwistycorpora

Page 6: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

LanguageVariationIndividual Variation

Bayes rule (1763)

WhatisProfiling?

Spelling,punctuation,lexical choice,sentence structure,themes,tone,discoursestructure,…

Sociolinguistics

Profiling

P 𝑖 𝑙 = % 𝑙 𝑖 %(')%())

Page 7: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

SocialMedialanguage

• Everybody “writes”allthetimenow,notonlyprofessionalwriters

• Hugeincreaseinsubjectivelanguage(opinions)asopposedtoobjective,factuallanguage

• Opensharingofprivateinformation(metadata)– Magnificentsourceofdataforcomputationalstylometry!

Page 8: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

ComputationalSociolinguistics!

Non-standardlanguageuseinchat

Page 9: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

ComputationalStylometry

• Writingstyle:Acombination ofinvariantandunconsciousdecisionsinlanguageproductionatalllinguisticlevels(discourse,syntacticstructures,lexicalchoice,…)associated withspecificauthorsorauthortraits:– Age,gender,educationlevel,nativelanguage,personality,emotionalstate,mentalhealth,region,deception,ideology,politicalconviction,religiousbeliefs,sexualpreferences,…

Page 10: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Basicresearchquestions• Doesanidiolect/stylome existand(how)canit

bemeasured?• Canwehandlegenre,register&topic

interference?• Canwehandlewithin-profileinterference?• Isdetectionrobust?(adversarialstylometry)• Canwehandledynamicaspects?(language

changeovertimewithage,illness,languageinput,context,…)

• Canweexplainwhy/howtrainedmodelswork?

Page 11: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Method:TextCategorization

Document

SyntacticFunctionwordpatternsPartofspeechn-gramsSyntacticstructures

Lexicallemman-gramsreadabilityfeatureslexicalrichness

Semanticwordsensepatternssemanticdictionariesdistributedvectorswordembeddingssemanticrolepatterns

Discourserhetoricalstructuresdiscoursemarkerpatterns

FeatureconstructionDocumentRepresentation

Superficialtokenn-gramscharactern-gramspunctuationpatterns

Inputfromlinguistictheory

Page 12: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Method:TextCategorization

DocumentRepresentation

SyntacticFunctionwordpatternsPartofspeechn-gramsSyntacticstructures

Lexicallemman-gramsreadabilityfeatureslexicalrichness

Semanticwordsensepatternssemanticdictionariesdistributedvectorswordembeddingssemanticrolepatterns

Discourserhetoricalstructuresdiscoursemarkerpatterns

FeatureSelection

FeatureVector

Superficialtokenn-gramscharactern-gramspunctuationpatterns

Feedbacktolinguistictheory

Page 13: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Method:TextCategorization

• Objectiveevaluationonthebasisof“goldstandarddata”andevaluationmetrics

FeatureVectors classes

ModelNewDocuments Predictedclasses

Feedbacktolinguistictheory

SupervisedMachineLearning

Page 14: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

ExplanationinStylometry

• Quantitativeevaluationsandlistsofimportantfeaturesdon’ttellusalotabouthowitworks

• Whycanage,gender,personality,authorshipetc.bedeterminedwithaparticularsetoflinguisticfeatures?

• Landmark:genderinfictionandnon-fiction– Koppel&Argamon etal.2002

Page 15: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

ExplanationinStylometry

Relational Informative

Verbs Nouns

Subjects Modifiers

DeterminersPronouns Prepositions

Gender

Female Male

Page 16: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Personality• Setofmentaltraits(types)that

– Explainandpredictpatternsofthought,feelingandbehavior

– Remainrelativelystableovertimeandcontext• Sourceofvariationbetweenpeoplethat

– Influencessuccessinrelations,jobs,studies,…– Explains35%ofvarianceinlifesatisfaction

• Compare:income(4%),employment(4%),maritalstatus(1%to4%)

• Detectioninteractswiththatofdeception,opinion,emotion,…

Page 17: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

ApplicationsofPersonalityDetection

• Targetedadvertising• Adaptiveinterfacesandrobotbehavior

– music,style,tone,color,…• Psychologicaldiagnosis,forensics

– psychopaths,massmurderers,bullies,victims– depression,suicidaltendencies

• Humanresourcesmanagement• ResearchinLiteraryscience,socialpsychology,sociolinguistics,…

Page 18: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Example:HumanResources• Situation:thousandsofapplicantsfor1job• Method

– Collectwrittenanswerstoquestionsandtranscriptsofaudiomessages

– Deeplearningofwordembeddings onbackgroundtext

– Deeplearningof“applicantembeddings”oncollectedtext

– Similarlyfor“anchors”(modelanswers,goodcandidates)

– Nearestneighborranking

Page 19: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

BackgroundCorpus

Applicantanswersandtranscripts

ApplicantEmbeddingMap

Page 20: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Datacollection• Source

– Socialmedia• Twitter,Facebookstatuses

– Essays• Method

– Self-reporting• questionnaire

– Expertannotation(consensus)– Distantsupervision

• Patternsextractedbyexpertsfromquestionnairequestions(Neuman &Cohen)orfromcorrelationstudies(Celli)

• Classsystem(ClassificationofRegression)– MBTI(Myers-BriggsTypeIndicator)– FFM(FiveFactorModel,BigFive)

Page 21: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

BigFive(FFM)• Extraversion(vs.Introversion)

– Sociable,active,energetic,positive• Neuroticism

– Sad,anxious,tense• Agreeableness

– Altruism,trust,modesty,sympathy• Conscientiousness

– Goal-oriented,organized,incontrol• Openness

– Originality,breadthanddepthofmentallifeandexperience

Page 22: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Meyers-Briggspreferences

• Introversion&Extraversion• iNtuition &Sensing• Feeling&Thinking• Judging&Perceiving

• Leadsto16types:ENTJ(1.8%)…ESFJ(12.3%)• Validityandreliabilityhavebeenquestioned

Page 23: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

6 ESFP4 ISFP4 INFP4 ESTJ3 INTP (architect)1 ESTP (promoter)1 ENTP (inventor)0 ISTP (crafter)

Atypicalsampleofourstudents

28 ESFJ (provider)23 ENFJ (teacher)16 ISFJ (protector)15 INTJ (mastermind)15 INFJ9 ENFP8 ISTJ8 ENTJ

Page 24: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

FeatureCorrelations• Extraverts (as opposed to introverts)

– Produce more language (verbosity)– Smaller vocabulary (TTR), fewer hapaxes– Fewer negative emotion words – More positive emotion words– Fewer hedges (confidence)– More agreement and compliments– Fewer negation and causation– Less concrete– Less formal, more contextualised / relational

• More pronouns, verbs, adverbs, interjections• More present tense• Fewer numbers, less quantification• Fewer negation and causation words

Page 25: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

FeatureCorrelations• Neurotics

– More“I”– Morenegativeemotionwords,fewerpositiveemotionwords– Moreconcreteandfrequentwords

• Agreeable– Morepositiveemotionwords,fewernegativeemotionwords– Fewerdeterminers

• Conscientious– Fewernegations– Fewernegativeemotionwords– Fewerhedges(?)

• Openness– Longerwords– Morehedges– Fewer“I”andpresenttense

Page 26: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

PAN2015

• Personalityaspartofprofiling(alsoageandgender)

• Twitter• 5numericFFMvalues

– Regression(oneclassifierforeachfactor)– Evaluation:RMSE

• English(194users),Spanish(140),Italian(50),Dutch(44)

• 22teamsOverviewofthe3rdAuthorProfilingTaskatPAN2015FRangel,PRosso,MPotthast,BStein,WDaelemans - CLEF,2015http://pan.webis.de/clef15/pan15-web/author-profiling.html

Page 27: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

DocumentRepresentation• Preprocessing

– RemoveHTMLcodes,hashtags,URLs,mentions,…• Features

– charactern-grams– wordn-grams– tf-idf n-grams– syntacticn-grams(lemma,pos,relations,…)– stylisticfeatures

• punctuation,case• emoticons• word,sentencelength,verbosity• characterflooding

– topicmodeling(LSA)– familyvocabulary,LIWC,MRC– frequentterms,discriminativewords,Nes,othervocabularies...– secondorderfeatures(relationshipsamongterms,documents,profiles)

Page 28: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

DocumentRepresentation• Preprocessing

– RemoveHTMLcodes,hashtags,URLs,mentions,…• Features

– charactern-grams– wordn-grams– tf-idf n-grams– syntacticn-grams(lemma,pos,relations,…)– stylisticfeatures

• punctuation,case• emoticons• word,sentencelength,verbosity• characterflooding

– topicmodeling(LSA)– familyvocabulary,LIWC,MRC– frequentterms,discriminativewords,Nes,othervocabularies...– secondorderfeatures(relationshipsamongterms,documents,profiles)

Álvarez-Carmonaetal.(2015)

Page 29: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

MachineLearningMethod

• SVMs• DecisionTrees(RandomForests)• Bagging• LinearDiscriminantAnalysis• StochasticGradientDescent• Linear,Logistic,RidgeforRegression• Distance-basedapproaches

Page 30: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

BestResults

O C E A NEnglish .12 .11 .14 .13 .20Spanish .11 .10 .13 .10 .16Italian .10 .11 .07 .05 .16Dutch .04 .06 .08 .00 .06

Page 31: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Problem

• Samefeaturesareinformativefordifferentauthoraspects(e.g.personality &gender)– Women~ extraverts

• positiveandnegativeemotionwords,pronounuse,…– Jointlearning,ensemblemethods,…

• Maincauseofoverfitting• Largereferencecorpusneededofauthorswithvarioustraitsofinterest– Samplestratification

• Example:CLiPS Stylometry Investigation(CSI)corpus

Page 32: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

CLiPSCSICorpus(withBenVerhoeven)

• http://www.clips.uantwerpen.be/datasets/csi-corpus• Corpuswithtwogenres:essaysandreviews(truthfulanddeceptive)– Dutchnativespeakers,studentsoflanguageandliterature

• Meta-data– Age,gender,sexualorientation,regionoforigin,personalityprofile(BigFive&MBTI)

• Yearlyexpansion– Currently:1800documents;660authors;770,000words

• SuccessorofPersonaeCorpus(2008,withKimLuyckx)

Page 33: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

MiningpersonalitydatafromTwitter(withVerhoeven andPlank)

• Peopletweetabouttheirownpersonality

Page 34: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

TwiSty Corpusapproach• SearchuserswithatweetmentioningtheirMBTIpersonalitytype(keyword-based)+gender– manualchecking

• Fetchrecenttweetsfromthoseusers(~2000average)– applylanguageidentificationforselectingcleanlanguagesample

• ES,PT,FR,NL,IT,DE• Sensationalismbias?• TwiSty isavailablefromhttp://www.clips.uantwerpen.be/datasets

Page 35: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Somecorpusproperties

0 20 40 60 80 100

ESPT

FRNL

ITDE

IversusE

0 20 40 60 80 100

ESPT

FRNL

ITDE

T versusF

Page 36: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

PredictionExperiment

• LinearSVC,wordandcharactern-grams

Page 37: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant
Page 38: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Cognitive-biologicalApproach(Yair Neuman)

• “Personality”categorizationoriginatesfrom– Threat/Trustmanagement– Riskassessmentprocess

• Fight,flight,avoid

– (+complexityofabstractionandinferenceuniquetohumans)

• inthecontextofInterpersonalrelations

Page 39: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Conclusions

• Computationalpersonality– Promisingfortrend-basedapplicationsbutnotgoodenoughyetforindividualprofiling

– Textcategorizationmodelisnotsufficient• Hardtoobtainsufficientreliableannotateddata• Unsupervisedmodels?Clevermining?• threat- trustmodel?

– Computationaltheoryofwritingstyleneedsalarge,preferablymultilingual,butinanycasebalancedcorpus

Page 40: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Stylometry researchers@CLiPS:

Page 41: Profiling Personality of Social Media Authorsgrammars.grlmc.com/SLSP2016/Download/slides/pilsen... · – Deep learning of word embeddingson background text – Deep learning of “applicant

Questions?