Automated Personality Classification. A. KARTELJ and V. FILIPOVIC, School of Mathematics, University of Belgrade, Serbia, and V. MILUTINOVIC, School of Electrical Engineering, University of Belgrade, Serbia

Post on 27-Dec-2015

TRANSCRIPT

  • Slide 1
  • Automated Personality Classification. A. KARTELJ and V. FILIPOVIC, School of Mathematics, University of Belgrade, Serbia, and V. MILUTINOVIC, School of Electrical Engineering, University of Belgrade, Serbia
  • Slide 2
  • Agenda (MULTI 2012, 3.10.2012)
      • Problem overview
      • Classification of the existing solutions
      • Presentation of the existing solutions
      • Comparison of the solutions
      • Work in progress: Bayesian Structure Learning for the APC
      • Future work: Video Based APC
      • Conclusions
  • Slide 3
  • Problem Overview
  • Slide 4
  • The Big 5 Model
  • Slide 5
  • The Steps in Our Research
      1. Survey paper (under review at ACM CSUR)
      2. Research paper: A new APC model based on Bayesian structure learning (in progress)
      3. Real-purpose application of the APC model from step 2
      4. Go to step 3
  • Slide 6
  • Elements of APC
      • Corpus: essay, weblog, email, news group, Twitter counts...
      • Personality measurement: questionnaire (internet and written). We are searching for an alternative!
      • Model: stylistic analysis, linguistic features, machine learning techniques
  • Slide 7
  • Applications
  • Slide 8
  • Mining People's Characteristics
  • Slide 9
  • Classification of Solutions
      • The C1 criterion separates solutions by type of conversation (1 = self-reflexive, N = continuous)
      • The C2 criterion separates solutions by approach (TD = top-down, DD = data-driven, or HY = hybrid)
  • Slide 10
  • Linguistic Styles: Language Use as an Individual Difference, Pennebaker and King [1999]
  • Slide 11
  • LIWC and MRC Features
      Feature | Type | Example
      Anger words | LIWC | Hate, kill
      Metaphysical issues | LIWC | God, heaven, coffin
      Physical state / function | LIWC | Ache, breast, sleep
      Inclusive words | LIWC | With, and, include
      Social processes | LIWC | Talk, us, friend
      Family members | LIWC | Mom, brother, cousin
      Past tense verbs | LIWC | Walked, were, had
      References to friends | LIWC | Pal, buddy, coworker
      Imagery of words | MRC | Low: future, peace; High: table, car
      Syllables per word | MRC | Low: a; High: uncompromisingly
      Concreteness | MRC | Low: patience, candor; High: ship
      Frequency of use | MRC | Low: duly, nudity; High: he, the
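A minimal sketch of how category features of this kind could be computed from a text. The lexicon below is hypothetical and holds only a few example words from the slide; it is not the LIWC dictionary, and the function name `category_rates` is an assumption made for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon built only from example words on the slide;
# the real LIWC dictionary is far larger and is not reproduced here.
CATEGORIES = {
    "anger": {"hate", "kill"},
    "social": {"talk", "us", "friend"},
    "family": {"mom", "brother", "cousin"},
    "inclusive": {"with", "and", "include"},
}


def category_rates(text):
    """Return, per category, the share of tokens that belong to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in CATEGORIES.items():
            if token in words:
                counts[name] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {name: counts[name] / total for name in CATEGORIES}
```

Calling `category_rates("I talk with my mom and my brother")` yields one rate per category, which could then serve as a feature vector for a classifier.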
  • Slide 12
  • What Are They Blogging About? Personality, Topic and Motivation in Blogs, Gill et al. [2009]
  • Slide 13
  • Taking Care of the Linguistic Features of Extraversion, Gill and Oberlander [2002]
  • Slide 14
  • Personality Based Latent Friendship Mining, Wang et al. [2009]
  • Slide 15
  • A Comparative Evaluation of Personality Estimation Algorithms for the TWIN Recommender System, Roshchina et al. [2011]
  • Slide 16
  • Predicting Personality with Social Media, Golbeck et al. [2011]
  • Slide 17
  • Our Twitter Profiles, Our Selves: Predicting Personality with Twitter, Quercia et al. [2011]
  • Slide 18
  • Comparison of the solutions
      Paper | Input | Corpus | Features | Algorithm | Soft. | Cit. | I | S | A | R
      [Pennebaker and King 1999] | text | essays | LIWC | correlations | n/a | 455 | H | H | H | M
      [Mairesse et al. 2007] | text, speech | essays | LIWC, MRC | C4.5, NB, SMO, M5 | Weka | 99 | M | M | H | M
      [Gill et al. 2009] | text | weblogs (14.8words) | LIWC | linear regression | n/a | 26 | H | H | M | M
      [Yarkoni 2010] | text | weblogs (100K words) | LIWC | correlations | n/a | 21 | H | M | M | M
      [Gill and Oberlander 2002] | text | emails (105 students) | bigrams | bigram analysis | n/a | 49 | L | M | M | L
      [Nowson et al. 2005] | text | weblogs (410K words) | word list | correlations | n/a | 48 | L | H | H | L
      [Oberlander 2006] | text | weblogs (410K words) | N-grams | NB, SMO | Weka | 53 | H | M | H | M
      [Wang et al. 2009] | text | weblogs (200 pairs) | lexical freq., TFIDF | logistic regression | Minitab | 1 | H | M | M | M
      [Iacobelli et al. 2011] | text | weblogs (3000) | LIWC, bigrams, ... | SVM, SMO, NB, ... | Weka | 1 | H | H | M | H
      [Argamon et al. 2005] | text | essays | word list, conj. | SMO | Weka | 38 | H | M | M | M
      [Argamon et al. 2007] | text | essays | word list, conj. | SMO | Weka, ATMan | 45 | H | M | M | M
      [Mairesse and Walker 2006] | text, conv. | extracts, 96 persons (100K words) | LIWC, MRC, utterance | RankBoost | n/a | 22 | M | M | H | M
      [Rigby and Hassan 2007] | text | mail. lists (140K emails) | LIWC | C4.5 | Weka, SPSS | 30 | M | H | M | L
      [Roshchina et al. 2011] | text | TripAdvisor reviews | LIWC, MRC | Linear, M5, SVM | Weka | 2 | H | M | L | M
      [Quercia et al. 2011] | meta | 335 Twitter users | Twitter counts | M5 rules | Weka | 5 | M | H | M | M
      [Golbeck et al. 2011] | text, meta | 279 FB users | 5 classes (161 in total) | M5 rules, Gaussian processes | Weka | 12 | H | M | M | M
      [Celli 2012] | text | 1065 posts | 22 ling. features | majority-based classification | n/a | 1 | M | M | M | M
  • Slide 19
  • Naive Bayes Classifier
  • Slide 20
  • Naive Bayes and Bayesian Network
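For reference, the two models named on this slide differ in how they factorize the joint distribution. The symbols are assumptions introduced here, not taken from the slides: c is a personality class, x_1, ..., x_n are text features, and pa(v_j) is the parent set of node v_j in a directed acyclic graph.

```latex
% Naive Bayes: every feature depends only on the class.
P(c, x_1, \dots, x_n) = P(c) \prod_{i=1}^{n} P(x_i \mid c)

% General Bayesian network: each node depends on its parents in the DAG.
P(v_1, \dots, v_m) = \prod_{j=1}^{m} P(v_j \mid \mathrm{pa}(v_j))
```

Naive Bayes is the special case in which the class has no parents and the class is the only parent of every feature.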
  • Slide 21
  • Bayesian Network for the APC
  • Slide 22
  • Bayesian Network Structure Learning
      1. Obtain corpus (training set T)
      2. Fit T to an appropriate network structure by:
         a) ILP formulation + solver (CPLEX, Gurobi) on smaller instances
         b) Applying a metaheuristic on larger instances
      3. Validate quality of the metaheuristic approach
      4. Compare obtained APC accuracy with other approaches
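Step 2 above can be illustrated with a score-based search. The sketch below is not the authors' ILP formulation or their metaheuristic: it swaps in plain greedy hill climbing over edge additions with a BIC score, assuming fully observed discrete data. All function names (`family_bic`, `creates_cycle`, `hill_climb`) are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def family_bic(data, child, parents):
    """BIC contribution of one node given a candidate parent set."""
    n = len(data)
    parents = sorted(parents)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    # Free parameters: (child states - 1) per observed parent configuration.
    child_states = len({row[child] for row in data})
    n_params = (child_states - 1) * len(parent_counts)
    return loglik - 0.5 * math.log(n) * n_params


def creates_cycle(parents, src, dst):
    """True if adding src -> dst would close a directed cycle."""
    # That happens exactly when dst is already an ancestor of src.
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def hill_climb(data, n_vars):
    """Greedily add the edge with the best BIC gain until none improves."""
    parents = {v: set() for v in range(n_vars)}
    score = {v: family_bic(data, v, parents[v]) for v in range(n_vars)}
    while True:
        best_gain, best_edge = 1e-9, None
        for src, dst in permutations(range(n_vars), 2):
            if src in parents[dst] or creates_cycle(parents, src, dst):
                continue
            gain = family_bic(data, dst, parents[dst] | {src}) - score[dst]
            if gain > best_gain:
                best_gain, best_edge = gain, (src, dst)
        if best_edge is None:
            return parents  # maps each node to its learned parent set
        src, dst = best_edge
        parents[dst].add(src)
        score[dst] += best_gain
```

On toy data where the second variable copies the first and the third is independent, for example `[(a, a, c) for a in (0, 1) for c in (0, 1)] * 25`, the search links the first two variables and leaves the third isolated. A practical search would also consider edge removals and reversals, and would restart to escape local optima.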
  • Slide 23
  • Other Ideas
      • Games with a purpose (GWAP)
      • Clustering personality characteristics
  • Slide 24
  • Packing everything together: Video Based APC
  • Slide 25
  • Conclusions
      • Classification of the existing solutions (survey paper)
      • Filling the gaps inside the classification tree
      • Introducing Bayesian Structure Learning for the APC
      • Utilizing metaheuristics in dealing with high dimensionality
      • APC potential: social networks, recommender, and expert systems
  • Slide 26
  • THANK YOU!
      • Aleksandar Kartelj, [email protected] (matf.bg.ac.rs)
      • Vladimir Filipovic, [email protected] (matf.bg.ac.rs)
      • Veljko Milutinovic, [email protected] (etf.bg.ac.rs)