PREDICTING USER DEMOGRAPHICS IN SOCIAL NETWORKS: APPLICATIONS IN MARKETING
NIKOLAOS ALETRAS nikos.aletras@gmail.com
INTRODUCTION
THE BIG PICTURE
▸ World Population (2016 estimate [1]): 7.4B
▸ Internet Users (2016 estimate [2]): 3.6B
▸ Social Media Active Users (2016 estimate): 2.3B
[1] http://www.prb.org/pdf16/prb-wpds2016-web-2016.pdf
[2] http://www.internetworldstats.com/stats.htm
SOCIAL MEDIA AND BUSINESS
▸ Social networks' earnings from advertising: billions of $
▸ Big growth in social media marketing campaigns
▸ 90% of retail brands use 2 or more social media networks
▸ 96% of people discussing brands online do NOT follow brands’ profiles
SOCIAL NETWORKS
USER GENERATED CONTENT
▸ Social media status updates (mainly text)
▸ Photos
▸ Videos
▸ Check-ins (location)
▸ Search queries
▸ Product/Service/Business reviews
USER DEMOGRAPHICS
▸ Groups of people with different characteristics:
▸ Age
▸ Gender
▸ Location
▸ Occupation
▸ Income
▸ Socioeconomic class
INFERRING USER DEMOGRAPHICS
▸ Define a predictive task
▸ Given a user's data, predict one of her attributes
▸ Data
▸ Collect annotations of pairs of users and the desired demographic to be used for training.
▸ Features
▸ Extract features from the available data of each user, e.g. the number of tweets posted, the number of times the word “splendid” is used, etc.
▸ Train Machine Learning models for classification using the annotated data.
▸ Test the models on unseen data.
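The feature-extraction step above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name `extract_features`, the crude tokeniser and the sample tweets are all assumptions made for the example.

```python
import re


def extract_features(tweets, word="splendid"):
    """Turn one user's tweets into a small feature vector.

    Returns [number of tweets, number of times `word` is used].
    """
    n_tweets = len(tweets)
    # Lowercase each tweet and keep runs of letters/apostrophes as tokens.
    tokens = [t for tweet in tweets for t in re.findall(r"[a-z']+", tweet.lower())]
    n_word = tokens.count(word)
    return [n_tweets, n_word]


# Hypothetical user with three tweets.
example_tweets = [
    "What a splendid morning!",
    "Off to work.",
    "Splendid news, simply splendid.",
]
features = extract_features(example_tweets)
print(features)
```

Each user ends up as one such vector; a real system would use many more features than these two.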
APPLICATIONS IN TARGETED ADVERTISING
▸ More intelligent ad or product recommender systems.
▸ Target people with specific characteristics
▸ Promotion of public policies.
▸ Vaccination campaigns
▸ Targeted online political campaigns.
▸ Trump vs Clinton
IN GENERAL
▸ Enable large scale studies in social sciences
▸ analyse human behaviour on a large scale
▸ computational social science
▸ Tackle real world problems
▸ education
▸ health intervention/surveillance
▸ economic development
▸ Integration into other predictive tasks
▸ voting intention
▸ sentiment analysis
▸ health (e.g. infectious disease outbreak prediction)
FOCUS ON SOCIOECONOMIC ATTRIBUTES
▸ Social status influences language use (Bernstein, 1960; Bernstein, 2003; Labov, 2006)
▸ Hypothesis
▸ Language use in Twitter can be indicative of user demographics.
▸ User attributes:
▸ Occupational class
▸ Income
▸ Socioeconomic class
DATA
HOW TO COLLECT DATA
▸ Data
▸ Collect annotations of pairs of users and the desired demographic to be used for training.
▸ Why?
▸ To train and evaluate!
▸ But how do we map Twitter users to their socioeconomic characteristics?
SOC TAXONOMY
‣ Standard Occupational Classification (SOC):
‣ Taxonomy of jobs, grouped by skill requirements
‣ Developed by the UK Office for National Statistics
‣ C1 Corporate Managers and Directors —> chief executive, bank manager
‣ C2 Professional Occupations —> mechanical engineer, pediatrist, research scientist
‣ C3 Associate Professional and Technical Occupations —> system administrator, dispensing optician
‣ C4 Administrative and Secretarial Occupations —> legal clerk, company secretary
‣ C5 Skilled Trades Occupations —> electrical fitter, tailor
‣ C6 Caring, Leisure, Other Service Occupations —> school assistant, hairdresser
‣ C7 Sales and Customer Service Occupations —> sales assistant, telephonist
‣ C8 Process, Plant and Machine Operatives —> factory worker, van driver
‣ C9 Elementary Occupations —> shelf stacker, bartender
MAP USERS TO OCCUPATIONAL CLASS
‣ Manual annotation
‣ Use job titles from SOC to retrieve Twitter accounts
‣ Read the profile info and/or tweets
‣ Remove organisations/companies
‣ Keep only users whom the annotators agree belong to a specific class
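The agreement filter in the last step could look like the sketch below. The slides do not say how agreement was measured, so unanimous agreement is assumed here; the function name `keep_agreed_users` and the user ids are hypothetical.

```python
def keep_agreed_users(annotations):
    """Keep users whose annotators all chose the same SOC class.

    `annotations` maps a user id to the list of labels given by the annotators.
    Returns a dict mapping each retained user id to the agreed label.
    """
    agreed = {}
    for user, labels in annotations.items():
        # A user is kept only if there is at least one label and all labels match.
        if labels and len(set(labels)) == 1:
            agreed[user] = labels[0]
    return agreed


# Hypothetical annotations from two annotators per user.
example_annotations = {
    "user_a": ["C2", "C2"],
    "user_b": ["C1", "C3"],  # disagreement: dropped
    "user_c": ["C9", "C9"],
}
labelled_users = keep_agreed_users(example_annotations)
print(labelled_users)
```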
MAP USERS TO MEAN INCOME
‣ Use the SOC class as a proxy to find a user’s mean income and socioeconomic class
‣ Annual Survey of Hours & Earnings + SOC —> Mean income in £
‣ Production manager (£50,952/year)
‣ Sales Supervisor (£18,383/year)
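A minimal sketch of the lookup described above, assuming the survey data has already been reduced to a table from job title to mean yearly income. The two entries are the examples from the slide; the table name, function name and the lowercase normalisation are assumptions.

```python
# Mean yearly income in GBP per job title (the two examples given on the slide).
MEAN_INCOME_GBP = {
    "production manager": 50952,
    "sales supervisor": 18383,
}


def mean_income(job_title):
    """Return the mean yearly income for a job title, or None if it is unknown."""
    return MEAN_INCOME_GBP.get(job_title.strip().lower())


print(mean_income("Production manager"))
print(mean_income("Sales Supervisor"))
```

Every user annotated with a given job title inherits that title's mean income as the regression target.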
MAP USERS TO SOCIOECONOMIC CLASS
‣ Use the SOC class as a proxy to find a user’s mean income and socioeconomic class
‣ Socioeconomic coding + SOC —> Socioeconomic class (upper, medium, lower)
‣ Bank manager —> upper
‣ Government clerk —> medium
‣ Factory cleaner —> lower
DATA SETS
‣ Data Set 1
‣ 5,191 Twitter users - SOC class - Mean income
‣ 10M tweets (maximum 200 tweets/user)
‣ Publicly available
‣ Data Set 2
‣ 1,342 Twitter users - SOC class - Socioeconomic class
‣ 2M tweets
‣ Publicly available
MODELS
SUPERVISED LEARNING
‣ Supervised learning is the Machine Learning task of “learning” a function from labelled training examples.
‣ e.g. Twitter users and their occupational class
SUPERVISED LEARNING - EXAM ANALOGY
‣ Imagine you want to prepare for the exam in this module.
‣ Your “training data” consist of all the available past exam papers.
‣ During training (studying), you learn by studying past exam papers.
‣ You can test yourself by holding out a number of past exams (development set).
‣ Evaluation is performed on the exam day (test data)! Your score is computed by your examiner.
SUPERVISED LEARNING PIPELINE
▸ Data
▸ Collect annotations of pairs of users and the desired demographic to be used for training.
▸ Split the data into training, development and testing sets (usually 80-10-10)
▸ Feature representation
▸ Extract features from the available data of each user, e.g. the number of tweets posted, the number of times the word “splendid” is used, etc.
▸ This results in one feature vector per user
▸ Train Machine Learning models for classification using the annotated data.
▸ Tune any parameters of the model on the development set.
▸ Evaluate the performance of the models on the test set.
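The 80-10-10 split mentioned above can be sketched as follows. This is an illustration only: the function name `split_80_10_10`, the fixed random seed and the user ids are assumptions.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle `items` and split them into train/dev/test lists (80/10/10)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = len(shuffled) * 8 // 10
    n_dev = len(shuffled) // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # whatever remains after train and dev
    return train, dev, test


# Hypothetical user ids.
users = [f"user_{i}" for i in range(100)]
train, dev, test = split_80_10_10(users)
print(len(train), len(dev), len(test))
```

Models are fitted on `train`, parameters are tuned on `dev`, and `test` is touched only once for the final evaluation.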
SUPERVISED MODELS
‣ Traditional linear models (e.g. logistic regression)
‣ Support Vector Machines (SVMs)
‣ Gaussian Processes (GPs)
‣ Neural Networks
‣ We look for two main characteristics:
‣ Model non-linearities
‣ Interpretability
‣ We use Gaussian Processes (GPs) for classification
PREDICTING THE OCCUPATIONAL CLASS
Users -> Feature vectors -> GPs -> SOC class (C1-C9)
‣ 5,191 users mapped to a SOC class
‣ ~10M tweets
USER DISTRIBUTION IN THE OCCUPATIONAL CLASSES
[Figure: bar chart of the distribution of users in the 9 SOC classes (C1-C9); y-axis from 0% to 40%]
C2 Professional Occupations
FEATURES
‣ User profile (18)
‣ number of followers/friends/listings/tweets
‣ proportion of retweets/hashtags/@-replies/links
‣ average number of tweets per day/retweets per tweet
‣ Topics - Word Clusters (200)
‣ Spectral clustering on a word similarity matrix.
‣ Words represented as Word2Vec embeddings (Mikolov et al., 2013).
‣ Similarity is computed as the cosine of the word embeddings.
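The word-similarity step can be illustrated with plain Python. This is a sketch under stated assumptions: the 2-dimensional vectors stand in for real Word2Vec embeddings, the cluster assignment is hypothetical (spectral clustering itself is not implemented here), and treating a user's topic feature as the share of their clustered words that fall in each cluster is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy embeddings; real Word2Vec vectors have many more dimensions.
embeddings = {
    "art": [1.0, 0.1],
    "painting": [0.9, 0.2],
    "cancer": [0.1, 1.0],
}
words = list(embeddings)
# Word-by-word similarity matrix, the input to the clustering step.
similarity = [[cosine(embeddings[a], embeddings[b]) for b in words] for a in words]


def topic_proportions(tokens, word_to_cluster, n_clusters):
    """Share of a user's clustered tokens that fall into each word cluster."""
    counts = [0] * n_clusters
    for token in tokens:
        if token in word_to_cluster:
            counts[word_to_cluster[token]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_clusters


# Hypothetical clustering result: cluster 0 = arts words, cluster 1 = health words.
word_to_cluster = {"art": 0, "painting": 0, "cancer": 1}
user_topics = topic_proportions(["art", "painting", "cancer", "the"], word_to_cluster, 2)
print(user_topics)
```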
PERFORMANCE
[Figure: bar chart of classification accuracy (%) by feature type for three models]
‣ User Profile: Logistic Regression 34, SVM (RBF) 31.5, Gaussian Process (ARD) 34.2
‣ Word2Vec Clusters: Logistic Regression 46.9, SVM (RBF) 51.7, Gaussian Process (ARD) 52.7
MOST PREDICTIVE TOPICS
Rank | Label | Topic
1 | Arts | art, design, print, collection, poster, painting, custom, logo, printing, drawing
2 | Health | risk, cancer, mental, stress, patients, treatment, surgery, disease, drugs, doctor
3 | Beauty Care | beauty, natural, dry, skin, massage, plastic, spray, facial, treatments, soap
4 | Higher Education | students, research, board, student, college, education, library, schools, teaching, teachers
5 | Software Engineering | service, data, system, services, access, security, development, software, testing, standard
7 | Football | van, foster, cole, winger, terry, reckons, youngster, rooney, fielding, kenny
8 | Corporate | patent, industry, reports, global, survey, leading, firm, 2015, innovation, financial
9 | Cooking | recipe, meat, salad, egg, soup, sauce, beef, served, pork, rice
12 | Elongated Words | wait, till, til, yay, ahhh, hoo, woo, woot, whoop, woohoo
16 | Politics | human, culture, justice, religion, democracy, religious, humanity, tradition, ancient, racism
Most predictive Topics given by ARD ranking
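ARD (automatic relevance determination) gives every feature its own kernel length-scale; a feature with a small learned length-scale changes the prediction quickly and so counts as more relevant. The sketch below assumes the squared-exponential ARD kernel, which the slides do not state explicitly; the feature names and length-scale values are hypothetical.

```python
import math


def ard_rbf_kernel(x, y, lengthscales, variance=1.0):
    """Squared-exponential kernel with one length-scale per feature (ARD)."""
    sq_dist = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscales))
    return variance * math.exp(-0.5 * sq_dist)


def rank_features_by_relevance(names, lengthscales):
    """Order feature names from most to least relevant (smallest length-scale first)."""
    return [name for _, name in sorted(zip(lengthscales, names))]


# Hypothetical learned length-scales for three topic features.
names = ["topic_a", "topic_b", "topic_c"]
lengthscales = [12.0, 0.5, 3.0]
ranking = rank_features_by_relevance(names, lengthscales)
print(ranking)

# Moving along a short length-scale feature lowers the similarity far more
# than moving the same distance along a long length-scale feature.
k_short = ard_rbf_kernel([0.0, 0.0], [1.0, 0.0], [0.5, 50.0])
k_long = ard_rbf_kernel([0.0, 0.0], [0.0, 1.0], [0.5, 50.0])
print(k_short, k_long)
```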
FEATURE ANALYSIS
[Figure: three plots, one per topic: Higher Education (#21), Arts (#116) and Elongated Words (#164). Each shows one CDF line per class C1-C9; x-axis: topic proportion (0.001 to 0.05), y-axis: user probability (0 to 1)]
TOPIC MORE PREVALENT IN A CLASS C1-C9 —> CDF LINE CLOSER TO THE BOTTOM-RIGHT CORNER
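How to read these plots can be shown with an empirical CDF. In this sketch the topic proportions for the two classes are made up for illustration and the function name is an assumption.

```python
def empirical_cdf(values, x):
    """Fraction of users whose topic proportion is at most x."""
    return sum(1 for v in values if v <= x) / len(values)


# Hypothetical topic proportions for the users of two classes.
class_rare = [0.001, 0.002, 0.004, 0.010]      # topic rarely used
class_frequent = [0.010, 0.020, 0.040, 0.050]  # topic used more often

cdf_rare = empirical_cdf(class_rare, 0.01)
cdf_frequent = empirical_cdf(class_frequent, 0.01)
print(cdf_rare, cdf_frequent)
```

At the same topic proportion, the class that uses the topic more has the lower CDF value, which is why its line sits closer to the bottom-right corner.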
FEATURE ANALYSIS
Jensen-Shannon Divergence between topic distributions across classes
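For reference, the Jensen-Shannon divergence between two discrete distributions can be computed as below. This is a generic sketch using base-2 logarithms (so the value lies between 0 and 1); the slides do not say which base was used, and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic distributions of two occupational classes.
class_x = [0.7, 0.2, 0.1]
class_y = [0.1, 0.2, 0.7]
jsd_xy = jensen_shannon_divergence(class_x, class_y)
print(jsd_xy)
```

Identical distributions give 0, and distributions with no overlap give 1.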
PREDICTING THE INCOME
Users -> Feature vectors -> GPs -> Income (£)
‣ 5,191 Twitter users mapped to an average income in GBP (£)
‣ ~10M tweets
[Figure: histogram of yearly income (£, 10k to 100k) against number of users (0 to 1000)]
FEATURES
▸ Profile (8): #followers, #followees, times listed etc.
▸ Shallow textual features (10): proportion of hashtags, @-replies etc.
▸ Inferred psycho-demographic features (15): gender, age, education level, religion etc.
▸ Emotions (9): joy, anger, fear, disgust etc.
▸ Word Clusters - Topics (200): Word Embeddings —> Similarity matrix —> Spectral Clustering
PERFORMANCE
[Figure: bar chart of MAE by feature type; y-axis from 9,000 to 12,000]
▸ Profile: £11,291
▸ Demo: £10,110
▸ Emotion: £10,980
▸ Shallow: £11,456
▸ Topics: £9,621
▸ All Features: £9,535
MEAN ABSOLUTE ERROR (MAE) OF INCOME INFERENCE
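MAE is the average absolute difference between actual and predicted income. In the small sketch below, the actual incomes are the two examples from the earlier slide and the predictions are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all users."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual_income = [50952, 18383]     # production manager, sales supervisor (GBP/year)
predicted_income = [45000, 25000]  # hypothetical model output
mae = mean_absolute_error(actual_income, predicted_income)
print(mae)
```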
PERFORMANCE
[Figure: grouped bar chart by feature type (Profile, Psycho-Demo, Personality, Emotions, Shallow, Topics, All Features) for three models (LR, SVM-RBF, GP); y-axis from 0.00 to 0.70. Bar labels in the order extracted from the figure: 0.21, 0.28, 0.22, 0.27, 0.20, 0.50, 0.51, 0.33, 0.26, 0.32, 0.36, 0.26, 0.61, 0.61, 0.37, 0.36, 0.33, 0.37, 0.36, 0.61, 0.63]
CORRELATION BETWEEN ACTUAL AND PREDICTED INCOME
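The slide does not name the correlation coefficient; assuming it is Pearson's r, it can be computed as in this sketch. The income values in the example are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical actual and predicted incomes (GBP/year) for four users.
actual = [20000, 30000, 40000, 50000]
predicted = [26000, 29000, 37000, 41000]
r = pearson_r(actual, predicted)
print(round(r, 2))
```

Unlike MAE, the correlation ignores how far off the predictions are in pounds and only measures how well they follow the trend of the actual incomes.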
FEATURE ANALYSIS
RELATION OF INCOME AND EMOTION (LINEAR VS GP FIT)
[Figure: nine panels, each plotting income (y-axis, 28,000 to 42,000) against feature value (x-axis) with a linear fit and a GP fit. Panels and GP length-scales l:]
▸ e1: positive (l=46.27)
▸ e2: neutral (l=57.64)
▸ e3: negative (l=76.34)
▸ e4: joy (l=36.37)
▸ e5: sadness (l=67.05)
▸ e6: disgust (l=116.66)
▸ e7: anger (l=95.50)
▸ e8: surprise (l=83.61)
▸ e9: fear (l=31.74)
RELATION OF INCOME AND TOPICS (LINEAR VS GP FIT)
[Figure: six panels, each plotting income (y-axis, 30,000 to 50,000) against feature value (x-axis) with a linear fit and a GP fit. Panels:]
▸ Topic 107 (Justice)
▸ Topic 124 (Corporate 1)
▸ Topic 139 (Politics)
▸ Topic 163 (NGOs)
▸ Topic 196 (Web analytics/Surveys)
▸ Topic 99 (Swearing)
![Page 63: NIKOS.ALETRAS@GMAIL.COM PREDICTING USER …nikosaletras.com/resources/kingston.pdfINFERRING USER DEMOGRAPHICS Define a predictive task Given user data predict her attribute Data Collect](https://reader030.vdocuments.net/reader030/viewer/2022041022/5ed2dca3a079355bb26d9ec1/html5/thumbnails/63.jpg)
INCOME
FEATURE ANALYSIS
LINEAR VS GP FIT
RELATION OF INCOME AND PROFILE
u1: No. followers (l=47.76) u2: No. friends (l=84.48) u3: No. listings (l=2.65) u4: Foll/fr. ratio (l=5.16)
u5: No. favs (l=96.41) u6: Tw/day (l=40.96) u7: No. tweets (l=15.94) u8: English Tw. (l=3.12)
[Figure: eight panels, one per profile feature, each comparing a linear and a GP fit. x-axis: feature value; y-axis: income (ticks at 28000, 36000, 44000).]
![Page 64: NIKOS.ALETRAS@GMAIL.COM PREDICTING USER …nikosaletras.com/resources/kingston.pdfINFERRING USER DEMOGRAPHICS Define a predictive task Given user data predict her attribute Data Collect](https://reader030.vdocuments.net/reader030/viewer/2022041022/5ed2dca3a079355bb26d9ec1/html5/thumbnails/64.jpg)
INCOME
FEATURE ANALYSIS
RELATION OF INCOME AND PSYCHO-DEMOGRAPHIC FEATURES
[Figure: mean group income with 95% CI (x-axis, 20000 to 40000) for 14 groups. Plotted group means range from 24944 to 36408.]
▸ Income: Above Average, Below Average
▸ Religion: Unaffiliated, Christian
▸ Gender: Male, Female
▸ Ethnicity: Caucasian, African American
▸ Education: Degree, High School
▸ Age: > 35, 30−35, 25−30, < 25
![Page 65: NIKOS.ALETRAS@GMAIL.COM PREDICTING USER …nikosaletras.com/resources/kingston.pdfINFERRING USER DEMOGRAPHICS Define a predictive task Given user data predict her attribute Data Collect](https://reader030.vdocuments.net/reader030/viewer/2022041022/5ed2dca3a079355bb26d9ec1/html5/thumbnails/65.jpg)
PREDICTING THE SOCIOECONOMIC CLASS
![Page 66: NIKOS.ALETRAS@GMAIL.COM PREDICTING USER …nikosaletras.com/resources/kingston.pdfINFERRING USER DEMOGRAPHICS Define a predictive task Given user data predict her attribute Data Collect](https://reader030.vdocuments.net/reader030/viewer/2022041022/5ed2dca3a079355bb26d9ec1/html5/thumbnails/66.jpg)
SOCIOECONOMIC CLASS
PREDICTING THE SOCIOECONOMIC CLASS
‣ 1,342 Twitter users mapped to a socioeconomic class label
‣ ~2M tweets
‣ Pipeline: Users → Feature vectors → GPs
‣ 3-WAY: upper / medium / lower
‣ 2-WAY: upper / medium + lower
![Page 67: NIKOS.ALETRAS@GMAIL.COM PREDICTING USER …nikosaletras.com/resources/kingston.pdfINFERRING USER DEMOGRAPHICS Define a predictive task Given user data predict her attribute Data Collect](https://reader030.vdocuments.net/reader030/viewer/2022041022/5ed2dca3a079355bb26d9ec1/html5/thumbnails/67.jpg)
SOCIOECONOMIC CLASS
FEATURES
▸ User Profile (4)
▸ User bio (523)
▸ Text in tweets (560)
▸ Topics - Word Clusters (200)
▸ User Impact on the platform (4)
▸ Total of 1,291 features
![Page 68: NIKOS.ALETRAS@GMAIL.COM PREDICTING USER …nikosaletras.com/resources/kingston.pdfINFERRING USER DEMOGRAPHICS Define a predictive task Given user data predict her attribute Data Collect](https://reader030.vdocuments.net/reader030/viewer/2022041022/5ed2dca3a079355bb26d9ec1/html5/thumbnails/68.jpg)
SOCIOECONOMIC CLASS
PERFORMANCE
CLASSIFICATION PERFORMANCE (10-FOLD CV)

| Classification | Accuracy (%) | Precision (%) | Recall (%) | F1 |
|---|---|---|---|---|
| 2-way | 82.05 (2.4) | 82.2 (2.4) | 81.97 (2.6) | .821 (.03) |
| 3-way | 75.09 (3.3) | 72.04 (4.4) | 70.76 (5.7) | .714 (.05) |
![Page 69: NIKOS.ALETRAS@GMAIL.COM PREDICTING USER …nikosaletras.com/resources/kingston.pdfINFERRING USER DEMOGRAPHICS Define a predictive task Given user data predict her attribute Data Collect](https://reader030.vdocuments.net/reader030/viewer/2022041022/5ed2dca3a079355bb26d9ec1/html5/thumbnails/69.jpg)
SOCIOECONOMIC CLASS
PERFORMANCE
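CONFUSION MATRICES (AGGREGATE)

2-WAY

| | T1 | T2 | P |
|---|---|---|---|
| O1 | 584 | 115 | 83.5% |
| O2 | 126 | 517 | 80.4% |
| R | 82.3% | 81.8% | 82.0% |

3-WAY

| | T1 | T2 | T3 | P |
|---|---|---|---|---|
| O1 | 606 | 84 | 53 | 81.6% |
| O2 | 49 | 186 | 45 | 66.4% |
| O3 | 55 | 48 | 216 | 67.7% |
| R | 85.4% | 58.5% | 68.8% | 75.1% |

O = output (inferred), T = target, P = precision, R = recall; {1, 2, 3} = {upper, middle, lower} socioeconomic status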
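The precision, recall and accuracy margins can be recomputed from the counts. A minimal sketch, assuming rows are outputs (O) and columns are targets (T) as in the legend; the function name `precision_recall_accuracy` is hypothetical:

```python
def precision_recall_accuracy(matrix):
    """matrix[i][j] = users inferred as class i+1 whose target is class j+1."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Precision: correct / everything inferred as that class (row sum).
    precision = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # Recall: correct / everything whose target is that class (column sum).
    recall = [matrix[j][j] / sum(row[j] for row in matrix) for j in range(n)]
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    return precision, recall, accuracy


# Counts copied from the two confusion matrices on the slide.
two_way = [[584, 115], [126, 517]]
three_way = [[606, 84, 53], [49, 186, 45], [55, 48, 216]]

for name, m in (("2-way", two_way), ("3-way", three_way)):
    p, r, acc = precision_recall_accuracy(m)
    print(name,
          "P:", [f"{100 * v:.1f}%" for v in p],
          "R:", [f"{100 * v:.1f}%" for v in r],
          f"accuracy: {100 * acc:.1f}%")
```

Both matrices sum to 1,342 users, and the 3-way recall for T1 works out to 606 / 710, about 85.4%.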
![Page 70: NIKOS.ALETRAS@GMAIL.COM PREDICTING USER …nikosaletras.com/resources/kingston.pdfINFERRING USER DEMOGRAPHICS Define a predictive task Given user data predict her attribute Data Collect](https://reader030.vdocuments.net/reader030/viewer/2022041022/5ed2dca3a079355bb26d9ec1/html5/thumbnails/70.jpg)
CONCLUSIONS
‣ User-generated content is an extremely useful resource
  ‣ infer user demographics
  ‣ social science research
  ‣ commercial tasks
‣ User socio-economic status influences language use in social media
‣ Non-linear models (Gaussian Processes)
  ‣ better modelling of demographic inference tasks
  ‣ interpretability
‣ Topic features provide better representations and performance
‣ Qualitative analysis
‣ Insights into interesting patterns
![Page 71: NIKOS.ALETRAS@GMAIL.COM PREDICTING USER …nikosaletras.com/resources/kingston.pdfINFERRING USER DEMOGRAPHICS Define a predictive task Given user data predict her attribute Data Collect](https://reader030.vdocuments.net/reader030/viewer/2022041022/5ed2dca3a079355bb26d9ec1/html5/thumbnails/71.jpg)
ACKNOWLEDGEMENTS
Daniel Preotiuc-Pietro (Bloomberg)
Vasileios Lampos (UCL)
Ingemar J. Cox (UCL & Uni. of Copenhagen)
Jens K. Geyti (UCL)
Bin Zou (UCL)
Svitlana Volkova (PNNL)
Yoram Bachrach (Microsoft Research)
![Page 72: NIKOS.ALETRAS@GMAIL.COM PREDICTING USER …nikosaletras.com/resources/kingston.pdfINFERRING USER DEMOGRAPHICS Define a predictive task Given user data predict her attribute Data Collect](https://reader030.vdocuments.net/reader030/viewer/2022041022/5ed2dca3a079355bb26d9ec1/html5/thumbnails/72.jpg)
PUBLICATIONS
D. Preoţiuc-Pietro, V. Lampos and N. Aletras (2015). An Analysis of the User Occupational Class through Twitter Content. In ACL.
D. Preoţiuc-Pietro, S. Volkova, V. Lampos, Y. Bachrach and N. Aletras (2015). Studying User Income through Language, Behaviour and Affect in Social Media. PLOS ONE.
V. Lampos, N. Aletras, J. K. Geyti, B. Zou and I. J. Cox (2016). Inferring the Socioeconomic Status of Social Media Users based on Behaviour and Language. In ECIR.
![Page 73: NIKOS.ALETRAS@GMAIL.COM PREDICTING USER …nikosaletras.com/resources/kingston.pdfINFERRING USER DEMOGRAPHICS Define a predictive task Given user data predict her attribute Data Collect](https://reader030.vdocuments.net/reader030/viewer/2022041022/5ed2dca3a079355bb26d9ec1/html5/thumbnails/73.jpg)
THANK YOU QUESTIONS?