Online Bayesian Models for Personal Analytics in Social Media

Svitlana Volkova and Benjamin Van Durme

[email protected] http://www.cs.jhu.edu/~svitlana/

Center for Language and Speech Processing, Johns Hopkins University,

Human Language Technology Center of Excellence



Social Media Predictive Analytics

• Personalized, diverse and timely data

• Can reveal user interests, preferences and opinions

Social Network Prediction App - https://apps.facebook.com/snpredictionapp/

DemographicsPro – http://www.demographicspro.com/

WolframAlpha Analytics – http://www.wolframalpha.com/facebook/


User Attribute Prediction Task

Political Preference: Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013; Volkova et al., 2014

Communications

Gender: Garera and Yarowsky, 2009; Rao et al., 2010; Burger et al., 2011; Van Durme, 2012; Zamal et al., 2012; Bergsma and Van Durme, 2013

Age: Rao et al., 2010; Zamal et al., 2012; Cohen and Ruths, 2013; Nguyen et al., 2011, 2013; Sap et al., 2014

AAAI 2015 Demo (joint work with Microsoft Research): Income, Education Level, Ethnicity, Life Satisfaction, Optimism, Personality, Showing Off, Self-Promoting

Outline

I. Our Approach

II. Dynamic (Streaming) Models

III. Experimental Results

IV. Practical Recommendations

Existing Approaches ~1K Tweets*


How long does it take for an average Twitter user to produce thousands of tweets?

*Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011a; Burger et al., 2011; Zamal et al., 2012; Nguyen et al., 2013

Tweets as a document

What if we want to make reliable predictions immediately after 10 tweets?

Attributed Social Networks

*Conover et al., 2011; Pennacchiotti and Popescu, 2011a; Zamal et al., 2012; Volkova et al., 2014.

Our Approach

Static (Batch) Predictions

• Offline training
• Offline predictions
• No or limited network information

Streaming (Online) Inference

• Offline training
• Online predictions in time (ACL’14)
• Exploring 6 types of neighborhoods

Dynamic (Iterative) Learning and Prediction

• Online predictions
• Relying on neighbors
+ Iterative re-training
+ Active learning
+ Interactive rationale annotation

① Streaming nature of SM: dynamic training and prediction

② Network structure: joint user-neighbour streams

③ Trade-off between prediction time vs. model quality
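A joint user-neighbour stream can be pictured as two time-ordered tweet lists merged into one. The sketch below is only an illustration: the function name joint_stream, the (timestamp, text) tuple layout and the sample tweets are assumptions, not taken from the slides.

```python
from heapq import merge

def joint_stream(user_tweets, friend_tweets):
    """Merge two time-sorted lists of (timestamp, text) into one time-ordered stream.

    Both inputs are assumed to be sorted by timestamp already. ISO-formatted
    date strings are used here because they sort correctly as plain strings.
    """
    return list(merge(user_tweets, friend_tweets, key=lambda tweet: tweet[0]))

# Hypothetical example data.
user = [("2011-01-03", "user tweet a"), ("2011-02-10", "user tweet c")]
friend = [("2011-01-20", "friend tweet b")]
stream = joint_stream(user, friend)
```

A classifier fed from such a stream sees the friend's tweet between the user's two tweets, which is how neighbour content can contribute evidence before the user has written much.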

Online Predictions: Iterative Bayesian Updates

[Figure: tweets arriving along a Time axis, with the user's attribute unknown (?)]
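The formulas from this slide did not survive extraction. As a rough sketch of what an iterative Bayesian update looks like, the posterior after each tweet can serve as the prior for the next one. Everything named here is an assumption: the two class labels "R" and "D", the per-tweet likelihood values (which in practice would come from some tweet-level model), and the function names.

```python
def bayes_update(prior, likelihoods):
    """One update step: posterior is proportional to prior times likelihood."""
    unnormalized = {c: prior[c] * likelihoods[c] for c in prior}
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}

def stream_posteriors(prior, tweet_likelihoods):
    """Fold the update over a tweet stream, recording the posterior after each tweet."""
    posterior = dict(prior)
    history = []
    for likelihoods in tweet_likelihoods:
        posterior = bayes_update(posterior, likelihoods)
        history.append(posterior)
    return history

# Hypothetical values: P(tweet | class) for three consecutive tweets.
prior = {"R": 0.5, "D": 0.5}
tweets = [{"R": 0.6, "D": 0.4}, {"R": 0.7, "D": 0.3}, {"R": 0.4, "D": 0.6}]
history = stream_posteriors(prior, tweets)
```

With these made-up numbers the posterior for "R" moves from 0.6 after the first tweet to 0.7 after the third, so a prediction is available at every point in the stream rather than only after a large batch of tweets.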

Iterative Batch Learning

[Figure: tweets t1, t2, …, tm of labeled and unlabeled users along a Time axis; labels R and D for labeled users, ? for unlabeled users]

Iterative Batch Retraining (IB)

Iterative Batch with Rationale Filtering (IBR)
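The slides do not spell out the retraining procedure, so the following is one plausible reading of iterative batch retraining: at every time step the model is retrained from scratch on all labeled tweets seen so far and then applied to each unlabeled user's tweets seen so far. The unigram Naive Bayes classifier, the function names and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_nb(labeled):
    """Train a unigram Naive Bayes model from (tokens, label) pairs."""
    counts = {}       # label -> Counter of token counts
    docs = Counter()  # label -> number of training items
    for tokens, label in labeled:
        counts.setdefault(label, Counter()).update(tokens)
        docs[label] += 1
    vocab = {tok for c in counts.values() for tok in c}
    return counts, docs, vocab

def predict_nb(model, tokens):
    """Return a posterior over labels, using add-one smoothing."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    log_scores = {}
    for label in counts:
        denom = sum(counts[label].values()) + len(vocab)
        score = math.log(docs[label] / total_docs)
        for tok in tokens:
            if tok in vocab:  # ignore tokens never seen in training
                score += math.log((counts[label][tok] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exp.values())
    return {label: v / z for label, v in exp.items()}

def iterative_batch(labeled_by_step, unlabeled_by_step):
    """Retrain at each step on all labeled data so far, then score unlabeled users."""
    seen_labeled, seen_unlabeled, predictions = [], {}, []
    for labeled, unlabeled in zip(labeled_by_step, unlabeled_by_step):
        seen_labeled.extend(labeled)
        model = train_nb(seen_labeled)
        for user, tokens in unlabeled:
            seen_unlabeled.setdefault(user, []).extend(tokens)
        predictions.append(
            {user: predict_nb(model, toks) for user, toks in seen_unlabeled.items()}
        )
    return predictions

# Hypothetical two-step stream.
labeled_by_step = [
    [(["tax", "cuts"], "R"), (["health", "care"], "D")],
    [(["gun", "rights"], "R"), (["climate", "care"], "D")],
]
unlabeled_by_step = [[("u1", ["tax"])], [("u1", ["gun", "cuts"])]]
preds = iterative_batch(labeled_by_step, unlabeled_by_step)
```

In this toy run the posterior for user u1 sharpens from one step to the next because both the training data and the user's own tweet history grow.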

Rationales

Rationales are explicitly highlighted n-grams in tweets that best justify why the annotators made their labeling decisions
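Rationale filtering can be sketched as keeping only those n-gram features that annotators highlighted. The helper names, the choice of unigrams plus bigrams, and the example rationale set below are hypothetical; the slides do not give the exact feature extraction.

```python
def ngrams(tokens, n_max=2):
    """All contiguous n-grams of length 1..n_max, joined by single spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]

def rationale_filter(tokens, rationales, n_max=2):
    """Keep only the n-grams that appear in the annotated rationale set."""
    return [gram for gram in ngrams(tokens, n_max) if gram in rationales]

# Hypothetical rationale set and tweet.
rationales = {"tax cuts", "obamacare"}
tweet = "we need tax cuts now".split()
features = rationale_filter(tweet, rationales)
```

Because everything outside the rationale set is discarded, the resulting feature vectors are much lower-dimensional than the full n-gram vocabulary.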

Active Learning

[Figure: labeled and unlabeled users along a Time axis: 1-Jan-2011, 1-Feb-2011, …, 1-Nov-2011, 1-Dec-2011]

Active Without Oracle (AWOO)

Active With Rationale Filtering (AWR)

Active With Oracle (AWO)
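The slides name the three active variants without detailing them. One plausible reading is that users predicted with high confidence are moved from the unlabeled pool into the labeled set, taking either the model's own prediction (without oracle) or an externally supplied label (with oracle). The sketch below follows that reading; the function names, the threshold value and the data are assumptions.

```python
def select_confident(posteriors, threshold):
    """Return {user: predicted_label} for users whose top posterior reaches the threshold."""
    selected = {}
    for user, dist in posteriors.items():
        label, prob = max(dist.items(), key=lambda kv: kv[1])
        if prob >= threshold:
            selected[user] = label
    return selected

def active_step(labeled, unlabeled, posteriors, threshold, oracle=None):
    """Move confident users from the unlabeled pool into the labeled set (in place).

    With an oracle, store the oracle's label; without one, store the
    model's own prediction.
    """
    for user, predicted in select_confident(posteriors, threshold).items():
        labeled[user] = oracle(user) if oracle is not None else predicted
        unlabeled.discard(user)
    return labeled, unlabeled

# Hypothetical posteriors for three unlabeled users.
posteriors = {
    "u1": {"R": 0.97, "D": 0.03},
    "u2": {"R": 0.60, "D": 0.40},
    "u3": {"R": 0.02, "D": 0.98},
}
labeled, unlabeled = active_step({"a": "R"}, {"u1", "u2", "u3"}, posteriors, 0.95)

truth = {"u1": "D", "u2": "D", "u3": "D"}
labeled_o, unlabeled_o = active_step(
    {"a": "R"}, {"u1", "u2", "u3"}, posteriors, 0.95, oracle=truth.get
)
```

The model would then be retrained on the enlarged labeled set before the next time step. Without an oracle, a confident mistake (u1 in the second call) enters the training data uncorrected.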

Performance Metrics

• Accuracy over time:

• Find optimal models:
– Data stream type (user, friend, user + friend)
– Time (more correctly classified users faster)
– Prediction quality (better accuracy over time)
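The accuracy formula on this slide was lost in extraction. A minimal sketch of the two quantities tracked over time, assuming that a user counts as classified once the top posterior reaches a decision threshold and that accuracy is the share of classified users whose label is correct, could look like this. The function name and the data are hypothetical.

```python
def metrics_over_time(posteriors_by_step, gold, threshold):
    """For each time step, return (number correctly classified, accuracy).

    Assumption: accuracy is computed over classified users only, i.e. users
    whose top posterior is at or above the threshold.
    """
    results = []
    for posteriors in posteriors_by_step:
        classified = correct = 0
        for user, dist in posteriors.items():
            label, prob = max(dist.items(), key=lambda kv: kv[1])
            if prob >= threshold:
                classified += 1
                correct += int(label == gold[user])
        accuracy = correct / classified if classified else 0.0
        results.append((correct, accuracy))
    return results

# Hypothetical gold labels and posteriors at two time steps.
gold = {"u1": "R", "u2": "D"}
steps = [
    {"u1": {"R": 0.6, "D": 0.4}, "u2": {"R": 0.52, "D": 0.48}},
    {"u1": {"R": 0.9, "D": 0.1}, "u2": {"R": 0.3, "D": 0.7}},
]
strict = metrics_over_time(steps, gold, 0.55)
loose = metrics_over_time(steps, gold, 0.50)
```

Lowering the threshold classifies more users early but admits the wrong early guess for u2, which is the time versus quality trade-off the metrics are meant to expose.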

Results: Iterative Batch Learning

[Figures: number of correctly classified users (50 to 300) and accuracy (0.0 to 1.0) by month (Mar, Jun, Sep), user stream]

IB: higher recall

IBR: higher precision

Time: # correctly classified users increases over time; IB faster, IBR slower

Data stream selection: user + friend stream > user stream

Results: Active Learning

AWOO: higher recall

AWR: higher precision

Time: unlike IB/IBR models, AWOO/AWR models classify more users correctly faster (in Mar) but then plateau

[Figures: number of correctly classified users (50 to 300) and accuracy (0.0 to 1.0) by month (Mar, Jun, Sep), user stream]

Results: Model Quality

[Figures: accuracy (0.5 to 1.0) by month (Mar, Jun, Sep). Panels: IB vs. IBR, user stream; AWOO vs. AWR, user stream; IB, user + friend stream; AWOO, user + friend stream]

batch < active

user + friend > user

Summary

• Active learning > iterative batch

• N, UN > U: “neighbors give you away”

• Higher confidence => higher precision, lower confidence => higher recall (as expected)

• Rationales significantly improve results

Practical Recommendations

• If you want to deliver ads fast and can accept lower confidence in user attribute predictions:
– use models with higher recall (AWOO, IB)
– apply a lower decision threshold, e.g., 0.55

• If you want to deliver ads to a true target crowd, but later in time:
– use models with higher precision (AWR, IBR)
– apply a higher decision threshold, e.g., 0.95
– models with rationale filtering (IBR, AWR) require less computation (lower-dimensional feature vectors) and are more accurate, but annotations cost money (Mechanical Turk)

• For highly assortative attributes, e.g., political preference, use a joint user-neighbor stream
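The effect of the two example thresholds, 0.55 and 0.95, can be illustrated with a small decision helper. The function name decide, the class labels and the posterior values are assumptions made for this sketch; only the two threshold values come from the recommendations above.

```python
def decide(dist, threshold):
    """Return the most probable label if its posterior reaches the threshold, else None."""
    label, prob = max(dist.items(), key=lambda kv: kv[1])
    return label if prob >= threshold else None

# Hypothetical posteriors for three users at some point in the stream.
posteriors = {
    "u1": {"R": 0.97, "D": 0.03},
    "u2": {"R": 0.60, "D": 0.40},
    "u3": {"R": 0.52, "D": 0.48},
}
fast = {user: decide(dist, 0.55) for user, dist in posteriors.items()}
precise = {user: decide(dist, 0.95) for user, dist in posteriors.items()}
```

At 0.55 two of the three users already receive a label, while at 0.95 only the most clear-cut user does; the remaining users stay undecided until more tweets arrive.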

Thank you!

Labeled Twitter network data for gender, age, political preference prediction: http://www.cs.jhu.edu/~svitlana/

Interested in using our models for your research or collaboration? Code and pre-trained models for inferring demographic attributes, personality and Ekman’s 6 emotions are available on request: [email protected]

AAAI Technical Demo: Inferring Latent User Properties from Texts Published in Social Media

Wednesday, January 28, 6:30 – 8:00, Zilker Ballroom

I am on the job market. Hire me!

Email: [email protected]






