Online Bayesian Models for Personal Analytics in Social Media

Svitlana Volkova and Benjamin Van Durme

svitlana@jhu.edu http://www.cs.jhu.edu/~svitlana/

Center for Language and Speech Processing, Johns Hopkins University,

Human Language Technology Center of Excellence


Social Media Predictive Analytics

• Personalized, diverse and timely data
• Can reveal user interests, preferences and opinions

Social Network Prediction App – https://apps.facebook.com/snpredictionapp/

DemographicsPro – http://www.demographicspro.com/

WolframAlpha Analytics – http://www.wolframalpha.com/facebook/


User Attribute Prediction Task

Political Preference: Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013; Volkova et al., 2014

[Figure label: Communications]

Gender: Garera and Yarowsky, 2009; Rao et al., 2010; Burger et al., 2011; Van Durme, 2012; Zamal et al., 2012; Bergsma and Van Durme, 2013

Age: Rao et al., 2010; Zamal et al., 2012; Cohen and Ruths, 2013; Nguyen et al., 2011, 2013; Sap et al., 2014

AAAI 2015 Demo (joint work with Microsoft Research): Income, Education Level, Ethnicity, Life Satisfaction, Optimism, Personality, Showing Off, Self-Promoting


Outline

I. Our Approach

II. Dynamic (Streaming) Models

III. Experimental Results

IV. Practical Recommendations


Existing Approaches: ~1K Tweets*

Tweets as a document

How long does it take for an average Twitter user to produce thousands of tweets?

What if we want to make reliable predictions immediately after 10 tweets?

*Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011a; Burger et al., 2011; Zamal et al., 2012; Nguyen et al., 2013


Attributed Social Networks

*Conover et al., 2011; Pennacchiotti and Popescu, 2011a; Zamal et al., 2012; Volkova et al., 2014.


Our Approach

Static (Batch) Predictions
• Offline training
• Offline predictions
• No or limited network information

Streaming (Online) Inference
• Offline training
• Online predictions in time (ACL’14)
• Exploring 6 types of neighborhoods

Dynamic (Iterative) Learning and Prediction
• Online predictions
• Relying on neighbors
+ Iterative re-training
+ Active learning
+ Interactive rationale annotation

① Streaming nature of SM: dynamic training and prediction
② Network structure: joint user-neighbor streams
③ Trade-off between prediction time and model quality


Online Predictions: Iterative Bayesian Updates

[Figure: users with unknown labels (?) along a time axis]
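The slide names iterative Bayesian updates without spelling out the rule. A minimal sketch follows, assuming a binary attribute and a classifier that supplies class-conditional likelihoods for each incoming tweet; the function names and the example likelihood values are illustrative and do not come from the talk.

```python
def bayes_update(prior, lik_pos, lik_neg):
    """One Bayes-rule step for a binary attribute.

    prior:   P(attribute = positive) before seeing the tweet
    lik_pos: P(tweet | positive), assumed to come from some trained model
    lik_neg: P(tweet | negative)
    """
    numerator = prior * lik_pos
    return numerator / (numerator + (1.0 - prior) * lik_neg)


def stream_posterior(likelihoods, prior=0.5):
    """Fold a stream of per-tweet likelihood pairs into running posteriors.

    The posterior after tweet t becomes the prior for tweet t + 1.
    """
    posteriors = []
    p = prior
    for lik_pos, lik_neg in likelihoods:
        p = bayes_update(p, lik_pos, lik_neg)
        posteriors.append(p)
    return posteriors


# Three made-up tweets, each mildly favoring the positive class.
print(stream_posterior([(0.6, 0.4), (0.6, 0.4), (0.6, 0.4)]))
```

Each tweet shifts the belief a little, so a prediction is available after the first tweet and firms up as more evidence arrives.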


Iterative Batch Learning

[Figure: labeled (R, D) and unlabeled (?) users along a time axis with steps t1, t2, …, tm, shown for two variants]

Iterative Batch Retraining (IB)

Iterative Batch with Rationale Filtering (IBR)


Rationales

Rationales are explicitly highlighted n-grams in tweets that best justify why the annotators made their labeling decisions.
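One way to picture rationale filtering is as a restriction of the feature space to annotated n-grams. The sketch below assumes unigram and bigram features and a set of rationale n-grams collected from annotators; the function names and the example rationales are hypothetical, not the authors' data.

```python
from collections import Counter


def ngrams(tokens, n_max=2):
    """All contiguous n-grams of length 1..n_max, joined with spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def rationale_features(tweet, rationales, n_max=2):
    """Count only the n-grams that annotators marked as rationales."""
    tokens = tweet.lower().split()
    return Counter(g for g in ngrams(tokens, n_max) if g in rationales)


# Made-up rationale vocabulary for a political preference task.
rationales = {"tax cuts", "obamacare"}
print(rationale_features("We need tax cuts now", rationales))
```

Because every n-gram outside the rationale set is dropped, the resulting feature vectors are much lower-dimensional than a full bag of n-grams.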


Active Learning

[Figure: labeled and unlabeled users along a time axis, with monthly steps 1-Jan-2011, 1-Feb-2011, …, 1-Nov-2011, 1-Dec-2011]

Active Without Oracle (AWOO)

Active With Rationale Filtering (AWR)

Active With Oracle (AWO)


Performance Metrics

• Accuracy over time:

• Find optimal models:
– Data stream type (user, friend, user + friend)
– Time (more correctly classified users faster)
– Prediction quality (better accuracy over time)
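The formula for accuracy over time did not survive in this transcript. The sketch below shows one plausible reading, not necessarily the authors' definition: at each time step, a user counts as classified once the posterior clears a decision threshold, and accuracy is the share of classified users whose predicted label matches the gold label. All names and data are illustrative.

```python
def accuracy_over_time(posteriors, gold, threshold=0.95):
    """Return a list of (correct_count, accuracy) pairs, one per time step.

    posteriors: {user: [P(label = 1) at t1, t2, ...]}, equal-length lists
    gold:       {user: 0 or 1}
    accuracy is None at steps where no user clears the threshold.
    """
    n_steps = len(next(iter(posteriors.values())))
    curve = []
    for t in range(n_steps):
        classified = 0
        correct = 0
        for user, trajectory in posteriors.items():
            p = trajectory[t]
            if max(p, 1.0 - p) >= threshold:  # confident enough to label
                classified += 1
                correct += int((p >= 0.5) == bool(gold[user]))
        accuracy = correct / classified if classified else None
        curve.append((correct, accuracy))
    return curve


# Made-up posterior trajectories for three users over two time steps.
posteriors = {"a": [0.6, 0.97], "b": [0.3, 0.02], "c": [0.5, 0.96]}
gold = {"a": 1, "b": 0, "c": 0}
print(accuracy_over_time(posteriors, gold))
```

Tracking both numbers separates the two criteria in the list above: the count shows how fast users get classified correctly, and the ratio shows prediction quality.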


Results: Iterative Batch Learning

[Figure: two panels; x-axis: time (Mar, Jun, Sep); y-axes: correctly classified users (50 to 300) and accuracy (0.0 to 1.0)]

IB: higher recall

IBR: higher precision

Time: # correctly classified users increases over time

IB faster, IBR slower

Data stream selection: User + friend stream > user stream


Results: Active Learning

AWOO: higher recall

AWR: higher precision

Time: Unlike IB/IBR models, AWOO/AWR models classify more users correctly faster (in Mar) but then plateau

[Figure: two panels; x-axis: time (Mar, Jun, Sep); y-axes: correctly classified users (50 to 300) and accuracy (0.0 to 1.0)]


Results: Model Quality

[Figure: four panels of accuracy (0.5 to 1.0) over time (Mar, Jun, Sep); series: IB: user and IBR: user; AWOO: user and AWR: user; IB: user + friend; AWOO: user + friend]

batch < active

user + friend > user


Summary

• Active learning > iterative batch

• N, UN > U: “neighbors give you away”

• Higher confidence => higher precision, lower confidence => higher recall (as expected)

• Rationales significantly improve results


Practical Recommendations

• If you want to deliver ads fast but be less confident in user attribute predictions:
– use models with higher recall (AWOO, IB)
– apply a lower decision threshold, e.g., 0.55

• If you want to deliver ads to a true target crowd but later in time:
– use models with higher precision (AWR, IBR)
– apply a higher decision threshold, e.g., 0.95
– models with rationale filtering (IBR, AWR) require less computation (lower-dimensional feature vectors) and are more accurate, but annotations cost money (Mechanical Turk)

• For highly assortative attributes, e.g., political preference, use a joint user-neighbor stream
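The effect of the two threshold settings can be illustrated with a small sketch. It assumes the model outputs a posterior probability for one of two classes; the function name, the class labels and the example posteriors are hypothetical.

```python
def decide(p_positive, threshold):
    """Return a label only when the posterior clears the decision threshold.

    p_positive: model posterior for the (hypothetical) "positive" class
    threshold:  minimum confidence required before acting, in (0.5, 1.0]
    """
    if p_positive >= threshold:
        return "positive"
    if 1.0 - p_positive >= threshold:
        return "negative"
    return None  # not confident enough yet; wait for more tweets


# Made-up posteriors for five users at one point in time.
posteriors = [0.97, 0.70, 0.52, 0.30, 0.02]

for threshold in (0.55, 0.95):
    labels = [decide(p, threshold) for p in posteriors]
    covered = sum(label is not None for label in labels)
    print(f"threshold={threshold}: {covered} of {len(labels)} users labeled")
```

With these numbers the 0.55 threshold labels four of the five users, while the 0.95 threshold labels only the two most confident ones, which is the speed versus confidence trade-off described above.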


Thank you!

Labeled Twitter network data for gender, age, political preference prediction: http://www.cs.jhu.edu/~svitlana/

Interested in using our models for your research or collaboration: code and pre-trained models for inferring demographic attributes, personality and Ekman’s 6 emotions available on request: svitlana@jhu.edu

AAAI Technical Demo
Inferring Latent User Properties from Texts Published in Social Media
Wednesday, January 28, 6:30 – 8:00, Zilker Ballroom

I am on the job market. Hire me!

Email: svitlana@jhu.edu