
Khalid El-Arini, Carnegie Mellon University

Joint work with: Ulrich Paquet, Ralf Herbrich, Jurgen Van Gael, Blaise Agüera y Arcas

Transparent User Models for Personalization

Personalization is ubiquitous.

• YouTube: 72+ hours/minute of new video
• Facebook: 950 million+ users
• Twitter: 400+ million tweets/day
• Shopping:
  – [1994]: 500K unique consumer goods sold in U.S.
  – [2010]: Amazon alone offered 24 million.

Personalization is invaluable.

Keyword search is not enough.

Personalization is often wrong.

“Basil…is not a neo-Nazi. Lukas…is not a shadowy stalker. David…is not Korean. … intent on giving them such labels.”

“There's just one way to change its mind: outfox it.” - J. Zaslow, November 26, 2002

What recourse do we have?

Can we do better?

You behave like a vegan hipster

Vegan? Really? Why?

You:
• tweeted with #meatlessmonday
• follow @WholeFoods
• …

We propose an alternative.

Why am I getting this?

You behave like a Brooklyn hipster

Goal: Achieve transparency via interpretable user features, learned from user activity

Badges

Outline: Approach, Model, Experiments, Summary

1. Define a vocabulary of badges

Apple fanboy, vegan, runner, photographer

Rich, interpretable and explainable

2. Identify exemplars

How do I find vegans?

observed label

Take advantage of how users describe themselves

Most vegans don’t label themselves as “vegan” on Twitter…

we want to infer the attributes of these users
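A minimal sketch of how exemplars might be picked out from self-descriptions, assuming each badge is defined by a few label phrases that are matched against a user's profile text. The `BADGE_LABELS` vocabulary, the `observed_labels` helper, and the sample bios are hypothetical; the slides do not specify the matching rule.

```python
import re

# Hypothetical badge vocabulary: each badge maps to label phrases a user
# might write in a profile description. These phrases are illustrative only.
BADGE_LABELS = {
    "vegan": ["vegan"],
    "runner": ["runner", "marathoner"],
    "photographer": ["photographer"],
}

def observed_labels(bio, badge_labels=BADGE_LABELS):
    """Return the set of badges whose label phrases appear in the bio."""
    text = bio.lower()
    found = set()
    for badge, phrases in badge_labels.items():
        for phrase in phrases:
            # Whole-word match, so "vegan" does not fire on "vegans".
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                found.add(badge)
                break
    return found

bios = {
    "alice": "Vegan, runner, mom of two.",
    "bob": "I drink too much and hate vegans.",
    "carol": "Coffee and code.",
}
exemplars = {user: observed_labels(bio) for user, bio in bios.items()}
```

Here only "alice" ends up with observed labels. Plain phrase matching is fragile (a bio such as "I hate vegan food" would still match), which is one reason to treat the label as a noisy observation rather than as the badge itself.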

3. Model characteristic behavior

• Hashtags: #meatlessmonday
• Retweets: RT @WholeFoods

• We have no negative training examples. Use a generative model.

• Actions can be explained by multiple badges, even for the same user. Noisy-or to combine badges.

• How do we deal with user corrections? Observing a latent variable.
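The noisy-or combination can be sketched as follows, assuming the standard form in which an action fails to occur only if the background and every active, relevant badge all fail to trigger it. The function and argument names are illustrative, not taken from the talk.

```python
def noisy_or(badges, explains, phi, phi_bg):
    """P(user performs the action) under a noisy-or over badges.

    badges[i]   -- 1 if the user has badge i, else 0
    explains[i] -- 1 if badge i can explain this action, else 0
    phi[i]      -- probability that badge i alone triggers the action
    phi_bg      -- background probability of the action
    """
    p_none = 1.0 - phi_bg          # background fails to trigger
    for b, s, p in zip(badges, explains, phi):
        if b and s:
            p_none *= 1.0 - p      # this active badge also fails
    return 1.0 - p_none

# One relevant badge (0.5) plus background (0.1): 1 - 0.9 * 0.5
p = noisy_or([1, 1, 0], [1, 0, 1], [0.5, 0.9, 0.8], 0.1)
```

With no active badges the result reduces to the background probability, and each additional relevant badge can only raise it.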

Model sketch

[Plate diagram: B badges (i=1…B), N users (u=1…N), F actions (j=1…F)]

• b_i(u): Does user u have badge i?
• λ_i(u): Does user u have label for badge i in his profile?
• a_j(u): Has user u performed action j?
• s_ij: Does badge i explain action j?
• φ_ij: What’s the probability that a user with badge i performs action j?
• φ_bg: What is the background probability for each action?
• Noisy-or: Can at least one of my badges (or the background) explain it?
• Beta priors to control sparsity
• Beta prior to encode low recall (e.g., 10%)
• Beta prior to encode high precision (e.g., 99.9%)

Remaining nodes in the diagram: γ_i^T, γ_i^F, η_i, ω_i, with hyperparameters α_φ, β_φ, α_η, β_η, α_ω, β_ω, α_T, β_T, α_F, β_F.
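As a rough illustration of how the variables in the sketch might fit together, the following forward-samples one user with all parameters held fixed rather than drawn from their Beta priors. The dependency structure in the docstring is an assumption read off the diagram labels, and the names `sample_user`, `omega`, `gamma_t` and `gamma_f` are illustrative.

```python
import random

def sample_user(omega, gamma_t, gamma_f, s, phi, phi_bg, rng):
    """Forward-sample one user's badges, labels and actions.

    Assumed structure (not confirmed by the slides):
      b_i      ~ Bernoulli(omega[i])
      lambda_i ~ Bernoulli(gamma_t[i]) if b_i == 1 else Bernoulli(gamma_f[i])
      a_j      ~ Bernoulli(noisy-or of phi_bg[j] and phi[i][j] over active i)
    """
    B, F = len(omega), len(phi_bg)
    b = [int(rng.random() < omega[i]) for i in range(B)]
    lam = [int(rng.random() < (gamma_t[i] if b[i] else gamma_f[i]))
           for i in range(B)]
    a = []
    for j in range(F):
        p_none = 1.0 - phi_bg[j]
        for i in range(B):
            if b[i] and s[i][j]:       # badge i is held and explains action j
                p_none *= 1.0 - phi[i][j]
        a.append(int(rng.random() < 1.0 - p_none))
    return b, lam, a

# Degenerate parameters make the draw deterministic, which is handy for checking.
b, lam, a = sample_user(
    omega=[1.0, 0.0], gamma_t=[1.0, 1.0], gamma_f=[0.0, 0.0],
    s=[[1, 0], [0, 1]], phi=[[1.0, 0.0], [0.0, 1.0]], phi_bg=[0.0, 0.0],
    rng=random.Random(0),
)
```

With these degenerate values the user always holds the first badge only, always carries its label, and performs only the first action.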

Inference

• Collapsed Gibbs sampler (with MH steps)
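The sampler itself is not spelled out on the slides. The sketch below shows only a single, uncollapsed Gibbs-style update for one badge indicator with every parameter held fixed, which is a simplification and not the collapsed sampler with MH steps. The model structure and all names are assumptions for illustration.

```python
import random

def action_prob(b, s, phi, phi_bg, j):
    """Noisy-or probability that a user with badge vector b performs action j."""
    p_none = 1.0 - phi_bg[j]
    for i in range(len(b)):
        if b[i] and s[i][j]:
            p_none *= 1.0 - phi[i][j]
    return 1.0 - p_none

def badge_posterior(i, b, lam, a, omega, gamma_t, gamma_f, s, phi, phi_bg):
    """P(b_i = 1 | other badges, labels, actions), parameters held fixed."""
    weights = []
    for value in (0, 1):
        trial = list(b)
        trial[i] = value
        prior = omega[i] if value else 1.0 - omega[i]
        p_label = gamma_t[i] if value else gamma_f[i]
        w = prior * (p_label if lam[i] else 1.0 - p_label)
        for j in range(len(a)):
            p = action_prob(trial, s, phi, phi_bg, j)
            w *= p if a[j] else 1.0 - p
        weights.append(w)
    total = weights[0] + weights[1]
    if total == 0.0:
        raise ValueError("both badge values have zero probability")
    return weights[1] / total

def resample_badge(i, b, rng, **model):
    """Draw a new value for b[i] from its conditional and store it in place."""
    b[i] = int(rng.random() < badge_posterior(i, b, **model))
    return b[i]

# Unlabeled user who performed an action that the badge explains well.
post = badge_posterior(
    0, b=[0], lam=[0], a=[1], omega=[0.5], gamma_t=[0.1], gamma_f=[0.001],
    s=[[1]], phi=[[0.8]], phi_bg=[0.1],
)
```

In this toy setting the posterior comes out at about 0.88: the user has no label, but the action is much likelier with the badge than from the background alone. The values 0.1 and 0.001 are illustrative stand-ins for a low-recall, high-precision label.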

You behave like a vegan hipster.

Data description

• Start with 7 million Twitter users
• Manually define 31 sample badges by specifying labels
• Gather 2 million tweets from August 2011
• Recall: actions are hashtags and retweets

Remove infrequent actions and inactive users, leaving us with: 75,880 users and 32,030 actions
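A minimal sketch of the filtering step, assuming each user is represented by the set of actions (hashtags and retweets) they performed. The thresholds and the `filter_activity` helper are hypothetical; the slides do not state the cutoffs that were used.

```python
from collections import Counter

def filter_activity(user_actions, min_action_users=2, min_user_actions=2):
    """Drop rare actions, then drop users left with too few actions.

    user_actions: dict mapping user -> set of actions.
    Both thresholds are hypothetical placeholders.
    """
    # Count how many users performed each action.
    counts = Counter(a for actions in user_actions.values() for a in actions)
    frequent = {a for a, n in counts.items() if n >= min_action_users}
    filtered = {}
    for user, actions in user_actions.items():
        kept = actions & frequent
        if len(kept) >= min_user_actions:
            filtered[user] = kept
    return filtered

raw = {
    "u1": {"#a", "#b", "#c"},
    "u2": {"#a", "#b"},
    "u3": {"#a", "#z"},
}
kept = filter_activity(raw)
```

This is a single pass: removing inactive users can push some actions back under the frequency threshold, so a stricter version would repeat the two steps until nothing changes.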

Badge statistics

[Chart: per-badge statistics for badges 1 to 31; labeled examples: artist, photographer, country music fan, book worm]

Can we learn badges?

Vegetarian badge

Runner badge

Hacker badge

Manchester United badge

Do all badges look this good?

No, but most do.

wine lover

Over-generalized

Overwhelmed

Ruby on Rails

Can we just use the labels directly?

Inferred Apple fanboy badge

Self-described Apple fanboys

Comparative Analysis

• Compare to labeled LDA [Ramage+ 2009]
  – LDA extension where each document is labeled with multiple tags
  – One-to-one mapping between topics and tags
  – Document explained only by topics associated with its tags

• Hold out random 10% of labels, treat as ground truth, and try to predict them
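The hold-out protocol could be sketched as below, assuming labels are stored as (user, badge) pairs and that each model produces a per-user score for every badge. How the rank was actually computed is not given on the slides, so `rank_of_held_out` is one plausible reading, and all names are illustrative.

```python
import random

def hold_out_labels(labels, fraction=0.1, seed=0):
    """Split (user, badge) label pairs into observed and held-out sets."""
    pairs = sorted(labels)                 # fixed order before shuffling
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n_held = max(1, int(round(fraction * len(pairs))))
    return set(pairs[n_held:]), set(pairs[:n_held])

def rank_of_held_out(scores, user, badge):
    """1-based rank of `badge` among the user's badges, best score first."""
    ranked = sorted(scores[user], key=lambda b: -scores[user][b])
    return ranked.index(badge) + 1

labels = {("u%d" % k, "vegan") for k in range(20)}
observed, held_out = hold_out_labels(labels)

# Hypothetical model scores for one user.
scores = {"u1": {"vegan": 0.2, "runner": 0.9, "hacker": 0.5}}
rank = rank_of_held_out(scores, "u1", "vegan")
```

A lower rank for a held-out label means the model placed the hidden badge nearer the top of its predictions for that user.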

[Plot: rank of held-out labels, with an arrow marking "better"]

Better predictive performance

Better predictions for active

Sparse badges

[Panels: Apple fanboy (badges) vs. Apple fanboy (l-lda)]

Leveraged how users describe themselves to build interpretable user features

You behave like a vegan hipster

Empirically showed we can infer a user’s attributes from his behavior

Thank you

What recourse do we have?

Collaborative filtering

Content-based filtering

Can we do better?

Most vegans don’t label themselves as “vegan” on Twitter… but what about non-vegans?

“I drink too much and hate vegans.”
