Post on 17-Dec-2015
Transparent User Models for Personalization
Khalid El-Arini, Carnegie Mellon University
Joint work with: Ulrich Paquet, Ralf Herbrich, Jurgen Van Gael, Blaise Agüera y Arcas
Personalization is ubiquitous.
• YouTube: 72+ hours/minute of new video
• Facebook: 950 million+ users
• Twitter: 400+ million tweets/day
• Shopping:
  [1994]: 500K unique consumer goods sold in U.S.
  [2010]: Amazon alone offered 24 million.
Personalization is invaluable.
Keyword search is not enough.
Personalization is often wrong.
“Basil…is not a neo-Nazi. Lukas…is not a shadowy stalker. David…is not Korean. … intent on giving them such labels.”
- J. Zaslow, November 26, 2002
“there's just one way to change its mind: outfox it.”
- J. Zaslow, November 26, 2002
What recourse do we have?
Can we do better?
You behave like a vegan hipster
Vegan? Really? Why?
You:
• tweeted with #meatlessmonday
• follow @WholeFoods
• …
We propose an alternative.
Why am I getting this?
You behave like a Brooklyn hipster
Goal: Achieve transparency via interpretable user features, learned from user activity
Badges
Approach Model Experiments Summary
1. Define a vocabulary of badges
Apple fanboy, vegan, runner, photographer, …
Rich, interpretable and explainable
2. Identify exemplars
How do I find vegans?
observed label
Take advantage of how users describe themselves
Most vegans don’t label themselves as “vegan” on Twitter…
we want to infer the attributes of these users
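A minimal sketch of exemplar identification, assuming a simple case-insensitive whole-word match of badge labels against profile text. The function name `find_exemplars`, the sample profiles, and the label lists are hypothetical and only illustrate the idea of using self-descriptions as observed labels.

```python
import re

def find_exemplars(profiles, badge_labels):
    """Map each badge to the set of users whose profile contains one of its labels.

    profiles:     dict of user id -> self-description text (hypothetical data)
    badge_labels: dict of badge name -> list of label strings (hypothetical)
    """
    exemplars = {badge: set() for badge in badge_labels}
    for badge, labels in badge_labels.items():
        # Whole-word, case-insensitive match on any of the badge's labels.
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(label) for label in labels) + r")\b",
            re.IGNORECASE,
        )
        for user, text in profiles.items():
            if pattern.search(text):
                exemplars[badge].add(user)
    return exemplars

profiles = {
    "alice": "Vegan runner and amateur photographer",
    "bob": "Coffee, code, and Manchester United",
    "carol": "I love cooking",
}
badge_labels = {"vegan": ["vegan"], "runner": ["runner", "marathoner"]}
exemplars = find_exemplars(profiles, badge_labels)
```

Users such as `carol`, whose profile carries no label, stay unlabeled here; those are the users whose attributes have to be inferred from behavior.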
3. Model characteristic behavior
• Hashtags: #meatlessmonday
• Retweets: RT @WholeFoods
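The two action types named above (hashtags and retweets) could be pulled out of raw tweet text roughly as follows. This is a sketch under assumptions: the regular expressions, the helper name `extract_actions`, and the choice to lowercase hashtags are illustrative and not taken from the talk.

```python
import re

# Assumed patterns: a hashtag is '#' plus word characters,
# a retweet is the literal prefix 'RT @' plus a user name.
HASHTAG_RE = re.compile(r"#(\w+)")
RETWEET_RE = re.compile(r"\bRT @(\w+)")

def extract_actions(tweet):
    """Return the set of actions (hashtags and retweets) found in one tweet."""
    # Lowercase hashtags so '#MeatlessMonday' and '#meatlessmonday' coincide.
    actions = {"#" + tag.lower() for tag in HASHTAG_RE.findall(tweet)}
    actions |= {"RT @" + name for name in RETWEET_RE.findall(tweet)}
    return actions

actions = extract_actions("RT @WholeFoods: Try a new recipe for #MeatlessMonday")
```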
• We have no negative training examples. Use a generative model.
• Actions can be explained by multiple badges, even for the same user. Noisy-or to combine badges.
• How do we deal with user corrections? Observing a latent variable.
Model sketch
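A small sketch of how a noisy-or can combine several badges and a background cause into one action probability. It assumes the standard noisy-or form, in which the action fails to occur only if the background and every active, relevant badge all fail independently; the argument names are illustrative.

```python
def noisy_or(badges, explains, phi, phi_bg):
    """Probability that a user performs one action under a noisy-or.

    badges:   list of 0/1, whether the user has badge i
    explains: list of 0/1, whether badge i explains this action
    phi:      list of probabilities that a user with badge i performs the action
    phi_bg:   background probability of the action
    """
    p_none = 1.0 - phi_bg  # the background fails to trigger the action
    for has_badge, is_relevant, p in zip(badges, explains, phi):
        if has_badge and is_relevant:
            p_none *= 1.0 - p  # this active, relevant badge also fails
    return 1.0 - p_none

# Only the first badge is both held by the user and relevant to the action.
p = noisy_or(badges=[1, 1, 0], explains=[1, 0, 1], phi=[0.5, 0.9, 0.8], phi_bg=0.1)
```

With no active, relevant badge the result falls back to the background probability, which is how an action can still occur for users who hold none of the badges.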
Graphical model (plate diagram, built up one variable at a time):
• B badges, i = 1…B
• N users, u = 1…N
• F actions, j = 1…F
• b_i(u): Does user u have badge i?
• λ_i(u): Does user u have label for badge i in his profile?
• a_j(u): Has user u performed action j?
• s_ij: Does badge i explain action j?
• φ_ij: What’s the probability that a user with badge i performs action j?
• φ_bg: What is the background probability for each action?
• Noisy-or: Can at least one of my badges (or the background) explain it?
• Beta priors to control sparsity
• Beta prior to encode low recall (e.g., 10%)
• Beta prior to encode high precision (e.g., 99.9%)
• Other symbols in the diagram: η_i, ω_i, γ_i^T, γ_i^F, with hyperparameters α_φ, β_φ, α_η, β_η, α_ω, β_ω, α_T, β_T, α_F, β_F
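To illustrate how a Beta prior can encode a belief such as "recall is around 10%", recall that a Beta(α, β) distribution has mean α / (α + β). The hyperparameter values below are hypothetical, picked only so that the prior means match the slide's examples; the slides do not give the values actually used, nor exactly which quantities the priors are placed on.

```python
import random

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical hyperparameters, chosen only so the prior means match
# the examples on the slide (10% recall, 99.9% precision).
recall_prior = (10.0, 90.0)      # prior mean 0.1
precision_prior = (999.0, 1.0)   # prior mean 0.999

# Draw a few samples to see how tightly the prior concentrates.
rng = random.Random(0)
recall_draws = [rng.betavariate(*recall_prior) for _ in range(5)]
```

Scaling both hyperparameters up keeps the mean fixed while making the prior more concentrated, which is one way to express a stronger belief.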
Inference
• Collapsed Gibbs sampler (with MH steps)
• Variables shown: s_ij, φ_ij, φ_bg, b_i(u)
You behave like a vegan hipster.
Data description
• Start with 7 million Twitter users
• Manually define 31 sample badges by specifying labels
• Gather 2 million tweets from August 2011
• Recall: actions are hashtags and retweets
Remove infrequent actions and inactive users, leaving us with:
75,880 users
32,030 actions
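The filtering step (removing infrequent actions and inactive users) might look like the sketch below. The cutoffs are hypothetical, since the talk does not state which thresholds were used, and the sketch makes a single pass rather than iterating until both conditions hold.

```python
from collections import Counter

def filter_sparse(user_actions, min_action_users, min_user_actions):
    """Drop infrequent actions, then users left with too few actions.

    user_actions: dict of user id -> set of actions
    Thresholds are hypothetical; the talk does not state the cutoffs used.
    """
    # Count how many distinct users performed each action.
    action_counts = Counter(a for acts in user_actions.values() for a in acts)
    frequent = {a for a, n in action_counts.items() if n >= min_action_users}
    filtered = {}
    for user, acts in user_actions.items():
        kept = acts & frequent
        if len(kept) >= min_user_actions:
            filtered[user] = kept
    return filtered

user_actions = {
    "u1": {"#meatlessmonday", "RT @WholeFoods", "#rare"},
    "u2": {"#meatlessmonday", "RT @WholeFoods"},
    "u3": {"#meatlessmonday"},
}
filtered = filter_sparse(user_actions, min_action_users=2, min_user_actions=2)
```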
Badge statistics
[Bar chart over badges 1–31, vertical axis from 0 to 20,000; labeled badges include artist, photographer, country music fan, book worm]
Can we learn badges?
Vegetarian badge
Runner badge
Hacker badge
Manchester United badge
Do all badges look this good?
No, but most do.
Over-generalized: wine lover
Overwhelmed: Ruby on Rails
Can we just use the labels directly?
Inferred Apple fanboy badge
Self-described Apple fanboys
Comparative Analysis
• Compare to labeled LDA [Ramage+ 2009]
  – LDA extension where each document is labeled with multiple tags
  – One-to-one mapping between topics and tags
  – Document explained only by topics associated with its tags
• Hold out random 10% of labels, treat as ground truth, and try to predict them
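The hold-out protocol can be sketched as follows: hide a random fraction of the (user, badge) labels, then check where each hidden badge ranks among a model's scores for that user. The helper names and the example scores are hypothetical; any model that scores badges per user could be plugged in.

```python
import random

def hold_out_labels(labels, fraction, seed=0):
    """Split (user, badge) label pairs into training and held-out sets."""
    pairs = sorted(labels)  # sort first so the split is reproducible
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n_held = max(1, round(fraction * len(pairs)))
    return set(pairs[n_held:]), set(pairs[:n_held])

def rank_of_label(scores, badge):
    """1-based rank of `badge` when badges are sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(badge) + 1

labels = {("u%d" % k, "vegan") for k in range(10)}
train, held = hold_out_labels(labels, fraction=0.1)

# Hypothetical per-user badge scores from some model.
rank = rank_of_label({"vegan": 0.7, "runner": 0.2, "hacker": 0.9}, "vegan")
```

A lower rank for the held-out badge means the model placed the hidden label closer to the top of its predictions.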
Better predictive performance
[Plot: rank of held-out labels]
Better predictions for active users
Sparse badges
Apple fanboy (badges) Apple fanboy (l-lda)
Leveraged how users describe themselves to build interpretable user features
You behave like a vegan hipster
Empirically showed we can infer a user’s attributes from his behavior
Thank you
What recourse do we have?
Collaborative filtering
Content-based filtering
Can we do better?
Most vegans don’t label themselves as “vegan” on Twitter…
…but what about non-vegans?
“I drink too much and hate vegans.”