
Page 1: Improving user experience in recommender systems

Improving user experience in recommender systems
How latent feature diversification can decrease choice difficulty and improve choice satisfaction

Martijn Willemsen
Talk at the Institute for Software Technology, Graz University of Technology, Nov 3, 2015

Page 2: Improving user experience in recommender systems

Information overload

Recommendation

Page 3: Improving user experience in recommender systems

Choice Overload in Recommenders

Recommenders reduce information overload…
But large personalized sets cause choice overload!

Top-N of all highly ranked items:
"What should I choose? These are all very attractive!"

Page 4: Improving user experience in recommender systems

Choice Overload

Seminal example of choice overload: satisfaction decreases with larger sets as increased attractiveness is counteracted by choice difficulty.

More attractive: 3% sales
Less attractive: 30% sales, higher purchase satisfaction

From Iyengar and Lepper (2000)

Page 5: Improving user experience in recommender systems

Choice Overload in Recommenders (Bollen, Knijnenburg, Willemsen & Graus, RecSys 2010)

[Structural equation model: the manipulations Top-20 vs. Top-5 and Lin-20 vs. Top-5 recommendations (Objective System Aspects) affect perceived recommendation variety and perceived recommendation quality (Subjective System Aspects), which in turn affect choice difficulty and choice satisfaction (Experience); movie expertise enters as a Personal Characteristic. All reported path coefficients are significant (p < .05 or stronger).]

[Bar chart: choice satisfaction for the Top-5, Top-20 and Lin-20 conditions.]

Page 6: Improving user experience in recommender systems

A solution: diversification

Tradeoff between similarity and diversity (Smyth & McClave, 2001) in finding relevant items
Diversification remedies the high similarity of Top-N lists
But diversification reduces the overall quality (accuracy) of the list

Many studies only look at data, not at real users
Simulated users interact more efficiently with a diverse set (Bridge & Kelly, 2006)

Some studies look at how actual users perceive and evaluate diversity
Ziegler et al. (2005): diversification based on an external ontology
Diversity reduced the accuracy of the recommendations
But coverage and satisfaction increased!

Page 7: Improving user experience in recommender systems

Goal of the current research

Extend existing work by:
Diversification based on psychological mechanisms
Testing the algorithm with real users and measuring their perceptions of and experiences with the algorithm

Outline of the talk:
Our user-centric evaluation framework
Psychology behind choice overload
Latent feature diversification algorithm
Two user studies to test our diversification

Page 8: Improving user experience in recommender systems

User-Centric Framework

Computer scientists (and marketing researchers) like to study behavior… (they hate asking the user, or simply cannot, as in A/B tests)

Page 9: Improving user experience in recommender systems

User-Centric Framework

Psychologists and HCI people are mostly interested in experience…

Page 10: Improving user experience in recommender systems

User-Centric Framework

Though it helps to triangulate experience with behavior…

Page 11: Improving user experience in recommender systems

User-Centric Framework

Our framework adds the intermediate construct of perception, which explains why behavior and experience change due to our manipulations

Page 12: Improving user experience in recommender systems

User-Centric Framework

And adds personal and situational characteristics

Relations modeled using factor analysis and SEM

Knijnenburg, B.P., Willemsen, M.C., Gantner, Z., Soncu, H., Newell, C. (2012). Explaining the User Experience of Recommender Systems. User Modeling and User-Adapted Interaction (UMUAI), vol. 22, pp. 441-504. http://bit.ly/umuai

Page 13: Improving user experience in recommender systems

Latent feature diversification: from Psy to CS

Joint work with Mark Graus and Bart Knijnenburg

Page 14: Improving user experience in recommender systems

Psychology behind choice overload

More options provide more benefits in terms of finding the right option…
…but result in higher costs/effort

Objective effort: more comparisons required
Cognitive effort: increased potential regret, larger expectations for larger sets, many tradeoffs

Page 15: Improving user experience in recommender systems

Research on choice overload

Choice overload is not omnipresent
A meta-analysis (Scheibehenne et al., JCR 2010) suggests an overall effect size of zero

Choice overload is stronger when:
there are no strong prior preferences
there is little difference in the attractiveness of the items

Prior studies did not control for the diversity of the item set

Can we reduce choice difficulty and overload by using personalized, diversified item sets, while controlling for attractiveness?

Page 16: Improving user experience in recommender systems

Diversification and attractiveness

Camera example: suppose Peter thinks resolution (MP) and zoom are equally important
The user vector shows the preference direction
Equi-preference line: the set of equally attractive options (orthogonal to the user vector)
Diversify on the equi-preference line!
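The equi-preference idea can be illustrated in a few lines of code. This is only a minimal sketch with made-up camera items and an assumed equal-weight user vector, not the algorithm used later in the talk: attractiveness is the projection onto the user vector, and diversification spreads the selection out along the orthogonal (equi-preference) direction.

```python
import numpy as np

# Hypothetical camera items as (zoom, resolution in MP) feature vectors.
items = np.array([
    [4.0, 16.0], [6.0, 14.0], [8.0, 12.0], [10.0, 10.0],
    [12.0, 8.0], [14.0, 6.0], [5.0, 5.0], [15.0, 15.0],
])

w = np.array([1.0, 1.0])            # Peter weighs zoom and resolution equally (assumption)
w = w / np.linalg.norm(w)

attractiveness = items @ w          # projection on the user vector
position = items @ np.array([-w[1], w[0]])   # position along the equi-preference line

# Keep the most attractive items (roughly one equi-preference band)...
top = np.argsort(-attractiveness)[:6]
# ...and among those, pick the k items most spread out along the line.
k = 3
order = np.argsort(position[top])
picked = top[order][np.linspace(0, len(top) - 1, k).astype(int)]
print(items[picked])
```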

Page 17: Improving user experience in recommender systems

Choice Difficulty and Diversity

Larger sets are often more difficult because of the increased uniformity of these sets (Fasolo et al., 2009; Reutskaja et al., 2009)

Larger item sets have many similar options: small inter-product distances and small tradeoffs. High density!

Choice difficulty related to lack of justification

[Scatter plot of resolution (MP) against zoom: a high-density set with small tradeoffs.]

Page 18: Improving user experience in recommender systems

Choice difficulty and trade-offs

As item sets become more diverse (less dense), tradeoff size increases

Tradeoffs are effortful… give up one aspect for another
But they can be justified very easily!

[Two scatter plots of resolution (MP) against zoom: a high-density set with small tradeoffs versus a low-density set with large tradeoffs.]

Page 19: Improving user experience in recommender systems

Double Mediation Model for difficulty (Scholten and Sherman, JEP:G 2006)

U-shaped relation between diversity and difficulty:
Choosing from a uniform set is hard to justify but has no difficult tradeoffs
Choosing from a diverse set encompasses difficult tradeoffs but is easy to justify

Does this also apply to personalized item sets?
How can we apply this to recommender system algorithms? What features should we diversify on?

[Plot: difficulty as a U-shaped function of diversity, from uniform to diverse.]

Page 20: Improving user experience in recommender systems

Features in Matrix Factorization

Latent features as a means of diversification!

"Latent features are preference dimensions related to real world concepts (e.g. escapist/serious)" (Koren, Bell and Volinsky, 2009)

Users and items are described as vectors of latent features
Parallel to how choice sets are described in MAUT (multi-attribute utility theory) in consumer psychology

Page 21: Improving user experience in recommender systems

Explaining Matrix Factorization

Map users and items to a joint latent factor space of dimensionality f
Each item is a vector q_i, each user a vector p_u

Predicted rating of user u for item i:
$\hat{r}_{ui} = q_i^T p_u$

How to find the dimensions?
SVD: singular value decomposition

Page 22: Improving user experience in recommender systems

Matrix Factorization algorithms

Each user is a vector p_u, each item a vector q_i

[Rating matrix: users Jack, Dylan, Olivia and Mark rating the movies Usual Suspects, Titanic, Die Hard and Godfather, with several entries unknown (?).]

User vectors p_u:
          Dim 1   Dim 2
  Jack      3.0    -1.0
  Dylan     1.4     0.2
  Olivia   -2.5    -0.8
  Mark     -2.0    -1.5

Item vectors q_i:
          Usual Suspects   Titanic   Die Hard   Godfather
  Dim 1        1.6           -1.0       5.0        0.2
  Dim 2        1.0            1.0       0.3       -0.2
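Using the example vectors on this slide, predicted ratings (including the unknown "?" entries) follow directly from the inner product $\hat{r}_{ui} = q_i^T p_u$. A minimal sketch; since the slide does not show which "?" belongs to which user/movie pair, all predictions are simply computed:

```python
import numpy as np

# User vectors p_u and item vectors q_i from the example slide.
p_u = {
    "Jack":   np.array([3.0, -1.0]),
    "Dylan":  np.array([1.4,  0.2]),
    "Olivia": np.array([-2.5, -0.8]),
    "Mark":   np.array([-2.0, -1.5]),
}
q_i = {
    "Usual Suspects": np.array([1.6, 1.0]),
    "Titanic":        np.array([-1.0, 1.0]),
    "Die Hard":       np.array([5.0, 0.3]),
    "Godfather":      np.array([0.2, -0.2]),
}

# Predicted rating r_hat(u, i) = q_i . p_u for every user/movie pair.
for user, p in p_u.items():
    for movie, q in q_i.items():
        print(f"{user:7s} {movie:15s} {q @ p:6.2f}")
```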

Page 23: Improving user experience in recommender systems

Two-dimensional latent feature space and diversification

[Plot of the two-dimensional latent feature space with Jack, Dylan, Olivia and Mark positioned in it.]

Page 24: Improving user experience in recommender systems

Greedy Diversity Algorithm

10-dimensional MF model
Take the personalized top-200

Greedy algorithm: select K items with the highest inter-item distance (using city-block distance)

Low: items closest to the centroid
Medium: select maximally diverse items from the 100 items closest to the centroid
High: select maximally diverse items from all items in the top-200
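A minimal sketch of such a greedy farthest-point selection in the latent feature space. The starting point (the item closest to the centroid) and the max-min selection rule are assumptions; the talk only states that K items with high inter-item city-block distance are selected greedily.

```python
import numpy as np

def greedy_diverse(features: np.ndarray, k: int) -> list:
    """Greedily pick k row indices with high pairwise city-block distance.

    features: (n_items, n_factors) latent vectors of the candidate set,
    e.g. the personalized top-200 or the 100 items closest to its centroid.
    """
    centroid = features.mean(axis=0)
    dist_to_centroid = np.abs(features - centroid).sum(axis=1)
    selected = [int(dist_to_centroid.argmin())]   # start near the centroid (assumption)
    while len(selected) < k:
        # City-block distance from every candidate to its closest already-selected item.
        d = np.abs(features[:, None, :] - features[selected][None, :, :]).sum(axis=2)
        min_d = d.min(axis=1)
        min_d[selected] = -1.0                    # never re-pick a selected item
        selected.append(int(min_d.argmax()))      # farthest-point (max-min) step
    return selected

# Usage sketch: "high" draws from all 200 candidates, "medium" only from the
# 100 items closest to the centroid, "low" simply takes the closest items.
rng = np.random.default_rng(0)
top200 = rng.normal(size=(200, 10))               # stand-in for real latent vectors
centroid = top200.mean(axis=0)
closeness = np.abs(top200 - centroid).sum(axis=1)

high = greedy_diverse(top200, k=5)
closest100 = np.argsort(closeness)[:100]
medium = [int(closest100[j]) for j in greedy_diverse(top200[closest100], k=5)]
low = list(np.argsort(closeness)[:5])
```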


Page 26: Improving user experience in recommender systems

The algorithm

Measure of density: AFSR
Density/tradeoffs on the features capture how items are distributed in the feature space
Average Factor Score Range (AFSR), based on the density metric used by Fasolo et al. (2009)
X is the set of items i, D is the number of features
Captures the distribution of items in the feature space and their tradeoffs better than standard similarity measures
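The AFSR formula itself appears only as an image on the slide. A plausible reconstruction, assuming it is the factor-score range averaged over the D features (consistent with the name and the variable definitions above):

$\mathrm{AFSR}(X) = \frac{1}{D} \sum_{d=1}^{D} \left( \max_{i \in X} x_{i,d} - \min_{i \in X} x_{i,d} \right)$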

Page 27: Improving user experience in recommender systems

Selection of the initial set

The top-200 was selected as a balanced initial set:
Large differences in AFSR scores for the 3 levels of diversification
High average predicted rating: 4.48
Range in predicted ratings (0.546) lower than the error of the MF model: MAE = 0.656 (RMSE = 0.854)

Final check: how does attractiveness vary within and between the sets?
More diverse sets are by nature more likely to capture high-ranked items…

Set size   Diversification   Rating   AFSR
   5       Low               4.505    0.295
           Medium            4.529    0.634
           High              4.561    1.210
  20       Low               4.537    0.586
           Medium            4.558    1.005
           High              4.604    1.615

Page 28: Improving user experience in recommender systems

System characteristics

MF recommender based on the MyMedia project
10M MovieLens dataset: movies from 1994
5.6M ratings for 70k users and 5.4k movies
RMSE of 0.854, MAE of 0.656

Movies shown with title and predicted rating; hovering the mouse over the title reveals additional information: a short synopsis, cast, director and image
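The talk uses the MyMedia MF implementation. Purely as an illustration of how 10-dimensional user and item vectors can be learned from such rating data, here is a minimal stochastic-gradient-descent sketch; the hyperparameters and the tiny example triples are made up and it is not the MyMedia code.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, n_factors=10,
             lr=0.01, reg=0.05, n_epochs=20, seed=0):
    """Learn user/item latent vectors by SGD on observed ratings.

    ratings: iterable of (user_index, item_index, rating) triples,
    e.g. parsed from the MovieLens ratings file.
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, n_factors))   # user vectors p_u
    Q = 0.1 * rng.standard_normal((n_items, n_factors))   # item vectors q_i
    for _ in range(n_epochs):
        for u, i, r in ratings:
            pu = P[u].copy()
            err = r - pu @ Q[i]                            # r - q_i^T p_u
            P[u] += lr * (err * Q[i] - reg * pu)
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# Usage sketch with a couple of made-up ratings:
triples = [(0, 1, 4.0), (0, 2, 3.5), (1, 1, 2.0)]
P, Q = train_mf(triples, n_users=2, n_items=3)
print(P[0] @ Q[2])   # predicted rating of user 0 for item 2
```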

Page 29: Improving user experience in recommender systems

Study 1a: Check diversification algorithm

Does our diversification affect the subjective experience of the recommendations?
Do people perceive the diversity?
Does it affect attractiveness and choice difficulty?

Each participant inspects 3 lists (low, mid and high diversification), order counterbalanced
No choices are made from the lists

Page 30: Improving user experience in recommender systems

Study 1a: design/procedure

Pre-questionnaire: personal characteristics
Rating task to train the system (10 ratings)
Assess three lists of recommendations
Within subjects: low / mid / high diversification
Between subjects: number of items (5, 10, 15, 20, 25)

After each list we measured:
Perceived diversity & attractiveness
Expected trade-off difficulty & choice difficulty

Page 31: Improving user experience in recommender systems

Study 1a: Participants and manipulation checks

97 participants from an online database, paid for participation
Mean age: 29 years; 52 females and 45 males

Low, medium and high diversification differed in the feature score range
Average predicted ratings of the sets were not different!

Diversity   Average AFSR (SE)   Avg. predicted rating (SE)
Low         0.959 (0.015)       4.486 (0.042)
Medium      1.273 (0.016)       4.486 (0.041)
High        1.744 (0.024)       4.527 (0.039)

Page 32: Improving user experience in recommender systems

Study 1a: Structural Equation Model

Page 33: Improving user experience in recommender systems

Study 1a: Structural Equation Model

[Bar charts by diversification level (low / mid / high): standardized scores for attractiveness and diversity, and scale differences for choice difficulty and tradeoff difficulty.]

Page 34: Improving user experience in recommender systems

Study 1a: Conclusions & Discussion

Diversifying on latent features:
Increases attractiveness/diversity
Reduces trade-off difficulty (high)
Reduces choice difficulty (linearly)

No evidence for the U-shaped difficulty model
High diversity does not result in trade-off conflicts (perhaps due to the nature of the domain/MF?)

No effect of number of items: small sets benefit as much from diversification

Diversification on MF features seems promising to increase attractiveness!

[Bar chart by diversification level (low / mid / high): scale differences for choice difficulty and tradeoff difficulty.]

Page 35: Improving user experience in recommender systems

Study 1b: No choice satisfaction

In Study 1a, no actual choice was made
This explains the limited effect of the number of items
We could not measure choice satisfaction or justification-based processes

Diversification and list length as two factors in a new experiment with choice (and choice satisfaction)
Item set size: 5, 10 and 20
Low and high diversity (no medium)

We expect choice difficulty to be more prominent for low-diversity sets

Page 36: Improving user experience in recommender systems

Study 1b: design/procedure

Pre-questionnaire: personal characteristics
Rating task to train the system (10 ratings)
Choose one item from a list of recommendations
Between subjects: 2 levels (low / high diversification) x 3 lengths (5, 10 or 20 items)

Afterwards we measured:
Perceived diversity & attractiveness
Choice difficulty and choice satisfaction

87 participants from an online database, paid for participation
Mean age: 29 years; 41 females and 46 males

Page 37: Improving user experience in recommender systems

Questionnaire items

Perceived recommendation diversity: 5 items, e.g. "The list of movies was varied"
Perceived recommendation attractiveness: 5 items, e.g. "The list of recommendations was attractive"
Choice satisfaction: 6 items, e.g. "I think I would enjoy watching the chosen movie"
Choice difficulty: 5 items, e.g. "It was easy to select a movie"

Page 38: Improving user experience in recommender systems

Study 1b: Structural Equation Model

Page 39: Improving user experience in recommender systems

Study 1b: Structural Equation Model

[Line charts by list length (5 / 10 / 20): standardized scores for perceived diversity and for choice satisfaction, for low- and high-diversity lists.]

Page 40: Improving user experience in recommender systems

Study 1b: Results

A long list is more difficult (cognitive and objective effort (hovers)) but also more satisfying in itself

Diversity influences choice satisfaction in important ways:
Diversity increases attractiveness and reduces difficulty
These can increase satisfaction (but only for short lists)

Our diversification improved the 5-item list:
5 diverse items are as satisfactory as 10 or 20 items, and less difficult!
Less effort needed (hovers)

Using latent feature diversification, one does not need long item lists…

[Line chart by list length (5 / 10 / 20): standardized choice satisfaction for low- and high-diversity lists.]

Page 41: Improving user experience in recommender systems

But…

Our studies show that low or high diversification from the centroid of the top-200 works
However, these sets were optimized for diversity, not for prediction accuracy
Most item sets thus do not contain the best-predicted items
So how does this compare to standard Top-N lists?

Slight modification of our algorithm: diversify starting from the best-predicted item in the top-N set (Top-1) rather than from the centroid

Diversification and list length as two factors in a choice overload experiment
List sizes: 5 and 20
Diversification: none (top 5/20), medium, high

Page 42: Improving user experience in recommender systems

Properties of the item sets

Set size   Diversification     Avg. AFSR   Avg. Rank
   5       High (α = 1)          1.380       78.284
           Medium (α = 0.3)      1.096        7.484
           Top-N (α = 0)         0.774        3.000
  20       High (α = 1)          1.793       89.380
           Medium (α = 0.3)      1.486       17.849
           Top-N (α = 0)         1.270       10.500

The algorithm balances between maximum diversity and highest rank
Every iteration weighs highest rank (1 - α) against highest diversification (α)
α = 0: top-N; α = 1: maximally diverse
(α = 0.3 gives good medium diversity)
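A minimal sketch of how such an α-weighted greedy step could look. The exact scoring and normalization are not given in the talk, so the combination below (a rank-based score weighted by 1 - α plus a normalized min-distance score weighted by α, starting from the Top-1 item) is an assumption:

```python
import numpy as np

def greedy_alpha(features: np.ndarray, k: int, alpha: float) -> list:
    """Greedy selection trading off rank (1 - alpha) against diversity (alpha).

    features: (n, f) latent vectors of the top-n candidates, ordered by
    predicted rating (row 0 = best-predicted item, i.e. the Top-1).
    alpha = 0 reproduces the plain top-k, alpha = 1 maximizes diversity.
    """
    n = len(features)
    rank_score = 1.0 - np.arange(n) / (n - 1)          # 1 for the best rank, 0 for the worst
    selected = [0]                                      # start from the best-predicted item (Top-1)
    while len(selected) < k:
        d = np.abs(features[:, None, :] - features[selected][None, :, :]).sum(axis=2)
        min_d = d.min(axis=1)                           # city-block distance to nearest selected item
        div_score = min_d / (min_d.max() + 1e-12)       # normalize to [0, 1]
        score = (1 - alpha) * rank_score + alpha * div_score
        score[selected] = -np.inf                       # never re-pick a selected item
        selected.append(int(score.argmax()))
    return selected

# Usage sketch: alpha = 0 gives the plain top-5, alpha = 0.3 a medium-diverse
# set, alpha = 1 a maximally diverse set (as in the table above).
top200 = np.random.default_rng(1).normal(size=(200, 10))
for a in (0.0, 0.3, 1.0):
    print(a, greedy_alpha(top200, k=5, alpha=a))
```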

Page 43: Improving user experience in recommender systems

Design/procedure Study 2

159 participants from an online database
Rating task to train the system (15 ratings)
Choose one item from a list of recommendations
Between subjects: 3 levels of diversification x 2 lengths

Afterwards we measured:
Perceptions: perceived diversity & attractiveness
Experience: choice difficulty and choice satisfaction
Behavior: total views / unique items considered

Page 44: Improving user experience in recommender systems

Structural Equation Model

Page 45: Improving user experience in recommender systems

Perceived diversity & attractiveness

Perceived diversity increases with diversification, similarly for 5 and 20 items
Perceived diversity increases attractiveness
Perceived difficulty goes down with diversification
Perceived attractiveness goes up with diversification

The diverse 5-item set excels:
Just as satisfying as 20 items
Less difficult to choose from
Less cognitive load…!

[Bar charts by diversification level (none / medium / high): standardized perceived diversity and choice satisfaction, for 5-item and 20-item lists.]

Page 46: Improving user experience in recommender systems

Choice characteristics

Chosen option (mean and std. err.):

Set size   Diversity       List position   Rating        Rank
 5 items   None (top 5)     3.60 (0.27)    4.51 (0.07)    3.60 (0.27)
           Medium           4.41 (0.59)    4.41 (0.07)   14.52 (5.37)
           High             4.19 (0.27)    4.30 (0.07)   77.59 (12.76)
20 items   None (top 20)   10.15 (0.92)    4.45 (0.05)   10.15 (0.92)
           Medium          10.33 (1.18)    4.40 (0.08)   17.70 (2.68)
           High             9.93 (1.07)    4.16 (0.07)   72.22 (11.84)

With higher diversity, there is no difference in the position of the chosen option
This results in a less 'optimal' choice in terms of predicted rating
Without a reduction in choice satisfaction!

Page 47: Improving user experience in recommender systems

Conclusions

Reducing choice difficulty and overload:
Diversity reduces choice difficulty
Less uniform sets are easier to choose from
Latent feature diversification is easy to implement

Diversity can improve choice satisfaction
Even when the diversified list has movies with lower predicted ratings than standard top-N lists

No need for larger item sets
Offering personalized, diversified, small item sets might be the key to helping decision makers cope with too much choice!

Psychological theory can inform how to improve the output of recommender algorithms

Page 48: Improving user experience in recommender systems

What you should take away…

Psychological theory can inform new ways of diversifying algorithm output or eliciting preferences
But also: by working with recommenders and algorithms we can enhance psychological theory, since personalized item sets give us control

User-centric evaluation helps to assess effectiveness
It is a lot of work, and we need user studies…
But linking subjective to objective measures might help future studies that cannot do user studies

The user-centric framework allows us to understand WHY particular approaches work or not
The concept of mediation: user perception helps understanding

Page 49: Improving user experience in recommender systems

Questions?

Contact:

Martijn Willemsen

@MCWillemsen

[email protected]

www.martijnwillemsen.nl

Thanks to my co-authors:

Mark Graus

Bart Knijnenburg