[adma 2017] identification of grey sheep users by histogram intersection in recommender systems

Yong Zheng, Mayur Agnani, Mili Singh

Illinois Institute of Technology, Chicago, IL, 60616, USA

2017 International Conference on Advanced Data Mining and Applications, Singapore, Nov 5-6, 2017

Identification of Grey Sheep Users By Histogram Intersection In Recommender Systems

Agenda

• Background: Recommender Systems

• Grey Sheep Users In Collaborative Filtering

• Methodology and Solutions

• Experimental Results

• Conclusions and Future Work

Recommender System (RS)

• RS: item recommendations tailored to user tastes

Traditional Recommendation Algorithms

Content-Based Recommendation Algorithms: The user is recommended items similar to the ones the user preferred in the past, e.g., in book or movie recommender systems.

Collaborative Filtering Based Recommendation Algorithms: The user is recommended items that people with similar tastes and preferences liked in the past, e.g., in movie recommender systems.

Hybrid Recommendation Algorithms: Combine content-based and collaborative filtering based algorithms to produce item recommendations.

Collaborative Filtering: Algorithms

User-Based KNN Collaborative Filtering (UBCF)

Assumption: a user u’s rating on item t is similar to the ratings that like-minded users gave to item t. This group of similar users is called the user’s K nearest neighbors.

User | Pirates of the Caribbean 4 | Kung Fu Panda 2 | Harry Potter 6 | Harry Potter 7
U1 | 4 | 4 | 1 | 2
U2 | 3 | 4 | 2 | 1
U3 | 2 | 2 | 4 | 4
U4 | 4 | 4 | 1 | ?

a = the target user

i = the target item

N = the user neighborhood

u = a user neighbor in N
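
The prediction formula on this slide did not survive in the transcript. A minimal sketch of one common UBCF prediction, using the symbols above and the ratings from the example table: the target user's mean rating plus the similarity-weighted average of each neighbor's deviation from its own mean. Cosine similarity over co-rated items, the neighborhood size k = 2, and all function names are assumptions for illustration, not necessarily the formula shown in the talk.

```python
from math import sqrt

# Ratings from the example table; None marks the rating to predict.
# Item order: Pirates of the Caribbean 4, Kung Fu Panda 2, Harry Potter 6, Harry Potter 7
RATINGS = {
    "U1": [4, 4, 1, 2],
    "U2": [3, 4, 2, 1],
    "U3": [2, 2, 4, 4],
    "U4": [4, 4, 1, None],
}


def mean_rating(user):
    """Average of the ratings the user has actually given."""
    known = [r for r in RATINGS[user] if r is not None]
    return sum(known) / len(known)


def cosine(a, u):
    """Cosine similarity computed over the items both users rated."""
    pairs = [(x, y) for x, y in zip(RATINGS[a], RATINGS[u])
             if x is not None and y is not None]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = sqrt(sum(x * x for x, _ in pairs))
    norm_u = sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_u) if norm_a and norm_u else 0.0


def predict(a, i, k=2):
    """Predict user a's rating on item index i from the k most similar users."""
    candidates = [u for u in RATINGS if u != a and RATINGS[u][i] is not None]
    # N: the k candidates most similar to the target user (assumed k = 2).
    neighborhood = sorted(candidates, key=lambda u: cosine(a, u), reverse=True)[:k]
    num = sum(cosine(a, u) * (RATINGS[u][i] - mean_rating(u)) for u in neighborhood)
    den = sum(abs(cosine(a, u)) for u in neighborhood)
    return mean_rating(a) + (num / den if den else 0.0)


PREDICTION = predict("U4", 3)  # U4's rating for Harry Potter 7
```

Under these assumptions the predicted rating of U4 for Harry Potter 7 is about 1.9, in line with the low ratings given by the two most similar users, U1 and U2.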

Collaborative Filtering: Algorithms

Popular Challenges in Collaborative Filtering

Data sparsity problems

Cold-start users or items

Grey-sheep users

Incorporate content into collaborative filtering

….

Grey Sheep Users

Definition 1 by Mark Claypool, et al., 1999

A group of users who neither agree nor disagree with any group of users. Therefore, they will not benefit from the user-based collaborative filtering technique

Clustering Technique by Ghazanfar, et al., 2011

Distribution of User Ratings by Gras, et al., 2016

Definition 2 by John McCrae, et al., 2004

White Sheep Users may have high correlations with other users; Black Sheep Users have very few or no correlating users; Grey Sheep Users have unusual tastes and low correlations with others

Distribution of User Similarities by Zheng, et al., 2017

Research Problem: Identifying Grey Sheep Users

Collaborative Filtering Other Algorithms

Proposed Solution

Approach Based on The Distribution of User Similarities

White Sheep Users: high correlations with other users

Black Sheep Users: very few or no correlating users

Grey Sheep Users: unusual tastes, low correlations with others

The Distribution of User-User Correlations or Similarities

Proposed Solution

Step 1, represent each user as a distribution of user correlations

Step 2, select good and bad examples

Step 3, apply outlier detection on the selected examples. Grey sheep users are the intersection of the bad examples and the identified outliers

Step 4, examine the quality of the identified grey sheep users

Proposed Solution

Step 1, Distribution Representations

We calculate user-user correlations by cosine similarity

Obtain the descriptive statistics of the distribution
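
A minimal sketch of Step 1, assuming toy ratings taken from the earlier example table. For each user it collects the cosine similarities to all other users (over co-rated items) and summarizes that distribution. The slide does not say which descriptive statistics are used, so the set below (mean, standard deviation, quartiles, skewness) and the names `RATINGS`, `profile`, and `PROFILES` are assumptions.

```python
from math import sqrt
from statistics import mean, pstdev, quantiles

# Toy ratings (user -> {item: rating}); titles are shortened for readability.
RATINGS = {
    "U1": {"Pirates 4": 4, "Panda 2": 4, "Potter 6": 1, "Potter 7": 2},
    "U2": {"Pirates 4": 3, "Panda 2": 4, "Potter 6": 2, "Potter 7": 1},
    "U3": {"Pirates 4": 2, "Panda 2": 2, "Potter 6": 4, "Potter 7": 4},
    "U4": {"Pirates 4": 4, "Panda 2": 4, "Potter 6": 1},
}


def cosine(r_a, r_u):
    """Cosine similarity over co-rated items; None if there are no co-ratings."""
    common = sorted(set(r_a) & set(r_u))
    if not common:
        return None
    dot = sum(r_a[i] * r_u[i] for i in common)
    norm_a = sqrt(sum(r_a[i] ** 2 for i in common))
    norm_u = sqrt(sum(r_u[i] ** 2 for i in common))
    return dot / (norm_a * norm_u)


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return 0.0
    return mean([((v - m) / s) ** 3 for v in values])


def profile(user):
    """Descriptive statistics of one user's similarity distribution."""
    sims = [cosine(RATINGS[user], RATINGS[o]) for o in RATINGS if o != user]
    sims = [s for s in sims if s is not None]  # drop pairs without co-ratings
    q1, q2, q3 = quantiles(sims, n=4)          # needs at least two similarities
    return {"mean": mean(sims), "std": pstdev(sims),
            "q1": q1, "median": q2, "q3": q3, "skewness": skewness(sims)}


PROFILES = {user: profile(user) for user in RATINGS}
```

In this toy data U3, whose tastes run opposite to the others, ends up with a lower mean similarity than U1.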

Proposed Solution

Step 2, Example Selection

Good examples: high correlations and left-skewed

Bad examples: low correlations and right-skewed
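
A minimal sketch of Step 2, assuming each user already has a mean similarity and a skewness value. The slide gives no concrete thresholds, so the cut used here (the average of all user means, with the sign of the skewness deciding left or right skew) is a hypothetical choice, as are the numbers in `PROFILES`.

```python
from statistics import mean

# Hypothetical per-user statistics of the similarity distribution.
PROFILES = {
    "U1": {"mean": 0.42, "skewness": -0.8},
    "U2": {"mean": 0.38, "skewness": -0.3},
    "U3": {"mean": 0.10, "skewness": 1.2},
    "U4": {"mean": 0.12, "skewness": 0.9},
    "U5": {"mean": 0.30, "skewness": 0.4},
}


def select_examples(profiles):
    """Split users into good and bad examples; users matching neither are left out."""
    cut = mean([p["mean"] for p in profiles.values()])  # assumed threshold
    good = sorted(u for u, p in profiles.items()
                  if p["mean"] > cut and p["skewness"] < 0)   # high, left-skewed
    bad = sorted(u for u, p in profiles.items()
                 if p["mean"] < cut and p["skewness"] > 0)    # low, right-skewed
    return good, bad


GOOD, BAD = select_examples(PROFILES)
```

Here U1 and U2 become good examples, U3 and U4 bad examples, and U5 (above-average similarity but right-skewed) is not selected at all.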

Proposed Solution

Step 3, Outlier Detection by Local Outlier Factor (LOF)

LOF identifies outliers by comparing the local density of each observation with that of its neighbors

Observations with LOF > 1 are considered outliers

We try different threshold values to find the optimal one for identifying grey sheep users, for example:

LOF threshold = 1.0

LOF threshold = 1.1

LOF threshold = 1.2

LOF threshold = ….
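
A minimal sketch of Step 3 in plain Python, assuming each selected example is described by two hypothetical features (mean similarity and skewness) and that `BAD_EXAMPLES` comes from the example-selection step. The LOF here uses exactly k nearest neighbors and ignores distance ties; in practice a library implementation such as scikit-learn's `LocalOutlierFactor` would be the more likely choice.

```python
from math import dist

# Hypothetical features (mean similarity, skewness) of the selected examples.
EXAMPLES = {
    "U1": (0.40, -0.50),
    "U2": (0.42, -0.45),
    "U3": (0.37, -0.56),
    "U4": (0.41, -0.52),
    "U5": (0.05, 1.50),
}
BAD_EXAMPLES = {"U5"}  # assumed output of the example-selection step


def lof_scores(points, k=2):
    """Local Outlier Factor per point, using exactly k nearest neighbors."""
    names = list(points)
    neighbors, k_dist = {}, {}
    for p in names:
        ranked = sorted((o for o in names if o != p),
                        key=lambda o: dist(points[p], points[o]))
        neighbors[p] = ranked[:k]
        k_dist[p] = dist(points[p], points[ranked[k - 1]])

    def reach(p, o):
        # Reachability distance of p with respect to its neighbor o.
        return max(k_dist[o], dist(points[p], points[o]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = {p: k / sum(reach(p, o) for o in neighbors[p]) for p in names}
    # LOF: average density of the neighbors relative to the point's own density.
    return {p: sum(lrd[o] for o in neighbors[p]) / (k * lrd[p]) for p in names}


SCORES = lof_scores(EXAMPLES)


def grey_sheep(threshold):
    """Grey sheep users: bad examples that are also outliers at this threshold."""
    outliers = {u for u, s in SCORES.items() if s > threshold}
    return sorted(outliers & BAD_EXAMPLES)


# Try several thresholds, as on the slide.
SWEEP = {t: grey_sheep(t) for t in (1.0, 1.1, 1.2)}
```

In this toy data the isolated point U5 gets a LOF far above 1, while the clustered points stay close to 1. Some of those sit slightly above 1.0, which is one reason to try thresholds larger than 1.0.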

Proposed Solution

Step 4, Examine the quality of identified GS Users

The parameters in our solution

Example Selection

LOF threshold

Number of neighbors (neighborhood size) in the LOF method

Our goals or examination criteria

To find as many GS users as possible

Recommendations by UBCF should be worse for GS users than for non-GS users

Improved Approach

Drawback

Cosine similarities rely on co-ratings. If two users have not rated any items in common, we cannot measure their similarity

Improved Approach

We represent each user as its similarity distribution

The distribution can be represented by a histogram

The intersection of two histograms gives the user-user similarity
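
A minimal sketch of histogram intersection, assuming each user is described by a list of values in [0, 1]. Which per-user values feed the histogram is not spelled out on the slide, and the bin count of 5 and the sample values are hypothetical. Each histogram is normalized to sum to 1, and the similarity is the sum of the bin-wise minima, so it needs no co-rated items.

```python
def histogram(values, bins=5, lo=0.0, hi=1.0):
    """Normalized histogram: the fraction of values falling into each bin."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # put v == hi in the last bin
        counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical shapes, 0.0 for no overlap."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical per-user values.
H_A = histogram([0.05, 0.15, 0.25, 0.35])
H_B = histogram([0.12, 0.25, 0.30, 0.45])
H_C = histogram([0.65, 0.85, 0.90, 0.95])

SIM_AB = intersection(H_A, H_B)  # overlapping distributions
SIM_AC = intersection(H_A, H_C)  # no bins in common
```

Here users A and B overlap in most bins and get a similarity of 0.75, while A and C share no bins and get 0.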

Experimental Setting

• Data: MovieLens 100K rating data

– 100K ratings

– 1K users

– 1.7K movies

– Each user has rated at least 20 movies

• Evaluation

– 80% as training, 20% as testing

– Mean absolute error (MAE) to evaluate rating predictions
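
A minimal sketch of the evaluation criterion, assuming a few hypothetical test-set rows and a hypothetical set of identified grey sheep users. MAE is computed separately for the two groups. If the identification works, UBCF should show a higher MAE for GS users than for non-GS users.

```python
# Hypothetical test-set rows: (user, actual rating, rating predicted by UBCF).
TEST_ROWS = [
    ("U1", 4, 3.6),
    ("U2", 3, 3.4),
    ("U5", 5, 2.5),
    ("U5", 1, 3.0),
]
GREY_SHEEP = {"U5"}  # assumed output of the identification step


def mae(rows):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(actual - predicted) for _, actual, predicted in rows) / len(rows)


gs_rows = [row for row in TEST_ROWS if row[0] in GREY_SHEEP]
non_gs_rows = [row for row in TEST_ROWS if row[0] not in GREY_SHEEP]

MAE_GS = mae(gs_rows)
MAE_NON_GS = mae(non_gs_rows)
```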

Results

• Comparison of Different Approaches

Results

• Comparison of Recommendation Quality

Results

• Visualization of GS and Non-GS users

Conclusions

• We develop a novel approach to identify GS users by utilizing the definition related to the user-user correlations

• We propose to use histogram intersection to better measure user-user similarities

• Our approach is demonstrated to work better than others based on the MovieLens 100K data

Future Work

• Try it on other data sets

• Seek approaches to improve the recommendation performance for the group of Grey Sheep Users
