item-based vs user-based collaborative recommendation ... · item-based vs user-based collaborative...

21
Item-based vs User-based Collaborative Recommendation Predictions Joel Azzopardi Department of Artificial Intelligence Faculty of ICT University of Malta [email protected] September 2017 Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 1 / 21

Upload: phungdieu

Post on 05-Aug-2019

225 views

Category:

Documents


0 download

TRANSCRIPT

Item-based vs User-based CollaborativeRecommendation Predictions

Joel Azzopardi

Department of Artificial IntelligenceFaculty of ICT

University of Malta

[email protected]

September 2017

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 1 / 21

Overview

1 The Problem

2 Background

3 Research Questions

4 Methodology

5 Evaluation

6 Conclusions

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 2 / 21

The Problem

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 3 / 21

The Problem

Information Overload

Information Retrieval – user ‘pulls’ relevant information aftersubmitting query.

Recommendation Systems – system ‘pushes’ relevant informationto the user based on user model.

Main Challenge: handling large amounts of data efficiently andeffectively.

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 4 / 21

Background

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 5 / 21

Recommendation Approaches

Content-based techniques – recommendation is performed on thebasis of similarity between the content of the different items(documents).

Need to extract features from the different items (documents).

Does not suffer from new user/item problem, and from sparse matrixproblem.

Suitable for items with high turn-over (e.g. news).

Collaborative techniques – recommendation is performed on thebasis of what other ‘similar’ users have found useful.

Does not use features from the items/documents.

Need to have substantial user-item rating overlap.

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 6 / 21

Collaborative Recommendation

More effective than content-based approaches.

Exploit the fact that humans enjoy sharing their opinions with others.

2 main types:

User-based – an item’s recommendation score for a user is calculateddepending on that items’ ratings by other similar users

Item-based – item’s rating is predicted based on how similar itemshave been rated by that user.

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 7 / 21

Research Questions

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 8 / 21

Research Questions

What will be the performance of an ensemble system combining bothuser-based and item-based approaches?

What is the effect of Latent Semantic Analysis (LSA) applied to thecollaborative recommendation algorithms?

What is the optimal neighbourhood size for the different collaborativerecommendation setups?

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 9 / 21

Latent Semantic Analyses

X = T · S ·DT

Figure 1 : Latent Semantic Analysis Process, from:http://www.slideshare.net/vitomirkovanovic/topic-modeling-for-learning-analytics-researchers-lak15-tutorial, September2016

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 10 / 21

Methodology

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 11 / 21

Collaborative Recommendation Algorithm

predictRating -SimUsers (UserSimMatrix , UserID, ItemID, k)CandidateRatings ← φSimUsers ← getSimilarUsers (UserSimMatrix , UserID)

curk ← 0

while (curk < k)user ← getNextMostSimilarUser (SimUsers)SimUserRating ← getUserItemRating (user , ItemID)

if (exists(SimUserRating ))updateCandidateRatings (CandidateRatings, SimUserRating ,

Similarity(user ,UserID))

k ← k + 1end if

end while

return (getHighestWeightedCandidate (CandidateRatings))end

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 12 / 21

Methodology

Algorithm is based on k Nearest Neighbours (kNN).

Votes are weighted according to neighbours’ similarities.

Use of:

User pair-wise similarity matrix in user-based recommendation.Item pair-wise similarity matrix in item-based recommendation.

In LSA, these similarity matrices are decomposed, and only the topdimensions are considered.

Ensemble algorithm:

Separate candidate user-item ratings are obtained from user-based anditem-based algorithms.Lists are merged together.Predicted recommendation score is set to the highest weightedcandidate score in the merged list.

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 13 / 21

Evaluation

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 14 / 21

Evaluation Dataset

MovieLens 1M dataset

1000209 ratings3883 movies6040 different users

Split into 80% / 20% for training and testing.

Training set consists of the oldest 80% ratings for each user.Rest into test set.

Metric used: Mean Average Error (MAE)

Neighbourhood sizes: 1, 2, 3, 6, 10, 20, 40, 80, 140, 200

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 15 / 21

System Configurations Evaluated

Algorithm Similar Similar Item LSA DimensionsIndex Items Users Category Used

1 X -2 X 3003 X -4 X 10005 X X -6 X X 3007 X -8 X X -9 X X 300

10 X X -11 X X 100012 X X X -13 X X X 300

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 16 / 21

Results

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 17 / 21

Conclusions

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 18 / 21

Comparison of the Different Setups

Item-based recommenders perform considerably better than theuser-based ones.

LSA has a beneficial effect on user-based recommendations, but anoverall negative effect on the item-based recommendations.

Ensemble system that uses LSA gives best (albeit slightly) resultsacross practically all neighbourhood sizes.

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 19 / 21

Optimal Neighbourhood Size

Optimal neighbourhood size seems to be around 40.

Item-based recommenders are most effective with a neighbourhoodsize of 40 with a slight deterioration of results for larger sizes.

Performance of user-based recommenders keeps improving (albeitvery slightly) as neighbourhood sizes are increased.

Ensemble algorithm that uses LSA obtains the best results with aneighbourhood size of 80, and results degrade slightly with largerneighbourhoods.

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 20 / 21

Future Work

Investigation of the different methods of how content-type featuresmay be incorporated in collaborative systems.

Recommendation over big-data: how to perform distributedrecommendation over multiple datasets and merging therecommendation scores.

Joel Azzopardi (University of Malta) IKC 2017, Gdansk, Poland September 2017 21 / 21