Memory-Based Recommender Systems: A Comparative Study
Aaron John Mani, Srinivas Ramani
CSCI 572 Project RECOMPARATOR
Problem definition
• This project is a comparative study of two movie recommendation systems based on collaborative filtering: User-User rating vs. Item-Item rating.
• Slope One algorithm: prediction engine.
• Pearson's correlation: calculates the similarity of users/items.
• Also compare against Netflix/IMDB recommendations.
• The aim of the experiment is to study the accuracy of the two algorithms when applied to the same dataset under similar conditions.
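The two building blocks named above, Pearson's correlation for similarity ranking and Slope One for prediction, can be sketched roughly as follows. This is an illustrative Python sketch (the project itself is implemented in Java); the tiny `ratings` dictionary and all names here are hypothetical stand-ins for the NetFlix data.

```python
import math

# Toy user-item rating matrix; a stand-in for the NetFlix dataset.
ratings = {
    "u1": {"m1": 5, "m2": 3, "m3": 4},
    "u2": {"m1": 4, "m2": 2, "m3": 5},
    "u3": {"m1": 1, "m2": 5},
}

def pearson(a, b):
    """Pearson's correlation between users a and b over their co-rated items."""
    common = sorted(set(ratings[a]) & set(ratings[b]))
    n = len(common)
    if n == 0:
        return 0.0
    ra = [ratings[a][i] for i in common]
    rb = [ratings[b][i] for i in common]
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    den = (math.sqrt(sum((x - mean_a) ** 2 for x in ra))
           * math.sqrt(sum((y - mean_b) ** 2 for y in rb)))
    return num / den if den else 0.0

def slope_one_predict(user, item):
    """Unweighted Slope One: predict `user`'s rating of `item` from the
    average rating deviation between `item` and each item the user rated."""
    candidates = []
    for j, rj in ratings[user].items():
        # Average (item - j) deviation over the other users who rated both.
        devs = [ratings[u][item] - ratings[u][j]
                for u in ratings
                if u != user and item in ratings[u] and j in ratings[u]]
        if devs:
            candidates.append(rj + sum(devs) / len(devs))
    return sum(candidates) / len(candidates) if candidates else None
```

In this sketch `pearson` would feed the similarity ranking and `slope_one_predict` the prediction engine; the real system would run the same computations over the full User-Item Matrix stored in MySQL.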
S/W, Language used

S/W / Language : Purpose
NetFlix Dataset : data source
Java : main programming language, for similarity ranking and the prediction engine
HTML/CSS/JavaScript : front end / GUI
Perl : scraping / RegEx
MySQL : back-end database
Shell/Ruby : scripts for importing/exporting the dataset
Plan of Action

# : Task : Responsibility : Checkpoint (week ending)
1 : Scripts to import/export dataset : AJ : 25th March
2 : Similarity ranking : SR : 1st April
3 : Prediction engine : AJ : 1st April
4 : UI design : AJ : 25th March
5 : Results form : SR : 8th April
6 : Graphs/metrics data plot : AJ, SR : 15th April
7 : NetFlix scraping : SR : 8th April
8 : Unit/incremental testing, QC : AJ, SR : 22nd April
Sample Screenshot [Recommendation Page]
Sample graphs showing the data you will collect and how it will be presented.
• Mean Absolute Error (MAE): sample error difference across approximately 100 users. This is a standard metric that measures how much a given algorithm's predictions deviate from the original ratings (which are blanked out for the test).
[Bar chart: rating (0-5) per user for Users 1-4, comparing Original DataSet, User-User, and Item-Item]
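The MAE metric itself reduces to a short computation. A minimal Python sketch, using made-up actual/predicted ratings rather than real experiment data:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute deviation of predictions from the held-out ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical held-out ratings vs. one algorithm's predictions.
print(mean_absolute_error([4, 3, 5, 2], [3.5, 3, 4, 3]))  # 0.625
```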
Sample graphs showing the data you will collect and how it will be presented.
• New User Problem: conduct a survey among 10 human testers to gauge how relevant the top-n predictions are to the selected movie, rating their accuracy on a scale of 1-10. Each tester is added as a new user row in the User-Item Matrix with a single rating. The mean of this test data will provide a human perspective on the precision of machine-generated suggestions for new users introduced into the system.
Human Users : User-User : Item-Item
User 1 : 8 : 10
User 2 : 6 : 4
User 3 : 7 : 7
User 4 : 8 : 5
User 5 : 5 : 4
User 6 : 7 : 7
User 7 : 1 : 4
User 8 : 4 : 3
User 9 : 6 : 8
User 10 : 8 : 10
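Taking the mean of the sample survey scores above (values copied from the table; a quick Python check):

```python
# Sample 1-10 relevance scores for the 10 human testers, from the table above.
user_user = [8, 6, 7, 8, 5, 7, 1, 4, 6, 8]
item_item = [10, 4, 7, 5, 4, 7, 4, 3, 8, 10]

mean_uu = sum(user_user) / len(user_user)   # 6.0
mean_ii = sum(item_item) / len(item_item)   # 6.2
print(mean_uu, mean_ii)
```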
Sample graphs showing the data you will collect and how it will be presented.
• Average Precision Analysis: create test conditions similar to those above. Each human tester logs the relevancy of each algorithm's top-n predictions to the selected movie. The average across each algorithm should provide some insight into the number of relevant predictions generated relative to the total predictions generated.
Human Users : P User-User : P Item-Item
User 1 : 0.8 : 0.1
User 2 : 0.6 : 0.4
User 3 : 0.7 : 0.7
User 4 : 0.8 : 0.5
User 5 : 0.5 : 0.4
User 6 : 0.7 : 0.7
User 7 : 0.1 : 0.4
User 8 : 0.4 : 0.3
User 9 : 0.6 : 0.8
User 10 : 0.8 : 0.9
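The per-algorithm average of the sample precision values tabled above can be checked the same way (a quick Python sketch; the values are the sample table entries, not real results):

```python
# Sample per-tester precision values from the table above.
p_user_user = [0.8, 0.6, 0.7, 0.8, 0.5, 0.7, 0.1, 0.4, 0.6, 0.8]
p_item_item = [0.1, 0.4, 0.7, 0.5, 0.4, 0.7, 0.4, 0.3, 0.8, 0.9]

avg_uu = sum(p_user_user) / len(p_user_user)   # approx. 0.6
avg_ii = sum(p_item_item) / len(p_item_item)   # approx. 0.52
print(round(avg_uu, 2), round(avg_ii, 2))
```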