How to build a recommender system?

Upload: blueace

Post on 01-Nov-2014

DESCRIPTION

By Coen Stevens, Lead Recommendations Engineer at Wakoopa. Presented at http://recked.org

TRANSCRIPT

Page 1: How to build a recommender system?
Page 2: How to build a recommender system?

Coen Stevens, Lead Recommendation Engineer

Page 3: How to build a recommender system?

How to build a recommender system?

Wakoopa use case

Page 4: How to build a recommender system?

Mission: Discover software & games

Page 5: How to build a recommender system?

Mac, Windows, Linux

Software tracker

Page 6: How to build a recommender system?

Your profile

Page 7: How to build a recommender system?

Updates

Page 8: How to build a recommender system?

Software pages

Page 9: How to build a recommender system?

Recommendations

Page 10: How to build a recommender system?

Building a recommender system: Approach and challenges

Page 11: How to build a recommender system?

Data: what do we have?

Usage (implicit) vs. Ratings (explicit)

Usage (implicit):

• Noisy

• Only positive feedback

• Easy to collect

Ratings (explicit):

• Accurate

• Positive and negative feedback

• Hard to collect

Page 12: How to build a recommender system?

Data: what do we use?

• Active users (Tracker activity in the past month): ~9,000

• Actively used software items (in the past month): ~10,000

• We calculate recommendations separately for each OS, each combined with Web applications

Page 13: How to build a recommender system?

Recommender system methods

• Item-based collaborative filtering

• User-based collaborative filtering (we only use this for calculating user similarities, to find people like you)

• Combining both methods

Collaborative recommendations: The user will be recommended items that people with similar tastes and preferences liked (used) in the past

Page 14: How to build a recommender system?

Item-Based Collaborative Filtering

User software usage matrix

220 90 180 22

280 12 42 80

175 210 210 45

165 35 195 13 25

100 50 185 35 190

60 65 185

Users

Software items

Page 15: How to build a recommender system?

User software usage matrix [0, 1]

1 1 0 1 0 1 0

1 1 1 0 1 0 0

1 1 0 1 0 1 0

1 0 1 1 1 1 0

0 1 1 1 0 1 1

0 1 0 1 0 0 1

Users

Software items
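A minimal sketch of how raw usage numbers could be reduced to the [0, 1] matrix above, assuming that any non-zero usage counts as "used". The values, the threshold, and the function name are hypothetical and not taken from Wakoopa's implementation.

```python
# Hypothetical raw usage per user (rows) and software item (columns); 0 means never used.
raw_usage = [
    [220, 90, 0, 180],
    [0, 12, 42, 0],
]


def binarize(matrix, threshold=0):
    """Return a 0/1 matrix with 1 wherever usage exceeds the threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in matrix]


binary_usage = binarize(raw_usage)
```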

Page 16: How to build a recommender system?

How do we predict the probability that I would like to use Gmail?

1 1 0 1 0 1 0

1 1 1 0 1 0 0

1 1 ? 1 0 1 0

1 0 1 1 1 1 0

0 1 1 1 0 1 1

0 1 0 1 0 0 1

Users

Software items

Page 17: How to build a recommender system?

Calculate the similarities between Gmail and the other software items.

1 1 0 1 0 1 0

1 1 1 0 1 0 0

1 1 0 1 0 1 0

1 0 1 1 1 1 0

0 1 1 1 0 1 1

0 1 0 1 0 0 1

Users

Software items

Cosine Similarity(Firefox, Gmail)
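As a rough illustration of the similarity step, the sketch below computes the plain cosine similarity between two item columns of the binary usage matrix. Which column belongs to Firefox and which to Gmail is an assumption made for the example, and no popularity correction is applied.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# One entry per user; the column assignment is assumed for illustration.
firefox = [1, 1, 1, 1, 0, 0]  # first column of the matrix
gmail = [0, 1, 0, 1, 1, 0]    # third column of the matrix

similarity = cosine_similarity(firefox, gmail)
```

Two users have both items, so the dot product is 2, and the result is 2 / (2 * sqrt(3)).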

Page 19: How to build a recommender system?

Calculate the similarities between Gmail and the other software items.

1 1 0 1 0 1 0

1 1 1 0 1 0 0

1 1 0 1 0 1 0

1 0 1 1 1 1 0

0 1 1 1 0 1 1

0 1 0 1 0 0 1

Users

Software items

Cosine Similarity(Firefox, Gmail)

Popularity correction: we put less trust in popular software

Page 20: How to build a recommender system?

Item-item correlation matrix

1 0.1 0.6 0.1 0.1 0.1 0.7

0.2 1 0.8 0.5 0.8 0.1 0.9

0.1 0.6 1 0.5 0.7 0.2 0.3

0.2 0.6 0.4 1 0.8 0.2 0.3

0.5 0.4 0.4 0.4 1 0.1 0.2

0.5 0.5 0.3 0.5 0.3 1 0.3

0.2 0.6 0.3 0.8 0.7 0.7 1
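One way such a matrix could be produced is to compare every pair of item columns. The sketch below does this with plain cosine similarity on the binary usage matrix from the earlier slides; it assumes Gmail is the third column. Because the slide values are illustrative (and may include the popularity correction), the numbers it produces will not match them.

```python
from math import sqrt

# Binary usage matrix: rows are users, columns are software items.
usage = [
    [1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 0, 0, 1],
]


def cosine(a, b):
    """Plain cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def item_item_matrix(usage):
    """Compare every item column with every other item column."""
    columns = list(zip(*usage))  # transpose: one vector of users per item
    return [[cosine(a, b) for b in columns] for a in columns]


similarity = item_item_matrix(usage)
gmail = 2  # assumed column index for Gmail
gmail_similarities = [s for j, s in enumerate(similarity[gmail]) if j != gmail]
```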

Page 21: How to build a recommender system?

Item-item correlation matrix

1 0.1 0.6 0.1 0.1 0.1 0.7

0.2 1 0.8 0.5 0.8 0.1 0.9

0.1 0.6 1 0.5 0.7 0.2 0.3

0.2 0.6 0.4 1 0.8 0.2 0.3

0.5 0.4 0.4 0.4 1 0.1 0.2

0.5 0.5 0.3 0.5 0.3 1 0.3

0.2 0.6 0.3 0.8 0.7 0.7 1

Gmail similarities: 0.6 0.8 0.4 0.4 0.3 0.3

Page 22: How to build a recommender system?

K-nearest neighbor approach

• Performance vs quality

• We take only the ‘K’ most similar items (say 4)

• Space complexity: O(m + Kn)

• Computational complexity: O(m + n²)

Gmail similarities: 0.6 0.8 0.4 0.4 0.3 0.3
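Selecting the K most similar items can be sketched as follows, using the Gmail similarities from the slide and K = 4. The item names are hypothetical placeholders, since the slides do not say which item each value belongs to.

```python
import heapq

# Gmail similarities from the slide, keyed by hypothetical item names.
gmail_similarities = {
    "item_1": 0.6,
    "item_2": 0.8,
    "item_4": 0.4,
    "item_5": 0.4,
    "item_6": 0.3,
    "item_7": 0.3,
}


def top_k_neighbors(similarities, k):
    """Return the k (item, similarity) pairs with the highest similarity."""
    return heapq.nlargest(k, similarities.items(), key=lambda pair: pair[1])


neighbors = top_k_neighbors(gmail_similarities, k=4)
```

Keeping only K neighbors per item is what brings the stored similarities down from a full item-item matrix to K entries per item.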

Page 23: How to build a recommender system?

Calculate the predicted value for Gmail

User usage: 1 1 1 1

Gmail similarities: 0.6 0.8 0.4 0.4

Page 24: How to build a recommender system?

Calculate the predicted value for Gmail

User usage: 0.9 0.8 0.6 0.2

Gmail similarities: 0.6 0.8 0.4 0.4

Usage correction: more usage results in a higher score [0, 1]

Page 25: How to build a recommender system?

Calculate the predicted value for Gmail

Gmail similarities: 0.6 0.8 0.4 0.4

User usage: 0.9 0.8 0.6 0.2

((0.6 * 0.9) + (0.8 * 0.8) + (0.4 * 0.6) + (0.4 * 0.2)) / (0.6 + 0.8 + 0.4 + 0.4) = 1.50 / 2.20 ≈ 0.68
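The prediction is a similarity-weighted average of the user's usage scores over the K neighbors. A minimal sketch, using the four similarity and usage values shown on the slide (the function name is made up for the example):

```python
def predict(similarities, usage):
    """Similarity-weighted average of the user's usage over the neighbor items."""
    weighted = sum(s * u for s, u in zip(similarities, usage))
    total = sum(similarities)
    return weighted / total if total else 0.0


gmail_similarities = [0.6, 0.8, 0.4, 0.4]
user_usage = [0.9, 0.8, 0.6, 0.2]

score = predict(gmail_similarities, user_usage)
```

With all four neighbors included, the numerator is 1.50 and the denominator 2.20, so the score comes out at roughly 0.68.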

Page 26: How to build a recommender system?

Calculate the predicted value for Gmail

Gmail similarities: 0.6 0.8 0.4 0.4

User usage: 0.9 0.8 0.6 0.2

((0.6 * 0.9) + (0.8 * 0.8) + (0.4 * 0.6) + (0.4 * 0.2)) / (0.6 + 0.8 + 0.4 + 0.4) = 1.50 / 2.20 ≈ 0.68

• User feedback

• Contacts usage

• Commercial vs Free

Page 27: How to build a recommender system?

Calculate all unknown values and show the Top-N recommendations to each user

1 1 ? 1 ? 1 ?

1 1 1 ? 1 ? ?

1 1 ? 1 ? 1 ?

1 ? 1 1 1 1 ?

? 1 1 1 ? 1 1

? 1 ? 1 ? ? 1

Users

Software items
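Putting the steps together, a Top-N list for one user could be computed roughly as below: score every item the user has not used from its K nearest neighbors, then keep the N best. This is a sketch under simplifying assumptions (binary usage, plain cosine similarity, no popularity or usage correction), not the production pipeline; function and parameter names are invented for the example.

```python
from math import sqrt

# Binary usage matrix: rows are users, columns are software items.
usage = [
    [1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 0, 0, 1],
]


def cosine(a, b):
    """Plain cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(usage, user, n=3, k=4):
    """Return up to n unused item indices for the user, best predicted score first."""
    columns = list(zip(*usage))
    row = usage[user]
    scores = {}
    for item, used in enumerate(row):
        if used:
            continue  # only the unknown ("?") cells are scored
        # K most similar other items, as (similarity, item index) pairs.
        neighbors = sorted(
            ((cosine(columns[item], columns[other]), other)
             for other in range(len(columns)) if other != item),
            reverse=True,
        )[:k]
        total = sum(s for s, _ in neighbors)
        weighted = sum(s * row[other] for s, other in neighbors)
        scores[item] = weighted / total if total else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:n]


recommendations = top_n(usage, user=0)
```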

Page 28: How to build a recommender system?

Explainability: Why did I get this recommendation?

• Overlap between the item’s (K) neighbors and your usage
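The overlap can be sketched as a simple intersection between the recommended item's neighbors and the items the user already uses. The neighbor list and the user's software below are hypothetical examples.

```python
def explain(neighbors, used_items):
    """Return the neighbors of the recommended item that the user already uses."""
    return [item for item in neighbors if item in used_items]


# Hypothetical K = 4 neighbors of Gmail, most similar first.
gmail_neighbors = ["Firefox", "Google Calendar", "Thunderbird", "Google Chrome"]
my_usage = {"Firefox", "TextMate", "Google Chrome"}

reasons = explain(gmail_neighbors, my_usage)
```

The result can then be shown to the user as "recommended because you use Firefox and Google Chrome".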

Page 29: How to build a recommender system?

User-Based Collaborative Filtering

1 1 0 1 0 1 0

1 1 1 0 1 0 0

1 1 0 1 0 1 0

1 1 1 1 1 1 0

0 1 1 1 0 1 1

0 1 0 1 0 0 1

Cosine Similarity(Coen, Menno)

Finding people like you

Page 30: How to build a recommender system?

0.1 0.2 0 0.4 0 0.4 0

0.1 0.2 0.6 0 0.8 0 0

0.1 0.2 0 0.4 0 0.4 0

0.1 0.2 0.6 0.4 0.8 0.4 0

0 0.2 0.6 0.4 0 0.4 0.2

0 0.2 0 0.4 0 0 0.2

Cosine Similarity(Coen, Menno)

Applying inverse user frequency

log(n / n_i), where n_i is the number of users who use item i and n is the total number of users in the database

The fact that you both use TextMate tells you more than the fact that you both use Firefox
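A minimal sketch of inverse user frequency weighting: each item gets the weight log(n / n_i), the usage vectors are scaled by those weights, and the cosine similarity is taken between the weighted vectors. It uses the binary matrix from the previous slide; which rows belong to Coen and Menno is not stated, so rows are addressed by index. The weights on the slide are illustrative and will differ from the computed ones.

```python
from math import log, sqrt

# Binary usage matrix: rows are users, columns are software items.
usage = [
    [1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 0, 0, 1],
]


def inverse_user_frequency(usage):
    """Weight per item: log(n / n_i), with n users in total and n_i users of item i."""
    n = len(usage)
    counts = [sum(column) for column in zip(*usage)]
    return [log(n / n_i) if n_i else 0.0 for n_i in counts]


def weighted_cosine(u, v, weights):
    """Cosine similarity between two users after scaling each item by its weight."""
    a = [x * w for x, w in zip(u, weights)]
    b = [y * w for y, w in zip(v, weights)]
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


weights = inverse_user_frequency(usage)
similarity = weighted_cosine(usage[0], usage[1], weights)
```

An item that everybody uses (the second column here) gets weight log(1) = 0 and no longer contributes to the similarity, while rarely used items count the most.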

Page 32: How to build a recommender system?

1 0.8 0.6 0.5 0.7 0.2

0.8 1 0.4 0.7 0.5 0.5

0.6 0.4 1 0.4 0.9 0.1

0.5 0.8 0.4 1 0.6 0.4

0.8 0.5 0.9 0.6 1 0.2

0.2 0.5 0.1 0.4 0.2 1

User-user correlation matrix

Page 33: How to build a recommender system?

Performance: measure for success

• Cross-validation: Train-Test split (80-20)

• Precision and Recall:
  - precision = size(hit set) / size(total given recs)
  - recall = size(hit set) / size(test set)

• Root mean squared error (RMSE)
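The measures above can be sketched in a few lines. The split helper, the seed, and the sample data are assumptions for illustration; the hit set is taken to be the recommended items that also appear in the held-out test set.

```python
import random
from math import sqrt


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle the items and hold out a fraction of them as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def precision_recall(recommended, test_items):
    """precision = hits / recommendations given, recall = hits / test set size."""
    hits = set(recommended) & set(test_items)
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(test_items) if test_items else 0.0
    return precision, recall


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual scores."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))


train, test = train_test_split(range(10))                     # 80-20 split of 10 items
precision, recall = precision_recall(["a", "b", "c", "d"], {"b", "d", "e"})
error = rmse([0.5, 1.0], [1.0, 1.0])
```

In the sample call, two of the four recommendations are in the test set of three items, so precision is 0.5 and recall is 2/3.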

Page 34: How to build a recommender system?

Implementation

• Ruby Enterprise Edition (garbage collection)

• MySQL database

• Built our own C libraries

• Amazon EC2:
  - Low cost
  - Flexibility
  - Ease of use

• Open source

Page 35: How to build a recommender system?

Future challenges

• What is the best algorithm for Wakoopa? (or you)

• Reducing space-time complexity (scalability):
  - Parallelization (Clojure)
  - Distributed computing (Hadoop)

Page 36: How to build a recommender system?
Page 37: How to build a recommender system?

1 evening, 3 speakers, 100 developers

www.recked.org