collaborative filtering for fun ...and profit!
DESCRIPTION
This is a fun presentation prepared for 30 high school-aged young ladies at the Girls Who Code summer camp. It's an introduction to concepts of collaborative filtering implemented in Python.TRANSCRIPT
Collaborative Filtering for fun Elissa Brown Erin Shellman Stella Rowlett
…and profit!💖 😹💻
How’s it done?‣ Collaborative filtering!
‣ Look for people who like the stuff you like, and recommend the things they’ve rated positively that you haven’t seen yet.
Leaf Shorts Floral ShortsElissa 5 5
Erin 2 5
Stella 4 1
Step 1: Collect ratings
1
2
3
4
5
2 3 4 5Leaf Shorts
Flor
al S
horts Person
ElissaErinStella
Step 2: Find someone similar for Mr. New Customer‣ A new customer wants to
buy a pair of shorts for his new “girlfriend.”
‣Which of us is most similar to Justin?
💋
💍
💘
💌
Finding the nearest neighbor‣Remember the
Pythagorean Theorem?
a2 + b2 = c2
c =p
a2 + b2p(x1 � x2)2 + (y1 � y2)2
p(1� 5)2 + (4� 5)2 = 4.12
1
2
3
4
5
1 2 3 4 5Leaf Shorts
Flor
al S
horts
PersonElissaErinJustinStella
What’s the distance between Justin and Stella?
1
2
3
4
5
1 2 3 4 5Leaf Shorts
Flor
al S
horts
PersonElissaErinJustinStella?
What’s the distance between Justin and Stella?
1
2
3
4
5
1 2 3 4 5Leaf Shorts
Flor
al S
horts
PersonElissaErinJustinStella
p(1� 4)2 + (1� 4)2 = 4.24
Meant to be!‣Justin and Erin are a
match made in heaven.
‣How can this help us shop for Justin’s girlfriend (not Erin)?p
(1� 2)2 + (4� 5)2 = 1.41
1
2
3
4
5
1 2 3 4 5Leaf Shorts
Flor
al S
horts
PersonElissaErinJustinStella
💘
Leaf Shorts Floral Shorts Jungle ShortsElissa 5 5 1
Erin 2 5 5
Stella 4 1 2
Justin 1 4 ?
Step 3: Make a suggestion!
Let’s go shopping!💃
How should we store our ratings data?
How should we store our ratings data?
A dictionary!
# Store user ratings user_ratings = {"Elissa": {"Leaf Shorts": 5, "Floral Shorts": 5}, "Erin": {"Leaf Shorts": 2, "Floral Shorts": 5}, "Stella": {"Leaf Shorts": 4, "Floral Shorts": 1} }
Python Warm-up!# Store user ratings user_ratings = {"Elissa": {"Leaf Shorts": 5, "Floral Shorts": 5}, "Erin": {"Leaf Shorts": 2, "Floral Shorts": 5}, "Stella": {"Leaf Shorts": 4, "Floral Shorts": 1} } !# Print Stella's ratings. print user_ratings['Stella'] !# What did Elissa think of the floral ones? print user_ratings['Elissa']['Leaf Shorts'] !def what_did_they_think(rating): if rating > 3: opinion = "LOVED IT!" else: opinion = "HATED IT!" return opinion !my_humble_opinion = what_did_they_think(user_ratings['Erin']['Leaf Shorts']) !print "Erin's review of the Leaf Shorts: " + my_humble_opinion !!
What functions do we need to build a recommender?
What functions do we need to build a recommender?1. Function to compute distances
2. Function to find nearby people
3. Function to recommend items I haven’t rated yet
Pseudocode: Function to compute distances
compute_distancedef compute_distance(user1_ratings, user2_ratings): """ This function computes the distance between two user's ratings. Both arguments should be dictionaries keyed on users, and items. """ distances = [] for key in user1_ratings: if key in user2_ratings: distances.append((user1_ratings[key] - user2_ratings[key]) ** 2) total_distance = round(sum(distances) ** 0.5, 2) return total_distance
Pseudocode: Function to find closest match
find_nearest_neighborsdef find_nearest_neighbors(username, user_ratings): """ Returns the list of neighbors, ordered by distance. Call like this: find_nearest_neighbors('Erin', user_ratings) """ distances = [] for user in user_ratings: if user != username: distance = compute_distance(user_ratings[user], user_ratings[username]) distances.append((distance, user)) distances.sort() return distances
Pseudocode: Function to make recommendations
get_recommendationsdef get_recommendations(username, user_ratings): """ Return a list of recommendations. """ nearest_users = find_nearest_neighbors(username, user_ratings) recommendations = [] ! # Input user's ratings ratings = user_ratings[username] ! for neighbor in nearest_users: neighbor_name = neighbor[1] for item in user_ratings[neighbor_name]: if not item in ratings: recommendations.append((item, user_ratings[neighbor_name][item])) ! return sorted(recommendations, key = lambda personTuple: personTuple[1], reverse = True)
Limitations‣ What happens if everyone rated the
same group of items?
‣ What if there’s no overlap?
‣ What about new users with no ratings?
Now what?‣Ask your classmates to rate their electives,
and make a recommender to help students pick their classes.
‣Poll recent graduates from your school about their college, and make a recommender to help students pick colleges.
https://github.com/erinshellman/girls-who-code-recommender
🐱