Building Data Pipelines for Music Recommendations at Spotify
![Page 1: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/1.jpg)
October 17, 2015
Data Pipelines for Music Recommendations
@ Spotify
Vidhya Murali (@vid052)
![Page 2: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/2.jpg)
Who Am I?
Vidhya Murali
•Areas of Interest: Data & Machine Learning
•Data Engineer @Spotify
•Masters Student from the University of Wisconsin Madison, aka Happy Badger for life!
![Page 3: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/3.jpg)
“Torture the data, and it will confess!”
– Ronald Coase, Nobel Prize Laureate
![Page 4: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/4.jpg)
Spotify’s Big Data
•Started in 2006, now available in 58 countries
•70+ million active users, 20+ million paid subscribers
•30+ million songs in our catalog, ~20K added every day
•1.5 billion playlists so far and counting
•1 TB of user data logged every day
•Hadoop cluster with 1500 nodes
•~20,000 Hadoop jobs per day
![Page 5: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/5.jpg)
Music Recommendations at Spotify
Features:
•Discover
•Discover Weekly
•Moments
•Radio
•Related Artists
![Page 6: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/6.jpg)
30 million tracks… What to recommend?
![Page 7: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/7.jpg)
Approaches
•Manual curation by Experts
•Editorial Tagging
•Metadata (e.g. Label provided data, NLP over News, Blogs)
•Audio Signals
•Collaborative Filtering Model
![Page 9: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/9.jpg)
Collaborative Filtering Model
•Find patterns from users’ past behavior to generate recommendations
•Domain independent
•Scalable
•Accuracy (Collaborative Model) >= Accuracy (Content Based Model)
![Page 10: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/10.jpg)
Definition of CF
Hey, I like tracks P, Q, R, S!
Well, I like tracks Q, R, S, T!
Then you should check out track P!
Nice! Btw try track T!
Legacy Slide of Erik Bernhardsson
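The exchange above can be sketched in a few lines of Python. This is only a toy illustration: the listener names and the overlap-count scoring are assumptions made for the example, not the model described later in the talk.

```python
# Toy illustration of the dialogue above; listener names are hypothetical.
likes = {
    "listener_a": {"P", "Q", "R", "S"},
    "listener_b": {"Q", "R", "S", "T"},
}

def recommend(target, likes):
    """Suggest tracks liked by overlapping listeners that the target has not heard."""
    scores = {}
    for other, tracks in likes.items():
        if other == target:
            continue
        overlap = len(likes[target] & tracks)  # shared taste between the two listeners
        if overlap == 0:
            continue
        for track in tracks - likes[target]:   # only tracks that are new to the target
            scores[track] = scores.get(track, 0) + overlap
    return sorted(scores, key=lambda t: (-scores[t], t))

print(recommend("listener_a", likes))
print(recommend("listener_b", likes))
```

Each listener is offered the one track the other likes that they have not yet heard.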
![Page 11: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/11.jpg)
The YoLo Problem
•YoLo Problem: “You Only Listen Once” to judge recommendations
•Goal: Predict if users will listen to new music (new to user)
•Challenges
  • Scale of catalog (30M songs + ~20K added every day)
  • Repeated consumption of music is not uncommon
  • Music is niche
  • Music consumption is heavily influenced by the user’s lifestyle
•Input: Feedback is implicit through streaming behavior, collection adds, browse history, search history, etc.
![Page 15: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/15.jpg)
User Plays to Track Recs
1. Weighted play counts from logs
2. Train Model using the input signals
3. Generate recs from the trained model
4. Post process the recommendations
![Page 20: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/20.jpg)
Step 1: ETL of Logs
•Extract and transform the anonymized logs to training data set
•Case: Logs -> (user, track, weighted count)
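A minimal sketch of this step, assuming a hypothetical log record of (user, track, milliseconds played). The 30-second threshold and the 0.1 weight for early skips are assumptions chosen for illustration; the talk does not say how plays are weighted.

```python
from collections import defaultdict

# Hypothetical anonymized log records: (user_id, track_id, milliseconds_played).
logs = [
    ("u1", "t1", 200_000),
    ("u1", "t1", 15_000),   # skipped early
    ("u1", "t2", 180_000),
    ("u2", "t1", 240_000),
]

FULL_PLAY_MS = 30_000  # assumed threshold separating a skip from a real play

def weighted_counts(logs, skip_weight=0.1):
    """Aggregate raw play events into (user, track, weighted_count) rows."""
    totals = defaultdict(float)
    for user, track, ms_played in logs:
        weight = 1.0 if ms_played >= FULL_PLAY_MS else skip_weight
        totals[(user, track)] += weight
    return sorted((u, t, round(c, 3)) for (u, t), c in totals.items())

training_rows = weighted_counts(logs)
for row in training_rows:
    print(row)
```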
![Page 21: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/21.jpg)
Step 2: Construct Big Matrix!
Rows: Users (m), e.g. Vidhya
Columns: Tracks (n), e.g. Burn by Ellie Goulding
Order of 70M x 30M!
![Page 23: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/23.jpg)
Latent Factor Models
•Use a “small” representation for each user and item (track): f-dimensional vectors (here, f = 2)
[Figure: the user-track matrix with the entry for user Vidhya and track Burn, next to the two smaller vector matrices]
User Vector Matrix: X: (m x f)
Track Vector Matrix: Y: (n x f)
User Track Matrix: (m x n)
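To see why the “small” representation matters, here is some back-of-the-envelope arithmetic using the user and track counts from the earlier slides. The value f = 40 is an assumption for illustration only; the talk does not state the number of latent factors used in production.

```python
m = 70_000_000   # users, from the slides
n = 30_000_000   # tracks, from the slides
f = 40           # assumed number of latent factors (not stated in the talk)

dense_cells = m * n            # entries in the full user-track matrix
factor_cells = (m + n) * f     # entries in X (m x f) plus Y (n x f)

print(f"{dense_cells:.2e} cells in the full matrix")
print(f"{factor_cells:.2e} cells in X and Y")
print(f"reduction factor: {dense_cells // factor_cells:,}x")
```

Under that assumption the two factor matrices hold several hundred thousand times fewer values than the full matrix.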
![Page 28: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/28.jpg)
Matrix Factorization using Implicit Feedback
•User Track Play Count Matrix
•User Track Preference Matrix: Binary Label: 1 => played, 0 => not played
•Weights Matrix: Weights based on play count and smoothing
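The three matrices can be sketched as follows. The slide only says the weights are “based on play count and smoothing”; the log-damped formula and the constant ALPHA below are assumptions, chosen because they are a common way to express that idea.

```python
import math

# Hypothetical weighted play counts per (user, track).
play_counts = {("u1", "t1"): 12.0, ("u1", "t2"): 1.0, ("u2", "t3"): 0.0}

ALPHA = 40.0  # assumed scaling constant, not taken from the talk

def preference(count):
    """Binary label: 1 if the user played the track at all, else 0."""
    return 1 if count > 0 else 0

def confidence(count, alpha=ALPHA):
    """Weight that grows with play count; log1p damps very heavy repeat plays."""
    return 1.0 + alpha * math.log1p(count)

P = {key: preference(c) for key, c in play_counts.items()}   # preference matrix
C = {key: confidence(c) for key, c in play_counts.items()}   # weights matrix
```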
![Page 32: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/32.jpg)
Equation(s) Alert!
![Page 33: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/33.jpg)
Implicit Matrix Factorization
•Aggregate all (user, track) streams into a large matrix
•Goal: Approximate the binary preference matrix by the inner product of 2 smaller matrices, X (users) and Y (tracks), by minimizing the weighted RMSE (root mean squared error), using a function of total plays as weight
•Why?: Once learned, the top recommendations for a user are the top inner products between their latent factor vector in X and the track latent factor vectors in Y.
Binary preference matrix (rows: users, columns: tracks):
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
Terms:
•β_u = bias for user
•β_i = bias for item
•λ = regularization parameter
•p_ui = 1 if user streamed track else 0
•x_u = user latent factor vector
•y_i = item latent factor vector
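The objective itself did not survive on this slide. One standard form that matches the listed terms, from the usual implicit-feedback matrix factorization formulation, is sketched below. The symbols are assumptions: $p_{ui}$ is the binary preference, $x_u$ and $y_i$ are the user and item latent factor vectors, $\beta_u$ and $\beta_i$ are the user and item biases, $\lambda$ is the regularization parameter, and $c_{ui}$ is the weight derived from total plays. The exact expression used in the talk may differ.

```latex
\min_{x,\,y,\,\beta}\;
\sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i - \beta_u - \beta_i\bigr)^2
\;+\; \lambda \Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr)
```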
![Page 34: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/34.jpg)
Alternating Least Squares
•Fix tracks, solve for users
•Fix users, solve for tracks
•Repeat until convergence…
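The alternation can be sketched with NumPy on a tiny matrix. This is a simplified single-machine illustration, not the production job: it omits the bias terms, runs a fixed number of iterations instead of testing convergence, and the weights C = 1 + 10 * P and the value of lam are assumptions.

```python
import numpy as np

def als_implicit(P, C, f=2, lam=0.1, iters=10, seed=0):
    """Weighted ALS on a binary preference matrix P with weights C (both m x n)."""
    rng = np.random.default_rng(seed)
    m, n = P.shape
    X = rng.normal(scale=0.1, size=(m, f))   # user vectors
    Y = rng.normal(scale=0.1, size=(n, f))   # track vectors
    reg = lam * np.eye(f)

    def solve(fixed, P_rows, C_rows):
        # Per row, solve (F^T diag(c) F + lam I) v = F^T diag(c) p in closed form.
        out = np.empty((P_rows.shape[0], f))
        for k in range(P_rows.shape[0]):
            Fc = fixed * C_rows[k][:, None]
            out[k] = np.linalg.solve(fixed.T @ Fc + reg, Fc.T @ P_rows[k])
        return out

    for _ in range(iters):
        X = solve(Y, P, C)        # fix tracks, solve for users
        Y = solve(X, P.T, C.T)    # fix users, solve for tracks
    return X, Y

# Two groups of users with disjoint listening habits (toy data).
P = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
C = 1.0 + 10.0 * P
X, Y = als_implicit(P, C, f=2, iters=20)
scores = X @ Y.T   # predicted preference for every (user, track) pair
```

Each half-step is an ordinary regularized least-squares problem, which is why it has a closed-form solution per user or per track.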
![Page 40: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/40.jpg)
Vectors
•“Compact” representation for users and items (tracks) in the same space
![Page 41: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/41.jpg)
Why Vectors?
•Vectors encode higher order dependencies
•Users and Items in the same vector space!
•Use vector similarity to compute:
  • Item-Item similarities
  • User-Item recommendations
•Linear complexity: order of number of latent factors
•Easy to scale up
![Page 42: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/42.jpg)
Step 3: Compute Recs!
•Compute track similarities and track recommendations for users as a similarity measure
  • Euclidean Distance
  • Cosine Similarity
  • Pearson Correlation
![Page 44: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/44.jpg)
Recommendations via Cosine Similarity
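A brute-force sketch of ranking tracks for one user by cosine similarity. The vectors and track names are hypothetical, and the function name is made up for this example.

```python
import numpy as np

def top_k_by_cosine(user_vec, track_vecs, track_ids, k=2):
    """Rank tracks by cosine similarity between a user vector and track vectors."""
    track_unit = track_vecs / np.linalg.norm(track_vecs, axis=1, keepdims=True)
    user_unit = user_vec / np.linalg.norm(user_vec)
    sims = track_unit @ user_unit          # cosine similarity per track
    order = np.argsort(-sims)[:k]          # indices of the k most similar tracks
    return [(track_ids[j], float(sims[j])) for j in order]

# Hypothetical 2-dimensional latent vectors.
track_ids = ["track_a", "track_b", "track_c"]
track_vecs = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
user_vec = np.array([2.0, 0.0])

recs = top_k_by_cosine(user_vec, track_vecs, track_ids)
print(recs)
```

Because both sides are normalized, the length of the user vector does not affect the ranking, only its direction.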
![Page 46: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/46.jpg)
Annoy
•70 million users, at least 4 million tracks for candidates per user
•Brute Force Approach:
  • O(70M x 4M x 10) ~= O(3 peta-operations)!
•Approximate Nearest Neighbors Oh Yeah!
•Uses Locality Sensitive Hashing
•Clone: https://github.com/spotify/annoy
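To illustrate the hashing idea named on the slide, here is a generic random-hyperplane sketch in plain Python. It is not Annoy's implementation or API (see the repository above for the real library); every function name here is hypothetical.

```python
import random
from collections import defaultdict

def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplane normals; the seed keeps the example repeatable."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in planes)

def build_index(vectors, planes):
    """Group vector keys into buckets by signature."""
    buckets = defaultdict(list)
    for key, vec in vectors.items():
        buckets[signature(vec, planes)].append(key)
    return buckets

def candidates(query, buckets, planes):
    """Only vectors in the query's bucket get scored, not the whole catalog."""
    return buckets.get(signature(query, planes), [])

planes = make_hyperplanes(num_planes=4, dim=2)
vectors = {"a": [1.0, 0.1], "a_scaled": [2.0, 0.2], "opposite": [-1.0, -0.1]}
index = build_index(vectors, planes)
found = candidates([1.0, 0.1], index, planes)
```

Vectors pointing in the same direction share a bucket, so the expensive similarity computation runs on a small candidate set rather than on millions of tracks.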
![Page 47: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/47.jpg)
Step 4: Post Processing
•Apply Filters
  • Interacted music
  • Holiday music anyone?
•Factor for:
  • Diversity
  • Freshness
  • Popularity
  • Demographics
  • Seasonality
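A small sketch of what such a post-processing pass might look like. The record fields, the tag-based holiday filter and the one-track-per-artist cap are all assumptions standing in for the filters and diversity factor listed above.

```python
def post_process(ranked, already_played, blocked_tags, max_per_artist=1):
    """Filter and diversify a ranked list of candidate recommendations."""
    kept, per_artist = [], {}
    for rec in ranked:
        if rec["track"] in already_played:
            continue  # filter: music the user has already interacted with
        if blocked_tags & set(rec["tags"]):
            continue  # filter: e.g. holiday music out of season
        count = per_artist.get(rec["artist"], 0)
        if count >= max_per_artist:
            continue  # diversity: cap the number of tracks per artist
        per_artist[rec["artist"]] = count + 1
        kept.append(rec["track"])
    return kept

# Hypothetical candidates, already sorted by model score.
ranked = [
    {"track": "t1", "artist": "a1", "tags": []},
    {"track": "t2", "artist": "a1", "tags": []},
    {"track": "t3", "artist": "a2", "tags": ["holiday"]},
    {"track": "t4", "artist": "a3", "tags": []},
    {"track": "t5", "artist": "a4", "tags": []},
]
result = post_process(ranked, already_played={"t5"}, blocked_tags={"holiday"})
print(result)
```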
![Page 48: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/48.jpg)
70 Million users x 30 Million tracks. How to scale?
![Page 49: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/49.jpg)
Matrix Factorization with MapReduce
[Figure: all log entries are partitioned into a K x L grid of blocks, where block (x, y) holds the entries with u % K = x and i % L = y. The map step is shown alongside the user vectors (u % K = 0 … K-1) and item vectors (i % L = 0 … L-1); the reduce step groups results by u % K.]
•Split the matrix up into K x L blocks.
•Each mapper gets a different block, sums up intermediate terms, then keys by user (or item) to reduce the final user (or item) vector.
![Page 50: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/50.jpg)
Matrix Factorization with MapReduce
[Figure: one map task. Map input: tuples (u, i, count) where u % K = x and i % L = y. Distributed cache: all user vectors where u % K = x and all item vectors where i % L = y. The mapper emits contributions; the reducer produces the new vector.]
•Input to Mapper is a list of (user, item, count) tuples
  – user modulo K is the same for all users in block
  – item modulo L is the same for all items in the block
  – Mapper aggregates intermediate contributions for each user (or item)
  – Eg: K=4, Mapper #1 gets user 1, 5, 9, 13 etc
  – Reducer keys by user (or item), aggregates intermediate mapper sums and solves closed form for final user (or item) vector
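The blocked update for user vectors can be simulated in-process as below. This is a sketch under stated assumptions, not the actual Hadoop job: the weight c = 1 + alpha * count, the values of K, L and lam, and all function names are hypothetical, and the item vectors Y are passed in directly rather than read from a distributed cache.

```python
import numpy as np
from collections import defaultdict

K, L = 2, 2  # assumed grid size

def block_of(u, i):
    """Block (x, y) holds entries with u % K = x and i % L = y."""
    return (u % K, i % L)

def mapper(block_rows, Y, alpha=1.0):
    """Emit per-user partial sums for one block of (u, i, count) tuples."""
    f = Y.shape[1]
    partial = {}
    for u, i, count in block_rows:
        A, b = partial.setdefault(u, (np.zeros((f, f)), np.zeros(f)))
        c = 1.0 + alpha * count                 # assumed weight from play count
        A += (c - 1.0) * np.outer(Y[i], Y[i])   # in-place update of the partial sums
        b += c * Y[i]
    return partial

def reducer(partials, YtY, lam=0.1):
    """Sum the mapper contributions for one user and solve the closed form."""
    A = YtY + lam * np.eye(YtY.shape[0])
    b = np.zeros(YtY.shape[0])
    for A_part, b_part in partials:
        A = A + A_part
        b = b + b_part
    return np.linalg.solve(A, b)

def update_users(rows, Y):
    blocks = defaultdict(list)
    for u, i, count in rows:
        blocks[block_of(u, i)].append((u, i, count))
    by_user = defaultdict(list)
    for block_rows in blocks.values():           # each block is one map task
        for u, contrib in mapper(block_rows, Y).items():
            by_user[u].append(contrib)           # shuffle: key by user
    YtY = Y.T @ Y                                # shared term for unplayed tracks
    return {u: reducer(parts, YtY) for u, parts in by_user.items()}

rows = [(0, 0, 3.0), (0, 1, 1.0), (1, 2, 2.0), (0, 3, 5.0)]
Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]])
new_users = update_users(rows, Y)
```

The item-vector update is symmetric: swap the roles of users and items and key the shuffle by item.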
![Page 51: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/51.jpg)
Music Recommendations Data Flow
![Page 52: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/52.jpg)
![Page 53: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/53.jpg)
Source:
Revisiting YOLO!
“You Only Listen Once to judge recommendations” problem
![Page 54: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/54.jpg)
Optimizing for the YoLo Problem
•OFFLINE TESTING:
  • Experts’ Inputs
  • Measure accuracy
•A/B TESTS: control vs a/b group. Some useful metrics we consider:
  • DAU / WAU / MAU
  • Retention
  • Session Length
  • Skip Rate
![Page 55: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/55.jpg)
Challenge Accepted!
•Cold start problem for both users and new music/upcoming artists:
  • Content based signals, real time recommendation
•Measuring recommendation quality:
  • A/B test metrics
  • Active forums for getting user feedback
•Scam Attacks:
  • Rule based model to detect scammers
•Human choices are not always predictable:
  • Faith in humanity
![Page 56: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/56.jpg)
What Next?
•Personalize user experience on Spotify for every moment:
  • Right Now
•Recommend other media formats:
  • Podcasts
  • Video
•Power music recommendations on other platforms:
  • Google Now
![Page 57: Building Data Pipelines for Music Recommendations at Spotify](https://reader031.vdocuments.net/reader031/viewer/2022020113/58849b921a28ab26058b66b9/html5/thumbnails/57.jpg)
Join the Band!
We are hiring!