Thesis presentation: Tuenti Engineering
Online recommendations using matrix factorisation
Marcus
Royal Institute of Technology, Stockholm, Sweden
Instituto Superior Técnico, Lisbon, Portugal
Universitat Politécnica de Catalunya, Barcelona, Spain
Thesis presentation
40+ million videos
13+ million users
500 requests/second
306 years
3 reasons:
- find good content
- improve user experience
- increase revenue
great!
3 problems
1: the data
2: the model
3: the system
Why so little systems research?
3 1 problem
Question: How do you serve recommendations from millions of items to millions of users?
Video ratings (rows: users, columns: videos; ? = not rated):

    [ 2  4  4  ?  1]
    [ 3  5  ?  ?  1]
    [ ?  4  2  1  ?]
    [ 1  ?  1  3  3]
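
    import numpy


    def calculate_mean_squared_error_of_estimate(MatrixToFactorise, UsersPreferences,
                                                 MoviesFeatures, NumberOfLatentFeatures,
                                                 RegularizationConstant):
        # Not shown on the slide; assumed to be the regularised squared error over
        # the known (non-zero) ratings. MoviesFeatures is K x movies here.
        R = numpy.asarray(MatrixToFactorise, dtype=float)
        known = R > 0
        residual = (R - numpy.dot(UsersPreferences, MoviesFeatures))[known]
        penalty = (RegularizationConstant / 2) * (numpy.sum(UsersPreferences ** 2) +
                                                  numpy.sum(MoviesFeatures ** 2))
        return numpy.sum(residual ** 2) + penalty


    def matrix_factorization(MatrixToFactorise, UsersPreferences, MoviesFeatures,
                             NumberOfLatentFeatures, MaxSteps=5000,
                             LearningRate=0.0002, RegularizationConstant=0.02):
        # Learn UsersPreferences (users x K) and MoviesFeatures (movies x K), both float
        # numpy arrays, so their product approximates the known ratings (0 = unknown).
        MoviesFeatures = MoviesFeatures.T  # work with a K x movies view
        for step in range(MaxSteps):
            for user in range(len(MatrixToFactorise)):
                for movie in range(len(MatrixToFactorise[user])):
                    if MatrixToFactorise[user][movie] > 0:
                        # prediction error for this known rating
                        error = MatrixToFactorise[user][movie] - \
                            numpy.dot(UsersPreferences[user, :], MoviesFeatures[:, movie])
                        for feature in range(NumberOfLatentFeatures):
                            p = UsersPreferences[user][feature]
                            q = MoviesFeatures[feature][movie]
                            # gradient step on both factors, using the values from before this step
                            UsersPreferences[user][feature] = p + LearningRate * (2 * error * q - RegularizationConstant * p)
                            MoviesFeatures[feature][movie] = q + LearningRate * (2 * error * p - RegularizationConstant * q)
            # if the approximation is good enough, stop iterating
            ApproximationError = calculate_mean_squared_error_of_estimate(
                MatrixToFactorise, UsersPreferences, MoviesFeatures,
                NumberOfLatentFeatures, RegularizationConstant)
            if ApproximationError < 0.001:
                break
        return UsersPreferences, MoviesFeatures.T
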
Sorry about the slide.
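The inner update can be read as one step of stochastic gradient descent on a regularised squared error for a single known rating. In this sketch, $r_{ui}$ is `MatrixToFactorise[user][movie]`, $p_{uk}$ is `UsersPreferences[user][feature]`, $q_{ki}$ is `MoviesFeatures[feature][movie]` (after the transpose), $\alpha$ is `LearningRate`, $\beta$ is `RegularizationConstant` and $K$ is `NumberOfLatentFeatures`. The per-rating loss $\ell_{ui}$ is our reconstruction, chosen so that its gradient reproduces the update on the slide:

```latex
\begin{aligned}
e_{ui} &= r_{ui} - \sum_{k=1}^{K} p_{uk}\, q_{ki} \\
\ell_{ui} &= e_{ui}^{2} + \frac{\beta}{2} \sum_{k=1}^{K} \left( p_{uk}^{2} + q_{ki}^{2} \right) \\
\frac{\partial \ell_{ui}}{\partial p_{uk}} &= -2\, e_{ui}\, q_{ki} + \beta\, p_{uk},
\qquad
\frac{\partial \ell_{ui}}{\partial q_{ki}} = -2\, e_{ui}\, p_{uk} + \beta\, q_{ki} \\
p_{uk} &\leftarrow p_{uk} + \alpha \left( 2\, e_{ui}\, q_{ki} - \beta\, p_{uk} \right) \\
q_{ki} &\leftarrow q_{ki} + \alpha \left( 2\, e_{ui}\, p_{uk} - \beta\, q_{ki} \right)
\end{aligned}
```

Both updates use the values of $p_{uk}$ and $q_{ki}$ from before the step, which is why the two assignments should not read each other's freshly updated value.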
Users' preferences (one row per user):

    [ 1.52 -0.07  0.66  0.76  0.79]
    [ 0.79  0.63  0.08  0.9   1.46]
    [ 0.56  0.58  0.16  0.43  1.28]
    [-0.15  0.7   0.87  1.45 -0.3 ]

Videos' features (one row per video):

    [ 0.38  0.91  0.32  0.36  1.22]
    [ 0.72  0.98  0.98  1.28  1.75]
    [ 1.54 -0.19  0.81  0.61  0.72]
    [ 0.22  0.61  0.95  1.18 -0.09]
    [-0.13  0.76  0.97  1.04 -0.26]
Estimated ratings (users' preferences × videos' features, transposed):

    [ 2.05  3.97  3.96  2.12  1.01]
    [ 2.93  5.02  3.21  1.61  0.98]
    [ 2.15  3.95  2.01  1.05  1.1 ]
    [ 1.    4.29  1.01  2.96  2.98]
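Each estimated rating is the dot product of one user's preference vector and one video's feature vector. The sketch below recomputes that product from the factor values as printed; treating the 4-row matrix as the users and the 5-row matrix as the videos (so K = 5) is our reading of the slides.

```python
import numpy as np

# Factor matrices as printed on the slides (rounded to two decimals).
# Assumption: P holds user preferences (users x K), Q holds video features (videos x K).
P = np.array([
    [1.52, -0.07, 0.66, 0.76, 0.79],
    [0.79, 0.63, 0.08, 0.90, 1.46],
    [0.56, 0.58, 0.16, 0.43, 1.28],
    [-0.15, 0.70, 0.87, 1.45, -0.30],
])
Q = np.array([
    [0.38, 0.91, 0.32, 0.36, 1.22],
    [0.72, 0.98, 0.98, 1.28, 1.75],
    [1.54, -0.19, 0.81, 0.61, 0.72],
    [0.22, 0.61, 0.95, 1.18, -0.09],
    [-0.13, 0.76, 0.97, 1.04, -0.26],
])

# Estimated rating of user u for video i = P[u] . Q[i], i.e. the matrix P Q^T.
R_hat = P @ Q.T
print(np.round(R_hat, 2))
```

Because the printed factors are rounded, the product only approximates the estimated matrix on the slide: the first entry comes out at about 1.96 rather than 2.05, while others, such as user 4 and video 4 (about 2.96), match closely.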
Compared with the original ratings, the known values are reproduced closely (2 → 2.05, 5 → 5.02, 3 → 2.96), and every ? now has an estimate (for example, user 1, video 4 → 2.12).
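To put a number on "close", one can compute the root-mean-square error over the 14 ratings that were actually given, using the values exactly as printed on the slides. This is an illustrative check, not a figure reported in the presentation.

```python
import numpy as np

nan = np.nan
# Original ratings from the slide; nan marks "?" (not rated).
R = np.array([
    [2, 4, 4, nan, 1],
    [3, 5, nan, nan, 1],
    [nan, 4, 2, 1, nan],
    [1, nan, 1, 3, 3],
])
# Estimated ratings as printed on the slide.
R_hat = np.array([
    [2.05, 3.97, 3.96, 2.12, 1.01],
    [2.93, 5.02, 3.21, 1.61, 0.98],
    [2.15, 3.95, 2.01, 1.05, 1.10],
    [1.00, 4.29, 1.01, 2.96, 2.98],
])

known = ~np.isnan(R)  # only ratings that were actually given count
rmse = np.sqrt(np.mean((R[known] - R_hat[known]) ** 2))
print(int(known.sum()), round(float(rmse), 3))
```

The error on the known ratings is a few hundredths of a rating point; the interesting output, the estimates for the ? entries, cannot be checked this way and needs held-out data.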
A ratings matrix of 13 million users × 40 million videos
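Reading "13x40 million" as 13 million users by 40 million videos (matching the opening figures), a quick calculation shows why the full matrix is never stored while the factor matrices stay small. The 4-byte floats and K = 5 latent features are assumptions; K = 5 matches the toy example, not necessarily the production setting.

```python
# Back-of-the-envelope storage estimate. Assumptions: 13 million users,
# 40 million videos, 4-byte floats, K = 5 latent features.
USERS = 13_000_000
VIDEOS = 40_000_000
BYTES_PER_FLOAT = 4
K = 5

dense_cells = USERS * VIDEOS                            # one cell per (user, video) pair
dense_bytes = dense_cells * BYTES_PER_FLOAT
factor_bytes = (USERS + VIDEOS) * K * BYTES_PER_FLOAT   # P is users x K, Q is videos x K

print(f"dense ratings matrix: {dense_cells:.2e} cells, about {dense_bytes / 1e15:.1f} PB")
print(f"factor matrices (K={K}): about {factor_bytes / 1e9:.2f} GB")
```

Under these assumptions the dense matrix would need roughly 2 PB, while the two factor matrices fit in about 1 GB.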
Clustering
Millions of items, grouped into clusters 1, 2 and 3
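The slides do not say which clustering algorithm was used. As a stand-in, here is a minimal k-means (Lloyd's algorithm) over item vectors, assuming the items are clustered by their latent video-feature vectors; the deterministic farthest-point seeding and the toy data are illustrative choices.

```python
import numpy as np


def farthest_point_init(items, k):
    # Deterministic seeding: start from item 0, then repeatedly pick the item
    # that is farthest from all centroids chosen so far.
    chosen = [0]
    for _ in range(1, k):
        dist = ((items[:, None, :] - items[chosen][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(dist.argmax()))
    return items[chosen].astype(float)


def kmeans(items, k, iterations=20):
    """Lloyd's k-means over item vectors; returns (centroids, labels)."""
    centroids = farthest_point_init(items, k)
    for _ in range(iterations):
        # assign every item to its nearest centroid (squared Euclidean distance)
        dist = ((items[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # move each centroid to the mean of its items (keep it if the cluster is empty)
        for c in range(k):
            members = items[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return centroids, labels


# Toy item vectors: three well-separated pairs.
items = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [10.0, 0.0], [10.1, 0.1]])
centroids, labels = kmeans(items, k=3)
print(labels)
```

This toy version builds the full items × centroids distance array, so millions of videos would need batching or a library implementation. Plain k-means also makes no attempt to keep cluster sizes even, which matters here because the summary lists balanced clusters as a requirement.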
Recommendation Request
Compass = last video
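A minimal sketch of how a request could be served with the last video as the compass: look up that video's cluster, score only the videos in it against the user's preference vector, and return the best few. The function name, the `video_cluster` lookup array and the rule of skipping the video just watched are hypothetical.

```python
import numpy as np


def recommend(user_vector, last_video, video_vectors, video_cluster, top_n=3):
    """Return up to top_n (video_id, score) pairs from the last video's cluster.

    Hypothetical sketch: video_cluster[i] is the cluster id of video i, and the
    user's last video acts as the "compass" that picks which cluster to search.
    """
    cluster = video_cluster[last_video]
    candidates = np.flatnonzero(video_cluster == cluster)
    # assumption: do not recommend the video the user just watched
    candidates = candidates[candidates != last_video]
    # estimated rating = dot product of the user vector and each video vector
    scores = video_vectors[candidates] @ user_vector
    best = np.argsort(-scores)[:top_n]
    return [(int(candidates[i]), float(scores[i])) for i in best]


# Toy data: videos 0-2 form cluster 0, videos 3-4 form cluster 1.
video_vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.5, 0.2], [0.0, 1.0], [0.1, 0.9]])
video_cluster = np.array([0, 0, 0, 1, 1])
user = np.array([1.0, 0.5])

result = recommend(user, last_video=0, video_vectors=video_vectors, video_cluster=video_cluster)
print(result)
```

Scoring a single cluster is what keeps the work per request small; the trade-off is that good videos in other clusters are never considered for that request.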
Architecture: Interface, Delegate, Router and several Workers
Diagram labels: start, request, route, compute, reply top N, merge / sort, top N to JSON
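The diagram suggests a scatter/gather design: the request is routed to several workers, each computes and replies with its own top N, and the partial results are merged, sorted and turned into JSON. A minimal sketch of that final step, assuming a hypothetical reply format of (video_id, score) pairs and using Python's standard `heapq` and `json` modules:

```python
import heapq
import json


def merge_top_n(worker_replies, n):
    """Merge per-worker top-N lists into one global top-N, serialised as JSON.

    Hypothetical message format: each reply is a list of (video_id, score)
    pairs. Because each worker already sends its own best n, the global best n
    is guaranteed to be among the replies.
    """
    candidates = (pair for reply in worker_replies for pair in reply)
    best = heapq.nlargest(n, candidates, key=lambda pair: pair[1])
    return json.dumps([{"video": video_id, "score": score} for video_id, score in best])


replies = [[(7, 4.1), (3, 3.2)], [(12, 4.8), (5, 1.0)], [(9, 3.9)]]
result = merge_top_n(replies, 3)
print(result)
```

This prints `[{"video": 12, "score": 4.8}, {"video": 7, "score": 4.1}, {"video": 9, "score": 3.9}]`. `heapq.nlargest` avoids sorting every candidate, which helps when many workers each return N results.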
Did it work?
Results:
- ~600 requests per second
- latency below 30 ms
- quality is ok
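As a rough consistency check on these numbers, Little's law (average requests in flight = arrival rate × average time in the system) gives the concurrency the service has to sustain. Using 30 ms as the average latency is an assumption (the slide only gives it as a bound), so the result is an upper estimate; reading the 500 requests/second from the opening slide as the target load is also an assumption.

```python
# Little's law: L = lambda * W (requests in flight = arrival rate x time in system).
# Assumptions: 30 ms is used as an upper bound on the average latency, and the
# 500 requests/second from the opening slide is read as the target load.
requests_per_second = 600
latency_seconds = 0.030
target_requests_per_second = 500

in_flight = requests_per_second * latency_seconds            # at most ~18 concurrent requests
headroom = requests_per_second / target_requests_per_second  # throughput relative to target

print(f"at most about {in_flight:.0f} requests in flight on average")
print(f"throughput headroom: {headroom:.1f}x")
```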
Results: Throughput
huh?
Results: Quality
| Queries | Non-zero | MAP |
|---|---|---|
| 1 | 41 | 23% |
| 2 | 87 | 25% |
| 3 | 116 | 36% |
| 4 | 165 | 58% |
| 5 | 196 | 74% |
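The slides do not spell out how MAP (mean average precision) was measured for this table. As a reference point, the usual definition can be sketched as follows; the toy ranked lists and relevant sets are made up for illustration.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items.

    Sums precision@k at every rank k that holds a relevant item, then divides by
    the number of relevant items (one common convention; others divide by
    min(len(relevant), len(ranked))).
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(queries):
    """Mean of the average precision over (ranked_list, relevant_set) pairs."""
    return sum(average_precision(ranked, relevant) for ranked, relevant in queries) / len(queries)


ap = average_precision(["a", "b", "c", "d"], {"a", "c"})
map_score = mean_average_precision([(["a", "b", "c", "d"], {"a", "c"}), (["x", "y"], {"y"})])
print(round(ap, 3), round(map_score, 3))
```

In the first toy query the relevant items sit at ranks 1 and 3, so AP = (1/1 + 2/3) / 2 = 5/6; averaging with the second query (AP = 1/2) gives MAP = 2/3.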
Summary:
- clustering is data
- balanced clusters needed
- scale is ok
?
Photos and imagery used in the presentation (except graphs and logos):
- Amazon recommendations: http://pleated-jeans.com/2010/08/06/amazon-recommendations-for-characters-from-the-office/
- Pile of books: http://www.paper-pills.com/category/gewgaws/page/2/
- Function: http://en.wikipedia.org/wiki/File:Graph_of_example_function.svg
- Server: http://arstechnica.com/gadgets/2007/08/windows-home-server-system-specs-prices-and-launch-date-leaked/
- Tick: http://ia.wikipedia.org/wiki/File:Tick_green_modern.svg
- Phone: http://www.foxbusiness.com/technology/2012/05/22/are-carrier-subsidies-hurting-innovation-and-driving-up-mobile-phone-costs/
- Man in front of computer: http://honesttogawd.blogspot.com.es/