Recommender Systems
Sewoong Oh
CSE/STAT 416, University of Washington
Personalization is a successful use of learning from data
• Facebook advertisements from browsing history
• Amazon, YouTube, Netflix recommendations from user choices
• Input: user preferences (or activities)
• Goal: select (a small set of) items the user will like
• Challenge: sparsity
• Key idea: collaborative filtering
  • a user might like something if similar users liked it
[Figure: example ratings table for Users 1-6; each user has rated only a few items]
Challenges in recommender systems
• Some feedback is implicit
  • Explicit feedback: rating, purchase history, ranking
  • Implicit feedback: browsing history, TV viewing pattern
  • Implicit feedback requires pre-processing of data such as time spent, clicks, intervals, etc.
• We seek diversity, which is not easy to impose because users are multifaceted
  • A person with a Linear Algebra textbook does not need another one, but top-k recommendation might stick to k linear algebra textbooks
  • We don’t want to recommend just Marvel movies
• Cold-start is hard
  • Recommendations for a new user/movie with no data are hard
  • Need to use additional features/contexts (Netflix 20 questions)
Challenges in recommender systems
• Interests change over time, but dynamic models are hard to train
  • Users' preferences change over time
  • The perception of a movie changes over time
• Given millions of users and hundreds of thousands of movies, we need a scalable (i.e. fast) algorithm
  • We need to exploit that the data is sparse
Approach 0: popularity
• No personalization
• Netflix: trending now (average number of viewers)
• NY Times: popular articles (average views)
Approach 1: classifier
• Train a classifier on
  x = (user features and movie features)
  y = liked (+1) or not (-1)
• Output: +1 (recommend) or -1 (do not recommend)
• Pros
  • Personalized
  • Flexible enough to include additional features like time
• Cons
  • Useful features are hard to get
  • Empirical performance is not as good as collaborative filtering
Approach 2: co-occurrences
• “People who bought X also bought …”
• Construct a normalized co-occurrence matrix C, where both rows and columns indicate items:

$$C_{ij} = \frac{\#\text{ of people who bought } i \text{ and } j}{\#\text{ of people who bought } i \text{ or } j}$$

• This is a symmetric matrix: $C_{ij} = C_{ji}$
• For a user who bought {milk, diapers}:

$$\text{Score}(\text{baby wipes}, \text{user}) = \frac{C_{\text{baby wipes},\,\text{milk}} + C_{\text{baby wipes},\,\text{diapers}}}{2}$$

• If the user bought similar items, then “baby wipes” gets a higher score
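The co-occurrence score can be sketched in a few lines of Python. This is only an illustration: the purchase histories and the helper names `cooccurrence` and `score` are made up for the example and are not from the lecture.

```python
from itertools import combinations

# Hypothetical purchase histories (illustrative data, not from the lecture).
purchases = {
    "ann": {"milk", "diapers", "baby wipes"},
    "bob": {"milk", "diapers"},
    "cat": {"diapers", "baby wipes"},
    "dan": {"milk", "bread"},
}

def cooccurrence(purchases):
    """Return C[(i, j)] = #(bought i and j) / #(bought i or j) for all item pairs."""
    items = sorted(set().union(*purchases.values()))
    C = {}
    for i, j in combinations(items, 2):
        both = sum(1 for basket in purchases.values() if i in basket and j in basket)
        either = sum(1 for basket in purchases.values() if i in basket or j in basket)
        C[(i, j)] = C[(j, i)] = both / either  # symmetric by construction
    return C

def score(candidate, bought, C):
    """Average normalized co-occurrence between the candidate and the bought items."""
    return sum(C[(candidate, item)] for item in bought) / len(bought)

C = cooccurrence(purchases)
wipes_score = score("baby wipes", ["milk", "diapers"], C)
bread_score = score("bread", ["milk", "diapers"], C)
```

In this toy data, baby wipes score higher than bread for a user who bought {milk, diapers}, because baby wipes co-occur with diapers more often.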
Approach 3: matrix factorization
• Movie recommendations
• Users watch movies and give ratings
• But each user only rates a few

Input Data
[Table: input data as (User, Movie, Rating) rows for Users 1-6]

Input Data in a matrix form
[Figure: ratings matrix X. Black cells: $X_{ij}$ known; white cells: $X_{ij}$ unknown. Rows index movies, columns index users.]
Matrix completion problem
• Black cells indicate Rating(user, movie) known
• White cells indicate Rating(user, movie) unknown
• Each cell has values in {1,2,3,4,5}
• Goal: predict missing entries
Premise: Suppose we have d types of movies
• We can describe each movie v with a feature vector $R_v$
  • How much is the movie action, romance, drama, …
  • $R_v$ = [0.3, 0.01, 1.5, …]
• We can describe a user u with a feature vector $L_u$
  • How much she likes action, romance, drama, …
  • $L_u$ = [2.3, 0, 0.7, …]
• Perhaps we can find features such that the rating can be predicted as the inner product of those two vectors
  • Rating(u,v) = 0.3*2.3 + 0.01*0 + 1.5*0.7 + …
• This allows you to predict how a user will rate a movie that she has not seen yet
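As a small sketch, the inner-product prediction looks like this in Python. The function name `predict_rating` is chosen for illustration, and only the first three feature coordinates shown on the slide are used; the elided "…" entries are left out.

```python
def predict_rating(user_features, movie_features):
    """Predicted rating = inner product of the two feature vectors."""
    if len(user_features) != len(movie_features):
        raise ValueError("feature vectors must have the same length")
    return sum(l * r for l, r in zip(user_features, movie_features))

# First three coordinates from the slide (the remaining entries are omitted).
R_v = [0.3, 0.01, 1.5]   # movie: action, romance, drama
L_u = [2.3, 0.0, 0.7]    # user's taste for the same three types
rating = predict_rating(L_u, R_v)   # 0.3*2.3 + 0.01*0 + 1.5*0.7
```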
Product recommendations
• Suppose the following features have been learned. Which movie should we recommend to user #3?

User ID   Feature vector
1         (2, 0)
2         (1, 1)
3         (0, 1)
4         (2, 1)
Call this 4x2 matrix L

Movie ID  Feature vector
1         (3, 1)
2         (1, 2)
3         (2, 1)
Call this 3x2 matrix R

• Such a prediction can be computed for all (user, movie) pairs
• And be written in a matrix form:

$$\begin{bmatrix} 6 & 2 & 4 \\ 4 & 3 & 3 \\ 1 & 2 & 1 \\ 7 & 4 & 5 \end{bmatrix} = \underbrace{\begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \\ 2 & 1 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} 3 & 1 & 2 \\ 1 & 2 & 1 \end{bmatrix}}_{R^T}$$
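The question on this slide can be checked with a short sketch. The feature values are the ones from the slide; the helper names `predict_all` and `best_movie` are assumptions made for illustration.

```python
def predict_all(L, R):
    """Return the matrix L R^T: entry [u][v] is the inner product of L[u] and R[v]."""
    return [[sum(l * r for l, r in zip(Lu, Rv)) for Rv in R] for Lu in L]

L = [(2, 0), (1, 1), (0, 1), (2, 1)]   # user features from the slide (4x2)
R = [(3, 1), (1, 2), (2, 1)]           # movie features from the slide (3x2)

predictions = predict_all(L, R)

def best_movie(user_id):
    """1-based ID of the movie with the highest predicted rating for a 1-based user ID."""
    row = predictions[user_id - 1]
    return max(range(len(row)), key=lambda v: row[v]) + 1

recommended = best_movie(3)
```

For user #3 the predicted ratings are (1, 2, 1), so movie 2 comes out on top. A real system would also skip movies the user has already rated, which this sketch does not do.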
Predictions in a matrix form
• The ratings matrix is the product of L and $R^T$, the user feature matrix and the (transposed) movie feature matrix: $X \approx L R^T$
• How do we learn the feature matrices from data?
• When we have all the ratings, it is easy
  • PCA gives the optimal factorization L and R in terms of reconstruction error
  • This automatically discovers the right topics from data
• But if we have all the ratings, we don’t need to predict anything
Matrix factorizations are not unique
• Let’s say we have an exact factorization $M = L R^T$
• There are infinitely many factorizations that give exactly the same M
  • For example, we can scale up the user features and scale down the movie ones, so that the ratings do not change
  • Precisely, for any invertible matrix Q, $(LQ, RQ^{-T})$ gives the same matrix as $(L, R)$, since $LQ\,(RQ^{-T})^T = L Q Q^{-1} R^T = L R^T = M$

$$M = \begin{bmatrix} 6 & 2 & 4 \\ 4 & 3 & 3 \\ 1 & 2 & 1 \\ 7 & 4 & 5 \end{bmatrix} = \underbrace{\begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \\ 2 & 1 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} 3 & 1 & 2 \\ 1 & 2 & 1 \end{bmatrix}}_{R^T}$$

$$M = \begin{bmatrix} 6 & 2 & 4 \\ 4 & 3 & 3 \\ 1 & 2 & 1 \\ 7 & 4 & 5 \end{bmatrix} = \underbrace{\begin{bmatrix} 4 & 0 \\ 2 & 2 \\ 0 & 2 \\ 4 & 2 \end{bmatrix}}_{2L} \underbrace{\begin{bmatrix} 1.5 & 0.5 & 1.0 \\ 0.5 & 1.0 & 0.5 \end{bmatrix}}_{\frac{1}{2} R^T}$$
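The non-uniqueness claim can be checked numerically. In the sketch below, Q is an arbitrary invertible 2x2 matrix picked for illustration, and the helper functions are written out by hand so the example stays self-contained.

```python
def matmul(A, B):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(Q):
    (a, b), (c, d) = Q
    det = a * d - b * c
    if det == 0:
        raise ValueError("Q must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

L = [[2, 0], [1, 1], [0, 1], [2, 1]]   # user features from the slide
R = [[3, 1], [1, 2], [2, 1]]           # movie features from the slide
M = matmul(L, transpose(R))

Q = [[2.0, 1.0], [0.0, 1.0]]           # arbitrary invertible matrix (assumption)
Q_inv_T = transpose(inverse_2x2(Q))
L_new = matmul(L, Q)                   # L Q
R_new = matmul(R, Q_inv_T)             # R Q^{-T}
M_new = matmul(L_new, transpose(R_new))
```

Here `L_new` and `R_new` differ from `L` and `R`, yet `M_new` reproduces `M` entry by entry.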
From factorization to Matrix completion
• In reality, we only have partial observations of the ratings matrix
• We fit the best L and R to the observed ratings
• There are many efficient algorithms that find a factorization based on partial observations, a.k.a. the matrix completion problem
• We suppose there are m movies, n users, and k topics, and that the ground truth matrix M is generated as $M = L_0 R_0^T$, of the form above
• No matter how many entries we observe, there are multiple choices of parameters (L, R) that match all the entries
  • because if (L, R) matches the entries, so does $(LQ, RQ^{-T})$
• But when can we solve this problem?
  • That is, how many entries do we need to see in order for our prediction to be accurate?
• One extreme: suppose we observe all entries, then
  • Any factorization method, like singular value decomposition (SVD), will provide (one of the) correct factorizations
  • And this is correct, i.e. the result satisfies $M = L R^T$
• Another extreme: suppose we observe one entry, then
  • It is easy to match the observed entry
  • But most likely this is incorrect on the missing entries, i.e. $M \neq L R^T$
[Figure: difficulty in finding (L, R) that matches the observed entries, plotted against the number of observed entries]
From factorization to Matrix completion
• If there are m movies, n users, and k topics, then how many parameters do we have in our factorization model L and R? degrees-of-freedom = k*m + k*n
  • This is also sometimes called the degrees-of-freedom in the problem.
• How many entries do we observe if we have the full matrix?
• How many entries do you think we need, to accurately reconstruct the ground truth L and R that generated the data?
[Figure: the m-by-n ratings matrix drawn as a product of two thin factors, each with k columns]
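To get a feel for the counting, here is a small sketch comparing the number of model parameters with the number of entries in a full m-by-n matrix (which is m*n). The sizes are assumed for illustration only, loosely following the "millions of users, hundreds of thousands of movies" scale mentioned earlier; k = 20 is a made-up choice.

```python
def degrees_of_freedom(m, n, k):
    """Number of parameters in an m-by-k factor and an n-by-k factor: k*m + k*n."""
    return k * m + k * n

def full_matrix_entries(m, n):
    """Number of entries in a fully observed m-by-n ratings matrix."""
    return m * n

# Illustrative sizes (assumed, not from the lecture).
m, n, k = 100_000, 1_000_000, 20
params = degrees_of_freedom(m, n, k)
entries = full_matrix_entries(m, n)
ratio = params / entries
```

With these sizes the model has about 22 million parameters against 100 billion matrix entries, so the parameter count is a tiny fraction of the full matrix. This gap is what makes it plausible to fit L and R from sparse observations.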
Algorithmic solution for matrix completion
• How do we write a program to find (L, R) matching the observed entries?
• Machine learning approach:
  • Write a loss function and minimize

$$\underset{L,R}{\text{minimize}} \sum_{u,v\,:\,r_{uv} \neq ?} \Big( \underbrace{(L R^T)_{uv}}_{L_u^T R_v} - r_{uv} \Big)^2$$

• Coordinate descent is popular for solving this optimization
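A minimal sketch of this loss in Python, assuming the observed ratings are stored as a dictionary from (user index, movie index) to rating. That storage format and the name `observed_loss` are choices made for this example, not something the slides specify.

```python
def observed_loss(L, R, ratings):
    """Sum of squared errors over observed entries only.

    ratings maps (u, v) -> observed rating r_uv; missing pairs are simply absent.
    """
    total = 0.0
    for (u, v), r_uv in ratings.items():
        prediction = sum(l * r for l, r in zip(L[u], R[v]))  # L_u^T R_v
        total += (prediction - r_uv) ** 2
    return total

L = [[2, 0], [1, 1], [0, 1], [2, 1]]
R = [[3, 1], [1, 2], [2, 1]]
# Hypothetical observations; the last one is off by 1 from the model's prediction of 5.
ratings = {(0, 0): 6, (1, 1): 3, (2, 2): 1, (3, 0): 7, (3, 2): 4}
loss = observed_loss(L, R, ratings)
```

Only the five observed pairs contribute; the unobserved entries of the matrix never enter the sum.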
Coordinate descent
• Consider an optimization problem (in 2 dimensions for illustration purposes)

$$\underset{w_0, w_1}{\text{minimize}} \; g(w_0, w_1)$$

• One method is called coordinate descent
  • Initialize $(w_0, w_1)$ randomly or with a smart initialization
  • While not converged, repeat
    • Pick a coordinate j in {0,1} (either random, round-robin, etc.)
    • Update $w_j \leftarrow \arg\min_{w_j} g(w_0, w_1)$
• Main idea:
  • Minimizing over 1 coordinate is much easier
  • No need to choose a step size
  • This is guaranteed to find the optimal solution, under some conditions on g
• When does it fail?
Coordinate descent
[Figure: coordinate descent iterates $(w_0^{(t)}, w_1^{(t)})$ moving toward the optimum $(w_0^*, w_1^*)$]
• Coordinate descent successfully finds the optimal solution if g(.) is strongly convex and smooth
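Here is a sketch of round-robin coordinate descent on a small strongly convex, smooth objective. The function g below is an example chosen for illustration (it is not from the lecture), and each coordinate update uses the exact one-dimensional minimizer, so no step size appears.

```python
def g(w0, w1):
    """A strongly convex, smooth example objective (chosen for illustration)."""
    return (w0 - 1) ** 2 + (w1 + 2) ** 2 + w0 * w1

def argmin_coordinate(j, w):
    """Exact minimizer of g over coordinate j with the other coordinate held fixed."""
    if j == 0:
        return 1 - w[1] / 2      # solves dg/dw0 = 2(w0 - 1) + w1 = 0
    return -2 - w[0] / 2         # solves dg/dw1 = 2(w1 + 2) + w0 = 0

def coordinate_descent(w_init, sweeps=50):
    w = list(w_init)
    history = [g(*w)]
    for _ in range(sweeps):
        for j in (0, 1):                     # round-robin over coordinates
            w[j] = argmin_coordinate(j, w)   # no step size needed
            history.append(g(*w))
    return w, history

w_final, history = coordinate_descent([0.0, 0.0])
```

Setting both partial derivatives to zero gives the optimum (8/3, -10/3), which the iterates approach; the objective value never goes up from one update to the next.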
Coordinate descent for matrix completion

$$\underset{L,R}{\text{minimize}} \sum_{u,v\,:\,r_{uv} \neq ?} \Big( \underbrace{(L R^T)_{uv}}_{L_u^T R_v} - r_{uv} \Big)^2$$

• Initialize (L, R)
• Repeat
  • Fix R and optimize over L
  • Fix L and optimize over R
• First insight:

$$\min_{L_1,\dots,L_n} \sum_{(u,v)\,:\,r_{uv} \neq ?} (L_u^T R_v - r_{uv})^2 = \min_{L_1,\dots,L_n} \sum_{u=1}^{n} \Big\{ \sum_{v\,:\,r_{uv} \neq ?} (L_u^T R_v - r_{uv})^2 \Big\} = \sum_{u=1}^{n} \Big\{ \min_{L_u} \sum_{v\,:\,r_{uv} \neq ?} (L_u^T R_v - r_{uv})^2 \Big\}$$

• This only involves each row of L independently, and can be solved as a separate optimization for each row
Coordinate descent for matrix completion
• We broke down the problem into solving multiple inner optimizations of the form:

$$\min_{L_u} \sum_{v\,:\,r_{uv} \neq ?} (L_u^T R_v - r_{uv})^2$$

• Second insight:
  • This is standard linear regression with quadratic loss
  • Many efficient solvers exist, and it can also be solved in closed form
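The "fix R, optimize over L" step can be sketched as one small least-squares solve per user. To keep the example dependency-free, it is restricted to k = 2 topics, where the normal equations are a 2x2 system solved by Cramer's rule. The function name `solve_user` and the choice of which entries count as observed are assumptions for illustration.

```python
def solve_user(R, user_ratings):
    """Least-squares L_u for one user with R fixed, for k = 2 topics.

    user_ratings maps movie index v -> observed rating r_uv.
    Solves the 2x2 normal equations (sum_v R_v R_v^T) L_u = sum_v r_uv R_v.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for v, r_uv in user_ratings.items():
        x, y = R[v]
        a11 += x * x
        a12 += x * y
        a22 += y * y
        b1 += r_uv * x
        b2 += r_uv * y
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("need at least two observed movies with non-parallel features")
    return [(b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]

R = [(3, 1), (1, 2), (2, 1)]   # movie features from the earlier example, held fixed
# Two observed ratings per user (0-based movie indices); the third is treated as missing.
observed = [{0: 6, 1: 2}, {1: 3, 2: 3}, {0: 1, 2: 1}, {0: 7, 1: 4}]

L_fit = [solve_user(R, ratings) for ratings in observed]   # one independent solve per user
missing_prediction = sum(l * r for l, r in zip(L_fit[3], R[2]))  # user 4, movie 3
```

With R held at the values from the earlier example, each user's two observed ratings are enough to recover that user's feature vector, and the held-out rating of user 4 for movie 3 is predicted as 5. The symmetric step (fix L, solve for each R_v) has the same form.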
Demo
Application: localization
• Wireless sensors are deployed in a region
• Each sensor measures the distance to nearby sensors
• Goal: find the distances to all sensors
• If we have all pairwise distances, then it is easy to find the locations of all sensors simultaneously
Application: localization
• Why is this a Matrix Completion problem?
  • We have missing entries
  • The data is in a matrix form
  • But most importantly, the ground truth is a low-rank matrix
  • The ambient dimension is 2 or 3, i.e. the position is $(x_u, y_u)$

$$D_{uv} = (x_u - x_v)^2 + (y_u - y_v)^2$$

$$D = L R^T, \qquad L_u = \begin{bmatrix} 1 & x_u^2 + y_u^2 & -\sqrt{2}\,x_u & -\sqrt{2}\,y_u \end{bmatrix}, \qquad R_v = \begin{bmatrix} x_v^2 + y_v^2 & 1 & \sqrt{2}\,x_v & \sqrt{2}\,y_v \end{bmatrix}$$
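The low-rank structure can be verified numerically: expanding the squares shows each $D_{uv}$ is an inner product of two 4-dimensional vectors, so D has rank at most 4 no matter how many sensors there are. The sensor positions below are hypothetical, and the function names are made up for this sketch.

```python
import math

def squared_distance_matrix(points):
    """D[u][v] = (x_u - x_v)^2 + (y_u - y_v)^2 for 2-D points."""
    return [[(xu - xv) ** 2 + (yu - yv) ** 2 for (xv, yv) in points] for (xu, yu) in points]

def localization_factors(points):
    """Rows L_u and R_v with 4 entries each, such that D[u][v] = L_u . R_v."""
    s = math.sqrt(2)
    L = [[1.0, x * x + y * y, -s * x, -s * y] for (x, y) in points]
    R = [[x * x + y * y, 1.0, s * x, s * y] for (x, y) in points]
    return L, R

# Hypothetical sensor positions (not from the lecture).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0), (-1.0, 4.0)]
D = squared_distance_matrix(points)
L, R = localization_factors(points)
D_from_factors = [[sum(l * r for l, r in zip(Lu, Rv)) for Rv in R] for Lu in L]
max_error = max(
    abs(D[u][v] - D_from_factors[u][v])
    for u in range(len(points))
    for v in range(len(points))
)
```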
Application: recommendation systems
• Given a partially observed ratings matrix
• Discover k topics, and k-dimensional user features $L_u$ and movie features $R_v$
• Predict how much a user will like a movie by $r_{uv} = L_u^T R_v$
• Make recommendations based on the prediction
• Applied to Wikipedia
Application to text data:
Which is correct about matrix factorization based recommendation systems?
• a) provide personalization
• b) capture context (e.g. time of the day)

• Another weakness of matrix factorization
  • We need to know k, in some sense
  • If we set k = min{m,n}, what goes wrong?
    • Overfitting
  • Solution: regularize

[Figure: a partially observed ratings matrix factored as $L R^T$]

$$\min_{L_u} \sum_{v\,:\,r_{uv} \neq ?} (L_u^T R_v - r_{uv})^2 + \lambda \|L_u\|^2$$
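A sketch of the regularized per-user update, again restricted to k = 2 topics so the solve fits in a few lines. Adding $\lambda \|L_u\|^2$ to the objective adds $\lambda I$ to the normal equations. The name `solve_user_ridge` and the value λ = 1 are assumptions for illustration.

```python
import math

def solve_user_ridge(R, user_ratings, lam):
    """Minimize sum_v (L_u^T R_v - r_uv)^2 + lam * ||L_u||^2 for k = 2 topics.

    Solves (sum_v R_v R_v^T + lam * I) L_u = sum_v r_uv R_v by Cramer's rule.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    a11 = a22 = float(lam)       # lam * I on the diagonal
    a12 = b1 = b2 = 0.0
    for v, r_uv in user_ratings.items():
        x, y = R[v]
        a11 += x * x
        a12 += x * y
        a22 += y * y
        b1 += r_uv * x
        b2 += r_uv * y
    det = a11 * a22 - a12 * a12  # strictly positive when lam > 0
    return [(b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]

R = [(3, 1), (1, 2), (2, 1)]                               # movie features, held fixed
two_ratings = solve_user_ridge(R, {0: 7, 1: 4}, lam=1.0)   # unregularized fit would be (2, 1)
one_rating = solve_user_ridge(R, {0: 6}, lam=1.0)          # still well defined with one rating
shrunk_norm = math.hypot(*two_ratings)
```

The regularized solution is pulled toward zero compared with the unregularized fit (2, 1), and the system stays solvable even when a user has rated a single movie.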
Featurized matrix factorization
• Limitations of matrix factorization
  • Cold-start problem
    - This model still cannot handle a new user or movie
  • As there is no observation for the entire row/column, putting anything in that row has no penalty

$$\underset{L,R}{\text{minimize}} \sum_{u,v\,:\,r_{uv} \neq ?} \Big( \underbrace{(L R^T)_{uv}}_{L_u^T R_v} - r_{uv} \Big)^2$$
Combining features and discovered topics
• Features capture context
  - Time of day, what I just saw, user info, past purchases, …
• Discovered topics from matrix factorization capture groups of users who behave similarly
  - Women from Seattle who teach and have a baby
• Combine to mitigate the cold-start problem
  - Ratings for a new user from features only
  - As more information about the user is discovered, matrix factorization topics become more relevant
Collaborative filtering with specified features
• Create feature vector for each movie (often have this even for new movies):
• Define weights on these features for how much all users like each feature
• Fit linear model:
• Minimize:
Building in personalization
!35
• Of course, users do not have identical preferences• Include a user-specific deviation from the global set of user weights:
• If we don’t have any observations about a user, use wisdom of the crowd
• As we gain more information about the user, forget the crowd
• Can add in user-specific features, and cross-features, too
Featurized matrix factorization—A combined approach
Feature-based approach:
- Feature representation of user and movies fixed
- Can address cold-start problem

Matrix factorization approach:
- Suffers from cold-start problem
- User & movie features are learned from data
A unified model: