Matrix factorization and embeddings
[Figure: X (n × d) ≈ H (n × k) × D (k × d)]
Reminders/Comments
• Initial draft of Mini-project due today
Today
• Back to representation learning
• How do we transform an input x into a new vector phi(x) that
  • is composed of real values
  • enables nonlinear functions in terms of x, using only a generalized linear model with phi(x)
  • and has other potentially desirable properties, like compactness or…
Neural networks summary
• Discussed basics, including
• Basic architectures (fully connected layers with activations like sigmoid, tanh, and relu)
• How to choose the output loss
  • i.e., still using the GLM formulation
• Learning strategy: gradient descent, with gradients computed by back-propagation
• Basic regularization strategies
How else can we learn the representation?
• Discussed how learning can be done in simple ways even for “fixed representations”
  • e.g., learn the centres for radial basis function networks
  • e.g., learn the bandwidths for a Gaussian kernel
• In general, this problem has been tackled for a long time in the field of unsupervised learning
  • where the goal is to analyze the underlying structure in the data
Using factorizations
• Many unsupervised learning and semi-supervised learning problems can be formulated as factorizations
  • PCA, kernel PCA, sparse coding, clustering, etc.
• Also provides a way to embed more complex items into a shared space using co-occurrence
  • e.g., matrix completion for the Netflix challenge
  • e.g., word2vec
Intuition (factor analysis)
• Imagine you have test scores from 10 subjects (topics), for 1000 students
• As a psychologist, you hypothesize there are two kinds of intelligence: verbal and mathematical
• You cannot observe these factors (hidden variables)
• Instead, you would like to see if these two factors explain the data, where x is the vector of test scores of a student
• Want to find: x = d1 h1 + d2 h2, where d1 and d2 are vectors, h1 = verbal intelligence, and h2 = mathematical intelligence
• Having features h1 and h2 would give a compact, intuitive representation
Example continued
• Imagine you have test scores from 10 subjects (topics), for 1000 students
• Learned basis vectors d1 and d2 that reflect scores for a student with high verbal or math intelligence, respectively
• Features [h_{5,1}, h_{5,2}] provide useful attributes about student 5
Obtain x_5 = d_1 h_{5,1} + d_2 h_{5,2}
where h_{5,1} = verbal intelligence and h_{5,2} = math intelligence
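The decomposition of one student's scores into two hidden factors can be illustrated with a small NumPy sketch. Everything here is a hypothetical setup, not part of the lecture: the scores are synthetic and noiseless, the basis vectors d1 and d2 (rows of `D_true`) are assumed known, and "student 5" is simply taken to be row index 5. Under those assumptions, the factors for a student follow from a least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_subjects, k = 1000, 10, 2

# Hypothetical ground truth: rows of D_true play the role of d1 and d2.
D_true = rng.normal(size=(k, n_subjects))
H_true = rng.normal(size=(n_students, k))  # hidden factors for each student
X = H_true @ D_true                        # synthetic, noiseless test scores

# With the basis known, recover the factors of the student in row 5
# by solving D_true^T h = x5 in the least-squares sense.
x5 = X[5]
h5, *_ = np.linalg.lstsq(D_true.T, x5, rcond=None)

# x5 = d1 * h_{5,1} + d2 * h_{5,2}
x5_reconstructed = h5[0] * D_true[0] + h5[1] * D_true[1]
```

In practice neither D nor H is known, so both must be learned from X, which is what the factorization methods in the rest of the lecture do.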
Whiteboard
• Linear neural network
• Auto-encoders and Matrix factorization
• Learning (latent) attributes of inputs
Example: K-means
[Figure 7.4: Matrix factorization of data matrix X ∈ R^{n×d}: X (n × d) ≈ H (n × k) × D (k × d).]
[Figure 7.5: K-means clustering as a matrix factorization for data matrix X ∈ R^{n×d}. Sample 1, x = [0.1 -3.1 2.4], has h = [1 0], which selects cluster 1. The rows of D are the mean of cluster 1, [0.2 -3.0 2.0], and the mean of cluster 2, [1.2 0.1 -6.3].]
K-means clustering is an unsupervised learning problem to group data points into k clusters by minimizing distances to the mean of each cluster. This problem is not usually thought of as a representation learning approach, because the cluster number is not typically used as a representation. However, we nonetheless start with k-means because it is an intuitive example of how these unsupervised learning algorithms can be thought of as matrix factorization. Further, the clustering approach can be seen as a representation learning approach, because it is a learned discretization of the space. We will discuss this view of k-means after discussing it as a matrix factorization.
Imagine that you have two clusters (k = 2), with data dimension d = 3. Let d1 be the mean for cluster 1 and d2 the mean for cluster 2. The goal is to minimize the squared ℓ2 distance of each data point x to its cluster center
$\| x - \sum_{i=1}^{2} 1(x \text{ in cluster } i)\, d_i \|_2^2 = \| x - hD \|_2^2$
where h = [1 0] or h = [0 1] and D = [d1 ; d2]. An example is depicted in Figure 7.5. For a point x = [0.1 -3.1 2.4], h = [1 0], meaning it is placed in cluster 1 with mean d1 = [0.2 -3.0 2.0]. It would incur more error to place x in cluster 2, which has a mean that is more dissimilar: d2 = [1.2 0.1 -6.3].
The overall minimization is defined across all the samples, giving loss
$\min_{H \in \{0,1\}^{n \times k},\ H\mathbf{1} = \mathbf{1},\ D \in \mathbb{R}^{k \times d}} \| X - HD \|_F^2$.
Different cluster vectors h are learned for each x, but the dictionary of means is shared amongst all the data points. The specified optimization should pick the dictionary D of means that provides the smallest distances to points in the training dataset.
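To make the factorization view concrete, here is a minimal NumPy sketch that alternates between choosing the one-hot rows of H (cluster assignment) and updating the rows of D (cluster means). The alternating scheme is the standard Lloyd-style procedure, used here as an assumption since the text only states the objective; the function names `assign`, `update_means`, and `kmeans_factorization` are hypothetical. The final lines reuse the numbers from Figure 7.5.

```python
import numpy as np

def assign(X, D):
    """Pick, for every row of X, the one-hot h that minimizes ||x - hD||^2."""
    dists = ((X[:, None, :] - D[None, :, :]) ** 2).sum(axis=2)  # n x k
    H = np.zeros((X.shape[0], D.shape[0]))
    H[np.arange(X.shape[0]), dists.argmin(axis=1)] = 1.0
    return H

def update_means(X, H, D_old):
    """Set each row of D to the mean of the points assigned to that cluster."""
    counts = H.sum(axis=0)
    D = D_old.copy()
    nonempty = counts > 0              # keep the old mean for empty clusters
    D[nonempty] = (H.T @ X)[nonempty] / counts[nonempty, None]
    return D

def kmeans_factorization(X, D_init, n_iters=20):
    """Alternate between H and D to reduce ||X - HD||_F^2."""
    D = D_init.copy()
    for _ in range(n_iters):
        H = assign(X, D)
        D = update_means(X, H, D)
    return assign(X, D), D

# The example from Figure 7.5: one point, two cluster means.
x = np.array([[0.1, -3.1, 2.4]])
D = np.array([[0.2, -3.0, 2.0],
              [1.2, 0.1, -6.3]])
h = assign(x, D)                       # selects cluster 1
error = np.sum((x - h @ D) ** 2)       # squared distance to the chosen mean
```

Each row of H sums to one by construction, which is the constraint H1 = 1 in the loss above.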
Dimensionality reduction
[Figure: X (n × d) ≈ H (n × k) × D (k × d)]
• If we set the inner dimension k < d, we obtain dimensionality reduction
• Recall that the product of two matrices H and D has rank at most the minimum rank of H and D
• Even if d = 1000, if we set k = 2, then we get a reconstruction of X that is only two-dimensional
  • we could even visualize the data! How?
rank(HD) ≤ min(rank(H), rank(D))
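The rank bound can be checked numerically. This is a small sketch with randomly generated H and D (hypothetical data, chosen only for illustration): although the reconstruction HD lives in d = 1000 dimensions, its rank cannot exceed k = 2, and the two columns of H give coordinates that could be drawn as a 2-D scatter plot.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 2

H = rng.normal(size=(n, k))   # 2-dimensional representation of each sample
D = rng.normal(size=(k, d))   # dictionary with k rows
X_hat = H @ D                 # reconstruction in the original d dimensions

rank_H = np.linalg.matrix_rank(H)
rank_D = np.linalg.matrix_rank(D)
rank_X_hat = np.linalg.matrix_rank(X_hat)

# Each row H[i] is a point (H[i, 0], H[i, 1]) that can be plotted directly.
coords_2d = H
```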
Principal components analysis
[Figure: X (n × d) ≈ H (n × k) × D (k × d)]
• New representation is the k left singular vectors that correspond to the k largest singular values
  • i.e., for each sample x, the corresponding k-dimensional h (a row of H) is the representation
• Not the same as selecting k features, but rather projecting features into a lower-dimensional space
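A minimal sketch of PCA as a factorization, using the SVD from NumPy. Two choices here are assumptions rather than something the slide specifies: the data is mean-centred first, and the left singular vectors are scaled by their singular values so that HD reconstructs the centred data directly. The function name `pca_factorization` is hypothetical.

```python
import numpy as np

def pca_factorization(X, k):
    """Return H (n x k), D (k x d) and the column means, so X ≈ H @ D + mean."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # centre the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    H = U[:, :k] * s[:k]   # top-k left singular vectors, scaled
    D = Vt[:k]             # top-k right singular vectors (principal directions)
    return H, D, mean

# Synthetic data that lies exactly on a 2-dimensional plane in 5 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5)) + 3.0
H, D, mean = pca_factorization(X, k=2)
X_hat = H @ D + mean
```

The same H is obtained by projecting the centred data onto the rows of D, which is the sense in which PCA projects features rather than selecting them.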
Do these make useful features?
• Before we were doing (huge) nonlinear expansions
• PCA takes input features and reduces the dimension
• This constrains the model, so it cannot be more powerful
• Why could this be helpful?
  • Constraining the model is a form of regularization: could promote generalization
  • Sometimes have way too many features (e.g., someone overdid their nonlinear expansion, redundant features), want to extract key dimensions and remove redundancy and noise
  • Can be helpful for simply analyzing the data, to choose better models
What if the data does not lie on a plane?
• Can do non-linear dimensionality reduction
• Interestingly enough, many non-linear dimensionality reduction techniques correspond to PCA applied after a nonlinear transformation of the data with a (specialized) kernel
  • Isomap, Laplacian eigenmaps, LLE, etc.
• Can therefore extract a lower-dimensional representation on a curved manifold, and so better approximate input data in a low-dimensional space
  • which would be hard to capture on a linear surface
Isomap vs PCA
[Figure with three panels: Data, PCA, ISOMAP]
*Note: you don’t need to know Isomap, just using it as an example
Sparse coding
[Figure: X (n × d) ≈ H (n × k) × D (k × d), where a row of H is sparse, e.g., [0 1 0 0 1 0 0.3 0 0 0]]
• For sparse representation, usually k > d
• Many entries in new representation are zero
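One way to see how a sparse row h can be obtained is to fix the dictionary D and minimize a reconstruction error plus an ℓ1 penalty on h. The sketch below uses iterative soft-thresholding (ISTA) for that subproblem. The ℓ1 objective, the solver, the penalty weight `lam`, and the function names are all assumptions made for illustration; the slide only states that k > d and that many entries are zero.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink every entry of v toward zero by t, setting small entries to 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, D, lam=0.1, n_iters=500):
    """Approximately minimize 0.5 * ||x - h D||^2 + lam * ||h||_1 over h."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / (largest singular value)^2
    h = np.zeros(D.shape[0])
    for _ in range(n_iters):
        grad = (h @ D - x) @ D.T             # gradient of the squared error
        h = soft_threshold(h - step * grad, step * lam)
    return h

# Overcomplete dictionary: k = 20 atoms for d = 5 dimensional data.
rng = np.random.default_rng(0)
d, k = 5, 20
D = rng.normal(size=(k, d))
x = rng.normal(size=d)
h = sparse_code(x, D, lam=0.5)
num_nonzero = int(np.count_nonzero(h))       # how many atoms are used
```

Larger values of `lam` push more entries of h to exactly zero; learning D as well would alternate this step with an update of the dictionary.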
Comments: Nov. 28, 2019
• TAs will send you two initial drafts to review
• Andy Patterson will do a review lecture next Thursday
Summary so far
• Factorization is another way to obtain re-representations of data
• One advantage over NNs: lets us easily impose structure on the re-representation, since we optimize for it explicitly
• One disadvantage compared to NNs: usually restricted to pretty simple re-representations (e.g., linear weighting of dictionary items)
• Disclaimer: I won’t actually ask you any questions on the final about Matrix Factorization
Embeddings with co-occurrence
• Embed complex items into a shared (Euclidean) space based on their relationships to other complex items
• Examples:
  • word2vec
  • users and movies
  • gene sequences
Consider word features
• Imagine we want to predict whether a sentence is positive or negative (say with logistic regression)
• How do we encode words?
• One basic option: a one-hot encoding. If there are 10000 words, the ith word has a 1 in the ith location of a length-10000 vector, and zero everywhere else.
• This is a common way to deal with categorical variables, but with 10000 words this can get big!
• Can we get a more compact representation of a word?
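The one-hot encoding described above can be sketched in a few lines. The five-word vocabulary is a hypothetical stand-in for the 10000 words in the example, and `one_hot` is an illustrative helper name.

```python
import numpy as np

# Hypothetical tiny vocabulary standing in for the 10000 words.
vocab = ["the", "movie", "was", "great", "terrible"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Vector with a 1 at the word's index and 0 everywhere else."""
    v = np.zeros(len(vocab))
    v[word_to_index[word]] = 1.0
    return v

encoded = one_hot("was")   # the 1 sits at position 2
```

The vector length equals the vocabulary size, which is why this encoding grows unwieldy for large vocabularies.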
Co-occurrence matrix example
• X is a matrix of counts for words (rows) and contexts (columns), where entry X_ij is the number of times context word j is seen within 2 words (say) of word i
• Each word is a one-hot encoding; if there are 10000 words, each row corresponds to 1 word, and X is 10000x10000
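A minimal sketch of building such a co-occurrence matrix with a window of 2 words on either side. The toy sentence, the whitespace tokenization, and the function name `cooccurrence_matrix` are assumptions made for illustration.

```python
import numpy as np

def cooccurrence_matrix(tokens, vocab, window=2):
    """X[i, j] counts how often word j appears within `window` words of word i."""
    index = {word: i for i, word in enumerate(vocab)}
    X = np.zeros((len(vocab), len(vocab)))
    for pos, word in enumerate(tokens):
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos != pos:                 # skip the word itself
                X[index[word], index[tokens[ctx_pos]]] += 1
    return X

tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))    # ['cat', 'mat', 'on', 'sat', 'the']
X = cooccurrence_matrix(tokens, vocab)
```

With a symmetric window the resulting matrix is symmetric; for a real corpus it would also be very sparse, since most word pairs never co-occur.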
How to obtain embeddings?
• Words i and s that have similar context counts should be embedded similarly
• Factorize co-occurrence matrix
[Figure: X (n × d) ≈ H (n × k) × D (k × d)]
H_{i:} = representation for word i
Algorithm
$\min_{H,D} \sum_{\text{available } (i,j)} (X_{ij} - H_{i:} D_{:j})^2$
$\nabla_D \sum_{\text{available } (i,j)} (X_{ij} - H_{i:} D_{:j})^2 = \sum_{\text{available } (i,j)} \nabla_D (X_{ij} - H_{i:} D_{:j})^2$
$\nabla_{D_{:j}} (X_{ij} - H_{i:} D_{:j})^2 = -2 (X_{ij} - H_{i:} D_{:j}) H_{i:}$
Gradient descent on H and D until convergence
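The gradient-descent procedure can be sketched with the per-entry gradients stacked into matrix form. This is an illustrative sketch under several assumptions: a 0/1 `mask` marks the available entries, the step size and iteration count are fixed by hand rather than checked for convergence, and the names `factorize` and `loss` are hypothetical. The example matrix is rank one with two hidden entries.

```python
import numpy as np

def loss(X, mask, H, D):
    """Sum of squared errors over the available entries only."""
    return np.sum((mask * (X - H @ D)) ** 2)

def factorize(X, mask, k, step=0.01, n_iters=5000, seed=0):
    """Gradient descent on H and D for the masked squared-error objective."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    H = 0.1 * rng.normal(size=(n, k))
    D = 0.1 * rng.normal(size=(k, d))
    for _ in range(n_iters):
        R = mask * (X - H @ D)     # residuals, zero where X is unavailable
        grad_H = -2 * R @ D.T      # stacks -2 (X_ij - H_i: D_:j) D_:j over j
        grad_D = -2 * H.T @ R      # stacks -2 (X_ij - H_i: D_:j) H_i: over i
        H -= step * grad_H
        D -= step * grad_D
    return H, D

# Rank-one example with two entries treated as unavailable.
X = np.outer([0.5, 1.0, 1.5, 2.0], [1.0, -1.0, 2.0])
mask = np.ones_like(X)
mask[0, 0] = 0.0
mask[3, 2] = 0.0
H, D = factorize(X, mask, k=1)
final_loss = loss(X, mask, H, D)
X_hat = H @ D                      # also fills in the two hidden entries
```

Because the sum runs only over available entries, unavailable entries never contribute to the gradient, yet H @ D still produces a value for them.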
Pros/cons of representation learning approaches
• Neural networks
  ✓ demonstrably useful in practice
  ✓ theoretical representability results
  - can be difficult to optimize, due to non-convexity
  - properties of solutions not well understood
  - not natural for missing data
• Matrix factorization models
  ✓ widely used for unsupervised learning
  ✓ simple to optimize, with well understood solutions in many situations
  ✓ amenable to missing data
  - far fewer demonstrations of utility
Can they both be used to get embeddings?
Neural Network Embedding
• We saw that the solution for a linear Auto-encoder (an NN) corresponds to PCA (which is a factorization approach)
• Similarly, we can input one-hot encoding of words into an auto-encoder and predict context vectors as outputs
• The hidden layer can be used as an embedding
  • again, we have a bit less control over whether it has exactly the properties we want it to have
• Why is it called an embedding? Isn’t the hidden layer a representation?
Missing data
• Can easily perform factorization even with missing data
• Important in an area called matrix completion or collaborative filtering
• This contrasts with NNs, where it is less clear how to handle missing data (why?)
Embedding Movies and Users
[Figure: X (n × d) ≈ H (n × k) × D (k × d)]
H_{i:} is the representation of user i
D_{:j} is the representation of movie j
Example: H_{i:} = [like comedies, …] = [1, …]
D_{:j} = [is a comedy, …] = [1, …]
H_{i:} D_{:j} = 1 + …
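The inner-product example can be written out with small, made-up embeddings. The two features (comedy, action), the users, and the movies are all hypothetical; the sketch only shows how a user row times a movie column gives a predicted affinity.

```python
import numpy as np

# Hypothetical 2-feature embeddings: [comedy, action].
H = np.array([[1.0, 0.0],        # user 0 likes comedies
              [0.0, 1.0]])       # user 1 likes action films
D = np.array([[1.0, 0.0, 0.5],   # how much each of 3 movies is a comedy
              [0.0, 1.0, 0.5]])  # how much each of 3 movies is an action film

scores = H @ D                   # scores[i, j] = H[i, :] @ D[:, j]
best_movie_for_user0 = int(scores[0].argmax())
```

User 0 and movie 0 both load on the comedy feature, so their product is the largest entry in that user's row.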
Matrix completion
Subspace (low-rank) form
[Figure: X (n × d) ≈ H (n × k) × D (k × d)]
H_{i:} is the representation of user i
D_{:j} is the representation of movie j
sha1_base64="3S3QJz3BblOJ3LKCyxmzZ3GHf3k=">AAACG3icbVC7TsMwFHXKq5RXgZHFokViqpIuIKYKGBiLRB9SG1WOe9O6deLIdipVUf+DhV9hYQAhJiQG/gYn7QAtR7J0dM59+B4v4kxp2/62cmvrG5tb+e3Czu7e/kHx8KipRCwpNKjgQrY9ooCzEBqaaQ7tSAIJPA4tb3yT+q0JSMVE+KCnEbgBGYTMZ5RoI/WK1XI3IHro+cntrJdcjWZlzBTWQ8ASzCAFoc4qsfBxICYM8KhXLNkVOwNeJc6ClNAC9V7xs9sXNA7MLMqJUh3HjrSbEKkZ5TArdGMFEaFjMoCOoSEJQLlJdtsMnxmlj30hzQs1ztTfHQkJlJoGnqlMD1HLXir+53Vi7V+6CQujWENI54v8mGMtcBoU7jMJVPOpIYRKZv6K6ZBIQrWJs2BCcJZPXiXNasUx/L5aql0v4sijE3SKzpGDLlAN3aE6aiCKHtEzekVv1pP1Yr1bH/PSnLXoOUZ/YH39AI2uoRk=</latexit>
Matrix completion
30
Example: H_{i:} = [likes comedies, ...] = [1, ...]
D_{:j} = [is a comedy, ...] = [1, ...]
H_{i:} D_{:j} = 1 + ...
How do we fill in missing data?
• The goal in factorization is to find H and D such that X ≈ H D
• This corresponds to finding X_ij ≈ H_{i:} D_{:j} for all i, j
• H_{i:} is shared across all j, for a given i
• D_{:j} is shared across all i, for a given j
• We can learn something about H_{i:} as long as X_ij is available for some j
• We can learn something about D_{:j} as long as X_ij is available for some i
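The last two bullets can be checked mechanically. This is a minimal sketch, assuming missing ratings are stored as NaN (the slides do not fix a storage format); the names `X`, `observed`, `learnable_users`, and `learnable_movies` are illustrative only.

```python
import numpy as np

# Toy ratings matrix: rows are users, columns are movies, NaN marks a missing rating.
X = np.array([
    [5.0, np.nan, 1.0],
    [np.nan, np.nan, np.nan],  # user 1 has rated nothing
    [4.0, np.nan, 2.0],
])

# Boolean mask of the available (i, j) entries.
observed = ~np.isnan(X)

# H_{i:} gets a training signal only if user i has at least one available X_ij.
learnable_users = observed.any(axis=1)
# D_{:j} gets a training signal only if movie j has at least one available X_ij.
learnable_movies = observed.any(axis=0)

print(learnable_users)
print(learnable_movies)
```

In this toy matrix, user 1 and movie 1 have no available entries, so nothing can be learned about their representations from the data alone.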
31
Algorithm
32
\min_{H,D} \sum_{\text{available } (i,j)} (X_{ij} - H_{i:} D_{:j})^2
\nabla_D \sum_{\text{available } (i,j)} (X_{ij} - H_{i:} D_{:j})^2 = \sum_{\text{available } (i,j)} \nabla_D (X_{ij} - H_{i:} D_{:j})^2
\nabla_{D_{:j}} (X_{ij} - H_{i:} D_{:j})^2 = -2 (X_{ij} - H_{i:} D_{:j}) H_{i:}
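The slide gives only the gradient with respect to a column of D. By the same chain-rule argument, the gradient with respect to a row of H is symmetric, and the two together suggest the update rules sketched below. The step size \eta is an assumption (the slides do not name one), and the transposes are written out only so that the shapes match.

```latex
\nabla_{H_{i:}} (X_{ij} - H_{i:} D_{:j})^2 = -2\,(X_{ij} - H_{i:} D_{:j})\, D_{:j}^\top

H_{i:} \leftarrow H_{i:} + 2\eta \sum_{j :\, (i,j)\ \text{available}} (X_{ij} - H_{i:} D_{:j})\, D_{:j}^\top

D_{:j} \leftarrow D_{:j} + 2\eta \sum_{i :\, (i,j)\ \text{available}} (X_{ij} - H_{i:} D_{:j})\, H_{i:}^\top
```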
Gradient descent on H and D until convergence
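One way this could look in code is the sketch below. It assumes missing entries are stored as NaN, uses plain batch gradient descent with a fixed step size, and runs a fixed number of iterations in place of a real convergence check. The function name `factorize` and its parameters (`k`, `step`, `n_iters`, `seed`) are hypothetical choices for illustration, not something specified in the slides.

```python
import numpy as np

def factorize(X, k=2, step=0.01, n_iters=5000, seed=0):
    """Fit X ≈ H @ D on the available (non-NaN) entries by batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    observed = ~np.isnan(X)
    X_filled = np.where(observed, X, 0.0)  # placeholder zeros, masked out below
    # Small random initialization (an assumption; the slides do not specify one).
    H = 0.1 * rng.standard_normal((n, k))
    D = 0.1 * rng.standard_normal((k, d))
    for _ in range(n_iters):
        # Residuals X_ij - H_{i:} D_{:j}, set to zero on missing entries.
        R = observed * (X_filled - H @ D)
        grad_H = -2.0 * R @ D.T  # row i: sum over available j of -2 R_ij D_{:j}
        grad_D = -2.0 * H.T @ R  # column j: sum over available i of -2 R_ij H_{i:}
        H -= step * grad_H
        D -= step * grad_D
    return H, D

# Toy rank-1 ratings matrix with two missing entries.
X = np.array([[1.0, 2.0, np.nan],
              [2.0, 4.0, 2.0],
              [np.nan, 6.0, 3.0]])
H, D = factorize(X, k=1)
print(np.round(H @ D, 2))
```

The fixed step size is the fragile part of this sketch: too large a value can diverge on matrices with bigger entries, so in practice one would tune it or stop based on the change in the objective.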
Matrix completion solution
• Once we have learned H and D, we can complete the matrix
• Take an entry (i,j) that was missing and compute X_ij = H_{i:} D_{:j}
• H_{i:} is the representation of user i
• also called an embedding, where users with similar movie preferences should have similar H_{i:}
• D_{:j} is the representation of movie j
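The completion step can be sketched as follows, assuming missing entries are stored as NaN. The factors `H` and `D` here are hand-picked hypothetical values standing in for learned ones, chosen so that they reproduce the observed ratings exactly.

```python
import numpy as np

# Hypothetical learned factors: 2 users, 3 movies, k = 2.
H = np.array([[1.0, 0.0],
              [0.5, 2.0]])
D = np.array([[4.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

# Ratings with two missing entries (NaN).
X = np.array([[4.0, np.nan, 1.0],
              [np.nan, 3.0, 4.5]])

# Predicted rating for every (i, j): X_hat[i, j] = H[i, :] @ D[:, j].
X_hat = H @ D

# Keep observed ratings, fill only the missing ones with predictions.
X_completed = np.where(np.isnan(X), X_hat, X)
print(X_completed)
```

Keeping the observed values and filling only the gaps is a design choice; one could equally report `X_hat` everywhere as a smoothed version of the ratings.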
33
Conclusion: factorization enables missing data to be inferred, and provides a new metric between items
\|H_{i:} - H_{s:}\|_2
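As a rough illustration of using this distance as a metric between users, the sketch below compares rows of a small hand-made embedding matrix. The values in `H` and the helper names `user_distance` and `nearest_user` are hypothetical and only serve the example.

```python
import numpy as np

# Hypothetical user embeddings (rows of H), k = 2.
H = np.array([[1.0, 0.0],   # user 0
              [0.9, 0.1],   # user 1: tastes close to user 0
              [0.0, 1.0]])  # user 2: different tastes

def user_distance(H, i, s):
    """Euclidean distance ||H_{i:} - H_{s:}||_2 between users i and s."""
    return float(np.linalg.norm(H[i] - H[s]))

def nearest_user(H, i):
    """Index of the user whose embedding is closest to user i (excluding i)."""
    dists = np.linalg.norm(H - H[i], axis=1)
    dists[i] = np.inf  # never return the query user itself
    return int(np.argmin(dists))

print(user_distance(H, 0, 1), user_distance(H, 0, 2))
print(nearest_user(H, 0))
```

The same functions apply to movies by passing the columns of D as rows, for example `nearest_user(D.T, j)`.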