matrixfactorizationwcohen/10-605/matrix-factors.pdf · • the neatest little guide to stock market...
TRANSCRIPT
Matrix Factorization
1
Recovering latent factors in a matrix
m columns
v11 …
… …
vij
…
vnm
n ro
ws
2
Recovering latent factors in a matrix
K * m
n *
K
x1 y1
x2 y2
.. ..
… …
xn yn
a1 a2 .. … am
b1 b2 … … bm v11 …
… …
vij
…
vnm
~
3
What is this for?
K * m
n *
K
x1 y1
x2 y2
.. ..
… …
xn yn
a1 a2 .. … am
b1 b2 … … bm v11 …
… …
vij
…
vnm
~
4
MF for collaborative @iltering
5
What is collaborative @iltering?
What is collaborative @iltering?
What is collaborative @iltering?
What is collaborative @iltering?
Recovering latent factors in a matrix
m movies
v11 …
… …
vij
…
vnm
V[i,j] = user i’s rating of movie j
n us
ers
11
Recovering latent factors in a matrix
m movies
n us
ers
m movies
x1 y1
x2 y2
.. ..
… …
xn yn
a1 a2 .. … am
b1 b2 … … bm v11 …
… …
vij
…
vnm
~
V[i,j] = user i’s rating of movie j
12
13
MF for image modeling
14
15
MF for images
10,000 pixels
1000
imag
es
1000 * 10,000,00
x1 y1
x2 y2
.. ..
… …
xn yn
a1 a2 .. … am
b1 b2 … … bm v11 … … …
… …
vij
…
vnm
~
V[i,j] = pixel j in image i
2 prototypes
PC1
PC2
MF for modeling text
17
• The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common Sense Investing: The Only
Way to Guarantee Your Fair Share of Stock Market Returns
• The Little Book of Value Investing • Value Investing: From Graham to Buffett and Beyond • Rich Dad’s Guide to Investing: What the Rich Invest in,
That the Poor and the Middle Class Do Not! • Investing in Real Estate, 5th Edition • Stock Investing For Dummies • Rich Dad’s Advisors: The ABC’s of Real Estate
Investing: The Secrets of Finding Hidden Profits Most Investors Miss
https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/
https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/
TFIDF counts would be better
Recovering latent factors in a matrix
m terms
n do
cum
ents
doc term matrix
x1 y1
x2 y2
.. ..
… …
xn yn
a1 a2 .. … am
b1 b2 … … bm v11 …
… …
vij
…
vnm
~
V[i,j] = TFIDF score of term j in doc i
20
=
Investing for real estate
Rich Dad’s Advisor’s: The ABCs of Real
Estate Investment …
The little book of common
sense investing: …
Neatest Little Guide to Stock
Market Investing
MF is like clustering
24
k-‐means as MF
cluster means
n ex
ampl
es
0 1
1 0
.. ..
… …
xn yn
a1 a2 .. … am
b1 b2 … … bm v11 …
… …
vij
…
vnm
~
original data set indicators for r
clusters
Z
M
X
How do you do it?
K * m
n *
K
x1 y1
x2 y2
.. ..
… …
xn yn
a1 a2 .. … am
b1 b2 … … bm v11 …
… …
vij
…
vnm
~
26
talk pilfered from à …..
KDD 2011
27
28
Recovering latent factors in a matrix
m movies
n us
ers
m movies
x1 y1
x2 y2
.. ..
… …
xn yn
a1 a2 .. … am
b1 b2 … … bm v11 …
… …
vij
…
vnm
~
V[i,j] = user i’s rating of movie j
r
W
H
V
29
30
31
32
for image denoising
33
Matrix factorization as SGD
step size why does this work?
34
Matrix factorization as SGD -‐ why does this work? Here’s the key claim:
35
Checking the claim
Think for SGD for logistic regression • LR loss = compare y and ŷ = dot(w,x) • similar but now update w (user weights) and x (movie weight)
36
What loss functions are possible? N1, N2 -‐ diagonal
matrixes, sort of like IDF factors for the users/
movies
“generalized” KL-‐divergence
37
What loss functions are possible?
38
What loss functions are possible?
39
ALS = alternating least squares
40
talk pilfered from à …..
KDD 2011
41
42
43
44
Similar to McDonnell et al with perceptron learning
45
Slow convergence…..
46
47
48
49
50
51
52
More detail…. • Randomly permute rows/cols of matrix • Chop V,W,H into blocks of size d x d
– m/d blocks in W, n/d blocks in H • Group the data:
– Pick a set of blocks with no overlapping rows or columns (a stratum)
– Repeat until all blocks in V are covered • Train the SGD
– Process strata in series – Process blocks within a stratum in parallel
53
More detail….
Z was V
54
More detail…. • Initialize W,H randomly
– not at zero J • Choose a random ordering (random sort) of the points in a stratum in each “sub-‐epoch”
• Pick strata sequence by permuting rows and columns of M, and using M’[k,i] as column index of row i in subepoch k
• Use “bold driver” to set step size: – increase step size when loss decreases (in an epoch) – decrease step size when loss increases
• Implemented in Hadoop and R/Snowfall
M=
55
56
Wall Clock Time�8 nodes, 64 cores, R/snow
57
58
59
60
61
Number of Epochs
62
63
64
65
66
Varying rank�100 epochs for all �
67
Hadoop scalability
Hadoop process setup time starts
to dominate
68
Hadoop scalability
69
70