matrixfactorizationwcohen/10-605/matrix-factors.pdf · • the neatest little guide to stock market...

70
Matrix Factorization 1

Upload: others

Post on 14-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Matrix  Factorization

1

Page 2: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Recovering  latent  factors  in  a  matrix

m columns

v11 …

… …

vij

vnm

n ro

ws

2

Page 3: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Recovering  latent  factors  in  a  matrix

K * m

n *

K

x1 y1

x2 y2

.. ..

… …

xn yn

a1 a2 .. … am

b1 b2 … … bm v11 …

… …

vij

vnm

~

3

Page 4: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

What  is  this  for?

K * m

n *

K

x1 y1

x2 y2

.. ..

… …

xn yn

a1 a2 .. … am

b1 b2 … … bm v11 …

… …

vij

vnm

~

4

Page 5: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

MF  for  collaborative  @iltering

5

Page 6: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

What  is  collaborative  @iltering?

Page 7: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

What  is  collaborative  @iltering?

Page 8: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

What  is  collaborative  @iltering?

Page 9: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

What  is  collaborative  @iltering?

Page 10: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common
Page 11: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Recovering  latent  factors  in  a  matrix

m movies

v11 …

… …

vij

vnm

V[i,j] = user i’s rating of movie j

n us

ers

11

Page 12: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Recovering  latent  factors  in  a  matrix

m movies

n us

ers

m movies

x1 y1

x2 y2

.. ..

… …

xn yn

a1 a2 .. … am

b1 b2 … … bm v11 …

… …

vij

vnm

~

V[i,j] = user i’s rating of movie j

12

Page 13: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

13

Page 14: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

MF  for  image  modeling

14

Page 15: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

15

Page 16: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

MF  for  images

10,000 pixels

1000

imag

es

1000 * 10,000,00

x1 y1

x2 y2

.. ..

… …

xn yn

a1 a2 .. … am

b1 b2 … … bm v11 … … …

… …

vij

vnm

~

V[i,j] = pixel j in image i

2 prototypes

PC1

PC2

Page 17: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

MF  for  modeling  text

17

Page 18: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

•  The Neatest Little Guide to Stock Market Investing •  Investing For Dummies, 4th Edition •  The Little Book of Common Sense Investing: The Only

Way to Guarantee Your Fair Share of Stock Market Returns

•  The Little Book of Value Investing •  Value Investing: From Graham to Buffett and Beyond •  Rich Dad’s Guide to Investing: What the Rich Invest in,

That the Poor and the Middle Class Do Not! •  Investing in Real Estate, 5th Edition •  Stock Investing For Dummies •  Rich Dad’s Advisors: The ABC’s of Real Estate

Investing: The Secrets of Finding Hidden Profits Most Investors Miss

https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

Page 19: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

TFIDF counts would be better

Page 20: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Recovering  latent  factors  in  a  matrix

m terms

n do

cum

ents

doc term matrix

x1 y1

x2 y2

.. ..

… …

xn yn

a1 a2 .. … am

b1 b2 … … bm v11 …

… …

vij

vnm

~

V[i,j] = TFIDF score of term j in doc i

20

Page 21: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

=

Page 22: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Investing for real estate

Rich Dad’s Advisor’s: The ABCs of Real

Estate Investment …

Page 23: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

The little book of common

sense investing: …

Neatest Little Guide to Stock

Market Investing

Page 24: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

MF  is  like  clustering

24

Page 25: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

k-­‐means  as  MF

cluster means

n ex

ampl

es

0 1

1 0

.. ..

… …

xn yn

a1 a2 .. … am

b1 b2 … … bm v11 …

… …

vij

vnm

~

original data set indicators for r

clusters

Z

M

X

Page 26: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

How  do  you  do  it?

K * m

n *

K

x1 y1

x2 y2

.. ..

… …

xn yn

a1 a2 .. … am

b1 b2 … … bm v11 …

… …

vij

vnm

~

26

Page 27: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

talk pilfered from à …..

KDD 2011

27

Page 28: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

28

Page 29: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Recovering  latent  factors  in  a  matrix

m movies

n us

ers

m movies

x1 y1

x2 y2

.. ..

… …

xn yn

a1 a2 .. … am

b1 b2 … … bm v11 …

… …

vij

vnm

~

V[i,j] = user i’s rating of movie j

r

W

H

V

29

Page 30: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

30

Page 31: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

31

Page 32: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

32

Page 33: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

for image denoising

33

Page 34: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Matrix  factorization  as  SGD

step size why does this work?

34

Page 35: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Matrix  factorization  as  SGD  -­‐  why  does  this  work?    Here’s  the  key  claim:

35

Page 36: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Checking  the  claim

Think for SGD for logistic regression •  LR loss = compare y and ŷ = dot(w,x) •  similar but now update w (user weights) and x (movie weight)

36

Page 37: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

What  loss  functions  are  possible? N1,  N2  -­‐  diagonal  

matrixes,  sort  of  like  IDF  factors  for  the  users/

movies  

“generalized”  KL-­‐divergence

37

Page 38: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

What  loss  functions  are  possible?

38

Page 39: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

What  loss  functions  are  possible?

39

Page 40: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

ALS = alternating least squares

40

Page 41: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

talk pilfered from à …..

KDD 2011

41

Page 42: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

42

Page 43: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

43

Page 44: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

44

Page 45: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Similar to McDonnell et al with perceptron learning

45

Page 46: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Slow convergence…..

46

Page 47: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

47

Page 48: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

48

Page 49: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

49

Page 50: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

50

Page 51: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

51

Page 52: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

52

Page 53: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

More  detail…. •  Randomly  permute  rows/cols  of  matrix •  Chop  V,W,H  into  blocks  of  size  d  x  d

– m/d  blocks  in  W,  n/d  blocks  in  H •  Group  the  data:

– Pick  a  set  of  blocks  with  no  overlapping  rows  or  columns  (a  stratum)

– Repeat  until  all  blocks  in  V  are  covered •  Train  the  SGD

– Process  strata  in  series – Process  blocks  within  a  stratum  in  parallel

53

Page 54: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

More  detail….

Z was V

54

Page 55: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

More  detail…. •  Initialize  W,H  randomly

– not  at  zero  J •  Choose  a  random  ordering  (random  sort)  of  the  points  in  a  stratum  in  each  “sub-­‐epoch”

•  Pick  strata  sequence  by  permuting  rows  and  columns  of  M,  and  using  M’[k,i]  as  column  index  of  row  i  in  subepoch  k  

•  Use  “bold  driver”  to  set  step  size: –  increase  step  size  when  loss  decreases  (in  an  epoch) – decrease  step  size  when  loss  increases

•  Implemented  in  Hadoop  and  R/Snowfall

M=

55

Page 56: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

56

Page 57: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Wall  Clock  Time�8  nodes,  64  cores,  R/snow

57

Page 58: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

58

Page 59: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

59

Page 60: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

60

Page 61: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

61

Page 62: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Number  of  Epochs

62

Page 63: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

63

Page 64: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

64

Page 65: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

65

Page 66: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

66

Page 67: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Varying  rank�100  epochs  for  all  �

67

Page 68: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Hadoop  scalability

Hadoop process setup time starts

to dominate

68

Page 69: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

Hadoop  scalability

69

Page 70: MatrixFactorizationwcohen/10-605/matrix-factors.pdf · • The Neatest Little Guide to Stock Market Investing • Investing For Dummies, 4th Edition • The Little Book of Common

70