
Matrix Factorization

Recovering latent factors in a matrix

[Figure: an n-row × m-column matrix V, with entries v11 … vij … vnm]

Recovering latent factors in a matrix

[Figure: V (n × m) ≈ W · H, where W is n × K with rows (x1, y1) … (xn, yn) and H is K × m with rows (a1 … am) and (b1 … bm); here K = 2]
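A shape-level sketch of the picture above (NumPy; all names and sizes are illustrative, not from the slides):

```python
import numpy as np

# Approximate an n x m matrix V by the product of an n x K matrix W
# and a K x m matrix H, with K much smaller than n and m.
n, m, K = 1000, 500, 2
rng = np.random.default_rng(0)
W = rng.standard_normal((n, K))   # one K-dim latent vector per row of V
H = rng.standard_normal((K, m))   # one K-dim latent vector per column of V
V_hat = W @ H                     # a rank-K approximation, shape (n, m)
assert V_hat.shape == (n, m)
```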

What is this for?

[Figure: the same factorization V ≈ W · H as above]

MF for collaborative filtering

What is collaborative filtering?

Recovering latent factors in a matrix

[Figure: an n-users × m-movies matrix V with entries v11 … vij … vnm]

V[i,j] = user i’s rating of movie j

Recovering latent factors in a matrix

[Figure: V (n users × m movies) ≈ W · H, with one latent row (xi, yi) per user in W and one latent column (aj, bj) per movie in H]

V[i,j] = user i’s rating of movie j
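Under that factorization, a single predicted rating is just a dot product of the two latent vectors; a minimal sketch (names are illustrative):

```python
import numpy as np

def predict_rating(W, H, i, j):
    """User i's predicted rating of movie j under V ~ W @ H."""
    return W[i] @ H[:, j]   # = sum_k W[i, k] * H[k, j]
```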

MF for image modeling

MF for images

[Figure: V is 1000 images × 10,000 pixels (1000 × 10,000 = 10,000,000 entries), factored as V ≈ W · H]

V[i,j] = pixel j in image i

[Figure: 2 prototypes; images plotted against PC1 and PC2]
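With K = 2 this is essentially PCA: the two rows of H are prototype images and each image gets a (PC1, PC2) coordinate. A sketch via truncated SVD (random placeholder data, not the slides'):

```python
import numpy as np

V = np.random.default_rng(0).standard_normal((1000, 10_000))
V = V - V.mean(axis=0)        # center each pixel, as PCA does
U, s, Vt = np.linalg.svd(V, full_matrices=False)
K = 2
W = U[:, :K] * s[:K]          # each image's (PC1, PC2) coordinates
H = Vt[:K]                    # the 2 prototype images
V_hat = W @ H                 # the best rank-2 approximation of V
```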

MF for modeling text

•  The Neatest Little Guide to Stock Market Investing
•  Investing For Dummies, 4th Edition
•  The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
•  The Little Book of Value Investing
•  Value Investing: From Graham to Buffett and Beyond
•  Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
•  Investing in Real Estate, 5th Edition
•  Stock Investing For Dummies
•  Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss

https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

TFIDF counts would be better

Recovering latent factors in a matrix

[Figure: a doc-term matrix — V (n documents × m terms) ≈ W · H]

V[i,j] = TFIDF score of term j in doc i
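This is latent semantic analysis. A hedged sketch of the pipeline (TFIDF weighting, then truncated SVD; `counts` is a placeholder doc-term count matrix, not the slides' data):

```python
import numpy as np

def lsa(counts, K=2):
    """counts: n-docs x m-terms count matrix."""
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    df = (counts > 0).sum(axis=0)                   # document frequency
    idf = np.log(counts.shape[0] / np.maximum(df, 1))
    V = tf * idf                                    # TFIDF scores
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    return U[:, :K] * s[:K], Vt[:K]                 # W (docs), H (terms)
```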

[Figure: the recovered factors group the books, e.g. “Investing for real estate” / Rich Dad’s Advisor’s: The ABCs of Real Estate Investment … in one group, and The Little Book of Common Sense Investing: … / The Neatest Little Guide to Stock Market Investing in another]

MF is like clustering

k-means as MF

[Figure: X (the original data set, n examples) ≈ Z · M, where Z holds 0/1 indicators for r clusters — rows like (0 1), (1 0), … — and M holds the cluster means]
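In code, the indicator/means view of k-means looks like this; a sketch with illustrative names (`labels` would come from any k-means run):

```python
import numpy as np

def kmeans_as_mf(X, labels, r):
    """Express a k-means solution as X ~ Z @ M."""
    n = X.shape[0]
    Z = np.zeros((n, r))
    Z[np.arange(n), labels] = 1    # one-hot indicators (rows like 0 1 / 1 0)
    # rows of M are the cluster means: the least-squares M for this fixed Z
    M = (Z.T @ X) / np.maximum(Z.sum(axis=0), 1)[:, None]
    return Z, M
```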

How do you do it?

[Figure: the same factorization V (n × m) ≈ W (n × K) · H (K × m) as above]

talk pilfered from → ….. KDD 2011

Recovering latent factors in a matrix

[Figure: V (n users × m movies) ≈ W · H, with W n × r and H r × m; r is the rank]

V[i,j] = user i’s rating of movie j

[Figures: matrix factorization for image denoising]

Matrix factorization as SGD

[Figure: the SGD update rule, with callouts for the step size and “why does this work?”]

Matrix factorization as SGD — why does this work? Here’s the key claim:

[Figure: the key claim]

Checking the claim

•  Think of SGD for logistic regression
•  LR loss = compare y and ŷ = dot(w, x)
•  similar, but now update both w (the user weights) and x (the movie weights) — see the sketch below
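A minimal sketch of SGD for V ≈ WH (squared loss with L2 regularization; the hyperparameters and names are illustrative, not the talk's exact recipe):

```python
import numpy as np

def sgd_mf(ratings, n, m, K=10, lr=0.01, reg=0.1, epochs=20, seed=0):
    """ratings: list of (i, j, v) triples with v = V[i, j]."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((n, K))
    H = 0.1 * rng.standard_normal((K, m))
    for _ in range(epochs):
        for t in rng.permutation(len(ratings)):
            i, j, v = ratings[t]
            err = v - W[i] @ H[:, j]      # residual on this one entry
            wi = W[i].copy()              # keep the pre-update user row
            # each rating touches only row W[i] and column H[:, j]
            W[i]    += lr * (err * H[:, j] - reg * wi)
            H[:, j] += lr * (err * wi      - reg * H[:, j])
    return W, H
```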

What loss functions are possible?

[Figure: a weighted loss with N1, N2 — diagonal matrices, sort of like IDF factors for the users/movies — and a “generalized” KL-divergence]
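The slide's formulas are images; for reference, the standard forms of these two losses (an assumption, not a transcription of the slide) are:

```latex
% weighted squared loss, with diagonal weight matrices N_1, N_2:
L_{\mathrm{w}}(W,H) = \| N_1 (V - WH) N_2 \|_F^2
% "generalized" KL-divergence (as used for NMF):
L_{\mathrm{KL}}(W,H) = \sum_{i,j} \left( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right)
```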

[Figures: more examples of possible loss functions]

ALS = alternating least squares
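A minimal ALS sketch for a dense V (each half-step is a closed-form ridge regression; K, reg, and iters are illustrative):

```python
import numpy as np

def als(V, K=10, reg=0.1, iters=20, seed=0):
    """Alternate: fix H and solve for W, then fix W and solve for H."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    H = 0.1 * rng.standard_normal((K, m))
    I = np.eye(K)
    for _ in range(iters):
        W = np.linalg.solve(H @ H.T + reg * I, H @ V.T).T  # rows of W
        H = np.linalg.solve(W.T @ W + reg * I, W.T @ V)    # cols of H
    return W, H
```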

talk pilfered from → ….. KDD 2011

Similar to McDonnell et al with perceptron learning


Slow convergence…..


More detail….
•  Randomly permute rows/cols of matrix
•  Chop V, W, H into blocks of size d × d
  – m/d blocks in W, n/d blocks in H
•  Group the data:
  – Pick a set of blocks with no overlapping rows or columns (a stratum)
  – Repeat until all blocks in V are covered
•  Train the SGD:
  – Process strata in series
  – Process blocks within a stratum in parallel (see the sketch below)
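One way to build such strata, as a sketch (illustrative, not the paper's exact scheduler): in stratum k, worker i takes block (i, (i + k) mod b), so no two blocks in a stratum share rows or columns:

```python
def dsgd_strata(b):
    """Yield b strata for a b x b grid of blocks; the blocks in each
    stratum are row- and column-disjoint, so they can run in parallel."""
    for k in range(b):
        yield [(i, (i + k) % b) for i in range(b)]

for stratum in dsgd_strata(3):
    print(stratum)   # 3 "diagonals", each with 3 disjoint blocks
```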

More detail….

[Figure: the algorithm’s pseudo-code; the paper’s Z is our V]

More detail….
•  Initialize W, H randomly
  – not at zero ☺
•  Choose a random ordering (random sort) of the points in a stratum in each “sub-epoch”
•  Pick strata sequence by permuting rows and columns of M, and using M’[k,i] as column index of row i in subepoch k
•  Use “bold driver” to set step size (see the sketch below):
  – increase step size when loss decreases (in an epoch)
  – decrease step size when loss increases
•  Implemented in Hadoop and R/Snowfall

M = [Figure: the matrix M]
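The bold-driver heuristic in a few lines (the up/down factors are illustrative):

```python
def bold_driver(lr, prev_loss, curr_loss, up=1.05, down=0.5):
    """Grow the step size slightly after an epoch where the loss
    decreased; cut it sharply after one where the loss increased."""
    return lr * up if curr_loss < prev_loss else lr * down
```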

Wall Clock Time (8 nodes, 64 cores, R/snow)

[Figures: wall-clock time results]

Number of Epochs

[Figures: results by number of epochs]

Varying rank (100 epochs for all)

[Figure: results for varying rank]

Hadoop scalability

[Figures: Hadoop scalability results — Hadoop process setup time starts to dominate]
