Information Retrieval through Various Approximate Matrix Decompositions
Kathryn Linehan
Advisor: Dr. Dianne O’Leary
Information Retrieval
Extracting information from databases
We need an efficient way of searching large amounts of data
Example: web search engine
Querying a Document Database
We want to return documents that are relevant to entered search terms
Given data:
• Term-document matrix A; entry (i, j): importance of term i in document j
• Query vector q; entry (i): importance of term i in the query
Term-Document Matrix
Entry (i, j): weight of term i in document j

Example (taken from [5]):

             Doc 1   Doc 2   Doc 3   Doc 4
  Mark         15       0       0       0
  Twain        15       0      20       0
  Samuel        0      10       5       0
  Clemens       0      20      10       0
  Purple        0       0       0      20
  Fairy         0       0       0      15
Query Vector
Entry (i): weight of term i in the query

Example (taken from [5]): search for "Mark Twain"

             q
  Mark       1
  Twain      1
  Samuel     0
  Clemens    0
  Purple     0
  Fairy      0
Document Scoring
Score the documents with s^T = q^T A:

  s^T = [1 1 0 0 0 0] A = [30   0   20   0]
                          Doc1 Doc2 Doc3 Doc4

Doc 1 and Doc 3 will be returned as relevant, but Doc 2 will not, even though Doc 2 concerns "Samuel Clemens" (Mark Twain's real name) and is therefore relevant.

Example taken from [5]
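This scoring step can be sketched in a few lines of NumPy; the matrix and query values follow the slide's example from [5]:

```python
import numpy as np

# Term-document matrix (rows: Mark, Twain, Samuel, Clemens, Purple, Fairy;
# columns: Docs 1-4), as in the example from [5].
A = np.array([
    [15,  0,  0,  0],   # Mark
    [15,  0, 20,  0],   # Twain
    [ 0, 10,  5,  0],   # Samuel
    [ 0, 20, 10,  0],   # Clemens
    [ 0,  0,  0, 20],   # Purple
    [ 0,  0,  0, 15],   # Fairy
])

# Query vector for "Mark Twain": weight 1 on the terms Mark and Twain.
q = np.array([1, 1, 0, 0, 0, 0])

# s^T = q^T A: one score per document.
scores = q @ A
print(scores)   # [30  0 20  0]: Docs 1 and 3 returned, Doc 2 missed
```

Note that Doc 2 scores zero because literal term matching cannot connect "Samuel Clemens" to "Mark Twain"; this is the gap the matrix approximations below try to close.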
Can we do better if we replace the matrix by an approximation?
• Singular Value Decomposition (SVD): A ≈ U Σ V^T
• Nonnegative Matrix Factorization (NMF): A ≈ WH
• CUR Decomposition: A ≈ CUR
Nonnegative Matrix Factorization (NMF)
A ≈ W H, where A is m x n, W is m x k, and H is k x n
• W and H are nonnegative
• rank(WH) <= k
• Storage: k(m + n) entries
NMF
Multiplicative update algorithm of Lee and Seung, found in [1]
• Find W, H to minimize (1/2)||A - WH||_F^2
• Random initialization for W, H
• Gradient descent method
• Slow due to matrix multiplications in each iteration
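The Lee-Seung multiplicative updates can be sketched in a few lines of NumPy. This is a generic implementation of the standard update rules for the Frobenius objective, not the project's own code; the function name, iteration count, and the small eps guard against division by zero are illustrative choices:

```python
import numpy as np

def nmf(A, k, iters=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates minimizing (1/2)||A - WH||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))          # random nonnegative initialization
    H = rng.random((k, n))
    for _ in range(iters):
        # Each iteration costs several matrix multiplications (the slow part).
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Small example in the spirit of the validation runs: a random dense matrix.
A = np.random.default_rng(1).random((5, 3))
W, H = nmf(A, k=2)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
```

The updates multiply by nonnegative ratios, so W and H stay nonnegative throughout, which is why no projection step is needed.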
NMF Validation
[Figure: relative error (Frobenius norm) vs. k, where k = rank(WH) and rank(SVD) <= k, comparing NMF against the SVD. Left: A, a 5 x 3 random dense matrix (k = 1..3). Right: B, a 500 x 200 random sparse matrix (k = 40..200). Average over 5 runs.]

B: 500 x 200 random sparse matrix. Average over 5 runs.
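The SVD curve in these plots is the best achievable rank-k error (Eckart-Young), so it serves as the baseline. A minimal sketch of how that baseline relative error can be computed:

```python
import numpy as np

def svd_relative_error(A, k):
    """Relative Frobenius error of the best rank-k approximation A_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Ak = U[:, :k] * s[:k] @ Vt[:k, :]          # truncated SVD
    return np.linalg.norm(A - Ak) / np.linalg.norm(A)
```

By Eckart-Young this equals sqrt(sum of the squared discarded singular values) divided by ||A||_F, so no approximation method can beat it at the same rank.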
NMF Validation
[Figure: relative error (Frobenius norm) vs. iteration number for NMF, with the SVD error as a baseline. B: 500 x 200 random sparse matrix. rank(NMF) = 80.]
CUR Decomposition
A ≈ C U R, where A is m x n, C is m x c, U is c x r, and R is r x n
• C (R) holds c (r) sampled and rescaled columns (rows) of A
• U is computed using C and R
• rank(CUR) <= k, where k is a rank parameter
• Storage: (nz(C) + cr + nz(R)) entries
CUR Implementations
CUR algorithm in [3] by Drineas, Kannan, and Mahoney
• Linear time algorithm
• Improvement: Compact Matrix Decomposition (CMD) in [6] by Sun, Xie, Zhang, and Faloutsos
• Modification: use ideas in [4] by Drineas, Mahoney, and Muthukrishnan (no longer linear time)
• Other modifications: our ideas
Deterministic CUR code by G. W. Stewart [2]
Sampling
Column (row) norm sampling [3]
• Prob(col j) = ||A(:,j)||_2^2 / ||A||_F^2  (similar for row i)
Subspace sampling [4]
• Uses the rank-k SVD of A for column probabilities: Prob(col j) = ||V_{A,k}(j,:)||_2^2 / k
• Uses the "economy size" SVD of C for row probabilities: Prob(row i) = ||U_C(i,:)||_2^2 / c
Sampling without replacement
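Column-norm sampling is simple to sketch in NumPy. The 1/sqrt(c * p_j) rescaling of each sampled column follows the convention of [3] (it keeps C C^T an unbiased estimate of A A^T); the function name is illustrative:

```python
import numpy as np

def sample_columns(A, c, replace=True, seed=0):
    """Column-norm sampling [3]: pick column j with probability
    ||A(:,j)||_2^2 / ||A||_F^2, then rescale by 1/sqrt(c * p_j)."""
    rng = np.random.default_rng(seed)
    p = np.sum(A**2, axis=0) / np.linalg.norm(A, 'fro')**2
    idx = rng.choice(A.shape[1], size=c, replace=replace, p=p)
    C = A[:, idx] / np.sqrt(c * p[idx])
    return C, idx
```

Applying the same routine to A^T gives the sampled rows R; setting replace=False gives the without-replacement variant discussed below.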
Computation of U
Linear U [3]: approximately solves min over U of ||A - CU||_F
• U = (C^T C)_k^+ C^T A
Optimal U: solves min over U of ||A - CUR||_F^2
• U = (C^T C)^+ C^T A R^T (R R^T)^+
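The optimal U is the ordinary least-squares solution of min_U ||A - CUR||_F^2, and since (C^T C)^+ C^T = C^+ and R^T (R R^T)^+ = R^+, it can be written compactly with pseudoinverses. A sketch (the function name is illustrative):

```python
import numpy as np

def optimal_u(A, C, R):
    """Least-squares U = argmin_U ||A - C U R||_F^2,
    i.e. U = C^+ A R^+ = (C^T C)^+ C^T A R^T (R R^T)^+."""
    return np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
```

Optimality can be checked via the normal equations: the residual A - CUR is orthogonal to the column space of C and the row space of R, so C^T (A - CUR) R^T = 0.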
Deterministic CUR
Code by G. W. Stewart [2]
• Uses an RRQR algorithm that does not store Q
• We only need the permutation vector, which gives us the columns (rows) for C (R)
• Uses an optimal U
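The column-selection idea behind a rank-revealing QR can be illustrated with SciPy's pivoted QR. This is a simplified dense sketch, not Stewart's algorithm from [2], which works on sparse matrices and avoids forming Q; only the permutation vector is used here, mirroring that point:

```python
import numpy as np
from scipy.linalg import qr

def deterministic_columns(A, c):
    """Pick c columns of A via pivoted (rank-revealing) QR.
    Only the permutation vector piv is kept; Q is discarded."""
    _, _, piv = qr(A, pivoting=True)   # piv: columns in order of selection
    return piv[:c]
```

Running the same routine on A^T selects rows for R; the chosen columns/rows then feed the optimal-U formula above.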
Compact Matrix Decomposition (CMD) Improvement
Remove repeated columns (rows) in C (R): this decreases storage while still achieving the same relative error [6].

                     [3]        [3] with CMD
  Runtime (sec)      0.008060   0.007153
  Storage (nz)       880.5      550.5
  Relative Error     0.820035   0.820035

A: 50 x 30 random sparse matrix, k = 15. Average over 10 runs.
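A simplified sketch of the deduplication idea: keep one copy of each repeated sampled column and scale it by the square root of its multiplicity, which preserves C C^T (a column v sampled d times contributes d v v^T either way). The function name is illustrative, and np.unique reorders the columns, which does not matter for the approximation:

```python
import numpy as np

def cmd_compress(C):
    """Drop repeated columns of C and upweight the survivors by
    sqrt(multiplicity), preserving C @ C.T with fewer stored columns."""
    cols, counts = np.unique(C, axis=1, return_counts=True)
    return cols * np.sqrt(counts)
```

The same trick applies to repeated rows of R; the stored column count shrinks while the Gram matrix, and hence the achievable error, is unchanged.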
CUR: Sampling with Replacement Validation
[Figure: three panels of relative error (Frobenius norm) vs. c/r (number of columns/rows sampled, over increasing ranges), comparing column-norm and subspace sampling with linear and optimal U (legend: CN,L; CN,O; S,L; S,O) against the SVD. A: 5 x 3 random dense matrix. Average over 5 runs. Legend: Sampling, U.]
Sampling without Replacement: Scaling vs. No Scaling
• Invert the scaling factor applied to the sampled columns in the computation of U = (C^T C)_k^+ C^T
CUR: Sampling without Replacement Validation
[Figure: relative error (Frobenius norm) vs. k, where rank(CUR) <= k and rank(SVD) <= k, comparing sampling without replacement and linear U, with and without scaling (legend: w/o R,L,w/o Sc; w/o R,L,Sc), against the SVD. Left: A, 5 x 3 random dense matrix, r = 2k, c = k. Right: B, 500 x 200 random sparse matrix, c = 3k, r = k. Average over 5 runs. Legend: Sampling, U, Scaling.]
CUR Comparison
[Figure: relative error (Frobenius norm) vs. k, where rank(CUR) <= k and rank(SVD) <= k, comparing all CUR variants (legend: CN,L; CN,O; S,L; S,O; w/o R,L,w/o Sc; w/o R,L,Sc; D) against the SVD. Left: r = c = 2k. Right: r = c = k. B: 500 x 200 random sparse matrix. Average over 5 runs. Legend: Sampling, U, Scaling.]
Judging Success: Precision and Recall
Measurement of performance for document retrieval
• Average precision and recall, where the average is taken over all queries in the data set
• Let Retrieved = number of documents retrieved, Relevant = total number of documents relevant to the query, and RetRel = number of documents retrieved that are relevant
• Precision: P(Retrieved) = RetRel / Retrieved
• Recall: R(Retrieved) = RetRel / Relevant
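The two definitions above can be sketched directly (a generic helper, not the project's evaluation code):

```python
def precision_recall(retrieved_ids, relevant_ids):
    """Precision = RetRel / Retrieved, Recall = RetRel / Relevant."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    retrel = len(retrieved & relevant)
    return retrel / len(retrieved), retrel / len(relevant)
```

Averaging these over every query in the data set, at each cutoff on the number of documents retrieved, yields the curves on the next slides.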
LSI Results
Term-document matrix size: 5831 x 1033. All matrix approximations are rank 100 approximations (CUR: r = c = k). Average query time is less than 10^-3 seconds for all matrix approximations.
[Figure: average precision and average recall vs. number of documents retrieved (0 to 1200), for SVD, NMF, the CUR variants (cn,lin; cn,opt; sub,lin; sub,opt; w/oR,no; w/oR,yes; GWS), and LTM.]
LSI Results
[Figure: average precision and average recall vs. number of documents retrieved (5 to 50), for SVD, NMF, the CUR variants (cn,lin; cn,opt; sub,lin; sub,opt; w/oR,no; w/oR,yes; GWS), and LTM.]
Term-document matrix size: 5831 x 1033. All matrix approximations are rank 100 approximations (CUR: r = c = k).
Matrix Approximation Results

                   Rel. Error (F-norm)   Storage (nz)   Runtime (sec)
  SVD              0.8203                686500         22.5664
  NMF              0.8409                686400         23.0210
  CUR: cn,lin      1.4151                17242          0.1741
  CUR: cn,opt      0.9724                16358          0.2808
  CUR: sub,lin     1.2093                16175          48.7651
  CUR: sub,opt     0.9615                16108          49.0830
  CUR: w/oR,no     0.9931                17932          0.3466
  CUR: w/oR,yes    0.9957                17220          0.2734
  CUR: GWS         0.9437                25020          2.2857
  LTM              --                    52003          --
Conclusions
We may not be able to store an entire term-document matrix, and computing an SVD may be too expensive.
We can achieve LSI results that are almost as good with cheaper approximations:
• Less storage
• Less computation time
Completed Project Goals
• Code and validate NMF and CUR
• Analyze relative error, runtime, and storage of NMF and CUR
• Improve the CUR algorithm of [3]
• Analyze the use of NMF and CUR in LSI
References
[1] Michael W. Berry, Murray Browne, Amy N. Langville, V. Paul Pauca, and Robert J. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics and Data Analysis, 52(1):155-173, September 2007.
[2] M. W. Berry, S. A. Pulatova, and G. W. Stewart. Computing sparse reduced-rank approximations to sparse matrices. Technical Report UMIACS TR-2004-34 CMSC TR-4591, University of Maryland, May 2004.
[3] Petros Drineas, Ravi Kannan, and Michael W. Mahoney. Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition. SIAM Journal on Computing, 36(1):184-206, 2006.
[4] Petros Drineas, Michael W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30(2):844-881, 2008.
[5] Tamara G. Kolda and Dianne P. O'Leary. A semidiscrete matrix decomposition for latent semantic indexing in information retrieval. ACM Transactions on Information Systems, 16(4):322-346, October 1998.
[6] Jimeng Sun, Yinglian Xie, Hui Zhang, and Christos Faloutsos. Less is more: Sparse graph mining with compact matrix decomposition. Statistical Analysis and Data Mining, 1(1):6-22, February 2008.