Information Retrieval through Various Approximate Matrix Decompositions
Kathryn Linehan
Advisor: Dr. Dianne O’Leary
Information Retrieval
Extracting information from databases
We need an efficient way of searching large amounts of data
Example: web search engine
Querying a Document Database
We want to return documents that are relevant to entered search terms
Given data:
• Term-document matrix A; entry (i, j): importance of term i in document j
• Query vector q; entry (i): importance of term i in the query
Term-Document Matrix
Entry (i, j): weight of term i in document j

Example (taken from [5]):

             Doc 1   Doc 2   Doc 3   Doc 4
  Mark         15       0       0       0
  Twain        15       0      20       0
  Samuel        0      10       5       0
  Clemens       0      20      10       0
  Purple        0       0       0      20
  Fairy         0       0       0      15
Query Vector
Entry (i): weight of term i in the query

Example (taken from [5]): search for "Mark Twain"

             q
  Mark       1
  Twain      1
  Samuel     0
  Clemens    0
  Purple     0
  Fairy      0
Document Scoring
Score the documents with s^T = q^T A:

  s^T = [1 1 0 0 0 0] A = [30   0   20   0]
                          Doc1 Doc2 Doc3 Doc4

Doc 1 and Doc 3 will be returned as relevant, but Doc 2 will not, even though Doc 2 concerns "Samuel Clemens" (Mark Twain's real name) and is therefore relevant.

Example taken from [5]
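This scoring step can be sketched in a few lines of NumPy; the matrix and query values follow the slide's example from [5]:

```python
import numpy as np

# Term-document matrix (rows: Mark, Twain, Samuel, Clemens, Purple, Fairy;
# columns: Docs 1-4), as in the example from [5].
A = np.array([
    [15,  0,  0,  0],   # Mark
    [15,  0, 20,  0],   # Twain
    [ 0, 10,  5,  0],   # Samuel
    [ 0, 20, 10,  0],   # Clemens
    [ 0,  0,  0, 20],   # Purple
    [ 0,  0,  0, 15],   # Fairy
])

# Query vector for "Mark Twain": weight 1 on the terms Mark and Twain.
q = np.array([1, 1, 0, 0, 0, 0])

# s^T = q^T A: one score per document.
scores = q @ A
print(scores)   # [30  0 20  0]: Docs 1 and 3 returned, Doc 2 missed
```

Note that Doc 2 scores zero because literal term matching cannot connect "Samuel Clemens" to "Mark Twain"; this is the gap the matrix approximations below try to close.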
Can we do better if we replace the matrix by an approximation?
• Singular Value Decomposition (SVD): A ≈ U Σ V^T
• Nonnegative Matrix Factorization (NMF): A ≈ WH
• CUR Decomposition: A ≈ CUR
Nonnegative Matrix Factorization (NMF)
A ≈ W H, where A is m x n, W is m x k, and H is k x n
• W and H are nonnegative
• rank(WH) <= k
• Storage: k(m + n) entries
NMF
Multiplicative update algorithm of Lee and Seung, found in [1]
• Find W, H to minimize (1/2)||A - WH||_F^2
• Random initialization for W, H
• Gradient descent method
• Slow due to matrix multiplications in each iteration
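The Lee-Seung multiplicative updates can be sketched in a few lines of NumPy. This is a generic implementation of the standard update rules for the Frobenius objective, not the project's own code; the function name, iteration count, and the small eps guard against division by zero are illustrative choices:

```python
import numpy as np

def nmf(A, k, iters=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates minimizing (1/2)||A - WH||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))          # random nonnegative initialization
    H = rng.random((k, n))
    for _ in range(iters):
        # Each iteration costs several matrix multiplications (the slow part).
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Small example in the spirit of the validation runs: a random dense matrix.
A = np.random.default_rng(1).random((5, 3))
W, H = nmf(A, k=2)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
```

The updates multiply by nonnegative ratios, so W and H stay nonnegative throughout, which is why no projection step is needed.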
NMF Validation
[Figure: relative error (Frobenius norm) vs. k, where k = rank(WH) and rank(SVD) <= k, comparing NMF against the SVD. Left: A, a 5 x 3 random dense matrix (k = 1..3). Right: B, a 500 x 200 random sparse matrix (k = 40..200). Average over 5 runs.]

B: 500 x 200 random sparse matrix. Average over 5 runs.
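The SVD curve in these plots is the best achievable rank-k error (Eckart-Young), so it serves as the baseline. A minimal sketch of how that baseline relative error can be computed:

```python
import numpy as np

def svd_relative_error(A, k):
    """Relative Frobenius error of the best rank-k approximation A_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Ak = U[:, :k] * s[:k] @ Vt[:k, :]          # truncated SVD
    return np.linalg.norm(A - Ak) / np.linalg.norm(A)
```

By Eckart-Young this equals sqrt(sum of the squared discarded singular values) divided by ||A||_F, so no approximation method can beat it at the same rank.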
NMF Validation
[Figure: relative error (Frobenius norm) vs. iteration number for NMF, with the SVD error as a baseline. B: 500 x 200 random sparse matrix. rank(NMF) = 80.]
CUR Decomposition
A ≈ C U R, where A is m x n, C is m x c, U is c x r, and R is r x n
• C (R) holds c (r) sampled and rescaled columns (rows) of A
• U is computed using C and R
• rank(CUR) <= k, where k is a rank parameter
• Storage: (nz(C) + cr + nz(R)) entries
CUR Implementations
CUR algorithm in [3] by Drineas, Kannan, and Mahoney
• Linear time algorithm
• Improvement: Compact Matrix Decomposition (CMD) in [6] by Sun, Xie, Zhang, and Faloutsos
• Modification: use ideas in [4] by Drineas, Mahoney, and Muthukrishnan (no longer linear time)
• Other modifications: our ideas
Deterministic CUR code by G. W. Stewart [2]
Sampling
Column (row) norm sampling [3]
• Prob(col j) = ||A(:,j)||_2^2 / ||A||_F^2  (similar for row i)
Subspace sampling [4]
• Uses the rank-k SVD of A for column probabilities: Prob(col j) = ||V_{A,k}(j,:)||_2^2 / k
• Uses the "economy size" SVD of C for row probabilities: Prob(row i) = ||U_C(i,:)||_2^2 / c
Sampling without replacement
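Column-norm sampling is simple to sketch in NumPy. The 1/sqrt(c * p_j) rescaling of each sampled column follows the convention of [3] (it keeps C C^T an unbiased estimate of A A^T); the function name is illustrative:

```python
import numpy as np

def sample_columns(A, c, replace=True, seed=0):
    """Column-norm sampling [3]: pick column j with probability
    ||A(:,j)||_2^2 / ||A||_F^2, then rescale by 1/sqrt(c * p_j)."""
    rng = np.random.default_rng(seed)
    p = np.sum(A**2, axis=0) / np.linalg.norm(A, 'fro')**2
    idx = rng.choice(A.shape[1], size=c, replace=replace, p=p)
    C = A[:, idx] / np.sqrt(c * p[idx])
    return C, idx
```

Applying the same routine to A^T gives the sampled rows R; setting replace=False gives the without-replacement variant discussed below.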
Computation of U
Linear U [3]: approximately solves min over U of ||A - CU||_F
• U = (C^T C)_k^+ C^T A
Optimal U: solves min over U of ||A - CUR||_F^2
• U = (C^T C)^+ C^T A R^T (R R^T)^+
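The optimal U is the ordinary least-squares solution of min_U ||A - CUR||_F^2, and since (C^T C)^+ C^T = C^+ and R^T (R R^T)^+ = R^+, it can be written compactly with pseudoinverses. A sketch (the function name is illustrative):

```python
import numpy as np

def optimal_u(A, C, R):
    """Least-squares U = argmin_U ||A - C U R||_F^2,
    i.e. U = C^+ A R^+ = (C^T C)^+ C^T A R^T (R R^T)^+."""
    return np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
```

Optimality can be checked via the normal equations: the residual A - CUR is orthogonal to the column space of C and the row space of R, so C^T (A - CUR) R^T = 0.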
Deterministic CUR
Code by G. W. Stewart [2]
• Uses an RRQR algorithm that does not store Q
• We only need the permutation vector, which gives us the columns (rows) for C (R)
• Uses an optimal U
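The column-selection idea behind a rank-revealing QR can be illustrated with SciPy's pivoted QR. This is a simplified dense sketch, not Stewart's algorithm from [2], which works on sparse matrices and avoids forming Q; only the permutation vector is used here, mirroring that point:

```python
import numpy as np
from scipy.linalg import qr

def deterministic_columns(A, c):
    """Pick c columns of A via pivoted (rank-revealing) QR.
    Only the permutation vector piv is kept; Q is discarded."""
    _, _, piv = qr(A, pivoting=True)   # piv: columns in order of selection
    return piv[:c]
```

Running the same routine on A^T selects rows for R; the chosen columns/rows then feed the optimal-U formula above.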
Compact Matrix Decomposition (CMD) Improvement
Remove repeated columns (rows) in C (R): this decreases storage while still achieving the same relative error [6].

                     [3]        [3] with CMD
  Runtime (sec)      0.008060   0.007153
  Storage (nz)       880.5      550.5
  Relative Error     0.820035   0.820035

A: 50 x 30 random sparse matrix, k = 15. Average over 10 runs.
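A simplified sketch of the deduplication idea: keep one copy of each repeated sampled column and scale it by the square root of its multiplicity, which preserves C C^T (a column v sampled d times contributes d v v^T either way). The function name is illustrative, and np.unique reorders the columns, which does not matter for the approximation:

```python
import numpy as np

def cmd_compress(C):
    """Drop repeated columns of C and upweight the survivors by
    sqrt(multiplicity), preserving C @ C.T with fewer stored columns."""
    cols, counts = np.unique(C, axis=1, return_counts=True)
    return cols * np.sqrt(counts)
```

The same trick applies to repeated rows of R; the stored column count shrinks while the Gram matrix, and hence the achievable error, is unchanged.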
CUR: Sampling with Replacement Validation
[Figure: three panels of relative error (Frobenius norm) vs. c/r (number of columns/rows sampled, over increasing ranges), comparing column-norm and subspace sampling with linear and optimal U (legend: CN,L; CN,O; S,L; S,O) against the SVD. A: 5 x 3 random dense matrix. Average over 5 runs. Legend: Sampling, U.]
Sampling without Replacement: Scaling vs. No Scaling
• Invert the scaling factor applied to the sampled columns in the computation of U = (C^T C)_k^+ C^T
CUR: Sampling without Replacement Validation
[Figure: relative error (Frobenius norm) vs. k, where rank(CUR) <= k and rank(SVD) <= k, comparing sampling without replacement and linear U, with and without scaling (legend: w/o R,L,w/o Sc; w/o R,L,Sc), against the SVD. Left: A, 5 x 3 random dense matrix, r = 2k, c = k. Right: B, 500 x 200 random sparse matrix, c = 3k, r = k. Average over 5 runs. Legend: Sampling, U, Scaling.]
CUR Comparison
[Figure: relative error (Frobenius norm) vs. k, where rank(CUR) <= k and rank(SVD) <= k, comparing all CUR variants (legend: CN,L; CN,O; S,L; S,O; w/o R,L,w/o Sc; w/o R,L,Sc; D) against the SVD. Left: r = c = 2k. Right: r = c = k. B: 500 x 200 random sparse matrix. Average over 5 runs. Legend: Sampling, U, Scaling.]
Judging Success: Precision and Recall
Measurement of performance for document retrieval
• Average precision and recall, where the average is taken over all queries in the data set
• Let Retrieved = number of documents retrieved, Relevant = total number of documents relevant to the query, and RetRel = number of documents retrieved that are relevant
• Precision: P(Retrieved) = RetRel / Retrieved
• Recall: R(Retrieved) = RetRel / Relevant
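The two definitions above can be sketched directly (a generic helper, not the project's evaluation code):

```python
def precision_recall(retrieved_ids, relevant_ids):
    """Precision = RetRel / Retrieved, Recall = RetRel / Relevant."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    retrel = len(retrieved & relevant)
    return retrel / len(retrieved), retrel / len(relevant)
```

Averaging these over every query in the data set, at each cutoff on the number of documents retrieved, yields the curves on the next slides.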
LSI Results
Term-document matrix size: 5831 x 1033. All matrix approximations are rank 100 approximations (CUR: r = c = k). Average query time is less than 10^-3 seconds for all matrix approximations.
[Figure: average precision and average recall vs. number of documents retrieved (0 to 1200), for SVD, NMF, the CUR variants (cn,lin; cn,opt; sub,lin; sub,opt; w/oR,no; w/oR,yes; GWS), and LTM.]
LSI Results
[Figure: average precision and average recall vs. number of documents retrieved (5 to 50), for SVD, NMF, the CUR variants (cn,lin; cn,opt; sub,lin; sub,opt; w/oR,no; w/oR,yes; GWS), and LTM.]
Term-document matrix size: 5831 x 1033. All matrix approximations are rank 100 approximations (CUR: r = c = k).
Matrix Approximation Results

                   Rel. Error (F-norm)   Storage (nz)   Runtime (sec)
  SVD              0.8203                686500         22.5664
  NMF              0.8409                686400         23.0210
  CUR: cn,lin      1.4151                17242          0.1741
  CUR: cn,opt      0.9724                16358          0.2808
  CUR: sub,lin     1.2093                16175          48.7651
  CUR: sub,opt     0.9615                16108          49.0830
  CUR: w/oR,no     0.9931                17932          0.3466
  CUR: w/oR,yes    0.9957                17220          0.2734
  CUR: GWS         0.9437                25020          2.2857
  LTM              --                    52003          --
Conclusions
We may not be able to store an entire term-document matrix, and computing an SVD may be too expensive.
We can achieve LSI results that are almost as good with cheaper approximations:
• Less storage
• Less computation time
Completed Project Goals
• Code and validate NMF and CUR
• Analyze relative error, runtime, and storage of NMF and CUR
• Improve the CUR algorithm of [3]
• Analyze the use of NMF and CUR in LSI
References
[1] Michael W. Berry, Murray Browne, Amy N. Langville, V. Paul Pauca, and Robert J. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics and Data Analysis, 52(1):155-173, September 2007.
[2] M. W. Berry, S. A. Pulatova, and G. W. Stewart. Computing sparse reduced-rank approximations to sparse matrices. Technical Report UMIACS TR-2004-34 CMSC TR-4591, University of Maryland, May 2004.
[3] Petros Drineas, Ravi Kannan, and Michael W. Mahoney. Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition. SIAM Journal on Computing, 36(1):184-206, 2006.
[4] Petros Drineas, Michael W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30(2):844-881, 2008.
[5] Tamara G. Kolda and Dianne P. O'Leary. A semidiscrete matrix decomposition for latent semantic indexing in information retrieval. ACM Transactions on Information Systems, 16(4):322-346, October 1998.
[6] Jimeng Sun, Yinglian Xie, Hui Zhang, and Christos Faloutsos. Less is more: Sparse graph mining with compact matrix decomposition. Statistical Analysis and Data Mining, 1(1):6-22, February 2008.