Randomized Algorithms for Linear Algebraic Computations & Applications

Petros Drineas
Rensselaer Polytechnic Institute, Computer Science Department
To access my web page: search for “drineas”
Matrices and network applications
Matrices represent networks …
Graphs are often used in network applications. (e.g., network traffic monitoring, web structure analysis, social network mining, protein interaction networks, etc.)
Graphs are represented by matrices (e.g., adjacency matrices, edge-incidence matrices, etc.).
More recently, time-evolving graphs have been represented by tensors (multi-mode arrays).
Goal: discover patterns and anomalies in spite of the high dimensionality of data.
Linear algebra and numerical analysis provide the fundamental mathematical and algorithmic tools to deal with matrix and tensor computations.
Randomized algorithms
Randomization and sampling allow us to design provably accurate algorithms for problems that are:
Massive
(e.g., matrices (graphs) so large that they cannot be stored at all, or can only be stored in slow, secondary memory devices)
Computationally expensive or even NP-hard
(e.g., combinatorial problems such as the Column Subset Selection Problem)
Example: monitoring IP flows
Network administrator monitoring the (source, destination) IP flows over time:
The data form an m-by-n matrix A (m sources, n destinations), where
Aij = count of flows exchanged between the i-th source and the j-th destination.
Tasks
- Find patterns, summaries, anomalies, etc. in the static setting, or
- in the dynamic setting (tensor representation).
Interplay
Theoretical Computer Science
Randomized and approximation algorithms
Numerical Linear Algebra
Matrix computations and linear algebra (i.e., perturbation theory)
Applications
Data mining & information retrieval (e.g., human genetics, internet data, electronic circuit testing data, etc.)
Students and collaborators
| Students | Collaborators |
| --- | --- |
| Asif Javed (PhD, graduated) | M.W. Mahoney (Stanford U) |
| Christos Boutsidis (PhD, 3rd year) | J. Sun (IBM T.J. Watson Research Center) |
| Jamey Lewis (PhD, 2nd year) | S. Muthu (Google) |
| Elena Sebe (undergrad, now at RPI) | E. Ziv (UCSF, School of Medicine) |
| John Payne (undergrad, now at CMU) | K. K. Kidd (Yale U, Dept. of Genetics) |
| Richard Alimi (undergrad, now at Yale) | P. Paschou (Democritus U., Greece, Genetics) |
Our group keeps close ties with:
Yahoo! Research
Sandia National Laboratories
Funding from NSF and Yahoo! Research.
Overview
From the Singular Value Decomposition (SVD) to CUR-type decompositions
Additive and relative error CUR-type decompositions
Future directions
The Singular Value Decomposition (SVD)
[Figure: two objects, x and d, plotted as vectors in a 2-D feature space (feature 1 vs. feature 2), with θ(d,x) the angle between them.]
Matrix rows: points (vectors) in a Euclidean space;
e.g., given 2 objects (x & d), each described with respect to two features, we get a 2-by-2 matrix.
Two objects are “close” if the angle between their corresponding vectors is small.
In general, the data matrix has m objects (rows) and n features (columns).
SVD, intuition
Let the blue circles represent m data points in a 2-D Euclidean space.
Then, the SVD of the m-by-2 matrix of the data will return:
1st (right) singular vector: direction of maximal variance.
2nd (right) singular vector: direction of maximal variance, after removing the projection of the data along the first singular vector.
[Figure: the same point set, with σ1 and σ2 marking the spread along the 1st and 2nd (right) singular vectors.]
Singular values
σ1: measures how much of the data variance is explained by the first singular vector.
σ2: measures how much of the data variance is explained by the second singular vector.
SVD: formal definition
A = U Σ VT, where ρ = rank of A.
U (V): orthogonal matrix containing the left (right) singular vectors of A.
Σ: diagonal matrix containing the singular values of A.
Let σ1 ≥ σ2 ≥ … ≥ σρ be the entries of Σ.
Exact computation of the SVD takes O(min{mn², m²n}) time.
The top k left/right singular vectors/values can be computed faster using Lanczos/Arnoldi methods.
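The definition above can be checked numerically. A minimal NumPy sketch (the matrix is random illustrative data, not from the slides):

```python
import numpy as np

# A small random objects-by-features matrix (illustrative data only).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# U and V have orthonormal columns, and the factorization is exact.
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(Vt @ Vt.T, np.eye(4))
assert np.allclose(A, U @ np.diag(s) @ Vt)
assert np.all(s[:-1] >= s[1:])  # sigma_1 >= sigma_2 >= ...
```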
Rank-k approximations via the SVD
A = U Σ VT (objects-by-features): the top singular directions carry the significant structure; the rest is noise.
[Figure: block diagram of A = U Σ VT with the significant and noise blocks marked.]
Rank-k approximations (Ak)
Ak = Uk Σk VkT
Uk (Vk): orthogonal matrix containing the top k left (right) singular vectors of A.
Σk: diagonal matrix containing the top k singular values of A.
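A short NumPy sketch of forming Ak by truncating the SVD (random illustrative data; the check uses the standard fact that ||A − Ak||F² equals the sum of the discarded squared singular values):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
k = 2

# Truncate the SVD to the top k singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# ||A - Ak||_F^2 = sum of the discarded squared singular values.
resid = np.linalg.norm(A - Ak, 'fro')
assert np.isclose(resid, np.sqrt(np.sum(s[k:] ** 2)))
```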
PCA and SVD
[Figure: the same two-object feature-space picture as on the earlier SVD slide.]
Principal Components Analysis (PCA) essentially amounts to the computation of the Singular Value Decomposition (SVD) of a covariance matrix.
SVD is “the Rolls-Royce and the Swiss Army Knife of Numerical Linear Algebra.”*
*Dianne O’Leary, MMDS ’06
Uk solves an optimization problem…
Given an m×n matrix A, we seek an m-by-k matrix C that minimizes the residual Frobenius norm:
minC ||A − PC A||F, where PC A = C C+ A is the projection of A on the subspace spanned by the columns of C.
(1) It turns out that C = Uk is (one of many) solutions to the above problem.
(2) The minimal residual error is equal to the Frobenius norm of A − Ak.
(3) The above observations hold for any unitarily invariant norm (e.g., the spectral norm).
SVD issues …
SVD and PCA are often used to summarize and approximate matrices and they have enjoyed enormous success in data analysis.
BUT
- For large sparse graphs they require large amounts of memory, because the resulting factor matrices are no longer sparse.
(Common networks such as the web, the Internet topology graph, the who-trusts-whom social network are all large and sparse.)
- Running time becomes an issue.
- Assigning “meaning” to the singular vectors (reification) is tricky …
Alternative: CUR-type decompositions
1. A “sketch” consisting of a few rows/columns of the matrix may replace the SVD.
2. Rows/columns are drawn randomly, using various importance sampling schemes.
3. The choice of the sampling probabilities is critical for the accuracy of the approximation.
Create an approximation to the original matrix which can be stored in much less space:
A ≈ C · U · R, where C contains O(1) columns of A, R contains O(1) rows of A, and U is a carefully chosen small matrix.
Advantages
In applications where large, sparse matrices appear we can design algorithms for CUR-type decompositions that
• are computable after two “passes” (sequential READS) through the matrices,
• require O(m+n) RAM space (compare to O(mn) for the SVD),
• can be computed very fast (extra O(m+n) time after the two passes), and
• preserve sparsity.
Caveat: accuracy loss.
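The two-pass pattern can be sketched in a few lines (a toy version with hypothetical helper names; the stream is simulated by iterating over the rows of an in-memory array). Pass one accumulates the squared column lengths in O(n) space; pass two keeps only the sampled columns. The 1/√(c·pj) rescaling is the standard convention that makes C CT an unbiased estimator of A AT:

```python
import numpy as np

def two_pass_column_sample(stream_rows, n, c, rng):
    """Length-squared column sampling in two sequential passes, O(n) extra space."""
    # Pass 1: accumulate squared column lengths.
    sq = np.zeros(n)
    for row in stream_rows():
        sq += row ** 2
    p = sq / sq.sum()
    cols = rng.choice(n, size=c, replace=True, p=p)
    # Pass 2: keep (and rescale) only the sampled columns of each row.
    C = np.array([row[cols] / np.sqrt(c * p[cols]) for row in stream_rows()])
    return C, cols

rng = np.random.default_rng(8)
A = rng.standard_normal((20, 10))
C, cols = two_pass_column_sample(lambda: iter(A), A.shape[1], 4, rng)
assert C.shape == (20, 4)
```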
Some notation…
Given an m×n matrix A:
(i) A^(i) denotes the i-th column of the matrix as a column vector,
(ii) A_(i) denotes the i-th row of the matrix as a row vector,
(iii) |A_(i)| or |A^(i)| denotes the Euclidean norm of the corresponding row or column,
(iv) ||A||F denotes the Frobenius norm of A, with ||A||F² = Σi,j Aij².
Finally, ε will be an accuracy parameter in (0,1) and δ a failure probability.
Computing CUR(Drineas & Kannan SODA ’03, Drineas, Kannan, & Mahoney SICOMP ’06)
C consists of c = O(1/ε2) columns of A and R consists of r = O(1/ε2) rows of A.
C (and R) is created using importance sampling; e.g., columns (rows) are picked in i.i.d. trials with respect to probabilities proportional to their squared Euclidean lengths: pj = |A^(j)|² / ||A||F².
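A minimal NumPy sketch of this importance sampling for the columns (random illustrative data; the 1/√(c·pj) rescaling is the standard convention making C CT an unbiased estimator of A AT):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 30))
c = 10  # number of column samples

# p_j proportional to the squared Euclidean length of column j.
p = np.sum(A ** 2, axis=0) / np.linalg.norm(A, 'fro') ** 2
cols = rng.choice(A.shape[1], size=c, replace=True, p=p)  # i.i.d. trials

# Rescale each sampled column by 1/sqrt(c * p_j).
C = A[:, cols] / np.sqrt(c * p[cols])
```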
Computing U
Intuition:
The CUR algorithm essentially expresses every row of the matrix A as a linear combination of a small subset of the rows of A.
• This small subset consists of the rows in R.
• Given a row of A – say A_(i) – the algorithm computes a good fit for the row A_(i) using the rows in R as the basis, by approximately solving minu ||A_(i) − u R||².
Notice that only c = O(1) elements of the i-th row are given as input.
However, a vector of coefficients u can still be computed.
Computing U
Given c elements of A_(i), the algorithm computes a good fit for the row A_(i) using the rows in R as the basis, by approximately solving minu ||A_(i) − u R||².
In the process of computing U, we fix its rank to be a positive constant k, which is part of the input.
Note: since CUR has rank at most k, ||A − CUR||2,F ≥ ||A − Ak||2,F.
Thus, we should choose a k such that ||A − Ak||2,F is small.
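A simplified sketch of this idea (not the slides' exact algorithm: here U is simply the pseudoinverse of the intersection block W, without the rank-k truncation, and the matrix is exactly low-rank illustrative data):

```python
import numpy as np

rng = np.random.default_rng(4)
# Exactly rank-3 illustrative matrix, so a small CUR can reproduce it.
A = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 25))

row_idx = rng.choice(40, size=6, replace=False)
col_idx = rng.choice(25, size=6, replace=False)
C = A[:, col_idx]                  # m-by-c sampled columns
R = A[row_idx, :]                  # r-by-n sampled rows
W = A[np.ix_(row_idx, col_idx)]    # r-by-c intersection block

# Fit every row A_(i) as u_i @ R using only its c sampled entries:
#   u_i = argmin_u ||C[i, :] - u @ W||_2  =>  u_i = C[i, :] @ pinv(W),
# which in matrix form gives A ~= C @ pinv(W) @ R, i.e. U = W^+.
U = np.linalg.pinv(W)
err = np.linalg.norm(A - C @ U @ R, 'fro') / np.linalg.norm(A, 'fro')
# Here rank(W) = rank(A), so the reconstruction is (numerically) exact.
```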
![Page 29: Randomized Algorithms for Linear Algebraic Computations & Applications To access my web page: Petros Drineas Rensselaer Polytechnic Institute Computer](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649f425503460f94c624e3/html5/thumbnails/29.jpg)
Error bounds (Frobenius norm)
Assume Ak is the “best” rank-k approximation to A (through the SVD). Then
||A − CUR||F ≤ ||A − Ak||F + ε ||A||F.
We need to pick O(k/ε²) rows and O(k/ε²) columns.
![Page 30: Randomized Algorithms for Linear Algebraic Computations & Applications To access my web page: Petros Drineas Rensselaer Polytechnic Institute Computer](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649f425503460f94c624e3/html5/thumbnails/30.jpg)
Error bounds (2-norm)
Assume Ak is the “best” rank-k approximation to A (through the SVD). Then
||A − CUR||2 ≤ ||A − Ak||2 + ε ||A||F.
We need to pick O(1/ε²) rows and O(1/ε²) columns and set k = Θ(1/ε).
![Page 31: Randomized Algorithms for Linear Algebraic Computations & Applications To access my web page: Petros Drineas Rensselaer Polytechnic Institute Computer](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649f425503460f94c624e3/html5/thumbnails/31.jpg)
Application to network flow data(Sun, Xie, Zhang, & Faloutsos SDM ’07)
The data:
- A traffic trace consisting of TCP flow records collected at the backbone router of a class-B university network.
- Each record in the trace corresponds to a directional TCP flow between two hosts; timestamps indicate when the flow started and finished.
- Approx. 22,000 hosts/destinations; approx. 800,000 packets per hour between all host-destination pairs.
![Page 32: Randomized Algorithms for Linear Algebraic Computations & Applications To access my web page: Petros Drineas Rensselaer Polytechnic Institute Computer](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649f425503460f94c624e3/html5/thumbnails/32.jpg)
Application to network flow data(Sun, Xie, Zhang, & Faloutsos SDM ’07)
The data:
- A traffic trace consisting of TCP flow records collected at the backbone router of a class-B university network.
- Each record in the trace corresponds to a directional TCP flow between two hosts; timestamps indicate when the flow started and finished.
- Approx. 22,000 hosts/destinations; approx. 800,000 packets per hour between all host-destination pairs.
Data from 10am-11am on January 6, 2005.
Application to network flow data (Sun, Xie, Zhang, & Faloutsos SDM ’07)
Goal:
- Detect abnormal (anomalous) hosts by measuring the reconstruction error for each host!
- In other words, if a host can not be accurately expressed as a linear combination of a small set of other hosts, then it potentially represents an anomaly and should be flagged.
Simulation experiments:
- Starting with real flow matrices, abnormalities are injected in the data by inserting:
Abnormal source hosts: a source host is randomly selected and all the corresponding row entries are set to 1 (a scanner host that sends flows to every other destination in the network).
Abnormal destination hosts: a destination is randomly picked, and 90% of its corresponding column entries are set to 1 (a denial-of-service attack from a large number of hosts).
Results:
By applying a variant of CUR (removing duplicate rows/columns) and by keeping about 500 columns and rows, recall is close to 100% and precision is close to 97%.
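A self-contained toy version of this detection scheme (synthetic data, not the paper's trace; for simplicity the per-host reconstruction error is measured against the top-k SVD subspace rather than the CUR variant used in the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, k = 100, 80, 2
# Rank-2 "normal" traffic pattern plus small noise (synthetic counts).
base = 10 * np.abs(rng.standard_normal((m, k))) @ np.abs(rng.standard_normal((k, n)))
A = base + 0.01 * rng.random((m, n))

# Inject one abnormal source host: a scanner that hits every destination.
bad = 17
A[bad, :] = 1.0

# Reconstruct every row from the dominant rank-k subspace; rows that are
# poorly expressed by it get a large reconstruction error and are flagged.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
row_err = np.linalg.norm(A - Ak, axis=1)
flagged = int(np.argmax(row_err))
```

In this synthetic run the injected scanner row does not fit the dominant traffic pattern, so it receives the largest reconstruction error.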
More accurate CUR decompositions?
An alternative perspective:
Prior results by E. E. Tyrtyshnikov and collaborators (LAA ’97) implied rather weak error bounds if we choose the columns and rows of A that define a parallelepiped of maximal volume.
Q. Can we find the “best” set of columns and rows to include in C and R?
Randomized and/or deterministic strategies are acceptable, even at the loss of efficiency (running time, memory, and sparsity).
Optimal U
![Page 38: Randomized Algorithms for Linear Algebraic Computations & Applications To access my web page: Petros Drineas Rensselaer Polytechnic Institute Computer](https://reader030.vdocuments.net/reader030/viewer/2022032702/56649f425503460f94c624e3/html5/thumbnails/38.jpg)
Relative-error CUR-type decompositions
A ≈ C · U · R: C contains O(1) columns of A, R contains O(1) rows of A, and U is carefully chosen.
Goal: make (some norm of) A − CUR small.
For any matrix A, we can find C, U and R such that the norm of A − CUR is almost equal to the norm of A − Ak.
From SVD to relative error CUR
Exploit structural properties of CUR to analyze data:
Instead of reifying the Principal Components:
• Use PCA (a.k.a. SVD) to find how many Principal Components are needed to “explain” the data.
• Run CUR and pick columns/rows instead of eigen-columns and eigen-rows!
• Assign meaning to actual columns/rows of the matrix! Much more intuitive!
Caveat:
Relative-error CUR-type decompositions retain sparsity, but require as much time as the computation of Ak and O(mn) memory.
Theorem: relative error CUR (Drineas, Mahoney, & Muthukrishnan SIMAX ’08)
For any k, O(SVDk(A)) time suffices to construct C, U, and R s.t.
||A − CUR||F ≤ (1 + ε) ||A − Ak||F
holds with probability at least .7, by picking
O( k log k / ε² ) columns, and
O( k log²k / ε⁶ ) rows.
O(SVDk(A)): time to compute the top k left/right singular vectors and values of A.
Comparison with additive error CUR
Let Ak be the “best” rank-k approximation to A. Then, after two passes through A, we can pick O(k/ε⁴) rows and O(k/ε⁴) columns, such that
||A − CUR||F ≤ ||A − Ak||F + ε ||A||F.
Additive error might be prohibitively large in many applications!
This “coarse” CUR might not capture the relevant structure in the data.
A constant factor CUR construction (Mahoney and Drineas, PNAS, to appear)
For any k, O(SVDk(A)) time suffices to construct C, U, and R s.t.
||A − CUR||F ≤ (2 + ε) ||A − Ak||F
holds with probability at least .7, by picking
O( k log k / ε² ) columns and O( k log k / ε² ) rows.
Relative-error CUR decomposition
Create an approximation to A using rows and columns of A: C contains O(1) columns, R contains O(1) rows, and U is carefully chosen.
1. How do we draw the columns and rows of A to include in C and R?
2. How do we construct U?
Goal: provide very good bounds for some norm of A − CUR.
Step 1: subspace sampling for C
INPUT: matrix A, rank parameter k, number of columns c
OUTPUT: matrix of selected columns C
• Compute the probabilities pj;
• For each j = 1,2,…,n, pick the j-th column of A with probability min{1, c·pj};
• Let C be the matrix containing the sampled columns.
(C has ≤ c columns in expectation)
Subspace sampling (Frobenius norm)
Vk: orthogonal matrix containing the top k right singular vectors of A.
Σk: diagonal matrix containing the top k singular values of A.
Subspace sampling: pj = |(VkT)^(j)|² / k (normalization s.t. the pj sum up to 1, since the squared column norms of VkT sum to k).
These pj are the so-called leverage scores (many references in the statistics community).
Remark: the rows of VkT are orthonormal vectors, but its columns (VkT)^(j) are not.
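Putting the two previous slides together, a NumPy sketch of the leverage scores and the column-selection step (random illustrative data):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((60, 30))
k, c = 3, 12

# Leverage scores from the top-k right singular vectors:
# p_j = |row j of V_k|^2 / k; they sum to 1 because V_k has
# orthonormal columns, so its squared row norms sum to k.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
Vk = Vt[:k, :].T                       # n-by-k
p = np.sum(Vk ** 2, axis=1) / k
assert np.isclose(p.sum(), 1.0)

# Keep column j independently with probability min{1, c * p_j}.
keep = rng.random(A.shape[1]) < np.minimum(1.0, c * p)
C = A[:, keep]                         # at most c columns in expectation
```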
Step 1: subspace sampling for R
INPUT: matrix A, rank parameter k, number of rows r
OUTPUT: matrix of selected rows R
• Compute the probabilities pi;
• For each i = 1,2,…,m, pick the i-th row of A with probability min{1, r·pi};
• Let R be the matrix containing the sampled rows.
(R has ≤ r rows in expectation)
Subspace sampling (Frobenius norm)
Uk: orthogonal matrix containing the top k left singular vectors of A.
Σk: diagonal matrix containing the top k singular values of A.
Subspace sampling: pi = |(Uk)_(i)|² / k (normalization s.t. the pi sum up to 1).
These pi are the so-called leverage scores (many references in the statistics community).
Remark: the columns of Uk are orthonormal vectors, but its rows (Uk)_(i) are not.
Algorithm CUR
• Run ColumnSelect on A with the column leverage scores; e.g., C contains columns 7, 14, and 43 of A.
• Run ColumnSelect on AT with the row leverage scores; e.g., R contains rows 10, 33, and 42 of A.
• Set U = C+AR+.
(c = r = O(k log(k) / ε²) in the worst case)
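The whole algorithm fits in a few lines of NumPy. A sketch (exactly low-rank illustrative data; with k equal to the true rank, the sampled C and R almost surely span the column and row spaces, so C(C⁺AR⁺)R reconstructs A):

```python
import numpy as np

def cur(A, k, c, r, rng):
    """Sketch of leverage-score CUR with U = pinv(C) @ A @ pinv(R)."""
    Usvd, s, Vt = np.linalg.svd(A, full_matrices=False)
    pc = np.sum(Vt[:k, :] ** 2, axis=0) / k    # column leverage scores
    pr = np.sum(Usvd[:, :k] ** 2, axis=1) / k  # row leverage scores
    cols = rng.random(A.shape[1]) < np.minimum(1.0, c * pc)
    rows = rng.random(A.shape[0]) < np.minimum(1.0, r * pr)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

rng = np.random.default_rng(7)
A = rng.standard_normal((50, 4)) @ rng.standard_normal((4, 30))  # rank 4
C, U, R = cur(A, k=4, c=12, r=12, rng=rng)
rel_err = np.linalg.norm(A - C @ U @ R, 'fro') / np.linalg.norm(A, 'fro')
```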
Computing subspace probabilities faster
Subspace sampling (expensive)
Open problem: Is it possible to compute/approximate the subspace sampling probabilities faster?
Partial answer: Assuming k is O(1), it is known that by leveraging the Fast Johnson-Lindenstrauss transform of (Ailon & Chazelle STOC ’06) the singular vectors of a matrix can be approximated very accurately in O(mn) time.
(Sarlos FOCS ’06, Drineas, Mahoney, Muthukrishnan, & Sarlos ’07, as well as work by Tygert, Rokhlin, and collaborators in PNAS)
Problem: To the best of my knowledge, this does not immediately imply an algorithm to (provably) approximate the subspace sampling probabilities.
CUR decompositions: a summary

| Reference | Construction | Properties |
| --- | --- | --- |
| G.W. Stewart (Num. Math. ’99, TR ’04) | C: variant of the QR algorithm; R: variant of the QR algorithm; U: minimizes ‖A−CUR‖F | No a priori bounds; solid experimental performance |
| Goreinov, Tyrtyshnikov, & Zamarashkin (LAA ’97, Cont. Math. ’01) | C: columns that span max volume; U: W+; R: rows that span max volume | Existential result; error bounds depend on ‖W+‖2; spectral norm bounds! |
| Williams & Seeger (NIPS ’01) | C: uniformly at random; U: W+; R: uniformly at random | Experimental evaluation; A is assumed PSD; connections to the Nyström method |
| Drineas & Kannan (SODA ’03, SICOMP ’06) | C: w.r.t. column lengths; U: in linear/constant time; R: w.r.t. row lengths | Randomized algorithm; provable, a priori bounds; explicit dependency on A − Ak |
| Drineas, Mahoney, & Muthukrishnan (SODA ’06, SIMAX ’08) | C: depends on singular vectors of A; U: (almost) W+; R: depends on singular vectors of C | (1+ε) approximation to A − Ak; computable in O(mnk²) time (experimentally) |
| Mahoney & Drineas (PNAS, to appear) | C: depends on singular vectors of A; U: C+AR+; R: depends on singular vectors of A | (2+ε) approximation to A − Ak; computable in O(mnk²) time (experimentally) |
Future directions
- Faster algorithms: leveraging preprocessing tools (e.g., the Fast Johnson-Lindenstrauss Transform, Ailon-Chazelle ’06).
- Deterministic variants: upcoming work by C. Boutsidis.
- Matching lower bounds.
- Implementations and widely disseminated software.
- Similar results for non-linear dimensionality reduction techniques: kernel-PCA.