
Sketching as a Tool for Numerical Linear Algebra

David P. Woodruff
presented by Sepehr Assadi

o(n) Big Data Reading Group
University of Pennsylvania

February, 2015

Sepehr Assadi (Penn) Sketching for Numerical Linear Algebra Big Data Reading Group 1 / 25

Goal

New survey by David Woodruff:

- Sketching as a Tool for Numerical Linear Algebra

Topics:

- Subspace Embeddings
- Least Squares Regression
- Least Absolute Deviation Regression
- Low Rank Approximation
- Graph Sparsification
- Sketching Lower Bounds


Introduction

You have “Big” data!

- Computationally expensive to deal with
- Excessive storage requirement
- Hard to communicate
- ...


Summarize your data:

- Sampling
  - A representative subset of the data
- Sketching
  - An aggregate summary of the whole data


Model

Input:

- matrix $A \in \mathbb{R}^{n \times d}$
- vector $b \in \mathbb{R}^n$

Output: function $F(A, b, \ldots)$

- e.g. least squares regression

Different goals:

- Faster algorithms
- Streaming
- Distributed


Linear Sketching

Input:

- matrix $A \in \mathbb{R}^{n \times d}$

Let $r \ll n$ and let $S \in \mathbb{R}^{r \times n}$ be a random matrix.
Let $S \cdot A$ be the sketch.
Compute $F(S \cdot A)$ instead of $F(A)$.
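To make the shapes concrete, here is a minimal NumPy sketch of the idea. The sizes, the Gaussian choice of $S$ and the use of the Frobenius norm as the function $F$ are all assumptions made for illustration, not something the slides fix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 10_000, 20, 400          # hypothetical sizes with r << n

A = rng.standard_normal((n, d))
# Assumed sketching matrix: i.i.d. N(0, 1/r) entries.
S = rng.standard_normal((r, n)) / np.sqrt(r)

SA = S @ A                         # the sketch: r x d instead of n x d

# F is taken to be the Frobenius norm, just to have something to compare.
exact = np.linalg.norm(A)
approx = np.linalg.norm(SA)
print(SA.shape, exact, approx)
```

The sketch has 400 rows instead of 10,000, and the two norms should be close but not equal.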


Linear Sketching (cont.)

Pros:

- Compute on an $r \times d$ matrix instead of $n \times d$
- Smaller representation and faster computation
- Linearity:
  - $S \cdot (A + B) = S \cdot A + S \cdot B$
  - We can compose linear sketches!

Cons:

- $F(S \cdot A)$ is an approximation of $F(A)$
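Linearity is what makes sketches usable in the streaming setting: an additive update to one entry of $A$ becomes an additive update to one column of the sketch. The following is a small illustration under assumed sizes and a made-up update stream; the matrix `A` is stored only so the result can be checked.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, r = 2_000, 5, 100
S = rng.standard_normal((r, n)) / np.sqrt(r)

A = np.zeros((n, d))
sketch = np.zeros((r, d))            # sketch of the all-zero matrix

# Hypothetical stream of updates "add delta to entry (i, j)".
for _ in range(500):
    i, j = rng.integers(n), rng.integers(d)
    delta = rng.standard_normal()
    A[i, j] += delta                 # kept only to verify the result
    sketch[:, j] += delta * S[:, i]  # equals S @ (delta * e_i e_j^T)

print(np.allclose(sketch, S @ A))
```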


Least Squares Regression ($\ell_2$-regression)

Input:

- matrix $A \in \mathbb{R}^{n \times d}$ (full column rank)
- vector $b \in \mathbb{R}^n$

Output $x^* \in \mathbb{R}^d$:

$x^* = \arg\min_x \|Ax - b\|_2$

Closed form solution:

$x^* = (A^T A)^{-1} A^T b$

$\Theta(nd^2)$-time algorithm using naive matrix multiplication
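The closed form can be checked numerically against a library solver. This is only an illustration on random data with assumed sizes; in practice a QR- or SVD-based solver is usually preferred over forming $A^T A$ for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5_000, 10
A = rng.standard_normal((n, d))    # full column rank with probability 1
b = rng.standard_normal(n)

# Closed form: solve (A^T A) x = A^T b; forming A^T A costs Theta(n d^2).
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least-squares solver for comparison.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))
```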


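Approximate $\ell_2$-regression

Input:

- matrix $A \in \mathbb{R}^{n \times d}$ (full column rank)
- vector $b \in \mathbb{R}^n$
- parameter $0 < \varepsilon < 1$

Output $\tilde{x} \in \mathbb{R}^d$ such that:

$\|A\tilde{x} - b\|_2 \le (1 + \varepsilon) \min_x \|Ax - b\|_2$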


Approximate $\ell_2$-regression (cont.)

A sketching algorithm:

- Sample a random matrix $S \in \mathbb{R}^{r \times n}$
- Compute $S \cdot A$ and $S \cdot b$
- Output $\tilde{x} = \arg\min_x \|(SA)x - (Sb)\|_2$

Which randomized family of matrices $S$ and what value of $r$?


Approximate $\ell_2$-regression (cont.)

An introductory construction:

- Let $r = \Theta(d/\varepsilon^2)$
- Let $S \in \mathbb{R}^{r \times n}$ be a matrix of i.i.d. normal random variables with mean zero and variance $1/r$

Proof sketch: on the board.
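The construction above can be tried out directly. The following is a sketch on synthetic data; the constant 4 hidden in $r = \Theta(d/\varepsilon^2)$ and all sizes are guesses chosen for illustration, not values from the survey.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, eps = 20_000, 10, 0.5
r = int(4 * d / eps**2)            # r = Theta(d / eps^2); the constant 4 is a guess

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)

S = rng.standard_normal((r, n)) / np.sqrt(r)   # i.i.d. N(0, 1/r) entries

x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)          # exact solution
x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)   # sketched solution

# Residual of the sketched solution relative to the optimum (always >= 1).
ratio = np.linalg.norm(A @ x_sk - b) / np.linalg.norm(A @ x_opt - b)
print(r, ratio)
```

The sketched problem has only `r` rows, yet the ratio should land within a factor $1 + \varepsilon$ of the optimum with good probability.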


Approximate $\ell_2$-regression (cont.)

Problems:

- Computing $S \cdot A$ takes $\Theta(nrd)$ time
- Constructing $S$ requires $\Theta(nr)$ space

Different constructions for $S$:

- Fast Johnson-Lindenstrauss transforms: $O(nd \log d) + \mathrm{poly}(d/\varepsilon)$ time [Sarlos, FOCS ’06]
- Optimal $O(\mathrm{nnz}(A)) + \mathrm{poly}(d/\varepsilon)$ time algorithm [Clarkson, Woodruff, STOC ’13]
- Random sign matrices with $\Theta(d)$-wise independent entries: $O((d^2/\varepsilon) \log(nd))$-space streaming algorithm [Clarkson, Woodruff, STOC ’09]
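The input-sparsity-time result is usually described in terms of a CountSketch-style matrix, where every column of $S$ holds a single random $\pm 1$. A minimal dense illustration follows; the helper name `countsketch`, the fully random hashing and the sizes are assumptions, and a real implementation would exploit the sparsity of $A$ to reach $O(\mathrm{nnz}(A))$ time.

```python
import numpy as np

def countsketch(M, r, rng):
    """Return S @ M for a random S with one +-1 entry per column."""
    n = M.shape[0]
    rows = rng.integers(r, size=n)            # h(i): bucket of input row i
    signs = rng.choice([-1.0, 1.0], size=n)   # sigma(i): random sign of row i
    SM = np.zeros((r, M.shape[1]))
    np.add.at(SM, rows, signs[:, None] * M)   # SM[h(i)] += sigma(i) * M[i]
    return SM

rng = np.random.default_rng(4)
n, d, r = 20_000, 10, 2_000                   # r chosen by hand, not tuned
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)

# Sketch [A, b] in one call so that A and b see the same S.
SM = countsketch(np.column_stack([A, b]), r, rng)
SA, Sb = SM[:, :d], SM[:, d]

x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sk, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
ratio = np.linalg.norm(A @ x_sk - b) / np.linalg.norm(A @ x_opt - b)
print(ratio)
```

Note that $S$ is never materialized: only the bucket and sign of each input row are stored, which also addresses the $\Theta(nr)$ space problem.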


Subspace Embedding

Definition ($\ell_2$-subspace embedding): A $(1 \pm \varepsilon)$ $\ell_2$-subspace embedding for a matrix $A \in \mathbb{R}^{n \times d}$ is a matrix $S$ such that for all $x \in \mathbb{R}^d$

$\|SAx\|_2^2 = (1 \pm \varepsilon) \|Ax\|_2^2$

Actually a subspace embedding for the column space of $A$.

Oblivious $\ell_2$-subspace embedding:

- The distribution from which $S$ is chosen is oblivious to $A$

One very common tool for (oblivious) $\ell_2$-subspace embedding is the Johnson-Lindenstrauss transform (JLT).
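Because the definition only depends on the column space of $A$, it can be checked through an orthonormal basis $U$ of that space: the ratio $\|SAx\|_2^2 / \|Ax\|_2^2$ always lies between the smallest and largest squared singular values of $SU$. A small numerical check, with a Gaussian $S$ and sizes assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, r = 5_000, 8, 800
A = rng.standard_normal((n, d)) * np.logspace(0, 3, d)  # badly scaled columns

S = rng.standard_normal((r, n)) / np.sqrt(r)

# Orthonormal basis U of the column space of A (reduced QR).
U, _ = np.linalg.qr(A)

# ||S U z||_2^2 / ||U z||_2^2 ranges over [sigma_min^2, sigma_max^2] of S U.
sigma = np.linalg.svd(S @ U, compute_uv=False)
eps_observed = max(sigma.max()**2 - 1, 1 - sigma.min()**2)

# Spot check on one random x.
x = rng.standard_normal(d)
ratio = np.linalg.norm(S @ (A @ x))**2 / np.linalg.norm(A @ x)**2
print(eps_observed, ratio)
```

The scaling of the columns of $A$ does not matter, which is the sense in which the guarantee is about the subspace rather than the matrix.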


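Johnson-Lindenstrauss transform

Definition (JLT($\varepsilon, \delta, f$)): A random matrix $S \in \mathbb{R}^{r \times n}$ forms a JLT($\varepsilon, \delta, f$) if, with probability at least $1 - \delta$, for any $f$-element subset $V \subseteq \mathbb{R}^n$ it holds that:

$\forall\, v, v' \in V: \quad |\langle Sv, Sv' \rangle - \langle v, v' \rangle| \le \varepsilon \|v\|_2 \|v'\|_2$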


Usual statement (i.e. the original Johnson-Lindenstrauss Lemma):

Lemma (JLL): Given $N$ points $q_1, \ldots, q_N \in \mathbb{R}^n$, there exists a matrix $S \in \mathbb{R}^{t \times n}$ (linear map) for $t = \Theta(\log N / \varepsilon^2)$ such that with high probability, simultaneously for all pairs $q_i$ and $q_j$,

$\|S(q_i - q_j)\|_2 = (1 \pm \varepsilon) \|q_i - q_j\|_2$
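The lemma is easy to observe empirically. In the sketch below the points are random, $S$ is a scaled Gaussian matrix, and the constant 8 in $t = \Theta(\log N / \varepsilon^2)$ is a guess; none of these choices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(6)
N, n, eps = 50, 2_000, 0.25
t = int(np.ceil(8 * np.log(N) / eps**2))   # t = Theta(log N / eps^2); 8 is a guess

Q = rng.standard_normal((N, n))            # N points in R^n, one per row
S = rng.standard_normal((t, n)) / np.sqrt(t)
SQ = Q @ S.T                               # images S q_i, one per row

def pairwise_dists(X):
    # Euclidean distances between all pairs of rows (i < j).
    diff = X[:, None, :] - X[None, :, :]
    D = np.linalg.norm(diff, axis=2)
    return D[np.triu_indices(len(X), k=1)]

ratios = pairwise_dists(SQ) / pairwise_dists(Q)
print(t, ratios.min(), ratios.max())
```

The target dimension `t` depends on the number of points and on $\varepsilon$, but not on the ambient dimension `n`.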


Johnson-Lindenstrauss transform (cont.)

A simple construction of JLT($\varepsilon, \delta, f$):

Theorem: Let $0 < \varepsilon, \delta < 1$ and $S = \frac{1}{\sqrt{r}} R \in \mathbb{R}^{r \times n}$, where the entries $R_{i,j}$ are independent standard normal random variables. Assuming $r = \Omega(\varepsilon^{-2} \log(f/\delta))$, $S$ is a JLT($\varepsilon, \delta, f$).

Other constructions:

- Random sign matrices [Achlioptas, ’03], [Clarkson, Woodruff, STOC ’09]
- Random sparse matrices [Dasgupta, Kumar, Sarlos, STOC ’10], [Kane, Nelson, J. ACM ’14]
- Fast Johnson-Lindenstrauss transforms [Ailon, Chazelle, STOC ’06]
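As an illustration of the first alternative, the inner-product guarantee from the JLT definition can be checked for a random sign matrix. This sketch uses fully independent signs (not the limited-independence variant), random test vectors, and a guessed constant 16 in the choice of $r$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, f, eps = 1_000, 30, 0.3
r = int(np.ceil(16 * np.log(f) / eps**2))   # the constant 16 is a guess

# Random sign matrix: i.i.d. +-1 entries scaled by 1/sqrt(r).
S = rng.choice([-1.0, 1.0], size=(r, n)) / np.sqrt(r)

V = rng.standard_normal((f, n))             # rows are the f vectors of V
SV = V @ S.T

gram_err = np.abs(SV @ SV.T - V @ V.T)      # |<Sv, Sv'> - <v, v'>|
norms = np.linalg.norm(V, axis=1)
worst = (gram_err / np.outer(norms, norms)).max()
print(r, worst)
```

Here `worst` is the largest error over all pairs, normalized by $\|v\|_2 \|v'\|_2$ as in the definition, so it is the quantity that should stay below $\varepsilon$.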


JLT results in $\ell_2$-subspace embedding

Claim: $S$ = JLT($\varepsilon, \delta, f$) is an oblivious $\ell_2$-subspace embedding for $A \in \mathbb{R}^{n \times d}$.

Challenge:

- JLT($\varepsilon, \delta, f$) provides a guarantee for a single finite set in $\mathbb{R}^n$
- $\ell_2$-subspace embedding requires the guarantee for an infinite set, i.e. the column space of $A$


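JLT results in $\ell_2$-subspace embedding (cont.)

Let $\mathcal{S}$ be the unit sphere in the column space of $A$:

$\mathcal{S} = \{\, y \in \mathbb{R}^n \mid y = Ax \text{ for some } x \in \mathbb{R}^d \text{ and } \|y\|_2 = 1 \,\}$

We seek a finite subset $\mathcal{N} \subseteq \mathcal{S}$ so that if

$\forall\, w, w' \in \mathcal{N}: \quad \langle Sw, Sw' \rangle = \langle w, w' \rangle \pm \varepsilon$

then

$\forall\, y \in \mathcal{S}: \quad \|Sy\|_2 = (1 \pm \varepsilon) \|y\|_2$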


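JLT results in $\ell_2$-subspace embedding (cont.)

Lemma ($\frac{1}{2}$-net for $\mathcal{S}$): It suffices to choose any $\mathcal{N}$ such that

$\forall\, y \in \mathcal{S} \;\; \exists\, w \in \mathcal{N} \text{ s.t. } \|y - w\|_2 \le 1/2$

Proof.

1. Decompose $y$:

   $y = y^{(0)} + y^{(1)} + y^{(2)} + \ldots$

   where $\|y^{(i)}\|_2 \le \frac{1}{2^i}$ and $\frac{y^{(i)}}{\|y^{(i)}\|_2} \in \mathcal{N}$

2. $\|Sy\|_2^2 = \|S(y^{(0)} + y^{(1)} + y^{(2)} + \ldots)\|_2^2 = 1 \pm O(\varepsilon)$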
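One standard way to expand step 2 is shown below. It assumes that $S$ preserves inner products between points of $\mathcal{N}$ up to $\pm\varepsilon$ and uses that each $y^{(i)}$ is a scaled net point; the slides leave this computation out, so treat it as a reconstruction rather than the presenter's exact argument.

```latex
\begin{align*}
\|Sy\|_2^2
  &= \sum_{i,j \ge 0} \langle S y^{(i)}, S y^{(j)} \rangle \\
  &= \sum_{i,j \ge 0} \Big( \langle y^{(i)}, y^{(j)} \rangle
       \pm \varepsilon \, \|y^{(i)}\|_2 \, \|y^{(j)}\|_2 \Big) \\
  &= \|y\|_2^2 \pm \varepsilon \Big( \sum_{i \ge 0} 2^{-i} \Big)^2 \\
  &= 1 \pm 4\varepsilon .
\end{align*}
```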


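$\frac{1}{2}$-net of $\mathcal{S}$

Lemma: There exists a $\frac{1}{2}$-net $\mathcal{N}$ of $\mathcal{S}$ for which $|\mathcal{N}| \le 5^d$.

Proof.

1. Find a maximal set $\mathcal{N}'$ of points in $\mathbb{R}^d$ such that no two points are within distance $1/2$ of each other
2. Let $U$ be a matrix whose columns form an orthonormal basis of the column space of $A$
3. $\mathcal{N} = \{\, y \in \mathbb{R}^n \mid y = Ux \text{ for some } x \in \mathcal{N}' \text{ and } \|y\|_2 = 1 \,\}$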
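The three steps can be imitated numerically in a small dimension. The sketch below is an approximation of the proof rather than the proof itself: it restricts attention to unit vectors, replaces the sphere by a finite random sample, and builds the separated set greedily; all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n, d, gamma = 50, 3, 0.5

# Candidate points: random unit vectors in R^d (a stand-in for the sphere).
X = rng.standard_normal((3_000, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Step 1 (greedy): keep a point only if it is farther than gamma
# from every point kept so far.
net = []
for x in X:
    if all(np.linalg.norm(x - w) > gamma for w in net):
        net.append(x)
net = np.array(net)

# Every candidate ends up within gamma of some kept point.
dists = np.linalg.norm(X[:, None, :] - net[None, :, :], axis=2)
cover_radius = dists.min(axis=1).max()

# Steps 2 and 3: map the net into the column space of a random A.
A = rng.standard_normal((n, d))
U, _ = np.linalg.qr(A)                 # orthonormal basis of col(A)
net_in_colspace = net @ U.T            # rows are y = U x, still unit norm

print(len(net), 5**d, cover_radius)
```

Since $U$ has orthonormal columns, distances between net points are unchanged by the map $x \mapsto Ux$, which is why the net can be built in $\mathbb{R}^d$ instead of $\mathbb{R}^n$.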


Subspace Embedding via JLT

Theorem: Let $0 < \varepsilon, \delta < 1$ and $S$ = JLT($\varepsilon, \delta, 5^d$). For any fixed matrix $A \in \mathbb{R}^{n \times d}$, with probability $1 - \delta$, $S$ is a $(1 \pm \varepsilon)$ $\ell_2$-subspace embedding for $A$, i.e.

$\forall\, x \in \mathbb{R}^d: \quad \|SAx\|_2 = (1 \pm \varepsilon) \|Ax\|_2$

Results in:

- $O(\mathrm{nnz}(A) \cdot \varepsilon^{-1} \log d)$ time algorithm using the column-sparsity transform of Kane and Nelson [Kane, Nelson, J. ACM ’14]
- $O(nd \log n)$ time algorithm using the Fast Johnson-Lindenstrauss transform of Ailon and Chazelle [Ailon, Chazelle, STOC ’06]
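For the Gaussian construction, plugging $f = 5^d$ into the row bound $r = \Omega(\varepsilon^{-2} \log(f/\delta))$ recovers the number of rows used in the introductory construction. The short calculation below assumes $\delta$ is treated as a constant in the last step.

```latex
\begin{align*}
r &= \Omega\!\left( \varepsilon^{-2} \log \frac{5^d}{\delta} \right)
   = \Omega\!\left( \varepsilon^{-2} \big( d \log 5 + \log(1/\delta) \big) \right), \\
r &= \Theta\!\left( d / \varepsilon^{2} \right)
   \quad \text{rows suffice for constant } \delta .
\end{align*}
```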


Other Subspace Embedding Algorithms

Subspace embeddings not based on JLT:

- $O(\mathrm{nnz}(A)) + \mathrm{poly}(d/\varepsilon)$ time algorithm [Clarkson, Woodruff, STOC ’13]

Non-oblivious subspace embeddings:

- Based on Leverage Score Sampling [Drineas, Mahoney, Muthukrishnan, SODA ’06]
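A rough illustration of leverage score sampling follows. It computes exact leverage scores through a QR factorization, which costs as much as solving the regression problem directly, so it only shows the sampling and rescaling idea; the sizes and the sample count are assumptions, and this is not claimed to be the algorithm of the cited paper.

```python
import numpy as np

rng = np.random.default_rng(9)
n, d, r = 20_000, 6, 2_000
A = rng.standard_normal((n, d))
A[:5] *= 100.0                        # a few rows that dominate the column space

# Leverage score of row i = squared norm of row i of an orthonormal basis U.
U, _ = np.linalg.qr(A)
lev = np.sum(U**2, axis=1)            # sums to d when A has full column rank
p = lev / lev.sum()

# Sample r rows with probability p_i and rescale each by 1/sqrt(r * p_i).
idx = rng.choice(n, size=r, p=p)
scale = 1.0 / np.sqrt(r * p[idx])
SU = U[idx] * scale[:, None]          # S @ U for the sampling matrix S

# S is a good embedding when all singular values of S U are close to 1.
sigma = np.linalg.svd(SU, compute_uv=False)
eps_observed = np.abs(sigma**2 - 1).max()
print(lev[:5], eps_observed)
```

The sampling distribution depends on $A$, which is what makes this embedding non-oblivious: the five heavy rows get leverage scores close to 1 and are sampled far more often than the rest.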


$\ell_2$-regression via Oblivious Subspace Embedding

Theorem: Let $S \in \mathbb{R}^{r \times n}$ be any oblivious subspace embedding matrix and $\tilde{x} = \arg\min_x \|SAx - Sb\|_2$; then,

$\|SA\tilde{x} - Sb\|_2 \le (1 + \varepsilon) \min_x \|Ax - b\|_2$

Proof.

1. Let matrix $U \in \mathbb{R}^{n \times (d+1)}$ be an orthonormal basis of the columns of $A$ together with the vector $b$
2. Suppose $S$ is an $\ell_2$-subspace embedding for $U$
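The proof on the slide stops at step 2. One common way to finish it is sketched below, under the assumption that $\|SUz\|_2 = (1 \pm \varepsilon)\|Uz\|_2$ for all $z$ and with $x^* = \arg\min_x \|Ax - b\|_2$; every vector $Ax - b$ lies in the span of $U$, so the guarantee applies to all residuals. This is a reconstruction, not necessarily the argument given in the talk.

```latex
\begin{align*}
\|S(A\tilde{x} - b)\|_2
  &\le \|S(Ax^* - b)\|_2
   \le (1 + \varepsilon) \, \|Ax^* - b\|_2 , \\
\|A\tilde{x} - b\|_2
  &\le \frac{1}{1 - \varepsilon} \, \|S(A\tilde{x} - b)\|_2
   \le \frac{1 + \varepsilon}{1 - \varepsilon} \, \|Ax^* - b\|_2
   \le (1 + 4\varepsilon) \, \|Ax^* - b\|_2
   \quad \text{for } \varepsilon \le \tfrac{1}{2} .
\end{align*}
```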


Questions?
