
Page 1: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures

Austin Benson, ICME, Stanford University

David Gleich (Purdue) and Jim Demmel (UC Berkeley)

[Figure: A (m × n) = Q (m × n) R (n × n).]

IEEE BigData, October 8, 2013

Page 2: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Contributions

- Numerically stable and scalable algorithm for QR and SVD of tall-and-skinny matrices in MapReduce

- Performance and stability tradeoffs of several methods

- Performance model: prediction within a factor of two

- Code: https://github.com/arbenson/mrtsqr

Page 3: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

MapReduce overview

Two functions that operate on key-value pairs:

map: (key, value) → (key, value)

reduce: (key, ⟨value_1, ..., value_n⟩) → (key, value)

A shuffle stage between map and reduce sorts values by key.

Page 4: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

MapReduce overview

The programmer implements:

- map(key, value)

- reduce(key, ⟨value_1, ..., value_n⟩)

Handled by the MapReduce framework, e.g., Hadoop:

- shuffle

- load balancing

- reading and writing data

- data serialization

- fault tolerance

- ...

Page 5: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

MapReduce Example: ColorCount

The (key, value) input is (image id, image).

[Figure: ColorCount dataflow — the map emits (color, 1) for each pixel, the shuffle groups the counts by color, and the reduce sums them.]

def ColorCountMap(key, val):
    for pixel in val:
        yield (pixel, 1)

def ColorCountReduce(key, vals):
    total = sum(vals)
    yield (key, total)
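A minimal way to see how these two functions compose is to simulate the shuffle locally. The sketch below uses the two functions just defined; the images dict is a made-up stand-in for the real (image id, image) input.

from collections import defaultdict

# Hypothetical input: image id -> list of pixel colors.
images = {"img1": ["red", "red", "blue"], "img2": ["red", "green"]}

# Map phase.
mapped = []
for key, val in images.items():
    mapped.extend(ColorCountMap(key, val))

# Simulated shuffle: group the emitted values by key.
grouped = defaultdict(list)
for key, val in mapped:
    grouped[key].append(val)

# Reduce phase.
for key, vals in grouped.items():
    print(list(ColorCountReduce(key, vals)))   # e.g. [('red', 3)]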

Page 6: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Why MapReduce? (for scientists)

MapReduce is restrictive! Why bother?

- Easy:

  - load balancing

  - structured data I/O

  - fault tolerance

- Cheap clusters with large data storage

Hadoop may not be the best option...

Generate lots of data on a supercomputer → post-process and analyze it on a MapReduce cluster.

Page 7: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Tall-and-skinny matrices

What are tall-and-skinny matrices? m ≫ n

[Figure: a tall-and-skinny matrix A with m rows and n columns.]

Examples: rows are data samples; blocks of A are images from a video; Krylov subspaces; unrolled tensors.

Page 8: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Matrix representation

We have matrices, so what are the key-value pairs?

A =
  [ 1.0  0.0 ]        (1, [1.0, 0.0])
  [ 2.4  3.7 ]        (2, [2.4, 3.7])
  [ 0.8  4.2 ]   →    (3, [0.8, 4.2])
  [ 9.0  9.0 ]        (4, [9.0, 9.0])

(key, value) → (row index, row)
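As a small illustration (a numpy sketch; the exact serialization used by the real code may differ), the key-value records for the example A can be built like this:

import numpy as np

A = np.array([[1.0, 0.0],
              [2.4, 3.7],
              [0.8, 4.2],
              [9.0, 9.0]])

# One (key, value) record per row: (row index, row entries).
records = [(i + 1, row.tolist()) for i, row in enumerate(A)]
print(records)   # [(1, [1.0, 0.0]), (2, [2.4, 3.7]), (3, [0.8, 4.2]), (4, [9.0, 9.0])]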

Page 9: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Matrix representation: an example

Scientific example: (x, y, z) coordinates and model number:

((47570,103.429767811242,0,-16.525510963787,iDV7), [0.00019924 -4.706066e-05 2.875293979e-05 2.456653e-05 -8.436627e-06 -1.508808e-05 3.731976e-06 -1.048795e-05 5.229153e-06 6.323812e-06])

Figure: Aircraft simulation data. Aero/Astro Department, Stanford

Page 10: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Tall-and-skinny matrices

Tall-and-skinny: m ≫ n

[Figure: a tall-and-skinny matrix A with m rows and n columns.]

Slightly more rigorous definition: it is "cheap" to pass O(n^2) data to all processors.

Page 11: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Quick QR and SVD review

[Figure: the QR factorization A = Q R and the SVD A = U Σ V^T.]

Figure: Q, U, and V are orthogonal matrices. R is upper triangular and Σ is diagonal with decreasing, nonnegative entries.

Page 12: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Tall-and-skinny QR

[Figure: A (m × n) = Q (m × n) R (n × n).]

Tall-and-skinny (TS): m ≫ n. Q^T Q = I.

Page 13: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

TS-QR → TS-SVD

[Figure: A = Q R; the SVD of the small R is R = U_R Σ V^T, so A = (Q U_R) Σ V^T = U Σ V^T with U = Q U_R.]

R is small, so computing its SVD is cheap.
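A serial numpy sketch of this reduction (not the MapReduce implementation, just the linear algebra):

import numpy as np

A = np.random.rand(10000, 10)          # tall-and-skinny

Q, R = np.linalg.qr(A)                 # TS-QR: R is only 10 x 10
UR, sigma, Vt = np.linalg.svd(R)       # cheap SVD of the small R
U = Q @ UR                             # A = U * diag(sigma) * Vt

print(np.allclose(A, U @ np.diag(sigma) @ Vt))   # True, up to roundoff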

Page 14: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Why Tall-and-skinny QR and SVD?

1. Regression with many samples

2. Principal Component Analysis (PCA)

3. Model Reduction

Figure: Dynamic mode decomposition of the screech of a jet (panels: pressure, dilation, jet engine). Joe Nichols, University of Minnesota.

Page 15: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Cholesky QR

A^T A = (QR)^T (QR) = R^T Q^T Q R = R^T R

- Computing A^T A in MapReduce is easy and well-studied.

- R is the Cholesky factor of A^T A; we call this approach Cholesky QR.
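A serial numpy sketch of the idea (in MapReduce, A^T A is accumulated with a map/reduce pass; here everything is in memory):

import numpy as np

A = np.random.rand(10000, 10)

G = A.T @ A                    # Gram matrix; easy to form in MapReduce
L = np.linalg.cholesky(G)      # G = L L^T
R = L.T                        # upper-triangular R with A^T A = R^T R
Q = A @ np.linalg.inv(R)       # Q = A R^{-1}

print(np.linalg.norm(Q.T @ Q - np.eye(10)))   # small only if A is well conditioned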

Page 16: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Cholesky QR: Getting Q

Q = A R^{-1}.

[Figure: R^{-1} is distributed to every map task; each task computes Q_i = A_i R^{-1} with a local matrix multiply and emits its block of Q.]
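A sketch of the map function for this step (the helper name and the way R^{-1} reaches the tasks are illustrative assumptions, not the actual mrtsqr code):

import numpy as np

def make_getQ_map(R):
    """Build the map function; R^{-1} is computed once and shipped to every task."""
    R_inv = np.linalg.inv(R)
    def getQ_map(key, rows):
        # rows holds the local block A_i (one or more rows of A).
        yield (key, np.asarray(rows) @ R_inv)   # Q_i = A_i R^{-1}
    return getQ_map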

Page 17: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Stability problems

- We can get Q = A R^{-1}.

- Problem: the columns of Q can be far from orthogonal, and forming A^T A squares the condition number (data later).

- Idea: use a more advanced algorithm.

Page 18: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Communication-avoiding TSQR

A = [A_1; A_2; A_3; A_4]
  = diag(Q_1, Q_2, Q_3, Q_4) · [R_1; R_2; R_3; R_4]
  = diag(Q_1, Q_2, Q_3, Q_4) · Q̃ · R
  = Q R

(Here [X_1; ...; X_4] denotes vertical stacking. With 2n × n blocks A_i, the block-diagonal matrix diag(Q_1, ..., Q_4) is 8n × 4n, the stacked R_i factors and Q̃ are 4n × n, R is n × n, and Q = diag(Q_1, ..., Q_4) · Q̃ is 8n × n.)

Demmel et al. 2008

Page 19: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Communication-avoiding TSQR

A = [A_1; A_2; A_3; A_4] = diag(Q_1, Q_2, Q_3, Q_4) · [R_1; R_2; R_3; R_4]

(The block-diagonal matrix of Q_i factors is 8n × 4n and the stacked R_i factors are 4n × n.)

The factorizations A_i = Q_i R_i can be computed in parallel. If we only need R, then we can throw out the intermediate Q_i factors.
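A serial numpy sketch of the R-only reduction with a single level of blocking (the real dataflow uses the map/shuffle/reduce stages on the next slide):

import numpy as np

def tsqr_R(A, num_blocks=4):
    """R factor of A via local QRs of row blocks followed by a QR of the stacked R_i."""
    blocks = np.array_split(A, num_blocks, axis=0)
    Rs = [np.linalg.qr(Ai)[1] for Ai in blocks]    # A_i = Q_i R_i; keep only R_i
    return np.linalg.qr(np.vstack(Rs))[1]          # QR of the stacked R_i gives R

A = np.random.rand(8000, 10)
R = tsqr_R(A)
print(np.allclose(R.T @ R, A.T @ A))               # R^T R = A^T A, up to roundoff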

Page 20: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

MapReduce TSQR

[Figure: MapReduce TSQR dataflow — each map task computes a local TSQR of its block A_i and emits R_i; a shuffle collects these rows into S^(1); reduce tasks compute local TSQRs and emit the R_{2,j} factors; after an identity map and a second shuffle, a final reduce computes a local TSQR of S^(2) and emits R.]

Figure: S^(1) is the matrix consisting of the rows of all of the R_i factors. Similarly, S^(2) consists of all of the rows of the R_{2,j} factors.

Page 21: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

MapReduce TSQR: Getting Q

- Again: have R, want Q

- A = Q R → Q = A R^{-1}

- We call this method Indirect TSQR.

- Problem: Q can be far from orthogonal (again).

Page 22: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Indirect TSQR: Iterative Refinement

Iterative refinement: repeat TSQR for a more orthogonal Q

[Figure: Indirect TSQR with one refinement step — first pass: distribute R^{-1}; each map task computes Q_i = A_i R^{-1} (local matmul) and emits it. A TSQR of the resulting Q gives R_1. Second pass: distribute R_1^{-1}; each map task computes Q_i ← Q_i R_1^{-1} (local matmul) and emits the refined Q.]
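A serial numpy sketch of one refinement step (the MapReduce version performs the two matrix multiplies block-by-block in the map tasks):

import numpy as np

A = np.random.rand(10000, 10)

# Indirect TSQR: compute R, then Q = A R^{-1}.
R = np.linalg.qr(A)[1]
Q = A @ np.linalg.inv(R)

# One iterative refinement step: re-factor Q and fold R_1 into R.
R1 = np.linalg.qr(Q)[1]
Q = Q @ np.linalg.inv(R1)
R = R1 @ R                    # A = Q (R_1 R) still holds exactly

print(np.linalg.norm(Q.T @ Q - np.eye(10)))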

Page 23: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Indirect TSQR: a randomized approach

- Idea: take a small sample of the rows of A and form R_s.

- Refinement step: Q_s = A R_s^{-1}; a TSQR of Q_s gives R_1; then Q = Q_s R_1^{-1}.

- R = R_1 R_s, and Q^T Q ≈ I even for ill-conditioned A.

- There is theory on why this works; need ≈ 100 n log n sampled rows [Mahoney 2011], [Avron, Maymounkov, and Toledo 2010], [Ipsen and Wentworth 2012].

We call this Pseudo-Iterative Refinement (PIR).

Page 24: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Pseudo-Iterative Refinement

[Figure: Pseudo-iterative refinement — form R_s from a TSQR of a sample of rows of A; distribute R_s^{-1} and let each map task compute Q_i = A_i R_s^{-1} (local matmul) to form Q_s; a TSQR of Q_s gives R_1; distribute R_1^{-1} and let each map task compute Q_i ← Q_i R_1^{-1} (local matmul).]

(In the implementation, combine the A R_s^{-1} multiply and the TSQR in one pass.)
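A serial numpy sketch of pseudo-iterative refinement (the uniform row sampling and sample size here are illustrative; see the cited papers for the precise sampling requirements):

import numpy as np

A = np.random.rand(100000, 10)
m, n = A.shape

# Form R_s from a QR of a small sample of rows.
num_samples = min(m, int(100 * n * np.log(n)))
sample = A[np.random.choice(m, num_samples, replace=False)]
Rs = np.linalg.qr(sample)[1]

# Q_s = A R_s^{-1}, then one refinement step.
Qs = A @ np.linalg.inv(Rs)
R1 = np.linalg.qr(Qs)[1]
Q = Qs @ np.linalg.inv(R1)
R = R1 @ Rs                   # A = Q (R_1 R_s) exactly

print(np.linalg.norm(Q.T @ Q - np.eye(n)))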

Page 25: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Direct TSQR

Why is computing a truly orthogonal Q difficult in MapReduce?

- Orthogonality is a global property, but we compute locally.

- We can only label data via keys and file names.

Page 26: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Communication-avoiding TSQR

A = [A_1; A_2; A_3; A_4]
  = diag(Q_1, Q_2, Q_3, Q_4) · [R_1; R_2; R_3; R_4]
  = diag(Q_1, Q_2, Q_3, Q_4) · [Q_{1,2}; Q_{2,2}; Q_{3,2}; Q_{4,2}] · R
  = [Q_1 Q_{1,2}; Q_2 Q_{2,2}; Q_3 Q_{3,2}; Q_4 Q_{4,2}] · R
  = Q R

(Dimensions: diag(Q_1, ..., Q_4) is 8n × 4n, the stacked R_i and Q_{i,2} factors are 4n × n, R is n × n, and Q is 8n × n.)

Page 27: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Gathering Q

[R_1; R_2; R_3; R_4] = [Q_{1,2}; Q_{2,2}; Q_{3,2}; Q_{4,2}] · R

(Both stacked factors have n · #(mappers) rows and n columns; R is n × n.)

- Idea: compute this QR factorization (n · #(mappers) rows) in serial.

- Idea: pass the Q_{i,2} factors (n rows each) in a second pass to reconstruct Q.

- We call this Direct TSQR.

Page 28: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Direct TSQR: Steps 1 and 2

[Figure: Direct TSQR, steps 1 and 2 — step 1 (map): each task computes A_i = Q_i R_i and emits both Q_i and R_i. Step 2 (reduce, after a shuffle): a single task factors the stacked R_i into the Q_{i,2} factors and the final R, emitting all of them.]

Page 29: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Direct TSQR: Step 3

[Figure: Direct TSQR, step 3 — the Q_{i,2} factors are distributed; each map task multiplies its stored Q_i by the matching Q_{i,2} and emits its block of Q.]
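A serial numpy simulation of the three steps (the real implementation writes the Q_i blocks out between steps and keys everything by block id; that bookkeeping is omitted here):

import numpy as np

def direct_tsqr(A, num_blocks=4):
    blocks = np.array_split(A, num_blocks, axis=0)

    # Step 1 (map): factor each block, keep both Q_i and R_i.
    Qs, Rs = zip(*(np.linalg.qr(Ai) for Ai in blocks))

    # Step 2 (reduce): QR of the stacked R_i gives the Q_{i,2} blocks and R.
    Q2, R = np.linalg.qr(np.vstack(Rs))
    splits = np.cumsum([Ri.shape[0] for Ri in Rs])[:-1]
    Q2_blocks = np.split(Q2, splits, axis=0)

    # Step 3 (map): each task multiplies its stored Q_i by its Q_{i,2}.
    Q = np.vstack([Qi @ Qi2 for Qi, Qi2 in zip(Qs, Q2_blocks)])
    return Q, R

A = np.random.rand(8000, 10)
Q, R = direct_tsqr(A)
print(np.allclose(Q @ R, A), np.linalg.norm(Q.T @ Q - np.eye(10)))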

Page 30: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Stability

[Figure: Numerical stability on 10,000 × 10 matrices — ‖Q^T Q − I‖_2 versus κ_2(A) for Cholesky QR, Cholesky QR + IR, Indirect TSQR, Indirect TSQR + IR, Indirect TSQR + PIR, and Direct TSQR.]

Page 31: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Performance model

- Only count data reads and writes.

- Use a streaming benchmark to measure the read and write bandwidth of the system.

- Predictions are within a factor of two of experimental data for all algorithms.

- I/O dominates the runtime.

- The algorithms take about the same time as a few passes over the data (see the sketch below).
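As a sketch of what such an I/O-only model looks like (the function and all numbers below are illustrative placeholders, not values from the paper):

def predicted_time(bytes_read, bytes_written, read_bw, write_bw):
    """I/O-only model: runtime is just the time to stream data in and out."""
    return bytes_read / read_bw + bytes_written / write_bw

# Hypothetical example: one pass over a ~190 GB matrix, writing a similar amount,
# with aggregate cluster bandwidths measured by a streaming benchmark.
print(predicted_time(190e9, 190e9, read_bw=2e9, write_bw=1e9), "seconds")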

Page 32: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Performance

[Figure: Performance of the QR algorithms on MapReduce — time to solution (seconds) for matrices of size 4B × 4 (134.6 GB), 2.5B × 10 (193.1 GB), 600M × 25 (112.0 GB), 500M × 50 (183.6 GB), and 150M × 100 (109 GB); methods: Chol, Indir TSQR, Chol + PIR, Indir TSQR + PIR, Chol + IR, Indir TSQR + IR, Direct TSQR.]

Page 33: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Direct TSQR: recursive extension

[R_1; R_2; R_3; R_4]  --TSQR-->  [Q_{1,2}; Q_{2,2}; Q_{3,2}; Q_{4,2}] · R

(Both stacked factors have n · #(mappers) rows and n columns; R is n × n.)

- If n · #(mappers) rows is itself too large, recurse: apply the same blocked factorization to the stacked R_i factors (a sketch follows below).
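A serial sketch of the recursion for the R factor only (the block size and threshold are illustrative assumptions; the actual recursive Direct TSQR also reconstructs Q):

import numpy as np

def recursive_tsqr_R(A, block_rows=2000, max_rows=4000):
    """R factor of A, recursing while the stacked R_i factors are still too tall."""
    if A.shape[0] <= max_rows:
        return np.linalg.qr(A)[1]
    num_blocks = max(2, A.shape[0] // block_rows)
    blocks = np.array_split(A, num_blocks, axis=0)
    Rs = np.vstack([np.linalg.qr(Ai)[1] for Ai in blocks])
    return recursive_tsqr_R(Rs, block_rows, max_rows)    # recurse on the stacked R_i

A = np.random.rand(100000, 10)
R = recursive_tsqr_R(A)
print(np.allclose(R.T @ R, A.T @ A))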

Page 34: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

Direct TSQR: recursive performance

[Figure: Running time (s) versus number of columns for Direct TSQR with and without recursion, for matrices with 150M, 100M, and 50M rows.]

Page 35: Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (IEEE BigData)

End

Contributions:

- Numerically stable and scalable QR

- Performance and stability tradeoffs

- Performance model

- Code: https://github.com/arbenson/mrtsqr

Contact:

- [email protected]