Making Static Pivoting Scalable and Dependable Ph.D. Dissertation Talk E. Jason Riedy [email protected] EECS Department University of California, Berkeley Committee: Dr. James Demmel (chair), Dr. Katherine Yelick, Dr. Sanjay Govindjee 17 December, 2010


Page 1: Making Static Pivoting Scalable and Dependable

Making Static Pivoting Scalable and Dependable

Ph.D. Dissertation Talk

E. Jason Riedy [email protected]

EECS Department, University of California, Berkeley

Committee: Dr. James Demmel (chair), Dr. Katherine Yelick, Dr. Sanjay Govindjee

17 December, 2010

Page 2: Making Static Pivoting Scalable and Dependable

Outline

1 Introduction

2 Solving Ax = b dependably

3 Extending dependability to static pivoting

4 Distributed matching for static pivoting

5 Summary

Jason Riedy (UCB) Static Pivoting 17 Dec, 2010 2 / 59

Page 3: Making Static Pivoting Scalable and Dependable

Motivation: Ever Larger Ax = b

Systems Ax = b are growing larger, more difficult:
- Omega3P: n = 7.5 million with τ = 300 million entries
- Quantum Mechanics: precondition with blocks of dimension 200-350 thousand
- Large barrier-based optimization problems: Many solves, similar structure, increasing condition number

Huge systems are generated, solved, and analyzed automatically.

Large, highly unsymmetric systems need scalable parallel solvers.

Low-level routines: No expert in the loop!

Page 4: Making Static Pivoting Scalable and Dependable

Motivation: Solving Ax = b better

Many people work to solve Ax = b faster. Today we start with how to solve it better.

Better enables faster.

Use extra floating-point precision within iterative refinement to obtain a dependable solution, adding O(n^2) work after an O(n^3) factorization.

Accelerate sparse factorization through static pivoting, decoupling symbolic, numeric phases.

Refine the perturbed solution without needing extra triangular solves for condition estimation.

Page 5: Making Static Pivoting Scalable and Dependable

Contributions

Iterative refinement
- Extend iterative refinement to provide small forward errors dependably (to be defined)
- Set and use a methodology to demonstrate dependability
- Show that condition estimation (expensive for sparse systems) is not necessary for obtaining a dependable solution

Static pivoting
- Improve static pivoting heuristics
- Demonstrate that an approximate maximum weight bipartite matching is faster and just as accurate
- Develop a memory-scalable distributed memory auction algorithm for static pivoting

Page 6: Making Static Pivoting Scalable and Dependable

Defining “dependable”

A dependable solver for Ax = b returns a result x with small error often enough that you expect success with a small error, and clearly signals results that likely contain large errors.

True error          Difficulty  Alg. reports  w/ likeliness
O(mach. precision)  not bad     success       Very likely
                                failure       Somewhat rare
larger              not bad     success       (not yet seen)
                                failure       Practically certain
O(mach. precision)  difficult   success       Whenever feasible
                                failure       Practically certain
larger              difficult   success       (not yet seen)
                                failure       Very likely

Page 7: Making Static Pivoting Scalable and Dependable

Introducing the errors and targets

[Figure: diagram relating the system (A, b), the computed solution y_1, and the true solution x = A^{-1} b.]

[Figure: two density plots of error against difficulty, shaded by percent of systems. Panel titles: "LU: Small backward error" and "LU: Error in y ∝ difficulty".]

Page 8: Making Static Pivoting Scalable and Dependable

Introducing the errors and targets

[Figure: the same diagram relating (A, b), y_1, and x = A^{-1} b, extended with the refined iterate y_k.]

Refined: Accepted with small errors in y, or flagged with unknown error.

[Figure: density plot of error against difficulty in two panels, Successful and Flagged, shaded by % of systems.]

Page 9: Making Static Pivoting Scalable and Dependable

Iterative refinement

Newton’s method applied to Ax = b.

Repeat until done:

1 Compute the residual r_i = b − A y_i using extra precision ε_r.

2 Solve A dy_i = r_i for the correction using working precision ε_w.

3 Increment y_{i+1} = y_i + dy_i, maintaining y to extra precision ε_x.

Precisions:

Working precision ε_w: The precision used for storing (and factoring) A: IEEE754 single (ε_w = 2^{-24}), double (ε_w = 2^{-53}), etc.

Residual precision ε_r: At least double working precision, ε_r ≤ ε_w^2

Solution precision ε_x: At least double working precision, ε_x ≤ ε_w^2

Latter two may be implemented in software.
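The three refinement steps can be sketched in a few lines. This is a minimal illustration and not the talk's implementation: Python doubles play the role of the working precision ε_w, exact rational arithmetic (`fractions.Fraction`) stands in for the extra precisions ε_r and ε_x, and the fixed iteration cap replaces the stopping criteria discussed later. The names `lu_factor`, `lu_solve`, and `refine` are hypothetical.

```python
from fractions import Fraction

def lu_factor(A):
    """LU with partial pivoting in working (double) precision; assumes A is nonsingular."""
    n = len(A)
    LU = [row[:] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perm

def lu_solve(LU, perm, b):
    """Forward and back substitution in working precision."""
    n = len(LU)
    y = [b[p] for p in perm]
    for i in range(n):
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] -= LU[i][j] * y[j]
        y[i] /= LU[i][i]
    return y

def refine(A, b, max_iter=10):
    """Iterative refinement; y is carried as exact rationals (stand-in for eps_x)."""
    n = len(A)
    LU, perm = lu_factor(A)
    y = [Fraction(v) for v in lu_solve(LU, perm, b)]
    for _ in range(max_iter):
        # Step 1: residual in "extra precision" (exact here, an assumption).
        r = [Fraction(b[i]) - sum(Fraction(A[i][j]) * y[j] for j in range(n))
             for i in range(n)]
        # Step 2: correction solved in working precision.
        dy = lu_solve(LU, perm, [float(ri) for ri in r])
        if all(d == 0.0 for d in dy):
            break
        # Step 3: increment, keeping y in extra precision.
        y = [yi + Fraction(d) for yi, d in zip(y, dy)]
    return y
```

For example, `refine([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` returns rationals that agree with the exact solution (1/11, 7/11) far beyond double precision.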

Page 10: Making Static Pivoting Scalable and Dependable

Definitions

Errors:
- Backward (relative) error
- Forward (relative) error

Difficulty:
- Condition numbers: sensitivity to perturbations
- Element growth: error from factorization

Page 11: Making Static Pivoting Scalable and Dependable

Error measures: Backward error

How close is the nearest system satisfying Ay1 = b?

[Figure: diagram relating the system (A, b), the computed solution y_1, and the true solution x = A^{-1} b.]

Three ways, given r_1 = b − A y_1:

Normwise: ‖r_1‖_∞ / (‖A‖_∞ ‖y_1‖_∞ + ‖b‖_∞)

Columnwise: ‖r_1‖_∞ / ((max |A|) |y_1| + ‖b‖_∞)

Componentwise: ‖ |r_1| / (|A| |y_1| + |b|) ‖_∞

Note: Elementwise division, 0/0 = 0, and max produces a row vector
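The three backward error measures can be computed directly from their definitions. The sketch below assumes a small dense real system stored as nested lists; the function name `backward_errors` and the dictionary keys are illustrative choices, borrowed from the plot labels nberr, colberr, and cberr used later in the talk.

```python
def backward_errors(A, y, b):
    """Normwise, columnwise, and componentwise backward errors of y for Ax = b."""
    n = len(A)
    r = [b[i] - sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
    norm_r = max(abs(v) for v in r)
    norm_A = max(sum(abs(a) for a in row) for row in A)  # infinity norm: max row sum
    norm_y = max(abs(v) for v in y)
    norm_b = max(abs(v) for v in b)

    def ratio(num, den):
        # Elementwise division with the 0/0 = 0 convention from the slide.
        if num == 0:
            return 0.0
        return num / den if den != 0 else float("inf")

    nberr = ratio(norm_r, norm_A * norm_y + norm_b)
    # max |A| is the row vector of column maxima; times |y| gives a scalar.
    colmax = [max(abs(A[i][j]) for i in range(n)) for j in range(n)]
    colberr = ratio(norm_r, sum(colmax[j] * abs(y[j]) for j in range(n)) + norm_b)
    cberr = max(
        ratio(abs(r[i]), sum(abs(A[i][j]) * abs(y[j]) for j in range(n)) + abs(b[i]))
        for i in range(n)
    )
    return {"nberr": nberr, "colberr": colberr, "cberr": cberr}
```

With A = [[2, 0], [0, 4]], y = [1, 1], and b = [2, 5], the residual is (0, 1), so the normwise and componentwise errors are both 1/9 and the columnwise error is 1/11.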

Page 12: Making Static Pivoting Scalable and Dependable

Error measures: Forward error

How close is y1 to x?

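[Figure: diagram relating the system (A, b), the computed solution y_1, and the true solution x = A^{-1} b.]

Two ways and two measuring sticks:

Normwise, relative to x: ‖y_1 − x‖_∞ / ‖x‖_∞

Componentwise, relative to x: ‖ (y_1 − x) / x ‖_∞

Normwise, relative to y_1: ‖y_1 − x‖_∞ / ‖y_1‖_∞

Componentwise, relative to y_1: ‖ (y_1 − x) / y_1 ‖_∞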
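A short sketch of the four forward error measures, assuming real vectors stored as lists and a known reference solution x. The function name and dictionary keys are hypothetical, and applying the 0/0 = 0 convention here is an assumption carried over from the backward error slide.

```python
def forward_errors(y, x):
    """Normwise and componentwise forward errors, measured against x and against y."""
    diff = [abs(yi - xi) for yi, xi in zip(y, x)]

    def ratio(num, den):
        # 0/0 = 0 convention (assumed to apply here as well).
        if num == 0:
            return 0.0
        return num / den if den != 0 else float("inf")

    return {
        "normwise_vs_x": ratio(max(diff), max(abs(v) for v in x)),
        "componentwise_vs_x": max(ratio(d, abs(v)) for d, v in zip(diff, x)),
        "normwise_vs_y": ratio(max(diff), max(abs(v) for v in y)),
        "componentwise_vs_y": max(ratio(d, abs(v)) for d, v in zip(diff, y)),
    }
```

Measuring against y is what a solver can do in practice, since x is unknown outside of test settings.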

Page 13: Making Static Pivoting Scalable and Dependable

Error sensitivity: Conditioning

How sensitive is y1 to perturbations in A and b?

[Figure: diagram relating the system (A, b), the computed solution y_1, and the true solution x = A^{-1} b.]

forward error ≤ condition number × backward error

Each combination has a condition number. We choose two for use in our difficulty measure.

Page 14: Making Static Pivoting Scalable and Dependable

Difficulty: condition number × element growth

Condition number:

Backward error: κ(A^{-1}) = κ(A) = ‖A^{-1}‖_∞ ‖A‖_∞

Normwise forw. err.: κ(A, x, b) = ‖A^{-1}‖_∞ (‖A‖_∞ ‖x‖_∞ + ‖b‖_∞)

Componentwise forw. err.: ccond(A, x, b) = ‖ |A^{-1}| (|A| |x| + |b|) ‖_∞

Element growth, est. δA_i in (A + δA_i) y = b:

|δA_i| ≤ 3 n_d |L| |U| ≤ p(n_d) g 1_r max |A|

We use a col.-scaling-indep. expression allowing |L| > 1,

g_c = max_j [ (max_{1≤k≤j} max_i |L(i, k)|) · (max_i |U(i, j)|) / (max_i |A(i, j)|) ]
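The column-scaling-independent growth factor g_c can be evaluated with one pass over the columns. The sketch assumes dense factors L and U of A (any row permutation already applied) stored as nested lists; the function name `column_growth` is hypothetical, and skipping all-zero columns of A is an assumption made only to avoid dividing by zero.

```python
def column_growth(L, U, A):
    """g_c = max_j (max_{k<=j} max_i |L(i,k)|) * (max_i |U(i,j)|) / (max_i |A(i,j)|)."""
    n = len(A)
    g = 0.0
    lmax = 0.0  # running maximum of |L| over columns 1..j
    for j in range(n):
        lmax = max(lmax, max(abs(L[i][j]) for i in range(n)))
        umax = max(abs(U[i][j]) for i in range(n))
        amax = max(abs(A[i][j]) for i in range(n))
        if amax > 0:  # assumption: ignore structurally zero columns
            g = max(g, lmax * umax / amax)
    return g
```

For A = [[1, 2], [3, 4]] factored without pivoting, L = [[1, 0], [3, 1]] and U = [[1, 2], [0, -2]], which gives g_c = 3 · 2 / 4 = 1.5.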

Page 15: Making Static Pivoting Scalable and Dependable

Dense test systems

30 × 30 single, double, complex, and double complex: 250k, 4 right-hand sides, 1M test systems

Size chosen to sample ill-conditioned region well

Generated as in Demmel, et al., plus b → x

κ_∞(A) = ‖A^{-1}‖_∞ ‖A‖_∞

[Figure: histograms of difficulty (2^0 to 2^70) against percent of population, one panel each for Single, Complex, Double, and Double Complex.]

Page 16: Making Static Pivoting Scalable and Dependable

Dense test systems

The same test systems, measured by

κ(A, x, b) = ‖A^{-1}‖_∞ (‖A‖_∞ ‖x‖_∞ + ‖b‖_∞)

[Figure: histograms of difficulty (2^0 to 2^70) against percent of population, one panel each for Single, Complex, Double, and Double Complex.]

Page 17: Making Static Pivoting Scalable and Dependable

Dense test systems

The same test systems, measured by

ccond(A, x, b) = ‖ |A^{-1}| (|A| |x| + |b|) ‖_∞

[Figure: histograms of difficulty (2^0 to 2^80) against percent of population, one panel each for Single, Complex, Double, and Double Complex.]

Page 18: Making Static Pivoting Scalable and Dependable

Results: Dependable errors

[Figure: error (2^{-60} to 2^0) against difficulty (2^0 to 2^40) after refinement, shaded by % of systems (10^{-5} to 10^{-2}). Columns: nberr, colberr, cberr, nferr, nferrx, cferr, cferrx. Rows: Converged, No Progress, Unstable, Iteration Limit.]

Page 19: Making Static Pivoting Scalable and Dependable

How?

[Figure: error (2^{-60} to 2^0) against difficulty (2^5 to 2^40) for cberr and cferr, shaded by % of systems.]

Carry the intermediate soln. y_i to twice the working precision.

Refine the backward error down to nearly ε_w^2.

By “forward error ≤ conditioning × backward error”, the forward error for well-enough conditioned problems is nearly ε_w.
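The last step can be spelled out as a one-line bound. This is a sketch that treats "nearly" as equality up to modest constants, which is an assumption rather than a statement from the slides; κ stands for whichever condition number matches the chosen error measure.

```latex
\[
\text{forward error}
\;\le\; \kappa \cdot \text{backward error}
\;\approx\; \kappa\,\varepsilon_w^{2}
\;\le\; \varepsilon_w
\quad\text{whenever}\quad
\kappa \le \frac{1}{\varepsilon_w}.
\]
```

Under that reading, "well-enough conditioned" in IEEE754 double (ε_w = 2^{-53}) would mean κ up to roughly 2^{53}.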


Page 21: Making Static Pivoting Scalable and Dependable

Results: Comparison with xGESVXX

Precision        Accepted, well  Accepted, ill  Rejected, well  Rejected, ill
Single           79%             15%            1%              5%
Single complex   76%             19%            1%              4%
Double           87%             9%             1%              5%
Double complex   85%             11%            1%              3%

Accepted, ill-conditioned systems are those gained by our routine that xGESVXX rejects.

Rejected, well-conditioned systems are those lost by our routine but accepted by xGESVXX.

Page 22: Making Static Pivoting Scalable and Dependable

Results: Iteration counts, single precision

[Figure: number of iterations (5 to 30) against difficulty (2^0 to 2^40), shaded by % of systems. Columns: nberr, colberr, cberr, ndx, cdx. Rows: Converged, No Progress, Unstable, Iteration Limit.]

Set limit at five.

Page 23: Making Static Pivoting Scalable and Dependable

Results: Iteration counts, single complex precision

[Figure: number of iterations (5 to 30) against difficulty (2^0 to 2^35), shaded by % of systems. Columns: nberr, colberr, cberr, ndx, cdx. Rows: Converged, No Progress, Unstable, Iteration Limit.]

Set limit at seven.

Page 24: Making Static Pivoting Scalable and Dependable

Results: Iteration counts, double precision

[Figure: number of iterations (5 to 30) against difficulty (2^0 to 2^60), shaded by % of systems. Columns: nberr, colberr, cberr, ndx, cdx. Rows: Converged, No Progress, Unstable, Iteration Limit.]

Set limit at ten.

Page 25: Making Static Pivoting Scalable and Dependable

Results: Iteration counts, double complex precision

[Figure: number of iterations (5 to 30) against difficulty (2^0 to 2^60), shaded by % of systems. Columns: nberr, colberr, cberr, ndx, cdx. Rows: Converged, No Progress, Unstable, Iteration Limit.]

Set limit at 15.

Page 26: Making Static Pivoting Scalable and Dependable

Static pivoting

If a pivot |A(j, j)| < T, perturb up to T by adding sign(A(j, j)) · (T − |A(j, j)|).

Forcibly increases backward error, decreases element growth

In sparse systems, few updates should occur to an entry.

Large diagonal entries should remain large...

Thresholding heuristics

SuperLU: γ · ‖A‖_1
column-relative: γ · max |A(:, j)|
diagonal-relative: γ · |A(j, j)|

γ = 2^{-26} ≈ √ε_w, 2^{-38}, or 2^{-43} = 2^{10} ε_w
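The perturbation rule and the three thresholds are small enough to sketch directly. The code assumes a dense real matrix as nested lists and evaluates the thresholds on the original entries of A; the function names are hypothetical, and treating a zero pivot as positive is an assumption (the slide does not say how sign(0) is handled).

```python
import math

def thresholds(A, j, gamma):
    """The three candidate thresholds T for column j."""
    n = len(A)
    one_norm = max(sum(abs(A[i][k]) for i in range(n)) for k in range(n))  # max column sum
    return {
        "superlu": gamma * one_norm,
        "column_relative": gamma * max(abs(A[i][j]) for i in range(n)),
        "diagonal_relative": gamma * abs(A[j][j]),
    }

def perturb_pivot(pivot, T):
    """Return the pivot after static pivoting with threshold T.

    Adding sign(pivot) * (T - |pivot|) to a tiny pivot yields sign(pivot) * T,
    which copysign produces without rounding error.
    """
    if abs(pivot) < T:
        return math.copysign(T, pivot)  # assumption: a zero pivot counts as positive
    return pivot
```

For instance, with T = 1e-8 a pivot of -1e-20 becomes -1e-8, while a pivot of 0.5 is left alone.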

Page 27: Making Static Pivoting Scalable and Dependable

Sparse test systems

Matrices are from the UF Collection, chosen from existing comparisons between SuperLU, MUMPS, and UMFPACK.
- Wide range of conditioning and numerical scaling

Compute “True” solutions using a doubled-double-extended factorization and quad-double-extended refinement with a modified TAUCS.

Refinement uses LAPACK-style numerical scaling throughout, but the test systems are generated in the matrix’s given scaling.

Also tested on singular systems; no solutions accepted.

At some point, plan on feeding the “true” solutions into the UF Collection...

Page 28: Making Static Pivoting Scalable and Dependable

Sparse normwise conditioning

[Figure: histogram of difficulty (2^10 to 2^50) against percent of population (0% to 8%).]

Page 29: Making Static Pivoting Scalable and Dependable

Sparse componentwise conditioning

[Figure: histogram of difficulty (2^20 to 2^60) against percent of population (0% to 8%).]

Page 30: Making Static Pivoting Scalable and Dependable

Results: SuperLU perturbation heuristic

Before refinement, by max. perturbation amount

[Figure: error / sqrt(max row deg.) (2^{-60} to 2^0) against difficulty (2^0 to 2^60), shaded by % of systems. Columns: nberr, colberr, cberr, nferr, nferrx, cferr, cferrx. Rows by perturbation level: 2^10 * eps, 2^-12 * sqrt(eps), sqrt(eps).]

Page 31: Making Static Pivoting Scalable and Dependable

Results: Column-relative perturbation heuristic

Before refinement, by max. perturbation amount

[Figure: error / sqrt(max row deg.) (2^{-60} to 2^0) against difficulty (2^0 to 2^60), shaded by % of systems. Columns: nberr, colberr, cberr, nferr, nferrx, cferr, cferrx. Rows by perturbation level: 2^10 * eps, 2^-12 * sqrt(eps), sqrt(eps).]

Page 32: Making Static Pivoting Scalable and Dependable

Results: Diagonal-relative perturbation heuristic

Before refinement, by max. perturbation amount

[Figure: error / sqrt(max row deg.) (2^{-60} to 2^0) against difficulty (2^0 to 2^60), shaded by % of systems. Columns: nberr, colberr, cberr, nferr, nferrx, cferr, cferrx. Rows by perturbation level: 2^10 * eps, 2^-12 * sqrt(eps), sqrt(eps).]

Page 33: Making Static Pivoting Scalable and Dependable

Results: SuperLU perturbation heuristic

After refinement, with γ = 2^{-43} = 2^{10} ε_w

Page 34: Making Static Pivoting Scalable and Dependable

Results: Column-relative perturbation heuristic

After refinement, with γ = 2^{-43} = 2^{10} ε_w

Page 35: Making Static Pivoting Scalable and Dependable

Results: Diagonal-relative perturbation heuristic

After refinement, with γ = 2^{-43} = 2^{10} ε_w

Page 36: Making Static Pivoting Scalable and Dependable

Results

Level and heuristic      Trust both  Trust nwise  Reject
2^{-43} = 2^{10} · ε_f
  SuperLU                42.9%       8.0%         49.0%
  Column-relative        55.7%       5.7%         38.6%
  Diagonal-relative      55.8%       5.9%         38.3%
2^{-38} ≈ 2^{-12} · √ε_f
  SuperLU                36.6%       6.7%         56.6%
  Column-relative        52.4%       6.5%         41.2%
  Diagonal-relative      53.7%       7.2%         39.1%
2^{-26} ≈ √ε_f
  SuperLU                32.4%       4.0%         63.6%
  Column-relative        42.2%       4.2%         53.6%
  Diagonal-relative      47.4%       4.7%         47.9%

Page 37: Making Static Pivoting Scalable and Dependable

Sparse Matrix to Bipartite Graph to Pivots

[Figure: a 4 × 4 sparse matrix, its bipartite graph between Rows 1-4 and Cols 1-4, and the matrix with rows permuted so the matched entries become pivots.]

Bipartite model
- Each row and column is a vertex.
- Each explicit entry is an edge.
- Want to choose “largest” entries for pivots.
- Maximum weight complete bipartite matching: linear assignment problem

Page 38: Making Static Pivoting Scalable and Dependable

Mathematical Form

“Just” a linear optimization problem:

B: n × n matrix of benefits in ℝ ∪ {−∞}, often c + log_2 |A|
X: n × n permutation matrix: the matching
p_r, π_c: dual variables, will be price and profit
1_r, 1_c: unit entry vectors corresponding to rows, cols

Lin. assignment prob.

maximize_{X ∈ ℝ^{n×n}} Tr B^T X
subject to X 1_c = 1_r, X^T 1_r = 1_c, and X ≥ 0.

Dual problem

minimize_{p_r, π_c} 1_r^T p_r + 1_c^T π_c
subject to p_r 1_c^T + 1_r π_c^T ≥ B.

Page 39: Making Static Pivoting Scalable and Dependable

Mathematical Form

The same linear assignment problem as on the previous slide, with the dual problem in implicit form:

minimize_{p_r} 1_r^T p_r + Σ_{j ∈ C} max_{i ∈ R} (B(i, j) − p_r(i)).
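The implicit form says the profits never need to be stored: given row prices, each column's profit is the best value it can obtain. A minimal sketch, assuming a dense benefit matrix as nested lists (with `-math.inf` for missing entries) and finite prices; the function names are hypothetical.

```python
def implicit_profits(B, price):
    """pi_c(j) = max_i (B(i, j) - p_r(i)) for every column j."""
    n = len(B)
    return [max(B[i][j] - price[i] for i in range(n)) for j in range(n)]

def dual_value(B, price):
    """Implicit dual objective: sum of prices plus sum of implied profits."""
    return sum(price) + sum(implicit_profits(B, price))

def primal_value(B, row_of_col):
    """Tr B^T X for the matching that pairs column j with row row_of_col[j]."""
    return sum(B[i][j] for j, i in enumerate(row_of_col))
```

For B = [[3, 1], [1, 2]] with zero prices, the dual value is 3 + 2 = 5, which equals the primal value of the diagonal matching, so that matching is optimal. Any other matching, such as the anti-diagonal with value 2, stays below the dual value.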

Page 40: Making Static Pivoting Scalable and Dependable

Do We Need a Special Method?

The LAP:

maximize_{X ∈ ℝ^{n×n}} Tr B^T X
subject to X 1_c = 1_r, X^T 1_r = 1_c, and X ≥ 0.

Standard form:

min_x c^T x
subject to A x = 1_{r+c}, and x ≥ 0.

A: 2n × τ vertex-edge matrix

Network optimization kills simplex methods.
- (“Smoothed analysis” does not apply.)

Interior point algs need to round the solution.
- (And need to solve Ax = b for a much larger A, although theoretically great in NC.)

Combinatorial methods should be faster.
- (But unpredictable!)

Page 41: Making Static Pivoting Scalable and Dependable

Properties from Optimization

Complementary slackness

X ⊙ (p_r 1_c^T + 1_r π_c^T − B) = 0.

If (i, j) is in the matching (X(i, j) ≠ 0), then p_r(i) + π_c(j) = B(i, j).

Used to choose matching edges and modify dual variables in combinatorial algorithms.

Page 42: Making Static Pivoting Scalable and Dependable

Properties from Optimization

Relaxed problem

Introduce a parameter µ, two interpretations:
- from a barrier function related to X ≥ 0, or
- from the auction algorithm (later).

Then

Tr B^T X* ≤ 1_r^T p_r + 1_c^T π_c ≤ Tr B^T X* + (n − 1)µ,

or the computed dual value (and hence computed primal matching) is within (n − 1)µ of the optimal primal.

Very useful for finding approximately optimal matchings.

Feasibility bound

Starting from zero prices:

p_r(i) ≤ (n − 1)(µ + finite range of B)

Page 43: Making Static Pivoting Scalable and Dependable

Algorithms for Solving the LAP

Goal: A parallel algorithm that justifies buying big machines.

Acceptable: A distributed algorithm; matrix is on many nodes.

Choices:
- Simplex or continuous / interior-point
  - Plain simplex blows up, network simplex difficult to parallelize.
  - Rounding for interior point often falls back on matching.
  - (Optimal IP algorithm: Goldberg, Plotkin, Shmoys, Tardos. Needs factorization.)
- Augmenting-path based (MC64: Duff and Koster)
  - Based on depth- or breadth-first search.
  - Both are P-complete, inherently sequential (Greenlaw, Reif).
- Auctions (Bertsekas, et al.)
  - Only length-1 or -2 alternating paths; global sync for duals.

Page 44: Making Static Pivoting Scalable and Dependable

Auction Algorithms

Discussion will be column-major.

General structure:

1 Each unmatched column finds the “best” row, places a bid.
  - The dual variable p_r holds the prices.
  - The profit π_c is implicit. (No significant FP errors!)
  - Each entry’s value: benefit B(i, j) − price p(i).
  - A bid maximally increases the price of the most valuable row.

2 Bids are reconciled.
  - Highest proposed price wins, forms a match.
  - Loser needs to re-bid.
  - Some versions need tie-breaking; here least column.

3 Repeat.
  - Eventually everyone will be matched, or
  - some price will be too high.

Seq. implementation in ∼40–50 lines, can compete with MC64

Some corner cases to handle...
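The general structure above can be sketched as a sequential, one-bid-at-a-time auction. This is an illustration under stated assumptions, not the talk's implementation: the benefit matrix is dense (with `-math.inf` marking missing entries), bids are processed one column at a time so no separate reconciliation step is needed, and a perfect matching is assumed to exist. The function name `auction` and the `max_bids` safety cap are hypothetical; the cap replaces the feasibility test on prices.

```python
import math

def auction(B, mu, max_bids=1_000_000):
    """Column-major auction. Returns (row_of_col, price) for benefit matrix B."""
    n = len(B)
    price = [0.0] * n          # dual variable p_r, one price per row
    col_of_row = [None] * n
    row_of_col = [None] * n
    unmatched = list(range(n))
    bids = 0
    while unmatched:
        bids += 1
        if bids > max_bids:
            raise RuntimeError("bid cap reached; the problem may be infeasible")
        j = unmatched.pop()
        best_i, best_v, second_v = None, -math.inf, -math.inf
        for i in range(n):
            v = B[i][j] - price[i]   # value = entry - price
            if v > best_v:
                best_i, best_v, second_v = i, v, best_v
            elif v > second_v:
                second_v = v
        if best_i is None:
            raise ValueError("column %d has no affordable row" % j)
        if second_v == -math.inf:
            price[best_i] = math.inf  # single candidate row: infinite price
        else:
            price[best_i] += best_v - second_v + mu  # mu avoids bidding loops
        loser = col_of_row[best_i]
        if loser is not None:         # the previous owner must re-bid
            row_of_col[loser] = None
            unmatched.append(loser)
        col_of_row[best_i] = j
        row_of_col[j] = best_i
    return row_of_col, price
```

With B = [[5, 5], [1, 0]] and µ = 0.1, both columns prefer row 0; the auction ends with column 0 on row 1 and column 1 on row 0, total benefit 6, which is the optimum. By the (n − 1)µ bound, any result is within 0.1 of optimal here.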

Page 45: Making Static Pivoting Scalable and Dependable

The Bid-Finding Loop

For each unmatched column:
- value = entry − price
- Save largest and second-largest
- Bid price incr: diff. in values

[Figure: one sparse column shown as row indices and row entries, alongside the price vector.]

Differences from sparse matrix-vector products
- Not all columns, rows used every iteration. (sparse matrix, sparse vector)
- Hence output price updates are scattered.
- More local work per entry

Page 46: Making Static Pivoting Scalable and Dependable

The Bid-Finding Loop

Little points
- Increase bid price by µ to avoid loops
  - Needs care in floating-point for small µ.
- Single adjacent row → ∞ price
  - Affects feasibility test, computing dual

Page 47: Making Static Pivoting Scalable and Dependable

Termination

Once a row is matched, it stays matched.
- A new bid may swap it to another column.
- The matching (primal) increases monotonically.

Prices only increase.
- The dual does not change when a row is newly matched.
- But the dual may decrease when a row is taken.
- The dual decreases monotonically.

Subtle part: If the dual doesn’t decrease...
- It’s ok. Can show the new edge begins an augmenting path that increases the matching or an alternating path that decreases the dual.

Page 48: Making Static Pivoting Scalable and Dependable

Successive Approximation (µ-scaling)

Simple auctions aren’t really competitive with Mc64.

Start with a rough approximation (large µ) and refine.

Called ε-scaling in the literature, but µ-scaling is better.

Preserve the prices pr at each step, but clear the matching.

Note: Do not clear matches associated with ∞ prices!

Equivalent to finding diagonal scaling D_r A D_c and matching again on the new B.

Problem: Performance strongly depends on initial scaling.

Also depends strongly on hidden parameters.

Page 49: Making Static Pivoting Scalable and Dependable

Sequential performance: Auction v. MC64

Group          Name         Auction (s)  MC64 (s)  MC64 / Auction
Bai            af23560      0.025        0.017     0.68
FEMLAB         poisson3Db   0.014        0.040     2.74
FIDAP          ex11         0.060        0.015     0.26
GHS_indef      cont-300     0.007        0.019     2.89
GHS_indef      ncvxqp5      0.338        0.794     2.35
Hamm           scircuit     0.048        0.024     0.50
Hollinger      g7jac200     0.355        0.817     2.30
Mallya         lhr14        0.044        0.026     0.60
Schenk_IBMSDS  3D_51448_3D  0.031        0.010     0.33
Schenk_IBMSDS  matrix_9     0.074        0.024     0.33
Schenk_ISEI    barrier2-4   0.291        0.044     0.15
Vavasis        av41092      5.462        3.595     0.66
Zhao           Zhao2        1.041        3.237     3.11

Page 50: Making Static Pivoting Scalable and Dependable

Sequential performance: Highly variable

Group          Name         By col (s)  By row (s)  Row / Col
Bai            af23560      0.025       0.028       1.13
FEMLAB         poisson3Db   0.014       0.016       1.11
FIDAP          ex11         0.060       0.060       1.00
GHS_indef      cont-300     0.007       0.006       0.84
GHS_indef      ncvxqp5      0.338       0.318       0.94
Hamm           scircuit     0.048       0.047       0.99
Hollinger      g7jac200     0.355       0.339       0.95
Mallya         lhr14        0.044       0.065       1.47
Schenk_IBMSDS  3D_51448_3D  0.031       0.282       9.22
Schenk_IBMSDS  matrix_9     0.074       0.613       8.29
Schenk_ISEI    barrier2-4   0.291       0.193       0.66
Vavasis        av41092      5.462       4.083       0.75
Zhao           Zhao2        1.041       0.609       0.58

Page 51: Making Static Pivoting Scalable and Dependable

Sequential performance: Highly variable

Group          Name         Float (s)  Int (s)  Int / Float
Bai            af23560      0.025      0.040    1.61
FEMLAB         poisson3Db   0.015      0.016    1.08
FIDAP          ex11         0.060      0.029    0.49
GHS_indef      cont-300     0.007      0.006    0.91
GHS_indef      ncvxqp5      0.338      0.425    1.26
Hamm           scircuit     0.048      0.016    0.34
Hollinger      g7jac200     0.355      1.004    2.83
Mallya         lhr14        0.044      0.050    1.12
Schenk_IBMSDS  3D_51448_3D  0.031      0.020    0.66
Schenk_IBMSDS  matrix_9     0.074      0.066    0.89
Schenk_ISEI    barrier2-4   0.291      0.261    0.91
Vavasis        av41092      5.462      5.401    0.99
Zhao           Zhao2        1.041      2.269    2.18

Page 52: Making Static Pivoting Scalable and Dependable

Approximately maximum matchings

Terminal µ value:     0        5.96e-08  2.44e-04  5.00e-01
af23560     Primal    1342850  1342850   1342850   1342670
            Time(s)   0.14     0.05      0.03      0
            ratio              0.37      0.21      0.02
poisson3Db  Primal    2483070  2483070   2483070   2483070
            Time(s)   0.02     0.02      0.02      0.02
            ratio              1.01      1.04      1.07
g7jac200    Primal    3533980  3533980   3533980   3533340
            Time(s)   2.98     1.07      0.28      0.18
            ratio              0.36      0.09      0.06
av41092     Primal    3156210  3156210   3156210   3155920
            Time(s)   24.51    8.09      2.48      0.11
            ratio              0.33      0.10      0.00
Zhao2       Primal    333891   333891    333891    333487
            Time(s)   7.69     2.37      3.65      0.02
            ratio              0.31      0.47      0.00

Page 53: Making Static Pivoting Scalable and Dependable

Setting / Lowering Parallel Expectations

Performance scalability?

Originally proposed (early 1990s) when cpu speed ≈ memory speed ≈ network speed ≈ slow.

Now: cpu speed ≫ memory latency > network latency.

The number of communication phases dominates matching algorithms (auction and others).

Communication patterns are very irregular.

Latency and software overhead are not improving...

Scaled back goal

It suffices to not slow down much on distributed data.

Page 54: Making Static Pivoting Scalable and Dependable

Basic Idea: Run Local Auctions, Treat as Bids

[Figure: the benefit matrix B sliced into pieces held by processors P1, P2, and P3.]

Slice the matrix into pieces, run local auctions.

The winning local bids are the slices’ bids.

Merge... (“And then a miracle occurs...”)

Need to keep some data in sync for termination.

Page 55: Making Static Pivoting Scalable and Dependable

Basic Idea: Run Local Auctions, Treat as Bids

[Figure: the benefit matrix B sliced into pieces held by processors P1, P2, and P3.]

Practically memory scalable: Compact the local pieces.

Have not experimented with simple SMP version.
- Sequential performance is limited by the memory system.

Note: Could be useful for multicore w/local memory.

Page 56: Making Static Pivoting Scalable and Dependable

Speed-up?

[Figure: speed-up (10^{-3} to 10^4, log scale) against number of processors (5 to 20).]

Page 57: Making Static Pivoting Scalable and Dependable

Speed-up: A bit better measuring appropriately

[Figure: speed-up relative to reducing to the root node (10^{-3} to 10^4, log scale) against number of processors (5 to 20).]

Page 58: Making Static Pivoting Scalable and Dependable

Comparing distributed with reduce-to-root

[Figure: speed-up (10^{-3} to 10^4, log scale) against number of processors (2, 3, 4, 8, 12, 16, 24), comparing two series: To root and Dist.]

Page 59: Making Static Pivoting Scalable and Dependable

Iteration order still matters

[Figure: time in seconds (10^{-1} to 10^2, log scale) against number of processors (5 to 20) for matrices av41092 and shyy161, comparing iteration direction: Row-major and Col-major.]

Page 60: Making Static Pivoting Scalable and Dependable

Many different speed-up profiles

[Figure: time in seconds (10^{-4} to 10^1, log scale) against number of processors (5 to 20), one panel each for matrices af23560, garon2, bmwcra_1, and stomach.]

Page 61: Making Static Pivoting Scalable and Dependable

So what happens in some cases?

Matrix av41092 has one large strongly connected component.I (The square blocks in a Dulmage-Mendelsohn decomposition.)

The SCC spans all the processors.

Every edge in an SCC is a part of some complete matching.

Horrible performance from:I starting along a non-max-weight matching,I making it almost complete,I then an edge-by-edge search for nearby matchings,I requiring a communication phase almost per edge.

Conjecture: This type of performance land-mine will affect any 0-1 combinatorial algorithm.
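For orientation, here is a minimal sequential sketch of a forward auction on a small dense problem. It is not the distributed sparse implementation from the talk. The function name `auction_matching`, the dense list-of-lists `benefit` input, and the fixed `eps` are assumptions made for illustration. Each eviction in the loop corresponds to undoing an earlier local decision, which is the step that costs a communication phase once rows live on different processors.

```python
def auction_matching(benefit, eps):
    """Forward auction for a dense n-by-n assignment problem (illustrative).

    benefit[i][j] is the gain from matching row i to column j.
    Returns match, where match[i] is the column assigned to row i.
    The total benefit is within n * eps of the maximum.
    """
    n = len(benefit)
    price = [0.0] * n      # one price per column
    owner = [None] * n     # owner[j]: row currently holding column j
    match = [None] * n     # match[i]: column currently held by row i
    unmatched = list(range(n))
    while unmatched:
        i = unmatched.pop()
        # Find the best and second-best net values (benefit minus price).
        best_j, best, second = None, float("-inf"), float("-inf")
        for j in range(n):
            value = benefit[i][j] - price[j]
            if value > best:
                best_j, second, best = j, best, value
            elif value > second:
                second = value
        if n == 1:
            second = best  # no competing column to bid against
        # Bid: raise the price by the value gap plus eps.
        # eps > 0 guarantees every bid makes progress.
        price[best_j] += best - second + eps
        # Evict the previous owner, who must bid again later.
        prev = owner[best_j]
        if prev is not None:
            match[prev] = None
            unmatched.append(prev)
        owner[best_j] = i
        match[i] = best_j
    return match
```

With integer benefits and `eps` below 1/n the result is an exact maximum-weight matching. A larger `eps` trades accuracy for fewer bids, which is one way to obtain an approximate matching. For static pivoting the benefits would typically be derived from entry magnitudes such as log|a_ij|; that choice is an assumption here, not something this slide states.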


Page 62: Making Static Pivoting Scalable and Dependable

Improvements?

Approximate matchings: Speeds up the sequential case, eliminating any “speed-up.”

Rearranging deck chairs: few-to-few communication
  - Build a directory of which nodes share rows: collapsed BB^T.
  - Send only to/from those neighbors.
  - Minor improvement over MPI_Allgatherv for a huge effort.
  - Latency not a major factor...

Improving communication may not be worth it...
  - The real problem is the number of comm. phases.
  - If diagonal is the matching, everything is overhead.
  - Or if there’s a large SCC...

Another alternative: Multiple algorithms at once.
  - Run Bora Ucar’s alg. on one set of nodes, auction on another, transposed auction on another, ...
  - Requires some painful software engineering.
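The few-to-few directory can be sketched as follows. The sketch assumes that B is a process-by-row incidence matrix, so the off-diagonal nonzero pattern of BB^T says which processes share at least one row. The name `neighbor_directory` and the input layout `rows_by_proc` are hypothetical.

```python
from collections import defaultdict


def neighbor_directory(rows_by_proc):
    """Map each process to the set of other processes sharing a row with it.

    rows_by_proc[p] is the set of row indices touched by process p, i.e.
    the nonzero pattern of row p of the incidence matrix B. The result is
    the off-diagonal nonzero pattern of B * B^T.
    """
    # Invert the map: for each row, which processes touch it.
    procs_by_row = defaultdict(set)
    for p, rows in enumerate(rows_by_proc):
        for r in rows:
            procs_by_row[r].add(p)
    # Every pair of processes touching the same row are neighbors.
    neighbors = [set() for _ in rows_by_proc]
    for procs in procs_by_row.values():
        for p in procs:
            neighbors[p].update(procs - {p})
    return neighbors
```

Each process would then exchange updates only with the processes in its own entry, rather than taking part in an all-to-all gather.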


Page 63: Making Static Pivoting Scalable and Dependable

Latency not a dominating factor

[Plot: speed-up relative to reducing to the root node (log scale, 10^-1 to 10^3) for layouts (number of nodes x number of procs. per node) 1x3, 3x1, 1x8, 2x4.]


Page 64: Making Static Pivoting Scalable and Dependable

So, Could This Ever Be Parallel?

For a given matrix-processor layout, constructing a matrix requiring O(n) communication is pretty easy for combinatorial algorithms.

  - Force almost every local action to be undone at every step.
  - Non-fractional combinatorial algorithms are too restricted.

Using less-restricted optimization methods is promising, but far slower sequentially.

  - Existing algs (Goldberg, et al.) are PRAM with n^3 processors.
  - General purpose methods: Cutting planes, successive SDPs
  - Someone clever might find a parallel rounding algorithm.
  - Solving the fractional LAP quickly would become a matter of finding a magic preconditioner...
  - Maybe not a good thing for a direct method?


Page 65: Making Static Pivoting Scalable and Dependable

Review of contributions

Iterative refinement

Successfully deliver dependable solutions with a little extra precision.

Removed need for condition estimation.

Built methodology for evaluating Ax = b solution methods’ accuracy and dependability.
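A minimal sketch of the refinement idea: factor once in working precision, then repeatedly solve for a correction against a residual computed with extra precision. This is not the talk's algorithm. It uses a fixed step count instead of the stopping criteria behind the bounds, keeps the solution in working precision, and stands in exact rational arithmetic (`fractions.Fraction`) for the extra-precise residual. All function names are hypothetical.

```python
from fractions import Fraction


def lu_factor(a):
    """LU factorization with partial pivoting in working (double) precision."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the largest-magnitude pivot in column k and swap rows.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A y = b using the factors from lu_factor."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):                 # forward substitution, unit lower L
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):       # back substitution with U
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def residual_extra(a, x, b):
    """r = b - A x accumulated exactly, then rounded once to double."""
    r = []
    for row, bi in zip(a, b):
        ax = sum(Fraction(aij) * Fraction(xj) for aij, xj in zip(row, x))
        r.append(float(Fraction(bi) - ax))
    return r


def refine(a, b, steps=5):
    """Iterative refinement: one factorization, several cheap corrections."""
    lu, perm = lu_factor(a)
    x = lu_solve(lu, perm, b)
    for _ in range(steps):
        r = residual_extra(a, x, b)    # extra-precise residual
        dy = lu_solve(lu, perm, r)     # solve A dy = r with the same factors
        x = [xi + di for xi, di in zip(x, dy)]
    return x
```

Each refinement step reuses the factors, so it costs only triangular solves and a matrix-vector product on top of the factorization.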

Static pivoting

Tuned static pivoting heuristics to provide dependability.

Demonstrated that an approximate maximum weight bipartite matching is faster and just as dependable.

Developed a memory-scalable (although not performance-scalable) distributed memory auction algorithm for static pivoting.

Page 66: Making Static Pivoting Scalable and Dependable

Future directions

Iterative refinement

Least-squares refinement demonstrated (Demmel, Hida, Li, & Riedy), but needs... refinement.

Perhaps refinement could render an iterative method dependable. Could improve accuracy of A dy_i = r_i with extra iterations as i increases.

Could help build trust in new methods (e.g. CALU).

Distributed matching

Interesting software problem: Run multiple algorithms on portions of a parallel allotment. How do you signal the others to terminate?

Interesting algorithm problem: Is there an efficient rounding method for fractional / interior point algorithms?
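On a single shared-memory machine, the software problem of racing several algorithms has a simple shape, sketched below with threads and a shared flag. This is only an analogy for the distributed case, where the flag would have to become a message between groups of processes. The names `race`, `scan_forward`, and `scan_backward` are hypothetical, and the two toy solvers merely stand in for different matching algorithms.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def race(solvers, problem):
    """Run every solver on the same problem; return (name, result) of the first to finish.

    Each solver takes (problem, stop) and should poll stop.is_set() between
    units of work, returning None if it was asked to quit early.
    """
    stop = threading.Event()
    winner = None
    with ThreadPoolExecutor(max_workers=len(solvers)) as pool:
        futures = {pool.submit(s, problem, stop): s.__name__ for s in solvers}
        for fut in as_completed(futures):
            result = fut.result()
            if result is not None and winner is None:
                winner = (futures[fut], result)
                stop.set()  # signal the remaining solvers to terminate
    return winner


def scan_forward(values, stop):
    """Toy solver: find the maximum scanning left to right."""
    best = values[0]
    for v in values:
        if stop.is_set():
            return None
        best = max(best, v)
    return best


def scan_backward(values, stop):
    """Toy solver: find the maximum scanning right to left."""
    best = values[-1]
    for v in reversed(values):
        if stop.is_set():
            return None
        best = max(best, v)
    return best
```

Which solver wins is timing dependent; only the returned value is deterministic. Cooperative polling keeps the losers from being killed mid-update, at the cost of each algorithm needing explicit check points.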

Page 67: Making Static Pivoting Scalable and Dependable

Thank you!


Page 68: Making Static Pivoting Scalable and Dependable

Bounds

Backward error

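$$\|D_i^{-1} r_i\|_\infty \le (c - \rho)^{-1} \left(3(n_d + 1)\varepsilon_r + \varepsilon_x\right)$$

Here $n_d$ is an expression of size, $c$ is the upper bound on per-iteration decrease, and $\rho$ is a safety factor for the region around $1/\varepsilon_w$.

Forward error

$$\|D_i^{-1} e_i\|_\infty \lesssim 2\left(4 + \rho(n_d + 1)\right)\varepsilon_w \cdot (c - \rho)^{-1}$$

Assuming $\varepsilon_r \le \varepsilon_w^2$, $\varepsilon_x \le \varepsilon_w^2$. Using only one precision, $\varepsilon_r = \varepsilon_x = \varepsilon_w$,

$$(c - \rho)\,\|D_i^{-1} e_i\|_\infty \lesssim 2\left(5 + 2(n_d + 1)\,\mathrm{ccond}(A, y_i)\right)\varepsilon_d.$$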
