
Advances in high accuracy matrix computations

Zlatko Drmac
Department of Mathematics, University of Zagreb, Croatia

Advances in Numerical Linear Algebra: Celebrating the Centenary of the Birth of James H. Wilkinson,

May 29-30, 2019

Manchester 2019

Drmac Advances in NLA Manchester 19 1 / 35

Overview

1 Symmetric Eigenvalue Problem: Backward stability and forward error
    Commutative diagram of backward stability
    An example
    Symmetric definite case – theory and algorithms
    Ein leichtes Verfahren

2 Accurate SVD for general matrices

3 QR factorization with column pivoting


Forward error? Distance to what?!?

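Commutative diagram of backward stability:

    X  --(computed)-->  \tilde{Y} = F(X + \delta X)
    X  --(exactly)-->   Y = F(X)
    X  --(backward error)-->  X + \delta X  --(exactly)-->  \tilde{Y},   \|\delta X\| \le \varepsilon \|X\|

• Backward stability: solve exactly with another input, close to $X$.
• Not preserved under composition of mappings.
• $\longrightarrow$ $\|\delta X\| \le \varepsilon\|X\|$, $\|\delta X(:,i)\| \le \varepsilon\|X(:,i)\|$
• $\longrightarrow$ $|\delta X_{ij}| \le \varepsilon|X_{ij}|$, $|\delta X_{ij}| \le \varepsilon\sqrt{|X_{ii}X_{jj}|}$
• $\longrightarrow$ $X + \delta X$ has the same structure as $X$ (sparsity, symmetry, ...)
• Perturbation theory: $\|\tilde{Y} - Y\| \le K \cdot \|\delta X\|$
• Von Neumann, Turing, Givens, Wilkinson
• K. O. Friedrichs: Applied mathematics consists in solving exact problems approximately and approximate problems exactly.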


Backward stability: eig() on real symmetric matrices

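Example (the real symmetric eigenvalue problem is perfectly conditioned): $H u_i = \lambda_i u_i$, $H = H^T = U \Lambda U^T$, $\Lambda = \mathrm{diag}(\lambda_i)_{i=1}^{n}$

• eigenvalues real, eigenvectors orthogonal
• algorithms use orthogonal transformations
• Weyl: if $H \rightsquigarrow H + \delta H$, then $\lambda \rightsquigarrow \lambda + \delta\lambda$, with $\max_{\lambda} |\delta\lambda| \le \|\delta H\|_2$

\[ \cdots U_k^T \cdots (U_2^T (U_1^T H U_1) U_2) \cdots U_k \cdots \longrightarrow \Lambda, \qquad U = U_1 U_2 \cdots U_k \cdots \]

Computed (finite precision, $O(n^3)$ flops): $\tilde{U} \approx U$, $\tilde{\Lambda} \approx \Lambda$.
Backward stability:

\[ \tilde{U}^T (H + \delta H)\tilde{U} \approx \tilde{\Lambda}, \qquad \frac{\|\delta H\|_F}{\|H\|_F} \le \varepsilon \approx f(n) \cdot \mathrm{eps} \ \text{small.} \]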


Now take symmetric H. What is the spectrum of H?

\[ H = \begin{pmatrix} 10^{40} & 10^{29} & 10^{19} \\ 10^{29} & 10^{20} & 10^{9} \\ 10^{19} & 10^{9} & 1 \end{pmatrix}; \quad \text{use MATLAB, } \mathrm{eps} \approx 2.22 \cdot 10^{-16} \]

eig(H) =
    λ1 =  1.000000000000000e+040 > 0
    λ2 = -8.100009764062724e+019 < 0
    λ3 = -3.966787845610502e+023 < 0

Cholesky factorization: L = chol(H)'.  $\|\tilde{L}\tilde{L}^T - H\|_2 \le O(\mathrm{eps})\|H\|_2$.

L =
    1.0000000e+20   0               0
    9.9999999e+8    9.9498743e+9    0
    9.9999999e-2    9.0453403e-2    9.9086738e-1

Both eig() and chol() are backward stable.
Is H positive definite? What is the spectrum of H?
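The Cholesky factor quoted above can be reproduced without MATLAB. The following is a minimal pure-Python sketch; the function name `cholesky_lower` is illustrative and not from the talk. It only shows that every pivot stays positive in double precision, so the factorization certifies H as numerically positive definite.

```python
import math

def cholesky_lower(H):
    """Return lower triangular L with L L^T = H for a symmetric matrix H.

    Raises ValueError as soon as a pivot is not positive, i.e. when H is
    not numerically positive definite.
    """
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = H[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            raise ValueError("matrix is not numerically positive definite")
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            L[i][j] = (H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

# The 3 x 3 example matrix from the slide.
H = [[1e40, 1e29, 1e19],
     [1e29, 1e20, 1e9],
     [1e19, 1e9, 1.0]]

L = cholesky_lower(H)
for row in L:
    print(["%.7e" % x for x in row])
```

The factorization completes with three positive pivots, which is hard to reconcile with the two negative eigenvalues reported by eig(H).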


What is the spectrum of H now?

        eig(H)                     eig(P^T H P), P ≃ (2, 1, 3)
λ1       1.000000000000000e+40      1.000000000000000e+40
λ2      -8.100009764062724e+19      9.900000000000000e+19
λ3      -3.966787845610502e+23      9.818181818181818e-01

        1./eig(inv(H))             eig(inv(inv(H)))
λ1       1.000000000000000e+40      1.000000000000000e+40
λ2       9.900000000000000e+19      9.900000000000000e+19
λ3       9.818181818181817e-01      9.818181818181817e-01

        eig(H + E1)                eig(H + E2)
λ1       1.000000000000000e+40      1.000000000000000e+40
λ2      -8.100009764062724e+19      1.208844819952007e+24
λ3      -3.966787845610502e+23      9.899993299416013e-01

E1: $H_{22} = 10^{20} \longrightarrow -10^{20}$;  E2: $H_{13}, H_{31} \longrightarrow H_{13} \cdot (1 + \mathrm{eps})$

• backward stable/acceptable !?

Forward error

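$H \rightsquigarrow \tilde{\Lambda}, \tilde{U}$;  $H \approx \tilde{U}\tilde{\Lambda}\tilde{U}^*$,  $|\delta\lambda_i| \le \|\delta H\|_2$

    H        --(floating point)-->  \tilde{\Lambda}, \tilde{U}
    H        --(backward error)-->  H + \delta H,   \|\delta H\|_2 \le \varepsilon\|H\|_2,  \varepsilon small
    H + \delta H  --(exact computation)-->  \tilde{\Lambda}, \tilde{U}

Bad news for small $\lambda_i$'s:

\[ \frac{|\delta\lambda_i|}{|\lambda_i|} \le \frac{\|\delta H\|_2}{|\lambda_i|} \le \varepsilon \frac{\|H\|_2}{|\lambda_i|}, \qquad \max_i \left| \frac{\delta\lambda_i}{\lambda_i} \right| \le \kappa_2(H) \frac{\|\delta H\|_2}{\|H\|_2}, \]

where $\kappa_2(H) = \|H\|_2 \|H^{-1}\|_2$.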

Want better accuracy for better inputs.


Error in the eigenvalues

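Let $H = LL^T \succ 0$ and $\tilde{L}\tilde{L}^T = H + \delta H \succ 0$.
Compare the eigenvalues of $H$ and $\tilde{H} = H + \delta H = \tilde{L}\tilde{L}^T$:

$H = LL^T$ is similar to $L^T L$, $H \sim L^T L \equiv M$.

Let $Y = \sqrt{I + L^{-1}\delta H L^{-T}}$. Then

\[ H + \delta H = L(I + L^{-1}\delta H L^{-T})L^T = L Y Y^T L^T \sim Y^T L^T L Y. \]

Compare $\lambda_i(L^T L) = \lambda_i(H)$ and $\lambda_i(Y^T L^T L Y) = \lambda_i(H + \delta H)$.

Ostrowski: $\tilde{M} = Y^T M Y$, then, for all $i$, $\lambda_i(\tilde{M}) = \lambda_i(M)\xi_i$, where $\lambda_{\min}(Y^T Y) \le \xi_i \le \lambda_{\max}(Y^T Y)$. Here $Y^T Y = I + L^{-1}\delta H L^{-T}$.

Hence $|\lambda_i(H) - \lambda_i(\tilde{H})| \le \lambda_i(H)\,\|L^{-1}\delta H L^{-T}\|_2$, and

\[
\begin{aligned}
\|L^{-1}\delta H L^{-T}\|_2 &= \|L^{-1}D(D^{-1}\delta H D^{-1})DL^{-T}\|_2 = \|L^{-1}D(\delta H_s)DL^{-T}\|_2 \\
&\le \|L^{-1}D\|_2^2 \|\delta H_s\|_2 = \|DL^{-T}L^{-1}D\|_2 \|\delta H_s\|_2 \\
&= \|(D^{-1}HD^{-1})^{-1}\|_2 \|\delta H_s\|_2 = \|H_s^{-1}\|_2 \|\delta H_s\|_2 ,
\end{aligned}
\]

where $D = \mathrm{diag}(\sqrt{H_{ii}})$, $(H_s)_{ij} = H_{ij}/\sqrt{H_{ii}H_{jj}}$, $(H_s)_{ii} = 1$.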


Error in the eigenvalues

Theorem

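Let $H_s = (H_{ij}/\sqrt{H_{ii}H_{jj}})$ and $\delta H_s = (\delta H_{ij}/\sqrt{H_{ii}H_{jj}})$. Then

\[ \max_i \left| \frac{\lambda_i(H + \delta H) - \lambda_i(H)}{\lambda_i(H)} \right| \le \|H_s^{-1}\|_2 \left\| \left[ \frac{\delta H_{ij}}{\sqrt{H_{ii}H_{jj}}} \right] \right\|_2 \]

• Compare with

\[ \max_i \left| \frac{\lambda_i(H + \delta H) - \lambda_i(H)}{\lambda_i(H)} \right| \le \kappa_2(H) \frac{\|\delta H\|_2}{\|H\|_2} \]

Van der Sluis: $\|H_s^{-1}\|_2 \le \kappa_2(H_s) \le n \min_{D = \mathrm{diag}} \kappa_2(DHD)$.

Our $3 \times 3$ example: $H = D H_s D$, $D = \mathrm{diag}(10^{20}, 10^{10}, 1)$, $\kappa_2(H) > 10^{40}$,

\[ \begin{pmatrix} 10^{40} & 10^{29} & 10^{19} \\ 10^{29} & 10^{20} & 10^{9} \\ 10^{19} & 10^{9} & 1 \end{pmatrix} = D H_s D = D \begin{pmatrix} 1 & 0.1 & 0.1 \\ 0.1 & 1 & 0.1 \\ 0.1 & 0.1 & 1 \end{pmatrix} D, \]

$\kappa_2(H_s) < 1.4$, $\|H_s^{-1}\|_2 < 1.2$. Need an algorithm with backward error $\delta H$ with small $\max_{i,j} |\delta H_{ij}|/\sqrt{H_{ii}H_{jj}}$.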
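To make the gap between the two bounds concrete, the sketch below scales the $3 \times 3$ example and compares the normwise bound $\kappa_2(H)\cdot\mathrm{eps}$ with the scaled one. Two assumptions are mine and not from the slides: $\|H_s^{-1}\|_2$ is bounded with Gershgorin's theorem (which gives 1.25, a slightly weaker figure than the 1.2 quoted above), and the entrywise backward error is taken as $n\cdot\mathrm{eps}$, so that $\|\delta H_s\|_2 \le n \cdot n\,\mathrm{eps}$.

```python
import math

H = [[1e40, 1e29, 1e19],
     [1e29, 1e20, 1e9],
     [1e19, 1e9, 1.0]]
n = len(H)
eps = 2.0 ** -52  # about 2.22e-16, as on the slide

# Diagonal scaling: Hs = D^{-1} H D^{-1} with D = diag(sqrt(H_ii)).
d = [math.sqrt(H[i][i]) for i in range(n)]
Hs = [[H[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]

# Gershgorin (assumption of this sketch): Hs has unit diagonal, so
# lambda_min(Hs) >= 1 - max_i sum_{j != i} |Hs_ij|.
radius = max(sum(abs(Hs[i][j]) for j in range(n) if j != i) for i in range(n))
inv_norm_bound = 1.0 / (1.0 - radius)  # upper bound for ||Hs^{-1}||_2

# Normwise bound: kappa_2(H) * eps, using kappa_2(H) > 1e40 from the slide.
normwise_bound = 1e40 * eps

# Scaled bound: ||Hs^{-1}||_2 * ||dHs||_2 with max |dHs_ij| <= n * eps,
# hence ||dHs||_2 <= n * (n * eps).
scaled_bound = inv_norm_bound * n * (n * eps)

print("Hs off-diagonal entry:", Hs[0][1])
print("bound on ||Hs^-1||_2 :", inv_norm_bound)
print("normwise rel. bound  :", normwise_bound)
print("scaled rel. bound    :", scaled_bound)
```

The normwise bound exceeds 1 by many orders of magnitude and guarantees no correct digit in the small eigenvalues, while the scaled bound stays within a small multiple of eps.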


Jacobi, 1844, 1846

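Ein leichtes Verfahren ... a simple procedure ...

$H = H^T$, $H^{(k+1)} = U_k^T H^{(k)} U_k \longrightarrow \Lambda = \mathrm{diag}(\lambda_i)$ $(k \longrightarrow \infty)$. Each $U_k$ annihilates the $(p_k, q_k)$, $(q_k, p_k)$ positions in $H^{(k)}$.

                     ( • • • • )                  ( • ~ ⊗ 0 )
    ··· U3^T U2^T U1^T ( • • • • ) U1 U2 U3 ···  =  ( ~ • ★ • )
                     ( • • • • )                  ( ⊗ ★ • • )
                     ( • • • • )                  ( 0 • • • )

\[ U_1 = \begin{pmatrix} \cos\psi_1 & \sin\psi_1 \\ -\sin\psi_1 & \cos\psi_1 \end{pmatrix} \oplus I_{n-2}, \quad U_2 = \cdots \]

Jacobi rotation:

\[ \cot 2\psi_k = \frac{H^{(k)}_{q_k q_k} - H^{(k)}_{p_k p_k}}{2 H^{(k)}_{p_k q_k}}, \qquad \tan\psi_k = \frac{\mathrm{sign}(\cot 2\psi_k)}{|\cot 2\psi_k| + \sqrt{1 + \cot^2 2\psi_k}}, \qquad \psi_k \in \left(-\frac{\pi}{4}, \frac{\pi}{4}\right], \]

$(p, q) = \mathcal{P}(k)$ pivot strategy, $\mathcal{P} : \mathbb{N} \to \{(i, j) : i < j\}$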
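The rotation formulas can be checked on a single $2 \times 2$ pivot block. The sketch below is illustrative (the helper names `jacobi_rotation` and `rotate_pivot_block` are mine): it computes $\tan\psi$ from $\cot 2\psi$ with the smaller-angle formula and applies $U^T H U$ with $U = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}$.

```python
import math

def jacobi_rotation(hpp, hqq, hpq):
    """Return (c, s) = (cos psi, sin psi) annihilating the (p, q) entry."""
    if hpq == 0.0:
        return 1.0, 0.0  # nothing to annihilate
    zeta = (hqq - hpp) / (2.0 * hpq)          # cot(2 psi)
    sign = 1.0 if zeta >= 0.0 else -1.0
    t = sign / (abs(zeta) + math.hypot(1.0, zeta))  # tan(psi), |t| <= 1
    c = 1.0 / math.sqrt(1.0 + t * t)
    return c, t * c

def rotate_pivot_block(hpp, hqq, hpq):
    """Apply U^T [[hpp, hpq], [hpq, hqq]] U with U = [[c, s], [-s, c]]."""
    c, s = jacobi_rotation(hpp, hqq, hpq)
    new_pp = c * c * hpp - 2.0 * c * s * hpq + s * s * hqq
    new_qq = s * s * hpp + 2.0 * c * s * hpq + c * c * hqq
    new_pq = (hpp - hqq) * c * s + hpq * (c * c - s * s)
    return new_pp, new_qq, new_pq

# Pivot block [[2, 1], [1, 3]]; its eigenvalues are (5 -+ sqrt(5)) / 2.
new_pp, new_qq, new_pq = rotate_pivot_block(2.0, 3.0, 1.0)
print(new_pp, new_qq, new_pq)
```

Choosing the root with $|\tan\psi| \le 1$ keeps the rotation angle in $(-\pi/4, \pi/4]$, which is the choice assumed by the classical convergence results for cyclic strategies.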

Convergent strategies, superior accuracy

Jacobi: $|h^{(k)}_{pq}| = \max_{i \ne j} |h^{(k)}_{ij}|$, $\mathcal{P}(k) = (p, q)$.

Reading Jacobi's 1846 paper recommended.

Cyclic: $\mathcal{P}$ periodic, one full period called a sweep.

Row-cyclic and column-cyclic pivot orderings (repeated in every sweep):

    Row-cyclic:          Column-cyclic:
    ( • 1 2 3 )          ( • 1 2 4 )
    ( • • 4 5 )          ( • • 3 5 )
    ( • • • 6 )          ( • • • 6 )
    ( • • • • )          ( • • • • )

\[ \mathrm{Off}(H^{(k)}) = \sqrt{\sum_{i \ne j} (H^{(k)})_{ij}^2} \longrightarrow 0 \quad (k \longrightarrow \infty) \]

• $H^{(k)} \longrightarrow \Lambda$, $U_1 \cdots U_k \cdots \longrightarrow U$ as $k \longrightarrow \infty$; $U^T H U = \Lambda$
• Asymptotically quadratic reduction of $\mathrm{Off}(H^{(k)})$; cubic per quasi-cycle.
• Forsythe, Henrici, Wilkinson, Rutishauser, Hari, Veselic
• Demmel and Veselic: Jacobi method is more accurate than QR.
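The row-cyclic strategy and the decay of $\mathrm{Off}(H^{(k)})$ can be watched on a small matrix. This is a plain two-sided Jacobi sketch under my own naming (`off_norm`, `row_cyclic_sweep`) and with an arbitrary $3 \times 3$ test matrix; it is not the block or preconditioned variant discussed in the talk.

```python
import math

def off_norm(H):
    """Frobenius norm of the off-diagonal part of H."""
    n = len(H)
    return math.sqrt(sum(H[i][j] ** 2
                         for i in range(n) for j in range(n) if i != j))

def row_cyclic_sweep(H):
    """One row-cyclic sweep (1,2), (1,3), ..., (n-1,n), updating H in place."""
    n = len(H)
    for p in range(n - 1):
        for q in range(p + 1, n):
            if H[p][q] == 0.0:
                continue
            zeta = (H[q][q] - H[p][p]) / (2.0 * H[p][q])   # cot(2 psi)
            sign = 1.0 if zeta >= 0.0 else -1.0
            t = sign / (abs(zeta) + math.hypot(1.0, zeta))  # tan(psi)
            c = 1.0 / math.sqrt(1.0 + t * t)
            s = t * c
            for k in range(n):      # H <- H U   (columns p and q)
                hkp, hkq = H[k][p], H[k][q]
                H[k][p] = c * hkp - s * hkq
                H[k][q] = s * hkp + c * hkq
            for k in range(n):      # H <- U^T H (rows p and q)
                hpk, hqk = H[p][k], H[q][k]
                H[p][k] = c * hpk - s * hqk
                H[q][k] = s * hpk + c * hqk
            H[p][q] = H[q][p] = 0.0  # annihilated by construction

# Arbitrary symmetric test matrix (an assumption of this sketch).
H = [[4.0, 1.0, 2.0],
     [1.0, 3.0, 0.5],
     [2.0, 0.5, 1.0]]
offs = [off_norm(H)]
for _ in range(10):
    row_cyclic_sweep(H)
    offs.append(off_norm(H))
eigenvalues = sorted(H[i][i] for i in range(3))
print(offs[:5])
print(eigenvalues)
```

Printing `offs` shows the off-norm collapsing within a few sweeps, while the trace and the Frobenius norm of the matrix are preserved up to rounding.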


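One-sided Jacobi SVD of $A \in \mathbb{R}^{m \times n}$

Implicitly diagonalize $H = H^{(0)} = A^T A$; $A \equiv A_0$.

\[ H^{(1)} = V_0^T H^{(0)} V_0 = V_0^T A^T (A V_0) = A_1^T A_1 \]

\[ H^{(k+1)} = V_k^T H^{(k)} V_k = A_{k+1}^T A_{k+1} \longrightarrow \Lambda = \mathrm{diag}(\lambda_i) \quad \leftrightarrow \quad A_{k+1} = A_k V_k, \]

where $H^{(k)} = A_k^T A_k$, and $V_k$ uses a Jacobi rotation to diagonalize

\[ \begin{pmatrix} h^{(k)}_{pp} & h^{(k)}_{pq} \\ h^{(k)}_{qp} & h^{(k)}_{qq} \end{pmatrix}, \qquad
\begin{aligned}
h^{(k)}_{pp} &= \|A_k(1:m, p)\|_2^2 \\
h^{(k)}_{qq} &= \|A_k(1:m, q)\|_2^2 \\
h^{(k)}_{pq} &= A_k(1:m, p)^T A_k(1:m, q)
\end{aligned} \]

• $h^{(k)}_{pp}$, $h^{(k)}_{qq}$: scalar update; $h^{(k)}_{pq}$: BLAS1 SDOT
• Block-oriented version provably convergent, BLAS3 based.
• $A_k \longrightarrow U\Sigma$, $\Sigma = \mathrm{diag}(\sqrt{\lambda_i})$, $U^T U = I$
• $V_1 \cdots V_k \cdots \longrightarrow V$, $V^T V = I$, $AV = U\Sigma$, $A = U\Sigma V^T$ is the SVD of $A$.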


Jacobi method for +definite Hx = λx (Drmac, Veselic)

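Let $H = LL^T$, $L$ the Cholesky factor. Apply the Jacobi SVD to $L$: $LV = U\Sigma$, where $V$ is the product of Jacobi rotations; then $H = U\Sigma^2 U^T$.

Backward error of the computed Cholesky factor:

\[ \tilde{L}\tilde{L}^T = H + \delta H, \qquad \max_{i,j} \frac{|\delta H_{ij}|}{\sqrt{H_{ii}H_{jj}}} \le n\varepsilon \]

One-sided Jacobi SVD: $\tilde{L} V_1 V_2 \cdots V_k \cdots V_\ell \to U\Sigma$. In floating point:

• $\tilde{L} \leftarrow (((L_1 + \delta L_1)V_1 + \delta L_2)V_2 + \delta L_3)V_3 + \cdots$
  Each row of $\delta L_i$ is $\varepsilon$ small relative to the corresponding row of $L_i$. The $V_j$ with $j > i$ do not change $\|\delta L_i(k, :)\|_2$, $k = 1, \ldots, n$. This holds for any block transformation and any parallel strategy.

At convergence, $\tilde{U}\tilde{\Sigma} = (\tilde{L} + \delta\tilde{L})\hat{V}$, with $\tilde{\Sigma} = \mathrm{diag}(\tilde{\sigma}_i)$ and $\|\delta\tilde{L}(i, :)\| \le O(n)\varepsilon\|\tilde{L}(i, :)\|$ for all $i$. Now, Cauchy-Schwarz $\Longrightarrow$ $\tilde{\lambda}_i = \tilde{\sigma}_i^2$ are the eigenvalues of $H + \Delta H = (\tilde{L} + \delta\tilde{L})(\tilde{L} + \delta\tilde{L})^T$,

\[ \max_{i,j} \frac{|\Delta H_{ij}|}{\sqrt{H_{ii}H_{jj}}} \le O(n)\varepsilon, \qquad \max_i \left| \frac{\tilde{\lambda}_i - \lambda_i}{\lambda_i} \right| \le \|H_s^{-1}\|_2 \left\| \left[ \frac{\Delta H_{ij}}{\sqrt{H_{ii}H_{jj}}} \right] \right\|_2 . \]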
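The two-step procedure (Cholesky, then one-sided Jacobi on the factor) is short enough to try on the $3 \times 3$ example $H$ with entries $10^{40}, 10^{29}, \ldots, 1$. The sketch below is a bare-bones illustration in pure Python, with helper names of my choosing and a simple stopping rule ($|\gamma| \le n\,\mathrm{eps}\,\sqrt{\alpha\beta}$) that is an assumption of the sketch; it is not the LAPACK implementation.

```python
import math

def cholesky_lower(H):
    """Lower triangular L with L L^T = H (H symmetric positive definite)."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = H[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            raise ValueError("not numerically positive definite")
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            L[i][j] = (H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def one_sided_jacobi_eigs(L, eps=2.0 ** -52, max_sweeps=30):
    """Rotate columns of L until mutually orthogonal (L V = U Sigma).

    Returns the squared column norms sigma_i^2, the eigenvalues of L L^T.
    """
    G = [row[:] for row in L]
    n = len(G)
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(G[i][p] ** 2 for i in range(n))       # h_pp
                beta = sum(G[i][q] ** 2 for i in range(n))        # h_qq
                gamma = sum(G[i][p] * G[i][q] for i in range(n))  # h_pq
                if abs(gamma) <= n * eps * math.sqrt(alpha) * math.sqrt(beta):
                    continue  # columns already numerically orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)             # cot(2 psi)
                sign = 1.0 if zeta >= 0.0 else -1.0
                t = sign / (abs(zeta) + math.hypot(1.0, zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for i in range(n):                                # G <- G V
                    gp, gq = G[i][p], G[i][q]
                    G[i][p] = c * gp - s * gq
                    G[i][q] = s * gp + c * gq
        if not rotated:
            break
    return sorted((sum(G[i][j] ** 2 for i in range(n)) for j in range(n)),
                  reverse=True)

H = [[1e40, 1e29, 1e19],
     [1e29, 1e20, 1e9],
     [1e19, 1e9, 1.0]]
lam = one_sided_jacobi_eigs(cholesky_lower(H))
print(["%.15e" % x for x in lam])
```

All three computed eigenvalues are positive and can be compared with the values 1.0e+40, 9.9e+19 and 9.818181818181818e-01 obtained earlier from the permuted and inverted forms of H.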


Jacobi SVD ( Z. D., K. Veselic 2008., 2015. )

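\[ (\Pi A)P = Q \begin{pmatrix} R \\ 0 \end{pmatrix}; \quad \text{Rank Revealing Decomposition (RRD)} \]

\[ R(1:\rho, 1:n)^* = Q_1 \begin{pmatrix} R_1 \\ 0 \end{pmatrix}; \quad \rho = \mathrm{rank}(R), \]

\[ X = R_1^* = \begin{pmatrix} * & 0 \\ * & * \end{pmatrix}; \quad X^T X - \xi I \ \text{quasi-definite; entropy based decisions} \]

\[ X_\infty \equiv U_x \Sigma = X \underbrace{\langle J_1 J_2 \cdots J_\infty \rangle}_{V_x} \quad \text{(Jacobi rotations)} \]

\[ V_x = R_1^{-*} (X_\infty) \]

\[ U = \Pi^T Q \begin{pmatrix} U_x & 0 \\ 0 & I_{m-\rho} \end{pmatrix}; \quad V = P Q_1 \begin{pmatrix} V_x & 0 \\ 0 & I_{n-\rho} \end{pmatrix}; \quad \text{if } \rho = n, \ Q_1 V_x = R^{-1} X_\infty \]

Delivers a provably accurate SVD if $A$ can be written as $A = BD$ with some diagonal $D$ and well conditioned $B$. If $A = D_1 C D_2$ with $D_1$, $D_2$ diagonal and $C$ well conditioned, the results are also accurate but theoretical bounds are lacking. $\kappa_2(D)$, $\kappa_2(D_1)$, $\kappa_2(D_2)$ are irrelevant. LAPACK 3.2.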


backward error ←→ perturbation theory

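Let $\mathrm{rank}(A) = n \le m$, $D = \mathrm{diag}(\|A(:, i)\|_2)$, and

\[ A \mapsto A + \delta A = (I + \delta A A^\dagger)A \ \Longrightarrow \ \sigma_j \mapsto \tilde{\sigma}_j = \sigma_j + \delta\sigma_j . \]

\[ \max_j \frac{|\tilde{\sigma}_j - \sigma_j|}{\sigma_j} \le \|\delta A A^\dagger\|_2, \qquad \|\delta A A^\dagger\|_2 \le \begin{cases} \dfrac{\|\delta A\|_2}{\|A\|_2} (\|A^\dagger\|_2 \|A\|_2) = \varepsilon \cdot \kappa(A), \\[2mm] \|\delta A D^{-1}\|_2 \|(AD^{-1})^\dagger\|_2 . \end{cases} \]

• $\|\delta A D^{-1}\|_2 \le \sqrt{n} \max_j \dfrac{\|\delta A(:, j)\|_2}{\|A(:, j)\|_2}$; column-wise small backward error
• $\|(AD^{-1})^\dagger\|_2 \equiv \|B^\dagger\|_2 \le \kappa_2(B) \le \sqrt{n} \min_{\Delta = \mathrm{diag}} \kappa(A\Delta)$
• Possible: $\|B^\dagger\|_2 \ll \kappa(A)$; always $\|B^\dagger\|_2 \le \sqrt{n}\,\kappa(A)$.
• Jacobi SVD: $\max_j \dfrac{\|\delta A(:, j)\|_2}{\|A(:, j)\|_2} \le \varepsilon \longrightarrow \|B^\dagger\|_2 \longrightarrow$ more accurate.
• Bidiagonal SVD: $\dfrac{\|\delta A\|_2}{\|A\|_2} \le \varepsilon \longrightarrow \kappa(A) \longrightarrow$ less accurate; bidiagonalization provokes $\kappa(A)$.

Demmel and Veselic: Jacobi's method is more accurate than QR. (1992.)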


Error in the σ’s: accurate method. A = BD


Error in the σ’s: LAPACK. A = BD


An extreme case: Hankel matrix, $\kappa_2(H) \approx 0.28 \cdot 10^{615}$

[Plot: singular values $\sigma_j$ on a logarithmic scale ($10^{-300}$ to $10^{300}$) against the index $j$ (0 to 40). Legend: $\sigma_j$ by svd_640; $\sigma_j$ by SVD_VTDV. Lower panel: relative differences (o−x)/sqrt(ox), compared with n*eps.]

Figure: The singular values of the $39 \times 39$ product $H = V^T D V$, computed in 16 digit arithmetic and the reference values computed in 640 digit arithmetic. The extreme singular values were $\sigma_1 \approx$ 1.659563214356268e+306, $\sigma_{39} \approx$ 5.752792768736278e−309. The maximal measured relative error over all singular values was 8.632997535220512e−013.


$(U, \Sigma, V)$ = xGESVDQ$(A)$  ($A \in \mathbb{C}^{m \times n}$, $m \ge n$)

New algorithm; software to appear in LAPACK. (Drmac 2017., 2018.)

1: $(\Pi_r A)\Pi_c = Q \begin{pmatrix} R \\ 0 \end{pmatrix}$ {Initial row sorting in order of decreasing $\ell_\infty$ norm, and QR factorization with pivoting, e.g. xGEQP3, or full pivoting.}
2: Determine the numerical rank $\rho$ of $R$ and set $R_\rho = R(1:\rho, 1:n)$.
3: if $\rho = n$ and condition estimate needed then
4:     $\kappa \approx \|(R_\rho\, \mathrm{diag}(1/\|R_\rho(:, i)\|_2))^{-1}\|_2$ {Use e.g. xPOCON and adjust to the norm $\|\cdot\|_2$.}
5: end if
6: Compute the SVD $R_\rho = U \begin{pmatrix} \Sigma & 0_{\rho, n-\rho} \end{pmatrix} V^*$. {Use xGESVD}
7: The SVD of $A$ is $A = \left[ \Pi_r^T Q \begin{pmatrix} U & 0 \\ 0 & I_{m-\rho} \end{pmatrix} \right] \begin{pmatrix} \Sigma & 0_{\rho, n-\rho} \\ 0_{m-\rho, \rho} & 0_{m-\rho, n-\rho} \end{pmatrix} (\Pi_c V)^*$

xGESVDQ: nearly the same accuracy as Jacobi SVD. Not proved (as yet) but experimentally confirmed. Related work by Jesse Barlow.


Error in the σ’s. Dimension: 1000× 700.

\[ \eta \equiv \max_{i=1:n} \frac{|\tilde{\sigma}_i - \sigma_i|}{\sigma_i} \le g(m, n)\,\varepsilon\,\kappa_{\mathrm{scaled}}(A). \qquad (1) \]


Error in the right sing. vectors. Dimension: 1000× 700.

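\[ \|\tilde{v}_i - v_i (v_i^* \tilde{v}_i)\|_2 \le \frac{O(\|\Delta A A^\dagger\|_2)}{\mathrm{gap}_i}, \qquad \mathrm{gap}_i = \min\left\{ 2, \min_{j \ne i} \frac{|\sigma_i - \sigma_j|}{\sigma_i} \right\}. \qquad (2) \]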


RRD: QRCP with Businger–Golub pivoting

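\[ \underbrace{A}_{m \times n}\, \overbrace{P}^{\text{permutation}} = Q \begin{pmatrix} R \\ 0 \end{pmatrix}, \qquad R = \begin{pmatrix} * & * & * & * & * & * \\ 0 & * & * & * & * & * \\ 0 & 0 & * & \bullet & * & * \\ 0 & 0 & 0 & \bullet & * & * \\ 0 & 0 & 0 & 0 & * & * \\ 0 & 0 & 0 & 0 & 0 & * \end{pmatrix}, \qquad Q^* Q = I_m . \]

\[ |R_{ii}| \ge \sqrt{\sum_{k=i}^{j} |R_{kj}|^2}, \quad \text{for all } 1 \le i \le j \le n. \qquad (1) \]

\[ |R_{11}| \ge |R_{22}| \ge \cdots \ge |R_{\rho\rho}| \gg |R_{\rho+1,\rho+1}| \ge \cdots \ge |R_{nn}| \qquad (2) \]

The structure (1), (2) may not be rank revealing but it must be guaranteed by the software (e.g. LAPACK, Matlab). Implemented in LINPACK in 1971, adopted by (Sca)LAPACK and used in many packages.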
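Condition (1) is cheap to verify a posteriori on any computed triangular factor. The following sketch (the function name `structure_violations` and the tolerance `rtol` are my assumptions, not part of any library) returns the index pairs where (1) fails; note that (1) with $j = i + 1$ already implies the ordering $|R_{ii}| \ge |R_{i+1,i+1}|$ in (2).

```python
import math

def structure_violations(R, rtol=1e-12):
    """Return all 0-based (i, j), i < j, violating
    |R[i][i]| >= sqrt(sum_{k=i..j} R[k][j]**2) for an n x n upper triangular R.

    rtol allows for rounding: a violation is reported only if the tail norm
    exceeds |R[i][i]| by more than a relative factor rtol.
    """
    n = len(R)
    bad = []
    for i in range(n):
        for j in range(i + 1, n):
            tail = math.sqrt(sum(R[k][j] ** 2 for k in range(i, j + 1)))
            if abs(R[i][i]) < tail * (1.0 - rtol):
                bad.append((i, j))
    return bad

# A factor that satisfies the pivoted structure ...
R_good = [[3.0, 1.0, 2.0],
          [0.0, 2.0, 1.0],
          [0.0, 0.0, 1.0]]
# ... and one where a large column was not pivoted to the front.
R_bad = [[1.0, 0.0, 0.0],
         [0.0, 1e-3, 2.0],
         [0.0, 0.0, 1.0]]

print(structure_violations(R_good))
print(structure_violations(R_bad))
```

A nonempty result means the factorization does not have the structure that downstream rank decisions rely on, whatever the backward error of the factorization itself.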


Backward error analysis: QR factorization

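\[ Q_p^T Q_{p-1}^T \cdots Q_2^T Q_1^T A = \begin{pmatrix} R \\ 0 \end{pmatrix}, \qquad Q = Q_1 Q_2 \cdots Q_p . \]

\[ A^{(1)} = Q_1^T A = \begin{pmatrix} * & * & * \\ 0 & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \end{pmatrix}, \qquad A^{(2)} = Q_2^T A^{(1)} = \begin{pmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & \times \\ 0 & 0 & \times \end{pmatrix}, \]

\[ A^{(3)} = Q_3^T A^{(2)} = \begin{pmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & 0 \end{pmatrix} \equiv A^{(p)} = \begin{pmatrix} R \\ 0 \end{pmatrix}, \]

In floating point, each computed step is an exact orthogonal step applied to a perturbed input:

    computed:   A = \tilde{A}^{(0)}  -->  \tilde{A}^{(1)}  -->  ···  -->  \tilde{A}^{(p-1)}  -->  \tilde{A}^{(p)}
    exact:      \tilde{A}^{(i-1)} + \delta A^{(i-1)}  --(\hat{Q}_i^T)-->  \tilde{A}^{(i)},   i = 1, ..., p

\[ \tilde{A}^{(i)} = \mathrm{computed}(\tilde{Q}_i^T \tilde{A}^{(i-1)}) = \hat{Q}_i^T (\tilde{A}^{(i-1)} + \delta A^{(i-1)}), \qquad \|\hat{Q}_i - \tilde{Q}_i\|_2 = O(\varepsilon) \]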


Analysis .... tedious, technical

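Reading the diagram backward, we have

\[
\begin{aligned}
\tilde{A}^{(p)} &= \hat{Q}_p^T \big( \hat{Q}_{p-1}^T (\tilde{A}^{(p-2)} + \delta A^{(p-2)}) + \delta A^{(p-1)} \big) \\
&= \hat{Q}_p^T \hat{Q}_{p-1}^T \big( \tilde{A}^{(p-2)} + \delta A^{(p-2)} + \hat{Q}_{p-1} \delta A^{(p-1)} \big) \\
&= \hat{Q}_p^T \hat{Q}_{p-1}^T \big( \hat{Q}_{p-2}^T (\tilde{A}^{(p-3)} + \delta A^{(p-3)}) + \delta A^{(p-2)} + \hat{Q}_{p-1} \delta A^{(p-1)} \big) \\
&= \underbrace{\hat{Q}_p^T \cdots \hat{Q}_1^T}_{\hat{Q}^T} \big( A + \underbrace{\delta A^{(0)} + \hat{Q}_1 \delta A^{(1)} + \cdots + \hat{Q}_1 \cdots \hat{Q}_{p-1} \delta A^{(p-1)}}_{\delta A} \big),
\end{aligned}
\]

and the (by construction upper triangular) $\tilde{A}^{(p)}$ satisfies

\[ \tilde{A}^{(p)} = \begin{pmatrix} \tilde{R} \\ 0 \end{pmatrix} = \hat{Q}^T (A + \delta A), \qquad (3) \]

where (note that $\|\tilde{A}^{(k)}\|_F \le (1 + O(\varepsilon))^k \|A\|_F$)

\[ \|\delta A\|_F \le \sum_{k=0}^{p-1} \|\delta A^{(k)}\|_F \le [(1 + O(\varepsilon))^p - 1]\,\|A\|_F . \qquad (4) \]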


Backward error analysis: QRF, tedious, technical

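\[ \|\delta A(:, j)\|_2 \le \sum_{k=0}^{p-1} \|\delta A^{(k)}(:, j)\|_2 \le \underbrace{[(1 + O(\varepsilon))^p - 1]}_{\equiv\, \zeta} \|A(:, j)\|_2 . \]

In case of pivoting, set $A := AP$.

★ $|R_{ii}| \ge \sqrt{\sum_{k=i}^{j} |R_{kj}|^2}$, for all $1 \le i \le j \le n$.

★ $|R_{11}| \ge |R_{22}| \ge \cdots \ge |R_{nn}|$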


Examples of failure of the pivoted structure (★)

[Plot: $|R_{ii}|$ and $\max_{j \ge i} \sqrt{\sum_{k=i}^{j} |R_{kj}|^2}$ against the index $i$ (0 to 300), logarithmic scale from $10^{-18}$ to $10^{0}$.]


Examples of failure of the pivoted structure (★)

[Plot: $|R_{ii}|$ and $\max_{j \ge i} \sqrt{\sum_{k=i}^{j} |R_{kj}|^2}$ against the index $i$ (50 to 300), logarithmic scale from $10^{-16}$ to $10^{0}$.]


Consequences

$\|Ax - d\|_2 \to \min$;  x = A\d
Warning: Rank deficient, rank = 304 tol = 1.0994e-012.

[Plot: logarithmic scale from $10^{-20}$ to $10^{5}$ against indices 0 to 600; the plotted quantity is not recoverable from the extracted text.]

rank(A,1.0994e-12) returns 466

Consequences ... discussion

Any routine based on xQRDC (LINPACK) or xGEQPF, xGEQP3 (LAPACK) in the period 1971 to 2007 can catastrophically fail.

• xGEQPX (TOMS # 782, rank revealing QRF)
• xGELSX and xGELSY in LAPACK ($\|Ax - b\|_2 \to \min$)
• xGGSVP in LAPACK (GSVD of $(A, B)$)

\[ U^T A Q = \begin{pmatrix} 0 & A_{12} & A_{13} \\ 0 & 0 & A_{23} \\ 0 & 0 & 0 \end{pmatrix}, \qquad V^T B Q = \begin{pmatrix} 0 & 0 & B_{13} \\ 0 & 0 & 0 \end{pmatrix}. \]

• 60 out of 470 subroutines in the SLICOT library (2010.)
• ... and many others ... long list. Need a new xGEQP3.


L(IN+A)PACK update, xGEQPF, xGEQP3

*     Update (down-date) the partial column norms after step I.
      DO 30 J = I+1, N
         IF( WORK( J ).NE.ZERO ) THEN
            TEMP = ONE - ( ABS( A( I, J ) ) / WORK( J ) )**2
            TEMP = MAX( TEMP, ZERO )
            TEMP2 = ONE + 0.05*TEMP*( WORK( J ) / WORK( N+J ) )**2
*           The inserted output statement that changes the result:
            WRITE(*,*) TEMP2
            IF( TEMP2.EQ.ONE ) THEN
*              Down-dating judged unsafe: recompute the norm.
               IF( M-I.GT.0 ) THEN
                  WORK( J ) = SNRM2( M-I, A( I+1, J ), 1 )
                  WORK( N+J ) = WORK( J )
               ELSE
                  WORK( J ) = ZERO
                  WORK( N+J ) = ZERO
               END IF
            ELSE
               WORK( J ) = WORK( J )*SQRT( TEMP )
            END IF
         END IF
   30 CONTINUE
      ...

A strategically placed WRITE(*,*) statement may change the computed numerical rank substantially (!!) and thus completely change the LS solution and the computed properties of a dynamical system (e.g. staircase form, SLICOT). Numerical catastrophes in mission critical applications! Detailed analysis and solution by Z.D. and Z. Bujanovic, ACM TOMS 2006. New code included in LAPACK 3.1., SLICOT 2010.
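The delicate step in the loop above is the down-dating formula WORK(J)*SQRT(TEMP), which subtracts nearly equal quantities when the removed entry dominates the column. The sketch below isolates that cancellation on a hypothetical three-entry column; it illustrates why a safeguard is needed and is not the corrected LAPACK 3.1 formula, which the slides do not reproduce.

```python
import math

def downdate_norm(partial_norm, removed_entry):
    """Down-date a column norm after removing one entry, as in the loop above:
    new_norm = norm * sqrt(max(1 - (|a| / norm)^2, 0))."""
    temp = 1.0 - (abs(removed_entry) / partial_norm) ** 2
    temp = max(temp, 0.0)
    return partial_norm * math.sqrt(temp)

# Hypothetical column: one dominant entry followed by tiny ones.
column = [1.0, 1e-9, 1e-9]

full_norm = math.sqrt(sum(x * x for x in column))       # rounds to 1.0
downdated = downdate_norm(full_norm, column[0])         # cancellation
recomputed = math.sqrt(sum(x * x for x in column[1:]))  # explicit norm

print("down-dated norm :", downdated)
print("recomputed norm :", recomputed)
```

The down-dated value carries no correct digits, so any pivoting or rank decision based on it depends on rounding details, which is consistent with the sensitivity to register lengths and compiler options reported in the talk.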


WRITE(*,*) can change the staircase form (SLICOT)

[Two surface plots of log10(abs(R)) over index axes from 0 to 100. Left panel: MB03OY, r = 49. Right panel: MB03OY with WRITE, r = 82.]

Figure: Left: The matrix R computed by MB03OY, shown by meshz(log10(abs(R))). The computed numerical rank is r = 49. Right: The matrix R computed with MB03OY, with "WRITE(*,*) TEMP2" statement added after the line 339 in MB03OY.f. The computed numerical rank is r = 82.


Solution

The computation is backward (mixed) stable by the usual standard. After frustrating experiments we found that

• The problem appears at a certain distance to singularity.
• In such cases it is easy to overestimate the distance to trouble.
• Bad results, if found, are attributed to singularity and not investigated.
• Complicated error analysis shows sensitivity that depends even on how long certain variables are sitting in the processor's long registers. Changing compiler options (debugger, optimizer, ...) changes the output completely. Inserting a WRITE statement changes the rank of the matrix!?!

Our solution (2006, Z.D. and Z. Bujanovic): new formulas that can be implemented provably safely. New software since LAPACK 3.1.
The problem is still present in libraries that contain the problematic column norm down-dating, e.g. ScaLAPACK.


WRITE(*,*) can change the numerical rank in PxGEQPF

Figure: SGEQPF from ScaLAPACK. The problem of failure of rank revealingpivoting still (2019) present in ScaLAPACK. The fix has been available since 2006.


Messages

• Ill-conditioning (large $\|H\|_2 \|H^{-1}\|_2$) can be artificial, an artifact of a particular algorithm ("a bull in a china shop"; some state of the art methods are such), and not the matrix itself.
• Backward stability is often used to justify the result. Structured backward error can yield better results. In many cases the error $\delta H$ is of much finer structure than only small $\|\delta H\|_2 / \|H\|_2$.
• Using only orthogonal transformations does not automatically guarantee good results.

W. Kahan:

The success of Backward Error-Analysis at explaining floating-point errors has been mistaken by many an Old Hand as an excuse to do and expect no better.

J. Wilkinson:

For me, then, the primary purpose of the rounding error analysis was insight.

