lecture 8: fast linear solvers (part 3)zxu2/acms60212-40212/lec-09-3.pdfcholesky factorization...

Post on 22-Jan-2021

7 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Lecture 8: Fast Linear Solvers (Part 3)

1

Cholesky Factorization

â€Ē Matrix ðī is symmetric if ðī = ðī𝑇.

â€Ē Matrix ðī is positive definite if for all 𝒙 ≠ 𝟎, 𝒙𝑇ð‘Ļ𝒙 > 0.

â€Ē A symmetric positive definite matrix ðī has Cholesky factorization ðī = ðŋðŋ𝑇, where ðŋ is a lower triangular matrix with positive diagonal entries.

â€Ē Example.

– ðī = ðī𝑇 with 𝑎𝑖𝑖 > 0 and 𝑎𝑖𝑖 > |𝑎𝑖𝑗|𝑗≠𝑖 is positive definite.

2

ðī =

𝑎11 𝑎12 â€Ķ 𝑎1𝑛𝑎12 𝑎22 â€Ķ 𝑎2𝑛â‹Ū

𝑎1𝑛

â‹Ū𝑎2𝑛

⋱â€Ķ

â‹Ū𝑎𝑛𝑛

=

𝑙11 0 â€Ķ 0𝑙21 𝑙22 â€Ķ 0â‹Ū𝑙𝑛1

â‹Ū𝑙𝑛2

⋱â€Ķ

â‹Ū𝑙𝑛𝑛

𝑙11 𝑙21 â€Ķ 𝑙𝑛10 𝑙22 â€Ķ 𝑙𝑛2â‹Ū0

â‹Ū0

⋱â€Ķ

â‹Ū𝑙𝑛𝑛

3

In 2 × 2 matrix size case 𝑎11 𝑎21𝑎21 𝑎22

=𝑙11 0𝑙21 𝑙22

𝑙11 𝑙210 𝑙22

𝑙11 = 𝑎11; 𝑙21 = 𝑎21/𝑙11; 𝑙22 = 𝑎22 − 𝑙212

4

Submatrix Cholesky Factorization Algorithm

5

for 𝑘 = 1 to 𝑛 𝑎𝑘𝑘 = 𝑎𝑘𝑘 for 𝑖 = 𝑘 + 1 to 𝑛 𝑎𝑖𝑘 = 𝑎𝑖𝑘/𝑎𝑘𝑘 end for 𝑗 = 𝑘 + 1 to 𝑛 // reduce submatrix for 𝑖 = 𝑗 to 𝑛 𝑎𝑖𝑗 = 𝑎𝑖𝑗 − 𝑎𝑖𝑘𝑎𝑗𝑘

end end end

Equivalent to 𝑎𝑖𝑘𝑎𝑘𝑗 due

to symmetry

6

Remark:

1. This is a variation of Gaussian Elimination algorithm.

2. Storage of matrix ðī is used to hold matrix ðŋ .

3. Only lower triangle of ðī is used (See 𝑎𝑖𝑗 = 𝑎𝑖𝑗 − 𝑎𝑖𝑘𝑎𝑗𝑘).

4. Pivoting is not needed for stability.

5. About 𝑛3/6 multiplications and about 𝑛3/6 additions are required.

Data Access Pattern

Read only

Read and write

Column Cholesky Factorization Algorithm

7

for 𝑗 = 1 to 𝑛 for 𝑘 = 1 to 𝑗 − 1 for 𝑖 = 𝑗 to 𝑛 𝑎𝑖𝑗 = 𝑎𝑖𝑗 − 𝑎𝑖𝑘𝑎𝑗𝑘

end end 𝑎𝑗𝑗 = 𝑎𝑗𝑗

for 𝑖 = 𝑗 + 1 to 𝑛 𝑎𝑖𝑗 = 𝑎𝑖𝑗/𝑎𝑗𝑗

end end

Data Access Pattern

8

Read only

Read and write

Parallel Algorithm

â€Ē Parallel algorithms are similar to those for LU factorization.

9

References â€Ē X. S. Li and J. W. Demmel, SuperLU_Dist: A scalable

distributed-memory sparse direct solver for unsymmetric linear systems, ACM Trans. Math. Software 29:110-140, 2003

â€Ē P. HÃĐnon, P. Ramet, and J. Roman, PaStiX: A high-performance parallel direct solver for sparse symmetric positive definite systems, Parallel Computing 28:301-321, 2002

QR Factorization and HouseHolder Transformation

Theorem Suppose that matrix ðī is an 𝑚 × 𝑛 matrix with linearly independent columns, then ðī can be factored as ðī = 𝑄𝑅

where 𝑄 is an 𝑚 × 𝑛 matrix with orthonormal columns and 𝑅 is an invertible 𝑛 × 𝑛 upper triangular matrix.

10

QR Factorization by Gram-Schmidt Process

Consider matrix ðī = [𝒂1 𝒂2 â€Ķ |𝒂𝑛]

Then,

𝒖1 = 𝒂1, 𝒒1 =𝒖1

||𝒖1||

𝒖2 = 𝒂2 − (𝒂2 ∙ 𝒒1)𝒒1, 𝒒2 =𝒖2

||𝒖2||

â€Ķ

𝒖𝑘 = 𝒂𝑘 − 𝒂𝑘 ∙ 𝒒1 𝒒1 −â‹Ŋ− 𝒂𝑘 ∙ 𝒒𝑘−1 𝒒𝑘−1, 𝒒𝑘 =𝒖𝑘

||𝒖𝑘||

ðī = 𝒂1 𝒂2 â€Ķ 𝒂𝑛

= 𝒒1 𝒒2 â€Ķ 𝒒𝑛

𝒂1 ∙ 𝒒1 𝒂2 ∙ 𝒒1 â€Ķ 𝒂𝑛 ∙ 𝒒1

0 𝒂2 ∙ 𝒒2 â€Ķ 𝒂𝑛 ∙ 𝒒2

â‹Ū â‹Ū ⋱ â‹Ū0 0 â€Ķ 𝒂𝑛 ∙ 𝒒𝑛

= 𝑄𝑅

11

Classical/Modified Gram-Schmidt Process

for j=1 to n

𝒖𝑗 = 𝒂𝑗

for i = 1 to j-1

𝑟𝑖𝑗 = 𝒒𝑖

∗𝒂𝑗 (ðķ𝑙𝑎𝑠𝑠𝑖𝑐𝑎𝑙)

𝑟𝑖𝑗 = 𝒒𝑖∗𝒖𝑗 (𝑀𝑜𝑑𝑖𝑓𝑖𝑒𝑑)

𝒖𝑗 = 𝒖𝑗 − 𝑟𝑖𝑗𝒒𝑖

endfor

𝑟𝑗𝑗 = ||𝒖𝑗||2

𝒒𝑗 = 𝒖𝑗/𝑟𝑗𝑗

endfor 12

Householder Transformation

Let ð‘Ģ ∈ 𝑅𝑛 be a nonzero vector, the 𝑛 × 𝑛 matrix

ðŧ = 𝐞 − 2𝒗𝒗𝑇

𝒗𝑇𝒗

is called a Householder transformation (or reflector).

â€Ē Alternatively, let 𝒖 = 𝒗/||𝒗||, ðŧ can be rewritten as

ðŧ = 𝐞 − 2𝒖𝒖𝑇.

Theorem. A Householder transformation ðŧ is symmetric and orthogonal, so ðŧ = ðŧ𝑇 = ðŧ−1.

13

1. Let vector 𝒛 be perpendicular to 𝒗.

𝐞 − 2𝒗𝒗𝑇

𝒗𝑇𝒗𝒛 = 𝒛 − 2

𝒗(𝒗𝑇𝒛)

𝒗𝑇𝒗= 𝒛

2. Let 𝒖 =𝒗

| 𝒗 |. Any vector 𝒙 can be written as

𝒙 = 𝒛 + 𝒖𝑇𝒙 𝒖. 𝐞 − 2𝒖𝒖𝑇 𝒙 = 𝐞 − 2𝒖𝒖𝑇 𝒛 + 𝒖𝑇𝒙 𝒖 = 𝒛 − 𝒖𝑇𝒙 𝒖

14

â€Ē Householder transformation is used to selectively zero out blocks of entries in vectors or columns of matrices.

â€Ē Householder is stable with respect to round-off error.

15

For a given a vector 𝒙, find a Householder

transformation, ðŧ, such that ðŧ𝒙 = 𝑎

10â‹Ū0

= 𝑎𝒆1

– Previous vector reflection (case 2) implies that vector 𝒖 is in parallel with 𝒙 − ðŧ𝒙.

– Let 𝒗 = 𝒙 − ðŧ𝒙 = 𝒙 − 𝑎𝒆1, where 𝑎 = Âą| 𝒙 | (when the arithmetic is exact, sign does not matter).

– 𝒖 = 𝒗/|| 𝒗||

– In the presence of round-off error, use

𝒗 = 𝒙 + 𝑠𝑖𝑔𝑛 ð‘Ĩ1 | 𝒙 |𝒆1

to avoid catastrophic cancellation. Here ð‘Ĩ1 is the first entry in the vector 𝒙.

Thus in practice, 𝑎 = −𝑠𝑖𝑔𝑛 ð‘Ĩ1 | 𝒙 | 16

Algorithm for Computing the Householder Transformation

17

ð‘Ĩ𝑚 = max { ð‘Ĩ1 , ð‘Ĩ2 , â€Ķ , ð‘Ĩ𝑛 } for 𝑘 = 1 to 𝑛 ð‘Ģ𝑘 = ð‘Ĩ𝑘/ð‘Ĩ𝑚 end

𝛞 = 𝑠𝑖𝑔𝑛(ð‘Ģ1)[ð‘Ģ12 + ð‘Ģ2

2 +â‹Ŋ+ ð‘Ģ𝑛2]1/2

ð‘Ģ1 = ð‘Ģ1 + 𝛞 𝛞 = −𝛞ð‘Ĩ𝑚 𝒖 = 𝒗/| 𝒗 |

Remark: 1. The computational complexity is 𝑂(𝑛). 2. ðŧ𝒙 is an 𝑂 𝑛 operation compared to 𝑂(𝑛2) for a general matrix-vector product. ðŧ𝒙 = 𝒙 − 𝟐𝒖(𝒖ð‘ŧ𝒙).

QR Factorization by HouseHolder Transformation

Key idea:

Apply Householder transformation successively to columns of matrix to zero out sub-diagonal entries of each column.

18

â€Ē Stage 1 – Consider the first column of matrix ðī and determine a

HouseHolder matrix ðŧ1 so that ðŧ1

𝑎11𝑎21â‹Ū

𝑎𝑛1

=

𝛞10â‹Ū0

– Overwrite ðī by ðī1, where

ðī1 =

𝛞1 𝑎12∗ â‹Ŋ 𝑎1𝑛

∗

0 𝑎22∗ â‹Ū 𝑎2𝑛

∗

â‹Ū â‹Ū â‹Ū â‹Ū0 𝑎𝑛2

∗ â‹Ŋ 𝑎𝑛𝑛∗

= ðŧ1ðī

Denote vector which defines ðŧ1 vector 𝒗1. 𝒖1 = 𝒗1/| 𝒗1 |

Remark: Column vectors

𝑎12∗

𝑎22∗

â‹Ū𝑎𝑛2∗

â€Ķ

𝑎1𝑛∗

𝑎2𝑛∗

â‹Ū𝑎𝑛𝑛∗

should be computed by

ðŧ𝒙 = 𝒙 − 𝟐𝒖(𝒖ð‘ŧ𝒙). 19

â€Ē Stage 2 – Consider the second column of the updated matrix

ðī ≡ ðī1 = ðŧ1ðī and take the part below the diagonal to determine a Householder matrix ðŧ2

∗ so that

ðŧ2∗

𝑎22∗

𝑎32∗

â‹Ū𝑎𝑛2∗

=

𝛞2

0â‹Ū0

Remark: size of ðŧ2∗ is (𝑛 − 1) × (𝑛 − 1).

Denote vector which defines ðŧ2∗ vector 𝒗2. 𝒖2 = 𝒗2/| 𝒗2 |

– Inflate ðŧ2∗ to ðŧ2 where

ðŧ2 =1 00 ðŧ2

∗

Overwrite ðī by ðī2 ≡ ðŧ2ðī1

20

â€Ē Stage k

– Consider the ð‘˜ð‘Ąâ„Ž column of the updated matrix ðī and take the part below the diagonal to determine a Householder matrix ðŧ𝑘

∗ so that

ðŧ𝑘∗

𝑎𝑘𝑘∗

𝑎𝑘+1,𝑘∗

â‹Ū𝑎𝑛,𝑛∗

=

𝛞𝑘

0â‹Ū0

Remark: size of ðŧ𝑘∗ is (𝑛 − 𝑘 + 1) × (𝑛 − 𝑘 + 1).

Denote vector which defines ðŧ𝑘∗ vector 𝒗𝑘. 𝒖𝑘 = 𝒗𝑘/| 𝒗𝑘 |

– Inflate ðŧ𝑘∗ to ðŧ𝑘 where

ðŧ𝑘 =𝐞𝑘−1 0

0 ðŧ𝑘∗ , 𝐞𝑘−1 is (𝑘 − 1) × (𝑘 − 1) identity matrix.

Overwrite ðī by ðī𝑘 ≡ ðŧ𝑘ðī𝑘−1

21

â€Ē After (𝑛 − 1) stages, matrix ðī is transformed into an upper triangular matrix 𝑅.

â€Ē 𝑅 = ðī𝑛−1 = ðŧ𝑛−1ðī𝑛−2 = â‹Ŋ =ðŧ𝑛−1ðŧ𝑛−2â€Ķðŧ1ðī

â€Ē Set 𝑄𝑇 = ðŧ𝑛−1ðŧ𝑛−2â€Ķðŧ1. Then 𝑄−1 = 𝑄𝑇. Thus 𝑄 = ðŧ1

𝑇ðŧ2𝑇 â€Ķðŧ𝑛−1

𝑇

â€Ē ðī = 𝑄𝑅

22

Example. ðī =1 11 21 3

.

Find ðŧ1 such that ðŧ1

111

=𝛞100

.

By 𝑎 = −𝑠𝑖𝑔𝑛 ð‘Ĩ1 | 𝒙 |,

𝛞1 = − 3 = −1.721.

ð‘Ē1 = 0.8881, ð‘Ē2 = ð‘Ē3 = 0.3250.

By ðŧ𝒙 = 𝒙 − 𝟐𝒖 𝒖ð‘ŧ𝒙 , ðŧ1

123

=−3.46410.36611.3661

ðŧ1ðī =−1.721 −3.4641

0 0.36610 1.3661

23

Find ðŧ2∗ and ðŧ2, where

ðŧ2∗ 0.36611.3661

=𝛞2

0

So 𝛞2 = −1.4143, 𝒖2 =0.79220.6096

So 𝑅 = ðŧ2ðŧ1ðī =−1.7321 −3.4641

0 −1.41430 0

𝑄 = (ðŧ2ðŧ1)𝑇= ðŧ1

𝑇ðŧ2𝑇

=−0.5774 0.7071 −0.4082−0.5774 0 0.8165−0.5774 −0.7071 −0.4082

24

ðī𝒙 = 𝒃 ⇒ 𝑄𝑅𝒙 = 𝒃

⇒ 𝑄𝑇𝑄𝑅𝒙 = 𝑄𝑇𝒃 ⇒ 𝑅𝒙 = 𝑄𝑇𝒃

Remark: Q rarely needs explicit construction

25

Parallel Householder QR

â€Ē Householder QR factorization is similar to Gaussian elimination for LU factorization.

– Additions + multiplications: 𝑂4

3𝑛3 for QR versus

𝑂2

3𝑛3 for LU

â€Ē Forming Householder vector 𝒗𝑘is similar to computing multipliers in Gaussian elimination.

â€Ē Subsequent updating of remaining unreduced portion of matrix is similar to Gaussian elimination.

â€Ē Parallel Householder QR is similar to parallel LU. Householder vectors need to broadcast among columns of matrix.

26

References

â€Ē M. Cosnard, J. M. Muller, and Y. Robert. Parallel QR decomposition of a rectangular matrix, Numer. Math. 48:239-249, 1986

â€Ē A. Pothen and P. Raghavan. Distributed orthogonal factorization: Givens and Householder algorithms, SIAM J. Sci. Stat. Comput. 10:1113-1134, 1989

27

top related