lecture 8: fast linear solvers (part 3)zxu2/acms60212-40212/lec-09-3.pdfcholesky factorization...
Post on 22-Jan-2021
7 Views
Preview:
TRANSCRIPT
Lecture 8: Fast Linear Solvers (Part 3)
1
Cholesky Factorization
âĒ Matrix ðī is symmetric if ðī = ðīð.
âĒ Matrix ðī is positive definite if for all ð â ð, ðððĻð > 0.
âĒ A symmetric positive definite matrix ðī has Cholesky factorization ðī = ðŋðŋð, where ðŋ is a lower triangular matrix with positive diagonal entries.
âĒ Example.
â ðī = ðīð with ððð > 0 and ððð > |ððð|ðâ ð is positive definite.
2
ðī =
ð11 ð12 âĶ ð1ðð12 ð22 âĶ ð2ðâŪ
ð1ð
âŪð2ð
âąâĶ
âŪððð
=
ð11 0 âĶ 0ð21 ð22 âĶ 0âŪðð1
âŪðð2
âąâĶ
âŪððð
ð11 ð21 âĶ ðð10 ð22 âĶ ðð2âŪ0
âŪ0
âąâĶ
âŪððð
3
In 2 Ã 2 matrix size case ð11 ð21ð21 ð22
=ð11 0ð21 ð22
ð11 ð210 ð22
ð11 = ð11; ð21 = ð21/ð11; ð22 = ð22 â ð212
4
Submatrix Cholesky Factorization Algorithm
5
for ð = 1 to ð ððð = ððð for ð = ð + 1 to ð ððð = ððð/ððð end for ð = ð + 1 to ð // reduce submatrix for ð = ð to ð ððð = ððð â ðððððð
end end end
Equivalent to ðððððð due
to symmetry
6
Remark:
1. This is a variation of Gaussian Elimination algorithm.
2. Storage of matrix ðī is used to hold matrix ðŋ .
3. Only lower triangle of ðī is used (See ððð = ððð â ðððððð).
4. Pivoting is not needed for stability.
5. About ð3/6 multiplications and about ð3/6 additions are required.
Data Access Pattern
Read only
Read and write
Column Cholesky Factorization Algorithm
7
for ð = 1 to ð for ð = 1 to ð â 1 for ð = ð to ð ððð = ððð â ðððððð
end end ððð = ððð
for ð = ð + 1 to ð ððð = ððð/ððð
end end
Data Access Pattern
8
Read only
Read and write
Parallel Algorithm
âĒ Parallel algorithms are similar to those for LU factorization.
9
References âĒ X. S. Li and J. W. Demmel, SuperLU_Dist: A scalable
distributed-memory sparse direct solver for unsymmetric linear systems, ACM Trans. Math. Software 29:110-140, 2003
âĒ P. HÃĐnon, P. Ramet, and J. Roman, PaStiX: A high-performance parallel direct solver for sparse symmetric positive definite systems, Parallel Computing 28:301-321, 2002
QR Factorization and HouseHolder Transformation
Theorem Suppose that matrix ðī is an ð Ã ð matrix with linearly independent columns, then ðī can be factored as ðī = ðð
where ð is an ð Ã ð matrix with orthonormal columns and ð is an invertible ð Ã ð upper triangular matrix.
10
QR Factorization by Gram-Schmidt Process
Consider matrix ðī = [ð1 ð2 âĶ |ðð]
Then,
ð1 = ð1, ð1 =ð1
||ð1||
ð2 = ð2 â (ð2 â ð1)ð1, ð2 =ð2
||ð2||
âĶ
ðð = ðð â ðð â ð1 ð1 ââŊâ ðð â ððâ1 ððâ1, ðð =ðð
||ðð||
ðī = ð1 ð2 âĶ ðð
= ð1 ð2 âĶ ðð
ð1 â ð1 ð2 â ð1 âĶ ðð â ð1
0 ð2 â ð2 âĶ ðð â ð2
âŪ âŪ âą âŪ0 0 âĶ ðð â ðð
= ðð
11
Classical/Modified Gram-Schmidt Process
for j=1 to n
ðð = ðð
for i = 1 to j-1
ððð = ðð
âðð (ðķððð ð ðððð)
ððð = ððâðð (ðððððððð)
ðð = ðð â ððððð
endfor
ððð = ||ðð||2
ðð = ðð/ððð
endfor 12
Householder Transformation
Let ðĢ â ð ð be a nonzero vector, the ð Ã ð matrix
ðŧ = ðž â 2ððð
ððð
is called a Householder transformation (or reflector).
âĒ Alternatively, let ð = ð/||ð||, ðŧ can be rewritten as
ðŧ = ðž â 2ððð.
Theorem. A Householder transformation ðŧ is symmetric and orthogonal, so ðŧ = ðŧð = ðŧâ1.
13
1. Let vector ð be perpendicular to ð.
ðž â 2ððð
ðððð = ð â 2
ð(ððð)
ððð= ð
2. Let ð =ð
| ð |. Any vector ð can be written as
ð = ð + ððð ð. ðž â 2ððð ð = ðž â 2ððð ð + ððð ð = ð â ððð ð
14
âĒ Householder transformation is used to selectively zero out blocks of entries in vectors or columns of matrices.
âĒ Householder is stable with respect to round-off error.
15
For a given a vector ð, find a Householder
transformation, ðŧ, such that ðŧð = ð
10âŪ0
= ðð1
â Previous vector reflection (case 2) implies that vector ð is in parallel with ð â ðŧð.
â Let ð = ð â ðŧð = ð â ðð1, where ð = Âą| ð | (when the arithmetic is exact, sign does not matter).
â ð = ð/|| ð||
â In the presence of round-off error, use
ð = ð + ð ððð ðĨ1 | ð |ð1
to avoid catastrophic cancellation. Here ðĨ1 is the first entry in the vector ð.
Thus in practice, ð = âð ððð ðĨ1 | ð | 16
Algorithm for Computing the Householder Transformation
17
ðĨð = max { ðĨ1 , ðĨ2 , âĶ , ðĨð } for ð = 1 to ð ðĢð = ðĨð/ðĨð end
ðž = ð ððð(ðĢ1)[ðĢ12 + ðĢ2
2 +âŊ+ ðĢð2]1/2
ðĢ1 = ðĢ1 + ðž ðž = âðžðĨð ð = ð/| ð |
Remark: 1. The computational complexity is ð(ð). 2. ðŧð is an ð ð operation compared to ð(ð2) for a general matrix-vector product. ðŧð = ð â ðð(ððŧð).
QR Factorization by HouseHolder Transformation
Key idea:
Apply Householder transformation successively to columns of matrix to zero out sub-diagonal entries of each column.
18
âĒ Stage 1 â Consider the first column of matrix ðī and determine a
HouseHolder matrix ðŧ1 so that ðŧ1
ð11ð21âŪ
ðð1
=
ðž10âŪ0
â Overwrite ðī by ðī1, where
ðī1 =
ðž1 ð12â âŊ ð1ð
â
0 ð22â âŪ ð2ð
â
âŪ âŪ âŪ âŪ0 ðð2
â âŊ ðððâ
= ðŧ1ðī
Denote vector which defines ðŧ1 vector ð1. ð1 = ð1/| ð1 |
Remark: Column vectors
ð12â
ð22â
âŪðð2â
âĶ
ð1ðâ
ð2ðâ
âŪðððâ
should be computed by
ðŧð = ð â ðð(ððŧð). 19
âĒ Stage 2 â Consider the second column of the updated matrix
ðī ⥠ðī1 = ðŧ1ðī and take the part below the diagonal to determine a Householder matrix ðŧ2
â so that
ðŧ2â
ð22â
ð32â
âŪðð2â
=
ðž2
0âŪ0
Remark: size of ðŧ2â is (ð â 1) Ã (ð â 1).
Denote vector which defines ðŧ2â vector ð2. ð2 = ð2/| ð2 |
â Inflate ðŧ2â to ðŧ2 where
ðŧ2 =1 00 ðŧ2
â
Overwrite ðī by ðī2 ⥠ðŧ2ðī1
20
âĒ Stage k
â Consider the ððĄâ column of the updated matrix ðī and take the part below the diagonal to determine a Householder matrix ðŧð
â so that
ðŧðâ
ðððâ
ðð+1,ðâ
âŪðð,ðâ
=
ðžð
0âŪ0
Remark: size of ðŧðâ is (ð â ð + 1) Ã (ð â ð + 1).
Denote vector which defines ðŧðâ vector ðð. ðð = ðð/| ðð |
â Inflate ðŧðâ to ðŧð where
ðŧð =ðžðâ1 0
0 ðŧðâ , ðžðâ1 is (ð â 1) Ã (ð â 1) identity matrix.
Overwrite ðī by ðīð ⥠ðŧððīðâ1
21
âĒ After (ð â 1) stages, matrix ðī is transformed into an upper triangular matrix ð .
âĒ ð = ðīðâ1 = ðŧðâ1ðīðâ2 = âŊ =ðŧðâ1ðŧðâ2âĶðŧ1ðī
âĒ Set ðð = ðŧðâ1ðŧðâ2âĶðŧ1. Then ðâ1 = ðð. Thus ð = ðŧ1
ððŧ2ð âĶðŧðâ1
ð
âĒ ðī = ðð
22
Example. ðī =1 11 21 3
.
Find ðŧ1 such that ðŧ1
111
=ðž100
.
By ð = âð ððð ðĨ1 | ð |,
ðž1 = â 3 = â1.721.
ðĒ1 = 0.8881, ðĒ2 = ðĒ3 = 0.3250.
By ðŧð = ð â ðð ððŧð , ðŧ1
123
=â3.46410.36611.3661
ðŧ1ðī =â1.721 â3.4641
0 0.36610 1.3661
23
Find ðŧ2â and ðŧ2, where
ðŧ2â 0.36611.3661
=ðž2
0
So ðž2 = â1.4143, ð2 =0.79220.6096
So ð = ðŧ2ðŧ1ðī =â1.7321 â3.4641
0 â1.41430 0
ð = (ðŧ2ðŧ1)ð= ðŧ1
ððŧ2ð
=â0.5774 0.7071 â0.4082â0.5774 0 0.8165â0.5774 â0.7071 â0.4082
24
ðīð = ð â ðð ð = ð
â ðððð ð = ððð â ð ð = ððð
Remark: Q rarely needs explicit construction
25
Parallel Householder QR
âĒ Householder QR factorization is similar to Gaussian elimination for LU factorization.
â Additions + multiplications: ð4
3ð3 for QR versus
ð2
3ð3 for LU
âĒ Forming Householder vector ððis similar to computing multipliers in Gaussian elimination.
âĒ Subsequent updating of remaining unreduced portion of matrix is similar to Gaussian elimination.
âĒ Parallel Householder QR is similar to parallel LU. Householder vectors need to broadcast among columns of matrix.
26
References
âĒ M. Cosnard, J. M. Muller, and Y. Robert. Parallel QR decomposition of a rectangular matrix, Numer. Math. 48:239-249, 1986
âĒ A. Pothen and P. Raghavan. Distributed orthogonal factorization: Givens and Householder algorithms, SIAM J. Sci. Stat. Comput. 10:1113-1134, 1989
27
top related