stat 310 lecture 3 - argonne national laboratoryanitescu/classes/2011/lectures/stat310-201… ·...
![Page 1: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/1.jpg)
STAT 310 Lecture 3 Mihai Anitescu
![Page 2: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/2.jpg)
Outline
• Homework Questions? Structure of an optimization code (EXPAND)
• Survey
• 2.3 Direct Linear Algebra
  – Factorization
• 2.4 Sparsity
• 3.1 Failure of vanilla Newton
• 3.2 Line Search Methods
• 3.3 Dealing with Indefinite Matrices
• 3.4 Quasi-Newton Methods
![Page 3: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/3.jpg)
Some thoughts about coding
1. Think ahead of time what functionality your code will have, and define the interface properly
2. If portions of code are similar, try to define a function and "refactor" (e.g., the 3 different iterations).
3. Document your code.
4. Do not write long function files; they are impossible to debug (unless you are very experienced).
![Page 4: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/4.jpg)
Example Encapsulation
[xout,iteratesGradNorms]=newtonLikeMethod(@fenton_wrap,[3 4]',1,1e-12,200)
![Page 5: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/5.jpg)
2.3 SOLVING SYSTEMS OF LINEAR EQUATIONS
![Page 6: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/6.jpg)
2.3.1 DIRECT METHODS: THE ESSENTIALS
![Page 7: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/7.jpg)
• Lower Triangular Matrix
• Upper Triangular Matrix
![Page 8: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/8.jpg)
• LU decomposition / factorization: [A]{x} = [L][U]{x} = {b}
• Forward substitution: [L]{d} = {b}
• Back substitution: [U]{x} = {d}
• Q: Why might I do this instead of Gaussian elimination?
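The two triangular solves can be sketched in a few lines. This is a minimal pure-Python illustration, assuming the factors [L] and [U] are already available as lists of rows; the function names and the 3x3 factors are hypothetical and chosen only for the example.

```python
def forward_substitution(L, b):
    """Solve [L]{d} = {b} for a lower-triangular L given as a list of rows."""
    n = len(b)
    d = [0.0] * n
    for i in range(n):
        # Entries d[0..i-1] are already known, so row i has one unknown.
        s = sum(L[i][k] * d[k] for k in range(i))
        d[i] = (b[i] - s) / L[i][i]
    return d


def back_substitution(U, d):
    """Solve [U]{x} = {d} for an upper-triangular U given as a list of rows."""
    n = len(d)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (d[i] - s) / U[i][i]
    return x


# Hypothetical factors of A = L*U, used only for illustration.
L = [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [-1.0, 3.0, 1.0]]
U = [[2.0, 1.0, -1.0], [0.0, 1.0, 2.0], [0.0, 0.0, 4.0]]
b = [1.0, 5.0, 12.0]

d = forward_substitution(L, b)   # intermediate vector {d}
x = back_substitution(U, d)      # solution of [L][U]{x} = {b}
```

Each solve touches every stored entry of the triangle once, which is where the n^2 cost per right-hand side comes from.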
![Page 9: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/9.jpg)
Complexity of LU Decomposition to solve Ax=b:
– decompose A into LU -- cost 2n^3/3 flops
– solve Ly=b for y by forward substitution -- cost n^2 flops
– solve Ux=y for x by back substitution -- cost n^2 flops
slower alternative:
– compute A^{-1} -- cost 2n^3 flops
– multiply x=A^{-1}b -- cost 2n^2 flops
this costs about 3 times as much as LU
26 Sept. 2000 15-859B - Introduction to Scientific Computing
![Page 10: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/10.jpg)
• If [A] is symmetric and positive definite, it is convenient to use Cholesky decomposition.
[A] = [L][L]^T = [U]^T[U]
• No pivoting or scaling needed if [A] is symmetric and positive definite (all eigenvalues are positive)
• If [A] is not positive definite, the procedure may encounter the square root of a negative number
• Complexity is ½ that of LU (due to symmetry exploitation)
![Page 11: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/11.jpg)
• [A] = [U]^T[U]
• Recurrence relations
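The recurrence formulas themselves did not survive in this transcript. As a hedged sketch, the standard recurrences for [A] = [U]^T[U] are $u_{ii} = \sqrt{a_{ii} - \sum_{k<i} u_{ki}^2}$ and $u_{ij} = (a_{ij} - \sum_{k<i} u_{ki} u_{kj}) / u_{ii}$ for $j > i$; the pure-Python code below implements them. The function name and the test matrix are assumptions made for illustration.

```python
import math


def cholesky_upper(A):
    """Return upper-triangular U with A = U^T U, for symmetric positive definite A."""
    n = len(A)
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal entry: fails exactly when A is not positive definite.
        s = A[i][i] - sum(U[k][i] ** 2 for k in range(i))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        U[i][i] = math.sqrt(s)
        # Remaining entries of row i.
        for j in range(i + 1, n):
            t = A[i][j] - sum(U[k][i] * U[k][j] for k in range(i))
            U[i][j] = t / U[i][i]
    return U


# Small symmetric positive definite matrix, chosen only for illustration.
A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
U = cholesky_upper(A)
```

The `ValueError` branch is the "square root of a negative number" case: the attempted factorization doubles as a test for positive definiteness.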
![Page 12: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/12.jpg)
• Still need pivoting in LU decomposition (why?)
• Messes up order of [L]
• What to do?
• Need to pivot both [L] and a permutation matrix [P]
• Initialize [P] as identity matrix and pivot when [A] is pivoted. Also pivot [L]
![Page 13: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/13.jpg)
• Permutation matrix [P] - permutation of identity matrix [I]
• Permutation matrix performs "bookkeeping" associated with the row exchanges
• Permuted matrix [P][A]
• LU factorization of the permuted matrix: [P][A] = [L][U]
• Solution: [L][U]{x} = [P]{b}
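A compact sketch of the bookkeeping described above, in pure Python. It assumes partial (row) pivoting on the largest entry in the current column and stores [P] as a list `perm`, where row i of [P][A] is row `perm[i]` of [A]. The function names and the test system are hypothetical.

```python
def lu_partial_pivoting(A):
    """Return (perm, L, U) such that rows perm[0], perm[1], ... of A equal L*U."""
    n = len(A)
    U = [row[:] for row in A]            # work on a copy of A
    L = [[0.0] * n for _ in range(n)]    # multipliers, filled column by column
    perm = list(range(n))                # [P] initialized as the identity
    for i in range(n - 1):
        # Choose the row with the largest entry in column i as pivot row.
        p = max(range(i, n), key=lambda r: abs(U[r][i]))
        if U[p][i] == 0.0:
            raise ValueError("matrix is singular")
        if p != i:
            U[i], U[p] = U[p], U[i]                  # pivot [A]
            L[i], L[p] = L[p], L[i]                  # also pivot [L]
            perm[i], perm[p] = perm[p], perm[i]      # and record it in [P]
        for j in range(i + 1, n):
            L[j][i] = U[j][i] / U[i][i]
            for k in range(i, n):
                U[j][k] -= L[j][i] * U[i][k]
    for i in range(n):
        L[i][i] = 1.0
    return perm, L, U


def solve(A, b):
    """Solve A x = b through [L][U]{x} = [P]{b}."""
    perm, L, U = lu_partial_pivoting(A)
    n = len(b)
    pb = [b[p] for p in perm]                        # [P]{b}
    d = [0.0] * n
    for i in range(n):                               # forward substitution
        d[i] = pb[i] - sum(L[i][k] * d[k] for k in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        x[i] = (d[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


# The (1,1) entry is zero, so elimination without pivoting would divide by zero.
A = [[0.0, 2.0, 1.0], [1.0, 1.0, 1.0], [2.0, 1.0, 0.0]]
b = [3.0, 3.0, 3.0]
x = solve(A, b)
perm, L, U = lu_partial_pivoting(A)
```

Swapping whole rows of `L` is safe here because only the columns to the left of the current one have been filled when the swap happens.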
![Page 14: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/14.jpg)
LU-factorization for real symmetric indefinite matrix A (constrained optimization has saddle points)
[Equations not recovered from the slide: the factorization and the definitions following "where"]
Questions:
1) If A is not singular, can I be guaranteed to find a nonsingular principal block E after pivoting? Of what size?
2) Why not LU decomposition?
![Page 15: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/15.jpg)
History of LDL' decomposition: 1x1, 2x2 pivoting
• diagonal pivoting method with complete pivoting: Bunch-Parlett, "Direct methods for solving symmetric indefinite systems of linear equations," SIAM J. Numer. Anal., v. 8, 1971, pp. 639-655
• diagonal pivoting method with partial pivoting: Bunch-Kaufman, "Some Stable Methods for Calculating Inertia and Solving Symmetric Linear Systems," Mathematics of Computation, volume 31, number 137, January 1977, pp. 163-179
• DEMOS
![Page 16: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/16.jpg)
2.3.1 DIRECT METHODS: EXTRA DETAILS
![Page 17: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/17.jpg)
Gaussian Elimination (GE)
• Add multiples of each row to later rows to make A upper triangular
• Solve resulting triangular system Ux = c by substitution
Summer School Lecture 4 17
    … for each column i
    … zero it out below the diagonal by adding multiples of row i to later rows
    for i = 1 to n-1
        … for each row j below row i
        for j = i+1 to n
            … add a multiple of row i to row j
            tmp = A(j,i);
            for k = i to n
                A(j,k) = A(j,k) - (tmp/A(i,i)) * A(i,k)

[Figure: zero pattern below the diagonal of A after i=1, after i=2, after i=3, …, after i=n-1]
![Page 18: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/18.jpg)
Refine GE (1/5)
• Initial Version

    … for each column i
    … zero it out below the diagonal by adding multiples of row i to later rows
    for i = 1 to n-1
        … for each row j below row i
        for j = i+1 to n
            … add a multiple of row i to row j
            tmp = A(j,i);
            for k = i to n
                A(j,k) = A(j,k) - (tmp/A(i,i)) * A(i,k)

• Remove computation of constant tmp/A(i,i) from inner loop.

    for i = 1 to n-1
        for j = i+1 to n
            m = A(j,i)/A(i,i)
            for k = i to n
                A(j,k) = A(j,k) - m * A(i,k)
![Page 19: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/19.jpg)
Refine GE (2/5)
• Last version

    for i = 1 to n-1
        for j = i+1 to n
            m = A(j,i)/A(i,i)
            for k = i to n
                A(j,k) = A(j,k) - m * A(i,k)

• Don't compute what we already know: zeros below diagonal in column i

    for i = 1 to n-1
        for j = i+1 to n
            m = A(j,i)/A(i,i)
            for k = i+1 to n        … do not compute zeros
                A(j,k) = A(j,k) - m * A(i,k)
![Page 20: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/20.jpg)
Refine GE Algorithm (3/5)
• Last version

    for i = 1 to n-1
        for j = i+1 to n
            m = A(j,i)/A(i,i)
            for k = i+1 to n
                A(j,k) = A(j,k) - m * A(i,k)

• Store multipliers m below diagonal in zeroed entries for later use

    for i = 1 to n-1
        for j = i+1 to n
            A(j,i) = A(j,i)/A(i,i)        … store m here
            for k = i+1 to n
                A(j,k) = A(j,k) - A(j,i) * A(i,k)
![Page 21: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/21.jpg)
Refine GE Algorithm (4/5)
• Last version

    for i = 1 to n-1
        for j = i+1 to n
            A(j,i) = A(j,i)/A(i,i)
            for k = i+1 to n
                A(j,k) = A(j,k) - A(j,i) * A(i,k)

• Split Loop

    for i = 1 to n-1
        for j = i+1 to n
            A(j,i) = A(j,i)/A(i,i)        … store all m's here before updating rest of matrix
        for j = i+1 to n
            for k = i+1 to n
                A(j,k) = A(j,k) - A(j,i) * A(i,k)
![Page 22: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/22.jpg)
Refine GE Algorithm (5/5)
• Last version

    for i = 1 to n-1
        for j = i+1 to n
            A(j,i) = A(j,i)/A(i,i)
        for j = i+1 to n
            for k = i+1 to n
                A(j,k) = A(j,k) - A(j,i) * A(i,k)

• Express the loops as vector and matrix operations

    for i = 1 to n-1
        A(i+1:n,i) = A(i+1:n,i) * ( 1 / A(i,i) )        … BLAS 1 (scale a vector)
        A(i+1:n,i+1:n) = A(i+1:n,i+1:n) - A(i+1:n,i) * A(i,i+1:n)        … BLAS 2 (rank-1 update)
![Page 23: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/23.jpg)
What GE really computes
    for i = 1 to n-1
        A(i+1:n,i) = A(i+1:n,i) / A(i,i)        … BLAS 1 (scale a vector)
        A(i+1:n,i+1:n) = A(i+1:n,i+1:n) - A(i+1:n,i) * A(i,i+1:n)        … BLAS 2 (rank-1 update)

• Call the strictly lower triangular matrix of multipliers M, and let L = I+M
• Call the upper triangle of the final matrix U
• Lemma (LU Factorization): If the above algorithm terminates (does not divide by zero) then A = L*U
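The lemma can be checked numerically. The sketch below is a pure-Python transcription of the in-place algorithm (0-based indices, no pivoting), followed by extraction of L = I+M and U and a comparison of L*U with the original matrix. The helper names and the 3x3 matrix are assumptions made for the illustration.

```python
def lu_in_place(A):
    """Overwrite A with the multipliers (below the diagonal) and U (on and above it)."""
    n = len(A)
    for i in range(n - 1):
        if A[i][i] == 0.0:
            raise ZeroDivisionError("zero pivot: the algorithm does not terminate")
        for j in range(i + 1, n):            # scale column i below the diagonal
            A[j][i] /= A[i][i]
        for j in range(i + 1, n):            # rank-1 update of the trailing block
            for k in range(i + 1, n):
                A[j][k] -= A[j][i] * A[i][k]
    return A


def split_lu(F):
    """Extract L = I + M and U from the overwritten matrix F."""
    n = len(F)
    L = [[F[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[F[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def matmul(X, Y):
    """Dense matrix product of two square lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[2.0, 1.0, -1.0], [4.0, 3.0, 0.0], [-2.0, 2.0, 11.0]]
L, U = split_lu(lu_in_place([row[:] for row in A]))   # factor a copy of A
product = matmul(L, U)                                # should reproduce A
```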
![Page 24: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/24.jpg)
What GE really computes
• Solving A*x=b using GE
  – Factorize A = L*U using GE (cost = 2/3 n^3 flops)
  – Solve L*y = b for y, using substitution (cost = n^2 flops)
  – Solve U*x = y for x, using substitution (cost = n^2 flops)
• Thus A*x = (L*U)*x = L*(U*x) = L*y = b as desired

    for i = 1 to n-1
        A(i+1:n,i) = A(i+1:n,i) / A(i,i)        … BLAS 1 (scale a vector)
        A(i+1:n,i+1:n) = A(i+1:n,i+1:n) - A(i+1:n,i) * A(i,i+1:n)        … BLAS 2 (rank-1 update)
![Page 25: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/25.jpg)
• Once [L] is formed, we can use forward substitution instead of forward elimination for different {b}'s
Very efficient for large matrices!
![Page 26: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/26.jpg)
Identical to Gauss elimination
![Page 27: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/27.jpg)
Example:
![Page 28: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/28.jpg)
![Page 29: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/29.jpg)
• Forward substitution
• Back substitution (identical to Gauss elimination)
![Page 30: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/30.jpg)
![Page 31: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/31.jpg)
2.4 COMPLEXITY OF LINEAR ALGEBRA; SPARSITY
![Page 32: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/32.jpg)
Complexity of LU Decomposition to solve Ax=b:
– decompose A into LU -- cost 2n^3/3 flops
– solve Ly=b for y by forward substitution -- cost n^2 flops
– solve Ux=y for x by back substitution -- cost n^2 flops
slower alternative:
– compute A^{-1} -- cost 2n^3 flops
– multiply x=A^{-1}b -- cost 2n^2 flops
this costs about 3 times as much as LU
![Page 33: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/33.jpg)
Complexity of linear algebra
lesson:
– if you see A^{-1} in a formula, read it as "solve a system", not "invert a matrix"
Cholesky factorization -- cost n^3/3 flops
LDL' factorization -- cost n^3/3 flops
Q: What is the cost of Cramer's rule (roughly)?
![Page 34: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/34.jpg)
Sparse Linear Algebra
• Suppose you are applying matrix-vector multiply and the matrix has lots of zero elements
  – Computation cost? Space requirements?
• General sparse matrix representation concepts
  – Primarily only represent the nonzero data values (nnz)
  – Auxiliary data structures describe placement of nonzeros in "dense matrix"
• And *MAYBE* LU or Cholesky can be done in O(nnz), which is not as bad as O(n^3), since very often nnz = O(n)
![Page 35: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/35.jpg)
Sparse Linear Algebra
• Because of its phenomenal computational and storage savings potential, sparse linear algebra is a huge research topic.
• VERY difficult to develop.
• MATLAB implements sparse linear algebra based on i,j,s format.
• DEMO
• Conclusion: Maybe I can SCALE well … Solve O(10^12) problems in O(10^12).
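To make the storage and cost argument concrete, here is a small pure-Python sketch of a matrix-vector product in triplet (i, j, s) form: only the nonzeros are stored, and the product costs O(nnz) operations instead of O(n^2). The function names and the tridiagonal test matrix are assumptions, and indices are 0-based here, unlike MATLAB's 1-based convention.

```python
def sparse_matvec(n_rows, rows, cols, vals, x):
    """Compute y = A x where A is given by triplets (rows[t], cols[t], vals[t])."""
    y = [0.0] * n_rows
    for i, j, s in zip(rows, cols, vals):
        y[i] += s * x[j]          # one multiply-add per stored nonzero
    return y


def tridiagonal_triplets(n):
    """Triplets of the n-by-n matrix with 2 on the diagonal and -1 next to it."""
    rows, cols, vals = [], [], []
    for i in range(n):
        rows.append(i); cols.append(i); vals.append(2.0)
        if i > 0:
            rows.append(i); cols.append(i - 1); vals.append(-1.0)
        if i < n - 1:
            rows.append(i); cols.append(i + 1); vals.append(-1.0)
    return rows, cols, vals


n = 5
rows, cols, vals = tridiagonal_triplets(n)
nnz = len(vals)                                   # 3n - 2 stored values, not n^2
y = sparse_matvec(n, rows, cols, vals, [1.0] * n)
```

For this matrix nnz grows linearly with n, which is the nnz = O(n) situation the slides refer to.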
![Page 36: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/36.jpg)
SUMMARY SECTION 2
• The heaviest components of numerical software are numerical differentiation (AD/DIVDIFF) and linear algebra.
• Factorization is always preferable to direct (Gaussian) elimination.
• Keeping track of sparsity in linear algebra can enormously improve performance.
![Page 37: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/37.jpg)
Section 3: Line Search Methods Mihai Anitescu STAT 310 Reference: Chapter 3 in Nocedal and Wright.
![Page 38: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/38.jpg)
3.1 FAILURE OF NEWTON METHODS
![Page 39: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/39.jpg)
Problem definition
$\min_x f(x)$
$f : \mathbb{R}^n \to \mathbb{R}$
– continuously differentiable
– gradient is available
– Hessian is unavailable
Necessary optimality conditions: $\nabla f(x^*) = 0$
Sufficient optimality conditions: $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*) \succ 0$
![Page 40: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/40.jpg)
DEMO
• Algorithm: Newton.
• Note: not only does the algorithm not converge, the function values go to infinity.
• So we could have detected early on that we should have done something else.
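The function used in the lecture demo is not recorded in this transcript. As a stand-in, the sketch below uses the classic one-dimensional example $f(x) = \sqrt{1 + x^2}$ (an assumption, not the lecture's demo). Its Hessian is positive everywhere, yet the pure Newton iteration reduces to $x_{k+1} = -x_k^3$, so it diverges whenever $|x_0| > 1$ and the function values grow without bound.

```python
import math


def f(x):
    return math.sqrt(1.0 + x * x)


def grad(x):
    return x / math.sqrt(1.0 + x * x)


def hess(x):
    return (1.0 + x * x) ** -1.5


def newton(x0, iters):
    """Pure Newton iteration (no line search); returns all iterates."""
    xs = [x0]
    for _ in range(iters):
        x = xs[-1]
        xs.append(x - grad(x) / hess(x))   # equals -x**3 in exact arithmetic
    return xs


diverging = newton(1.1, 5)     # starts just outside |x| = 1
converging = newton(0.5, 5)    # starts inside |x| = 1
f_diverging = [f(x) for x in diverging]
```

Monitoring `f_diverging` shows the function value increasing at every step, which is exactly the early warning sign mentioned above.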
![Page 41: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/41.jpg)
Ways of enforcing that things do not blow up or wander
• 1. Line-search methods.
  – Make a "guess" of a good direction.
  – Make good progress along that direction. At least know you will decrease f.
• 2. Trust region model.
  – Create a quadratic model of the function.
  – Define a region where we "believe" ("trust") the model and find a "good" point in that "region".
  – If at that point the model is far from f, trust less (smaller region); if not, trust more (larger region).
![Page 42: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/42.jpg)
3.2 LINE SEARCH METHODS
![Page 43: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/43.jpg)
3.2.1 LINE SEARCH METHODS: ESSENTIALS
![Page 44: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/44.jpg)
Line Search Methods Idea:
• At the current point $x_k$ find a "Newton-like" direction $d_k$
• Along that direction do 1-dimensional minimization (simpler than over whole space)
• Because the line search always decreases f, we will have an accumulation point (cannot diverge if bounded below), unlike Newton proper
$x_{k+1} = x_k + \alpha_k d_k, \quad \alpha_k = \arg\min_{\alpha} f(x_k + \alpha d_k)$
![Page 45: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/45.jpg)
Descent Principle
$g(\alpha) = f(x_k + \alpha p_k)$, for $\nabla f(x_k)^T p_k < 0$
• Descent Principle: Carry out a one-dimensional search along a line where I will decrease the function.
• If this happens, there exists an $\alpha$ (why?) such that $f(x_k + \alpha p_k) < f(x_k)$.
• So I will keep making progress.
• Typical choice (why?): $B_k p_k = -\nabla f(x_k)$; $B_k \succ 0$
• Newton may need to be modified (why?)
![Page 46: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/46.jpg)
Line Search-Armijo
$f(x_k) - f(x_k + \beta^m \tau_k d_k) \ge -\sigma \beta^m \tau_k \nabla f(x_k)^T d_k$
$\beta \in (0,1)$, $\sigma \in (0,1/2)$, where $\tau_k$ is the initial trial step
[Figure: $g(\alpha)$ together with the lines $g(0) + \alpha g'(0)$ and $g(0) + c_1 \alpha g'(0)$]
• I cannot accept just about ANY decrease, for I may NEVER converge (why? example of spurious convergence).
• IDEA: Accept only decreases PROPORTIONAL TO THE SQUARE OF GRADIENT. Then I have to converge (since process stops only when gradient is 0).
• Example: Armijo Rule. It uses the concept of BACKTRACKING.
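A minimal backtracking sketch of the Armijo rule, in pure Python. It assumes an initial trial step of 1 (natural for Newton-like directions), a reduction factor `beta` in (0,1), and a sufficient-decrease constant `sigma` in (0,1/2); the parameter names and defaults are illustrative choices, not values from the lecture. The test problem is the stand-in function $f(x) = \sqrt{1 + x^2}$, on which pure Newton diverges from $x_0 = 1.1$.

```python
import math


def armijo_backtracking(f, grad_f, x, d, beta=0.5, sigma=1e-4, max_steps=50):
    """Return (alpha, x_new) with alpha = beta**m satisfying the Armijo condition."""
    g_dot_d = sum(gi * di for gi, di in zip(grad_f(x), d))
    if g_dot_d >= 0.0:
        raise ValueError("d is not a descent direction")
    fx = f(x)
    alpha = 1.0
    for _ in range(max_steps):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        # Accept only a decrease proportional to alpha * |grad^T d|.
        if fx - f(x_new) >= -sigma * alpha * g_dot_d:
            return alpha, x_new
        alpha *= beta                      # backtrack
    raise RuntimeError("no acceptable step found")


def f(x):
    return math.sqrt(1.0 + x[0] ** 2)


def grad_f(x):
    return [x[0] / math.sqrt(1.0 + x[0] ** 2)]


def hess(x):
    return (1.0 + x[0] ** 2) ** -1.5


# Damped Newton: Newton direction, step length chosen by backtracking.
x = [1.1]
alphas = []
for _ in range(20):
    g = grad_f(x)
    if abs(g[0]) < 1e-6:                   # stop when the gradient is small
        break
    d = [-g[0] / hess(x)]
    alpha, x = armijo_backtracking(f, grad_f, x, d)
    alphas.append(alpha)
```

On this example the full step is rejected once (the first accepted step is 0.5), after which unit Newton steps are accepted and the iterates converge to the minimizer at 0.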
![Page 47: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/47.jpg)
Some Theory
Newton is accepted by LS
Global Convergence:
Fast Convergence:
![Page 48: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/48.jpg)
Extensions
• Line Search Refinements:
  – Use interpolation
  – Wolfe and Goldstein rules
• Other optimization approaches
  – Steepest descent
  – CG …
![Page 49: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/49.jpg)
3.2.2 LINE SEARCH METHODS: EXTRAS.
![Page 50: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/50.jpg)
3.2.2.1 LINE SEARCH METHODS: USING INTERPOLATION IN LINE SEARCH
![Page 51: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/51.jpg)
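Quadratic Interpolation
Approximate $g(\alpha)$ with a quadratic $h(\alpha)$ such that
$h(0) = g(0)$, $h'(0) = g'(0)$, $h(\alpha_0) = g(\alpha_0)$:
$$h(\alpha) = \left(\frac{g(\alpha_0) - g(0) - \alpha_0 g'(0)}{\alpha_0^2}\right)\alpha^2 + g'(0)\,\alpha + g(0)$$
[Figure: $g(\alpha)$ with $g(\alpha_0)$ and the trial step $\alpha_1$ marked]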
![Page 52: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/52.jpg)
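Quadratic Interpolation
Potential step:
$$\alpha_1 = -\frac{g'(0)\,\alpha_0^2}{2\left(g(\alpha_0) - g(0) - \alpha_0 g'(0)\right)}$$
[Figure: $g(\alpha)$ with $g(\alpha_0)$ and the trial step $\alpha_1$ marked]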
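The potential step is the minimizer of the interpolating quadratic, and it takes only a few lines to compute. The sketch below is pure Python; the function name and the test function $g(\alpha) = (\alpha - 1)^2$ are assumptions for illustration. Because this g is itself quadratic, the interpolation step lands exactly on its minimizer.

```python
def quadratic_step(g0, dg0, alpha0, g_alpha0):
    """Minimizer of the quadratic matching g(0), g'(0) and g(alpha0)."""
    denom = 2.0 * (g_alpha0 - g0 - alpha0 * dg0)
    if denom <= 0.0:
        # The interpolant has no interior minimum (it is not convex).
        raise ValueError("quadratic model is not convex")
    return -dg0 * alpha0 ** 2 / denom


def g(alpha):
    return (alpha - 1.0) ** 2


# g(0) = 1, g'(0) = -2, and a rejected trial step alpha0 = 3 with g(3) = 4.
alpha1 = quadratic_step(g(0.0), -2.0, 3.0, g(3.0))
```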
![Page 53: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/53.jpg)
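Cubic Interpolation
$$h(\alpha) = a\alpha^3 + b\alpha^2 + g'(0)\,\alpha + g(0)$$
$$\begin{bmatrix} a \\ b \end{bmatrix} = \frac{1}{\alpha_0^2 \alpha_1^2 (\alpha_1 - \alpha_0)} \begin{bmatrix} \alpha_0^2 & -\alpha_1^2 \\ -\alpha_0^3 & \alpha_1^3 \end{bmatrix} \begin{bmatrix} g(\alpha_1) - g(0) - g'(0)\,\alpha_1 \\ g(\alpha_0) - g(0) - g'(0)\,\alpha_0 \end{bmatrix}$$
$$\alpha_2 = \frac{-b + \sqrt{b^2 - 3 a\, g'(0)}}{3a}$$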
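A sketch of the cubic step, assuming the standard construction in which the cubic matches $g(0)$, $g'(0)$, $g(\alpha_0)$ and $g(\alpha_1)$, as in Nocedal and Wright, Chapter 3. The code is pure Python; the function name and the test function $g(\alpha) = \alpha^3 - 3\alpha + 2$ (minimizer at $\alpha = 1$) are illustrative assumptions.

```python
import math


def cubic_step(g0, dg0, alpha0, g_alpha0, alpha1, g_alpha1):
    """Minimizer of the cubic matching g(0), g'(0), g(alpha0) and g(alpha1)."""
    r1 = g_alpha1 - g0 - dg0 * alpha1
    r0 = g_alpha0 - g0 - dg0 * alpha0
    det = alpha0 ** 2 * alpha1 ** 2 * (alpha1 - alpha0)
    a = (alpha0 ** 2 * r1 - alpha1 ** 2 * r0) / det
    b = (-alpha0 ** 3 * r1 + alpha1 ** 3 * r0) / det
    if a == 0.0:
        # The model degenerates to a quadratic; the cubic formula does not apply.
        raise ValueError("cubic coefficient is zero")
    return (-b + math.sqrt(b * b - 3.0 * a * dg0)) / (3.0 * a)


def g(alpha):
    return alpha ** 3 - 3.0 * alpha + 2.0


# g(0) = 2 and g'(0) = -3; previous trial steps alpha0 = 3 and alpha1 = 2.
alpha2 = cubic_step(g(0.0), -3.0, 3.0, g(3.0), 2.0, g(2.0))
```

Since this g is itself a cubic, the model reproduces it exactly (a = 1, b = 0) and the step is the true minimizer.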
![Page 54: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/54.jpg)
3.2.2.2 LINE SEARCH METHODS: OTHER LINE SEARCH PRINCIPLES
![Page 55: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/55.jpg)
Unconstrained optimization methods
$x_{k+1} = x_k + \alpha_k d_k$
$\alpha_k$: step length; $d_k$: search direction
1) Line search
2) Trust-Region algorithms
Quadratic approximation
Influences
![Page 56: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/56.jpg)
Step length computation:
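1) Armijo rule:
$$f(x_k) - f(x_k + \beta^m \tau_k d_k) \ge -\sigma \beta^m \tau_k \nabla f(x_k)^T d_k$$
$\beta \in (0,1)$, $\sigma \in (0,1/2)$, $\tau_k = -\left(\nabla f(x_k)^T d_k\right) / \|d_k\|^2$
2) Goldstein rule:
$$\rho_1 \alpha_k g_k^T d_k \le f(x_k + \alpha_k d_k) - f(x_k) \le \rho_2 \alpha_k g_k^T d_k$$
$0 < \rho_2 < \tfrac{1}{2} < \rho_1 < 1$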
![Page 57: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/57.jpg)
3) Wolfe conditions:
$$f(x_k + \alpha_k d_k) - f(x_k) \le c_1 \alpha_k g_k^T d_k$$
$$\nabla f(x_k + \alpha_k d_k)^T d_k \ge c_2\, g_k^T d_k$$
$0 < c_1 < c_2 < 1$
Implementations:
Shanno (1978)
Moré - Thuente (1992-1994)
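The implementations cited above are full step-length searches. The sketch below only checks whether a given step satisfies the two Wolfe conditions (sufficient decrease and curvature), which is the test such a search applies at each trial step. It is pure Python; the constants `c1`, `c2` and their default values are conventional choices assumed here, not taken from the slides, and the quadratic test function is hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def satisfies_wolfe(f, grad_f, x, d, alpha, c1=1e-4, c2=0.9):
    """True if alpha satisfies sufficient decrease and the curvature condition."""
    g0_d = dot(grad_f(x), d)                       # slope at alpha = 0
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) - f(x) <= c1 * alpha * g0_d
    curvature = dot(grad_f(x_new), d) >= c2 * g0_d
    return sufficient_decrease and curvature


def f(x):
    return x[0] ** 2 + x[1] ** 2


def grad_f(x):
    return [2.0 * x[0], 2.0 * x[1]]


x = [1.0, 1.0]
d = [-2.0, -2.0]                                   # steepest descent direction

good = satisfies_wolfe(f, grad_f, x, d, 0.5)       # lands on the minimizer
too_short = satisfies_wolfe(f, grad_f, x, d, 1e-3) # fails the curvature condition
too_long = satisfies_wolfe(f, grad_f, x, d, 1.0)   # fails sufficient decrease
```

The curvature condition is what rules out the very short steps that a pure sufficient-decrease test would accept.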
![Page 58: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/58.jpg)
3.2.2.3 LINE SEARCH METHODS: TAXONOMY OF METHODS
![Page 59: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/59.jpg)
Methods for Unconstrained Optimization
1) Steepest descent (Cauchy, 1847)
$d_k = -\nabla f(x_k)$
2) Newton
$d_k = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)$
3) Quasi-Newton (Broyden, 1965; and many others)
$d_k = -H_k \nabla f(x_k)$
![Page 60: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/60.jpg)
4) Conjugate Gradient Methods (1952)
$d_{k+1} = -g_{k+1} + \beta_k s_k$
$s_k = x_{k+1} - x_k$
$\beta_k$ is known as the conjugate gradient parameter
5) Truncated Newton method (Dembo et al., 1982)
$d_k \approx -\nabla^2 f(x_k)^{-1} g_k$
$r_k = \nabla^2 f(x_k)\, d_k + g_k$
6) Trust Region methods
![Page 61: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/61.jpg)
7) Conic model method (Davidon, 1980)
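$$c(d) = f(x_k) + \frac{g_k^T d}{1 + b^T d} + \frac{1}{2}\,\frac{d^T A_k d}{(1 + b^T d)^2}$$
$$q(d) = f(x_k) + g_k^T d + \frac{1}{2}\, d^T B_k d$$
8) Tensor methods (Schnabel & Frank, 1984)
$$m_T(x_c + d) = f(x_c) + \nabla f(x_c) \cdot d + \frac{1}{2} \nabla^2 f(x_c) \cdot d^2 + \frac{1}{6} T_c \cdot d^3 + \frac{1}{24} V_c \cdot d^4$$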
![Page 62: STAT 310 Lecture 3 - Argonne National Laboratoryanitescu/CLASSES/2011/LECTURES/STAT310-201… · Summer School Lecture 4 21 for i = 1 to n-1 for j = i+1 to ... Mihai Anitescu STAT](https://reader031.vdocuments.net/reader031/viewer/2022022608/5b88ce6d7f8b9aaf728e674c/html5/thumbnails/62.jpg)
9) Methods based on systems of differential equations
Gradient flow method (Courant, 1942)
$$\frac{dx}{dt} = -\nabla^2 f(x)^{-1} \nabla f(x), \qquad x(0) = x_0$$
10) Direct search methods
Hooke-Jeeves (pattern search) (1961)
Powell (conjugate directions) (1964)
Rosenbrock (coordinate system rotation) (1960)
Nelder-Mead (rolling the simplex) (1965)
Powell – UOBYQA (quadratic approximation) (1994-2000)
N. Andrei, Critica Ratiunii Algoritmilor de Optimizare fara Restrictii, Editura Academiei Romane, 2008.