Lecture 8: Newton Methods
Xiamen University, April 15-17, 2020
Source: math.xmu.edu.cn/group/nona/damc/lecture08.pdf
1 Basic Newton's Method
Consider the problem
$$\min_{x\in\mathbb{R}^n} f(x),$$
where $f:\mathbb{R}^n\mapsto\mathbb{R}$ is Lipschitz twice continuously differentiable.
A second-order Taylor series approximation to $f$ around $x_k$ is
$$f(x_k+d)\approx f(x_k)+\nabla f(x_k)^T d+\frac{1}{2}d^T\nabla^2 f(x_k)\,d.$$
When $\nabla^2 f(x_k)$ is positive definite, the minimizer $d_k$ of the right-hand side is unique; it is
$$d_k=-\nabla^2 f(x_k)^{-1}\nabla f(x_k).$$
Basic Newton's iteration:
$$x_{k+1}=x_k-\nabla^2 f(x_k)^{-1}\nabla f(x_k).$$
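As a concrete illustration of the iteration above, here is a minimal NumPy sketch (the test function, tolerance, and iteration cap are assumptions for this example, not part of the lecture):

```python
import numpy as np

def newton(x0, grad, hess, tol=1e-10, max_iter=50):
    """Basic Newton's method: x_{k+1} = x_k - [hess f(x_k)]^{-1} grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:          # stop once the gradient is small
            break
        d = np.linalg.solve(hess(x), -g)      # Newton direction (solve, do not invert)
        x = x + d                             # unit step: no line search or trust region
    return x

# Example: minimize f(x) = x1^4 + x2^2 from the starting point (1, 1).
grad = lambda x: np.array([4 * x[0] ** 3, 2 * x[1]])
hess = lambda x: np.array([[12 * x[0] ** 2, 0.0], [0.0, 2.0]])
print(newton(np.array([1.0, 1.0]), grad, hess))
```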
1.1 Newton's Method vs Steepest Descent vs CG
Given SPD $A\in\mathbb{R}^{n\times n}$,
$$A^{-1}b=\arg\min_{x\in\mathbb{R}^n}\ \frac{1}{2}x^T A x-b^T x.$$
Steepest descent iteration:
$$x_{k+1}=x_k-\frac{(Ax_k-b)^T(Ax_k-b)}{(Ax_k-b)^T A(Ax_k-b)}\,(Ax_k-b).$$
Newton's iteration: $x_1=x_0-A^{-1}(Ax_0-b)=A^{-1}b$, i.e., Newton's method reaches the minimizer in a single step.
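A quick numerical check of this contrast, on an assumed random SPD system: steepest descent with exact line search makes slow progress, while one Newton step lands exactly on $A^{-1}b$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # SPD matrix (assumption for this demo)
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

# Steepest descent with exact line search on the quadratic.
x = np.zeros(n)
for _ in range(100):
    r = A @ x - b                        # gradient of (1/2) x^T A x - b^T x
    x = x - (r @ r) / (r @ A @ r) * r
print("steepest descent error after 100 steps:", np.linalg.norm(x - x_star))

# Newton's iteration: a single step from any starting point.
x0 = np.zeros(n)
x1 = x0 - np.linalg.solve(A, A @ x0 - b)
print("Newton error after 1 step:", np.linalg.norm(x1 - x_star))
```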
Theorem 1 (Local quadratic convergence)
Suppose $f(x)$ is twice Lipschitz continuously differentiable with Lipschitz constant $M$, i.e.,
$$\|\nabla^2 f(x)-\nabla^2 f(y)\|\le M\|x-y\|.$$
Suppose that (the second-order sufficient conditions)
$$\nabla f(x_\star)=0\quad\text{and}\quad\nabla^2 f(x_\star)\succeq\gamma I\ \text{for some }\gamma>0,$$
which ensure that $x_\star$ is a local minimizer of $f(x)$. If
$$\|x_0-x_\star\|\le\frac{\gamma}{2M},$$
then the sequence $\{x_k\}_{k=0}^{\infty}$ in Newton's method converges to $x_\star$ at a quadratic rate, with
$$\|x_{k+1}-x_\star\|\le\frac{M}{\gamma}\|x_k-x_\star\|^2,\qquad k=0,1,2,\ldots$$
2 DD + Newton for smooth strongly convex functions
If $f$ is $\gamma$-strongly convex and $\nabla f$ is $L$-Lipschitz continuous, then $\nabla^2 f(x)$ is positive definite and $\gamma I\preceq\nabla^2 f(x)\preceq LI$. The Newton direction
$$d_k=-\nabla^2 f(x_k)^{-1}\nabla f(x_k)$$
is a descent direction satisfying
$$\nabla f(x_k)^T d_k\le-\frac{\gamma}{L}\,\|\nabla f(x_k)\|\,\|d_k\|.$$
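The bound follows directly from $\gamma I\preceq\nabla^2 f(x_k)\preceq LI$; a short verification:
$$\nabla f(x_k)^T d_k=-\nabla f(x_k)^T\nabla^2 f(x_k)^{-1}\nabla f(x_k)\le-\frac{1}{L}\|\nabla f(x_k)\|^2\le-\frac{\gamma}{L}\|\nabla f(x_k)\|\,\|d_k\|,$$
where the first inequality uses $\nabla^2 f(x_k)^{-1}\succeq\frac{1}{L}I$ and the second uses $\|d_k\|=\|\nabla^2 f(x_k)^{-1}\nabla f(x_k)\|\le\frac{1}{\gamma}\|\nabla f(x_k)\|$.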
The DD method using the Newton direction yields $x_k\to x_\star$, where $x_\star$ is the (unique) global minimizer of $f$.
Two-stage method: DD + Newton.
The global sublinear convergence of DD is enhanced to local quadratic convergence if we use $\alpha_k=1$ whenever it satisfies the weak Wolfe conditions.
DD + Newton: global sublinear + local quadratic convergence.
3 DD + Newton for smooth convex functions
If $f$ is convex but not strongly convex and $\nabla f$ is $L$-Lipschitz continuous, then $\nabla^2 f(x)$ may be singular for some $x$, i.e.,
$$0\preceq\nabla^2 f(x)\preceq LI.$$
So the Newton direction may not be well defined.
Consider the modified Newton direction
$$d_k=-[\nabla^2 f(x_k)+\lambda_k I]^{-1}\nabla f(x_k),\qquad\lambda_k>0,$$
which is a descent direction. The DD method using the modified Newton direction yields $x_k\to x_\star$, where $x_\star$ is a minimizer of $f$.
Two-stage method: DD + Newton.
If the minimizer $x_\star$ is unique and $\nabla^2 f(x_\star)$ is positive definite, then $\nabla^2 f(x_k)$ will be positive definite for sufficiently large $k$.
DD + Newton: global sublinear + local quadratic convergence.
4 DD + Newton for smooth nonconvex functions
For smooth nonconvex $f$, the Hessian $\nabla^2 f(x_k)$ may be indefinite for some $k$. The Newton direction may not exist (when $\nabla^2 f(x_k)$ is singular), or it may not be a descent direction (when $\nabla^2 f(x_k)$ has negative eigenvalues). The modified Newton direction
$$d_k=-[\nabla^2 f(x_k)+\lambda_k I]^{-1}\nabla f(x_k)$$
will be a descent direction for $\lambda_k$ sufficiently large. For given $0<\eta<1$, a sufficient condition is
$$\frac{\lambda_k+\lambda_{\min}(\nabla^2 f(x_k))}{\lambda_k+L}\ge\eta.$$
Two-stage method: DD + Newton.
Once again, if the DD iterates $x_k$ enter the neighborhood of a local solution $x_\star$ for which $\nabla^2 f(x_\star)$ is positive definite, some strategy for choosing $\lambda_k$ and $\alpha_k$ recovers the local quadratic convergence.
4.1 Other modified Newton directions
Modified Cholesky factorization: for indefinite $\nabla^2 f(x_k)$, by adding positive elements where needed (to avoid taking the square root of a negative number), the factorization can still proceed. Using the modified factorization in place of $\nabla^2 f(x_k)$ in the calculation of the Newton direction $d_k$, we obtain a new modified Newton direction.
Given the eigenvalue decomposition
$$\nabla^2 f(x_k)=Q_k\Lambda_k Q_k^T,$$
we can define a modified Newton direction
$$d_k=-Q_k\hat{\Lambda}_k^{-1}Q_k^T\nabla f(x_k),$$
where $\hat{\Lambda}_k$, with positive diagonal entries, is a modified version of $\Lambda_k$.
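A minimal NumPy sketch of this eigenvalue-based modification (the flooring rule $\hat\lambda_i=\max(|\lambda_i|,\delta)$ is one common choice and is an assumption here, not the lecture's prescription):

```python
import numpy as np

def modified_newton_direction(H, g, delta=1e-6):
    """d = -Q diag(1/lam_hat) Q^T g, where lam_hat modifies the eigenvalues of H."""
    lam, Q = np.linalg.eigh(H)                # H = Q diag(lam) Q^T for symmetric H
    lam_hat = np.maximum(np.abs(lam), delta)  # modified eigenvalues, positive by construction
    return -Q @ ((Q.T @ g) / lam_hat)

# On an indefinite Hessian the result is still a descent direction: g^T d < 0.
H = np.array([[1.0, 0.0], [0.0, -2.0]])
g = np.array([1.0, 1.0])
d = modified_newton_direction(H, g)
print(g @ d)   # negative
```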
For more modified Newton directions to ensure descent in a DD framework, see Chapter 3 of NO.
5 Trust region method
The trust-region subproblem: given $g_k$ and symmetric $B_k$,
$$\min_d\ f(x_k)+g_k^T d+\frac{1}{2}d^T B_k d\quad\text{s.t.}\quad\|d\|\le\Delta_k,$$
where $\Delta_k$ is the radius of the trust region in which the quadratic $f(x_k)+g_k^T d+\frac{1}{2}d^T B_k d$ "well" captures the true behavior of $f$.
The solution $d_k$ of the subproblem satisfies the linear system
$$[B_k+\lambda I]\,d_k=-g_k\quad\text{for some }\lambda\ge 0,$$
where $\lambda$ is chosen such that $B_k+\lambda I$ is positive semidefinite and $\lambda(\|d_k\|-\Delta_k)=0$. (Exercise; [Sorensen etc.])
Solving the subproblem thus reduces to a search for the value of $\lambda$. Specialized methods have been devised [Sorensen etc.].
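To make the one-dimensional search concrete, here is an illustrative bisection on $\lambda$ (a rough sketch only: it ignores the so-called hard case, and the specialized methods cited above are considerably more efficient):

```python
import numpy as np

def tr_subproblem_bisection(B, g, Delta, lam_hi=1e6, tol=1e-8):
    """Sketch: find d with [B + lam I] d = -g, lam >= 0, and lam * (||d|| - Delta) = 0."""
    n = len(g)
    lam_min_B = np.linalg.eigvalsh(B)[0]
    if lam_min_B > 0:                              # interior solution if it fits in the region
        d = np.linalg.solve(B, -g)
        if np.linalg.norm(d) <= Delta:
            return d, 0.0
    # ||d(lam)|| decreases as lam grows, so bisect for ||d(lam)|| = Delta.
    lo = max(0.0, -lam_min_B) + 1e-12
    hi = lam_hi
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        d = np.linalg.solve(B + lam * np.eye(n), -g)
        if np.linalg.norm(d) > Delta:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    return d, lam
```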
The trust-region method procedure
Define the ratio $\rho_k$ between the actual decrease in $f$ and the amount of decrease in the quadratic objective:
$$\rho_k=\frac{f(x_k+d_k)-f(x_k)}{\frac{1}{2}(d_k)^T B_k d_k+g_k^T d_k}.$$
If $\rho_k$ is greater than a small tolerance (e.g., 0.1), we accept the step and proceed to the next iteration. Otherwise, the trust-region radius $\Delta_k$ is too large, so we do not take the step; we shrink the trust region and re-solve the new subproblem to obtain a new step.
If $\rho_k$ is close to 1 and the bound $\|d_k\|\le\Delta_k$ is active (i.e., $\|d_k\|=\Delta_k$), we conclude that a larger trust region may hasten progress, so we increase $\Delta_k$ for the next iteration.
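A schematic of this accept/shrink/grow logic (the tolerance 0.1 and the factors 1/4 and 2 are common choices assumed for the sketch; it reuses `tr_subproblem_bisection` from the earlier sketch and measures $\rho_k$ equivalently as a ratio of positive decreases):

```python
import numpy as np

def trust_region(f, grad, hess, x0, Delta0=1.0, Delta_max=100.0,
                 eta=0.1, tol=1e-8, max_iter=200):
    """Trust-region Newton method driven by the ratio rho_k (schematic)."""
    x, Delta = np.asarray(x0, dtype=float), Delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) <= tol:
            break
        d, _ = tr_subproblem_bisection(B, g, Delta)  # subproblem solver sketched earlier
        pred = -(g @ d + 0.5 * d @ B @ d)            # decrease predicted by the quadratic model
        actual = f(x) - f(x + d)                     # actual decrease in f
        rho = actual / pred
        if rho >= eta:                               # accept the step
            x = x + d
        else:                                        # reject: the region is too large, shrink it
            Delta *= 0.25
        if rho > 0.75 and np.linalg.norm(d) >= 0.99 * Delta:
            Delta = min(2.0 * Delta, Delta_max)      # bound active and model good: grow the region
    return x
```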
5.1 Dogleg method for the trust-region subproblem
For large-scale problems, it may be too expensive to solve the trust-region subproblem near-exactly, since the process may require several factorizations of $B_k+\lambda I$ for different values of $\lambda$.
A popular approach for finding approximate solutions, which can be used when $B_k$ is positive definite, is the dogleg method.
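A compact sketch of the standard dogleg step for positive definite $B_k$: the path runs from the origin to the minimizer along the steepest-descent direction (the Cauchy point) and then toward the full Newton step, truncated at the trust-region boundary.

```python
import numpy as np

def dogleg_step(B, g, Delta):
    """Dogleg approximation to the trust-region subproblem (B positive definite)."""
    d_newton = np.linalg.solve(B, -g)             # full (unconstrained) Newton step
    if np.linalg.norm(d_newton) <= Delta:
        return d_newton
    d_cauchy = -(g @ g) / (g @ B @ g) * g         # minimizer along the steepest-descent direction
    if np.linalg.norm(d_cauchy) >= Delta:
        return -Delta * g / np.linalg.norm(g)     # truncated steepest-descent step
    # Walk from the Cauchy point toward the Newton point until ||d|| = Delta.
    v = d_newton - d_cauchy
    a = v @ v
    b = 2.0 * (d_cauchy @ v)
    c = d_cauchy @ d_cauchy - Delta ** 2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return d_cauchy + tau * v
```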
5.2 Trust-region Newton method
The subproblem:
$$\min_d\ f(x_k)+\nabla f(x_k)^T d+\frac{1}{2}d^T\nabla^2 f(x_k)\,d\quad\text{s.t.}\quad\|d\|\le\Delta_k.$$
The trust-region Newton method can "escape" from a saddle point. Suppose $\nabla f(x_k)=0$ and $\nabla^2 f(x_k)$ is indefinite with some strictly negative eigenvalues. Then the solution $d_k$ of the subproblem will be nonzero, and the algorithm will step away from the saddle point $x_k$ in the direction of most negative curvature for $\nabla^2 f(x_k)$. This guarantees that any accumulation points will satisfy second-order necessary conditions.
Another appealing feature of the trust-region Newton approach is that when the sequence $\{x_k\}$ approaches a point $x_\star$ satisfying second-order sufficient conditions, the trust-region bound becomes inactive and the method takes basic Newton steps for all sufficiently large $k$, so it has local quadratic convergence.
5.3 Difference between line-search and trust-region methods
The basic difference between line-search and trust-region methods can be summarized as follows.
Line-search methods first choose a direction $d_k$, then decide how far to move along that direction.
Trust-region methods do the opposite: they choose the distance $\Delta_k$ first, then find the direction that makes the best progress for this step length.
6 Cubic regularization approaches
Assume that $\|\nabla^2 f(x)-\nabla^2 f(y)\|\le M\|x-y\|$. Then
$$T_M(z,x)=f(x)+\nabla f(x)^T(z-x)+\frac{1}{2}(z-x)^T\nabla^2 f(x)(z-x)+\frac{M}{6}\|z-x\|^3\ \ge\ f(z).$$
Approach I: the basic cubic regularization algorithm
$$x_{k+1}=\arg\min_z\ T_M(z,x_k),\qquad k=0,1,2,\ldots$$
Approach II: seek $\hat{x}$ approximately satisfying second-order necessary conditions, that is,
$$\|\nabla f(\hat{x})\|\le\varepsilon_g,\qquad\lambda_{\min}(\nabla^2 f(\hat{x}))\ge-\varepsilon_H,$$
where $\varepsilon_g$ and $\varepsilon_H$ are two small positive constants.
Assume $\nabla^2 f$ is $M$-Lipschitz continuous,
$$\|\nabla^2 f(x)-\nabla^2 f(y)\|\le M\|x-y\|,$$
$\nabla f$ is $L$-Lipschitz continuous,
$$\|\nabla f(x)-\nabla f(y)\|\le L\|x-y\|,$$
and $f$ is lower-bounded, $f(x)\ge\bar{f}$.
(i) If $\|\nabla f(x_k)\|>\varepsilon_g$, set
$$x_{k+1}=x_k-\frac{1}{L}\nabla f(x_k).$$
(ii) If $\|\nabla f(x_k)\|\le\varepsilon_g$ and $\lambda_{\min}(\nabla^2 f(x_k))<-\varepsilon_H$, choose $d_k$ to be the eigenvector corresponding to $\lambda_{\min}(\nabla^2 f(x_k))$. Choose the size and sign of $d_k$ such that
$$\|d_k\|=1\quad\text{and}\quad\nabla f(x_k)^T d_k\le 0.$$
Set
$$x_{k+1}=x_k+\alpha_k d_k,\qquad\text{where }\alpha_k=\frac{2\varepsilon_H}{M}.$$
(iii) If neither of these conditions holds, then $x_k$ satisfies the approximate second-order necessary conditions, so we terminate.
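The three cases translate directly into a short NumPy sketch (a rough transcription assuming `grad` and `hess` callables and given constants $\varepsilon_g$, $\varepsilon_H$, $L$, $M$):

```python
import numpy as np

def approx_second_order_point(x0, grad, hess, eps_g, eps_H, L, M, max_iter=1000):
    """Gradient / negative-curvature steps toward an approximate second-order point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) > eps_g:             # step (i): steepest-descent step
            x = x - g / L
            continue
        lam, Q = np.linalg.eigh(hess(x))
        if lam[0] < -eps_H:                       # step (ii): negative-curvature step
            d = Q[:, 0]                           # unit eigenvector of the smallest eigenvalue
            if g @ d > 0:                         # flip the sign so that grad^T d <= 0
                d = -d
            x = x + (2.0 * eps_H / M) * d
            continue
        break                                     # step (iii): approximate conditions hold
    return x
```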
For the steepest-descent step (i),
$$f(x_{k+1})\le f(x_k)-\frac{1}{2L}\|\nabla f(x_k)\|^2\le f(x_k)-\frac{\varepsilon_g^2}{2L}.$$
For a step of type (ii),
$$\begin{aligned}
f(x_{k+1})&\le f(x_k)+\alpha_k\nabla f(x_k)^T d_k+\frac{1}{2}\alpha_k^2(d_k)^T\nabla^2 f(x_k)\,d_k+\frac{1}{6}M\alpha_k^3\|d_k\|^3\\
&\le f(x_k)-\frac{1}{2}\left(\frac{2\varepsilon_H}{M}\right)^2\varepsilon_H+\frac{1}{6}M\left(\frac{2\varepsilon_H}{M}\right)^3\\
&=f(x_k)-\frac{2}{3}\,\frac{\varepsilon_H^3}{M^2}.
\end{aligned}$$
We attain a per-iteration decrease in the objective of at least
$$\min\left(\frac{\varepsilon_g^2}{2L},\ \frac{2}{3}\,\frac{\varepsilon_H^3}{M^2}\right).$$
Newton Methods Lecture 8 April 15 - 17 2020 16 16
1 Basic Newtonrsquos Method
Consider the problemminxisinRn
f(x)
where f Rn 983041rarr R is Lipschitz twice continuously differentiable
A second-order Taylor series approximation to f around xk is
f(xk + d) asymp f(xk) +nablaf(xk)Td+1
2dTnabla2f(xk)d
When nabla2f(xk) is positive definite the minimizer dk of theright-hand side is unique it is
dk = minusnabla2f(xk)minus1nablaf(xk)
Basic Newtonrsquos iteration
xk+1 = xk minusnabla2f(xk)minus1nablaf(xk)
Newton Methods Lecture 8 April 15 - 17 2020 2 16
11 Newtonrsquos Method vs Steepest Descent vs CG
Given SPD A isin Rntimesn
Aminus1b = argminxisinRn
1
2xTAxminus bTx
Steepest Descent iteration
xk+1 = xk minus (Axk minus b)T(Axk minus b)
(Axk minus b)TA(Axk minus b)(Axk minus b)
Newtonrsquos iteration x1 = x0 minusAminus1(Ax0 minus b)
Newton Methods Lecture 8 April 15 - 17 2020 3 16
Theorem 1 (Local quadratic convergence)
Suppose f(x) is twice Lipschitz continuously differentiable withLipschitz constant M ie
983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042
Suppose that (the second-order sufficient conditions)
nablaf(x983183) = 0 and nabla2f(x983183) ≽ γI for some γ gt 0
which ensure that x983183 is a local minimizer of f(x) If
983042x0 minus x983183983042 le γ
2M
then the sequence xkinfin0 in Newtonrsquos method converges to x983183 at aquadratic rate with
983042xk+1 minus x983183983042 le M
γ983042xk minus x9831839830422 k = 0 1 2
Newton Methods Lecture 8 April 15 - 17 2020 4 16
2 DD + Newton for smooth strongly convex functions
If f is γ-strongly convex and nablaf is L-Lipschitz continuous thennabla2f(x) is positive definite and γI ≼ nabla2f(x) ≼ LI The Newtondirection
dk = minusnabla2f(xk)minus1nablaf(xk)
is a descent direction satisfying
nablaf(xk)Tdk le minus γ
L983042nablaf(xk)983042983042dk983042
The DD method using the Newton direction yields xk rarr x983183 wherex983183 is the (unique) global minimizer of f
Two stage method DD + Newton
Global sublinear convergence of DD is enhanced to the localquadratic convergence if we use αk = 1 whenever it satisfies theweak Wolfe conditions
DD + Newton global sublinear + local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 5 16
3 DD + Newton for smooth convex functions
If f is convex but not strongly convex and nablaf is L-Lipschitzcontinuous then nabla2f(x) may be singular for some x ie
0 ≼ nabla2f(x) ≼ LI
So the Newton direction may not be well defined
Consider the modified Newton direction
dk = minus[nabla2f(xk) + λkI]minus1nablaf(xk)
which is a descent direction The DD method using the modifiedNewton direction yields xk rarr x983183 where x983183 is a minimizer of f
Two stage method DD + Newton
If the minimizer x983183 is unique and nabla2f(x983183) is positive definite thennabla2f(x983183) will be positive definite for sufficiently large k
DD + Newton global sublinear + local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 6 16
4 DD + Newton for smooth nonconvex functions
For smooth nonconvex f the Hessian nabla2f(xk) may be indefinitefor some k The Newton direction may not exist (when nabla2f(xk) issingular) or it may not be a descent direction (when nabla2f(xk) hasnegative eigenvalues) The modified Newton direction
dk = minus[nabla2f(xk) + λkI]minus1nablaf(xk)
will be a descent direction for λk sufficiently large For given0 lt η lt 1 a sufficient condition is
λk + λmin(nabla2f(xk))
λk + Lge η
Two stage method DD + Newton
Once again if the DD iterates xk enter the neighborhood of a localsolution x983183 for which nabla2f(x983183) is positive definite some strategyfor choosing λk and αk recovers the local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 7 16
41 Other modified Newton directions
Modified Cholesky factorization For indefinite nabla2f(xk) by addingpositive elements if needed to avoid taking the square root of anegative number the factorization continues to proceed Using themodified factorization in place of nabla2f(xk) in the calculation of theNewton direction dk we obtain a new modified Newton direction
Given eigenvalue decomposition
nabla2f(xk) = QkΛkQTk
we can define a modified Newton direction
dk = minusQk983144Λminus1k QT
knablaf(xk)
where 983144Λk with positive diagonal entries is a modified version of Λk
For more modified Newton directions to ensure descent in a DDframework see Chapter 3 of NO
Newton Methods Lecture 8 April 15 - 17 2020 8 16
5 Trust region method
The trust-region subproblem Given gk and symmetric Bk
mind
f(xk) + gTk d+
1
2dTBkd st 983042d983042 le ∆k
where ∆k is the radius of the trust region in which the quadraticf(xk) + gT
k d+ 12d
TBkd ldquowellrdquo captures the true behavior of f
The solution dk of the subproblem satisfies the linear system
[Bk + λI]dk = minusgk for some λ ge 0
where λ is chosen such that Bk + λI is positive semidefinite andλ(983042dk983042 minus∆k) = 0 (Exercise [Sorensen etc])
Solving the subproblem thus reduces to a search for the value of λSpecialized methods have been devised [Sorensen etc]
Newton Methods Lecture 8 April 15 - 17 2020 9 16
The trust-region method procedure
Define the ratio ρk between the actual decrease in f and theamount of decrease in the quadratic objective
ρk =f(xk + dk)minus f(xk)1
2(dk)TBkd
k + gTk d
k
If ρk is at least greater than a small tolerance (eg 01) we acceptthe step and proceed to the next iteration Otherwise the trustregion radius ∆k is too large so we do not take the step shrink thetrust region and resolve the new subproblem to obtain a new step
If ρk is close to 1 and the bound 983042dk983042 le ∆k is active (ie983042dk983042 = ∆k) we conclude that a larger trust region may hastenprogress so we increase ∆k for the next iteration
Newton Methods Lecture 8 April 15 - 17 2020 10 16
51 Dogleg method for trust region subproblem
For large-scale problems it may be too expensive to solvetrust-region subproblem near-exactly since the process mayrequire several factorizations of Bk + λI for different values of λ
A popular approach for finding approximate solutions which canbe used when Bk is positive definite is the dogleg method
Newton Methods Lecture 8 April 15 - 17 2020 11 16
52 Trust-region Newton method
The subproblem
mind
f(xk) +nablaf(xk)Td+1
2dTnabla2f(xk)d st 983042d983042 le ∆k
The trust-region Newton method can ldquoescaperdquo from a saddlepoint Suppose nablaf(xk) = 0 and nabla2f(xk) indefinite with somestrictly negative eigenvalues Then the solution dk of thesubproblem will be nonzero and the algorithm will step away fromthe saddle point xk in the direction of most negative curvature fornabla2f(xk) This guarantees that any accumulation points willsatisfy second-order necessary conditions
Another appealing feature of the trust-region Newton approach isthat when the sequence xk approaches a point x983183 satisfyingsecond-order sufficient conditions the trust region bound becomesinactive and the method takes basic Newton steps for allsufficiently large k so it has local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 12 16
53 Difference between line-search and trust-region methods
The basic difference between line-search and trust-region methodscan be summarized as follows
Line-search methods first choose a direction dk then decide howfar to move along that direction
Trust-region methods do the opposite They choose the distance∆k first then find the direction that makes the best progress forthis step length
6 Cubic regularization approachs
Assume that 983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042 Then
TM (zx) = f(x) +nablaf(x)T(zminus x)
+1
2(zminus x)Tnabla2f(x)(zminus x) +
M
6983042zminus x9830423
ge f(z)
Newton Methods Lecture 8 April 15 - 17 2020 13 16
Approach I the basic cubic regularization algorithm
xk+1 = argminz
TM (zxk) k = 0 1 2
Approach II Seek 983141x approximately satisfying second-ordernecessary conditions that is
983042nablaf(983141x)983042 le εg λmin(nabla2f(983141x)) ge minusεH
where εg and εH are two small positive constants
Assume nabla2f is M -Lipschitz continuous
983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042
nablaf is L-Lipschitz continuous
983042nablaf(x)minusnablaf(y)983042 le L983042xminus y983042
and f is lower-bounded f(x) ge f
Newton Methods Lecture 8 April 15 - 17 2020 14 16
(i) If 983042nablaf(xk)983042 gt εg set
xk+1 = xk minus 1
Lnablaf(xk)
(ii) If 983042nablaf(xk)983042 le εg and λmin(nabla2f(xk)) lt minusεH choose dk to be theeigenvector corresponding to λmin(nabla2f(xk)) Choose the size andsign of dk such that
983042dk983042 = 1
andnablaf(xk)Tdk le 0
Set
xk+1 = xk + αkdk where αk =
2εHM
(iii) If neither of these conditions hold then xk satisfies theapproximate second-order necessary conditions so we terminate
Newton Methods Lecture 8 April 15 - 17 2020 15 16
For the steepest-descent step (i)
f(xk+1) le f(xk)minus 1
2L983042nablaf(xk)9830422 le f(xk)minus
ε2g2L
For a step of type (ii)
f(xk+1) 983249 f(xk) + αknablaf(xk)⊤dk
+1
2α2k(d
k)Tnabla2f(xk)dk +1
6Mα3
k983042dk9830423
983249 f(xk)minus 1
2
9830612εHM
9830622
εH +1
6M
9830612εHM
9830623
= f(xk)minus 2
3
ε3HM2
We attain a decrease in the objective of at least
min
983075ε2g2L
2
3
ε3HM2
983076
Newton Methods Lecture 8 April 15 - 17 2020 16 16
11 Newtonrsquos Method vs Steepest Descent vs CG
Given SPD A isin Rntimesn
Aminus1b = argminxisinRn
1
2xTAxminus bTx
Steepest Descent iteration
xk+1 = xk minus (Axk minus b)T(Axk minus b)
(Axk minus b)TA(Axk minus b)(Axk minus b)
Newtonrsquos iteration x1 = x0 minusAminus1(Ax0 minus b)
Newton Methods Lecture 8 April 15 - 17 2020 3 16
Theorem 1 (Local quadratic convergence)
Suppose f(x) is twice Lipschitz continuously differentiable withLipschitz constant M ie
983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042
Suppose that (the second-order sufficient conditions)
nablaf(x983183) = 0 and nabla2f(x983183) ≽ γI for some γ gt 0
which ensure that x983183 is a local minimizer of f(x) If
983042x0 minus x983183983042 le γ
2M
then the sequence xkinfin0 in Newtonrsquos method converges to x983183 at aquadratic rate with
983042xk+1 minus x983183983042 le M
γ983042xk minus x9831839830422 k = 0 1 2
Newton Methods Lecture 8 April 15 - 17 2020 4 16
2 DD + Newton for smooth strongly convex functions
If f is γ-strongly convex and nablaf is L-Lipschitz continuous thennabla2f(x) is positive definite and γI ≼ nabla2f(x) ≼ LI The Newtondirection
dk = minusnabla2f(xk)minus1nablaf(xk)
is a descent direction satisfying
nablaf(xk)Tdk le minus γ
L983042nablaf(xk)983042983042dk983042
The DD method using the Newton direction yields xk rarr x983183 wherex983183 is the (unique) global minimizer of f
Two stage method DD + Newton
Global sublinear convergence of DD is enhanced to the localquadratic convergence if we use αk = 1 whenever it satisfies theweak Wolfe conditions
DD + Newton global sublinear + local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 5 16
3 DD + Newton for smooth convex functions
If f is convex but not strongly convex and nablaf is L-Lipschitzcontinuous then nabla2f(x) may be singular for some x ie
0 ≼ nabla2f(x) ≼ LI
So the Newton direction may not be well defined
Consider the modified Newton direction
dk = minus[nabla2f(xk) + λkI]minus1nablaf(xk)
which is a descent direction The DD method using the modifiedNewton direction yields xk rarr x983183 where x983183 is a minimizer of f
Two stage method DD + Newton
If the minimizer x983183 is unique and nabla2f(x983183) is positive definite thennabla2f(x983183) will be positive definite for sufficiently large k
DD + Newton global sublinear + local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 6 16
4 DD + Newton for smooth nonconvex functions
For smooth nonconvex f the Hessian nabla2f(xk) may be indefinitefor some k The Newton direction may not exist (when nabla2f(xk) issingular) or it may not be a descent direction (when nabla2f(xk) hasnegative eigenvalues) The modified Newton direction
dk = minus[nabla2f(xk) + λkI]minus1nablaf(xk)
will be a descent direction for λk sufficiently large For given0 lt η lt 1 a sufficient condition is
λk + λmin(nabla2f(xk))
λk + Lge η
Two stage method DD + Newton
Once again if the DD iterates xk enter the neighborhood of a localsolution x983183 for which nabla2f(x983183) is positive definite some strategyfor choosing λk and αk recovers the local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 7 16
41 Other modified Newton directions
Modified Cholesky factorization For indefinite nabla2f(xk) by addingpositive elements if needed to avoid taking the square root of anegative number the factorization continues to proceed Using themodified factorization in place of nabla2f(xk) in the calculation of theNewton direction dk we obtain a new modified Newton direction
Given eigenvalue decomposition
nabla2f(xk) = QkΛkQTk
we can define a modified Newton direction
dk = minusQk983144Λminus1k QT
knablaf(xk)
where 983144Λk with positive diagonal entries is a modified version of Λk
For more modified Newton directions to ensure descent in a DDframework see Chapter 3 of NO
Newton Methods Lecture 8 April 15 - 17 2020 8 16
5 Trust region method
The trust-region subproblem Given gk and symmetric Bk
mind
f(xk) + gTk d+
1
2dTBkd st 983042d983042 le ∆k
where ∆k is the radius of the trust region in which the quadraticf(xk) + gT
k d+ 12d
TBkd ldquowellrdquo captures the true behavior of f
The solution dk of the subproblem satisfies the linear system
[Bk + λI]dk = minusgk for some λ ge 0
where λ is chosen such that Bk + λI is positive semidefinite andλ(983042dk983042 minus∆k) = 0 (Exercise [Sorensen etc])
Solving the subproblem thus reduces to a search for the value of λSpecialized methods have been devised [Sorensen etc]
Newton Methods Lecture 8 April 15 - 17 2020 9 16
The trust-region method procedure
Define the ratio ρk between the actual decrease in f and theamount of decrease in the quadratic objective
ρk =f(xk + dk)minus f(xk)1
2(dk)TBkd
k + gTk d
k
If ρk is at least greater than a small tolerance (eg 01) we acceptthe step and proceed to the next iteration Otherwise the trustregion radius ∆k is too large so we do not take the step shrink thetrust region and resolve the new subproblem to obtain a new step
If ρk is close to 1 and the bound 983042dk983042 le ∆k is active (ie983042dk983042 = ∆k) we conclude that a larger trust region may hastenprogress so we increase ∆k for the next iteration
Newton Methods Lecture 8 April 15 - 17 2020 10 16
51 Dogleg method for trust region subproblem
For large-scale problems it may be too expensive to solvetrust-region subproblem near-exactly since the process mayrequire several factorizations of Bk + λI for different values of λ
A popular approach for finding approximate solutions which canbe used when Bk is positive definite is the dogleg method
Newton Methods Lecture 8 April 15 - 17 2020 11 16
52 Trust-region Newton method
The subproblem
mind
f(xk) +nablaf(xk)Td+1
2dTnabla2f(xk)d st 983042d983042 le ∆k
The trust-region Newton method can ldquoescaperdquo from a saddlepoint Suppose nablaf(xk) = 0 and nabla2f(xk) indefinite with somestrictly negative eigenvalues Then the solution dk of thesubproblem will be nonzero and the algorithm will step away fromthe saddle point xk in the direction of most negative curvature fornabla2f(xk) This guarantees that any accumulation points willsatisfy second-order necessary conditions
Another appealing feature of the trust-region Newton approach isthat when the sequence xk approaches a point x983183 satisfyingsecond-order sufficient conditions the trust region bound becomesinactive and the method takes basic Newton steps for allsufficiently large k so it has local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 12 16
53 Difference between line-search and trust-region methods
The basic difference between line-search and trust-region methodscan be summarized as follows
Line-search methods first choose a direction dk then decide howfar to move along that direction
Trust-region methods do the opposite They choose the distance∆k first then find the direction that makes the best progress forthis step length
6 Cubic regularization approachs
Assume that 983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042 Then
TM (zx) = f(x) +nablaf(x)T(zminus x)
+1
2(zminus x)Tnabla2f(x)(zminus x) +
M
6983042zminus x9830423
ge f(z)
Newton Methods Lecture 8 April 15 - 17 2020 13 16
Approach I the basic cubic regularization algorithm
xk+1 = argminz
TM (zxk) k = 0 1 2
Approach II Seek 983141x approximately satisfying second-ordernecessary conditions that is
983042nablaf(983141x)983042 le εg λmin(nabla2f(983141x)) ge minusεH
where εg and εH are two small positive constants
Assume nabla2f is M -Lipschitz continuous
983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042
nablaf is L-Lipschitz continuous
983042nablaf(x)minusnablaf(y)983042 le L983042xminus y983042
and f is lower-bounded f(x) ge f
Newton Methods Lecture 8 April 15 - 17 2020 14 16
(i) If 983042nablaf(xk)983042 gt εg set
xk+1 = xk minus 1
Lnablaf(xk)
(ii) If 983042nablaf(xk)983042 le εg and λmin(nabla2f(xk)) lt minusεH choose dk to be theeigenvector corresponding to λmin(nabla2f(xk)) Choose the size andsign of dk such that
983042dk983042 = 1
andnablaf(xk)Tdk le 0
Set
xk+1 = xk + αkdk where αk =
2εHM
(iii) If neither of these conditions hold then xk satisfies theapproximate second-order necessary conditions so we terminate
Newton Methods Lecture 8 April 15 - 17 2020 15 16
For the steepest-descent step (i)
f(xk+1) le f(xk)minus 1
2L983042nablaf(xk)9830422 le f(xk)minus
ε2g2L
For a step of type (ii)
f(xk+1) 983249 f(xk) + αknablaf(xk)⊤dk
+1
2α2k(d
k)Tnabla2f(xk)dk +1
6Mα3
k983042dk9830423
983249 f(xk)minus 1
2
9830612εHM
9830622
εH +1
6M
9830612εHM
9830623
= f(xk)minus 2
3
ε3HM2
We attain a decrease in the objective of at least
min
983075ε2g2L
2
3
ε3HM2
983076
Newton Methods Lecture 8 April 15 - 17 2020 16 16
Theorem 1 (Local quadratic convergence)
Suppose f(x) is twice Lipschitz continuously differentiable withLipschitz constant M ie
983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042
Suppose that (the second-order sufficient conditions)
nablaf(x983183) = 0 and nabla2f(x983183) ≽ γI for some γ gt 0
which ensure that x983183 is a local minimizer of f(x) If
983042x0 minus x983183983042 le γ
2M
then the sequence xkinfin0 in Newtonrsquos method converges to x983183 at aquadratic rate with
983042xk+1 minus x983183983042 le M
γ983042xk minus x9831839830422 k = 0 1 2
Newton Methods Lecture 8 April 15 - 17 2020 4 16
2 DD + Newton for smooth strongly convex functions
If f is γ-strongly convex and nablaf is L-Lipschitz continuous thennabla2f(x) is positive definite and γI ≼ nabla2f(x) ≼ LI The Newtondirection
dk = minusnabla2f(xk)minus1nablaf(xk)
is a descent direction satisfying
nablaf(xk)Tdk le minus γ
L983042nablaf(xk)983042983042dk983042
The DD method using the Newton direction yields xk rarr x983183 wherex983183 is the (unique) global minimizer of f
Two stage method DD + Newton
Global sublinear convergence of DD is enhanced to the localquadratic convergence if we use αk = 1 whenever it satisfies theweak Wolfe conditions
DD + Newton global sublinear + local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 5 16
3 DD + Newton for smooth convex functions
If f is convex but not strongly convex and nablaf is L-Lipschitzcontinuous then nabla2f(x) may be singular for some x ie
0 ≼ nabla2f(x) ≼ LI
So the Newton direction may not be well defined
Consider the modified Newton direction
dk = minus[nabla2f(xk) + λkI]minus1nablaf(xk)
which is a descent direction The DD method using the modifiedNewton direction yields xk rarr x983183 where x983183 is a minimizer of f
Two stage method DD + Newton
If the minimizer x983183 is unique and nabla2f(x983183) is positive definite thennabla2f(x983183) will be positive definite for sufficiently large k
DD + Newton global sublinear + local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 6 16
4 DD + Newton for smooth nonconvex functions
For smooth nonconvex f the Hessian nabla2f(xk) may be indefinitefor some k The Newton direction may not exist (when nabla2f(xk) issingular) or it may not be a descent direction (when nabla2f(xk) hasnegative eigenvalues) The modified Newton direction
dk = minus[nabla2f(xk) + λkI]minus1nablaf(xk)
will be a descent direction for λk sufficiently large For given0 lt η lt 1 a sufficient condition is
λk + λmin(nabla2f(xk))
λk + Lge η
Two stage method DD + Newton
Once again if the DD iterates xk enter the neighborhood of a localsolution x983183 for which nabla2f(x983183) is positive definite some strategyfor choosing λk and αk recovers the local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 7 16
41 Other modified Newton directions
Modified Cholesky factorization For indefinite nabla2f(xk) by addingpositive elements if needed to avoid taking the square root of anegative number the factorization continues to proceed Using themodified factorization in place of nabla2f(xk) in the calculation of theNewton direction dk we obtain a new modified Newton direction
Given eigenvalue decomposition
nabla2f(xk) = QkΛkQTk
we can define a modified Newton direction
dk = minusQk983144Λminus1k QT
knablaf(xk)
where 983144Λk with positive diagonal entries is a modified version of Λk
For more modified Newton directions to ensure descent in a DDframework see Chapter 3 of NO
Newton Methods Lecture 8 April 15 - 17 2020 8 16
5 Trust region method
The trust-region subproblem Given gk and symmetric Bk
mind
f(xk) + gTk d+
1
2dTBkd st 983042d983042 le ∆k
where ∆k is the radius of the trust region in which the quadraticf(xk) + gT
k d+ 12d
TBkd ldquowellrdquo captures the true behavior of f
The solution dk of the subproblem satisfies the linear system
[Bk + λI]dk = minusgk for some λ ge 0
where λ is chosen such that Bk + λI is positive semidefinite andλ(983042dk983042 minus∆k) = 0 (Exercise [Sorensen etc])
Solving the subproblem thus reduces to a search for the value of λSpecialized methods have been devised [Sorensen etc]
Newton Methods Lecture 8 April 15 - 17 2020 9 16
The trust-region method procedure
Define the ratio ρk between the actual decrease in f and theamount of decrease in the quadratic objective
ρk =f(xk + dk)minus f(xk)1
2(dk)TBkd
k + gTk d
k
If ρk is at least greater than a small tolerance (eg 01) we acceptthe step and proceed to the next iteration Otherwise the trustregion radius ∆k is too large so we do not take the step shrink thetrust region and resolve the new subproblem to obtain a new step
If ρk is close to 1 and the bound 983042dk983042 le ∆k is active (ie983042dk983042 = ∆k) we conclude that a larger trust region may hastenprogress so we increase ∆k for the next iteration
Newton Methods Lecture 8 April 15 - 17 2020 10 16
51 Dogleg method for trust region subproblem
For large-scale problems it may be too expensive to solvetrust-region subproblem near-exactly since the process mayrequire several factorizations of Bk + λI for different values of λ
A popular approach for finding approximate solutions which canbe used when Bk is positive definite is the dogleg method
Newton Methods Lecture 8 April 15 - 17 2020 11 16
52 Trust-region Newton method
The subproblem
mind
f(xk) +nablaf(xk)Td+1
2dTnabla2f(xk)d st 983042d983042 le ∆k
The trust-region Newton method can ldquoescaperdquo from a saddlepoint Suppose nablaf(xk) = 0 and nabla2f(xk) indefinite with somestrictly negative eigenvalues Then the solution dk of thesubproblem will be nonzero and the algorithm will step away fromthe saddle point xk in the direction of most negative curvature fornabla2f(xk) This guarantees that any accumulation points willsatisfy second-order necessary conditions
Another appealing feature of the trust-region Newton approach isthat when the sequence xk approaches a point x983183 satisfyingsecond-order sufficient conditions the trust region bound becomesinactive and the method takes basic Newton steps for allsufficiently large k so it has local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 12 16
53 Difference between line-search and trust-region methods
The basic difference between line-search and trust-region methodscan be summarized as follows
Line-search methods first choose a direction dk then decide howfar to move along that direction
Trust-region methods do the opposite They choose the distance∆k first then find the direction that makes the best progress forthis step length
6 Cubic regularization approachs
Assume that 983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042 Then
TM (zx) = f(x) +nablaf(x)T(zminus x)
+1
2(zminus x)Tnabla2f(x)(zminus x) +
M
6983042zminus x9830423
ge f(z)
Newton Methods Lecture 8 April 15 - 17 2020 13 16
Approach I the basic cubic regularization algorithm
xk+1 = argminz
TM (zxk) k = 0 1 2
Approach II Seek 983141x approximately satisfying second-ordernecessary conditions that is
983042nablaf(983141x)983042 le εg λmin(nabla2f(983141x)) ge minusεH
where εg and εH are two small positive constants
Assume nabla2f is M -Lipschitz continuous
983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042
nablaf is L-Lipschitz continuous
983042nablaf(x)minusnablaf(y)983042 le L983042xminus y983042
and f is lower-bounded f(x) ge f
Newton Methods Lecture 8 April 15 - 17 2020 14 16
(i) If 983042nablaf(xk)983042 gt εg set
xk+1 = xk minus 1
Lnablaf(xk)
(ii) If 983042nablaf(xk)983042 le εg and λmin(nabla2f(xk)) lt minusεH choose dk to be theeigenvector corresponding to λmin(nabla2f(xk)) Choose the size andsign of dk such that
983042dk983042 = 1
andnablaf(xk)Tdk le 0
Set
xk+1 = xk + αkdk where αk =
2εHM
(iii) If neither of these conditions hold then xk satisfies theapproximate second-order necessary conditions so we terminate
Newton Methods Lecture 8 April 15 - 17 2020 15 16
For the steepest-descent step (i)
f(xk+1) le f(xk)minus 1
2L983042nablaf(xk)9830422 le f(xk)minus
ε2g2L
For a step of type (ii)
f(xk+1) 983249 f(xk) + αknablaf(xk)⊤dk
+1
2α2k(d
k)Tnabla2f(xk)dk +1
6Mα3
k983042dk9830423
983249 f(xk)minus 1
2
9830612εHM
9830622
εH +1
6M
9830612εHM
9830623
= f(xk)minus 2
3
ε3HM2
We attain a decrease in the objective of at least
min
983075ε2g2L
2
3
ε3HM2
983076
Newton Methods Lecture 8 April 15 - 17 2020 16 16
2 DD + Newton for smooth strongly convex functions
If f is γ-strongly convex and nablaf is L-Lipschitz continuous thennabla2f(x) is positive definite and γI ≼ nabla2f(x) ≼ LI The Newtondirection
dk = minusnabla2f(xk)minus1nablaf(xk)
is a descent direction satisfying
nablaf(xk)Tdk le minus γ
L983042nablaf(xk)983042983042dk983042
The DD method using the Newton direction yields xk rarr x983183 wherex983183 is the (unique) global minimizer of f
Two stage method DD + Newton
Global sublinear convergence of DD is enhanced to the localquadratic convergence if we use αk = 1 whenever it satisfies theweak Wolfe conditions
DD + Newton global sublinear + local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 5 16
3 DD + Newton for smooth convex functions
If f is convex but not strongly convex and nablaf is L-Lipschitzcontinuous then nabla2f(x) may be singular for some x ie
0 ≼ nabla2f(x) ≼ LI
So the Newton direction may not be well defined
Consider the modified Newton direction
dk = minus[nabla2f(xk) + λkI]minus1nablaf(xk)
which is a descent direction The DD method using the modifiedNewton direction yields xk rarr x983183 where x983183 is a minimizer of f
Two stage method DD + Newton
If the minimizer x983183 is unique and nabla2f(x983183) is positive definite thennabla2f(x983183) will be positive definite for sufficiently large k
DD + Newton global sublinear + local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 6 16
4 DD + Newton for smooth nonconvex functions
For smooth nonconvex f the Hessian nabla2f(xk) may be indefinitefor some k The Newton direction may not exist (when nabla2f(xk) issingular) or it may not be a descent direction (when nabla2f(xk) hasnegative eigenvalues) The modified Newton direction
dk = minus[nabla2f(xk) + λkI]minus1nablaf(xk)
will be a descent direction for λk sufficiently large For given0 lt η lt 1 a sufficient condition is
λk + λmin(nabla2f(xk))
λk + Lge η
Two stage method DD + Newton
Once again if the DD iterates xk enter the neighborhood of a localsolution x983183 for which nabla2f(x983183) is positive definite some strategyfor choosing λk and αk recovers the local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 7 16
41 Other modified Newton directions
Modified Cholesky factorization For indefinite nabla2f(xk) by addingpositive elements if needed to avoid taking the square root of anegative number the factorization continues to proceed Using themodified factorization in place of nabla2f(xk) in the calculation of theNewton direction dk we obtain a new modified Newton direction
Given eigenvalue decomposition
nabla2f(xk) = QkΛkQTk
we can define a modified Newton direction
dk = minusQk983144Λminus1k QT
knablaf(xk)
where 983144Λk with positive diagonal entries is a modified version of Λk
For more modified Newton directions to ensure descent in a DDframework see Chapter 3 of NO
Newton Methods Lecture 8 April 15 - 17 2020 8 16
5 Trust region method
The trust-region subproblem Given gk and symmetric Bk
mind
f(xk) + gTk d+
1
2dTBkd st 983042d983042 le ∆k
where ∆k is the radius of the trust region in which the quadraticf(xk) + gT
k d+ 12d
TBkd ldquowellrdquo captures the true behavior of f
The solution dk of the subproblem satisfies the linear system
[Bk + λI]dk = minusgk for some λ ge 0
where λ is chosen such that Bk + λI is positive semidefinite andλ(983042dk983042 minus∆k) = 0 (Exercise [Sorensen etc])
Solving the subproblem thus reduces to a search for the value of λSpecialized methods have been devised [Sorensen etc]
Newton Methods Lecture 8 April 15 - 17 2020 9 16
The trust-region method procedure
Define the ratio ρk between the actual decrease in f and theamount of decrease in the quadratic objective
ρk =f(xk + dk)minus f(xk)1
2(dk)TBkd
k + gTk d
k
If ρk is at least greater than a small tolerance (eg 01) we acceptthe step and proceed to the next iteration Otherwise the trustregion radius ∆k is too large so we do not take the step shrink thetrust region and resolve the new subproblem to obtain a new step
If ρk is close to 1 and the bound 983042dk983042 le ∆k is active (ie983042dk983042 = ∆k) we conclude that a larger trust region may hastenprogress so we increase ∆k for the next iteration
Newton Methods Lecture 8 April 15 - 17 2020 10 16
51 Dogleg method for trust region subproblem
For large-scale problems it may be too expensive to solvetrust-region subproblem near-exactly since the process mayrequire several factorizations of Bk + λI for different values of λ
A popular approach for finding approximate solutions which canbe used when Bk is positive definite is the dogleg method
Newton Methods Lecture 8 April 15 - 17 2020 11 16
52 Trust-region Newton method
The subproblem
mind
f(xk) +nablaf(xk)Td+1
2dTnabla2f(xk)d st 983042d983042 le ∆k
The trust-region Newton method can ldquoescaperdquo from a saddlepoint Suppose nablaf(xk) = 0 and nabla2f(xk) indefinite with somestrictly negative eigenvalues Then the solution dk of thesubproblem will be nonzero and the algorithm will step away fromthe saddle point xk in the direction of most negative curvature fornabla2f(xk) This guarantees that any accumulation points willsatisfy second-order necessary conditions
Another appealing feature of the trust-region Newton approach isthat when the sequence xk approaches a point x983183 satisfyingsecond-order sufficient conditions the trust region bound becomesinactive and the method takes basic Newton steps for allsufficiently large k so it has local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 12 16
53 Difference between line-search and trust-region methods
The basic difference between line-search and trust-region methodscan be summarized as follows
Line-search methods first choose a direction dk then decide howfar to move along that direction
Trust-region methods do the opposite They choose the distance∆k first then find the direction that makes the best progress forthis step length
6 Cubic regularization approachs
Assume that 983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042 Then
TM (zx) = f(x) +nablaf(x)T(zminus x)
+1
2(zminus x)Tnabla2f(x)(zminus x) +
M
6983042zminus x9830423
ge f(z)
Newton Methods Lecture 8 April 15 - 17 2020 13 16
Approach I the basic cubic regularization algorithm
xk+1 = argminz
TM (zxk) k = 0 1 2
Approach II Seek 983141x approximately satisfying second-ordernecessary conditions that is
983042nablaf(983141x)983042 le εg λmin(nabla2f(983141x)) ge minusεH
where εg and εH are two small positive constants
Assume nabla2f is M -Lipschitz continuous
983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042
nablaf is L-Lipschitz continuous
983042nablaf(x)minusnablaf(y)983042 le L983042xminus y983042
and f is lower-bounded f(x) ge f
Newton Methods Lecture 8 April 15 - 17 2020 14 16
(i) If 983042nablaf(xk)983042 gt εg set
xk+1 = xk minus 1
Lnablaf(xk)
(ii) If 983042nablaf(xk)983042 le εg and λmin(nabla2f(xk)) lt minusεH choose dk to be theeigenvector corresponding to λmin(nabla2f(xk)) Choose the size andsign of dk such that
983042dk983042 = 1
andnablaf(xk)Tdk le 0
Set
xk+1 = xk + αkdk where αk =
2εHM
(iii) If neither of these conditions hold then xk satisfies theapproximate second-order necessary conditions so we terminate
Newton Methods Lecture 8 April 15 - 17 2020 15 16
For the steepest-descent step (i)
f(xk+1) le f(xk)minus 1
2L983042nablaf(xk)9830422 le f(xk)minus
ε2g2L
For a step of type (ii)
f(xk+1) 983249 f(xk) + αknablaf(xk)⊤dk
+1
2α2k(d
k)Tnabla2f(xk)dk +1
6Mα3
k983042dk9830423
983249 f(xk)minus 1
2
9830612εHM
9830622
εH +1
6M
9830612εHM
9830623
= f(xk)minus 2
3
ε3HM2
We attain a decrease in the objective of at least
min
983075ε2g2L
2
3
ε3HM2
983076
Newton Methods Lecture 8 April 15 - 17 2020 16 16
3 DD + Newton for smooth convex functions
If f is convex but not strongly convex and nablaf is L-Lipschitzcontinuous then nabla2f(x) may be singular for some x ie
0 ≼ nabla2f(x) ≼ LI
So the Newton direction may not be well defined
Consider the modified Newton direction
dk = minus[nabla2f(xk) + λkI]minus1nablaf(xk)
which is a descent direction The DD method using the modifiedNewton direction yields xk rarr x983183 where x983183 is a minimizer of f
Two stage method DD + Newton
If the minimizer x983183 is unique and nabla2f(x983183) is positive definite thennabla2f(x983183) will be positive definite for sufficiently large k
DD + Newton global sublinear + local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 6 16
4 DD + Newton for smooth nonconvex functions
For smooth nonconvex f the Hessian nabla2f(xk) may be indefinitefor some k The Newton direction may not exist (when nabla2f(xk) issingular) or it may not be a descent direction (when nabla2f(xk) hasnegative eigenvalues) The modified Newton direction
dk = minus[nabla2f(xk) + λkI]minus1nablaf(xk)
will be a descent direction for λk sufficiently large For given0 lt η lt 1 a sufficient condition is
λk + λmin(nabla2f(xk))
λk + Lge η
Two stage method DD + Newton
Once again if the DD iterates xk enter the neighborhood of a localsolution x983183 for which nabla2f(x983183) is positive definite some strategyfor choosing λk and αk recovers the local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 7 16
41 Other modified Newton directions
Modified Cholesky factorization For indefinite nabla2f(xk) by addingpositive elements if needed to avoid taking the square root of anegative number the factorization continues to proceed Using themodified factorization in place of nabla2f(xk) in the calculation of theNewton direction dk we obtain a new modified Newton direction
Given eigenvalue decomposition
nabla2f(xk) = QkΛkQTk
we can define a modified Newton direction
dk = minusQk983144Λminus1k QT
knablaf(xk)
where 983144Λk with positive diagonal entries is a modified version of Λk
For more modified Newton directions to ensure descent in a DDframework see Chapter 3 of NO
Newton Methods Lecture 8 April 15 - 17 2020 8 16
5 Trust region method
The trust-region subproblem Given gk and symmetric Bk
mind
f(xk) + gTk d+
1
2dTBkd st 983042d983042 le ∆k
where ∆k is the radius of the trust region in which the quadraticf(xk) + gT
k d+ 12d
TBkd ldquowellrdquo captures the true behavior of f
The solution dk of the subproblem satisfies the linear system
[Bk + λI]dk = minusgk for some λ ge 0
where λ is chosen such that Bk + λI is positive semidefinite andλ(983042dk983042 minus∆k) = 0 (Exercise [Sorensen etc])
Solving the subproblem thus reduces to a search for the value of λSpecialized methods have been devised [Sorensen etc]
Newton Methods Lecture 8 April 15 - 17 2020 9 16
The trust-region method procedure
Define the ratio ρk between the actual decrease in f and theamount of decrease in the quadratic objective
ρk =f(xk + dk)minus f(xk)1
2(dk)TBkd
k + gTk d
k
If ρk is at least greater than a small tolerance (eg 01) we acceptthe step and proceed to the next iteration Otherwise the trustregion radius ∆k is too large so we do not take the step shrink thetrust region and resolve the new subproblem to obtain a new step
If ρk is close to 1 and the bound 983042dk983042 le ∆k is active (ie983042dk983042 = ∆k) we conclude that a larger trust region may hastenprogress so we increase ∆k for the next iteration
Newton Methods Lecture 8 April 15 - 17 2020 10 16
51 Dogleg method for trust region subproblem
For large-scale problems it may be too expensive to solvetrust-region subproblem near-exactly since the process mayrequire several factorizations of Bk + λI for different values of λ
A popular approach for finding approximate solutions which canbe used when Bk is positive definite is the dogleg method
Newton Methods Lecture 8 April 15 - 17 2020 11 16
52 Trust-region Newton method
The subproblem
mind
f(xk) +nablaf(xk)Td+1
2dTnabla2f(xk)d st 983042d983042 le ∆k
The trust-region Newton method can ldquoescaperdquo from a saddlepoint Suppose nablaf(xk) = 0 and nabla2f(xk) indefinite with somestrictly negative eigenvalues Then the solution dk of thesubproblem will be nonzero and the algorithm will step away fromthe saddle point xk in the direction of most negative curvature fornabla2f(xk) This guarantees that any accumulation points willsatisfy second-order necessary conditions
Another appealing feature of the trust-region Newton approach isthat when the sequence xk approaches a point x983183 satisfyingsecond-order sufficient conditions the trust region bound becomesinactive and the method takes basic Newton steps for allsufficiently large k so it has local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 12 16
53 Difference between line-search and trust-region methods
The basic difference between line-search and trust-region methodscan be summarized as follows
Line-search methods first choose a direction dk then decide howfar to move along that direction
Trust-region methods do the opposite They choose the distance∆k first then find the direction that makes the best progress forthis step length
6 Cubic regularization approachs
Assume that 983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042 Then
TM (zx) = f(x) +nablaf(x)T(zminus x)
+1
2(zminus x)Tnabla2f(x)(zminus x) +
M
6983042zminus x9830423
ge f(z)
Newton Methods Lecture 8 April 15 - 17 2020 13 16
Approach I the basic cubic regularization algorithm
xk+1 = argminz
TM (zxk) k = 0 1 2
Approach II Seek 983141x approximately satisfying second-ordernecessary conditions that is
983042nablaf(983141x)983042 le εg λmin(nabla2f(983141x)) ge minusεH
where εg and εH are two small positive constants
Assume nabla2f is M -Lipschitz continuous
983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042
nablaf is L-Lipschitz continuous
983042nablaf(x)minusnablaf(y)983042 le L983042xminus y983042
and f is lower-bounded f(x) ge f
Newton Methods Lecture 8 April 15 - 17 2020 14 16
(i) If 983042nablaf(xk)983042 gt εg set
xk+1 = xk minus 1
Lnablaf(xk)
(ii) If 983042nablaf(xk)983042 le εg and λmin(nabla2f(xk)) lt minusεH choose dk to be theeigenvector corresponding to λmin(nabla2f(xk)) Choose the size andsign of dk such that
983042dk983042 = 1
andnablaf(xk)Tdk le 0
Set
xk+1 = xk + αkdk where αk =
2εHM
(iii) If neither of these conditions hold then xk satisfies theapproximate second-order necessary conditions so we terminate
Newton Methods Lecture 8 April 15 - 17 2020 15 16
For the steepest-descent step (i)
f(xk+1) le f(xk)minus 1
2L983042nablaf(xk)9830422 le f(xk)minus
ε2g2L
For a step of type (ii)
f(xk+1) 983249 f(xk) + αknablaf(xk)⊤dk
+1
2α2k(d
k)Tnabla2f(xk)dk +1
6Mα3
k983042dk9830423
983249 f(xk)minus 1
2
9830612εHM
9830622
εH +1
6M
9830612εHM
9830623
= f(xk)minus 2
3
ε3HM2
We attain a decrease in the objective of at least
min
983075ε2g2L
2
3
ε3HM2
983076
Newton Methods Lecture 8 April 15 - 17 2020 16 16
4 DD + Newton for smooth nonconvex functions
For smooth nonconvex f the Hessian nabla2f(xk) may be indefinitefor some k The Newton direction may not exist (when nabla2f(xk) issingular) or it may not be a descent direction (when nabla2f(xk) hasnegative eigenvalues) The modified Newton direction
dk = minus[nabla2f(xk) + λkI]minus1nablaf(xk)
will be a descent direction for λk sufficiently large For given0 lt η lt 1 a sufficient condition is
λk + λmin(nabla2f(xk))
λk + Lge η
Two stage method DD + Newton
Once again if the DD iterates xk enter the neighborhood of a localsolution x983183 for which nabla2f(x983183) is positive definite some strategyfor choosing λk and αk recovers the local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 7 16
41 Other modified Newton directions
Modified Cholesky factorization For indefinite nabla2f(xk) by addingpositive elements if needed to avoid taking the square root of anegative number the factorization continues to proceed Using themodified factorization in place of nabla2f(xk) in the calculation of theNewton direction dk we obtain a new modified Newton direction
Given eigenvalue decomposition
nabla2f(xk) = QkΛkQTk
we can define a modified Newton direction
dk = minusQk983144Λminus1k QT
knablaf(xk)
where 983144Λk with positive diagonal entries is a modified version of Λk
For more modified Newton directions to ensure descent in a DDframework see Chapter 3 of NO
Newton Methods Lecture 8 April 15 - 17 2020 8 16
5 Trust region method
The trust-region subproblem Given gk and symmetric Bk
mind
f(xk) + gTk d+
1
2dTBkd st 983042d983042 le ∆k
where ∆k is the radius of the trust region in which the quadraticf(xk) + gT
k d+ 12d
TBkd ldquowellrdquo captures the true behavior of f
The solution dk of the subproblem satisfies the linear system
[Bk + λI]dk = minusgk for some λ ge 0
where λ is chosen such that Bk + λI is positive semidefinite andλ(983042dk983042 minus∆k) = 0 (Exercise [Sorensen etc])
Solving the subproblem thus reduces to a search for the value of λSpecialized methods have been devised [Sorensen etc]
Newton Methods Lecture 8 April 15 - 17 2020 9 16
The trust-region method procedure
Define the ratio ρk between the actual decrease in f and theamount of decrease in the quadratic objective
ρk =f(xk + dk)minus f(xk)1
2(dk)TBkd
k + gTk d
k
If ρk is at least greater than a small tolerance (eg 01) we acceptthe step and proceed to the next iteration Otherwise the trustregion radius ∆k is too large so we do not take the step shrink thetrust region and resolve the new subproblem to obtain a new step
If ρk is close to 1 and the bound 983042dk983042 le ∆k is active (ie983042dk983042 = ∆k) we conclude that a larger trust region may hastenprogress so we increase ∆k for the next iteration
Newton Methods Lecture 8 April 15 - 17 2020 10 16
51 Dogleg method for trust region subproblem
For large-scale problems it may be too expensive to solvetrust-region subproblem near-exactly since the process mayrequire several factorizations of Bk + λI for different values of λ
A popular approach for finding approximate solutions which canbe used when Bk is positive definite is the dogleg method
Newton Methods Lecture 8 April 15 - 17 2020 11 16
52 Trust-region Newton method
The subproblem
mind
f(xk) +nablaf(xk)Td+1
2dTnabla2f(xk)d st 983042d983042 le ∆k
The trust-region Newton method can ldquoescaperdquo from a saddlepoint Suppose nablaf(xk) = 0 and nabla2f(xk) indefinite with somestrictly negative eigenvalues Then the solution dk of thesubproblem will be nonzero and the algorithm will step away fromthe saddle point xk in the direction of most negative curvature fornabla2f(xk) This guarantees that any accumulation points willsatisfy second-order necessary conditions
Another appealing feature of the trust-region Newton approach isthat when the sequence xk approaches a point x983183 satisfyingsecond-order sufficient conditions the trust region bound becomesinactive and the method takes basic Newton steps for allsufficiently large k so it has local quadratic convergence
Newton Methods Lecture 8 April 15 - 17 2020 12 16
53 Difference between line-search and trust-region methods
The basic difference between line-search and trust-region methodscan be summarized as follows
Line-search methods first choose a direction dk then decide howfar to move along that direction
Trust-region methods do the opposite They choose the distance∆k first then find the direction that makes the best progress forthis step length
6 Cubic regularization approachs
Assume that 983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042 Then
TM (zx) = f(x) +nablaf(x)T(zminus x)
+1
2(zminus x)Tnabla2f(x)(zminus x) +
M
6983042zminus x9830423
ge f(z)
Newton Methods Lecture 8 April 15 - 17 2020 13 16
Approach I the basic cubic regularization algorithm
xk+1 = argminz
TM (zxk) k = 0 1 2
Approach II Seek 983141x approximately satisfying second-ordernecessary conditions that is
983042nablaf(983141x)983042 le εg λmin(nabla2f(983141x)) ge minusεH
where εg and εH are two small positive constants
Assume nabla2f is M -Lipschitz continuous
983042nabla2f(x)minusnabla2f(y)983042 le M983042xminus y983042
nablaf is L-Lipschitz continuous
983042nablaf(x)minusnablaf(y)983042 le L983042xminus y983042
and f is lower-bounded f(x) ge f
Newton Methods Lecture 8 April 15 - 17 2020 14 16
(i) If 983042nablaf(xk)983042 gt εg set
xk+1 = xk minus 1
Lnablaf(xk)
(ii) If 983042nablaf(xk)983042 le εg and λmin(nabla2f(xk)) lt minusεH choose dk to be theeigenvector corresponding to λmin(nabla2f(xk)) Choose the size andsign of dk such that
983042dk983042 = 1
andnablaf(xk)Tdk le 0
Set
xk+1 = xk + αkdk where αk =
2εHM
(iii) If neither of these conditions hold then xk satisfies theapproximate second-order necessary conditions so we terminate
Newton Methods Lecture 8 April 15 - 17 2020 15 16
For the steepest-descent step (i)
f(xk+1) le f(xk)minus 1
2L983042nablaf(xk)9830422 le f(xk)minus
ε2g2L
For a step of type (ii)
f(xk+1) 983249 f(xk) + αknablaf(xk)⊤dk
+1
2α2k(d
k)Tnabla2f(xk)dk +1
6Mα3
k983042dk9830423
983249 f(xk)minus 1
2
9830612εHM
9830622
εH +1
6M
9830612εHM
9830623
= f(xk)minus 2
3
ε3HM2
We attain a decrease in the objective of at least
min
983075ε2g2L
2
3
ε3HM2
983076
Newton Methods Lecture 8 April 15 - 17 2020 16 16