“On Sizing and Shifting The BFGS Update Within The Sized-Broyden Family of Secant Updates”
Richard Tapia (Joint work with H. Yabe and H.J. Martinez)
Rice University
Department of Computational and Applied Mathematics
Center for Excellence and Equity in Education
Berkeley, California
September 23, 2005
Preliminaries
The Problem: $\min_{x} f(x)$, where $f : \mathbb{R}^n \to \mathbb{R}$
“Equivalently”: $\nabla f(x) = 0$
Time Honored Work-Horse Methods
(Cauchy 1847) Gradient Method (steepest descent): $x_+ = x - \alpha \nabla f(x)$, $\alpha > 0$
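A minimal sketch of the gradient iteration $x_+ = x - \alpha \nabla f(x)$. The fixed step `alpha`, the quadratic test function, and the names `gradient_method`, `Q`, and `b` are assumptions made for illustration; the slides do not prescribe a step-size rule.

```python
import numpy as np

def gradient_method(grad, x0, alpha, iters):
    """Iterate x_+ = x - alpha * grad(x) with a fixed step alpha > 0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - alpha * grad(x)
    return x

# Hypothetical test problem: f(x) = 0.5 x^T Q x - b^T x, so grad f(x) = Q x - b.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

x_min = gradient_method(lambda x: Q @ x - b, x0=[0.0, 0.0], alpha=0.2, iters=200)
print(x_min)  # approaches the solution of Q x = b
```

Each iteration costs only a gradient evaluation, which is the "inexpensive" side of the trade-off; the price is a linear convergence rate that depends on the conditioning of `Q`.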
(1700’s) Newton’s Method: $x_+ = x + s$, where $\nabla^2 f(x)\, s = -\nabla f(x)$ (Newton Equation)
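A small sketch of Newton's method, solving the Newton equation with a dense linear solve at every step. The test function and the names `newton_method`, `grad`, and `hess` are hypothetical choices for illustration, and no globalization (line search or trust region) is included.

```python
import numpy as np

def newton_method(grad, hess, x0, iters):
    """Iterate x_+ = x + s, where s solves the Newton equation hess(x) s = -grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        s = np.linalg.solve(hess(x), -grad(x))
        x = x + s
    return x

# Hypothetical test function: f(x) = exp(x1) - x1 + x2**2, minimized at (0, 0).
grad = lambda x: np.array([np.exp(x[0]) - 1.0, 2.0 * x[1]])
hess = lambda x: np.array([[np.exp(x[0]), 0.0], [0.0, 2.0]])

x_min = newton_method(grad, hess, x0=[1.0, 1.0], iters=8)
print(x_min)
```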
Characteristics
Gradient Method: Inexpensive. Good global properties. Slow local convergence.
Newton’s Method: Expensive, $O(n^3)$ per iteration. Poor global properties, excellent local properties. Fast local convergence.
The Middle Ground and The Algorithm of Interest
Secant Methods: $x_+ = x + s$, where $B s = -\nabla f(x)$
Secant equation: $B_+ s = y$, where $y = \nabla f(x + s) - \nabla f(x)$ ($B_+$ is the new approximation)
Remark: We view $B$ as an approximation of $\nabla^2 f(x)$.
Characteristics: Similar properties to Newton’s Method, but not as expensive: $O(n^2)$ per iteration.
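The secant iteration can be sketched as the loop below. The slide only requires that the new approximation satisfy $B_+ s = y$; the BFGS formula is used here purely as a stand-in update, and the identity starting matrix, the unit step, and the quadratic test problem are assumptions of this sketch. For clarity the step is computed with a dense solve, so the sketch does not itself show the $O(n^2)$ cost, which relies on updating a factorization or the inverse.

```python
import numpy as np

def bfgs_update(B, s, y):
    """A secant update (BFGS, used as a stand-in): the result satisfies B_plus @ s = y."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

def secant_method(grad, x0, iters):
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                  # initial Hessian approximation (assumption)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-12:   # stop before s becomes numerically zero
            break
        s = np.linalg.solve(B, -g)      # B s = -grad f(x)
        x_new = x + s                   # x_+ = x + s
        y = grad(x_new) - g             # y = grad f(x + s) - grad f(x)
        B = bfgs_update(B, s, y)        # enforce the secant equation B_+ s = y
        x = x_new
    return x, B

# Hypothetical test problem: f(x) = 0.5 x^T Q x - b^T x.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_min, B_final = secant_method(lambda x: Q @ x - b, x0=[0.0, 0.0], iters=50)
print(x_min)
```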
History/Chronology
In one dimension ($n = 1$) the secant equation uniquely gives $B_+ = \dfrac{f'(x_+) - f'(x)}{x_+ - x}$ as an approximation to $f''(x)$.
The 1-dimension 2-point secant method was probably discovered in the middle of the 18th century. It is extremely effective and efficient and has a convergence rate of $\dfrac{1 + \sqrt{5}}{2}$ (Golden mean).
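The one-dimensional two-point secant method can be sketched in a few lines. The function name `secant_1d`, the stopping rule, and the test function are assumptions for illustration.

```python
def secant_1d(fprime, x0, x1, tol=1e-12, max_iter=50):
    """Two-point secant iteration for f'(x) = 0 in one dimension."""
    x, x_plus = x0, x1
    for _ in range(max_iter):
        g, g_plus = fprime(x), fprime(x_plus)
        if abs(g_plus) < tol or g_plus == g:
            break
        B_plus = (g_plus - g) / (x_plus - x)          # secant approximation of f''
        x, x_plus = x_plus, x_plus - g_plus / B_plus  # step with the new slope
    return x_plus

# Hypothetical example: f(x) = x**4 / 4 - 2 * x, so f'(x) = x**3 - 2.
root = secant_1d(lambda x: x**3 - 2.0, 1.0, 2.0)
print(root)  # close to the cube root of 2
```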
Gauss formulated a 3-point secant method for two dimensions.
There was considerable research activity on (n+1)-point secant methods in the 1960’s. While these methods had good theoretical properties, they were numerical failures: the iterates tend to cluster in a lower dimensional manifold and lead to linear systems that are ill-conditioned and nearly singular. These (n+1)-point secant methods have been discarded.
The New Generation of Secant Methods (Variable Metric or Quasi-Newton Methods)
DFP (Davidon-Fletcher-Powell): Davidon (1958), Fletcher-Powell (1963)
DFP was the work-horse secant method from 1963-1970 in spite of the serious numerical flaw that the diagonal of the approximating matrices approached zero (excessively small eigenvalues). This required restarts using the identity as a Hessian approximation.
BFGS (1970) (Broyden-Fletcher-Goldfarb-Shanno)
A new secant update that does not generate excessively small eigenvalues.
BFGS has become the secant method of choice based on numerical performance.
In some cases BFGS is not effective and generates approximations with excessively large eigenvalues.
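Broyden Family of Secant Updates (1970)
Write $\mathrm{BFGS}(B, s, y) = B - \dfrac{B s s^T B}{s^T B s} + \dfrac{y y^T}{y^T s}$
Broyden Family: $B_+ = \mathrm{BFGS}(B, s, y) + \phi\, v v^T$
where $\phi \in \mathbb{R}$ is a parameter and $v = (s^T B s)^{1/2} \left[ \dfrac{y}{y^T s} - \dfrac{B s}{s^T B s} \right]$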
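A sketch of the Broyden family in code, assuming the form $B_+ = \mathrm{BFGS}(B, s, y) + \phi\, v v^T$ with $v = (s^T B s)^{1/2}[\,y / y^T s - B s / s^T B s\,]$. The function names and the test data are hypothetical. Because $v^T s = 0$, every member satisfies the secant equation, and $\phi = 1$ reproduces the textbook DFP formula, which the sketch uses as a cross-check.

```python
import numpy as np

def broyden_family(B, s, y, phi):
    """Return BFGS(B, s, y) + phi * v v^T."""
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    bfgs = B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys
    v = np.sqrt(sBs) * (y / ys - Bs / sBs)
    return bfgs + phi * np.outer(v, v)

def dfp(B, s, y):
    """Textbook DFP formula, used only as a cross-check for phi = 1."""
    ys = y @ s
    W = np.eye(s.size) - np.outer(y, s) / ys
    return W @ B @ W.T + np.outer(y, y) / ys

# Hypothetical data: y = Q s for a positive definite Q, so y^T s > 0.
Q = np.array([[2.0, 0.0, 1.0], [0.0, 3.0, 0.0], [1.0, 0.0, 4.0]])
B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
s = np.array([1.0, -1.0, 2.0])
y = Q @ s

for phi in (0.0, 0.3, 1.0):
    B_plus = broyden_family(B, s, y, phi)
    print(phi, np.allclose(B_plus @ s, y))
```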
$\phi = 1$: DFP (1963), promotes small eigenvalues
$\phi = 0$: BFGS (1970), may promote large eigenvalues
Convex class: $\phi \in [0, 1]$
Preconvex class: $\phi < 0$
Two Interesting Research Ideas That We Build On
John Dennis (1972): Notion of least change secant update. Choose $\phi$ in the Broyden class so that $B_+$ is closest to $B$ in a weighted Frobenius norm. In this case we can explain BFGS and DFP.
Oren-Luenberger (1974) (SSVM): Size the matrix $B$ before updating, $B \leftarrow \sigma_{OL} B$, where $\sigma_{OL} = \dfrac{y^T s}{s^T B s}$.
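Terminology
Def:
(i) $A$ and $B$ are said to be relatively sized if the convex hulls of $\mathrm{Spectrum}(A)$ and $\mathrm{Spectrum}(B)$ intersect.
(ii) $\sigma \in \mathbb{R}$ sizes $B$ relative to $A$ if $\sigma B$ and $A$ are relatively sized.
Proposition: $\sigma$ sizes $B$ relative to $A$ if and only if there exist $u, v$ satisfying $\sigma = \dfrac{u^T A u}{u^T u} \cdot \dfrac{v^T v}{v^T B v}$.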
Corollary: For any $u \neq 0$, $\sigma = \dfrac{u^T A u}{u^T B u}$ sizes $B$ relative to $A$.
Def: $\sigma$ sizes $B$ relative to the Hessian of $f$ if there exists $x$ such that $\sigma$ sizes $B$ relative to $\nabla^2 f(x)$.
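Historical Background on Sizing
1974 Oren-Luenberger (SSVM): size at each iteration with $\sigma_{OL} = \dfrac{y^T s}{s^T B s}$.
Proposition: $\sigma_{OL}$ sizes $B$ relative to the Hessian of $f$.
Proof: $y^T s = [\nabla f(x + s) - \nabla f(x)]^T s = \displaystyle\int_0^1 s^T \nabla^2 f(x + t s)\, s \, dt$. By the mean value theorem for integrals this equals $s^T \nabla^2 f(x + \theta s)\, s$ for some $\theta \in [0, 1]$, so $\sigma_{OL} = \dfrac{s^T \nabla^2 f(x + \theta s)\, s}{s^T B s}$, which by the Corollary (with $u = s$) sizes $B$ relative to $\nabla^2 f(x + \theta s)$.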
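A numerical illustration of the proposition on a quadratic, where the Hessian is a constant matrix `Q` and $y = Qs$ exactly. The sketch reads "relatively sized" as overlapping eigenvalue ranges $[\lambda_{\min}, \lambda_{\max}]$, which is an assumption about the definition; the matrices, the step `s`, and the helper names are hypothetical.

```python
import numpy as np

def sigma_ol(B, s, y):
    """Oren-Luenberger sizing factor y^T s / s^T B s."""
    return (y @ s) / (s @ B @ s)

def ranges_overlap(A, B):
    """True if [lambda_min, lambda_max] of the symmetric matrices A and B intersect."""
    a, b = np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)
    return max(a[0], b[0]) <= min(a[-1], b[-1])

# Hypothetical data: B is deliberately about ten times "too large" relative to Q.
Q = np.array([[2.0, 0.0, 1.0], [0.0, 3.0, 0.0], [1.0, 0.0, 4.0]])
B = 10.0 * np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
s = np.array([1.0, -1.0, 2.0])
y = Q @ s

sigma = sigma_ol(B, s, y)
before = ranges_overlap(B, Q)            # unsized B versus the Hessian
after = ranges_overlap(sigma * B, Q)     # sized B versus the Hessian
print(sigma, before, after)
```

After sizing, the Rayleigh quotients of `sigma * B` and `Q` coincide along `s`, which is what forces the two eigenvalue ranges to meet.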
1978 Shanno-Phua
Observation: Secant equation $B_+ s = y$ implies $\dfrac{y^T s}{s^T B_+ s} = 1$. Therefore all secant updates are sized relative to the Hessian of $f$.
Suggestion: Size only the initial approximation in the BFGS secant method and do so using $\sigma_{OL}$.
Question?
Effectiveness of $\sigma_{OL}$
Effective sizing strategy: Initial approximation only? All approximations? Selective approximations?
M. Contreras and R. Tapia (1993), “Sizing The DFP and BFGS Updates: A Numerical Study”
Proposition: If the secant method converges q-superlinearly, then $\sigma_{OL}$ converges to one.
Selective sizing: size only if $\sigma_{OL}$ is sufficiently far from one, as judged by a test with two positive tolerances.
Contreras-Tapia Findings
The DFP update loves to be sized by $\sigma_{OL}$. Sizing at every iteration is only slightly inferior to selective sizing. Without sizing, DFP is vastly inferior to BFGS. With selective sizing, DFP is competitive with a selectively sized BFGS.
When sizing is working, $\sigma_{OL}$ converges very nicely to one.
Selective sizing for BFGS is best; sizing at each iteration is not good. BFGS does not like to be sized.
$\sigma_{OL}$ is not a really good fit with BFGS. It tends to size too much, especially for large dimensional problems.
New Research: Yabe-Martinez-Tapia (2004)
Premise: For BFGS, especially in higher dimensions, $B$ often has large eigenvalues (indeed by design) and this tends to give large Rayleigh quotients $s^T B s / s^T s$. Hence $\sigma_{OL} = y^T s / s^T B s$ is small and this in turn moves $\sigma_{OL} B$ in the direction of singularity.
Idea: Follow sizing with a shift within the Broyden class to compensate for near singularity.
Sized Broyden class: $B_+ = \mathrm{BFGS}(\sigma B, s, y) + \phi\, v v^T$
Set $\sigma = \sigma_{OL}$ and then find the best $\phi$.
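A sketch of one member of the sized Broyden class, assuming $B_+ = \mathrm{BFGS}(\sigma B, s, y) + \phi\, v v^T$ with $v$ built from the unsized $B$. The data and the name `sized_broyden` are hypothetical. The example sizes with $\sigma_{OL}$ and then compares the unshifted ($\phi = 0$) and shifted ($\phi = 1$) updates; the value $\phi = 1$ is arbitrary and is not the "best" $\phi$ the slide refers to.

```python
import numpy as np

def sized_broyden(B, s, y, sigma, phi):
    """B_+ = BFGS(sigma * B, s, y) + phi * v v^T, with v built from the unsized B."""
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = np.sqrt(sBs) * (y / ys - Bs / sBs)
    sized_bfgs = sigma * (B - np.outer(Bs, Bs) / sBs) + np.outer(y, y) / ys
    return sized_bfgs + phi * np.outer(v, v)

# Hypothetical data: B has much larger eigenvalues than the Hessian Q.
Q = np.array([[2.0, 0.0, 1.0], [0.0, 3.0, 0.0], [1.0, 0.0, 4.0]])
B = 10.0 * np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
s = np.array([1.0, -1.0, 2.0])
y = Q @ s

sigma_ol = (y @ s) / (s @ B @ s)
B_bfgs = sized_broyden(B, s, y, sigma_ol, phi=0.0)    # sized BFGS
B_shift = sized_broyden(B, s, y, sigma_ol, phi=1.0)   # sized, then shifted

print(np.linalg.eigvalsh(B_bfgs)[0], np.linalg.eigvalsh(B_shift)[0])
```

Since `phi * v v^T` is positive semidefinite for `phi >= 0`, the shift can only raise the smallest eigenvalue, which is the sense in which it moves the sized matrix away from singularity.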
Byrd-Nocedal (1989)
A General Measure of Goodness: $\omega(A) = \mathrm{TR}(A) - \ln(\det(A))$
Proposition: The measure $\omega$ is globally and uniquely minimized by $A = I$ over the class of symmetric positive definite matrices.
Size and Shift Approach
Consider choices of the parameters $\sigma$ and $\phi$ determined from the minimization problem
$\min\ \omega\left(D^{-1/2} B_+ D^{-1/2}\right)$
where $B_+ = \mathrm{BFGS}(\sigma B, s, y) + \phi\, v v^T$ and $D$ is a symmetric positive definite weighting matrix.
Observe that $B_+ = D$ solves this problem if $B_+$ is not restricted to the sized Broyden class.
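The measure and the weighted objective can be sketched as follows, assuming $\omega(A) = \mathrm{TR}(A) - \ln\det(A)$. The weighted form is evaluated through $D^{-1} B_+$, which is similar to $D^{-1/2} B_+ D^{-1/2}$ and therefore has the same trace and determinant. The matrices and function names are hypothetical.

```python
import numpy as np

def omega(A):
    """Byrd-Nocedal measure omega(A) = trace(A) - ln(det(A))."""
    sign, logdet = np.linalg.slogdet(A)
    if sign <= 0:
        raise ValueError("omega needs a matrix with positive determinant")
    return np.trace(A) - logdet

def weighted_omega(B_plus, D):
    """omega(D^{-1/2} B_plus D^{-1/2}), computed via the similar matrix D^{-1} B_plus."""
    return omega(np.linalg.solve(D, B_plus))

D = np.array([[2.0, 0.5], [0.5, 1.0]])
B_other = np.array([[3.0, 1.0], [1.0, 2.0]])

at_identity = omega(np.eye(2))           # equals n = 2, the global minimum
at_D = weighted_omega(D, D)              # B_+ = D also attains n = 2
elsewhere = weighted_omega(B_other, D)   # any other choice is strictly larger
print(at_identity, at_D, elsewhere)
```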
Obvious choices for $D$
$D = I$: Obtain member of sized Broyden class closest to the identity (gradient flavored)
$D = B$: Obtain member of sized Broyden class closest to $B$ (least-change secant flavored)
$D = \nabla^2 f(x)$: Obtain member of sized Broyden class closest to the Hessian (Newton flavored)
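Three Optimization Problems
I. Given $\sigma$, find $\phi^*$ as solution of $\min_{\phi}\ \omega\left(D^{-1/2} B_+ D^{-1/2}\right)$
II. Given $\phi$, find $\sigma^*$ as solution of $\min_{\sigma}\ \omega\left(D^{-1/2} B_+ D^{-1/2}\right)$
III. Find $\sigma^*$ and $\phi^*$ as solution of $\min_{\sigma, \phi}\ \omega\left(D^{-1/2} B_+ D^{-1/2}\right)$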
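Solutions
Problem I: Given $\sigma$, $\phi^* = \dfrac{1}{v^T D^{-1} v} - \dfrac{\sigma}{\mu - 1}$
where $v = (s^T B s)^{1/2} \left[ \dfrac{y}{y^T s} - \dfrac{B s}{s^T B s} \right]$ and $\mu = \dfrac{(s^T B s)(y^T B^{-1} y)}{(y^T s)^2}$.
Observation: For $D = B$, $\phi^* = \dfrac{1 - \sigma}{\mu - 1}$.
Interpretation: In least change sense $\sigma = 1$ (no sizing) implies $\phi^* = 0$ (BFGS).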
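A numerical check of Problem I, assuming the closed form $\phi^* = 1/(v^T D^{-1} v) - \sigma/(\mu - 1)$ with $\mu = (s^T B s)(y^T B^{-1} y)/(y^T s)^2$. This reading of the slide's formula, the symbol name `mu`, and all data below are assumptions of the sketch. The objective is convex in $\phi$, so comparing against nearby values is enough to confirm a minimizer.

```python
import numpy as np

def v_and_mu(B, s, y):
    """Return v and mu for the sized Broyden class (mu is a name chosen here)."""
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = np.sqrt(sBs) * (y / ys - Bs / sBs)
    mu = sBs * (y @ np.linalg.solve(B, y)) / ys**2
    return v, mu

def sized_broyden(B, s, y, sigma, phi):
    Bs = B @ s
    v, _ = v_and_mu(B, s, y)
    return (sigma * (B - np.outer(Bs, Bs) / (s @ Bs))
            + np.outer(y, y) / (y @ s) + phi * np.outer(v, v))

def objective(B_plus, D):
    """omega(D^{-1/2} B_plus D^{-1/2}) via the similar matrix D^{-1} B_plus."""
    M = np.linalg.solve(D, B_plus)
    return np.trace(M) - np.linalg.slogdet(M)[1]

def phi_star(B, s, y, D, sigma):
    v, mu = v_and_mu(B, s, y)
    return 1.0 / (v @ np.linalg.solve(D, v)) - sigma / (mu - 1.0)

# Hypothetical data with y^T s > 0.
Q = np.array([[2.0, 0.0, 1.0], [0.0, 3.0, 0.0], [1.0, 0.0, 4.0]])
B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
s = np.array([1.0, -1.0, 2.0])
y = Q @ s
D, sigma = np.eye(3), 0.5

phi_opt = phi_star(B, s, y, D, sigma)
f = lambda phi: objective(sized_broyden(B, s, y, sigma, phi), D)
print(phi_opt, f(phi_opt), f(phi_opt - 0.05), f(phi_opt + 0.05))
print(phi_star(B, s, y, B, 1.0))   # D = B and sigma = 1: essentially zero (BFGS)
```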
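Problem II: Given $\phi$,
$\sigma^* = \dfrac{(n-1) - r\phi(\mu-1) + \sqrt{\left[(n-1) - r\phi(\mu-1)\right]^2 + 4 r \phi (n-2)(\mu-1)}}{2r}$
where $r = \mathrm{TR}\left(D^{-1/2} B D^{-1/2}\right) - \dfrac{s^T B D^{-1} B s}{s^T B s}$ and $\mu = \dfrac{(s^T B s)(y^T B^{-1} y)}{(y^T s)^2}$.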
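Observation: For $D = B$, $r = n - 1$ and
$\sigma^* = \dfrac{1}{2}\left[ 1 - \phi(\mu - 1) + \sqrt{\left(1 - \phi(\mu - 1)\right)^2 + \dfrac{4(n-2)}{n-1}\,\phi(\mu - 1)} \right]$
Hence $\phi = 0$ implies $\sigma^* = 1$.
Interpretation: In least-change sense BFGS should not be sized.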
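A numerical check of Problem II, assuming $\sigma^*$ is the positive root of $r\sigma^2 + \left(r\phi(\mu-1) - (n-1)\right)\sigma - (n-2)\phi(\mu-1) = 0$ with $r = \mathrm{TR}(D^{-1}B) - s^T B D^{-1} B s / s^T B s$. This quadratic comes from setting the $\sigma$-derivative of the objective to zero; treating it as the slide's formula, and the names `mu` and `r`, are assumptions, as is all the data.

```python
import numpy as np

def sized_broyden(B, s, y, sigma, phi):
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = np.sqrt(sBs) * (y / ys - Bs / sBs)
    return (sigma * (B - np.outer(Bs, Bs) / sBs)
            + np.outer(y, y) / ys + phi * np.outer(v, v))

def objective(B_plus, D):
    """omega(D^{-1/2} B_plus D^{-1/2}) via the similar matrix D^{-1} B_plus."""
    M = np.linalg.solve(D, B_plus)
    return np.trace(M) - np.linalg.slogdet(M)[1]

def sigma_star(B, s, y, D, phi):
    n = s.size
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    mu = sBs * (y @ np.linalg.solve(B, y)) / ys**2
    r = np.trace(np.linalg.solve(D, B)) - (Bs @ np.linalg.solve(D, Bs)) / sBs
    a = (n - 1) - r * phi * (mu - 1.0)
    return (a + np.sqrt(a**2 + 4.0 * r * phi * (n - 2) * (mu - 1.0))) / (2.0 * r)

# Hypothetical data with y^T s > 0.
Q = np.array([[2.0, 0.0, 1.0], [0.0, 3.0, 0.0], [1.0, 0.0, 4.0]])
B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
s = np.array([1.0, -1.0, 2.0])
y = Q @ s
D, phi = np.eye(3), 0.2

sigma_opt = sigma_star(B, s, y, D, phi)
g = lambda sigma: objective(sized_broyden(B, s, y, sigma, phi), D)
print(sigma_opt, g(sigma_opt), g(sigma_opt - 0.02), g(sigma_opt + 0.02))
print(sigma_star(B, s, y, B, 0.0))   # D = B and phi = 0: no sizing
```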
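Problem III: Find both $\sigma^*$ and $\phi^*$ from minimization problem
$\sigma^* = \dfrac{n - 2}{r - \dfrac{v^T D^{-1} v}{\mu - 1}}$, $\phi^* = \dfrac{1}{v^T D^{-1} v} - \dfrac{\sigma^*}{\mu - 1}$
with $r$, $\mu$, and $v$ as in Problems I and II.
Observation: For $D = B$, $\sigma^* = 1$ and $\phi^* = 0$.
Interpretation: In least change sense BFGS with no sizing is best.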
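A numerical check of Problem III, assuming the joint stationary point $\sigma^* = (n-2)/\left(r - v^T D^{-1} v/(\mu - 1)\right)$ and $\phi^* = 1/(v^T D^{-1} v) - \sigma^*/(\mu - 1)$. These expressions follow from setting both partial derivatives of the objective to zero; reading them as the slide's formulas, the names `mu` and `r`, and the data are assumptions. The formula requires $n > 2$ and a positive denominator.

```python
import numpy as np

def sized_broyden(B, s, y, sigma, phi):
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = np.sqrt(sBs) * (y / ys - Bs / sBs)
    return (sigma * (B - np.outer(Bs, Bs) / sBs)
            + np.outer(y, y) / ys + phi * np.outer(v, v))

def objective(B_plus, D):
    """omega(D^{-1/2} B_plus D^{-1/2}) via the similar matrix D^{-1} B_plus."""
    M = np.linalg.solve(D, B_plus)
    return np.trace(M) - np.linalg.slogdet(M)[1]

def joint_optimum(B, s, y, D):
    n = s.size
    Bs = B @ s
    sBs, ys = s @ Bs, y @ s
    v = np.sqrt(sBs) * (y / ys - Bs / sBs)
    mu = sBs * (y @ np.linalg.solve(B, y)) / ys**2
    c = v @ np.linalg.solve(D, v)
    r = np.trace(np.linalg.solve(D, B)) - (Bs @ np.linalg.solve(D, Bs)) / sBs
    sigma = (n - 2) / (r - c / (mu - 1.0))
    phi = 1.0 / c - sigma / (mu - 1.0)
    return sigma, phi

# Hypothetical data with y^T s > 0 and n = 3.
Q = np.array([[2.0, 0.0, 1.0], [0.0, 3.0, 0.0], [1.0, 0.0, 4.0]])
B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
s = np.array([1.0, -1.0, 2.0])
y = Q @ s
D = np.eye(3)

sig_opt, phi_opt = joint_optimum(B, s, y, D)
h = lambda sigma, phi: objective(sized_broyden(B, s, y, sigma, phi), D)
print(sig_opt, phi_opt, h(sig_opt, phi_opt))
print(joint_optimum(B, s, y, B))   # D = B: approximately (1, 0), unsized BFGS
```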
Numerical Experimentation
Selectively size BFGS using $\sigma_{OL}$.
Shift using solution $\phi^*$ obtained with $D = I$ (gradient flavored), $D = B$ (least change flavored), and $D = \nabla^2 f(x)$ (Newton flavored).
Surprise
The winner is $D = I$ (gradient flavored).
Comment: There is consistency in this choice. Our sizing indicator has told us that we should size; hence BFGS is probably not best and we should shift. Either $B$ is bad, $\nabla^2 f(x)$ is bad, or there is a bad match between the two. Therefore least change $D = B$ may be dangerous and Newton $D = \nabla^2 f(x)$ may be dangerous. The choice $D = I$ prevents this faulty information from further contaminating the update; i.e. we use the member of the Broyden class which is closest to steepest descent.