![Page 1: Nonlinear Optimizationweb.lums.edu.pk › ~akarim › pub › optimization1.pdf · 2015-09-15 · Nonlinear Optimization Theory and Practice by Asim Karim Computer Science Dept. Lahore](https://reader035.vdocuments.net/reader035/viewer/2022081611/5f03a5d97e708231d40a15a0/html5/thumbnails/1.jpg)
Nonlinear Optimization Theory and Practice
by
Asim Karim Computer Science Dept.
Lahore University of Management Sciences
2nd International Bhurban Conference on Applied Sciences and Technology
Control and Simulation (June 19 – 21, 2003)
Optimization
What is optimization? Finding solution(s) from a set of admissible or feasible solutions that minimize (or maximize) a performance measure or objective.

Examples:
- Engineering design: find the cross-sectional dimensions of a beam that result in the least-weight structure
- Resource management: find the optimal distribution of resources to accomplish a task in the least time
- Machine control: find the policy for injecting fuel that leads to the least fuel consumption
- Traveling salesperson problem: find the path through a given set of locations that has the shortest distance

Optimization is a very powerful concept. Many problems in different fields can be posed as optimization problems.
Types of Optimizations
Two basic classes of optimization problems:
- Static: decision variables do not vary over time
- Dynamic: decision variables vary over time, and optimal solutions are time-paths or trajectories rather than single values

Other classifications:
- Linear and nonlinear: if any nonlinearity exists in the problem, it is a nonlinear optimization problem; otherwise, it is a linear optimization problem
- Unconstrained and constrained: if the variables are unrestricted, it is an unconstrained optimization problem; otherwise, it is a constrained optimization problem
Nonlinear Optimization
Nonlinear optimization theory includes as a special case the linear optimization problem. The basic concepts of static optimization are similar to those of dynamic optimization.
Solution methods Mathematical: These methods are based on calculus and geometry. Collectively, these methods are known as nonlinear programming techniques.
Heuristic: These methods are based on search heuristics. Examples include genetic algorithms and simulated annealing.
We will be focusing on nonlinear programming – that is, mathematical methods for solving static nonlinear optimization problems
Optimization Theory
Two questions:
- Existence: do local/global minima exist?
- Optimality conditions: what are the properties or characteristics of local/global minima?
Does f(x) = x have a local minimum? What about f(x) = exp(x)? We won't be focusing on the existence of optimal solutions. Optimality conditions are used in algorithms for solving optimization problems.
Criteria for Characterizing Solution Methods
- Rate of convergence
- Stability of convergence
- Search for minima (local or global?)
- Computational efficiency and scalability
- Memory usage and scalability
- Other requirements (continuous differentiability, twice continuous differentiability, etc.)
Unconstrained Optimization
Definition. Minimize $f(\mathbf{x})$ subject to $\mathbf{x} \in R^n$, where $R^n$ is the n-dimensional space of real numbers (Euclidean space).

Local and Global Minimum. A vector $\mathbf{x}^*$ is a local minimum of f if there exists $\varepsilon > 0$ such that $f(\mathbf{x}^*) \le f(\mathbf{x})$ for all $\mathbf{x}$ with $\|\mathbf{x} - \mathbf{x}^*\| < \varepsilon$. A vector $\mathbf{x}^*$ is a global minimum if $f(\mathbf{x}^*) \le f(\mathbf{x})$ for all $\mathbf{x} \in R^n$.
Local and Global Minima
Optimality Conditions
Assuming f is twice continuously differentiable:

Necessary conditions: $\nabla f(\mathbf{x}^*) = 0$ (first-order necessary condition), and $\nabla^2 f(\mathbf{x}^*) \ge 0$, i.e. the Hessian is positive semi-definite (second-order necessary condition).

Sufficient condition: $\nabla f(\mathbf{x}^*) = 0$ and $\nabla^2 f(\mathbf{x}^*) > 0$, i.e. the Hessian is positive definite.

Special cases: for a convex (including convex quadratic) function, the first-order necessary condition is also sufficient. Moreover, the stationary point is a global minimum.
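As a small illustration (the function below is an assumed example, not from the slides), the conditions can be checked directly for the convex quadratic $f(x_1, x_2) = (x_1 - 1)^2 + 2(x_2 + 3)^2$, whose gradient vanishes at $(1, -3)$ and whose Hessian is constant and positive definite:

```python
# f(x1, x2) = (x1 - 1)^2 + 2*(x2 + 3)^2 is convex and quadratic, so the
# first-order condition grad f(x*) = 0 already identifies the global minimum.

def grad(x):
    """Analytic gradient of f."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)]

def hessian_is_positive_definite():
    # The Hessian of f is constant: [[2, 0], [0, 4]]. A diagonal matrix is
    # positive definite iff all its diagonal entries are positive.
    return 2.0 > 0.0 and 4.0 > 0.0

x_star = [1.0, -3.0]
g = grad(x_star)          # gradient at the candidate minimizer: [0.0, 0.0]
```

Because the sufficient condition holds, $(1, -3)$ is a strict local minimum; convexity upgrades it to the global minimum.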
Optimality Conditions - Example
Gradient Methods (1)
Gradient methods: the method of steepest descent (and its variations), Newton's method (and its variations), and quasi-Newton methods.

Basic strategy and equations: these methods perform an iterative descent such that

$$f(\mathbf{x}^{k+1}) < f(\mathbf{x}^k), \quad k = 1, 2, \ldots$$

Update rule: $\mathbf{x}^{k+1} = \mathbf{x}^k + \alpha^k \mathbf{d}^k$, where the direction satisfies $\nabla f(\mathbf{x}^k)^T \mathbf{d}^k < 0$.
Gradient Methods (2)
The different methods vary in their choice of dk. The stepsize αk is determined by a line search technique.
Method of Steepest Descent (1)
Direction vector: $\mathbf{d}^k = -\nabla f(\mathbf{x}^k)$
This method is often slow to converge.
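A minimal sketch of steepest descent with a constant stepsize, on an assumed poorly scaled quadratic $f(x_1, x_2) = x_1^2 + 10 x_2^2$ (the zig-zagging on such elongated functions is one reason the method can be slow):

```python
def grad_f(x):
    # gradient of f(x1, x2) = x1^2 + 10*x2^2 (poorly scaled on purpose)
    return [2.0 * x[0], 20.0 * x[1]]

def steepest_descent(x, alpha=0.04, iters=500):
    # x^{k+1} = x^k + alpha * d^k  with  d^k = -grad f(x^k)
    for _ in range(iters):
        g = grad_f(x)
        x = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
    return x

x_min = steepest_descent([5.0, 5.0])   # converges toward the minimum (0, 0)
```

The stepsize 0.04 is an assumed value chosen to keep the iteration stable; the x2 component contracts quickly while the x1 component shrinks only by a factor 0.92 per step, illustrating the slow convergence noted above.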
Method of Steepest Descent (2)
Scaled steepest descent: $\mathbf{d}^k = -D^k \nabla f(\mathbf{x}^k)$, where $D^k$ is a diagonal matrix used to scale the gradient vector. Usually, diagonal element i of $D^k$ is computed as the inverse of the second-order partial derivative of f with respect to $x_i$ (an approximation to Newton's method). This method converges faster than the plain method of steepest descent.
Newton’s Method (1)
Direction vector: $\mathbf{d}^k = -[\nabla^2 f(\mathbf{x}^k)]^{-1} \nabla f(\mathbf{x}^k)$, assuming the Hessian is positive definite.

When $\alpha^k = 1$, this is known as the pure Newton's method. However, the pure method has some major drawbacks (can you identify some?)

Faster convergence (see figure on next slide), but computationally expensive (Hessian computation).

A variation to reduce computational complexity: the modified Newton's method computes the Hessian every p > 1 iterations, instead of every iteration.
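A sketch of the pure Newton iteration ($\alpha^k = 1$) on an assumed test function $f(x_1, x_2) = x_1 - \ln x_1 + (x_2 - 1)^2$, minimized at $(1, 1)$; its Hessian is diagonal here, which keeps the inverse trivial:

```python
# f(x1, x2) = x1 - ln(x1) + (x2 - 1)^2, minimized at x* = (1, 1).
def grad_f(x):
    return [1.0 - 1.0 / x[0], 2.0 * (x[1] - 1.0)]

def hess_diag(x):
    # The Hessian is diagonal for this f: diag(1/x1^2, 2).
    return [1.0 / x[0] ** 2, 2.0]

def newton(x, iters=10):
    for _ in range(iters):
        g, h = grad_f(x), hess_diag(x)
        # pure Newton step (alpha^k = 1): x <- x - H^{-1} grad f
        x = [x[0] - g[0] / h[0], x[1] - g[1] / h[1]]
    return x

x_min = newton([0.5, 3.0])
```

From this starting point the error in $x_1$ roughly squares each iteration (quadratic convergence); the quadratic $x_2$ component is solved exactly in one step. Starting far from the minimum, the same iteration can leave the domain $x_1 > 0$, one of the drawbacks of the pure method.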
Newton’s Method (2)
To ensure global convergence (a drawback of the pure method): use the steepest descent direction vector whenever the Hessian is not positive definite (negative definite, indefinite, or singular).
Quasi-Newton Methods
Direction vector: $\mathbf{d}^k = -D^k \nabla f(\mathbf{x}^k)$, where $D^k$ is a positive definite matrix selected such that it approximates the Newton direction. A popular way to compute $D^k$ is the rank-two update

$$D^{k+1} = D^k + \frac{\mathbf{p}^k (\mathbf{p}^k)^T}{(\mathbf{p}^k)^T \mathbf{q}^k} - \frac{D^k \mathbf{q}^k (\mathbf{q}^k)^T D^k}{(\mathbf{q}^k)^T D^k \mathbf{q}^k} + \xi^k \tau^k \mathbf{v}^k (\mathbf{v}^k)^T$$

where

$$\mathbf{v}^k = \frac{\mathbf{p}^k}{(\mathbf{p}^k)^T \mathbf{q}^k} - \frac{D^k \mathbf{q}^k}{\tau^k}, \qquad \tau^k = (\mathbf{q}^k)^T D^k \mathbf{q}^k, \qquad 0 \le \xi^k \le 1,$$

$$\mathbf{p}^k = \mathbf{x}^{k+1} - \mathbf{x}^k, \qquad \mathbf{q}^k = \nabla f(\mathbf{x}^{k+1}) - \nabla f(\mathbf{x}^k).$$

When $\xi^k = 0$ (for all k), this is known as the DFP method; when $\xi^k = 1$ (for all k), it is known as the BFGS method (popular).
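A small sketch of the rank-two quasi-Newton update above in two dimensions with plain Python lists. The checked property is the secant condition $D^{k+1}\mathbf{q}^k = \mathbf{p}^k$, which holds for any $\xi \in [0, 1]$ and is what makes $D$ approximate the inverse Hessian (the vectors p and q below are hypothetical):

```python
def matvec(D, v):
    return [D[0][0]*v[0] + D[0][1]*v[1], D[1][0]*v[0] + D[1][1]*v[1]]

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1]

def outer(a, b):
    return [[a[0]*b[0], a[0]*b[1]], [a[1]*b[0], a[1]*b[1]]]

def qn_update(D, p, q, xi):
    # D_{k+1} = D + p p^T/(p^T q) - D q q^T D/(q^T D q) + xi*tau*v v^T
    Dq = matvec(D, q)
    tau = dot(q, Dq)                 # tau = q^T D q
    pq = dot(p, q)                   # p^T q
    v = [p[i] / pq - Dq[i] / tau for i in range(2)]
    P, Q, V = outer(p, p), outer(Dq, Dq), outer(v, v)
    return [[D[i][j] + P[i][j]/pq - Q[i][j]/tau + xi*tau*V[i][j]
             for j in range(2)] for i in range(2)]

D = [[1.0, 0.0], [0.0, 1.0]]         # start from the identity
p, q = [0.3, -0.2], [1.1, 0.4]       # hypothetical step / gradient change
D_bfgs = qn_update(D, p, q, xi=1.0)  # xi = 1 -> BFGS update
Dq_new = matvec(D_bfgs, q)           # should reproduce p (secant condition)
```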
Conjugate Gradient Method
Iterative improvement: $\mathbf{x}^{k+1} = \mathbf{x}^k + \alpha^k \mathbf{d}^k$, where the direction vectors $\mathbf{d}^k$ (k = 0, 1, ...) are Q-conjugate (Q is a positive definite matrix).

The directions $\mathbf{d}^k$ are computed by a Gram-Schmidt-type procedure:

$$\mathbf{d}^0 = -\nabla f(\mathbf{x}^0), \qquad \mathbf{d}^k = -\nabla f(\mathbf{x}^k) + \frac{\nabla f(\mathbf{x}^k)^T \nabla f(\mathbf{x}^k)}{\nabla f(\mathbf{x}^{k-1})^T \nabla f(\mathbf{x}^{k-1})}\, \mathbf{d}^{k-1}$$

The conjugate gradient method and its variations are popular approaches for unconstrained optimization and for solving linear systems of equations.
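A sketch of the conjugate gradient iteration on an assumed quadratic $f = x_1^2 + 10 x_2^2$ with exact line minimization; for an n-dimensional quadratic the method reaches the minimum in at most n steps, so two iterations suffice here:

```python
def grad_f(x):
    # gradient of the quadratic f = x1^2 + 10*x2^2, i.e. Q = diag(2, 20)
    return [2.0 * x[0], 20.0 * x[1]]

def exact_step(x, d):
    # exact line minimization for this quadratic: alpha = -g^T d / (d^T Q d)
    g = grad_f(x)
    dQd = 2.0 * d[0] * d[0] + 20.0 * d[1] * d[1]
    return -(g[0] * d[0] + g[1] * d[1]) / dQd

def conjugate_gradient(x, iters=2):
    g = grad_f(x)
    d = [-g[0], -g[1]]                       # d^0 = -grad f(x^0)
    for _ in range(iters):
        a = exact_step(x, d)
        x = [x[0] + a * d[0], x[1] + a * d[1]]
        g_new = grad_f(x)
        # Fletcher-Reeves coefficient from the formula above
        beta = (g_new[0]**2 + g_new[1]**2) / (g[0]**2 + g[1]**2)
        d = [-g_new[0] + beta * d[0], -g_new[1] + beta * d[1]]
        g = g_new
    return x

x_min = conjugate_gradient([5.0, 5.0])   # reaches (0, 0) after n = 2 steps
```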
Stepsize Selection Methods
Importance: in practice, the choice of the stepsize αk significantly affects the rate of convergence, stability, and computational efficiency of iterative direction methods.
- If αk is too small, convergence may be very slow
- If αk is too large, convergence may not be smooth (divergence)
- Exact computation can be expensive

Common methods: constant stepsize, line minimization, the Armijo rule, and the Goldstein rule.
Line Minimization
The stepsize $\alpha^k$ is chosen to minimize f along $\mathbf{d}^k$, that is,

$$f(\mathbf{x}^k + \alpha^k \mathbf{d}^k) = \min_{\alpha} f(\mathbf{x}^k + \alpha \mathbf{d}^k)$$

Usually $\alpha \in [0, s]$ with s > 0 to reduce computation (a method known as limited line minimization). The bisection or Newton-Raphson methods are used for this minimization (these are line, i.e. 1-D, optimization algorithms).

Disadvantage: it is computationally expensive, requiring the solution of a sub-optimization problem in each iteration.
Armijo Rule
The stepsize $\alpha^k = \beta^{m_k} s$ is determined by a successive reduction process, where $m_k$ is the first non-negative integer m for which

$$f(\mathbf{x}^k) - f(\mathbf{x}^k + \beta^m s\, \mathbf{d}^k) \ge -\sigma \beta^m s\, \nabla f(\mathbf{x}^k)^T \mathbf{d}^k$$

Procedure: select a value for s and for $\beta \in (0, 1)$, $\sigma \in (0, 1)$; set m = 0; evaluate the inequality. If it is satisfied, set $\alpha^k = \beta^m s$; otherwise, increment m and repeat the evaluation.

Usually $\sigma$ is chosen close to zero and $\beta$ is between 1/2 and 1/10. If $\mathbf{d}^k$ is scaled, then s = 1 is an appropriate choice.
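A sketch of the Armijo reduction process on an assumed quadratic, using the illustrative values s = 1, β = 0.5, σ = 0.1:

```python
def f(x):
    return (x[0] - 1.0)**2 + 4.0 * x[1]**2

def grad_f(x):
    return [2.0 * (x[0] - 1.0), 8.0 * x[1]]

def armijo_step(x, d, s=1.0, beta=0.5, sigma=0.1):
    # find the first m >= 0 with
    # f(x) - f(x + beta^m * s * d) >= -sigma * beta^m * s * grad^T d
    g = grad_f(x)
    gTd = g[0] * d[0] + g[1] * d[1]          # negative for a descent direction
    alpha = s
    while f(x) - f([x[0] + alpha*d[0], x[1] + alpha*d[1]]) < -sigma*alpha*gTd:
        alpha *= beta                        # successive reduction
    return alpha

x0 = [3.0, 2.0]
d0 = [-g for g in grad_f(x0)]                # steepest descent direction
alpha0 = armijo_step(x0, d0)                 # accepts after a few reductions
```

For this start the rule rejects α = 1, 0.5, and 0.25 (each overshoots) and accepts α = 0.125, guaranteeing a sufficient decrease in f.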
Comparison of Methods
Steepest descent Newton’s Quasi-Newton CG Slow convergence
Fastest Fast Fast
Computationally less expensive
HIgh High/Moderate Moderate
Needs once differentiability
Twice differentiability
Once differentiability
Once differentiability
Suitable for less complex problems
Well-defined problems
Complex problems
Complex problems
Suitable for small scale problems
Small to medium
Medium Medium and large
Hard to parallelize
Hardest Easier Easier
Constrained Optimization
Definition. Minimize $f(\mathbf{x})$ subject to $\mathbf{x} \in C \subset R^n$. C is the constraint set, a subset of $R^n$ defined by

$$h_i(\mathbf{x}) = 0, \; i = 1, \ldots, I \quad \text{(equality constraints)}$$
$$g_j(\mathbf{x}) \le 0, \; j = 1, \ldots, J \quad \text{(inequality constraints)}$$

Local and Global Minimum. A vector $\mathbf{x}^* \in C$ is a local minimum of f over C if there exists $\varepsilon > 0$ such that $f(\mathbf{x}^*) \le f(\mathbf{x})$ for all $\mathbf{x} \in C$ with $\|\mathbf{x} - \mathbf{x}^*\| < \varepsilon$. A vector $\mathbf{x}^* \in C$ is a global minimum if $f(\mathbf{x}^*) \le f(\mathbf{x})$ for all $\mathbf{x} \in C$.
Optimality Conditions (1)
Assuming f is continuously differentiable:
Necessary condition: $\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) \ge 0$ for all $\mathbf{x} \in C$
If f is convex over C, then the above is also sufficient for optimality.
If f and constraint set C are convex then local minimum x* is also a global minimum.
The solution methods based on these optimality conditions are similar to those for unconstrained problems (feasible directions methods).
Geometric Interpretation
Optimality Conditions (2)
Karush-Kuhn-Tucker Necessary Condition. If $\mathbf{x}^* \in C$ is a local minimum of f over C, then there exist Lagrange multiplier vectors $\boldsymbol{\lambda}^* = (\lambda_1^*, \ldots, \lambda_I^*)$ and $\boldsymbol{\mu}^* = (\mu_1^*, \ldots, \mu_J^*)$ such that

$$\nabla_{\mathbf{x}} L(\mathbf{x}^*, \boldsymbol{\lambda}^*, \boldsymbol{\mu}^*) = 0, \qquad \mu_j^* \ge 0, \; j = 1, \ldots, J,$$

with $\mu_j^* = 0$ for each constraint j that is inactive at $\mathbf{x}^*$.

Lagrangian function:

$$L(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) = f(\mathbf{x}) + \sum_{i=1}^{I} \lambda_i h_i(\mathbf{x}) + \sum_{j=1}^{J} \mu_j g_j(\mathbf{x})$$
Example
Minimize $f(\mathbf{x}) = x_1 + x_2$ subject to $x_1^2 + x_2^2 = 2$.

Lagrangian: $L(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda h(\mathbf{x})$. At the local optimum $\mathbf{x}^*$, the KKT condition must be satisfied:

$$\nabla f(\mathbf{x}^*) + \lambda \nabla h(\mathbf{x}^*) = 0$$

What is the value of $\lambda$?
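The KKT system for this example can be solved by hand and then checked numerically: stationarity gives $1 + 2\lambda x_1 = 0$ and $1 + 2\lambda x_2 = 0$, so $x_1 = x_2 = -1/(2\lambda)$; the constraint then forces $x_1 = x_2 = \pm 1$, and the minimizer is $\mathbf{x}^* = (-1, -1)$ with $\lambda = 1/2$:

```python
# Verify the hand-derived KKT point for: min x1 + x2  s.t.  x1^2 + x2^2 = 2
lam = 0.5
x_star = (-1.0, -1.0)

grad_f = (1.0, 1.0)                              # gradient of x1 + x2
grad_h = (2.0 * x_star[0], 2.0 * x_star[1])      # gradient of x1^2 + x2^2 - 2

# stationarity: grad f + lambda * grad h = 0
stationarity = (grad_f[0] + lam * grad_h[0], grad_f[1] + lam * grad_h[1])
# primal feasibility: h(x*) = 0
feasibility = x_star[0]**2 + x_star[1]**2 - 2.0
```

The other stationary point, $(1, 1)$ with $\lambda = -1/2$, is the constrained maximum.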
Barrier and Interior Point Methods
The constrained problem is converted to a sequence of unconstrained problems that add a high cost for approaching the boundary of the feasible region.

Barrier and interior point methods are used for inequality-constrained problems.
Minimize $F_B(\mathbf{x}) = f(\mathbf{x}) + B(\mathbf{x})$

Barrier functions:

$$B(\mathbf{x}) = -\sum_{j=1}^{J} \ln\{-g_j(\mathbf{x})\} \quad \text{(logarithmic)}, \qquad B(\mathbf{x}) = -\sum_{j=1}^{J} \frac{1}{g_j(\mathbf{x})} \quad \text{(inverse)}$$
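A sketch of the barrier idea on an assumed one-variable problem: minimize $f(x) = x$ subject to $g(x) = 1 - x \le 0$, using the logarithmic barrier weighted by a parameter $\mu$ driven to zero (the explicit weighting parameter is an assumption here; the slide folds it into B):

```python
# Feasible region: x >= 1.  Barrier objective: F(x) = x - mu * ln(x - 1).
# Setting dF/dx = 1 - mu/(x - 1) = 0 gives the inner minimizer in closed form.

def barrier_min(mu):
    return 1.0 + mu          # unconstrained minimizer of F for this problem

mu = 1.0
path = []                    # "central path" of inner minimizers
for _ in range(10):
    path.append(barrier_min(mu))
    mu *= 0.25               # shrink the barrier weight each outer iteration
```

Every iterate stays strictly inside the feasible region (x > 1), and the path converges to the constrained optimum x = 1 as the barrier weight vanishes.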
Barrier Method – Geometrical Interpretation
Penalty Method
The constrained problem is converted to a sequence of unconstrained problems that add a high cost for infeasibility.

A penalty parameter or function is used to penalize violation of the constraints.
$$F_P(\mathbf{x}) = f(\mathbf{x}) + \frac{r_n}{2} \left[ \sum_{i=1}^{I} h_i(\mathbf{x})^2 + \sum_{j=1}^{J} \left( g_j^+(\mathbf{x}) \right)^2 \right]$$

where $g_j^+(\mathbf{x}) = \max\{0, g_j(\mathbf{x})\}$ and $r_n$ is a penalty parameter.
These approaches are also known as SUMT, sequential unconstrained minimization technique (any unconstrained algorithm may be used)
Often the penalized function is an augmented Lagrangian function to improve convergence.
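A sketch of the penalty idea on an assumed one-variable problem: minimize $f(x) = (x-2)^2$ subject to $g(x) = x - 1 \le 0$; the penalized minimizer approaches the constrained optimum x = 1 from the infeasible side as the penalty grows:

```python
# Penalized objective: F(x) = (x - 2)^2 + (r/2) * max(0, x - 1)^2.
# For x > 1, stationarity 2(x - 2) + r(x - 1) = 0 gives the closed form below.

def penalized_min(r):
    return (4.0 + r) / (2.0 + r)     # tends to the constrained optimum x = 1

r = 1.0
x = penalized_min(r)
for _ in range(20):
    r *= 1.75                        # a sample penalty update rule
    x = penalized_min(r)
```

Each unconstrained minimizer is slightly infeasible (x > 1); the violation shrinks like 2/(2 + r), which is the characteristic behavior of exterior penalty methods and of SUMT.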
Optimal Control (1)
Optimal control problems are dynamic optimization problems
Definition (discrete-time optimal control)
Minimize

$$J(\mathbf{u}) = g_N(\mathbf{x}_N) + \sum_{i=0}^{N-1} g_i(\mathbf{x}_i, \mathbf{u}_i)$$

subject to

$$\mathbf{x}_{i+1} = f_i(\mathbf{x}_i, \mathbf{u}_i), \; i = 0, \ldots, N-1 \quad \text{(system equation)}$$
$$\mathbf{x}_i \in X_i \subset R^n, \; i = 1, \ldots, N \quad \text{(state vectors forming a trajectory)}$$
$$\mathbf{u}_i \in U_i \subset R^m, \; i = 0, \ldots, N-1 \quad \text{(control vectors forming a trajectory)}$$

with $\mathbf{x}_0$ given. The system equation uniquely specifies the state trajectory that corresponds to a given control trajectory.
Optimal Control (2)
Given a control trajectory $\mathbf{u} = (\mathbf{u}_0, \mathbf{u}_1, \ldots, \mathbf{u}_{N-1})$, the state trajectory is uniquely determined by the system equations $f_i$ ($i = 0, \ldots, N-1$). Equivalently, we can write $\mathbf{x}_i = \phi_i(\mathbf{u})$, $i = 1, \ldots, N$, where $\phi_i$ is determined from the $f_i$.

Simplified definition: minimize

$$J(\mathbf{u}) = g_N(\phi_N(\mathbf{u})) + \sum_{i=0}^{N-1} g_i(\phi_i(\mathbf{u}), \mathbf{u}_i)$$

subject to $\mathbf{x}_i \in X_i \subset R^n$, $i = 1, \ldots, N$; $\mathbf{u}_i \in U_i \subset R^m$, $i = 0, \ldots, N-1$; $\mathbf{x}_0$ given. The optimal solution $\mathbf{u}^* = (\mathbf{u}_0^*, \mathbf{u}_1^*, \ldots, \mathbf{u}_{N-1}^*)$ can be found by any of the nonlinear optimization methods.
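A sketch of evaluating J(u) by rolling out the system equation, on an assumed scalar toy problem with $x_{i+1} = x_i + u_i$, stage cost $x^2 + u^2$, and terminal cost $x^2$; once J is a function of u alone, any unconstrained method above applies:

```python
def rollout_cost(x0, u):
    """Evaluate J(u) by simulating the system equation forward."""
    x, J = x0, 0.0
    for ui in u:
        J += x**2 + ui**2        # stage cost g_i(x_i, u_i)
        x = x + ui               # system equation x_{i+1} = f_i(x_i, u_i)
    return J + x**2              # terminal cost g_N(x_N)

# With x0 = 1 and N = 2, J is an ordinary function of (u0, u1).
J_zero = rollout_cost(1.0, [0.0, 0.0])     # never steering
J_ctrl = rollout_cost(1.0, [-0.6, -0.2])   # steering toward the origin
```

Applying the control trajectory lowers the cost relative to doing nothing, and a gradient method on `rollout_cost` (with x0 fixed) would refine it further.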
Nonlinear Optimization Algorithm (1)
1. Select x0 (i.e. set decision/control variables); set k = 0
2. For constrained problems, set the initial penalty r0 to a suitably small value. Choose a penalty update rule (e.g. rk = rk-1 * 1.75)
3. For constrained problems, formulate the equivalent unconstrained objective function (using the penalty method)
4. Find the new vector xk+1:
   a. Compute the direction vector dk
   b. Compute the stepsize αk
   c. Update $\mathbf{x}^{k+1} = \mathbf{x}^k + \alpha^k \mathbf{d}^k$
   d. For constrained problems, repeat steps a to c until convergence is achieved within a reasonable (large) tolerance
5. If the stopping criteria are satisfied, stop: xk is the optimum solution and f(xk) the optimum objective value
6. If the stopping criteria are not satisfied, update k = k + 1, update rk+1, and go to step 3
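A compact sketch of the outer penalty loop (steps 2, 3, and 6) on an assumed quadratic problem, minimize $(x_1-2)^2 + (x_2-3)^2$ subject to $x_1 + x_2 = 1$; because the penalized objective is quadratic, the inner solve of step 4 reduces to one exact linear solve rather than a descent iteration:

```python
# Penalized objective: F(x) = (x1-2)^2 + (x2-3)^2 + (r/2)*(x1 + x2 - 1)^2.
# grad F = 0 is the 2x2 linear system [[2+r, r], [r, 2+r]] x = [4+r, 6+r].

def inner_minimize(r):
    a, b, c = 2.0 + r, r, 2.0 + r
    rhs = [4.0 + r, 6.0 + r]
    det = a * c - b * b                      # = 4 + 4r, always nonzero
    return [(c * rhs[0] - b * rhs[1]) / det,
            (a * rhs[1] - b * rhs[0]) / det]

r, x = 1.0, [0.0, 0.0]
while abs(x[0] + x[1] - 1.0) > 1e-6:         # stop when nearly feasible
    x = inner_minimize(r)                    # step 4: inner unconstrained solve
    r *= 1.75                                # step 6: penalty update rule
```

The iterates converge to the true constrained minimum (0, 1), the projection of (2, 3) onto the constraint line.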
Nonlinear Optimization Algorithm (2)
Stopping criteria:

$$\frac{\left| f(\mathbf{x}^k) - f(\mathbf{x}^{k-1}) \right|}{\left| f(\mathbf{x}^k) \right|} < \varepsilon, \qquad \frac{\left\| \mathbf{x}^k - \mathbf{x}^{k-1} \right\|}{\left\| \mathbf{x}^k \right\|} < \varepsilon$$

where $\varepsilon$ is a small positive number.
Calculating gradients: in practice the gradients are typically computed by the finite difference method. This procedure is general and does not require explicit expressions for the gradient functions (which are often not available in practice, e.g. when the governing equations are implicit).
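A minimal finite-difference gradient routine of the kind described; the central-difference stencil, the step size h, and the test function are assumptions for illustration:

```python
def finite_diff_grad(f, x, h=1e-6):
    """Central-difference approximation to grad f; needs only f evaluations."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

f = lambda x: x[0]**2 + 3.0 * x[0] * x[1]   # analytic grad: [2*x0 + 3*x1, 3*x0]
g = finite_diff_grad(f, [2.0, 1.0])         # approximately [7, 6]
```

The cost is 2n function evaluations per gradient (n for a one-sided difference), which is the price paid for not needing analytic derivatives.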
Practical Guidelines (1)
General: nonlinear programming is computationally challenging, and theoretical results often do not translate into practical behavior, partly because of the finite-precision, discrete nature of digital computation.
Each problem should be considered from modeling to implementation independently from others.
Experimentation can yield insights that can be used to tune the methods for improved efficiency and performance.
Large scale problems require additional care in design and implementation.
Real world problems often do not possess many properties assumed during theoretical analyses (e.g. continuously differentiable functions, etc)
Practical Guidelines (2)
Categories:
- Mathematical modeling/problem formulation
- Scaling
- Validation
- Method selection
- Large-scale problems
- High-performance and parallel implementation
Mathematical Modeling/Problem Formulation (1)
Key questions:
- What are the objectives of the optimization?
- Is an accurate mathematical model of the problem available?
- Is reliable data available?

Goal: the simplest mathematical model consistent with the objectives and the accuracy of the available data and models.

Specific decisions:
- Objective function?
- Number and type of variables?
- Number and type of constraints?
Mathematical Modeling/Problem Formulation (2)
Some guidelines:
- If there is more than one objective function, embody all but one in the constraint set (otherwise, a multi-objective optimization technique has to be used)
- If unsure of design and implementation decisions, start with a simple formulation and study the results of the optimization before modifying it
- Two rules of thumb: (1) convex objective functions and constraint sets are preferable to non-convex ones; (2) linear and simple nonlinear functions are preferable
- Converting nonlinear functions to piecewise linear ones is generally NOT preferable, as this increases the number of variables and distorts the physical understanding of the problem
- Converting integer variables to continuous ones may be done
- Making objective and constraint functions differentiable may be done
Scaling
Variables should be scaled so that their values are neither too large nor too small relative to one another.
Benefits Controls round-off errors Improves the conditioning of the problem Often improves convergence
Example: suppose x ranges between a and b. It is scaled to [-1, 1] by

$$\bar{x} = \frac{x - (a + b)/2}{(b - a)/2}$$
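The scaling formula above, written as a pair of helper functions (the interval values are assumed examples):

```python
def scale(x, a, b):
    """Map x in [a, b] to [-1, 1]: subtract the midpoint, divide by half-width."""
    return (x - (a + b) / 2.0) / ((b - a) / 2.0)

def unscale(y, a, b):
    """Inverse map from [-1, 1] back to [a, b]."""
    return y * (b - a) / 2.0 + (a + b) / 2.0

s = scale(7.5, 5.0, 10.0)   # the midpoint of [5, 10] maps to 0.0
```

The optimizer then works entirely in scaled variables, and `unscale` recovers the physical values of the final solution.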
Method Selection
A comparison of several methods was presented on a previous slide
Some further considerations:
- Is a one-time solution sought, or does the problem have to be solved many times (for varying parameters)? In the latter case, efficiency and accuracy are important.
- For complex nonlinear problems with non-convex sets, simpler methods like the method of steepest descent are preferable.
- If a parallel implementation is desired, then the conjugate gradient method is preferable.
- The choice of the stepsize rule has a significant impact on efficiency and accuracy.
Validation
It is essential that the solution obtained is validated as correct. There are no fixed procedures for this; each problem has to be considered separately.
Some useful strategies:
- If an optimal solution is known, then a close solution from a method indicates correctness
- Run the algorithm with several different starting values to see if it converges to the same optimum solution
- Vary parameters of the problem and correlate the behavior of the solution with the physical understanding of the problem
- Solve the problem using different methods
Example – Min. Weight Design of Cold-Formed Steel Beam
Objective of the optimization problem To develop parameterized minimum weight design curves for cold-formed steel hat-shape beams
Example (2)
Problem definition. Minimize the beam weight $f = \mu L t\,(b + 2d)$, subject to the constraints of the building code (AISI). The variables of the problem are t, b, and d only; the others are parameters. The code-specified constraints are complex, nonlinear, and implicit. Some equations are not continuously differentiable.
Scaling No scaling is needed since the code equations are based on the ratios b/t and d/t
Method Selection The method of steepest descent (with scaling) is most appropriate because it can be followed and understood, and hence, tuned to give good solutions.
Example (3)
[Figure: minimum-weight design curves — beam thickness t (mm) versus span length (m), one curve per distributed load q from 2.5 to 20 kN/m, for the hat-shaped section with dimensions b, d, and t; Fy = 345 N/mm², unbraced]
Validation The solution is validated by comparing with optimal solutions found by other algorithms and by parametric behavior of the solution.
Large Scale Problems (1)
What is large-scale? There are no hard and fast rules. One criterion is:
- Hundreds or thousands of variables
- Run-time in the tens of minutes

Considerations in design and implementation:
- Scalability of the method (both computational efficiency and memory usage)
- Memory requirement
- Run-time
- Utilizing structure in the problem to enhance performance (e.g. large-scale problems are often sparse)
- Numerical conditioning
- Robustness
- Parallel implementability
Large Scale Problems (2)
Recommendations:
- Prefer non-Newton methods such as the method of steepest descent and the conjugate gradient method
- The conjugate gradient method, especially when implemented in parallel, is usually the best
- Scaling of variables is essential
- Scale the gradient direction (when using the steepest descent method)
Parallel Implementation
Motivation: significant speedups can be achieved by implementing the method on a high-performance parallel computer.
Parallel computing is affordable with a cluster of computers running Linux and freely available parallel libraries.
Recommendation: the conjugate gradient method is readily parallelizable on both distributed-memory and shared-memory architectures. Significant speedups and efficiencies are obtained in practice.
Other advantages of parallel implementation: improved search for a global minimum, a consequence of the non-deterministic execution order of parallel programs.
Faster and more stable convergence; this has been observed in practice for the solution of complex and large problems.
MATLAB Optimization Toolbox
The MATLAB toolbox implements several methods. This makes experimentation straightforward and the selection of the best method easier. However, MATLAB code is not as efficient as compiled C or Fortran code. Hence, it is appropriate for small to medium scale problems only.
Two key functions fminunc - Multidimensional unconstrained nonlinear minimization. fmincon - Multidimensional constrained nonlinear minimization.
These M-files are the primary interface for unconstrained and constrained optimization in MATLAB. Type help optim to list all toolbox functions.
Unconstrained Optimization
Syntax X=FMINUNC(FUN,X0,OPTIONS)
where FUN = objective function to be minimized; X0 = starting vector; OPTIONS = structure specifying optimization options
Example: FUN can be specified using a function handle (@):
X = fminunc(@myfun, 2)
where myfun is a MATLAB function such as:
function F = myfun(x)
F = sin(x) + 3;
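For readers who want to experiment outside MATLAB, SciPy's minimize offers a comparable quasi-Newton interface. This is an analogue of the fminunc call above, not the toolbox itself:

```python
import numpy as np
from scipy.optimize import minimize

# Analogue of: X = fminunc(@myfun, 2)  with  myfun(x) = sin(x) + 3
res = minimize(lambda x: np.sin(x[0]) + 3.0, x0=[2.0], method='BFGS')
# Starting from x0 = 2, BFGS descends to the nearby minimizer
# x = 3*pi/2, where f = 2
```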
Constrained Optimization
Syntax X=FMINCON(FUN,X0,A,B,Aeq,Beq,LB,UB,NONLCON,OPTIONS)
This function solves the optimization problem:
Minimize F(X) subject to:
A*X <= B; Aeq*X = Beq (linear constraints)
C(X) <= 0; Ceq(X) = 0 (nonlinear constraints)
LB <= X <= UB (bounds)
The function NONLCON accepts X and returns the vectors C and Ceq, representing the nonlinear inequalities and equalities, respectively. Like FUN, NONLCON can be specified with a function or with INLINE.
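A SciPy analogue of a constrained call, using the SLSQP method; the problem and values are illustrative. Note the sign convention differs: MATLAB expects C(X) <= 0, while SciPy's 'ineq' constraints require fun(x) >= 0, so the MATLAB-style constraint must be negated.

```python
import numpy as np
from scipy.optimize import minimize

# Minimize (x1-1)^2 + (x2-2)^2 subject to x1^2 + x2^2 <= 1
fun = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2
C = lambda x: x[0]**2 + x[1]**2 - 1.0        # MATLAB-style: C(x) <= 0
# SciPy's 'ineq' convention is fun(x) >= 0, so negate C
cons = [{'type': 'ineq', 'fun': lambda x: -C(x)}]
res = minimize(fun, x0=[0.0, 0.0], method='SLSQP', constraints=cons)
# Optimum: the projection of (1, 2) onto the unit circle, (1, 2)/sqrt(5)
```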
Setting Optimization Parameters
The OPTIMSET function is used to modify the OPTIONS structure that specifies optimization parameters such as optimization method and line search method.
Syntax OPTIONS = OPTIMSET('PARAM1',VALUE1,'PARAM2',VALUE2,...)
For medium scale problems, MATLAB provides steepest descent, Newton and Quasi-Newton (BFGS and DFP) methods
For large scale problems, MATLAB provides CG and sequential quadratic programming methods
Example parameter: HessUpdate - [ {bfgs} | dfp | steepdesc ]. Use the help command for more details.
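The OPTIMSET pattern of name-value pairs maps onto SciPy's options dictionary; a hypothetical illustration (the parameter names below are SciPy's, not MATLAB's):

```python
from scipy.optimize import minimize

# Analogue of OPTIONS = OPTIMSET('Param1',Value1,...): pass solver
# parameters (gradient tolerance, iteration cap, verbosity) as a dict
opts = {'gtol': 1e-10, 'maxiter': 1000, 'disp': False}
res = minimize(lambda x: (x[0] - 3.0)**2, x0=[0.0],
               method='BFGS', options=opts)
```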
Example (1)
Minimize f(x) = 100*(x(2)-x(1)^2)^2 + (1-x(1))^2 (Rosenbrock's banana function)
[Figure: surface plot of the banana function]
Example (2)
Optimal solution is x* = [1, 1] and f(x*) = 0
BFGS Quasi-Newton method Value of the function at the solution: 8.98565e-009 Number of function evaluations: 105
DFP Quasi-Newton method Value of the function at the solution: 2.26078e-008 Number of function evaluations: 109
Steepest descent method Value of the function at the solution: 4.84404 Number of function evaluations: 302 Steepest descent did not converge in 302 iterations.
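The quasi-Newton result above can be reproduced approximately with SciPy (BFGS on the banana function from a standard starting point; function-evaluation counts will differ from the MATLAB runs):

```python
import numpy as np
from scipy.optimize import minimize

banana = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
res = minimize(banana, x0=[-1.2, 1.0], method='BFGS')
# BFGS reaches x* = [1, 1] with f(x*) near zero, as on the slide
```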
References
Dimitri P. Bertsekas, Nonlinear Programming, Athena Scientific, MA, 1995.
Edwin K. P. Chong and Stanislaw H. Zak, An Introduction to Optimization, 2nd ed., Wiley, 2001.
Hojjat Adeli and Asim Karim, Construction Scheduling, Cost Optimization and Management, Spon Press, 2001.
Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, Introduction to Parallel Computing, 2nd ed., Addison-Wesley, 2003.
MATLAB Optimization Toolbox, http://www.mathworks.com/access/helpdesk/help/toolbox/optim/optim.shtml