Computacion Inteligente: Derivative-Based Optimization

TRANSCRIPT

Page 1: Computacion Inteligente
Derivative-Based Optimization

Page 2: Contents

• Optimization problems

• Mathematical background

• Descent Methods

• The Method of Steepest Descent

• Conjugate Gradient

Page 3: OPTIMIZATION PROBLEMS

Page 4: Terms in Mathematical Optimization

1. Objective function – the mathematical function which is optimized by changing the values of the design variables.

2. Design variables – those variables which we, as designers, can change.

3. Constraints – functions of the design variables which establish limits on individual variables or combinations of design variables.

Page 5: Problem Formulation

3 basic ingredients:

– an objective function,

– a set of decision variables,

– a set of equality/inequality constraints.

The problem is to search for the values of the decision variables that minimize the objective function while satisfying the constraints.

Page 6: Mathematical Definition

– Design Variables: decision and objective vector

– Constraints: equality and inequality

– Bounds: feasible ranges for variables

– Objective Function: maximization can be converted to minimization due to the duality principle

$$\max f(x) \iff \min\left(-f(x)\right)$$

$$\min_{x} \; y = f(x) \quad \text{subject to} \quad x^L \le x \le x^U, \quad h(x) = 0, \quad g(x) \le 0$$

(y: objective; x: decision vector; $x^L, x^U$: bounds; h, g: constraints)
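These ingredients can be made concrete with a small Python sketch (my own illustration, not from the slides): a hypothetical two-variable problem with an objective, one equality and one inequality constraint, and simple bounds. A maximization problem would simply negate the objective.

```python
import numpy as np

# Sketch of the ingredients above, with a made-up two-variable problem.

def objective(x):
    # f(x) to be minimized; a maximization would use -f(x) instead
    return (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2

def h(x):
    # equality constraint h(x) = 0
    return np.array([x[0] + x[1] - 1.0])

def g(x):
    # inequality constraint g(x) <= 0
    return np.array([x[0] - 2.0])

x_lower = np.array([-5.0, -5.0])   # bounds x^L <= x <= x^U
x_upper = np.array([ 5.0,  5.0])

x = np.array([0.5, 0.5])           # a candidate decision vector
feasible = (np.all(x_lower <= x) and np.all(x <= x_upper)
            and np.allclose(h(x), 0.0) and np.all(g(x) <= 0.0))
print(objective(x), feasible)
```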

Page 7: Steps in the Optimization Process

1. Identify the quantity or function, f, to be optimized.

2. Identify the design variables: x1, x2, x3, …,xn.

3. Identify the constraints, if any exist:

a. Equalities

b. Inequalities

4. Adjust the design variables (x’s) until f is optimized and all of the constraints are satisfied.

Page 8: Local and Global Optimum Designs

1. Objective functions may be unimodal or multimodal.

a. Unimodal – only one optimum
b. Multimodal – more than one optimum

2. Most search schemes are based on the assumption of a unimodal surface. The optimum determined in such cases is called a local optimum design.

3. The global optimum is the best of all local optimum designs.

Page 9: Weierstrass Theorem

• Existence of global minimum

• If f(x) is continuous on the feasible set S which is closed and bounded, then f(x) has a global minimum in S

– A set S is closed if it contains all its boundary points.

– A set S is bounded if it is contained in the interior of some circle: $\{x : x^T x \le c,\ c \text{ a finite number}\}$

compact = closed and bounded

Page 10: Example of an Objective Function

[Figure: plot of an objective function over x1 ∈ [-1, 1], x2 ∈ [-1, 1]]

Page 11: Multimodal Objective Function

[Figure: plot of a multimodal objective function over x1 ∈ [0, 1.5], x2 ∈ [0, 1], showing a local max and a saddle point]

Page 12: Optimization Approaches

• Derivative-based optimization (gradient based)

– Capable of determining “search directions” according to an objective function’s derivative information

• steepest descent method;

• Newton’s method; Newton-Raphson method;

• Conjugate gradient, etc.

• Derivative-free optimization

• random search method;

• genetic algorithm;

• simulated annealing; etc.

Page 13: MATHEMATICAL BACKGROUND

Page 14: Positive Definite Matrices

• A square matrix M is positive definite if $x^T M x > 0$ for all x ≠ 0.

• It is positive semidefinite if $x^T M x \ge 0$ for all x.

The scalar $x^T M x = \langle x, M x \rangle$ is called a quadratic form.

Page 15: Positive Definite Matrices

• A symmetric matrix M = Mᵀ is positive definite if and only if its eigenvalues λi > 0 (semidefinite ↔ λi ≥ 0).

– Proof (→): let vi be the eigenvector for the i-th eigenvalue λi:

$$M v_i = \lambda_i v_i$$

– Then,

$$0 < v_i^T M v_i = \lambda_i v_i^T v_i = \lambda_i \left\| v_i \right\|^2$$

– which implies λi > 0.

– Exercise (←): prove that positive eigenvalues imply positive definiteness.
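As a quick numerical illustration of the eigenvalue test (a sketch with a made-up matrix M, not from the slides), NumPy's eigvalsh gives the eigenvalues of a symmetric matrix:

```python
import numpy as np

# Check positive definiteness of a symmetric matrix via its eigenvalues.
# The matrix M below is just an illustration.
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eigvals = np.linalg.eigvalsh(M)          # eigenvalues of the symmetric matrix
print("eigenvalues:", eigvals)
print("positive definite:", np.all(eigvals > 0))

# Equivalently, x^T M x > 0 for a few random nonzero x
rng = np.random.default_rng(0)
for _ in range(3):
    x = rng.standard_normal(2)
    print("x^T M x =", x @ M @ x)
```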

Page 16: Positive Definite Matrices

• Theorem: if a matrix M = UᵀU (with the columns of U linearly independent), then M is positive definite.

• Proof: let f be defined as

$$f = x^T M x = x^T U^T U x$$

• If we can show that f is always positive for x ≠ 0, then M must be positive definite. We can write this as

$$f = (Ux)^T (Ux)$$

• Writing b = Ux, we get

$$f = b^T b = \sum_i b_i^2$$

• Provided that Ux is a non-zero vector for every x ≠ 0 (which holds when the columns of U are linearly independent), f must always be positive, so M is positive definite.

Page 17: Quadratic Functions

• f: Rⁿ → R is a quadratic function if

$$f(x) = \tfrac{1}{2} x^T Q x - b^T x + c$$

– where Q is symmetric.

Page 18: Quadratic Functions

• It is not necessary for Q to be symmetric. Suppose the matrix P is non-symmetric:

$$f(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} p_{ij}\, x_i x_j = x^T P x$$

$$= \tfrac{1}{2}\, x^T (P + P^T)\, x = x^T Q x, \qquad \text{where } q_{ij} = \tfrac{1}{2}(p_{ij} + p_{ji})$$

Q is symmetric.

Page 19: Quadratic Functions

– Suppose the matrix P is non-symmetric. Example:

$$f(x) = \tfrac{1}{2}\left( 2x_1^2 + 2x_1 x_2 + 4x_1 x_3 + 6x_2^2 + 4x_2 x_3 + 5x_3^2 \right)$$

$$f(x) = \tfrac{1}{2}\, x^T P x, \qquad P = \begin{bmatrix} 2 & 2 & 4 \\ 0 & 6 & 4 \\ 0 & 0 & 5 \end{bmatrix}$$

$$= \tfrac{1}{2}\, x^T Q x, \qquad Q = \begin{bmatrix} 2 & 1 & 2 \\ 1 & 6 & 2 \\ 2 & 2 & 5 \end{bmatrix}$$

Q is symmetric.
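A short NumPy check of this example (a sketch using the slide's P and Q; the random test vector is my own choice) confirms that the symmetrized Q gives the same quadratic form as P:

```python
import numpy as np

# Verify numerically: (1/2) x^T P x == (1/2) x^T Q x with Q = (P + P^T)/2.
P = np.array([[2.0, 2.0, 4.0],
              [0.0, 6.0, 4.0],
              [0.0, 0.0, 5.0]])
Q = 0.5 * (P + P.T)
print(Q)                                  # [[2,1,2],[1,6,2],[2,2,5]] as on the slide

rng = np.random.default_rng(1)
x = rng.standard_normal(3)
print(0.5 * x @ P @ x, 0.5 * x @ Q @ x)   # the two quadratic forms agree
```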

Page 20: Quadratic Functions

• Given the quadratic function

$$f(x) = \tfrac{1}{2} x^T Q x - b^T x + c$$

If Q is positive definite, then f is a parabolic “bowl.”

Page 21: Quadratic Functions

• Two other shapes can result from the quadratic form.

– If Q is negative definite, then f is a parabolic “bowl” upside down.

– If Q is indefinite then f describes a saddle.

Page 22: Quadratic Functions

• Quadratics are useful in the study of optimization.

– Often, objective functions are “close to” quadratic near the solution.

– It is easier to analyze the behavior of algorithms when applied to quadratics.

– Analysis of algorithms for quadratics gives insight into their behavior in general.

Page 23: One-Dimensional Derivative

• The derivative of f: R → R is a function f′: R → R given by

$$f'(x) = \frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

• if the limit exists.
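A tiny numerical illustration of this limit (my own example, f(x) = x², not from the slides): the forward-difference quotient approaches the exact derivative as h shrinks.

```python
# Forward-difference approximation of f'(x) = lim_{h->0} (f(x+h) - f(x))/h,
# illustrated on f(x) = x^2 at x = 1 (exact derivative 2).
f = lambda x: x ** 2
x0 = 1.0
for h in (1e-1, 1e-3, 1e-5):
    print(h, (f(x0 + h) - f(x0)) / h)
```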

Page 24: Directional Derivatives

• Along the Axes…

$$\frac{\partial f(x, y)}{\partial x}, \qquad \frac{\partial f(x, y)}{\partial y}$$

Page 25: Directional Derivatives

• In general direction…

$$\frac{\partial f(x, y)}{\partial v}, \qquad v \in \mathbb{R}^2, \quad \|v\| = 1$$

Page 26: Directional Derivatives

[Figure: the partial derivatives ∂f(x, y)/∂x and ∂f(x, y)/∂y along the coordinate axes]

Page 27: Directional Derivatives

• Definition: a real-valued function f: Rⁿ → R is said to be continuously differentiable if the partial derivatives

$$\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}$$

• exist for each x in Rⁿ and are continuous functions of x.

• In this case, we say f ∈ C¹ (f is a smooth function of class C¹).

Page 28: The Gradient Vector

• Definition: the gradient of f: R² → R (in the plane) is a function ∇f: R² → R² given by

$$\nabla f(x, y) := \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)^T$$

Page 29: The Gradient Vector

• Definition: the gradient of f: Rⁿ → R is a function ∇f: Rⁿ → Rⁿ given by

$$\nabla f(x_1, \ldots, x_n) := \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)^T$$
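Where an analytic gradient is not available, the definition suggests a finite-difference approximation. A minimal sketch (central differences; the test function is my own example, not from the slides):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f: R^n -> R at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad

# Example: f(x, y) = x^2 + 3y^2, whose gradient is (2x, 6y)
f = lambda x: x[0] ** 2 + 3.0 * x[1] ** 2
print(numerical_gradient(f, [1.0, -2.0]))   # ~ [2, -12]
```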

Page 30: The Gradient Properties

• The gradient defines a (hyper)plane approximating the function infinitesimally:

$$\Delta z = \frac{\partial f}{\partial x}\,\Delta x + \frac{\partial f}{\partial y}\,\Delta y$$

Page 31: The Gradient Properties

• By the chain rule:

$$\frac{\partial f}{\partial v}(p) = \left\langle \nabla f|_p, v \right\rangle, \qquad \|v\| = 1$$

Page 32: The Gradient Properties

• Proposition 1: the directional derivative $\frac{\partial f}{\partial v}(p)$ is maximal choosing

$$v = \frac{\nabla f|_p}{\left\| \nabla f|_p \right\|} \qquad (\|v\| = 1)$$

Intuitive: the gradient points in the direction of greatest change.

Prove it!

Page 33: The Gradient Properties

• Proof:

– Assign: $v = \dfrac{\nabla f|_p}{\left\| \nabla f|_p \right\|}$

– By the chain rule:

$$\frac{\partial f(x, y)}{\partial v}(p) = \left\langle \nabla f|_p, v \right\rangle = \left\langle \nabla f|_p, \frac{\nabla f|_p}{\left\| \nabla f|_p \right\|} \right\rangle = \frac{\left\| \nabla f|_p \right\|^2}{\left\| \nabla f|_p \right\|} = \left\| \nabla f|_p \right\|$$

Page 34: The Gradient Properties

• Proof:

– On the other hand, for a general v with $\|v\| = 1$, by the Cauchy-Schwarz inequality:

$$\frac{\partial f(x, y)}{\partial v}(p) = \left\langle \nabla f|_p, v \right\rangle \le \left\| \nabla f|_p \right\| \|v\| = \left\| \nabla f|_p \right\|$$

Page 35: The Gradient Properties

• Proposition 2: let f: Rⁿ → R be a smooth function (C¹) around p.

• If f has a local minimum (maximum) at p, then

$$\nabla f|_p = 0$$

Intuitive: this is a necessary condition for a local min (max).

Page 36: The Gradient Properties

• Proof: intuitive

Page 37: The Gradient Properties

• We found the best INFINITESIMAL DIRECTION at each point,

• Looking for a minimum: a “blind man” procedure

• How can we derive the way to the minimum using this knowledge?

Page 38: Jacobian

• The derivative of f: Rⁿ → Rᵐ is a function Df: Rⁿ → Rᵐˣⁿ, called the Jacobian, given by the matrix of partial derivatives:

$$Df(x) = \left[ \frac{\partial f_i}{\partial x_j}(x) \right]_{m \times n}$$

Note that for f: Rⁿ → R, we have ∇f(x) = Df(x)ᵀ.

Page 39: Derivatives

• If the derivative of ∇f exists, we say that f is twice differentiable.

– Write the second derivative as D2f (or F), and call it the Hessian of f.

Page 40: Level Sets and Gradients

• The level set of a function f: Rn → R at level c is the set of points S = {x: f(x) = c}.

Page 41: Level Sets and Gradients

• Fact: ∇f(x0) is orthogonal to the level set at x0

Page 42: Level Sets and Gradients

• Proof of fact:

– Imagine a particle traveling along the level set.

– Let g(t) be the position of the particle at time t, with g(0) = x0.

– Note that f(g(t)) = constant for all t.

– Velocity vector g′(t) is tangent to the level set.

– Consider F(t) = f(g(t)). Since F is constant, F′(0) = 0. By the chain rule,

$$F'(0) = \nabla f\big(g(0)\big)^T g'(0) = \nabla f(x_0)^T g'(0) = 0$$

– Hence, ∇f(x0) and g′(0) are orthogonal.

Page 43: Taylor's Formula

• Suppose f: R → R is in C¹. Then

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0)$$

– o(h) is a term such that o(h)/h → 0 as h → 0.

– At x0, f can be approximated by a linear function, and the approximation gets better the closer we are to x0.

Page 44: Taylor's Formula

• Suppose f: R → R is in C². Then

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \tfrac{1}{2} f''(x_0)(x - x_0)^2 + o\big((x - x_0)^2\big)$$

– At x0, f can be approximated by a quadratic function.

Page 45: Taylor's Formula

• Suppose f: Rⁿ → R.

– If f is in C¹, then

$$f(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + o(\|x - x_0\|)$$

– If f is in C², then

$$f(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + \tfrac{1}{2}(x - x_0)^T F(x_0)(x - x_0) + o(\|x - x_0\|^2)$$
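A quick numerical check of the first- and second-order expansions (a sketch with a function, gradient, and Hessian of my own choosing, not from the slides):

```python
import numpy as np

# Compare f(x) with its first- and second-order Taylor models around x0.
def f(x):
    return np.exp(x[0]) + x[0] * x[1] ** 2

def grad_f(x):
    return np.array([np.exp(x[0]) + x[1] ** 2, 2.0 * x[0] * x[1]])

def hess_f(x):
    return np.array([[np.exp(x[0]), 2.0 * x[1]],
                     [2.0 * x[1],  2.0 * x[0]]])

x0 = np.array([0.0, 1.0])
x  = x0 + np.array([0.1, -0.05])
d  = x - x0

first  = f(x0) + grad_f(x0) @ d
second = first + 0.5 * d @ hess_f(x0) @ d
print(f(x), first, second)    # the quadratic model is the closer one
```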

Page 46: In What Direction Does a Gradient Point?

• We already know that ∇f(x0) is orthogonal to the level set at x0.

– Suppose ∇f(x0) ≠ 0.

• Fact: ∇f points in the direction of increasing f.

Page 47: Proof of Fact

• Consider xα = x0 + α∇f(x0), α > 0.

– By Taylor's formula,

$$f(x_\alpha) = f(x_0) + \alpha\, \nabla f(x_0)^T \nabla f(x_0) + o(\alpha) = f(x_0) + \alpha \left\| \nabla f(x_0) \right\|^2 + o(\alpha)$$

• Therefore, for sufficiently small α,

f(xα) > f(x0)

Page 48: DESCENT METHODS

Page 49: The Wolfe Theorem

• This theorem is the link from the previous gradient properties to the constructive algorithm.

• The problem: $\min_x f(x)$

Page 50: The Wolfe Theorem

• We introduce a model algorithm:

Data: $x_0 \in \mathbb{R}^n$

Step 0: set i = 0

Step 1: if $\nabla f(x_i) = 0$, stop; else, compute a search direction $h_i \in \mathbb{R}^n$

Step 2: compute the step-size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$

Step 3: set $x_{i+1} = x_i + \lambda_i h_i$ and go to Step 1
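A minimal Python sketch of this model algorithm (my own illustration, not from the slides): the search-direction rule is a plug-in, and the exact arg-min of Step 2 is approximated by a crude search over a fixed grid of step sizes.

```python
import numpy as np

def descent_model(f, grad, x0, direction_rule, tol=1e-6, max_iter=500):
    """Sketch of the model algorithm: Step 1 stops when the gradient (nearly)
    vanishes or picks a direction h_i; Step 2 picks the step size; Step 3
    updates the iterate and repeats."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:              # Step 1: stop test
            break
        h = direction_rule(x, g)                 # Step 1: search direction h_i
        grid = np.linspace(0.0, 1.0, 201)        # Step 2: crude stand-in for argmin
        lam = min(grid, key=lambda t: f(x + t * h))
        x = x + lam * h                          # Step 3: update, go to Step 1
    return x

# Plugging in h_i = -grad f(x_i) gives steepest descent (see the next slides).
f    = lambda x: (x[0] - 1.0) ** 2 + 2.0 * x[1] ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * x[1]])
print(descent_model(f, grad, [3.0, 2.0], lambda x, g: -g))   # ~ [1, 0]
```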

Page 51: The Wolfe Theorem

• The Theorem:

– Suppose f: Rⁿ → R is C¹ smooth, and there exists a continuous function k: Rⁿ → [0, 1] such that

$$\nabla f(x) \neq 0 \;\Rightarrow\; k(x) > 0$$

– and the search vectors constructed by the model algorithm satisfy:

$$\left\langle h_i, \nabla f(x_i) \right\rangle \le -k(x_i)\, \left\| \nabla f(x_i) \right\| \left\| h_i \right\|$$

Page 52: The Wolfe Theorem

– And

$$\nabla f(y) \neq 0 \;\Rightarrow\; h_i \nrightarrow 0$$

• Then, if $\{x_i\}_{i \ge 0}$ is the sequence constructed by the algorithm model, any accumulation point y of this sequence satisfies:

$$\nabla f(y) = 0$$

Page 53: The Wolfe Theorem

• The theorem has a very intuitive interpretation: always go in a descent direction, i.e. the angle between $h_i$ and $-\nabla f(x_i)$ is less than 90°.

The principal differences between various descent algorithms lie in the procedure for determining successive directions.

Page 54: STEEPEST DESCENT

Page 55: The Method of Steepest Descent

• We now use what we have learned to implement the most basic minimization technique.

• First we introduce the algorithm, which is a version of the model algorithm.

• The problem: $\min_x f(x)$

Page 56: The Method of Steepest Descent

• Steepest descent algorithm:

Data: $x_0 \in \mathbb{R}^n$

Step 0: set i = 0

Step 1: if $\nabla f(x_i) = 0$, stop; else, compute the search direction $h_i = -\nabla f(x_i)$

Step 2: compute the step-size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$

Step 3: set $x_{i+1} = x_i + \lambda_i h_i$ and go to Step 1
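A minimal sketch of steepest descent in Python (not from the slides): the exact line search of Step 2 is replaced by a simple backtracking rule, a practical stand-in for the arg-min.

```python
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-6, max_iter=1000):
    """Sketch of the steepest-descent algorithm: h_i = -grad f(x_i), with a
    backtracking step size instead of the exact 1-D minimization."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        h = -g                                # search direction h_i = -grad f(x_i)
        lam = 1.0
        # shrink lambda until a sufficient decrease is obtained
        while f(x + lam * h) > f(x) - 0.5 * lam * (g @ g):
            lam *= 0.5
        x = x + lam * h
    return x

# Example on a convex quadratic: minimum at (1, -2)
f    = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)])
print(steepest_descent(f, grad, [0.0, 0.0]))
```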

Page 57: The Method of Steepest Descent

• Theorem:

– If $\{x_i\}_{i \ge 0}$ is a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfies:

$$\nabla f(y) = 0$$

– Proof: from the Wolfe theorem.

Remark: the Wolfe theorem gives us numerical stability even if the derivatives aren't given analytically (i.e., are calculated numerically).

Page 58: The Method of Steepest Descent

• How long a step to take?

$$x_{i+1} = x_i + \lambda h_i$$

Note that the search direction is $h_i = -\nabla f(x_i)$.

– We are limited to a line search:

• Choose λ to minimize f along the line, i.e. where the directional derivative is equal to zero.

Page 59: The Method of Steepest Descent

• How long a step to take?

– From the chain rule:

$$\frac{d}{d\lambda} f(x_i + \lambda h_i) = \left\langle \nabla f(x_i + \lambda h_i), h_i \right\rangle = 0$$

so $\nabla f(x_{i+1})$ and $h_i$ are orthogonal!

• Therefore the method of steepest descent looks like this:

Page 60: The Method of Steepest Descent

Page 61: Gradient Descent Example

Given:

Find the minimum when x1 is allowed to vary from 0.5 to 1.5 and x2 is allowed to vary from 0 to 2.

$$f(x_1, x_2) = 2\sin(x_1) + 1.47\sin(x_2) + 0.34\sin(x_1)\sin(x_2) + 1.9$$

λ arbitrary

Page 62: Optimum Steepest Descent Example

Given:

Find the minimum when x1 is allowed to vary from 0.5 to 1.5 and x2 is allowed to vary from 0 to 2.

$$f(x_1, x_2) = 2\sin(x_1) + 1.47\sin(x_2) + 0.34\sin(x_1)\sin(x_2) + 1.9$$

Page 63: CONJUGATE GRADIENT

Page 64: Conjugate Gradient

• From now on we assume we want to minimize the quadratic function:

$$f(x) = \tfrac{1}{2} x^T A x - b^T x + c$$

• This is equivalent to solving the linear problem:

$$\nabla f(x) = \tfrac{1}{2} A^T x + \tfrac{1}{2} A x - b = 0$$

If A is symmetric: $Ax = b$

Page 65: Example: A 2D Linear System

• The solution is the intersection of the lines.

$$A = \begin{bmatrix} 3 & 2 \\ 2 & 6 \end{bmatrix}, \qquad b = \begin{bmatrix} 2 \\ -8 \end{bmatrix}, \qquad c = 0$$
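Assuming the reconstructed values above (in particular b = (2, -8), following the Shewchuk notes cited in the Sources), the system can be checked directly with NumPy:

```python
import numpy as np

# Solve the 2-D system Ax = b directly and confirm A is symmetric positive
# definite; the value b = (2, -8) follows the reconstruction above.
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])

x = np.linalg.solve(A, b)
print(x)                               # [ 2. -2.]
print(np.linalg.eigvalsh(A) > 0)       # both eigenvalues positive
```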

Page 66: Example: A 2D Linear System

– Each ellipsoid has constant f(x).

In general, the solution x lies at the intersection pointof n hyperplanes, each having dimension n – 1.

Page 67: Conjugate Gradient

• What is the problem with steepest descent?

– We can repeat the same directions over and over…

• Wouldn’t it be better if, every time we took a step, we got it right the first time?

Page 68: Conjugate Gradient

• What is the problem with steepest descent?

– We can repeat the same directions over and over…

• Conjugate gradient requires n gradient evaluations and n line searches.

Page 69: Conjugate Gradient

• First, let's define the error as

$$e_i = x_i - \tilde{x}, \qquad \text{where } A\tilde{x} = b$$

• $e_i$ is a vector that indicates how far we are from the solution.

[Figure: path from the start point to the solution]

Page 70: Conjugate Gradient

• Let’s pick a set of orthogonal search directions

0 1 1, ,..., ,...,j nd d d d

iiii dxx 1

(should span Rn)

– In each search direction, we’ll take exactly one step,

that step will be just the right length to line up evenly with x

Page 71: Conjugate Gradient

– Unfortunately, this method only works if you already know the answer.

• Using the coordinate axes as search directions…

Page 72: Conjugate Gradient

• We have

$$A\tilde{x} = b, \qquad x_{i+1} = x_i + \alpha_i d_i, \qquad e_i = x_i - \tilde{x}$$

$$\nabla f(x) = Ax - b = Ax - A\tilde{x}$$

$$\nabla f(x_i) = A(x_i - \tilde{x}) = A e_i$$

Page 73: Conjugate Gradient

• Given $x_{i+1} = x_i + \alpha_i d_i$, how do we calculate $\alpha_i$?

• $e_{i+1}$ should be orthogonal to $d_i$:

$$d_i^T e_{i+1} = 0$$

Page 74: Conjugate Gradient

• Given $x_{i+1} = x_i + \alpha_i d_i$, how do we calculate $\alpha_i$?

– That is, we require $d_i^T \nabla f(x_{i+1}) = 0$, i.e. $d_i^T A e_{i+1} = 0$:

$$d_i^T A (e_i + \alpha_i d_i) = 0$$

$$\alpha_i = -\frac{d_i^T A e_i}{d_i^T A d_i} = -\frac{d_i^T \nabla f(x_i)}{d_i^T A d_i}$$

Page 75: Conjugate Gradient

• How do we find $d_j$?

– Since the search vectors form a basis:

$$e_0 = \sum_{i=0}^{n-1} \delta_i d_i$$

$$e_j = e_0 + \alpha_0 d_0 + \alpha_1 d_1 + \ldots + \alpha_{j-1} d_{j-1} = e_0 + \sum_{i=0}^{j-1} \alpha_i d_i$$

On the other hand:

$$e_j = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \alpha_i d_i$$

Page 76: Conjugate Gradient

• We want that after n steps the error will be 0: $e_n = 0$.

– Here is an idea: if $\alpha_j = -\delta_j$, then:

$$e_j = \sum_{i=0}^{n-1} \delta_i d_i - \sum_{i=0}^{j-1} \delta_i d_i = \sum_{i=j}^{n-1} \delta_i d_i$$

So if $j = n$: $e_n = 0$.

Page 77: Conjugate Gradient

• So we look for directions $d_j$ such that $\alpha_j = -\delta_j$.

– A simple calculation shows that the correct choice is to take the directions A-conjugate,

$$d_j^T A d_i = 0, \qquad i \neq j$$

built from the gradients $-\nabla f(x_i)$ (next slide).

Page 78: Conjugate Gradient

• Conjugate gradient algorithm for minimizing f:

Data: $x_0 \in \mathbb{R}^n$

Step 0: $d_0 := r_0 = -\nabla f(x_0)$

Step 1: $\alpha_i = \dfrac{r_i^T r_i}{d_i^T A d_i}$

Step 2: $x_{i+1} = x_i + \alpha_i d_i$

Step 3: $r_{i+1} := -\nabla f(x_{i+1})$, $\quad \beta_{i+1} = \dfrac{r_{i+1}^T r_{i+1}}{r_i^T r_i}$, $\quad d_{i+1} = r_{i+1} + \beta_{i+1} d_i$

Step 4: repeat n times
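A direct transcription of this algorithm in NumPy (a sketch, assuming A is symmetric positive definite so that $-\nabla f(x) = b - Ax = r$):

```python
import numpy as np

def conjugate_gradient(A, b, x0, n_steps=None):
    """Sketch of the CG algorithm above for f(x) = 1/2 x^T A x - b^T x + c,
    with A symmetric positive definite."""
    x = np.asarray(x0, dtype=float)
    r = b - A @ x                   # r_0 = -grad f(x_0)
    d = r.copy()                    # Step 0: d_0 = r_0
    n = b.size if n_steps is None else n_steps
    for _ in range(n):              # Step 4: repeat n times
        alpha = (r @ r) / (d @ A @ d)        # Step 1
        x = x + alpha * d                    # Step 2
        r_new = r - alpha * (A @ d)          # r_{i+1} = -grad f(x_{i+1})
        beta = (r_new @ r_new) / (r @ r)     # Step 3
        d = r_new + beta * d
        r = r_new
    return x

# The 2-D example from the earlier slides converges in n = 2 steps
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradient(A, b, x0=np.zeros(2)))    # ~ [ 2. -2.]
```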

Page 79: Sources

• Jyh-Shing Roger Jang, Chuen-Tsai Sun and Eiji Mizutani, Slides for Ch. 5 of “Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence”, First Edition, Prentice Hall, 1997.

• Djamel Bouchaffra. Soft Computing. Course materials. Oakland University. Fall 2005

• Lecture slides, Soft Computing. Course materials. Dipartimento di Elettronica e Informazione. Politecnico di Milano. 2004

• Jeen-Shing Wang, Course: Introduction to Neural Networks. Lecture notes. Department of Electrical Engineering. National Cheng Kung University. Fall, 2005

Page 80: Sources

• Carlo Tomasi, Mathematical Methods for Robotics and Vision. Stanford University. Fall 2000

• Petros Ioannou, Jing Sun, Robust Adaptive Control. Prentice-Hall, Inc., Upper Saddle River, NJ, 1996

• Jonathan Richard Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. Edition 11/4. School of Computer Science. Carnegie Mellon University. Pittsburgh. August 4, 1994

• Gordon C. Everstine, Selected Topics in Linear Algebra. The George Washington University. 8 June 2004