Gradient Methods


  • Slide 1/53

    Gradient Methods

    May 2005

  • Slide 2/53

    Preview

    Background
    Steepest Descent
    Conjugate Gradient

  • Slide 3/53

    Preview

    Background
    Steepest Descent
    Conjugate Gradient

  • Slide 4/53

    Background

    Motivation
    The gradient notion
    The Wolfe Theorems

  • Slide 5/53

    Motivation

    The min(max) problem:

    $\min_{x} f(x)$

    But we learned in calculus how to solve that kind of question!

  • Slide 6/53

    Motivation

    Not exactly.

    Functions: $f: \mathbb{R}^n \to \mathbb{R}$

    High-order polynomials:

    $x - \frac{1}{6} x^3 + \frac{1}{120} x^5 - \frac{1}{5040} x^7$

    What about functions that don't have an analytic expression: a Black Box?

  • Slide 7/53

    Motivation - real world problem

    Connectivity shapes (Isenburg, Gumhold, Gotsman)

    $\text{mesh} = \{ C = (V, E), \; \text{geometry} \}$

    What do we get from C alone, without the geometry?

  • Slide 8/53

    Motivation - real world problem

    First we introduce error functionals and then try to minimize them:

    $E_s(x) = \sum_{(i,j) \in E} \left( \| x_i - x_j \| - 1 \right)^2, \qquad x \in \mathbb{R}^{3n}$

    $L(x_i) = \frac{1}{d_i} \sum_{j : (i,j) \in E} x_j \; - \; x_i$

    $E_r(x) = \sum_{i=1}^{n} \| L(x_i) \|^2$

  • Slide 9/53

    Motivation - real world problem

    Then we minimize:

    $E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{3n}} \left[ (1 - \lambda) E_s(x) + \lambda E_r(x) \right]$

    This is a high-dimensional, non-linear problem.

    The authors use the conjugate gradient method, which is maybe the most popular optimization technique based on what we'll see here.

  • Slide 10/53

    Motivation - real world problem

    Changing the parameter $\lambda$:

    $E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{3n}} \left[ (1 - \lambda) E_s(x) + \lambda E_r(x) \right]$

  • Slide 11/53

  • Slide 12/53

    Background

    Motivation
    The gradient notion
    The Wolfe Theorems

  • Slide 13/53

    $f(x, y) := \cos\left( \tfrac{1}{2} x \right) \cos\left( \tfrac{1}{2} y \right) x$

  • Slide 14/53

    Directional Derivatives:

    First, the one-dimensional derivative:

  • Slide 15/53

    Directional Derivatives: Along the Axes

    $\frac{\partial f(x, y)}{\partial x}, \qquad \frac{\partial f(x, y)}{\partial y}$

  • Slide 16/53

    Directional Derivatives: In a General Direction

    $\frac{\partial f(x, y)}{\partial v}, \qquad v \in \mathbb{R}^2, \quad \| v \| = 1$

  • Slide 17/53

    Directional Derivatives

    $\frac{\partial f(x, y)}{\partial x}, \qquad \frac{\partial f(x, y)}{\partial y}$

  • Slide 18/53

    The Gradient: Definition in $\mathbb{R}^2$ (the plane)

    $f: \mathbb{R}^2 \to \mathbb{R}$

    $\nabla f(x, y) := \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$

  • Slide 19/53

    The Gradient: Definition

    $f: \mathbb{R}^n \to \mathbb{R}$

    $\nabla f(x_1, \ldots, x_n) := \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)$
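    For a black-box $f$ (as in the motivation), the gradient can be approximated numerically. A minimal Python sketch using central differences; the function name and the test function are illustrative only, not from the slides:

    import numpy as np

    def numerical_gradient(f, x, eps=1e-6):
        # Approximate the gradient of f: R^n -> R at x, one coordinate at a time.
        x = np.asarray(x, dtype=float)
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = eps
            g[i] = (f(x + e) - f(x - e)) / (2 * eps)   # central difference for df/dx_i
        return g

    # Example: f(x, y) = x^2 + 3y^2 has gradient (2x, 6y).
    f = lambda p: p[0] ** 2 + 3 * p[1] ** 2
    print(numerical_gradient(f, [1.0, 2.0]))   # approximately [2., 12.]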

  • Slide 20/53

    The Gradient Properties

    The gradient defines the (hyper)plane approximating the function infinitesimally:

    $\Delta z = \frac{\partial f}{\partial x} \, \Delta x + \frac{\partial f}{\partial y} \, \Delta y$

  • Slide 21/53

    The Gradient Properties

    By the chain rule (important for later use):

    $\frac{\partial f}{\partial v}(p) = \langle \nabla f_p, v \rangle, \qquad \| v \| = 1$

  • Slide 22/53

    The Gradient Properties

    Proposition 1:

    $\frac{\partial f}{\partial v}(p)$ is maximal when choosing $v = \frac{\nabla f_p}{\| \nabla f_p \|}$

    $\frac{\partial f}{\partial v}(p)$ is minimal when choosing $v = -\frac{\nabla f_p}{\| \nabla f_p \|}$

    (Intuitively: the gradient points in the direction of greatest change.)

  • Slide 23/53

    The Gradient Properties

    Proof (only for the minimum case):

    Assign $v = -\frac{\nabla f_p}{\| \nabla f_p \|}$. By the chain rule:

    $\frac{\partial f}{\partial v}(p) = \left\langle \nabla f_p, -\frac{\nabla f_p}{\| \nabla f_p \|} \right\rangle = -\frac{\langle \nabla f_p, \nabla f_p \rangle}{\| \nabla f_p \|} = -\| \nabla f_p \|$

  • Slide 24/53

    The Gradient Properties

    On the other hand, for a general unit vector v (by Cauchy-Schwarz):

    $\frac{\partial f}{\partial v}(p) = \langle \nabla f_p, v \rangle \geq -\| \nabla f_p \| \, \| v \| = -\| \nabla f_p \|$

    so the choice above indeed attains the minimum.

  • Slide 25/53

    The Gradient Properties

    Proposition 2: let $f: \mathbb{R}^n \to \mathbb{R}$ be a $C^1$ smooth function around p.
    If f has a local minimum (maximum) at p, then

    $\nabla f_p = 0$

    (Intuitively: a necessary condition for a local min (max).)

  • Slide 26/53

    The Gradient Properties

    Proof:

    Intuitive:

  • Slide 27/53

    The Gradient Properties

    Formally: for any $v \in \mathbb{R}^n \setminus \{0\}$ we get

    $0 = \left. \frac{d \, f(p + t v)}{dt} \right|_{t=0} = \langle \nabla f_p, v \rangle$

    Since this holds for every such v, it follows that $\nabla f_p = 0$.

  • Slide 28/53

    The Gradient Properties

    We found the best INFINITESIMAL DIRECTION at each point.

    Looking for a minimum: the blind man procedure.

    How can we derive the way to the minimum using this knowledge?

  • Slide 29/53

    Background

    Motivation
    The gradient notion
    The Wolfe Theorems

  • Slide 30/53

    The Wolfe Theorem

    This is the link from the previous gradient properties to the constructive algorithm.

    The problem:

    $\min_{x} f(x)$

  • Slide 31/53

    The Wolfe Theorem

    We introduce a model algorithm:

    Data: $x_0 \in \mathbb{R}^n$
    Step 0: set $i = 0$
    Step 1: if $\nabla f(x_i) = 0$ stop; else, compute a search direction $h_i \in \mathbb{R}^n$
    Step 2: compute the step size $\alpha_i = \arg\min_{\alpha \geq 0} f(x_i + \alpha h_i)$
    Step 3: set $x_{i+1} = x_i + \alpha_i h_i$ and go to Step 1
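    As a rough illustration only (not from the slides): a minimal Python sketch of this model algorithm, with the search-direction rule left as a plug-in and SciPy's bounded scalar minimizer standing in for the exact step-size problem; the names model_descent and direction are mine.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def model_descent(f, grad, direction, x0, tol=1e-8, max_iter=1000):
        # Model algorithm: choose a search direction h_i, then an (approximately exact) step size.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) <= tol:       # Step 1: stop near a critical point
                break
            h = direction(x, g)                # Step 1: search direction h_i
            # Step 2: 1-D problem min_{alpha >= 0} f(x + alpha*h); the upper bound 1e3 is arbitrary
            alpha = minimize_scalar(lambda t: f(x + t * h),
                                    bounds=(0.0, 1e3), method="bounded").x
            x = x + alpha * h                  # Step 3: update and repeat
        return x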

  • Slide 32/53

    The Wolfe Theorem

    The Theorem: suppose $f: \mathbb{R}^n \to \mathbb{R}$ is $C^1$ smooth, and there exists a continuous function

    $k: \mathbb{R}^n \to [0, 1]$

    with

    $\forall x: \; \nabla f(x) \neq 0 \;\Rightarrow\; k(x) > 0$

    and the search vectors constructed by the model algorithm satisfy:

    $\langle \nabla f(x_i), h_i \rangle \leq -k(x_i) \, \| \nabla f(x_i) \| \, \| h_i \|$

  • Slide 33/53

    The Wolfe Theorem

    And:

    $\nabla f(x_i) \neq 0 \;\Rightarrow\; h_i \neq 0$

    Then if $\{x_i\}_{i=0}^{\infty}$ is the sequence constructed by the algorithm model, any accumulation point y of this sequence satisfies:

    $\nabla f(y) = 0$

  • Slide 34/53

    The Wolfe Theorem

    The theorem has a very intuitive interpretation: always go in a descent direction,

    $\langle \nabla f(x_i), h_i \rangle < 0$

  • Slide 35/53

    Preview

    Background
    Steepest Descent
    Conjugate Gradient

  • Slide 36/53

    Steepest Descent

    What does it mean?

    We now use what we have learned to implement the most basic minimization technique.

    First we introduce the algorithm, which is a version of the model algorithm.

    The problem:

    $\min_{x} f(x)$

  • Slide 37/53

    Steepest Descent

    The steepest descent algorithm:

    Data: $x_0 \in \mathbb{R}^n$
    Step 0: set $i = 0$
    Step 1: if $\nabla f(x_i) = 0$ stop; else, compute the search direction $h_i = -\nabla f(x_i)$
    Step 2: compute the step size $\alpha_i = \arg\min_{\alpha \geq 0} f(x_i + \alpha h_i)$
    Step 3: set $x_{i+1} = x_i + \alpha_i h_i$ and go to Step 1
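    Steepest descent is the model sketch above with the direction fixed to $h_i = -\nabla f(x_i)$. Reusing the hypothetical model_descent helper (and the numpy import) from that sketch:

    # f(x, y) = 0.5*x^2 + 2*y^2 has its unique minimum at the origin.
    f = lambda p: 0.5 * p[0] ** 2 + 2.0 * p[1] ** 2
    grad = lambda p: np.array([p[0], 4.0 * p[1]])

    # Steepest descent = model algorithm with h_i = -grad f(x_i).
    x_min = model_descent(f, grad, direction=lambda x, g: -g, x0=[3.0, 1.0])
    print(x_min)   # approaches [0, 0], zig-zagging as described on the next slides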

  • Slide 38/53

    Steepest Descent

    Theorem: if $\{x_i\}_{i=0}^{\infty}$ is a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfies:

    $\nabla f(y) = 0$

    Proof: from the Wolfe theorem.

    Remark: the Wolfe theorem gives us numerical stability if the derivatives aren't given analytically (they are calculated numerically).

  • Slide 39/53

    Steepest Descent

    From the chain rule, the exact step size satisfies

    $\frac{d}{d\alpha} f(x_i + \alpha h_i) \Big|_{\alpha = \alpha_i} = \langle \nabla f(x_i + \alpha_i h_i), h_i \rangle = 0$

    so consecutive search directions are orthogonal. Therefore the method of steepest descent looks like this:

  • Slide 40/53

    Steepest Descent

  • Slide 41/53

    Steepest Descent

    Steepest descent finds critical points and local minima.

    Implicit step-size rule: we actually reduced the problem to finding the minimum of a one-dimensional function

    $\varphi(\alpha) = f(x_i + \alpha h_i), \qquad \varphi: \mathbb{R} \to \mathbb{R}$

    There are extensions that give the step-size rule in a discrete sense (Armijo).
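    Since the slides only name Armijo's rule, here is a minimal illustrative Python sketch of a common backtracking form of it; the constants and the name armijo_step are conventional choices, not taken from the slides:

    import numpy as np

    def armijo_step(f, grad, x, h, alpha0=1.0, c=1e-4, rho=0.5, max_backtracks=50):
        # Backtracking (Armijo) rule: shrink alpha until a sufficient-decrease test holds.
        fx = f(x)
        slope = float(np.dot(grad(x), h))   # directional derivative <grad f(x), h>; negative for a descent h
        alpha = alpha0
        for _ in range(max_backtracks):
            if f(x + alpha * h) <= fx + c * alpha * slope:   # sufficient decrease
                return alpha
            alpha *= rho                                     # otherwise shrink the step
        return alpha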

  • Slide 42/53

    Steepest Descent

    Back to our connectivity shapes: the authors solve the one-dimensional problem

    $\alpha_i = \arg\min_{\alpha \geq 0} f(x_i + \alpha h_i)$

    analytically. They change the spring energy to

    $E_s(x) = \sum_{(i,j) \in E} \left( \| x_i - x_j \|^2 - 1 \right)^2$

    and get a quartic polynomial in $\alpha$.
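    A sketch of why this gives a closed-form step size (the restriction to the search line is mine, for illustration): along $x + \alpha h$,

    $\varphi(\alpha) = E_s(x + \alpha h) = \sum_{(i,j) \in E} \left( \| (x_i - x_j) + \alpha (h_i - h_j) \|^2 - 1 \right)^2$

    Each summand is the square of a quadratic in $\alpha$, so $\varphi$ is a quartic polynomial in $\alpha$; its derivative $\varphi'$ is a cubic whose roots are available in closed form. ($E_r$ is quadratic in $x$, so its restriction to the line only adds a quadratic term and the total stays quartic.)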

  • Slide 43/53

    Preview

    Background
    Steepest Descent
    Conjugate Gradient

  • Slide 44/53

    Conjugate Gradient

    From now on we assume we want to minimize the quadratic function

    $f(x) = \tfrac{1}{2} x^T A x - b^T x + c$

    (assuming A is symmetric positive definite). This is equivalent to solving the linear system

    $0 = \nabla f(x) = A x - b$

    There are generalizations to general functions.
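    A one-line check of the equivalence, using $\nabla (x^T A x) = (A + A^T) x$:

    $\nabla f(x) = \tfrac{1}{2} (A + A^T) x - b = A x - b \qquad \text{(for symmetric } A\text{)}$

    so $\nabla f(x) = 0$ exactly when $A x = b$; positive definiteness of A then makes this stationary point the unique minimizer.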

  • Slide 45/53

    Conjugate Gradient

    What is the problem with steepest descent?

    We can repeat the same directions over and over.

    Conjugate gradient takes at most n steps.

  • Slide 46/53

    Conjugate Gradient

    Notation: $\tilde{x}$ is the solution, $A \tilde{x} = b$, and $e_i = x_i - \tilde{x}$ is the error at step i.

    The search directions $d_0, d_1, \ldots, d_j, \ldots$ should span $\mathbb{R}^n$, and we update

    $x_{i+1} = x_i + \alpha_i d_i$

    Note that

    $\nabla f(x) = A x - b = A x - A \tilde{x} = A (x - \tilde{x}), \qquad \text{so} \quad \nabla f(x_i) = A e_i$

  • Slide 47/53

    Conjugate Gradient

    Given $d_j$, how do we calculate $\alpha_j$? (As before: the exact line search makes the next gradient orthogonal to the current direction.)

    $d_i^T \nabla f(x_{i+1}) = 0$

    $d_i^T A e_{i+1} = d_i^T A (e_i + \alpha_i d_i) = 0$

    $\alpha_i = -\frac{d_i^T A e_i}{d_i^T A d_i} = -\frac{d_i^T \nabla f(x_i)}{d_i^T A d_i}$

  • Slide 48/53

    Conjugate Gradient

    How do we find the $d_j$? We want the error to be 0 after n steps.

    Since the directions span $\mathbb{R}^n$, we can write the initial error as

    $e_0 = -\sum_{i=0}^{n-1} \alpha_i d_i$

    for some coefficients $\alpha_i$, while the algorithm gives

    $e_j = e_0 + \alpha_0 d_0 + \alpha_1 d_1 + \cdots + \alpha_{j-1} d_{j-1} = e_0 + \sum_{i=0}^{j-1} \alpha_i d_i$

  • Slide 49/53

    Conjugate Gradient

    Here is the idea: if the step sizes produced by the algorithm are exactly those coefficients $\alpha_i$, then

    $e_j = -\sum_{i=0}^{n-1} \alpha_i d_i + \sum_{i=0}^{j-1} \alpha_i d_i = -\sum_{i=j}^{n-1} \alpha_i d_i$

    So if $j = n$, then $e_n = 0$.

  • Slide 50/53

    Conjugate Gradient

    So we look for directions $d_j$ for which the line-search step sizes coincide with those coefficients.

    A simple calculation shows that this holds if we take the directions to be A-conjugate (A-orthogonal):

    $d_j^T A d_i = 0 \qquad (i \neq j)$

  • Slide 51/53

    Conjugate Gradient

    We have to find an A-conjugate basis $d_j, \; j = 0, \ldots, n-1$.

    We can run a Gram-Schmidt process on some series of vectors $u_1, u_2, \ldots, u_n$:

    $d_i = u_i + \sum_{k=0}^{i-1} \beta_{ik} \, d_k$

    but we should be careful, since in general this is an $O(n^3)$ process.
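    The coefficients follow from imposing A-conjugacy of $d_i$ with each earlier direction (a standard step, left implicit on the slide): for $j < i$,

    $0 = d_j^T A d_i = d_j^T A u_i + \sum_{k=0}^{i-1} \beta_{ik} \, d_j^T A d_k = d_j^T A u_i + \beta_{ij} \, d_j^T A d_j \;\Longrightarrow\; \beta_{ij} = -\frac{d_j^T A u_i}{d_j^T A d_j}$

    using that the previously constructed $d_k$ are already mutually A-conjugate, so only the $k = j$ term survives.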

  • Slide 52/53

    Conjugate Gradient

    So for an arbitrary choice of $u_i$ we gain nothing.

    Luckily, we can choose the $u_i$ so that the conjugate-direction calculation is O(m), where m is the number of non-zero entries in A.

    The correct choice of $u_i$ is:

    $u_i = -\nabla f(x_i)$

  • Slide 53/53

    Conjugate Gradient

    So the conjugate gradient algorithm for minimizing f, with $r_i := -\nabla f(x_i)$:

    Data: $x_0 \in \mathbb{R}^n$
    Step 0: $d_0 := r_0 = -\nabla f(x_0)$
    Step 1: $\alpha_i = \dfrac{r_i^T r_i}{d_i^T A d_i}$
    Step 2: $x_{i+1} = x_i + \alpha_i d_i$
    Step 3: $\beta_{i+1} = \dfrac{r_{i+1}^T r_{i+1}}{r_i^T r_i}$
    Step 4: $d_{i+1} = r_{i+1} + \beta_{i+1} d_i$, and repeat n times.
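    To tie the steps together, a minimal illustrative NumPy sketch for the quadratic case $f(x) = \tfrac{1}{2} x^T A x - b^T x$, where $r_i = -\nabla f(x_i) = b - A x_i$; the function name and the small test system are mine, not from the slides:

    import numpy as np

    def conjugate_gradient(A, b, x0=None, tol=1e-10):
        # CG for A x = b with A symmetric positive definite (equivalently, minimizing f).
        n = b.shape[0]
        x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
        r = b - A @ x                    # r_0 = -grad f(x_0)
        d = r.copy()                     # Step 0: d_0 := r_0
        for _ in range(n):               # at most n steps in exact arithmetic
            rr = r @ r
            if rr <= tol:
                break
            alpha = rr / (d @ (A @ d))   # Step 1
            x = x + alpha * d            # Step 2
            r = r - alpha * (A @ d)      # r_{i+1} = b - A x_{i+1}, updated cheaply
            beta = (r @ r) / rr          # Step 3
            d = r + beta * d             # Step 4
        return x

    # Tiny example: a 3x3 symmetric positive definite system.
    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    b = np.array([1.0, 2.0, 3.0])
    print(conjugate_gradient(A, b))      # agrees with np.linalg.solve(A, b)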