Lecture 10: descent methods



• Generic descent algorithm

• Generalization to multiple dimensions

• Problems of descent methods, possible improvements

• Fixes

• Local minima

Gradient descent (reminder)

[Figure: a 1D function f(x) with its minimum f(m) at x = m and an initial guess marked on the curve]

The minimum of a function is found by following the slope of the function.

Gradient descent (illustration)

[Figure, shown over four frames: starting from the guess, each iteration takes a step downhill, a new gradient is computed at the new point, and the iterations stop once they are close to the minimum m]

Gradient descent: algorithm

Start with a point (guess): guess = x

Repeat

Determine a descent direction (downhill): direction = −f’(x)

Choose a step: step = h > 0

Update: x := x − h·f’(x)

Until stopping criterion is satisfied: f’(x) ≈ 0 (stop when “close” to the minimum)
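A minimal sketch of this loop in Python (the test function f(x) = (x − 3)², the step h, and the tolerance are illustrative choices, not part of the slides):

```python
def gradient_descent_1d(f_prime, x, h=0.1, tol=1e-6, max_iter=1000):
    """Follow the negative derivative of f until it is ~0."""
    for _ in range(max_iter):
        g = f_prime(x)
        if abs(g) < tol:       # stopping criterion: f'(x) ~ 0
            break
        x = x - h * g          # update: x := x - h * f'(x)
    return x

# Illustrative test: f(x) = (x - 3)**2, so f'(x) = 2*(x - 3), minimum at x = 3.
print(gradient_descent_1d(lambda x: 2 * (x - 3), x=0.0))  # ~3.0
```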


Example of 2D gradient: MATLAB demo

[Figure: illustration of the gradient in 2D from the MATLAB demo]

Definition of the gradient in 2D:

∇f(x, y) = (∂f/∂x, ∂f/∂y)

This is just a generalization of the derivative to two dimensions, and it can be generalized to any dimension. Gradient descent works in 2D just as it does in 1D.


Generalization to multiple dimensions

Start with a point (guess): guess = x

Repeat

Determine a descent direction (downhill): direction = −∇f(x)

Choose a step: step = h > 0

Update: x := x − h·∇f(x)

Until stopping criterion is satisfied: ∇f(x) ≈ 0 (stop when “close” to the minimum)
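The same loop in N dimensions, with f’(x) replaced by the gradient ∇f(x); a NumPy sketch under the same illustrative assumptions (the test function and parameters are not from the slides):

```python
import numpy as np

def gradient_descent(grad, x, h=0.1, tol=1e-6, max_iter=1000):
    """Follow -grad(x) until the gradient norm is ~0."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stopping criterion: ||grad f(x)|| ~ 0
            break
        x = x - h * g                 # update: x := x - h * grad f(x)
    return x

# Illustrative test: f(x, y) = x**2 + y**2, gradient (2x, 2y), minimum at the origin.
print(gradient_descent(lambda v: 2 * v, x=[4.0, -2.0]))  # ~[0. 0.]
```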


Multiple dimensions

Everything that you have seen with derivatives can be generalized with the gradient. For the descent method, f’(x) can be replaced by

∇f(x, y) = (∂f/∂x, ∂f/∂y)

in two dimensions, and by

∇f(x) = (∂f/∂x1, …, ∂f/∂xN)

in N dimensions.

Example of 2D gradient: MATLAB demo

The cost to buy a portfolio is a function of the holding in each stock (Stock 1, Stock 2, …, Stock i, …, Stock N). If you want to minimize the price to buy your portfolio, you need to compute the gradient of its price: one partial derivative per stock.
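The slides do not give the cost formula, so here is a hedged sketch: the gradient of a hypothetical cost can be estimated one stock at a time by finite differences (the quadratic cost model below is invented purely for illustration):

```python
import numpy as np

def numerical_gradient(cost, x, eps=1e-6):
    """Finite-difference gradient estimate: one partial derivative per stock."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (cost(x + e) - cost(x - e)) / (2 * eps)
    return g

# Hypothetical cost model (invented for illustration): linear prices plus a quadratic term.
prices = np.array([10.0, 25.0, 40.0])
cost = lambda x: prices @ x + 0.5 * np.sum(x**2)
print(numerical_gradient(cost, np.zeros(3)))  # ~[10. 25. 40.]
```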


Problem 1: choice of the step

When updating the current computation:

- small steps: inefficient; too many steps are needed and it takes too long to converge

- large steps: potentially bad results; the next point can go too far and overshoot the minimum

[Figures: with small steps, many iterations pass before stopping near m; with a large step, the next point lands past the minimum]
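Both failure modes can be reproduced on f(x) = x², where the update is x := (1 − 2h)·x; the step sizes below are illustrative:

```python
def iterate(h, x=1.0, n=10):
    """x := x - h * f'(x) for f(x) = x**2, i.e. f'(x) = 2x."""
    for _ in range(n):
        x = x - h * 2 * x
    return x

print(iterate(h=0.01))  # small step: ~0.82 after 10 iterations, still far from 0
print(iterate(h=1.1))   # large step: ~6.2 and growing, the iterates diverge
```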


Problem 2: the “ping pong” effect

[S. Boyd, L. Vandenberghe, Convex Optimization lecture notes, Stanford Univ., 2004]
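The zig-zag comes from minimizing an ill-conditioned function; a sketch of the effect on the quadratic f(x) = ½·xᵀAx with exact line search (the matrix and starting point are illustrative, not Boyd and Vandenberghe's exact example):

```python
import numpy as np

A = np.diag([1.0, 10.0])           # ill-conditioned quadratic f(x) = 0.5 * x.T @ A @ x
x = np.array([10.0, 1.0])

for i in range(5):
    g = A @ x                       # gradient of the quadratic
    t = (g @ g) / (g @ (A @ g))     # exact line search step along -g
    x = x - t * g
    print(i, x)                     # the second coordinate flips sign at each step
```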


Problem 2 (continued): other norm-dependent issues

[S. Boyd, L. Vandenberghe, Convex Optimization lecture notes, Stanford Univ., 2004]

Problem 3: stopping criterion

Intuitive criterion: stop when f’(x) = 0.

In multiple dimensions: ∇f(x) = 0, or equivalently ‖∇f(x)‖ = 0.

Rarely used in practice: an exact zero is never reached, so one stops instead when ‖∇f(x)‖ is below a small tolerance.

More about this in EE227A (convex optimization, Prof. L. El Ghaoui).


Fixes

Several methods exist to address these problems:

- line search methods, in particular:

  - backtracking line search (a sketch follows this list)

  - exact line search

- normalized steepest descent

- Newton steps
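A sketch of backtracking line search, assuming the usual Armijo condition with parameters alpha and beta (the test function is illustrative):

```python
import numpy as np

def backtracking_step(f, grad, x, alpha=0.3, beta=0.8):
    """Shrink the step until f decreases enough along -grad(x) (Armijo condition)."""
    g = grad(x)
    h = 1.0
    while f(x - h * g) > f(x) - alpha * h * (g @ g):
        h *= beta                   # step too large: backtrack
    return x - h * g

# Illustrative test: f(x) = ||x||^2.
f = lambda v: v @ v
grad = lambda v: 2 * v
x = np.array([4.0, -2.0])
for _ in range(20):
    x = backtracking_step(f, grad, x)
print(x)  # ~[0. 0.]
```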

Fundamental problem of the method: local minima

Local minima: MATLAB demo

[Figure: MATLAB demo of a function with several minima]

The iterations of the algorithm converge to a local minimum. The view of the algorithm is “myopic”: it only sees the slope at the current point, not the whole function.
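A sketch of this “myopic” behaviour: the same descent loop started from two different guesses converges to two different minima (the function and starting points are illustrative):

```python
def descend(f_prime, x, h=0.01, n=5000):
    """Plain gradient descent with a fixed step and iteration budget."""
    for _ in range(n):
        x = x - h * f_prime(x)
    return x

# Illustrative function with two minima: f(x) = x**4 - 4*x**2 + x,
# local minimum near x ~ +1.35, global minimum near x ~ -1.47.
f_prime = lambda x: 4 * x**3 - 8 * x + 1
print(descend(f_prime, x=+2.0))  # stuck in the local minimum near +1.35
print(descend(f_prime, x=-2.0))  # reaches the global minimum near -1.47
```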
