Gradient Methods


  • Slide 1/53

    Gradient Methods

    May 2005

  • Slide 2/53

    Preview

    Background
    Steepest Descent
    Conjugate Gradient

  • Slide 3/53

    Preview

    Background
    Steepest Descent
    Conjugate Gradient

  • Slide 4/53

    Background

    Motivation
    The gradient notion
    The Wolfe Theorems

  • Slide 5/53

    Motivation

    The min(max) problem:

    $\min_{x} f(x)$

    But we learned in calculus how to solve that kind of question!

  • Slide 6/53

    Motivation

    Not exactly.

    Functions: $f: \mathbb{R}^n \to \mathbb{R}$

    High-order polynomials:

    $x - \frac{1}{6} x^3 + \frac{1}{120} x^5 - \frac{1}{5040} x^7$

    What about functions that don't have an analytic expression: a Black Box?

  • Slide 7/53

    Motivation - real world problem

    Connectivity shapes (Isenburg, Gumhold, Gotsman)

    $\text{mesh} = \{ C = (V, E), \; \text{geometry} \}$

    What do we get from C alone, without the geometry?

  • Slide 8/53

    Motivation - real world problem

    First we introduce error functionals and then try to minimize them:

    $E_s(x) = \sum_{(i,j) \in E} \left( \| x_i - x_j \| - 1 \right)^2, \qquad x \in \mathbb{R}^{3n}$

    $L(x_i) = \frac{1}{d_i} \sum_{j : (i,j) \in E} x_j \; - \; x_i$

    $E_r(x) = \sum_{i=1}^{n} \| L(x_i) \|^2$

  • Slide 9/53

    Motivation - real world problem

    Then we minimize:

    $E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{3n}} \left[ (1 - \lambda) E_s(x) + \lambda E_r(x) \right]$

    This is a high-dimensional, non-linear problem.

    The authors use the conjugate gradient method, which is maybe the most popular optimization technique based on what we'll see here.

  • Slide 10/53

    Motivation - real world problem

    Changing the parameter $\lambda$:

    $E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{3n}} \left[ (1 - \lambda) E_s(x) + \lambda E_r(x) \right]$

  • Slide 11/53

  • Slide 12/53

    Background

    Motivation
    The gradient notion
    The Wolfe Theorems

  • Slide 13/53

    $f(x, y) := \cos\left( \tfrac{1}{2} x \right) \cos\left( \tfrac{1}{2} y \right) x$

  • Slide 14/53

    Directional Derivatives:

    First, the one-dimensional derivative:

  • Slide 15/53

    Directional Derivatives: Along the Axes

    $\frac{\partial f(x, y)}{\partial x}, \qquad \frac{\partial f(x, y)}{\partial y}$

  • Slide 16/53

    Directional Derivatives: In a General Direction

    $\frac{\partial f(x, y)}{\partial v}, \qquad v \in \mathbb{R}^2, \quad \| v \| = 1$

  • Slide 17/53

    Directional Derivatives

    $\frac{\partial f(x, y)}{\partial x}, \qquad \frac{\partial f(x, y)}{\partial y}$

  • Slide 18/53

    The Gradient: Definition in $\mathbb{R}^2$ (the plane)

    $f: \mathbb{R}^2 \to \mathbb{R}$

    $\nabla f(x, y) := \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$

  • Slide 19/53

    The Gradient: Definition

    $f: \mathbb{R}^n \to \mathbb{R}$

    $\nabla f(x_1, \ldots, x_n) := \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)$
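    For a black-box $f$ (as in the motivation), the gradient can be approximated numerically. A minimal Python sketch using central differences; the function name and the test function are illustrative only, not from the slides:

    import numpy as np

    def numerical_gradient(f, x, eps=1e-6):
        # Approximate the gradient of f: R^n -> R at x, one coordinate at a time.
        x = np.asarray(x, dtype=float)
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = eps
            g[i] = (f(x + e) - f(x - e)) / (2 * eps)   # central difference for df/dx_i
        return g

    # Example: f(x, y) = x^2 + 3y^2 has gradient (2x, 6y).
    f = lambda p: p[0] ** 2 + 3 * p[1] ** 2
    print(numerical_gradient(f, [1.0, 2.0]))   # approximately [2., 12.]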

  • Slide 20/53

    The Gradient Properties

    The gradient defines the (hyper)plane approximating the function infinitesimally:

    $\Delta z = \frac{\partial f}{\partial x} \, \Delta x + \frac{\partial f}{\partial y} \, \Delta y$

  • Slide 21/53

    The Gradient Properties

    By the chain rule (important for later use):

    $\frac{\partial f}{\partial v}(p) = \langle \nabla f_p, v \rangle, \qquad \| v \| = 1$

  • Slide 22/53

    The Gradient Properties

    Proposition 1:

    $\frac{\partial f}{\partial v}(p)$ is maximal when choosing $v = \frac{\nabla f_p}{\| \nabla f_p \|}$

    $\frac{\partial f}{\partial v}(p)$ is minimal when choosing $v = -\frac{\nabla f_p}{\| \nabla f_p \|}$

    (Intuitively: the gradient points in the direction of greatest change.)

  • Slide 23/53

    The Gradient Properties

    Proof (only for the minimum case):

    Assign $v = -\frac{\nabla f_p}{\| \nabla f_p \|}$. By the chain rule:

    $\frac{\partial f}{\partial v}(p) = \left\langle \nabla f_p, -\frac{\nabla f_p}{\| \nabla f_p \|} \right\rangle = -\frac{\langle \nabla f_p, \nabla f_p \rangle}{\| \nabla f_p \|} = -\| \nabla f_p \|$

  • Slide 24/53

    The Gradient Properties

    On the other hand, for a general unit vector v (by Cauchy-Schwarz):

    $\frac{\partial f}{\partial v}(p) = \langle \nabla f_p, v \rangle \geq -\| \nabla f_p \| \, \| v \| = -\| \nabla f_p \|$

    so the choice above indeed attains the minimum.

  • Slide 25/53

    The Gradient Properties

    Proposition 2: let $f: \mathbb{R}^n \to \mathbb{R}$ be a $C^1$ smooth function around p.
    If f has a local minimum (maximum) at p, then

    $\nabla f_p = 0$

    (Intuitively: a necessary condition for a local min (max).)

  • Slide 26/53

    The Gradient Properties

    Proof:

    Intuitive:

  • Slide 27/53

    The Gradient Properties

    Formally: for any $v \in \mathbb{R}^n \setminus \{0\}$ we get

    $0 = \left. \frac{d \, f(p + t v)}{dt} \right|_{t=0} = \langle \nabla f_p, v \rangle$

    Since this holds for every such v, it follows that $\nabla f_p = 0$.

  • Slide 28/53

    The Gradient Properties

    We found the best INFINITESIMAL DIRECTION at each point.

    Looking for a minimum: the blind man procedure.

    How can we derive the way to the minimum using this knowledge?

  • Slide 29/53

    Background

    Motivation
    The gradient notion
    The Wolfe Theorems

  • Slide 30/53

    The Wolfe Theorem

    This is the link from the previous gradient properties to the constructive algorithm.

    The problem:

    $\min_{x} f(x)$

  • Slide 31/53

    The Wolfe Theorem

    We introduce a model algorithm:

    Data: $x_0 \in \mathbb{R}^n$
    Step 0: set $i = 0$
    Step 1: if $\nabla f(x_i) = 0$ stop; else, compute a search direction $h_i \in \mathbb{R}^n$
    Step 2: compute the step size $\alpha_i = \arg\min_{\alpha \geq 0} f(x_i + \alpha h_i)$
    Step 3: set $x_{i+1} = x_i + \alpha_i h_i$ and go to Step 1
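    As a rough illustration only (not from the slides): a minimal Python sketch of this model algorithm, with the search-direction rule left as a plug-in and SciPy's bounded scalar minimizer standing in for the exact step-size problem; the names model_descent and direction are mine.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def model_descent(f, grad, direction, x0, tol=1e-8, max_iter=1000):
        # Model algorithm: choose a search direction h_i, then an (approximately exact) step size.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) <= tol:       # Step 1: stop near a critical point
                break
            h = direction(x, g)                # Step 1: search direction h_i
            # Step 2: 1-D problem min_{alpha >= 0} f(x + alpha*h); the upper bound 1e3 is arbitrary
            alpha = minimize_scalar(lambda t: f(x + t * h),
                                    bounds=(0.0, 1e3), method="bounded").x
            x = x + alpha * h                  # Step 3: update and repeat
        return x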

  • Slide 32/53

    The Wolfe Theorem

    The Theorem: suppose $f: \mathbb{R}^n \to \mathbb{R}$ is $C^1$ smooth, and there exists a continuous function

    $k: \mathbb{R}^n \to [0, 1]$

    with

    $\forall x: \; \nabla f(x) \neq 0 \;\Rightarrow\; k(x) > 0$

    and the search vectors constructed by the model algorithm satisfy:

    $\langle \nabla f(x_i), h_i \rangle \leq -k(x_i) \, \| \nabla f(x_i) \| \, \| h_i \|$

  • Slide 33/53

    The Wolfe Theorem

    And:

    $\nabla f(x_i) \neq 0 \;\Rightarrow\; h_i \neq 0$

    Then if $\{x_i\}_{i=0}^{\infty}$ is the sequence constructed by the algorithm model, any accumulation point y of this sequence satisfies:

    $\nabla f(y) = 0$

  • Slide 34/53

    The Wolfe Theorem

    The theorem has a very intuitive interpretation: always go in a descent direction,

    $\langle \nabla f(x_i), h_i \rangle < 0$

  • Slide 35/53

    Preview

    Background
    Steepest Descent
    Conjugate Gradient

  • Slide 36/53

    Steepest Descent

    What does it mean?

    We now use what we have learned to implement the most basic minimization technique.

    First we introduce the algorithm, which is a version of the model algorithm.

    The problem:

    $\min_{x} f(x)$

  • Slide 37/53

    Steepest Descent

    The steepest descent algorithm:

    Data: $x_0 \in \mathbb{R}^n$
    Step 0: set $i = 0$
    Step 1: if $\nabla f(x_i) = 0$ stop; else, compute the search direction $h_i = -\nabla f(x_i)$
    Step 2: compute the step size $\alpha_i = \arg\min_{\alpha \geq 0} f(x_i + \alpha h_i)$
    Step 3: set $x_{i+1} = x_i + \alpha_i h_i$ and go to Step 1
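    Steepest descent is the model sketch above with the direction fixed to $h_i = -\nabla f(x_i)$. Reusing the hypothetical model_descent helper (and the numpy import) from that sketch:

    # f(x, y) = 0.5*x^2 + 2*y^2 has its unique minimum at the origin.
    f = lambda p: 0.5 * p[0] ** 2 + 2.0 * p[1] ** 2
    grad = lambda p: np.array([p[0], 4.0 * p[1]])

    # Steepest descent = model algorithm with h_i = -grad f(x_i).
    x_min = model_descent(f, grad, direction=lambda x, g: -g, x0=[3.0, 1.0])
    print(x_min)   # approaches [0, 0], zig-zagging as described on the next slides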

  • Slide 38/53

    Steepest Descent

    Theorem: if $\{x_i\}_{i=0}^{\infty}$ is a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfies:

    $\nabla f(y) = 0$

    Proof: from the Wolfe theorem.

    Remark: the Wolfe theorem gives us numerical stability if the derivatives aren't given analytically (they are calculated numerically).

  • Slide 39/53

    Steepest Descent

    From the chain rule, the exact step size satisfies

    $\frac{d}{d\alpha} f(x_i + \alpha h_i) \Big|_{\alpha = \alpha_i} = \langle \nabla f(x_i + \alpha_i h_i), h_i \rangle = 0$

    so consecutive search directions are orthogonal. Therefore the method of steepest descent looks like this:

  • Slide 40/53

    Steepest Descent

  • Slide 41/53

    Steepest Descent

    Steepest descent finds critical points and local minima.

    Implicit step-size rule: we actually reduced the problem to finding the minimum of a one-dimensional function

    $\varphi(\alpha) = f(x_i + \alpha h_i), \qquad \varphi: \mathbb{R} \to \mathbb{R}$

    There are extensions that give the step-size rule in a discrete sense (Armijo).
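    Since the slides only name Armijo's rule, here is a minimal illustrative Python sketch of a common backtracking form of it; the constants and the name armijo_step are conventional choices, not taken from the slides:

    import numpy as np

    def armijo_step(f, grad, x, h, alpha0=1.0, c=1e-4, rho=0.5, max_backtracks=50):
        # Backtracking (Armijo) rule: shrink alpha until a sufficient-decrease test holds.
        fx = f(x)
        slope = float(np.dot(grad(x), h))   # directional derivative <grad f(x), h>; negative for a descent h
        alpha = alpha0
        for _ in range(max_backtracks):
            if f(x + alpha * h) <= fx + c * alpha * slope:   # sufficient decrease
                return alpha
            alpha *= rho                                     # otherwise shrink the step
        return alpha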

  • Slide 42/53

    Steepest Descent

    Back to our connectivity shapes: the authors solve the one-dimensional problem

    $\alpha_i = \arg\min_{\alpha \geq 0} f(x_i + \alpha h_i)$

    analytically. They change the spring energy to

    $E_s(x) = \sum_{(i,j) \in E} \left( \| x_i - x_j \|^2 - 1 \right)^2$

    and get a quartic polynomial in $\alpha$.
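    A sketch of why this gives a closed-form step size (the restriction to the search line is mine, for illustration): along $x + \alpha h$,

    $\varphi(\alpha) = E_s(x + \alpha h) = \sum_{(i,j) \in E} \left( \| (x_i - x_j) + \alpha (h_i - h_j) \|^2 - 1 \right)^2$

    Each summand is the square of a quadratic in $\alpha$, so $\varphi$ is a quartic polynomial in $\alpha$; its derivative $\varphi'$ is a cubic whose roots are available in closed form. ($E_r$ is quadratic in $x$, so its restriction to the line only adds a quadratic term and the total stays quartic.)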

  • Slide 43/53

    Preview

    Background
    Steepest Descent
    Conjugate Gradient

  • Slide 44/53

    Conjugate Gradient

    From now on we assume we want to minimize the quadratic function

    $f(x) = \tfrac{1}{2} x^T A x - b^T x + c$

    (assuming A is symmetric positive definite). This is equivalent to solving the linear system

    $0 = \nabla f(x) = A x - b$

    There are generalizations to general functions.
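    A one-line check of the equivalence, using $\nabla (x^T A x) = (A + A^T) x$:

    $\nabla f(x) = \tfrac{1}{2} (A + A^T) x - b = A x - b \qquad \text{(for symmetric } A\text{)}$

    so $\nabla f(x) = 0$ exactly when $A x = b$; positive definiteness of A then makes this stationary point the unique minimizer.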

  • Slide 45/53

    Conjugate Gradient

    What is the problem with steepest descent?

    We can repeat the same directions over and over.

    Conjugate gradient takes at most n steps.

  • Slide 46/53

    Conjugate Gradient

    Notation: $\tilde{x}$ is the solution, $A \tilde{x} = b$, and $e_i = x_i - \tilde{x}$ is the error at step i.

    The search directions $d_0, d_1, \ldots, d_j, \ldots$ should span $\mathbb{R}^n$, and we update

    $x_{i+1} = x_i + \alpha_i d_i$

    Note that

    $\nabla f(x) = A x - b = A x - A \tilde{x} = A (x - \tilde{x}), \qquad \text{so} \quad \nabla f(x_i) = A e_i$

  • Slide 47/53

    Conjugate Gradient

    Given $d_j$, how do we calculate $\alpha_j$? (As before: the exact line search makes the next gradient orthogonal to the current direction.)

    $d_i^T \nabla f(x_{i+1}) = 0$

    $d_i^T A e_{i+1} = d_i^T A (e_i + \alpha_i d_i) = 0$

    $\alpha_i = -\frac{d_i^T A e_i}{d_i^T A d_i} = -\frac{d_i^T \nabla f(x_i)}{d_i^T A d_i}$

  • Slide 48/53

    Conjugate Gradient

    How do we find the $d_j$? We want the error to be 0 after n steps.

    Since the directions span $\mathbb{R}^n$, we can write the initial error as

    $e_0 = -\sum_{i=0}^{n-1} \alpha_i d_i$

    for some coefficients $\alpha_i$, while the algorithm gives

    $e_j = e_0 + \alpha_0 d_0 + \alpha_1 d_1 + \cdots + \alpha_{j-1} d_{j-1} = e_0 + \sum_{i=0}^{j-1} \alpha_i d_i$

  • Slide 49/53

    Conjugate Gradient

    Here is the idea: if the step sizes produced by the algorithm are exactly those coefficients $\alpha_i$, then

    $e_j = -\sum_{i=0}^{n-1} \alpha_i d_i + \sum_{i=0}^{j-1} \alpha_i d_i = -\sum_{i=j}^{n-1} \alpha_i d_i$

    So if $j = n$, then $e_n = 0$.

  • Slide 50/53

    Conjugate Gradient

    So we look for directions $d_j$ for which the line-search step sizes coincide with those coefficients.

    A simple calculation shows that this holds if we take the directions to be A-conjugate (A-orthogonal):

    $d_j^T A d_i = 0 \qquad (i \neq j)$

  • Slide 51/53

    Conjugate Gradient

    We have to find an A-conjugate basis $d_j, \; j = 0, \ldots, n-1$.

    We can run a Gram-Schmidt process on some series of vectors $u_1, u_2, \ldots, u_n$:

    $d_i = u_i + \sum_{k=0}^{i-1} \beta_{ik} \, d_k$

    but we should be careful, since in general this is an $O(n^3)$ process.
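    The coefficients follow from imposing A-conjugacy of $d_i$ with each earlier direction (a standard step, left implicit on the slide): for $j < i$,

    $0 = d_j^T A d_i = d_j^T A u_i + \sum_{k=0}^{i-1} \beta_{ik} \, d_j^T A d_k = d_j^T A u_i + \beta_{ij} \, d_j^T A d_j \;\Longrightarrow\; \beta_{ij} = -\frac{d_j^T A u_i}{d_j^T A d_j}$

    using that the previously constructed $d_k$ are already mutually A-conjugate, so only the $k = j$ term survives.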

  • Slide 52/53

    Conjugate Gradient

    So for an arbitrary choice of $u_i$ we gain nothing.

    Luckily, we can choose the $u_i$ so that the conjugate-direction calculation is O(m), where m is the number of non-zero entries in A.

    The correct choice of $u_i$ is:

    $u_i = -\nabla f(x_i)$

  • Slide 53/53

    Conjugate Gradient

    So the conjugate gradient algorithm for minimizing f, with $r_i := -\nabla f(x_i)$:

    Data: $x_0 \in \mathbb{R}^n$
    Step 0: $d_0 := r_0 = -\nabla f(x_0)$
    Step 1: $\alpha_i = \dfrac{r_i^T r_i}{d_i^T A d_i}$
    Step 2: $x_{i+1} = x_i + \alpha_i d_i$
    Step 3: $\beta_{i+1} = \dfrac{r_{i+1}^T r_{i+1}}{r_i^T r_i}$
    Step 4: $d_{i+1} = r_{i+1} + \beta_{i+1} d_i$, and repeat n times.
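    To tie the steps together, a minimal illustrative NumPy sketch for the quadratic case $f(x) = \tfrac{1}{2} x^T A x - b^T x$, where $r_i = -\nabla f(x_i) = b - A x_i$; the function name and the small test system are mine, not from the slides:

    import numpy as np

    def conjugate_gradient(A, b, x0=None, tol=1e-10):
        # CG for A x = b with A symmetric positive definite (equivalently, minimizing f).
        n = b.shape[0]
        x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
        r = b - A @ x                    # r_0 = -grad f(x_0)
        d = r.copy()                     # Step 0: d_0 := r_0
        for _ in range(n):               # at most n steps in exact arithmetic
            rr = r @ r
            if rr <= tol:
                break
            alpha = rr / (d @ (A @ d))   # Step 1
            x = x + alpha * d            # Step 2
            r = r - alpha * (A @ d)      # r_{i+1} = b - A x_{i+1}, updated cheaply
            beta = (r @ r) / rr          # Step 3
            d = r + beta * d             # Step 4
        return x

    # Tiny example: a 3x3 symmetric positive definite system.
    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    b = np.array([1.0, 2.0, 3.0])
    print(conjugate_gradient(A, b))      # agrees with np.linalg.solve(A, b)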