Optimization in R: algorithms, sequencing, and automatic differentiation

James Thorson, Aug. 26, 2011


Page 1: Optimization in R: algorithms, sequencing, and automatic differentiation

Optimization in R: algorithms, sequencing, and automatic differentiation

James Thorson, Aug. 26, 2011

Page 2

Themes

Basic:
• Algorithms
• Settings
• Starting location

Intermediate:
• Sequenced optimization
• Phasing
• Parameterization
• Standard errors

Advanced:
• Derivatives

Page 3

Outline

1. One-dimensional
2. Two-dimensional
3. Using derivatives

Page 4

ONE-DIMENSIONAL

Page 5

Basic: Algorithm

• Characteristics
– Very fast
– Somewhat unstable

• Process
– Starts with 2 points
– Moves in the direction of the higher point
– Then searches between the two highest points

optimize(f =, interval =, ...)
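A minimal worked call (the quadratic below is invented for illustration; note that base R's optimize() names its function argument f, and minimizes by default, with maximum = TRUE to climb instead):

```r
# Minimize a smooth 1-D function over a bracketing interval
f <- function(x) (x - 2)^2 + 1
fit <- optimize(f = f, interval = c(-10, 10))
fit$minimum    # location of the minimum, close to 2
fit$objective  # objective value there, close to 1
```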

Page 6

Basic: Algorithm

Page 7

Basic: Algorithm

Page 8

Intermediate: Sequenced

Sequencing:
1. Use a stable but slow method
2. Then use a fast method for fine-tuning

One-dimensional sequencing:
1. Grid search
2. Then use optimize()
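A sketch of that two-step recipe (the bumpy objective and grid spacing are invented for illustration):

```r
# A bumpy objective with several local minima
f <- function(x) sin(5 * x) + 0.1 * (x - 3)^2

# Step 1: stable but slow -- evaluate on a coarse grid
grid <- seq(0, 6, by = 0.05)
start <- grid[which.min(sapply(grid, f))]

# Step 2: fast fine-tuning -- optimize() in a small bracket around the grid winner
fit <- optimize(f, interval = c(start - 0.1, start + 0.1))
fit$minimum
```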

Page 9

Intermediate: Sequenced

Page 10

Basic: Algorithms

Other one-dimensional functions:
• uniroot() – finds where f(·) = 0
• polyroot() – finds all solutions to f(·) = 0 (for polynomial f)
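For example (illustrative function; uniroot() needs an interval whose endpoints bracket a sign change):

```r
# uniroot(): one root of f(x) = 0 inside a sign-changing interval
root <- uniroot(function(x) x^2 - 4, interval = c(0, 10))
root$root  # close to 2

# polyroot(): all roots of a polynomial, coefficients in increasing order;
# here x^2 - 4 = -4 + 0*x + 1*x^2, with roots -2 and 2
zs <- polyroot(c(-4, 0, 1))
Re(zs)
```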

Page 11

TWO-DIMENSIONAL

Page 12

Basic: Settings

• trace = 1
– Means different things for different optimization routines
– In general, prints progress output during optimization
– Useful for diagnostics

optimx(par = , fn = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))

Page 13

Basic: Settings

optimx(par = , fn = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))

Page 14

Basic: Settings

• follow.on = TRUE
– Starts each subsequent method at the previous method's stopping point

• method = c("nlminb", "L-BFGS-B")
– Lists the set and order of methods to use

optimx(par = , fn = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))

See also calcMin() in the "PBSmodelling" package
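The optimx package may not be installed everywhere, but the follow.on idea can be imitated in base R by chaining two optim() calls, feeding the stopping point of the first method to the second (objective and bounds below are invented for illustration):

```r
# Rosenbrock "banana" function, a standard optimization test problem
rosen <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2

# Method 1: stable Nelder-Mead from a rough start
fit1 <- optim(c(-1.2, 1), rosen, method = "Nelder-Mead")

# Method 2: bounded L-BFGS-B "follows on" from where Nelder-Mead stopped
fit2 <- optim(fit1$par, rosen, method = "L-BFGS-B",
              lower = c(-5, -5), upper = c(5, 5))
fit2$par  # near the true optimum c(1, 1)
```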

Page 15

Basic: Settings

Constraints:
• Unbounded
• Bounded
– I recommend using bounds
– Box constraints are common
• Non-box constraints
– Usually implemented inside the objective function
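A minimal sketch of folding a non-box constraint into the objective via a penalty term (the constraint x + y <= 1 and the penalty weight are invented for illustration):

```r
# Minimize (x-1)^2 + (y-1)^2 subject to x + y <= 1,
# with the constraint enforced by a quadratic penalty in the objective
obj <- function(p) {
  fit <- (p[1] - 1)^2 + (p[2] - 1)^2      # unconstrained optimum at (1, 1)
  pen <- 1e6 * max(0, p[1] + p[2] - 1)^2  # large cost when infeasible
  fit + pen
}
ans <- optim(c(0, 0), obj, control = list(maxit = 2000))
ans$par  # constrained optimum is near (0.5, 0.5)
```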

Page 16

Basic: Algorithms

Differences among algorithms:
• Speed vs. accuracy
• Unbounded vs. bounded
• Whether derivatives can be used

Page 17

Basic: Algorithms

Nelder-Mead (a.k.a. "simplex")
• Characteristics
– Bounded (nlminb)
– Unbounded (optimx)
– Cannot use derivatives
– Slow, but good at following valleys
– Easily stuck at local minima

Page 18

Basic: Algorithms

Nelder-Mead (a.k.a. "simplex")
• Process
– Uses a polygon (simplex) with n+1 vertices
– Reflects the worst point across the center of the rest
– If worse: shrink
– If better: accept, and expand along that axis
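The reflection step above can be written out by hand in 2-D (toy objective and simplex vertices invented for illustration):

```r
# One Nelder-Mead-style reflection step in 2-D
f <- function(p) sum(p^2)                        # toy objective, minimum at (0, 0)
simplex <- rbind(c(2, 2), c(1.5, 0), c(0, 1.5))  # n + 1 = 3 vertices
vals <- apply(simplex, 1, f)
worst <- which.max(vals)                         # vertex with the worst value
centroid <- colMeans(simplex[-worst, , drop = FALSE])  # center of the others
reflected <- centroid + (centroid - simplex[worst, ])  # reflect worst across it
f(reflected) < vals[worst]  # TRUE: the reflected point is better, so accept it
```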

Page 19

Basic: Algorithms

[Figure: successive Nelder-Mead simplex steps in two dimensions; X and Y axes run from -1 to 3]

Page 20

Basic: Algorithms

Rosenbrock "banana" function

Page 21

Basic: Algorithms

Quasi-Newton ("BFGS")
• Characteristics
– Unbounded (optim, method = "BFGS")
– Bounded (optim, method = "L-BFGS-B")
– Can use derivatives
– Fast, but less accurate

Page 22

Basic: Algorithms

Quasi-Newton ("BFGS")
• Process
– Approximates the gradient and Hessian
– Uses Newton's method to update the location
– Uses various other methods to update the gradient and Hessian approximations

Page 23

Basic: Algorithms

Page 24

Basic: Algorithms

Quasi-Newton ("ucminf")
• A different variation on quasi-Newton

Page 25

Basic: Algorithms

Page 26

Basic: Algorithms

Conjugate gradient
• Characteristics
– Unbounded (optim, method = "CG")
– Very fast for near-quadratic problems
– Low memory use
– Highly unstable in general
– I don't recommend it for general usage

Page 27

Basic: Algorithms

Conjugate gradient
• Process
– Calculates derivatives numerically
– Successive search directions are "conjugate" (i.e., they form an optimal linear basis for a quadratic problem)
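On an exactly quadratic objective, where conjugate directions are optimal, CG does well (toy objective invented for illustration):

```r
# Conjugate gradient on an exactly quadratic objective
quad <- function(p) sum((p - c(1, 2, 3))^2)  # minimum at (1, 2, 3)
fit <- optim(rep(0, 3), quad, method = "CG")
fit$par  # close to c(1, 2, 3)
```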

Page 28

Basic: Algorithms

Page 29

Basic: Algorithms

Many others! As one example:

Spectral projected gradient (spg)
• Characteristics
– ???
• Process
– ???

Page 30

Basic: Algorithms

Page 31

Basic: Algorithms

Accuracy trials:

Problem  Npar  bobyqa  newuoa  Rvmmin  nlminb  Rcgmin  ucminf  L-BFGS-B  nlm  spg  Nelder-Mead  BFGS  CG
1        50    0       0       1       0       1       0       1         1    1    0            1     1
2        50    0       0       0       1       1       0       1         1    0    0            0     1
3        50    0       0       0       1       1       0       1         1    0    0            0     1
4        2     0       0       0       1       1       1       1         0    0    1            0     0
5        3     0       NA      1       1       0       NA      1         NA   1    NA           NA    NA
6        50    0       0       1       0       1       0       1         1    1    0            1     1
7        50    0       0       1       0       1       0       1         1    1    0            1     1
8        50    0       0       0       1       1       1       1         1    1    0            1     1
9        303   0       0       1       1       1       0       1         1    1    0            1     1
10       5     0       NA      1       1       1       NA      1         NA   1    NA           NA    NA

Page 32

Basic: Starting location

It's important to provide a good starting location!
– Some methods (like nlminb) find the nearest local minimum
– A good start speeds convergence
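The "nearest local minimum" point can be seen directly (the double-well function below is invented for illustration):

```r
# Two starts, two different local minima: nlminb() walks to the nearby one
f <- function(x) x^4 - 4 * x^2 + 0.5 * x  # two local minima, one on each side of 0
left  <- nlminb(start = -2, objective = f)$par
right <- nlminb(start =  2, objective = f)$par
c(left, right)  # one negative, one positive: the answer depends on the start
```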

Page 33

Intermediate: Parameterization

Suggestions:
1. Put all parameters on a similar scale
– Derivatives are then approximately equal
– One method: transform inputs with exp() and plogis()
2. Minimize covariance
3. Minimize changes in scale or covariance
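An illustrative reparameterization in that spirit (model and data invented for illustration): a standard deviation estimated on the log scale via exp(), and a mean constrained to (0, 1) via the inverse-logit plogis(), so both inputs are unconstrained and similarly scaled.

```r
# Negative log-likelihood with both parameters on an unconstrained scale
nll <- function(theta, x) {
  sigma <- exp(theta[1])     # any real theta[1] gives sigma > 0
  mu    <- plogis(theta[2])  # any real theta[2] gives mu in (0, 1)
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}
set.seed(1)
x <- rnorm(100, mean = 0.3, sd = 0.5)  # simulated data
fit <- optim(c(0, 0), nll, x = x)
c(sigma = exp(fit$par[1]), mu = plogis(fit$par[2]))  # back-transformed estimates
```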

Page 34

Intermediate: Phasing

Phasing:
1. Estimate some parameters (with the others fixed) in a first phase
2. Estimate more parameters in each phase
3. Eventually estimate all parameters

Uses:
1. Multi-species models
• Estimate with linkages in later phases
2. Statistical catch-at-age models
• Estimate scale early
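A two-phase sketch in miniature (normal model and data invented for illustration): estimate the mean first with the spread fixed, then free both parameters starting from the phase-1 result.

```r
# Negative log-likelihood; par = (mean, log-sd)
nll <- function(par, x) -sum(dnorm(x, par[1], exp(par[2]), log = TRUE))
set.seed(2)
x <- rnorm(200, mean = 5, sd = 2)

# Phase 1: log-sd fixed at 0, estimate the mean only
p1 <- optimize(function(m) nll(c(m, 0), x), interval = c(-10, 20))$minimum

# Phase 2: free both parameters, starting from the phase-1 estimate
p2 <- optim(c(p1, 0), nll, x = x)
p2$par  # mean near 5, log-sd near log(2)
```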

Page 35

Intermediate: Standard errors

Maximum likelihood allows asymptotic estimates of standard errors:
1. Calculate the Hessian matrix at the maximum likelihood estimate
– Second derivatives of the log-likelihood function
2. Invert the Hessian
3. Diagonal entries are variances
4. Their square roots are standard errors
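The four steps map directly onto base R (normal model and data invented for illustration; optim() can return the numerical Hessian at the optimum):

```r
# Negative log-likelihood; par = (mean, log-sd)
nll <- function(par, x) -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
set.seed(3)
x <- rnorm(500, mean = 1, sd = 1)

fit <- optim(c(0, 0), nll, x = x, hessian = TRUE)  # 1. Hessian at the MLE
vcov_mat <- solve(fit$hessian)                     # 2. invert it
se <- sqrt(diag(vcov_mat))                         # 3.-4. variances -> SEs
se[1]  # SE of the mean, roughly sd(x) / sqrt(500)
```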

Page 36

Intermediate: Standard errors

Calculation of the Hessian depends on parameter transformations:
• When using exp() or logit transformations, use the delta method to transform standard errors back to the untransformed scale
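A worked one-parameter case of the delta method (the numbers are made up for illustration): if theta = log(sigma) has standard error se_theta, then sigma = exp(theta) has approximate standard error |d exp(theta)/d theta| * se_theta.

```r
theta    <- log(2)                 # estimate on the log scale
se_theta <- 0.1                    # its standard error (made-up)
sigma    <- exp(theta)             # back-transformed estimate: 2
se_sigma <- exp(theta) * se_theta  # delta-method SE on the natural scale: 0.2
```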

Page 37

Intermediate: Standard errors

Page 38

Intermediate: Standard errors

Gill and King (2004), "What to do when your Hessian is not invertible"

• gchol() – generalized Cholesky ("kinship" package)
• ginv() – Moore-Penrose inverse ("MASS" package)
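For instance, with a singular Hessian (MASS is a recommended package that ships with R; the kinship package would be needed separately for gchol()):

```r
library(MASS)

H <- matrix(c(2, 0,
              0, 0), nrow = 2, byrow = TRUE)  # singular "Hessian"
# solve(H) would fail here; the Moore-Penrose pseudo-inverse still works
G <- ginv(H)
G  # diag(0.5, 0)
```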

Page 39

Intermediate: Standard errors

[Switch over to R screen to show mle() and solve(hess())]

Page 40

Advanced: Differentiation

Gradient (gr =) can be supplied to:
• Quasi-Newton
• Conjugate gradient

Hessian (hess =) can be supplied to:
• Quasi-Newton

optimx(par = , fn = , gr = , hess = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))
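With base optim(), supplying an analytic gradient through gr = looks like this (the Rosenbrock gradient below is standard calculus, not taken from the slides):

```r
rosen <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2
rosen_gr <- function(p) c(
  -2 * (1 - p[1]) - 400 * p[1] * (p[2] - p[1]^2),  # d/dp1
  200 * (p[2] - p[1]^2)                            # d/dp2
)
# BFGS uses the exact gradient instead of finite differences
fit <- optim(c(-1.2, 1), rosen, gr = rosen_gr, method = "BFGS")
fit$par  # near c(1, 1)
```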

Page 41

Advanced: Differentiation

Automatic differentiation
• AD Model Builder
• "radx" package (still in development)

Semi-automatic differentiation
• "Rsympy" package

Symbolic differentiation
• deriv()

BUT: none of these handle loops or sum()/prod(), so they're not yet very helpful for statistics

Page 42

Advanced: Differentiation

Mixture distribution model (~15 params)
• 10 seconds in R
• 2 seconds in ADMB

Multispecies catchability model (~150 params)
• 4 hours in R (using trapezoid method)
• 5 minutes in ADMB (using MCMC)

Surplus production meta-analysis (~750 coefs)
• 7 days in R (using trapezoid method)
• 2 hours in ADMB (using trapezoid method)