
The Newton-Raphson Algorithm

David Allen, University of Kentucky

January 31, 2013

1 The Newton-Raphson Algorithm

The Newton-Raphson algorithm, also called Newton’s method, is a method for finding the minimum or maximum of a function of one or more variables. It is named after Isaac Newton and Joseph Raphson.


Its use in statistics
Statisticians often want to find parameter values that minimize an objective function such as a residual sum of squares or a negative log likelihood function. As θ is a popular symbol for a generic parameter, θ is used here to represent the argument of an objective function. Newton’s algorithm is for finding the value of θ that minimizes an objective function.


Synopsis
The basic Newton’s algorithm starts with a provisional value of θ. Then it

1. constructs a quadratic function with the same value, slope, and curvature as the objective function at the provisional value;

2. finds the value of θ that minimizes the quadratic function; and

3. resets the provisional value to this minimizing value.

If all goes well, these steps are repeated until the provisional value converges to the minimizing value, as in the sketch below.
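A minimal sketch of these three steps for a scalar θ, assuming user-supplied functions grad and hess (names chosen here for illustration) that return the first and second derivatives of the objective:

  ## One possible Newton iteration for a scalar parameter.
  ## grad and hess are assumed to return the first and second
  ## derivatives of the objective function at theta.
  newton_scalar <- function(theta, grad, hess, tol = 1e-8, maxit = 50) {
    for (i in seq_len(maxit)) {
      step <- grad(theta) / hess(theta)  # minimizer of the local quadratic
      theta <- theta - step              # reset the provisional value
      if (abs(step) < tol) break         # stop once the steps are negligible
    }
    theta
  }

In practice the loop above would be augmented with the controls discussed later, such as step-length checks and a test that the second derivative is positive.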


An example with one variable
The next few slides demonstrate repeated applications of the steps above for a scalar θ.


The First Approximation
The first approximation is with θ = 0.5. [Figure: the quadratic approximation at θ = 0.5]

The Second Approximation
The second approximation is with θ = 2.25. [Figure: the quadratic approximation at θ = 2.25]

The Third Approximation
The third approximation is with θ = 1.5694. [Figure: the quadratic approximation at θ = 1.5694]

The Final Approximation
The estimate of θ is 1.4142. [Figure: the quadratic approximation at the final iterate, θ = 1.4142]
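The slides do not reproduce the objective function behind these plots, but the displayed iterates are exactly what Newton’s method produces when the gradient is θ² − 2 and the Hessian is 2θ; one objective consistent with them is o(θ) = θ³/3 − 2θ, which has a local minimum at √2 ≈ 1.4142. A sketch under that assumption:

  ## Assumed objective (not stated on the slides): o(theta) = theta^3/3 - 2*theta,
  ## with gradient theta^2 - 2 and Hessian 2*theta.
  theta <- 0.5
  for (i in 1:5) {
    g <- theta^2 - 2        # gradient at the provisional value
    h <- 2 * theta          # second derivative at the provisional value
    theta <- theta - g / h  # Newton update
    cat(sprintf("iteration %d: theta = %.4f\n", i, theta))
  }
  ## prints 2.2500, 1.5694, 1.4219, 1.4142, 1.4142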

In Matrix Notation
Let o(θ) be the objective function to be minimized. Its vector of first derivatives, called the gradient vector, is

g(\theta) = \frac{d}{d\theta} o(\theta)

Its matrix of second derivatives, called the Hessian matrix, is

H(\theta) = \frac{d^{2}}{d\theta\, d\theta^{t}} o(\theta)


The quadratic approximation
The quadratic approximation of o(θ) at θ = θ0 in terms of the gradient vector and Hessian matrix is

o(\theta) = o(\theta_0) + g^{t}(\theta_0)(\theta - \theta_0) + \tfrac{1}{2}(\theta - \theta_0)^{t} H(\theta_0)(\theta - \theta_0)

Provided H(θ0) is positive definite, the approximating quadratic function is minimized by

\theta = \theta_0 - H^{-1}(\theta_0)\, g(\theta_0)
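A sketch of this update in R, assuming hypothetical user-supplied functions grad and hess that return g(θ) as a vector and H(θ) as a matrix; solve(H, g) computes the product of the inverse Hessian and the gradient without forming the inverse explicitly:

  ## One Newton step in matrix form.
  newton_step <- function(theta0, grad, hess) {
    theta0 - solve(hess(theta0), grad(theta0))
  }

Repeating newton_step until successive values of θ agree to the desired accuracy implements the algorithm of the synopsis.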


Implementation
There may be problems with convergence in practice, so Newton’s algorithm must be implemented with controls. Excellent discussions of Newton’s algorithm are given in Dennis and Schnabel [1], Fletcher [2], Nocedal and Wright [4], and Gill, Murray, and Wright [3].


Minimum or Maximum?
By checking second derivatives, Newton’s algorithm provides a definitive check of whether a minimum, maximum, or saddle point of the objective function has been found.
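A small sketch of such a check, assuming H is the (symmetric) Hessian matrix evaluated at the converged value:

  ## Classify a stationary point from the eigenvalues of the Hessian.
  classify <- function(H) {
    ev <- eigen(H, symmetric = TRUE)$values
    if (all(ev > 0)) {
      "local minimum"
    } else if (all(ev < 0)) {
      "local maximum"
    } else {
      "saddle point (or a degenerate case with a zero eigenvalue)"
    }
  }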


Rosenbrock’s function
The Rosenbrock function is

100(\theta_2 - \theta_1^{2})^{2} + (1 - \theta_1)^{2} .

Rosenbrock’s function is a frequently used test function for numerical optimization procedures. Even though it is a simple-looking function of two variables, it has some gotchas: its global minimum at θ = (1, 1) lies at the bottom of a long, narrow, curved valley that slows many algorithms.
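For reference, the function could be coded in R as below; the vector argument and the evaluation at (1, 1) are illustrative choices, not part of the exercise that follows.

  ## Rosenbrock's function of a length-2 parameter vector.
  rosenbrock <- function(theta) {
    100 * (theta[2] - theta[1]^2)^2 + (1 - theta[1])^2
  }
  rosenbrock(c(1, 1))  # 0, attained at the global minimum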


An exercise

Exercise 1.1. Write an R program to apply Newton’s method to the Rosenbrock function. Do not use built-in R functions except for solve. Run your program using different starting values and observe the results.


2 Least Squares

In situations where the response observations are uncorrelated with equal variances, least squares is the preferred method of estimation. Let Y_i represent the i-th response observation and η_i(θ) its expected value. Here θ is a vector of parameters that is functionally independent of the variance. The residual sum of squares is

s(\theta) = \sum_{i=1}^{n} \bigl(Y_i - \eta_i(\theta)\bigr)^{2} \qquad (1)

where n is the number of observations. The least squares estimate of θ is the value of θ that minimizes s(θ) (assuming the minimum exists).


Derivatives of the residual sum of squares
The vector of first derivatives of s(θ), called the gradient vector, is

g(\theta) = -2 \sum_{i=1}^{n} \bigl(Y_i - \eta_i(\theta)\bigr) \frac{d}{d\theta}\eta_i(\theta) \qquad (2)

The matrix of second derivatives, called the Hessian matrix, is

H(\theta) = 2 \sum_{i=1}^{n} \frac{d}{d\theta}\eta_i(\theta)\, \frac{d}{d\theta^{t}}\eta_i(\theta) - 2 \sum_{i=1}^{n} \bigl(Y_i - \eta_i(\theta)\bigr) \frac{d^{2}}{d\theta\, d\theta^{t}}\eta_i(\theta) \qquad (3)


The quadratic approximation
The quadratic approximation of s(θ) at θ = θ0 in terms of the gradient vector and Hessian matrix is

s(\theta) = s(\theta_0) + g^{t}(\theta_0)(\theta - \theta_0) + \tfrac{1}{2}(\theta - \theta_0)^{t} H(\theta_0)(\theta - \theta_0)

Newton’s algorithm, with the terms in H(θ) involving second derivatives omitted, is called the Gauss-Newton algorithm.
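A sketch of one Gauss-Newton step for a hypothetical model η_i(θ) = θ1 exp(−θ2 x_i); the model and the function name are illustrative assumptions, not part of the slides.

  ## One Gauss-Newton step for the assumed model
  ## eta_i(theta) = theta[1] * exp(-theta[2] * x[i]).
  gauss_newton_step <- function(theta, x, y) {
    eta <- theta[1] * exp(-theta[2] * x)            # fitted values eta_i(theta)
    D <- cbind(exp(-theta[2] * x),                  # d eta_i / d theta_1
               -theta[1] * x * exp(-theta[2] * x))  # d eta_i / d theta_2
    g <- -2 * t(D) %*% (y - eta)                    # gradient, equation (2)
    H <- 2 * t(D) %*% D                             # Hessian (3) with second-derivative terms dropped
    theta - drop(solve(H, g))                       # Gauss-Newton update
  }

Iterating this step until the change in θ is negligible gives the least squares estimate for the assumed model.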


The minimizing value
Provided H(θ0) is positive definite, the approximating quadratic function is minimized by

\theta = \theta_0 - H^{-1}(\theta_0)\, g(\theta_0)


Summary
In the preceding, the objective function is the residual sum of squares. The chain rule of differentiation provides the formulas needed to calculate the quadratic approximation of the objective function in terms of the derivatives dη_i(θ)/dθ and d²η_i(θ)/(dθ dθ^t). When the η_i(θ) are components of a solution of linear differential equations, the partial derivatives can be calculated by a computer.

In the case of other objective functions, a similar process must be followed, i.e., use the chain rule to find expressions for g(θ) and H(θ) in terms of dη_i(θ)/dθ and d²η_i(θ)/(dθ dθ^t). Unfortunately, this is sometimes difficult.
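As an illustration of that process, suppose the objective is assumed to have the additive form o(θ) = Σ_i ρ(Y_i, η_i(θ)); this is an assumed form, which the slides do not specify. The chain rule then gives

g(\theta) = \sum_{i=1}^{n} \frac{\partial \rho}{\partial \eta_i}\,\frac{d}{d\theta}\eta_i(\theta),
\qquad
H(\theta) = \sum_{i=1}^{n}\left[\frac{\partial^{2} \rho}{\partial \eta_i^{2}}\,\frac{d}{d\theta}\eta_i(\theta)\,\frac{d}{d\theta^{t}}\eta_i(\theta) + \frac{\partial \rho}{\partial \eta_i}\,\frac{d^{2}}{d\theta\, d\theta^{t}}\eta_i(\theta)\right]

with the partial derivatives of ρ evaluated at (Y_i, η_i(θ)). Taking ρ(Y, η) = (Y − η)² recovers the least squares formulas (2) and (3).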


References

[1] J. E. Dennis, Jr. and Robert B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1983.

[2] Roger Fletcher. Practical Methods of Optimization, Volume 1: Unconstrained Optimization. John Wiley & Sons, Ltd., 1980.

[3] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. Academic Press, Inc., 1981.

[4] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer-Verlag New York, Inc., 1999.
