Note Set 1 – The Basics

1.1 – Overview

In this note set, we will cover the very basic tools of numerical methods.

Numerical methods are useful for both formal theory and methods because they free us

from having to employ restrictive assumptions in order to obtain solutions to important

problems.

Consider applications to formal theory first. Stylized formal models often show

that some result holds under a certain set of conditions. A good theorist, however, will

attempt to characterize all conditions under which that result holds. The ideal is an if and

only if result. This may not be possible, but general conditions are more informative than

specific ones. When we go down this road, however, we may be able to derive certain

nice results, but lack specificity. Without restrictive assumptions, we may not be able to

solve our model analytically. When we make restrictive assumptions, then we would like

to at least make them realistic. The set of realistic assumptions will not always be the


ones that lead to an easy solution. Here is where numerical methods provide a great

advantage, because they free us from having to choose the set of assumptions that make

our model easy to solve, and allow us to choose these assumptions based on other

concerns (e.g. realism, generality).

1.2 – Numerical Differentiation

Differentiation differs from many of the operations we consider because it will

transform an analytical expression into another analytical one. If our original expression

is made up of plusses, minuses, multiplication signs, division signs, powers, roots,

exponentials, logarithms, sines, cosines, etc., then the resulting derivative will have a

representation made up of the same set of expressions. If all we were required to do were to compute derivatives, we might never need to employ numerical methods. However, mixing

analytic and numerical methods can be hard. Because of this, we will often apply

numerical differentiation because it is required by the other numerical procedures.

Alternatively, if we would like to study the behavior of a function and are able to

compute its derivative, it may still be advantageous to apply numerical differentiation.

There are many important applications of numerical differentiation. Many

methods for solving nonlinear systems of equations and the optimization of non-linear

functions require that derivatives be supplied. It is sometimes possible to supply analytic

derivatives to the function, but there are limitations to this approach. Computing

analytical derivatives can vary from tedious to near impossible (imagine optimizing a log-

likelihood that depends on a variance matrix through its Cholesky decomposition). Other


applications of numerical derivatives include computing standard errors for econometric

models, marginal effects for econometric models, and solving games with continuous

strategy spaces.

Recall that the definition of the derivative of the function $f(x)$ is,

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

We assume that we have access to a numerical function that computes $f(x)$. For example,

we may have the C++ code,

double Func(double &x) { return x * x + 7; }

We would like to be able to take the function Func and the point x as inputs and return

the derivative of Func at x. The definition of the derivative suggests an approach: select a

small value, h, and compute,

$$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$

The key question here is, how small should $h$ be? For example, why not choose $h = 1.0\times 10^{-80}$? How small we can select $h$ is limited by the precision of the computer.

Real numbers are stored on a computer using a finite number of bits representing the significand (the digits), the sign, and the exponent. A float is represented by 4 bytes (or 32 bits) and a double is represented by 8 bytes (or 64 bits). If we select $h = 1.0\times 10^{-80}$, then the computer will likely not be able to return a different value for $f(x+h)$ and $f(x)$, and our estimated derivative will be (arbitrarily) evaluated as zero.


There are two major sources of error in computing numerical derivatives:

truncation error and round-off error. The first trick we employ is to make sure that x and

x h+ differ exactly by a number that can be represented by a computer. This will reduce

one source of round-off error to zero. This can be accomplished using the following lines

of code,

double Temp = x + h; h = Temp - x;

Let $\varepsilon_m$ denote the machine precision, the smallest number such that the computer can distinguish $1 + \varepsilon_m$ from $1$. For example, for a typical Intel/AMD PC working in doubles, this number will be $2.22045\times 10^{-16}$. There exist routines that will compute this number for your computer. The remaining round-off error will have size $\sim \varepsilon_m\,|f(x)/h|$. To calculate the truncation error, consider the Taylor expansion,

$$f(x+h) = f(x) + f'(x)h + \tfrac{1}{2}f''(x)h^2 + \cdots$$

The truncation error is given by $\sim f''(x)h$. Now, imagine choosing the size of $h$ to minimize the total error $\tfrac{1}{2}f''(x)h + \varepsilon_m|f(x)|/h$. We will obtain

$$h \approx \sqrt{\varepsilon_m \left|\frac{f}{f''}\right|}$$

It is often good enough to use $\sqrt{|f/f''|} \approx |x|$ when $x$ is not too close to zero. This leads to the heuristic,

$$h \approx \begin{cases} \sqrt{\varepsilon_m}\,|x|, & |x| \neq 0 \\ \sqrt{\varepsilon_m}, & \text{otherwise.} \end{cases}$$

An alternative that is sometimes used is $h \approx \sqrt{\varepsilon_m}\,(1 + |x|)$. This may seem a little ad-hoc, but we will see later that this method is quite good (and certainly a lot better than guessing!).
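As a concrete illustration, here is a minimal C++ sketch of a forward-difference routine built around this heuristic. The use of std::numeric_limits<double>::epsilon() for $\varepsilon_m$ and the particular test function are illustrative choices, not anything prescribed above.

#include <cmath>
#include <iostream>
#include <limits>

double Func(double x) { return x * x + 7.0; }   // example function from above

// Forward-difference approximation to f'(x) using h ~ sqrt(eps_m) * (1 + |x|).
double ForwardDiff(double (*f)(double), double x)
{
    const double eps = std::numeric_limits<double>::epsilon();  // machine precision
    double h = std::sqrt(eps) * (1.0 + std::fabs(x));           // heuristic step size
    double Temp = x + h;                                         // make x and x + h differ by an
    h = Temp - x;                                                // exactly representable amount
    return (f(x + h) - f(x)) / h;
}

int main()
{
    std::cout << ForwardDiff(Func, 2.75) << "\n";   // exact answer is 2 * 2.75 = 5.5
    return 0;
}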


The truncation error in the above calculation is $O(h)$. We can do better than this by employing a higher-order Taylor expansion,

$$f(x+h) = f(x) + f'(x)h + \tfrac{1}{2}f''(x)h^2 + \tfrac{1}{6}f'''(x)h^3 + \cdots$$
$$f(x-h) = f(x) - f'(x)h + \tfrac{1}{2}f''(x)h^2 - \tfrac{1}{6}f'''(x)h^3 + \cdots$$

Notice that,

$$f'(x) \approx \frac{f(x+h) - f(x-h)}{2h} - \tfrac{1}{6}f'''(x)h^2 + \cdots$$

This is the central (two-sided) difference formula for computing numerical derivatives. Notice that this yields a more accurate calculation, with an accuracy of $O(h^2)$, but requires an extra function evaluation (assuming that $f(x)$ has already been computed). One can use the same approach to calculate an optimal $h$. We get $h \approx \varepsilon_m^{1/3}\,|x|$.

We may also want to obtain higher-order derivatives. We can obtain these

derivatives by "iterating" the approach suggested above. For example, let us combine,

$$f(x+h) = f(x) + f'(x)h + \tfrac{1}{2}f''(x)h^2 + \tfrac{1}{6}f'''(x)h^3 + \tfrac{1}{24}f^{(4)}(x)h^4 + \cdots$$
$$f(x-h) = f(x) - f'(x)h + \tfrac{1}{2}f''(x)h^2 - \tfrac{1}{6}f'''(x)h^3 + \tfrac{1}{24}f^{(4)}(x)h^4 + \cdots$$

to obtain approximations to $f'(x)$ at two points,

$$\frac{f(x+h) - f(x)}{h} = f'(x) + \tfrac{1}{2}f''(x)h + \tfrac{1}{6}f'''(x)h^2 + \tfrac{1}{24}f^{(4)}(x)h^3 + \cdots$$
$$\frac{f(x) - f(x-h)}{h} = f'(x) - \tfrac{1}{2}f''(x)h + \tfrac{1}{6}f'''(x)h^2 - \tfrac{1}{24}f^{(4)}(x)h^3 + \cdots$$

We can then apply the first-difference principle again to obtain,

$$\frac{\frac{f(x+h) - f(x)}{h} - \frac{f(x) - f(x-h)}{h}}{h} = f''(x) + \tfrac{1}{12}f^{(4)}(x)h^2 + \cdots$$


We can write this as,

$$f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} - \tfrac{1}{12}f^{(4)}(x)h^2 + \cdots$$

We can apply a similar principle to obtain $h \approx \varepsilon_m^{1/4}\,|x|$, with an error rate of $O(h^2)$.
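These formulas are just as easy to code. The sketch below assumes step sizes of roughly $\varepsilon_m^{1/3}(1+|x|)$ and $\varepsilon_m^{1/4}(1+|x|)$, where the $1+|x|$ factor plays the same role as the $\sqrt{\varepsilon_m}(1+|x|)$ alternative mentioned earlier (it keeps the step from collapsing at $x = 0$); the test function is the one used in Table 1.1 below.

#include <cmath>
#include <iostream>
#include <limits>

double Func(double x) { return 1.5 / x; }   // f(x) = 1.5 x^(-1), as in Table 1.1

// Central-difference approximation to f'(x); truncation error is O(h^2).
double CentralDiff(double (*f)(double), double x)
{
    const double eps = std::numeric_limits<double>::epsilon();
    double h = std::cbrt(eps) * (1.0 + std::fabs(x));      // h ~ eps_m^(1/3)
    double Temp = x + h;  h = Temp - x;                     // exactly representable step
    return (f(x + h) - f(x - h)) / (2.0 * h);
}

// Second-difference approximation to f''(x); truncation error is O(h^2).
double SecondDiff(double (*f)(double), double x)
{
    const double eps = std::numeric_limits<double>::epsilon();
    double h = std::pow(eps, 0.25) * (1.0 + std::fabs(x));  // h ~ eps_m^(1/4)
    double Temp = x + h;  h = Temp - x;
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h);
}

int main()
{
    // Exact values at x = 2.75 are f'(x) = -1.5 / x^2 and f''(x) = 3 / x^3.
    std::cout << CentralDiff(Func, 2.75) << " " << SecondDiff(Func, 2.75) << "\n";
    return 0;
}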

The question arises: how well do these methods work in practice? They work quite well, as the results in Table 1.1 below show. We computed the first and second derivatives of the function $f(x) = 1.5x^{-1}$ at $x = 2.75$ for a range of step sizes, as well as for the optimal step sizes indicated by the formulas above. We can see that the optimal values do a good job of finding the best possible step size. C++ code for this example will be available on the course website.

The discussion above assumed that the function $f(x)$ could be computed at

machine precision. Depending on the function we are dealing with, this may or may not

be the case. For “simple” problems, it will be the case. Suppose, alternatively, that we are

optimizing a likelihood that involves an integral, which is computed using quadrature

methods (see the next section). This will involve error in the calculation and will likely

result in a bumpy objective function. Figure 1.2 plots a simulated method of moments

objective function. These can be quite messy at a small scale, so if we want to compute

derivatives, we should use step sizes on the scale where the function begins to be smooth.


Table 1.1: First and Second Derivatives of $f(x) = 1.5x^{-1}$ at $x = 2.75$ (entries are approximation errors, not the derivative values themselves)

First derivatives:

h           Forward      Backward     Central
1.00E-01    6.96E-03     -7.48E-03    -2.63E-04
1.00E-02    7.19E-04     -7.24E-04    -2.62E-06
1.00E-03    7.21E-05     -7.22E-05    -2.62E-08
1.00E-04    7.21E-06     -7.21E-06    -2.62E-10
1.00E-05    7.21E-07     -7.21E-07    8.26E-13
1.00E-06    7.21E-08     -7.22E-08    -5.05E-11
1.00E-07    7.26E-09     -7.18E-09    4.13E-11
1.00E-08    8.26E-10     -1.03E-08    -4.73E-09
1.00E-09    7.34E-09     7.34E-09     7.34E-09
1.00E-10    2.29E-07     -8.81E-07    -3.26E-07
1.00E-11    5.78E-06     -5.32E-06    2.29E-07
1.00E-12    7.89E-05     -3.21E-05    2.34E-05
1.00E-13    5.69E-04     -5.42E-04    1.38E-05
1.00E-14    2.69E-03     -8.17E-03    -2.74E-03
1.00E-15    7.33E-02     -5.17E-02    1.08E-02
1.00E-16    -nan         -nan         -nan
1.00E-17    -nan         -nan         -nan
1.00E-18    -nan         -nan         -nan
1.00E-19    -nan         -nan         -nan
Optimal h   5.08E-09     -3.05E-09    -6.17E-12

Second derivatives:

h           Forward      Backward
1.00E-01    -1.45E-02    1.72E-02
1.00E-02    -1.56E-03    1.59E-03
1.00E-03    -1.57E-04    1.58E-04
1.00E-04    -1.58E-05    1.57E-05
1.00E-05    -3.38E-06    1.06E-06
1.00E-06    7.66E-05     -1.45E-04
1.00E-07    7.66E-05     7.66E-05
1.00E-08    -1.44E-01    -1.25E+00
1.00E-09    -1.44E-01    -1.44E-01
1.00E-10    -1.44E-01    -1.11E+04
1.00E-11    -1.11E+06    -1.11E+06
1.00E-12    -1.11E+08    -1.44E-01
1.00E-13    -1.11E+10    -1.11E+10
1.00E-14    -1.44E-01    -1.06E+12
1.00E-15    -1.41E+14    -1.44E-01
1.00E-16    -nan         -nan
1.00E-17    -nan         -nan
1.00E-18    -nan         -nan
1.00E-19    -nan         -nan
Optimal h   -5.28E-05    5.28E-05


Figure 1.2: Plot of a Simulated Method of Moments Objective Function (the horizontal axis runs from roughly 0.100000 to 0.100009 and the vertical axis from 0 to 0.016; plot not reproduced here).


1.3 – Numerical Integration

Unlike differentiation, most integration problems will not admit an analytical solution. Hence, we often have no choice but to employ numerical methods. For example, consider the integral $\int (1+x)^{-3} e^{x + x^2}\,dx$. The integral clearly does not admit an analytical

solution. Like numerical differentiation, you can probably guess what the first approach

to numerical integration will be.

Recall the definition of the Riemann integral. For any partition $a = x_0 < x_1 < \cdots < x_n = b$ and points $t_i \in [x_{i-1}, x_i]$, we consider the approximation,

$$\int_{a}^{b} f(x)\,dx \approx \sum_{i=1}^{n} f(t_i)(x_i - x_{i-1})$$

If we consider partitions such that the distance between $x_{i-1}$ and $x_i$ is arbitrarily small for all $i$, we get the value of the integral. This suggests the following formula for numerical integration,

$$\int_{a}^{b} f(x)\,dx \approx \sum_{i=1}^{n} f\!\left(\tfrac{1}{2}(x_{i-1} + x_i)\right)(x_i - x_{i-1})$$

If we set $x_i = a + \tfrac{i}{n}(b - a)$, we get,

$$\int_{a}^{b} f(x)\,dx \approx \frac{b-a}{n}\sum_{i=1}^{n} f\!\left(a + \frac{2i-1}{2n}(b - a)\right)$$

More precisely, let us consider integrating the function between $x_i$ and $x_{i+1}$, where $h = x_{i+1} - x_i$. Using a Taylor expansion, we can obtain,


$$F(x_{i+1}) = F(x_i) + f(x_i)h + \tfrac{1}{2}f'(x_i)h^2 + \cdots$$

where $F$ denotes the indefinite integral of $f$. Hence, we have,

$$\int_{x_i}^{x_{i+1}} f(x)\,dx = f(x_i)h + \tfrac{1}{2}f'(x_i)h^2 + \cdots$$

Summing these terms, we obtain,

$$\int_{a}^{b} f(x)\,dx = \sum_{i=0}^{n-1}\int_{x_i}^{x_{i+1}} f(x)\,dx = h\sum_{i=0}^{n-1} f(x_i) + \tfrac{1}{2}h^2\sum_{i=0}^{n-1} f'(x_i) + \cdots$$

We therefore have,

$$\int_{a}^{b} f(x)\,dx = h\sum_{i=0}^{n-1} f(a + ih) + (b - a)\,O(h)$$

This is almost exactly the same as the formula derived above. A higher-order expansion

will improve on the accuracy, but requires higher-order differentiability of the function

being integrated.
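The equally spaced formula above translates directly into code. The following is a minimal C++ sketch; the integrand $e^x$ on $[0,1]$ is chosen only because its exact value, $e - 1 \approx 1.7183$, also appears in Table 1.3 below.

#include <cmath>
#include <iostream>

// Composite rectangle rule: integrate f over [a, b] using n equal subintervals.
double Rectangle(double (*f)(double), double a, double b, int n)
{
    double h = (b - a) / n;
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += f(a + i * h);    // left endpoint of each subinterval
    return h * sum;             // overall error is O(h), as derived above
}

double Integrand(double x) { return std::exp(x); }

int main()
{
    std::cout << Rectangle(Integrand, 0.0, 1.0, 1000) << "\n";  // exact value is e - 1
    return 0;
}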

The methods outlined above are sometimes useful, but only for problems that are very messy or for which extremely high precision is desired. I'll elaborate on

this later. Perhaps the most widely used method is Gaussian quadrature. Gaussian

quadrature can be used for functions that are “well approximated by a polynomial”. In

particular, $n$-point quadrature will yield an exactly correct expression for functions that are equal to a $(2n-1)$-degree polynomial multiplied by some known weighting function $W(x)$. In particular, the formula is,

$$\int_{a}^{b} W(x) f(x)\,dx \approx \sum_{i=1}^{n} w_i f(x_i)$$

This will produce an exactly correct answer when $f(x)$ is such a polynomial. The trick then is

to find the weights $w_i$ and the evaluation points $x_i$.


The following are the weighting functions that are typically used (each one is given a name):

1. Gauss-Legendre quadrature: $W(x) = 1$ for $-1 < x < 1$

2. Gauss-Chebyshev quadrature: $W(x) = (1 - x^2)^{-1/2}$ for $-1 < x < 1$

3. Gauss-Laguerre: $W(x) = x^{\alpha} e^{-x}$ for $0 < x < \infty$

4. Gauss-Hermite: $W(x) = e^{-x^2}$ for $-\infty < x < \infty$

5. Gauss-Jacobi: $W(x) = (1 - x)^{\alpha}(1 + x)^{\beta}$ for $-1 < x < 1$

How do we go about finding the weights and evaluation points, then? Consider Gauss-Hermite quadrature and the polynomial $\sum_{i=0}^{2n-1} a_i x^i$. We have,

$$\int_{-\infty}^{\infty}\left(\sum_{i=0}^{2n-1} a_i x^i\right) e^{-x^2}\,dx = \sum_{i=0}^{2n-1} a_i \int_{-\infty}^{\infty} x^i e^{-x^2}\,dx$$

Integrating by parts, we can determine that,

$$\int_{-\infty}^{\infty} x^i e^{-x^2}\,dx = \tfrac{1}{2}(i-1)\int_{-\infty}^{\infty} x^{i-2} e^{-x^2}\,dx$$

$$\int_{-\infty}^{\infty} x\, e^{-x^2}\,dx = 0, \qquad \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$$

Notice, we want to have,

$$\int_{-\infty}^{\infty}\left(\sum_{i=0}^{2n-1} a_i x^i\right) e^{-x^2}\,dx = \sum_{i=1}^{n} w_i \left(\sum_{j=0}^{2n-1} a_j x_i^j\right)$$

For example, when $n = 2$, we have,


$$a_0\int_{-\infty}^{\infty} e^{-x^2}dx + a_1\int_{-\infty}^{\infty} x e^{-x^2}dx + a_2\int_{-\infty}^{\infty} x^2 e^{-x^2}dx + a_3\int_{-\infty}^{\infty} x^3 e^{-x^2}dx = a_0\sqrt{\pi} + a_2\frac{\sqrt{\pi}}{2}$$
$$= (w_1 + w_2)a_0 + (w_1 x_1 + w_2 x_2)a_1 + (w_1 x_1^2 + w_2 x_2^2)a_2 + (w_1 x_1^3 + w_2 x_2^3)a_3$$

Notice that we need,

$$w_1 + w_2 = \sqrt{\pi}$$
$$w_1 x_1 + w_2 x_2 = 0$$
$$w_1 x_1^2 + w_2 x_2^2 = \tfrac{1}{2}\sqrt{\pi}$$
$$w_1 x_1^3 + w_2 x_2^3 = 0$$

One can show that the solution to this system satisfies,

$$w_1 = w_2 = \frac{\sqrt{\pi}}{2}, \qquad x_1 = -\frac{1}{\sqrt{2}}, \qquad x_2 = \frac{1}{\sqrt{2}}$$

This procedure works more generally, and for all of the quadrature formulas, except that

we don’t solve them by hand! Instead, there are standard computer programs that are

designed to compute the solutions to such systems of equations.
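As a quick check on the two-point rule just derived, the weights and nodes $w_1 = w_2 = \sqrt{\pi}/2$ and $x_{1,2} = \mp 1/\sqrt{2}$ should integrate any cubic times $e^{-x^2}$ exactly. A minimal C++ sketch (the particular cubic is arbitrary):

#include <cmath>
#include <iostream>

// An arbitrary cubic: p(x) = 1 + 2x + 3x^2 + 4x^3.
double Cubic(double x) { return 1.0 + 2.0 * x + 3.0 * x * x + 4.0 * x * x * x; }

int main()
{
    const double pi = 3.14159265358979323846;

    // Two-point Gauss-Hermite weights and evaluation points from the text.
    double w[2] = { std::sqrt(pi) / 2.0, std::sqrt(pi) / 2.0 };
    double x[2] = { -1.0 / std::sqrt(2.0), 1.0 / std::sqrt(2.0) };

    double quad = w[0] * Cubic(x[0]) + w[1] * Cubic(x[1]);

    // Exact value: 1 * sqrt(pi) + 3 * sqrt(pi)/2 (the odd terms integrate to zero).
    double exact = std::sqrt(pi) + 1.5 * std::sqrt(pi);

    std::cout << "quadrature: " << quad << "   exact: " << exact << "\n";
    return 0;
}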

How do we choose which formula to use? The most important concern is the range of the function. For example, if we want to compute the integral $\int_{3}^{7} x\, e^{-\frac{1}{2}x^2}\,dx$, we would see that the function has a finite range. Therefore, we would apply Legendre, Chebyshev, or Jacobi. Gauss-Hermite would be a poor choice here because even though we can write this integral as $\int_{-\infty}^{\infty} 1\{3 \le x \le 7\}\, x\, e^{-\frac{1}{2}x^2}\,dx$, the resulting function $1\{3 \le x \le 7\}$ would not be "well approximated by a polynomial."

Suppose we are estimating a Heckman selection model and have the equations,

$$y_n^* = x_n'\alpha + \varepsilon_n$$
$$r_n^* = z_n'\beta + \eta_n$$


where $\varepsilon_n$ and $\eta_n$ are standard normal random deviates with correlation $\rho$. Now, we observe $y_n = 1\{y_n^* \ge 0\}$ only if $r_n = 1\{r_n^* \ge 0\} = 1$. This means that, conditional on $(x_n, z_n)$, we observe three possible events: $r_n = 0$; $y_n = 0, r_n = 1$; and $y_n = 1, r_n = 1$. Computing the probability of the first event is easy,

$$\Pr(r_n = 0 \mid x_n, z_n) = \Pr(r_n^* < 0 \mid x_n, z_n) = \Pr(z_n'\beta + \eta_n < 0 \mid x_n, z_n) = \Pr(\eta_n < -z_n'\beta \mid x_n, z_n) = \Phi(-z_n'\beta)$$

Consider one of the other events,

$$\begin{aligned}
\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) &= \Pr(y_n^* < 0,\ r_n^* \ge 0 \mid x_n, z_n) \\
&= \Pr(x_n'\alpha + \varepsilon_n < 0,\ z_n'\beta + \eta_n \ge 0 \mid x_n, z_n) \\
&= \Pr(\varepsilon_n < -x_n'\alpha,\ \eta_n \ge -z_n'\beta \mid x_n, z_n) \\
&= \int_{\eta=-z_n'\beta}^{\infty} \int_{\varepsilon=-\infty}^{-x_n'\alpha} \frac{1}{2\pi\sqrt{1-\rho^2}}\, e^{-\frac{1}{2(1-\rho^2)}\left(\varepsilon^2 - 2\rho\varepsilon\eta + \eta^2\right)}\, d\varepsilon\, d\eta
\end{aligned}$$

We can reduce the integrals by completing the square (or factoring the joint distribution into a marginal and a conditional),

$$\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) = \int_{\eta=-z_n'\beta}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\eta^2} \int_{\varepsilon=-\infty}^{-x_n'\alpha} \frac{1}{\sqrt{2\pi(1-\rho^2)}}\, e^{-\frac{(\varepsilon - \rho\eta)^2}{2(1-\rho^2)}}\, d\varepsilon\, d\eta$$

If we use the change of variables $u = \frac{\varepsilon - \rho\eta}{\sqrt{1-\rho^2}}$, we obtain,

$$\begin{aligned}
\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) &= \int_{\eta=-z_n'\beta}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\eta^2} \int_{u=-\infty}^{\frac{-x_n'\alpha - \rho\eta}{\sqrt{1-\rho^2}}} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}u^2}\, du\, d\eta \\
&= \int_{\eta=-z_n'\beta}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\eta^2}\, \Phi\!\left(\frac{-x_n'\alpha - \rho\eta}{\sqrt{1-\rho^2}}\right) d\eta
\end{aligned}$$

There are standard algorithms for computing $\Phi$ efficiently, so we have reduced our problem to a one-dimensional integral. Since the bounds are half-infinite, Gauss-Laguerre


integration is a good choice here (with $\alpha = 0$). We need to transform the range, however. Define $v = \eta + z_n'\beta$. We have,

$$\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) = \int_{v=0}^{\infty} e^{-v} \left\{ \frac{1}{\sqrt{2\pi}}\, e^{\,v - \frac{1}{2}(v - z_n'\beta)^2}\, \Phi\!\left(\frac{-x_n'\alpha - \rho\,(v - z_n'\beta)}{\sqrt{1-\rho^2}}\right) \right\} dv$$

Finally, we have the approximation,

$$\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) \approx \sum_{i} w_i\, \frac{1}{\sqrt{2\pi}}\, e^{\,v_i - \frac{1}{2}(v_i - z_n'\beta)^2}\, \Phi\!\left(\frac{-x_n'\alpha - \rho\,(v_i - z_n'\beta)}{\sqrt{1-\rho^2}}\right)$$

where $(w_i, v_i)$ are Gauss-Laguerre weights and evaluation points.
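To make this last step concrete, here is a minimal C++ sketch that evaluates the approximation with a hard-coded two-point Gauss-Laguerre rule (nodes $2 \mp \sqrt{2}$ with weights $(2 \pm \sqrt{2})/4$). The values used for $x_n'\alpha$, $z_n'\beta$, and $\rho$ are purely illustrative, a serious application would use many more quadrature points, and $\Phi$ is computed here from the standard library's erfc.

#include <cmath>
#include <iostream>

// Standard normal CDF via the complementary error function.
double NormCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

int main()
{
    // Illustrative index values (not taken from the notes).
    double xa  = 0.3;   // x_n' alpha
    double zb  = 0.5;   // z_n' beta
    double rho = 0.4;

    // Two-point Gauss-Laguerre rule for int_0^inf e^(-v) g(v) dv ~= sum_i w_i g(v_i).
    double node[2]   = { 2.0 - std::sqrt(2.0), 2.0 + std::sqrt(2.0) };
    double weight[2] = { (2.0 + std::sqrt(2.0)) / 4.0, (2.0 - std::sqrt(2.0)) / 4.0 };

    const double pi = 3.14159265358979323846;
    double prob = 0.0;
    for (int i = 0; i < 2; ++i) {
        double v   = node[i];
        double eta = v - zb;                     // undo the change of variables v = eta + z'beta
        double g   = std::exp(v - 0.5 * eta * eta) / std::sqrt(2.0 * pi)
                   * NormCdf((-xa - rho * eta) / std::sqrt(1.0 - rho * rho));
        prob += weight[i] * g;
    }
    std::cout << "Pr(y = 0, r = 1 | x, z) is approximately " << prob << "\n";
    return 0;
}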

As an alternative example, suppose we want to compute the expectation of $\frac{1}{1+X^2}$, where $X$ is a normal random variable. Then Gauss-Hermite would be the best choice. We have,

$$E\!\left[\frac{1}{1+X^2}\right] = \int_{-\infty}^{\infty} \frac{1}{1+x^2}\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx$$

We would consider the transformation $y = (x - \mu)/(\sigma\sqrt{2})$,

$$E\!\left[\frac{1}{1+X^2}\right] = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} \frac{1}{1 + (\mu + \sigma\sqrt{2}\,y)^2}\, e^{-y^2}\, dy$$

This allows us to write,

$$E\!\left[\frac{1}{1+X^2}\right] \approx \frac{1}{\sqrt{\pi}} \sum_{i} w_i\, \frac{1}{1 + (\mu + \sigma\sqrt{2}\, y_i)^2}$$

where $(w_i, y_i)$ are Gauss-Hermite weights and evaluation points.
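A minimal C++ sketch of this last approximation, using the two-point Gauss-Hermite rule derived earlier (a real application would use more points); the values of $\mu$ and $\sigma$ are illustrative.

#include <cmath>
#include <iostream>

int main()
{
    const double pi = 3.14159265358979323846;
    double mu = 1.0, sigma = 2.0;   // illustrative parameters of X ~ N(mu, sigma^2)

    // Two-point Gauss-Hermite weights and evaluation points from the text.
    double w[2] = { std::sqrt(pi) / 2.0, std::sqrt(pi) / 2.0 };
    double y[2] = { -1.0 / std::sqrt(2.0), 1.0 / std::sqrt(2.0) };

    double sum = 0.0;
    for (int i = 0; i < 2; ++i) {
        double x = mu + sigma * std::sqrt(2.0) * y[i];   // change of variables
        sum += w[i] / (1.0 + x * x);
    }
    std::cout << "E[1/(1+X^2)] is approximately " << sum / std::sqrt(pi) << "\n";
    return 0;
}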

The chief advantage of applying Gaussian Quadrature is that we can often obtain

extremely accurate results with very few function evaluations. The chief drawback is that


some functions will not be well approximated by a polynomial of any order. It is when

very high accuracy is desired and the function is poorly approximated by a polynomial

that we will rely on the Trapezoid and related formulas (or as we discuss later, on

simulation methods).

In Table 1.3, we consider several examples, taken from page 254 in Kenneth

Judd's textbook. We see that Gaussian quadrature sometimes performs extremely well,

even with only a few points. The trapezoid rule, alternatively, does well for a large

number of points.


Table 1.3 – Some Simple Integrals

Rule             Number of Points   $\int_0^1 x^{1/4}\,dx$   $\int_1^{10} x^{-2}\,dx$   $\int_0^1 e^x\,dx$   $\int_{-1}^{1}(x+0.05)^{+}\,dx$
Trapezoid        4                  .7212                    1.7637                     1.7342               .6056
                 7                  .7664                    1.1922                     1.7223               .5583
                 10                 .7797                    1.0448                     1.72                 .5562
                 13                 .7858                    .9857                      1.7193               .5542
Simpson          3                  .6496                    1.3008                     1.4662               .4037
                 7                  .7816                    1.0017                     1.7183               .5426
                 11                 .7524                    .9338                      1.6232               .4844
                 15                 .7922                    .9169                      1.7183               .5528
Gauss-Legendre   4                  .8023                    .8563                      1.7183               .5713
                 7                  .8006                    .8985                      1.7183               .5457
                 10                 .8003                    .9                         1.7183               .5538
                 13                 .8001                    .9                         1.7183               .5513
Truth                               .8                       .9                         1.7183               .55125


1.4 – Numerical Solution of Nonlinear Equations

Consider the problem of solving the equation $x^3 + e^x = -4$. You will not be successful because this equation simply does not admit an analytical solution. Equations like this come up frequently in formal theory and methods applications. Fortunately, solving this problem numerically is actually quite easy. We can write the general problem as $f(x) = 0$. We will almost always assume that $f$ is continuous. At minimum, the function should be continuous at all but a countable number of points. Otherwise,

knowing the function at one point will not provide any information about the function at

other points.

Now, suppose that we have points $\underline{x}$ and $\bar{x}$ such that $\underline{x} < \bar{x}$, $f(\underline{x}) < 0$, and $f(\bar{x}) > 0$. In this case, the intermediate value theorem implies that if $f$ is continuous on $[\underline{x}, \bar{x}]$, then there exists an $x^* \in (\underline{x}, \bar{x})$ such that $f(x^*) = 0$. Finding such points $\underline{x}$ and $\bar{x}$ is called bracketing a root (alternatively, we can bracket a root with points $\underline{x} < \bar{x}$ such that $f(\underline{x}) > 0$ and $f(\bar{x}) < 0$).

Now consider evaluating the function at the point $\tfrac{1}{2}(\underline{x} + \bar{x})$. If $f(\tfrac{1}{2}(\underline{x} + \bar{x})) = 0$, then we have found a root. If $f(\tfrac{1}{2}(\underline{x} + \bar{x})) > 0$, then we have bracketed a root in a smaller interval, $[\underline{x}, \tfrac{1}{2}(\underline{x} + \bar{x})]$. If $f(\tfrac{1}{2}(\underline{x} + \bar{x})) < 0$, then we have bracketed a root in a smaller interval, $[\tfrac{1}{2}(\underline{x} + \bar{x}), \bar{x}]$. By continuing this process, we can bracket a root in smaller and smaller intervals. Eventually, we will converge to a root. This technique is called the bisection method.
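A minimal C++ sketch of the bisection method as just described; the bracketing function, tolerance, and iteration cap are illustrative choices.

#include <cmath>
#include <iostream>

// Bisection: assumes f(lo) < 0 < f(hi), so that a root is bracketed in [lo, hi].
double Bisect(double (*f)(double), double lo, double hi, double tol)
{
    for (int it = 0; it < 200 && (hi - lo) > tol; ++it) {
        double mid = 0.5 * (lo + hi);
        if (f(mid) == 0.0) return mid;    // found an exact root
        if (f(mid) > 0.0)  hi = mid;      // root now bracketed in [lo, mid]
        else               lo = mid;      // root now bracketed in [mid, hi]
    }
    return 0.5 * (lo + hi);
}

double Example(double x) { return x * x - 2.0; }   // root at sqrt(2), bracketed by [0, 2]

int main()
{
    std::cout << Bisect(Example, 0.0, 2.0, 1e-10) << "\n";
    return 0;
}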


Suppose that, in addition, we know that $f$ is strictly monotonic on $[\underline{x}, \bar{x}]$. Then we also know that the root $x^*$ is unique on that interval. If $f$ is strictly monotonic everywhere, then there is at most one solution.

Now, we know that at each iteration, the root $x^*$ will be between the upper and

lower bracket. We know that the size of the bracket is being cut in half at each iteration.

Therefore, we know that the bisection method converges q-linearly.

Consider an alternative algorithm for solving the same problem. A first-order

Taylor approximation gives,

$$f(x^*) \approx f(x) + f'(x)(x^* - x)$$

Let us take $x^*$ to be a root of $f$ so that,

$$f(x) \approx -f'(x)(x^* - x)$$

We have,

$$x^* \approx x - \frac{f(x)}{f'(x)}$$

This suggests the following algorithm. Given a current point, $x_k$, compute a new point

using,

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$

It is clear that this process will stop if $f(x_k) = 0$. If we start this procedure in a

neighborhood of the root, it is guaranteed to converge q-quadratically. Whether this

process will converge far from the root depends on the function f , and is a rather

complicated problem. Iteration is a rather complex branch of mathematics. For example,

it is well known that a differential equation cannot exhibit chaos in dimension less than 3.


A nonlinear difference equation, however, can exhibit chaos in a single dimension. These

problems do exist for Newton’s method in practice.

In practice, Newton's method works as follows. Set,

$$x_{k+1} = x_k - \lambda\,\frac{f(x_k)}{f'(x_k)}$$

for $\lambda = 1$. If $|f(x_{k+1})| < |f(x_k)|$, then we have successfully reduced the value of the function, so accept $x_{k+1}$. Otherwise, reduce $\lambda$ and try again. One can show theoretically that, under these conditions, Newton's method will converge globally to a local minimum of $|f(x)|$ (which may not be a root).
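A minimal C++ sketch of Newton's method with this backtracking rule; the stopping tolerance, the halving of $\lambda$, and the example function (with its analytic derivative) are illustrative choices.

#include <cmath>
#include <iostream>

// Newton's method with backtracking: accept a step only if it reduces |f|.
double Newton(double (*f)(double), double (*df)(double), double x)
{
    for (int it = 0; it < 100; ++it) {
        double fx = f(x);
        if (std::fabs(fx) < 1e-12) break;            // close enough to a root
        double step = -fx / df(x);                   // full Newton step
        double lambda = 1.0;
        while (std::fabs(f(x + lambda * step)) >= std::fabs(fx) && lambda > 1e-10)
            lambda *= 0.5;                           // shrink the step until |f| decreases
        x += lambda * step;
    }
    return x;
}

double Example(double x)  { return x * x - 2.0; }
double DExample(double x) { return 2.0 * x; }

int main()
{
    std::cout << Newton(Example, DExample, 1.0) << "\n";   // converges to sqrt(2)
    return 0;
}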

Newton’s method has two problems. The first is that we need to be able to

compute $f'$. The second is that the naive version of Newton's method may not converge,

even when the bisection method will. While Newton’s method with line search can be

quite effective for higher-dimensional problems, it is inefficient for one-dimensional

problems.

There are solutions to each of these problems. Suppose that $f'$ is not supplied directly. We can approximate it using numerical derivatives (see Section 1.2). The main drawback of this approach is that we need to compute $f$ twice per iteration rather than once. Instead, we can use the following approximation for $f'(x_k)$,

$$f'(x_k) \approx \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}$$

This suggests the following algorithm,

$$x_{k+1} = x_k - f(x_k)\,\frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})}$$


This approach is called the secant method. It achieves q-superlinear convergence, which

is faster than the bisection method, but slower than Newton's method. Like Newton's method, there is no guarantee that the secant method will converge. A variant of the

secant method is the false position method, which makes sure to keep one point on each

side of the root, but is otherwise similar to the secant method. This procedure retains the

q-superlinear convergence rate, but is guaranteed to converge. A picture will illustrate

this.
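For completeness, the secant update above is also only a few lines of C++. Like the plain secant method, this sketch makes no attempt to guarantee convergence; the starting points and tolerance are illustrative.

#include <cmath>
#include <iostream>

// Secant method: keeps the two most recent iterates (x0, x1).
double Secant(double (*f)(double), double x0, double x1)
{
    double f0 = f(x0), f1 = f(x1);
    for (int it = 0; it < 100 && std::fabs(f1) > 1e-12; ++it) {
        double x2 = x1 - f1 * (x1 - x0) / (f1 - f0);   // secant update
        x0 = x1; f0 = f1;
        x1 = x2; f1 = f(x1);
    }
    return x1;
}

double Example(double x) { return x * x - 2.0; }

int main()
{
    std::cout << Secant(Example, 0.0, 2.0) << "\n";   // converges to sqrt(2)
    return 0;
}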

The second problem relates to convergence of Newton’s method, the Secant

method, and the false position method. These methods all converge faster than the

bisection method as we get close to the root, but may have worse performance far from

the root. Brent’s method follows the same lines as these other methods, but makes sure to

check the progress of the algorithm and reverts to the bisection method in cases of poor

performance. Brent's method is otherwise similar to the false-position method, but uses an inverse quadratic approximation rather than a linear one. This is the algorithm that

works best in practice, and which is the hardest to illustrate with a simple figure. The

secant and Newton's methods are of interest because they are the only algorithms that

extend to the multidimensional case.

Let us now consider an example where we can apply the one-dimensional root-

finding algorithms. Consider two countries that must divide a surplus (whose value is

normalized to one) among themselves. A country may choose to fight or back down or

agree to a default settlement of $\tfrac{1}{2}$ for each. The country must pay a cost of $c_k$ to fight. In

the event that only one country fights, that country gets the full surplus. In the event that


both countries fight, each country wins with probability one half (and the country that

wins gets the full surplus). The surplus is discounted at rate $0 < \beta < 1$, however.

Each country knows its own cost, but not the cost of the other country. The costs

are known to be drawn from the common distribution $F$, where $F$ admits a derivative on $[0, \tfrac{1}{2}]$. We have the following utility functions,

where country 1 chooses a row, country 2 chooses a column, and payoffs are listed as (country 1, country 2):

                           Country 2: Fight                                    Country 2: Don't
Country 1: Fight           $(\tfrac{1}{2}\beta - c_1,\ \tfrac{1}{2}\beta - c_2)$   $(1 - c_1,\ 0)$
Country 1: Don't           $(0,\ 1 - c_2)$                                     $(\tfrac{1}{2},\ \tfrac{1}{2})$

We will assume that all equilibria have the form: country 1 fights if its cost is lower than $c_1^*$ and country 2 fights if its cost is lower than $c_2^*$.

Country 1's expected utility from fighting, given country 2's strategy, is,

$$F(c_2^*)\left(\tfrac{1}{2}\beta - c_1\right) + (1 - F(c_2^*))(1 - c_1) = 1 - c_1 + F(c_2^*)\left(\tfrac{1}{2}\beta - 1\right)$$

while his expected utility from not fighting is,

$$F(c_2^*)\cdot 0 + (1 - F(c_2^*))\cdot\tfrac{1}{2} = \tfrac{1}{2} - \tfrac{1}{2}F(c_2^*)$$

Now, the cut point must be the point $c_1 = c_1^*$ that equates these utilities,

$$c_1^* = \tfrac{1}{2} - \tfrac{1}{2}F(c_2^*)(1 - \beta)$$

A similar calculation for the other country will show that,

$$c_2^* = \tfrac{1}{2} - \tfrac{1}{2}F(c_1^*)(1 - \beta)$$


These two equations form a system of non-linear equations. We can however

reduce this to a single nonlinear equation by noting that there is a $c^*$ such that $c_1^* = c_2^* = c^*$ solves both equations. Such a point must satisfy,

$$g(c) = c - \tfrac{1}{2} + \tfrac{1}{2}F(c)(1 - \beta) = 0$$

Notice that,

$$g\!\left(\tfrac{1}{2} - \tfrac{1}{2}(1-\beta)\right) = \tfrac{1}{2}(1-\beta)\left(F\!\left(\tfrac{1}{2} - \tfrac{1}{2}(1-\beta)\right) - 1\right) < 0$$
$$g\!\left(\tfrac{1}{2}\right) = \tfrac{1}{2}F\!\left(\tfrac{1}{2}\right)(1-\beta) > 0$$

$g$ is continuous, and

$$g'(c) = 1 + \tfrac{1}{2}f(c)(1-\beta) > 0$$

We immediately have that there exists a unique solution to $g(c) = 0$, which implies that there is a unique symmetric equilibrium to the game described above (if our hypothesis that all equilibria involve cut-point strategies is correct). Furthermore, we see that we can bracket the root, which tells us that the bisection method will converge if we start from the interval $[\tfrac{1}{2} - \tfrac{1}{2}(1-\beta),\ \tfrac{1}{2}]$.

Solving this problem is quite easy. The first thing we must do is supply the function $g(c)$. For example, let us consider the case where $\beta = 0.7$ and $F(x) = 1 - e^{-x}$. In C/C++, we could write,

const double Beta = 0.7;

double Func(double x)
{
    return x - 0.5 + 0.5 * (x >= 0.0) * (1.0 - exp(-x)) * (1.0 - Beta);
}

In Numerical Recipes, we could call the bisection method using the code,

/* Include NR Code */


#include "nr3.h"
#include "roots.h"

/* Define Nonlinear Equation to Solve */
const double Beta = 0.7;

double Func(double x)
{
    return x - 0.5 + 0.5 * (x >= 0.0) * (1.0 - exp(-x)) * (1.0 - Beta);
}

int main()
{
    /* Bisection Method */
    try {
        cout << "rtbis, x: " << rtbis(Func, -10.0, 10.0, 0.00000001) << "\n\n";
    }
    catch (int i) {
        cout << "rtbis: FAILED\n\n";
    }
    return 0;
}

This produces the output,

rtbis, x: 0.446025

1.5 – Numerical Optimization

The final basic algorithms we will discuss relate to numerical optimization.

Suppose that we want to find the minimum (or maximum) of some function $f$. Suppose that $f(b) < f(a)$ and $f(b) < f(c)$ with $a < b < c$. Then if $f$ is continuous on $[a, c]$, it must attain a local minimum in this region. We thus say we have bracketed a minimum.

The golden section search algorithm is very similar to the bisection method, trying to enclose the local minimum in smaller and smaller brackets. We do this by trying a new point in between $a$ and $c$. If the function value at this point is greater than $f(b)$, then it becomes


one of the new boundary points. Otherwise, it becomes the new middle point. Continuing this procedure, we will obtain smaller and smaller intervals.
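A minimal C++ sketch of this kind of bracketing search. For simplicity it always tries the point that divides the larger of the two subintervals in the golden ratio, which is where the method gets its name; the bracket and tolerance used in main are illustrative.

#include <cmath>
#include <iostream>

// Golden section search for a local minimum of f, given a bracket a < b < c
// with f(b) < f(a) and f(b) < f(c).
double GoldenSection(double (*f)(double), double a, double b, double c, double tol)
{
    const double r = 0.5 * (std::sqrt(5.0) - 1.0);   // golden ratio, about 0.618
    for (int it = 0; it < 500 && (c - a) > tol; ++it) {
        // Try a new point inside the larger of the two subintervals.
        double x = (b - a > c - b) ? b - (1.0 - r) * (b - a)
                                   : b + (1.0 - r) * (c - b);
        if (f(x) < f(b)) {                // x becomes the new middle point
            if (x < b) { c = b; b = x; }
            else       { a = b; b = x; }
        } else {                          // x becomes one of the new boundary points
            if (x < b) a = x;
            else       c = x;
        }
    }
    return b;
}

double Example(double x) { return (x - 1.0) * (x - 1.0); }   // minimum at x = 1

int main()
{
    std::cout << GoldenSection(Example, 0.0, 0.5, 3.0, 1e-8) << "\n";
    return 0;
}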

1.6 – Suggested Reading

Numerical Recipes in C, sections 4.0-4.5, 5.7, 9.0-9.3, 10.0-10.3.

Dennis and Schnabel, …

Judd, …