Note Set 1 – The Basics
1.1 – Overview
In this note set, we will cover the very basic tools of numerical methods.
Numerical methods are useful for both formal theory and methods because they free us
from having to employ restrictive assumptions in order to obtain solutions to important
problems.
Consider applications to formal theory first. Stylized formal models often show
that some result holds under a certain set of conditions. A good theorist, however, will
attempt to characterize all conditions under which that result holds. The ideal is an if and
only if result. This may not be possible, but general conditions are more informative than
specific ones. When we go down this road, however, we may be able to derive certain
nice results, but lack specificity. Without restrictive assumptions, we may not be able to
solve our model analytically. When we make restrictive assumptions, then we would like
to at least make them realistic. The set of realistic assumptions will not always be the
ones that lead to an easy solution. Here is where numerical methods provide a great
advantage, because they free us from having to choose the set of assumptions that make
our model easy to solve, and allow us to choose these assumptions based on other
concerns (e.g. realism, generality).
1.2 – Numerical Differentiation
Differentiation differs from many of the operations we consider because it
transforms an analytical expression into another analytical one. If our original expression
is made up of plusses, minuses, multiplication signs, division signs, powers, roots,
exponentials, logarithms, sines, cosines, etc., then the resulting derivative will have a
representation made up of the same set of expressions. If all we were required to do were
to compute derivatives, we might never need to employ numerical methods. However, mixing
analytic and numerical methods can be hard. Because of this, we will often apply
numerical differentiation because it is required by other numerical procedures.
Alternatively, if we would like to study the behavior of a function and are able to
compute its derivative, it may still be advantageous to apply numerical differentiation.
There are many important applications of numerical differentiation. Many
methods for solving nonlinear systems of equations and the optimization of non-linear
functions require that derivatives be supplied. It is sometimes possible to supply analytic
derivatives to the function, but there are limitations to this approach. Computing
analytical derivatives can vary from tedious to nearly impossible (imagine optimizing a log-likelihood that depends on a variance matrix through its Cholesky decomposition). Other
applications of numerical derivatives include computing standard errors for econometric
models, marginal effects for econometric models, and solving games with continuous
strategy spaces.
Recall that the definition of the derivative of the function $f(x)$ is,

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

We assume that we have access to a numerical function that computes $f(x)$. For example,
we may have the C++ code,

    double Func(double x) { return x * x + 7.0; }

We would like to be able to take the function Func and the point x as inputs and return
the derivative of Func at x. The definition of the derivative suggests an approach: select a
small value, $h$, and compute,

$$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$

The key question here is, how small should $h$ be? For example, why not choose
$h = 1.0 \times 10^{-80}$? How small we can select $h$ is limited by the precision of the computer.
Real numbers are stored on a computer using a finite number of bits representing the
significand, the sign, and the exponent. A float is represented by 4 bytes (or 32 bits)
and a double is represented by 8 bytes (or 64 bits). If we select $h = 1.0 \times 10^{-80}$, then the
computer will likely not be able to return different values for $f(x+h)$ and $f(x)$, and
our estimated derivative will be (arbitrarily) evaluated as zero.
There are two major sources of error in computing numerical derivatives:
truncation error and round-off error. The first trick we employ is to make sure that $x$ and
$x+h$ differ exactly by a number that can be represented by the computer. This will reduce
one source of round-off error to zero. This can be accomplished using the following lines
of code,

    double Temp = x + h;
    h = Temp - x;

Let $\varepsilon_m$ denote the machine precision: the smallest number such that $1 + \varepsilon_m$ is distinguishable from $1$ on the computer. For example,
for a typical Intel/AMD PC, this number will be $2.22045 \times 10^{-16}$ for doubles. There exist routines that
will compute this number for your computer. The remaining round-off error will have
size $\sim \varepsilon_m |f(x)/h|$. To calculate the truncation error, consider the Taylor expansion,

$$f(x+h) = f(x) + f'(x)h + \tfrac{1}{2}f''(x)h^2 + \ldots$$

The truncation error is therefore $\sim \tfrac{1}{2}f''(x)h$. Now, imagine choosing the size of $h$ to
minimize the total error $\tfrac{1}{2}|f''(x)|\,h + \varepsilon_m |f(x)|/h$. We will obtain,

$$h \approx \sqrt{\varepsilon_m \left|\frac{f}{f''}\right|}$$

It is often good enough to take $\sqrt{|f/f''|} \approx |x|$ when $x$ is not too close to zero. This leads to the
heuristic,

$$h \approx \begin{cases} \sqrt{\varepsilon_m}\,|x|, & |x| \neq 0 \\ \sqrt{\varepsilon_m}, & \text{otherwise} \end{cases}$$

An alternative that is sometimes used is
$h \approx \sqrt{\varepsilon_m}\,(1 + |x|)$. This may seem a little ad hoc, but we will see later that this method is
quite good (and certainly a lot better than guessing!).
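As a minimal sketch, the step-size heuristic and the representability trick can be combined in a small C++ helper. The name ForwardDiff and the function-template interface are illustrative choices, not from the notes:

```cpp
#include <cfloat>
#include <cmath>

// Forward-difference derivative using the heuristic step size
// h = sqrt(eps_m) * (1 + |x|), plus the trick that forces x and x + h
// to differ by an exactly representable number.
template <typename F>
double ForwardDiff(F f, double x) {
    double h = std::sqrt(DBL_EPSILON) * (1.0 + std::fabs(x));
    double temp = x + h;
    h = temp - x;  // removes one source of round-off error
    return (f(x + h) - f(x)) / h;
}
```

For example, applied to Func(x) = x*x + 7 at x = 3 this returns a value very close to 6.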
The truncation error in the above calculation is $O(h)$. We can do better than this
by employing a higher-order Taylor expansion,

$$f(x+h) = f(x) + f'(x)h + \tfrac{1}{2}f''(x)h^2 + \tfrac{1}{6}f'''(x)h^3 + \ldots$$
$$f(x-h) = f(x) - f'(x)h + \tfrac{1}{2}f''(x)h^2 - \tfrac{1}{6}f'''(x)h^3 + \ldots$$

Notice that,

$$f'(x) = \frac{f(x+h) - f(x-h)}{2h} - \tfrac{1}{6}f'''(x)h^2 + \ldots$$

This is the second difference formula for computing numerical derivatives. Notice that
this yields a more accurate calculation, with an accuracy of $O(h^2)$, but requires an
extra function evaluation (assuming that $f(x)$ is already available). One can use the same
approach to calculate an optimal $h$. We get $h \approx \varepsilon_m^{1/3}\,|x|$.
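The second difference formula can be sketched the same way as the forward difference; again the helper name CentralDiff is illustrative:

```cpp
#include <cfloat>
#include <cmath>

// Second (central) difference with the step size h ~ eps_m^(1/3),
// scaled by (1 + |x|) so that x = 0 is handled as well.
template <typename F>
double CentralDiff(F f, double x) {
    double h = std::cbrt(DBL_EPSILON) * (1.0 + std::fabs(x));
    double temp = x + h;
    h = temp - x;  // make x and x + h differ exactly
    return (f(x + h) - f(x - h)) / (2.0 * h);
}
```

On the example used in Table 1.1, $f(x) = 1.5x^{-1}$ at $x = 2.75$, this reproduces $f'(x) = -1.5/x^2$ to roughly ten digits.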
We may also want to obtain higher-order derivatives. We can obtain these
derivatives by “iterating” the approach suggested above. For example, let us combine,

$$f(x+h) = f(x) + f'(x)h + \tfrac{1}{2}f''(x)h^2 + \tfrac{1}{6}f'''(x)h^3 + \tfrac{1}{24}f^{(4)}(x)h^4 + \ldots$$
$$f(x-h) = f(x) - f'(x)h + \tfrac{1}{2}f''(x)h^2 - \tfrac{1}{6}f'''(x)h^3 + \tfrac{1}{24}f^{(4)}(x)h^4 + \ldots$$

to obtain approximations to $f'(x)$ at two points,

$$\frac{f(x+h) - f(x)}{h} = f'(x) + \tfrac{1}{2}f''(x)h + \tfrac{1}{6}f'''(x)h^2 + \tfrac{1}{24}f^{(4)}(x)h^3 + \ldots$$
$$\frac{f(x) - f(x-h)}{h} = f'(x) - \tfrac{1}{2}f''(x)h + \tfrac{1}{6}f'''(x)h^2 - \tfrac{1}{24}f^{(4)}(x)h^3 + \ldots$$

We can then apply the first-difference principle again to obtain,

$$\frac{1}{h}\left(\frac{f(x+h) - f(x)}{h} - \frac{f(x) - f(x-h)}{h}\right) = f''(x) + \tfrac{1}{12}f^{(4)}(x)h^2 + \ldots$$
We can write this as,

$$f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} - \tfrac{1}{12}f^{(4)}(x)h^2 + \ldots$$

We can apply a similar principle to obtain $h \approx \varepsilon_m^{1/4}\,|x|$ with an error rate of $O(h^2)$.
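A sketch of the resulting second-derivative routine (the name SecondDeriv is again just illustrative):

```cpp
#include <cfloat>
#include <cmath>

// Second derivative via the iterated difference formula, with the
// step size h ~ eps_m^(1/4) * (1 + |x|) suggested above.
template <typename F>
double SecondDeriv(F f, double x) {
    double h = std::pow(DBL_EPSILON, 0.25) * (1.0 + std::fabs(x));
    double temp = x + h;
    h = temp - x;  // make x and x + h differ exactly
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h);
}
```

For $f(x) = 1.5x^{-1}$ at $x = 2.75$ this recovers $f''(x) = 3/x^3$ to several digits, consistent with the $O(h^2)$ error rate.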
The question arises: how well do these methods work in practice? Quite well,
as the results in Table 1.1 show. We computed the first and second
derivatives of the function $f(x) = 1.5x^{-1}$ at $x = 2.75$ for step sizes of various values, as
well as for the optimal values indicated by the formulas. We can see that the optimal values
do a good job of finding the best possible step size. C++ code for this example will be
available on the course website.
The discussion above assumed that the function $f(x)$ could be computed at
machine precision. Depending on the function we are dealing with, this may or may not
be the case. For “simple” problems, it will be. Suppose, alternatively, that we are
optimizing a likelihood that involves an integral which is computed using quadrature
methods (see the next section). This will involve error in the calculation and will likely
result in a bumpy objective function. Figure 1.2 plots a simulated method of moments
objective function. These can be quite messy at a small scale, so if we want to compute
derivatives, we should use step sizes on the scale where the function begins to be smooth.
Table 1.1: First and Second Derivatives of $f(x) = 1.5x^{-1}$ at $x = 2.75$ (entries report the error of each formula)

| h | Forward ($f'$) | Backward ($f'$) | Central ($f'$) | Forward ($f''$) | Backward ($f''$) |
|---|---|---|---|---|---|
| 1.00E-01 | 6.96E-03 | -7.48E-03 | -2.63E-04 | -1.45E-02 | 1.72E-02 |
| 1.00E-02 | 7.19E-04 | -7.24E-04 | -2.62E-06 | -1.56E-03 | 1.59E-03 |
| 1.00E-03 | 7.21E-05 | -7.22E-05 | -2.62E-08 | -1.57E-04 | 1.58E-04 |
| 1.00E-04 | 7.21E-06 | -7.21E-06 | -2.62E-10 | -1.58E-05 | 1.57E-05 |
| 1.00E-05 | 7.21E-07 | -7.21E-07 | 8.26E-13 | -3.38E-06 | 1.06E-06 |
| 1.00E-06 | 7.21E-08 | -7.22E-08 | -5.05E-11 | 7.66E-05 | -1.45E-04 |
| 1.00E-07 | 7.26E-09 | -7.18E-09 | 4.13E-11 | 7.66E-05 | 7.66E-05 |
| 1.00E-08 | 8.26E-10 | -1.03E-08 | -4.73E-09 | -1.44E-01 | -1.25E+00 |
| 1.00E-09 | 7.34E-09 | 7.34E-09 | 7.34E-09 | -1.44E-01 | -1.44E-01 |
| 1.00E-10 | 2.29E-07 | -8.81E-07 | -3.26E-07 | -1.44E-01 | -1.11E+04 |
| 1.00E-11 | 5.78E-06 | -5.32E-06 | 2.29E-07 | -1.11E+06 | -1.11E+06 |
| 1.00E-12 | 7.89E-05 | -3.21E-05 | 2.34E-05 | -1.11E+08 | -1.44E-01 |
| 1.00E-13 | 5.69E-04 | -5.42E-04 | 1.38E-05 | -1.11E+10 | -1.11E+10 |
| 1.00E-14 | 2.69E-03 | -8.17E-03 | -2.74E-03 | -1.44E-01 | -1.06E+12 |
| 1.00E-15 | 7.33E-02 | -5.17E-02 | 1.08E-02 | -1.41E+14 | -1.44E-01 |
| 1.00E-16 | -nan | -nan | -nan | -nan | -nan |
| 1.00E-17 | -nan | -nan | -nan | -nan | -nan |
| 1.00E-18 | -nan | -nan | -nan | -nan | -nan |
| 1.00E-19 | -nan | -nan | -nan | -nan | -nan |
| Optimal h | 5.08E-09 | -3.05E-09 | -6.17E-12 | -5.28E-05 | 5.28E-05 |
Figure 1.2: Plot of a Simulated Method of Moments Objective Function

[Figure omitted: the objective function, plotted for parameter values between 0.100000 and 0.100009 (vertical axis from 0 to 0.016), is visibly jagged at this small scale.]
1.3 – Numerical Integration
Unlike differentiation, most integration problems will not admit an analytical
solution. Hence, we often have no choice but to employ numerical methods. For example,
consider the integral $\int x^3(1+x)\,e^{-(x+x^2)}\,dx$. This integral clearly does not admit an analytical
solution. As with numerical differentiation, you can probably guess what the first approach
to numerical integration will be.
Recall the definition of the Riemann integral. For any partition
$a = x_0 < x_1 < \ldots < x_n = b$ and any points $t_i \in [x_i, x_{i+1}]$, we consider the approximation,

$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{n-1} f(t_i)\,(x_{i+1} - x_i)$$

If we consider partitions such that the distance between $x_i$ and $x_{i+1}$ is arbitrarily small for
all $i$, we get the value of the integral. This suggests the following formula for numerical
integration,

$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{n-1} f\!\left(\tfrac{1}{2}(x_i + x_{i+1})\right)(x_{i+1} - x_i)$$

If we set $x_i = a + \tfrac{i}{n}(b-a)$, we get,

$$\int_a^b f(x)\,dx \approx \frac{b-a}{n}\sum_{i=0}^{n-1} f\!\left(a + \frac{2i+1}{2n}(b-a)\right)$$

More precisely, let us consider integrating the function between $x_i$ and $x_{i+1}$, where
$h = x_{i+1} - x_i$. Using a Taylor expansion, we can obtain,
$$F(x_{i+1}) = F(x_i) + f(x_i)h + \tfrac{1}{2}f'(x_i)h^2 + \ldots$$

where $F$ denotes the indefinite integral of $f$. Hence, we have,

$$\int_{x_i}^{x_{i+1}} f(x)\,dx = f(x_i)h + \tfrac{1}{2}f'(x_i)h^2 + \ldots$$

Summing these terms, we obtain,

$$\int_a^b f(x)\,dx = \sum_{i=0}^{n-1}\int_{x_i}^{x_{i+1}} f(x)\,dx = h\sum_{i=0}^{n-1} f(x_i) + \tfrac{1}{2}h^2\sum_{i=0}^{n-1} f'(x_i) + \ldots$$

We therefore have,

$$\int_a^b f(x)\,dx = h\sum_{i=0}^{n-1} f(a + ih) + (b-a)\,O(h)$$
This is almost exactly the same as the formula derived above. A higher-order expansion
will improve on the accuracy, but requires higher-order differentiability of the function
being integrated.
The methods outlined above are sometimes useful, but only for problems that are
very messy or for which extremely high precision is desired. I’ll elaborate on
this later. Perhaps the most widely used method is Gaussian quadrature. Gaussian
quadrature can be used for functions that are “well approximated by a polynomial.” In
particular, $n$-point quadrature will yield an exactly correct answer for functions that
are equal to a $(2n-1)$-degree polynomial multiplied by some known weighting function
$W(x)$. In particular, the formula is,
$$\int_a^b W(x)f(x)\,dx \approx \sum_{i=1}^{n} w_i f(x_i)$$

This will produce an exactly correct answer when $f(x)$ is a polynomial of degree at most
$2n-1$. The trick then is to find the weights $w_i$ and the evaluation points $x_i$.
The following are the weighting functions that are typically used (each one is
given a name):
1. Gauss-Legendre quadrature: $W(x) = 1$ for $-1 < x < 1$
2. Gauss-Chebyshev quadrature: $W(x) = (1 - x^2)^{-1/2}$ for $-1 < x < 1$
3. Gauss-Laguerre: $W(x) = x^{\alpha} e^{-x}$ for $0 < x < \infty$
4. Gauss-Hermite: $W(x) = e^{-x^2}$ for $-\infty < x < \infty$
5. Gauss-Jacobi: $W(x) = (1-x)^{\alpha}(1+x)^{\beta}$ for $-1 < x < 1$
How do we go about finding the weights and evaluation points then? Consider
Gauss-Hermite quadrature and consider the polynomial $\sum_{i=0}^{2n-1} a_i x^i$. We have,

$$\int_{-\infty}^{\infty}\left(\sum_{i=0}^{2n-1} a_i x^i\right)e^{-x^2}\,dx = \sum_{i=0}^{2n-1} a_i \int_{-\infty}^{\infty} x^i e^{-x^2}\,dx$$

Integrating by parts, we can determine that,

$$\int_{-\infty}^{\infty} x^i e^{-x^2}\,dx = \tfrac{1}{2}(i-1)\int_{-\infty}^{\infty} x^{i-2} e^{-x^2}\,dx$$

$$\int_{-\infty}^{\infty} x\, e^{-x^2}\,dx = 0, \qquad \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$$

Notice, we want to have,

$$\int_{-\infty}^{\infty}\left(\sum_{i=0}^{2n-1} a_i x^i\right)e^{-x^2}\,dx = \sum_{j=1}^{n} w_j\left(\sum_{i=0}^{2n-1} a_i x_j^i\right)$$

For example, when $n = 2$, we have,
$$\int_{-\infty}^{\infty}(a_0 + a_1 x + a_2 x^2 + a_3 x^3)\,e^{-x^2}dx = a_0\!\int_{-\infty}^{\infty}\!e^{-x^2}dx + a_1\!\int_{-\infty}^{\infty}\!x e^{-x^2}dx + a_2\!\int_{-\infty}^{\infty}\!x^2 e^{-x^2}dx + a_3\!\int_{-\infty}^{\infty}\!x^3 e^{-x^2}dx$$
$$= a_0\sqrt{\pi} + a_2\frac{\sqrt{\pi}}{2}$$
$$= (w_1 + w_2)a_0 + (w_1 x_1 + w_2 x_2)a_1 + (w_1 x_1^2 + w_2 x_2^2)a_2 + (w_1 x_1^3 + w_2 x_2^3)a_3$$

Notice that we need,

$$w_1 + w_2 = \sqrt{\pi}$$
$$w_1 x_1 + w_2 x_2 = 0$$
$$w_1 x_1^2 + w_2 x_2^2 = \frac{\sqrt{\pi}}{2}$$
$$w_1 x_1^3 + w_2 x_2^3 = 0$$

One can show that the solution to this system satisfies,

$$w_1 = w_2 = \frac{\sqrt{\pi}}{2}, \qquad x_1 = -\sqrt{\tfrac{1}{2}}, \quad x_2 = \sqrt{\tfrac{1}{2}}$$
This procedure works more generally, and for all of the quadrature formulas, except that
we don’t solve the systems by hand! Instead, there are standard computer programs that are
designed to compute the solutions to such systems of equations.
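The two-point rule just derived is easy to check directly. A minimal sketch, with the weights and nodes hard-coded from the solution above (the name GaussHermite2 is illustrative):

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;

// Two-point Gauss-Hermite rule with w1 = w2 = sqrt(pi)/2 and
// x = -/+ sqrt(1/2). By construction it integrates p(x) exp(-x^2)
// over the real line exactly for any polynomial p of degree <= 3.
template <typename F>
double GaussHermite2(F p) {
    const double w = std::sqrt(kPi) / 2.0;
    const double x = std::sqrt(0.5);
    return w * p(-x) + w * p(x);
}
```

For instance, with $p(x) = x^2$ the rule returns $\sqrt{\pi}/2$, matching the moment computed above, and any cubic is likewise integrated exactly.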
How do we choose which formula to use? The most important concern is the
range of the function. For example, if we want to compute the integral $\int_{x=3}^{7} x\,e^{-\frac{1}{2}x^2}\,dx$, we
would see that the function has a finite range. Therefore, we would apply Legendre,
Chebyshev, or Jacobi. Gauss-Hermite would be a poor choice here because even though
we can write this integral as $\int_{x=-\infty}^{\infty} 1\{3 \le x \le 7\}\, x\,e^{-\frac{1}{2}x^2}\,dx$, the resulting function
$1\{3 \le x \le 7\}$ would not be “well approximated by a polynomial.”
Suppose we are estimating a Heckman selection model and have the equations,

$$y_n^* = x_n'\alpha + \varepsilon_n$$
$$r_n^* = z_n'\beta + \eta_n$$

where $\varepsilon_n$ and $\eta_n$ are standard normal random deviates with correlation $\rho$. Now, we
observe $y_n = 1\{y_n^* \ge 0\}$ only if $r_n = 1\{r_n^* \ge 0\} = 1$. This means that conditional on $(x_n, z_n)$,
we observe three possible events: $r_n = 0$; $y_n = 0, r_n = 1$; and $y_n = 1, r_n = 1$. Computing the
probability of the first event is easy,

$$\Pr(r_n = 0 \mid x_n, z_n) = \Pr(r_n^* < 0 \mid x_n, z_n) = \Pr(z_n'\beta + \eta_n < 0 \mid x_n, z_n) = \Pr(\eta_n < -z_n'\beta \mid x_n, z_n) = \Phi(-z_n'\beta)$$
Consider one of the other events,

$$\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) = \Pr(y_n^* < 0,\; r_n^* \ge 0 \mid x_n, z_n)$$
$$= \Pr(x_n'\alpha + \varepsilon_n < 0,\; z_n'\beta + \eta_n \ge 0 \mid x_n, z_n)$$
$$= \Pr(\varepsilon_n < -x_n'\alpha,\; \eta_n \ge -z_n'\beta \mid x_n, z_n)$$
$$= \int_{\eta=-z_n'\beta}^{\infty}\int_{\varepsilon=-\infty}^{-x_n'\alpha} \frac{1}{2\pi\sqrt{1-\rho^2}}\; e^{-\frac{\varepsilon^2 - 2\rho\varepsilon\eta + \eta^2}{2(1-\rho^2)}}\, d\varepsilon\, d\eta$$

We can reduce the integrals by completing the square (or factoring the joint distribution
into a marginal and a conditional),

$$\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) = \int_{\eta=-z_n'\beta}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\eta^2} \int_{\varepsilon=-\infty}^{-x_n'\alpha} \frac{1}{\sqrt{2\pi(1-\rho^2)}}\; e^{-\frac{(\varepsilon-\rho\eta)^2}{2(1-\rho^2)}}\, d\varepsilon\, d\eta$$

If we use the change of variables $u = \dfrac{\varepsilon - \rho\eta}{\sqrt{1-\rho^2}}$, we obtain,

$$\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) = \int_{\eta=-z_n'\beta}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\eta^2} \int_{u=-\infty}^{\frac{-x_n'\alpha - \rho\eta}{\sqrt{1-\rho^2}}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2}\, du\, d\eta = \int_{\eta=-z_n'\beta}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\eta^2}\; \Phi\!\left(\frac{-x_n'\alpha - \rho\eta}{\sqrt{1-\rho^2}}\right) d\eta$$

There are standard algorithms for computing $\Phi$ efficiently, so we have reduced our
problem to a one-dimensional integral. Since the bounds are half-infinite, Gauss-Laguerre
integration is a good choice here (with $\alpha = 0$). We need to transform the range, however.
Define $v = z_n'\beta + \eta$, so that $v$ runs from $0$ to $\infty$. We have,

$$\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) = \int_{v=0}^{\infty} e^{-v}\left\{\frac{1}{\sqrt{2\pi}}\, e^{v - \frac{1}{2}(v - z_n'\beta)^2}\; \Phi\!\left(\frac{-x_n'\alpha - \rho\,(v - z_n'\beta)}{\sqrt{1-\rho^2}}\right)\right\} dv$$

Finally, we have the approximation,

$$\Pr(y_n = 0, r_n = 1 \mid x_n, z_n) \approx \frac{1}{\sqrt{2\pi}}\sum_i w_i\, e^{v_i - \frac{1}{2}(v_i - z_n'\beta)^2}\; \Phi\!\left(\frac{-x_n'\alpha - \rho\,(v_i - z_n'\beta)}{\sqrt{1-\rho^2}}\right)$$
As an alternative example, suppose we wanted to compute the expectation of
$\frac{1}{1+X^2}$ where $X$ is a normal random variable; then Gauss-Hermite would be the best
choice. For $X \sim N(\mu, \sigma^2)$, we have,

$$E\left[\frac{1}{1+X^2}\right] = \int_{-\infty}^{\infty} \frac{1}{1+x^2}\; \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx$$

We would consider the transformation $y = \dfrac{x-\mu}{\sigma\sqrt{2}}$,

$$E\left[\frac{1}{1+X^2}\right] = \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi}}\, e^{-y^2}\; \frac{1}{1 + (\mu + \sqrt{2}\,\sigma y)^2}\, dy$$

This allows us to write,

$$E\left[\frac{1}{1+X^2}\right] \approx \frac{1}{\sqrt{\pi}} \sum_i \frac{w_i}{1 + (\mu + \sqrt{2}\,\sigma y_i)^2}$$

where $(w_i, y_i)$ are Gauss-Hermite weights and evaluation points.
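As a sketch, here is the expectation computed with a 3-point Gauss-Hermite rule. The function name ExpectationGH3 is illustrative, and the nodes $0, \pm\sqrt{3/2}$ with weights $2\sqrt{\pi}/3, \sqrt{\pi}/6$ are the standard 3-point values; in practice one would read $(w_i, y_i)$ from a table or a routine:

```cpp
#include <cmath>

// E[1/(1+X^2)] for X ~ N(mu, sigma^2), approximated with the 3-point
// Gauss-Hermite rule and the change of variables y = (x - mu)/(sigma*sqrt(2)).
double ExpectationGH3(double mu, double sigma) {
    const double kPi = 3.14159265358979323846;
    const double y[3] = { -std::sqrt(1.5), 0.0, std::sqrt(1.5) };
    const double w[3] = { std::sqrt(kPi) / 6.0,
                          2.0 * std::sqrt(kPi) / 3.0,
                          std::sqrt(kPi) / 6.0 };
    double sum = 0.0;
    for (int i = 0; i < 3; ++i) {
        double x = mu + std::sqrt(2.0) * sigma * y[i];  // undo the transformation
        sum += w[i] / (1.0 + x * x);
    }
    return sum / std::sqrt(kPi);
}
```

A quick sanity check on the design: as $\sigma \to 0$ the distribution collapses to a point mass at $\mu$, and the formula returns exactly $1/(1+\mu^2)$.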
The chief advantage of applying Gaussian quadrature is that we can often obtain
extremely accurate results with very few function evaluations. The chief drawback is that
some functions will not be well approximated by a polynomial of any order. It is when
very high accuracy is desired and the function is poorly approximated by a polynomial
that we will rely on the trapezoid and related formulas (or, as we discuss later, on
simulation methods).
In Table 1.3, we consider several examples, taken from page 254 of Kenneth
Judd’s textbook. We see that Gaussian quadrature sometimes performs extremely well,
even with only a few points. The trapezoid rule, alternatively, does well for a large
number of points.
Table 1.3 – Some Simple Integrals

| Rule | Number of Points | $\int_0^1 x^{1/4}\,dx$ | $\int_1^{10} x^{-2}\,dx$ | $\int_0^1 e^x\,dx$ | $\int_{-1}^{1}\max(x+0.05,\,0)\,dx$ |
|---|---|---|---|---|---|
| Trapezoid | 4 | .7212 | 1.7637 | 1.7342 | .6056 |
| | 7 | .7664 | 1.1922 | 1.7223 | .5583 |
| | 10 | .7797 | 1.0448 | 1.72 | .5562 |
| | 13 | .7858 | .9857 | 1.7193 | .5542 |
| Simpson | 3 | .6496 | 1.3008 | 1.4662 | .4037 |
| | 7 | .7816 | 1.0017 | 1.7183 | .5426 |
| | 11 | .7524 | .9338 | 1.6232 | .4844 |
| | 15 | .7922 | .9169 | 1.7183 | .5528 |
| Gauss-Legendre | 4 | .8023 | .8563 | 1.7183 | .5713 |
| | 7 | .8006 | .8985 | 1.7183 | .5457 |
| | 10 | .8003 | .9 | 1.7183 | .5538 |
| | 13 | .8001 | .9 | 1.7183 | .5513 |
| Truth | | .8 | .9 | 1.7183 | .55125 |
1.4 – Numerical Solution of Nonlinear Equations
Consider the problem of solving the equation $x^3 + e^x = -4$. You will not be
successful because this equation simply does not admit an analytical solution. Equations
like this come up frequently in formal theory and methods applications. Fortunately,
solving this problem numerically is actually quite easy. We can write the general problem
as $f(x) = 0$. We will almost always assume that $f$ is continuous. At minimum, the
function should be continuous at all but a countable number of points. Otherwise,
knowing the function at one point will not provide any information about the function at
other points.
Now, suppose that we have points $\underline{x}$ and $\bar{x}$ such that $\underline{x} < \bar{x}$, $f(\underline{x}) < 0$, and
$f(\bar{x}) > 0$. In this case, the intermediate value theorem implies that if $f$ is continuous on
$[\underline{x}, \bar{x}]$, then there exists an $x^* \in (\underline{x}, \bar{x})$ such that $f(x^*) = 0$. Finding points $\underline{x}$ and $\bar{x}$ is
called bracketing a root (alternatively, we can bracket a root with points $\underline{x} < \bar{x}$ such that
$f(\underline{x}) > 0$ and $f(\bar{x}) < 0$).

Now consider evaluating the function at the midpoint $\tfrac{1}{2}(\underline{x} + \bar{x})$. If $f(\tfrac{1}{2}(\underline{x} + \bar{x})) = 0$,
then we have found a root. If $f(\tfrac{1}{2}(\underline{x} + \bar{x})) > 0$, then we have bracketed a root in the
smaller interval $[\underline{x}, \tfrac{1}{2}(\underline{x} + \bar{x})]$. If $f(\tfrac{1}{2}(\underline{x} + \bar{x})) < 0$, then we have bracketed a root in the
smaller interval $[\tfrac{1}{2}(\underline{x} + \bar{x}), \bar{x}]$. By continuing this process, we can bracket a root in
smaller and smaller intervals. Eventually, we will converge to a root of the system. This
technique is called the bisection method.
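The bisection method fits in a dozen lines of C++. This is a minimal sketch (the name Bisect and the interface are our own choices):

```cpp
#include <cmath>

// Bisection method: f(lo) and f(hi) must bracket a root (opposite
// signs). The bracket is halved until it is shorter than tol.
template <typename F>
double Bisect(F f, double lo, double hi, double tol) {
    double flo = f(lo);
    while (hi - lo > tol) {
        double mid = 0.5 * (lo + hi);
        double fmid = f(mid);
        if (fmid == 0.0) return mid;               // exact root found
        if ((fmid < 0.0) == (flo < 0.0)) {         // same sign as f(lo):
            lo = mid; flo = fmid;                  //   root is in [mid, hi]
        } else {
            hi = mid;                              //   root is in [lo, mid]
        }
    }
    return 0.5 * (lo + hi);
}
```

Applied to $f(x) = x^3 + e^x + 4$ on the bracket $[-10, 10]$ (where $f(-10) < 0 < f(10)$), it converges to the root of the example equation above.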
Suppose that, in addition, we know that $f$ is strictly monotonic on $[\underline{x}, \bar{x}]$. Then
we also know that the root $x^*$ is unique on that interval. If $f$ is monotonic everywhere,
there exists a unique solution.

Now, we know that at each iteration, the root $x^*$ will be between the upper and
lower bracket. We know that the size of the bracket is being cut in half at each iteration.
Therefore, we know that the bisection method converges q-linearly.
Consider an alternative algorithm for solving the same problem. A first-order
Taylor approximation gives,

$$f(x^*) \approx f(x) + f'(x)(x^* - x)$$

Let us take $x^*$ to be a root of $f$ so that,

$$f(x) \approx -f'(x)(x^* - x)$$

We have,

$$x^* \approx x - \frac{f(x)}{f'(x)}$$

This suggests the following algorithm. Given a current point, $x_k$, compute a new point
using,

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$
It is clear that this process will stop if $f(x_k) = 0$. If we start this procedure in a
neighborhood of the root, it is guaranteed to converge q-quadratically. Whether this
process will converge far from the root depends on the function $f$, and is a rather
complicated problem. Iteration is a rather complex branch of mathematics. For example,
it is well known that a differential equation cannot exhibit chaos in dimension less than 3.
A nonlinear difference equation, however, can exhibit chaos in a single dimension. These
problems do exist for Newton’s method in practice.
In practice, Newton’s method works as follows. Set,

$$x_{k+1} = x_k - \lambda \frac{f(x_k)}{f'(x_k)}$$

for $\lambda = 1$. If $|f(x_{k+1})| < |f(x_k)|$, then we have successfully reduced the value of the
function, so accept $x_{k+1}$. Otherwise, reduce $\lambda$ and try again. One can show
theoretically that, under these conditions, Newton’s method will converge globally to a
local minimum of $|f(x)|$ (which may not be a root).
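The $\lambda$-halving safeguard can be sketched as follows; the name Newton, the convergence tolerance, and the lower bound on $\lambda$ are illustrative choices:

```cpp
#include <cmath>

// Newton's method with the simple safeguard described above: try the
// full step first, then cut lambda in half until |f| actually decreases.
template <typename F, typename DF>
double Newton(F f, DF fprime, double x, int maxit = 100) {
    for (int k = 0; k < maxit; ++k) {
        double fx = f(x);
        if (std::fabs(fx) < 1e-14) break;           // converged
        double step = fx / fprime(x);               // full Newton step
        double lambda = 1.0;
        while (std::fabs(f(x - lambda * step)) >= std::fabs(fx)
               && lambda > 1e-12)
            lambda *= 0.5;                          // backtrack
        x -= lambda * step;
    }
    return x;
}
```

On a well-behaved problem such as $f(x) = x^2 - 2$ started at $x = 1$, the full step is always accepted and the iterates converge q-quadratically to $\sqrt{2}$.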
Newton’s method has two problems. The first is that we need to be able to
compute $f'$. The second is that the naive version of Newton’s method may not converge,
even when the bisection method will. While Newton’s method with line search can be
quite effective for higher-dimensional problems, it is inefficient for one-dimensional
problems.

There are solutions to each of these problems. Suppose that $f'$ is not supplied
directly. We can approximate it using numerical derivatives (see Section 1.2). The main
drawback of this approach is that we need to compute $f$ twice per iteration rather than
once. Instead, we can use the following approximation for $f'(x_k)$,

$$f'(x_k) \approx \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}$$

This suggests the following algorithm,

$$x_{k+1} = x_k - \frac{f(x_k)\,(x_k - x_{k-1})}{f(x_k) - f(x_{k-1})}$$
This approach is called the secant method. It achieves q-superlinear convergence, which
is faster than the bisection method, but slower than Newton’s method. Like Newton’s
method, there is no guarantee that the secant method will converge. A variant of the
secant method is the false position method, which makes sure to keep one point on each
side of the root, but is otherwise similar to the secant method. This procedure retains the
q-superlinear convergence rate, but is guaranteed to converge. A picture will illustrate
this.

The second problem relates to the convergence of Newton’s method, the secant
method, and the false position method. These methods all converge faster than the
bisection method as we get close to the root, but may have worse performance far from
the root. Brent’s method follows the same lines as these other methods, but makes sure to
check the progress of the algorithm and reverts to the bisection method in cases of poor
performance. Brent’s method is otherwise similar to the false position method, but uses
an inverse quadratic approximation rather than a linear one. This is the algorithm that
works best in practice, and also the hardest to illustrate with a simple figure. The
secant and Newton’s methods are of interest because they are the only algorithms that
extend to the multidimensional case.
Let us now consider an example where we can apply the one-dimensional root-finding algorithms. Consider two countries that must divide a surplus (whose value is
normalized to one) between themselves. A country may choose to fight, or back down
and agree to a default settlement of $\tfrac{1}{2}$ for each. Country $k$ must pay a cost of $c_k$ to fight. In
the event that only one country fights, that country gets the full surplus. In the event that
both countries fight, each country wins with probability one half (and the country that
wins gets the full surplus). The surplus is discounted at rate $0 < \beta < 1$, however.

Each country knows its own cost, but not the cost of the other country. The costs
are known to be drawn from the common distribution, $F$, where $F$ admits a derivative
on $[0, \tfrac{1}{2}]$. We have the following utility functions,
|  | Country 2: Fight | Country 2: Don’t |
|---|---|---|
| Country 1: Fight | $(\tfrac{1}{2}\beta - c_1,\ \tfrac{1}{2}\beta - c_2)$ | $(1 - c_1,\ 0)$ |
| Country 1: Don’t | $(0,\ 1 - c_2)$ | $(\tfrac{1}{2},\ \tfrac{1}{2})$ |

We will assume that all equilibria have the following form: country 1 fights if its cost is lower
than $c_1^*$ and country 2 fights if its cost is lower than $c_2^*$.
Country 1’s expected utility from fighting, given country 2’s strategy, is,

$$F(c_2^*)\left(\tfrac{1}{2}\beta - c_1\right) + \left(1 - F(c_2^*)\right)(1 - c_1) = 1 - c_1 + F(c_2^*)\left(\tfrac{1}{2}\beta - 1\right)$$

while its expected utility from not fighting is,

$$F(c_2^*)\cdot 0 + \left(1 - F(c_2^*)\right)\cdot\tfrac{1}{2} = \tfrac{1}{2} - \tfrac{1}{2}F(c_2^*)$$

Now, the cut point must be the point $c_1 = c_1^*$ that equates these utilities,

$$c_1^* = \tfrac{1}{2} - \tfrac{1}{2}F(c_2^*)(1 - \beta)$$

A similar calculation for the other country will show that,

$$c_2^* = \tfrac{1}{2} - \tfrac{1}{2}F(c_1^*)(1 - \beta)$$
These two equations form a system of nonlinear equations. We can, however,
reduce this to a single nonlinear equation by noting that there is a $c^*$ such that
$c_1^* = c_2^* = c^*$ solves both equations. Such a point must satisfy,

$$g(c) = c - \tfrac{1}{2} + \tfrac{1}{2}F(c)(1 - \beta) = 0$$

Notice that,

$$g\!\left(\tfrac{1}{2} - \tfrac{1}{2}(1-\beta)\right) = \tfrac{1}{2}(1-\beta)\left(F\!\left(\tfrac{1}{2} - \tfrac{1}{2}(1-\beta)\right) - 1\right) < 0$$
$$g\!\left(\tfrac{1}{2}\right) = \tfrac{1}{2}F\!\left(\tfrac{1}{2}\right)(1-\beta) > 0$$
$$g \text{ is continuous}$$
$$g'(c) = 1 + \tfrac{1}{2}f(c)(1-\beta) > 0$$

We immediately have that there exists a unique solution to $g(c) = 0$, which implies that
there is a unique symmetric equilibrium to the game described above (if our hypothesis
that all equilibria involve cut-point strategies is correct). Furthermore, we see that we have
bracketed the root, which tells us that the bisection method will converge if we start from
$\left[\tfrac{1}{2} - \tfrac{1}{2}(1-\beta),\ \tfrac{1}{2}\right]$.
Solving this problem is quite easy. The first thing we must do is supply the
function, $g(x)$. For example, let us consider the case where $\beta = 0.7$ and $F(x) = 1 - e^{-x}$.
In C/C++, we could write,

    const double Beta = 0.7;

    double Func(double x)
    {
        return x - 0.5 + 0.5 * (x >= 0.0) * (1.0 - exp(-x)) * (1.0 - Beta);
    }

In Numerical Recipes, we could call the bisection method using the code,

    /* Include NR Code */
    #include "nr3.h"
    #include "roots.h"

    /* Define Nonlinear Equation to Solve */
    const double Beta = 0.7;

    double Func(double x)
    {
        return x - 0.5 + 0.5 * (x >= 0.0) * (1.0 - exp(-x)) * (1.0 - Beta);
    }

    int main()
    {
        /* Bisection Method */
        try {
            cout << "rtbis, x: " << rtbis(Func, -10.0, 10.0, 0.00000001) << "\n\n";
        }
        catch (int i) {
            cout << "rtbis: FAILED\n\n";
        }
        return 0;
    }

This produces the output,

    rtbis, x: 0.446025
1.5 – Numerical Optimization
The final basic algorithms we will discuss relate to numerical optimization.
Suppose that we want to find the minimum (or maximum) of some function $f$. Suppose
that $f(b) < f(a)$ and $f(b) < f(c)$ with $a < b < c$. Then if $f$ is continuous on $[a, c]$, the
interval must contain a local minimum. We thus say we have bracketed a minimum.

The golden section search algorithm is very similar to the bisection method,
trying to enclose the local minimum in smaller and smaller brackets. We do this by
trying a new point in between $a$ and $c$. If the function value at the new point is greater
than $f(b)$, then the new point becomes one of the new boundary points. Otherwise, it
becomes the new middle point (and $b$ becomes a boundary point). Continuing
this procedure, we will obtain smaller and smaller intervals.
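A minimal sketch of this idea in C++. The name GoldenMin is illustrative, and for simplicity this version re-evaluates $f$ at both interior points each iteration; the classic implementation reuses one of the two evaluations:

```cpp
#include <cmath>

// Golden section search: shrink a bracket [a, c] around a local
// minimum by the golden ratio (~0.618) at every iteration.
template <typename F>
double GoldenMin(F f, double a, double c, double tol = 1e-8) {
    const double r = 0.5 * (std::sqrt(5.0) - 1.0);  // golden ratio - 1
    while (c - a > tol) {
        double b = c - r * (c - a);   // left interior point
        double d = a + r * (c - a);   // right interior point
        if (f(b) < f(d)) c = d;       // minimum lies in [a, d]
        else             a = b;       // minimum lies in [b, c]
    }
    return 0.5 * (a + c);
}
```

For a unimodal function such as $(x-2)^2$ on $[0, 5]$, the bracket collapses onto the minimizer $x = 2$.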
1.6 – Suggested Reading
Numerical Recipes in C, section 4.0-4.5, 5.7, 9.0-9.3, 10.0-10.3.
Dennis and Schnabel, …
Judd, …