contents · 2012-11-26 · fundamental theorem of algebra: if pis a polynomial of degree n 1, then...

Contents

1 Important Preliminary Information
    1.1 Important Theorems from Algebra
    1.2 Important Theorems from Calculus
    1.3 Taylor and Maclaurin Series
    1.4 Types of Errors
    1.5 Nested Multiplication

2 Roots of Equations
    2.1 The Bisection Method
    2.2 The Method of False Position
    2.3 Newton’s Method
    2.4 The Secant Method
    2.5 Fixed-Point Theory

3 Solving Linear Systems
    3.1 Matrices & Operations
    3.2 Gaussian-Elimination by Hand
    3.3 Systems of Equations - Direct Methods
    3.4 Systems of Equations - Iterative Methods
    3.5 Systems of Equations - Least Squares Method

4 Interpolation and Approximation
    4.1 Interpolation
    4.2 Divided Differences - Newton’s Interpolating Polynomial
    4.3 Chebyshev Nodes
    4.4 Cubic Spline Interpolation

5 Numerical Differentiation
    5.1 Differentiation Formulas
    5.2 Numerical Differentiation Examples
    5.3 Difference Formula Derivations
    5.4 Optimal Step Size

6 Numerical Integration
    6.1 Trapezoid and Simpson’s Rules
    6.2 Derivations and Error

7 Differential Equations
    7.1 Euler’s Method - Example
    7.2 Euler’s Method
    7.3 Runge Kutta Methods
    7.4 Multi-Step Methods

Chapter 1

Important Preliminary Information

At the very heart of all numerical methods is the desire to accurately obtain analytic solutions computationally. For example, we can analytically solve the following equation

cos(x) = 0.4 with x = arccos(0.4)

but what does that really mean? In theory, arccos(0.4) is a perfectly good answer, but if we are interested in landing a rover on another planet, we want to know exactly what that value is. The real problem is compounded by the fact that many calculations are needed prior to, and possibly after, that particular calculation is complete. We want to minimize error at every stage of the game. While calculators and computers seem to be quite flawless in their calculations, they are really only good at adding, subtracting, multiplying and dividing (and they have problems with these at times). Fortunately, mathematical theory allows us to make approximations to arccos(0.4) and put a bound on the error that results ... and there is always some error. If we can use a method that guarantees a small enough error, we can proceed accordingly and rest assured that the final solution is sufficiently accurate.

In addition to putting bounds on the errors associated with standard calculations, numerical methods allow us to solve equations that cannot be solved analytically. This is a wonderful thing. If we can determine an equation for an unknown variable, we can usually devise a way to determine one or more values for that variable. But again, we must be concerned about accuracy. In this text/course we learn many ways of solving various equations but always consider the error involved in such methods. This is a text on numerical methods. A text on numerical analysis would spend more time using mathematical theory to prove that the method works and analyze the resulting errors. Such a course requires greater mathematical sophistication. This course focuses on the many very cool methods we use to solve problems on a computer.

I write this while my son is watching Star Wars III (Revenge of the Sith) in the other room. While Anakin is being dragged to the dark side I can’t help but draw an analogy. These methods are clever, they are powerful, they allow you to solve problems that you would otherwise not be able to solve. However, if they are not used with care, they can result in horribly wrong answers. At the heart of Numerical Analysis is recognizing the error that can result from Numerical Methods. We focus on the methods here and, at times, do not sufficiently discuss the potential errors. Be careful that your desire for a solution does not interfere with the necessity of a valid solution. With great power comes great responsibility. I think that’s from Spider-Man.


1.1 Important Theorems from Algebra

• The Remainder Theorem:

Let $P_n(x)$ be an n’th degree polynomial:

$$P_n(x) = a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \dots + a_1 x + a_0$$

Dividing $P_n(x)$ by the linear factor $(x - x_1)$ yields

$$\frac{P_n(x)}{x - x_1} = Q_{n-1}(x) + \frac{R}{x - x_1}$$

rearranging:

$$P_n(x) = (x - x_1)\, Q_{n-1}(x) + R$$

where $R = P_n(x_1)$.

• Fundamental Theorem of Algebra:

If P is a polynomial of degree n ≥ 1, then P (x) = 0 has at least one (possibly complex) solution.

– Corollary 1

If $P(x)$ is a polynomial of degree $n \ge 1$, then there exist unique constants $x_1, x_2, \dots, x_k$, possibly complex, and positive integers $m_1, m_2, \dots, m_k$, such that $\sum_{i=1}^{k} m_i = n$ and

$$P(x) = a_n (x - x_1)^{m_1} (x - x_2)^{m_2} \cdots (x - x_k)^{m_k}$$

I.e., every polynomial of degree $n$ has exactly $n$ roots counting multiplicities.

– Corollary 2

Let $P$ and $Q$ be polynomials of degree at most $n$. If $x_1, x_2, \dots, x_k$, with $k > n$, are distinct numbers with $P(x_i) = Q(x_i)$ for $i = 1, 2, \dots, k$, then $P(x) = Q(x)$ for all values of $x$.

I.e., if $P$ and $Q$ are both polynomials of degree $\le n$, and they agree at $n + 1$ or more values of $x$, then they are the same polynomial.

Proof: Look at $F(x) = P(x) - Q(x)$. It is clear that $F$ is a polynomial of degree $\le n$. It is also clear that $F$ equals zero at $n + 1$ (or more) distinct values of $x$. By Corollary 1, a polynomial of degree between 1 and $n$ has at most $n$ distinct roots. So $F$ is not a polynomial of degree $\ge 1$. It is constant. Since it equals zero at a few places and is constant, it must be the constant function 0. This means $P(x) - Q(x) = 0$ or $P(x) = Q(x)$.

• Algebraic Versus Transcendental Functions:

– Algebraic Functions are functions involving algebraic expressions such as polynomials, rational powers, and rational functions, for example

$$f(x) = 3x^3 - 2x^2 + 5x + 7, \qquad f(x) = \frac{x^2 + 2}{x^3 - 1}, \qquad x^2 + y^2 = 1 \ \text{(not even a function)}$$

Computers are pretty good at evaluating algebraic functions.

– Transcendental Functions are functions which transcend algebra. These include exponential, logarithmic, and trigonometric functions and their inverses. Examples include

$$f(x) = e^x, \quad f(x) = x^{\pi}, \quad f(x) = x^{1/x}, \quad f(x) = \sin(x), \quad f(x) = \cosh(x)$$

Because these functions require more than the standard operations of addition, subtraction, multiplication, and division, these types of functions pose a real problem to computer evaluation.


1.2 Important Theorems from Calculus

• Limit (definition)

Let $f$ be a function defined on a set $X$ of real numbers; $f$ is said to have a limit $L$ at $x_0$, written $\lim_{x \to x_0} f(x) = L$, if, given any real number $\varepsilon > 0$, there exists a real number $\delta > 0$ such that $|f(x) - L| < \varepsilon$ whenever $x \in X$ and $0 < |x - x_0| < \delta$.

• Continuity (definitions)

Let $f$ be a function defined on a set $X$ of real numbers and $x_0 \in X$; $f$ is said to be continuous at $x_0$ if $\lim_{x \to x_0} f(x) = f(x_0)$.

The function $f$ is said to be continuous on $X$ if it is continuous at each number in $X$.

$C(X)$ denotes the set of all functions continuous on $X$. $C[a, b]$ is the set of all functions continuous on $[a, b]$, where continuity at the endpoints is defined by the one-sided limits $\lim_{x \to a^+} f(x) = f(a)$ and $\lim_{x \to b^-} f(x) = f(b)$.

• Differentiability (definition)

If $f$ is a function defined on an open interval containing $x_0$, $f$ is said to be differentiable at $x_0$ if

$$\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \quad \text{or} \quad \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}$$

exists. When this limit exists it is denoted by $f'(x_0)$ and is called the derivative of $f$ at $x_0$.

A function that has a derivative at each number in a set $X$ is said to be differentiable on $X$.

The set of all functions having $n$ continuous derivatives on $X$ is denoted $C^n(X)$, and the set of functions that have derivatives of all orders on $X$ is denoted $C^{\infty}(X)$. Polynomials, rational, trigonometric, exponential, and logarithmic functions are $C^{\infty}(X)$, where $X$ consists of all numbers at which the functions are defined. $C^n[a, b]$ requires left and right definitions of the derivative at the endpoints similar to those used in the continuity case.

• Intermediate Value Theorem

If f ∈ C[a, b] and K is any number between f(a) and f(b), then there exists c ∈ (a, b) for which f(c) = K.

Specifically, if f ∈ C[a, b] and f(a) · f(b) < 0, there exists c ∈ (a, b) for which f(c) = 0.

• Mean Value Theorem

If $f \in C[a, b]$ and $f$ is differentiable on $(a, b)$, then a number $c$ in $(a, b)$ exists such that

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

Specifically, if $f(a) = f(b) = 0$, there exists $c \in (a, b)$ such that $f'(c) = 0$ (Rolle’s Theorem).

• Generalized Rolle’s Theorem

Let $f \in C[a, b]$ be $n$ times differentiable on $(a, b)$. If $f$ vanishes at the $n + 1$ distinct numbers $x_0, \dots, x_n$ in $[a, b]$, then a number $c$ exists in $(a, b)$ such that $f^{(n)}(c) = 0$, where $f^{(n)}$ denotes the n’th derivative of $f$.


• Extreme Value Theorem

If $f \in C[a, b]$, then $c_1, c_2 \in [a, b]$ exist with $f(c_1) \le f(x) \le f(c_2)$ for all $x \in [a, b]$. If, in addition, $f$ is differentiable on $(a, b)$, then the numbers $c_1$ and $c_2$ occur either at the endpoints of $[a, b]$ or where $f'$ is zero.

• Riemann integral

The Riemann integral for the function $f$ on the interval $[a, b]$ is the following limit (provided it exists)

$$\int_a^b f(x)\, dx = \lim_{\max \Delta x_i \to 0} \sum_{i=1}^{n} f(x_i^*)\, \Delta x_i,$$

where the numbers $x_0, x_1, \dots, x_n$ satisfy $a = x_0 < x_1 < \dots < x_n = b$ and for each $i$, $\Delta x_i = x_i - x_{i-1}$, and $x_i^*$ is arbitrarily chosen in the interval $[x_{i-1}, x_i]$.

Theorem: A function that is continuous on $[a, b]$ is Riemann integrable on the interval.

The above theorem permits us to choose, for computational convenience, the points $x_i$ to be equally spaced so that $\Delta x_i = \Delta x$, and $x_i^*$ to be chosen in an organized fashion, such as the left-endpoint, right-endpoint, or midpoint of each sub-interval.

• Fundamental Theorems of Calculus

– Part 1

If $f \in C[a, b]$, then the function $F$ defined by

$$F(x) = \int_a^x f(t)\, dt, \qquad a \le x \le b$$

is continuous on $[a, b]$, differentiable on $(a, b)$, and $F'(x) = f(x)$.

– Part 2

If $f \in C[a, b]$, and $F$ is any antiderivative of $f$ (i.e., $F' = f$), then

$$\int_a^b f(x)\, dx = F(b) - F(a).$$

• Mean Value Theorems for Integrals

If $f$ is continuous on $[a, b]$ ($f \in C[a, b]$), then there is a number $c$ in $[a, b]$ such that

$$f(c) = \frac{1}{b - a} \int_a^b f(x)\, dx = \text{the average value of } f \text{ over } [a, b]$$
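The Riemann-sum definition and the Mean Value Theorem for Integrals above can be tried out numerically. The following is a minimal sketch, not part of the course files: the test function exp(x), the interval [0, 1], and the choice of n = 100 equal sub-intervals are all assumptions made for illustration.

```matlab
% Sketch: left, right, and midpoint Riemann sums for f(x) = exp(x) on [0, 1].
% The function, interval, and n are illustrative choices.
f = @(x) exp(x);
a = 0;  b = 1;  n = 100;               % n equal sub-intervals
dx = (b - a)/n;                        % Delta x_i = Delta x for every i
x  = a + (0:n)*dx;                     % partition points x_0, ..., x_n

left  = sum(f(x(1:n)))        * dx;    % x_i^* = left endpoint
right = sum(f(x(2:n+1)))      * dx;    % x_i^* = right endpoint
mid   = sum(f(x(1:n) + dx/2)) * dx;    % x_i^* = midpoint

exact = exp(1) - 1;                    % F(b) - F(a) with F(x) = exp(x)
fprintf('left  = %.6f  error = %.2e\n', left,  abs(left  - exact));
fprintf('right = %.6f  error = %.2e\n', right, abs(right - exact));
fprintf('mid   = %.6f  error = %.2e\n', mid,   abs(mid   - exact));

% Mean Value Theorem for Integrals: f(c) equals the average value of f.
avg = mid/(b - a);                     % approximate average value
c   = log(avg);                        % solves exp(c) = avg
fprintf('average value = %.6f attained at c = %.6f\n', avg, c);
```

Because exp(x) is increasing, the left sum underestimates and the right sum overestimates the integral, while the midpoint choice is noticeably closer for the same n.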


1.3 Taylor and Maclaurin Series

• Taylor’s Theorem with Lagrange Remainder

Let $f$ have $n + 1$ continuous derivatives on $[a, b]$ (i.e., $f \in C^{n+1}[a, b]$), and $x_0 \in [a, b]$. Then, for every $x \in [a, b]$, there exists $\eta$ between $x_0$ and $x$ such that

$$f(x) = P_n(x) + E_n(x)$$

where

$$P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k,$$

and

$$E_n(x) = \frac{f^{(n+1)}(\eta)}{(n+1)!} (x - x_0)^{n+1}.$$

– The infinite series obtained by taking the limit of $P_n(x)$ as $n \to \infty$ is called the Taylor series for $f$ about $x_0$.

– If $\lim_{n \to \infty} E_n(x) = 0$ for all $x$ such that $|x - x_0| < R$, then $f$ is equal to the sum of its Taylor series for all such $x$.

– $P_n(x)$ is called the n’th degree Taylor polynomial for $f$ about $x_0$.

– $E_n(x)$ is called the error term or truncation error associated with $P_n(x)$. Notice, it is very similar to the first term neglected from the Taylor series in defining $P_n(x)$.

– In the case $x_0 = 0$, the Taylor polynomial is called a Maclaurin polynomial and the Taylor series is called the Maclaurin Series.

– $P_1(x)$ is called the local linear approximation for $f$ about $x_0$:

$$P_1(x) = f(x_0) + f'(x_0)(x - x_0).$$

• Another Form (letting x play the role of x0 and x+ h play the role of x from above.)

Suppose $f \in C^{n+1}[a, b]$, then for any $x$ and $x + h$ in $[a, b]$, there exists $\eta$ between $x$ and $x + h$ such that

$$f(x + h) = \sum_{k=0}^{n} \frac{f^{(k)}(x)}{k!} h^k + E_n(h)$$

where

$$E_n(h) = \frac{f^{(n+1)}(\eta)}{(n+1)!} h^{n+1}.$$

Here we say the error is $O(h^{n+1})$ (big O of h to the n+1).

• Definition of Big O: We say that $F(h)$ is of the order $h^n$ and write

$$F(h) = O(h^n)$$

if there exists a positive constant $C$ such that

$$|F(h)| \le C\,|h^n| \quad \text{for all sufficiently small } h.$$
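The Big O statement can be checked numerically. The sketch below is an illustration only and assumes a particular case: f(x) = exp(x) expanded about x = 0 with n = 2, so the error E_2(h) should be O(h^3). The step sizes and the constant C = 0.2 are choices made here, not values from the notes.

```matlab
% Sketch: check that E_2(h) = exp(h) - (1 + h + h^2/2) behaves like C*h^3.
h  = 0.1 ./ 2.^(0:4);                  % h = 0.1, 0.05, 0.025, 0.0125, 0.00625
E2 = exp(h) - (1 + h + h.^2/2);        % truncation error of the degree-2 expansion
ratio = E2 ./ h.^3;                    % should stay bounded as h shrinks

disp('        h           E2(h)      E2(h)/h^3');
disp([h' E2' ratio']);

% The ratios settle near 1/6, which is f'''(0)/3! for exp(x),
% so any constant C slightly above 1/6 works for these h.
C = 0.2;
fprintf('all |E2| <= C*h^3 with C = %.1f: %d\n', C, all(abs(E2) <= C*h.^3));
```

Halving h cuts the error by roughly a factor of 8, which is what an O(h^3) error term predicts.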


1. Example 1: Maclaurin Series for $e^x$. (MATLAB® file Example1.m)

(a) Derive the Maclaurin series for $e^x$. (This is the Taylor series with $x_0 = 0$.)

(b) Define $P_3(x)$.

(c) Plot $e^x$, $P_1(x)$, $P_2(x)$ and $P_3(x)$ over the interval $[0, 2]$. Also plot the associated errors for each polynomial.

(d) Find a bound on the error in approximating $e^1$ with the third degree Maclaurin polynomial.

(e) Approximate $e^1$ (using MATLAB®) and with $P_3(1)$. What is the error? Is it less than the error you determined above?

(f) Find the minimum degree Maclaurin polynomial for $e^x$ that ensures the error in approximating $e^1$ is less than $10^{-6}$.

Answers to Example 1:

(a) Derive the Maclaurin series for $e^x$. (This is the Taylor series with $x_0 = 0$.)

$$\begin{array}{ll}
f(x) = e^x & f(0) = 1 \\
f'(x) = e^x & f'(0) = 1 \\
f''(x) = e^x & f''(0) = 1 \\
f^{(3)}(x) = e^x & f^{(3)}(0) = 1 \\
f^{(n)}(x) = e^x & f^{(n)}(0) = 1
\end{array}$$

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = \sum_{n=0}^{\infty} \frac{x^n}{n!}.$$

(b) Define $P_3(x)$.

This is just the third degree polynomial from the series above.

$$P_3(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6}$$

(c) Plot $e^x$, $P_1(x)$, $P_2(x)$ and $P_3(x)$ over the interval $[0, 2]$. Also plot the associated errors for each polynomial.

MATLAB® file PlotExpMacError.m

(d) Find a bound on the error in approximating $e^1$ with the third degree Maclaurin polynomial.

$$|E_3(1)| = \left| \frac{f^{(4)}(\eta)}{4!}\, 1^4 \right| = \left| \frac{e^{\eta}}{24} \right| \quad \text{where } \eta \in [0, 1]$$

$$|E_3(1)| \le \frac{3}{24} \quad \text{because } e^{\eta} \le e^1 < 3$$

$$|E_3(1)| \le \frac{1}{8} = 0.125$$

We conclude that the absolute error is less than 0.125.

(e) Approximate $e^1$ (using MATLAB®) and with $P_3(1)$. What is the error? Is it less than the error you determined above?

• With MATLAB®: exp(1) = 2.71828

• With $P_3(1) \approx 2.66667$

• $|\text{Error}| = |e^1 - P_3(1)| \approx 0.05162$, which is less than the bound of 0.125.

(f) Find the minimum degree Maclaurin polynomial for $e^x$ that ensures the error in approximating $e^1$ is less than $10^{-6}$.

Repeating the argument from part (d) for a general degree gives $|E_n(1)| \le 3/(n+1)!$, so we need $3/(n+1)! < 10^{-6}$. This must be done by trial and error (with the aid of a computer if necessary) by increasing $n$.

    n     3/(n+1)!
    7     7.44 · 10^-5
    8     8.27 · 10^-6
    9     8.27 · 10^-7

When $n = 9$ the error is less than $10^{-6}$. So we need to use a polynomial of degree 9 or more. That’s pretty big!
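The trial-and-error search in part (f) is easy to hand to the computer. The lines below are a sketch under the same bound used above, 3/(n+1)!; the variable names and the search range 0 to 12 are choices made here and this is not the Example1.m file referenced in the notes.

```matlab
% Sketch: smallest n with 3/(n+1)! < 1e-6, then the actual error of P_n(1).
n     = 0:12;                          % candidate degrees (range is an assumption)
bound = 3 ./ factorial(n + 1);         % error bound for each degree
nmin  = n(find(bound < 1e-6, 1));      % first degree that meets the tolerance

P3   = sum(1 ./ factorial(0:3));       % P_3(1) = 1 + 1 + 1/2 + 1/6
Pn   = sum(1 ./ factorial(0:nmin));    % P_nmin(1)
err3 = abs(exp(1) - P3);               % compare with the bound 0.125 from part (d)
errn = abs(exp(1) - Pn);

fprintf('minimum degree   = %d\n', nmin);
fprintf('error of P3(1)   = %.5f\n', err3);
fprintf('error of P%d(1)   = %.2e\n', nmin, errn);
```

The actual error of the degree-9 polynomial comes out below the bound, as it must, since the bound replaces the unknown factor exp(eta) by 3.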


2. Example 2: Taylor and Maclaurin Series for $\cos(x)$. (MATLAB® file Example2.m)

(a) Derive the Maclaurin series for $\cos(x)$. (This is the Taylor series with $x_0 = 0$.)

(b) Define $P_2(x)$. Then plot $P_2(x)$ on $[-2, 2]$ with MATLAB®.

(c) Plot $\cos(x)$, $P_0(x)$, $P_2(x)$ and $P_4(x)$ over the interval $[-4, 4]$. Also plot the associated errors for each polynomial.

(d) Find a bound on the error in approximating $\cos(3)$ with the second degree Maclaurin polynomial.

(e) Approximate $\cos(3)$ (using MATLAB®) and with $P_2(3)$. What is the error? Is it less than the error you determined above?

(f) Find the minimum degree Maclaurin polynomial for $\cos(x)$ that ensures the error in approximating $\cos(3)$ is less than $10^{-6}$.

(g) Now derive the Taylor series for $\cos(x)$ about $x_0 = \pi$.

(h) Plot $\cos(x)$, $P_0(x)$, $P_2(x)$ and $P_4(x)$ from the Taylor Series over the interval $[0, 6]$. Also plot the associated errors for each polynomial.

(i) Approximate $\cos(3)$ with the Taylor polynomial $P_2$ and determine the error. It should be much less than the error from part (e).

Answers to Example 2:

(a) Derive the Maclaurin series for $\cos(x)$. (This is the Taylor series with $x_0 = 0$.)

$$\begin{array}{ll}
f(x) = \cos(x) & f(0) = 1 \\
f'(x) = -\sin(x) & f'(0) = 0 \\
f''(x) = -\cos(x) & f''(0) = -1 \\
f^{(3)}(x) = \sin(x) & f^{(3)}(0) = 0 \\
f^{(4)}(x) = \cos(x) & f^{(4)}(0) = 1 \\
f^{(n)}(x) & f^{(n)}(0) = 0 \text{ if } n \text{ is odd, } \pm 1 \text{ if } n \text{ is even}
\end{array}$$

$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}.$$

(b) Define $P_2(x)$.

This is just the second degree polynomial from the series above.

$$P_2(x) = 1 - \frac{x^2}{2}$$

(c) Plot $\cos(x)$, $P_0(x)$, $P_2(x)$ and $P_4(x)$ over the interval $[-4, 4]$. Also plot the associated errors for each polynomial.

MATLAB® file PlotCosMacError.m

(d) Find a bound on the error in approximating $\cos(3)$ with the second degree Maclaurin polynomial ($P_2(x)$).

$$|E_2(3)| = \left| \frac{f^{(3)}(\eta)}{3!}\, 3^3 \right| = \left| \frac{f^{(3)}(\eta)}{6}\, 27 \right| \le \frac{27}{6} = \frac{9}{2} \quad \text{because } |f^{(3)}(\eta)| \le 1.$$

We conclude that the absolute error is less than 9/2 = 4.5.

Even better: Since $P_2(x) = P_3(x)$ (the $x^3$ term of the series is zero), the error using $P_2$ is the same as the error using $P_3$. Now,

$$|E_3(3)| = \left| \frac{f^{(4)}(\eta)}{4!}\, 3^4 \right| = \left| \frac{f^{(4)}(\eta)}{24} \right| 81 \le \frac{81}{24} = \frac{27}{8} \quad \text{because } |f^{(4)}(\eta)| \le 1$$

and we conclude that the absolute error is less than 27/8 = 3.375.

(e) Approximate $\cos(3)$ (using MATLAB®) and with $P_2(3)$. What is the error? Is it less than the error you determined above?

• With MATLAB®: cos(3) = −0.9899

• With $P_2(3) = 1 - \frac{3^2}{2} = 1 - \frac{9}{2} = -3.5$

• $|\text{Error}| = |\cos(3) - P_2(3)| \approx 2.51$, which is less than the error bound of 3.375, but it’s still pretty big.

(f) Find the minimum degree Maclaurin polynomial for $\cos(x)$ that ensures the error in approximating $\cos(3)$ is less than $10^{-6}$.

Notice

$$|E_n(3)| = \left| \frac{f^{(n+1)}(\eta)}{(n+1)!}\, 3^{n+1} \right| \le \frac{3^{n+1}}{(n+1)!} \quad \text{because } |f^{(n+1)}(\eta)| \le 1$$

so we need

$$\frac{3^{n+1}}{(n+1)!} < 10^{-6}$$

This must be done by trial and error (with the aid of a computer if necessary) by increasing $n$.

    n      3^(n+1)/(n+1)!
    14     1.10 · 10^-5
    15     2.06 · 10^-6
    16     3.63 · 10^-7

When $n = 16$ the bound is less than $10^{-6}$. So we need to use a polynomial of degree 16 or more. That is huge!

(g) Now derive the Taylor series for $\cos(x)$ about $x_0 = \pi$.

$$\begin{array}{ll}
f(x) = \cos(x) & f(\pi) = -1 \\
f'(x) = -\sin(x) & f'(\pi) = 0 \\
f''(x) = -\cos(x) & f''(\pi) = 1 \\
f^{(3)}(x) = \sin(x) & f^{(3)}(\pi) = 0 \\
f^{(4)}(x) = \cos(x) & f^{(4)}(\pi) = -1 \\
f^{(n)}(x) & f^{(n)}(\pi) = 0 \text{ if } n \text{ is odd, } \pm 1 \text{ if } n \text{ is even}
\end{array}$$

$$\cos(x) = -1 + \frac{(x - \pi)^2}{2!} - \frac{(x - \pi)^4}{4!} + \dots = \sum_{n=0}^{\infty} (-1)^{n+1} \frac{(x - \pi)^{2n}}{(2n)!}.$$

(h) Plot $\cos(x)$, $P_0(x)$, $P_2(x)$ and $P_4(x)$ from the Taylor Series over the interval $[0, 6]$. Also plot the associated errors for each polynomial.

MATLAB® file PlotCosTaylorErrorAboutPie.m

(i) Approximate $\cos(3)$ with the Taylor polynomial $P_2$ and determine the error. It should be much less than the error from part (e).

• With MATLAB®: cos(3) = −0.9899

• With $P_2(3) \approx -0.98998$

• $|\text{Error}| = |\cos(3) - P_2(3)| \approx 0.00002$, which is much less than the error from part (e).
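Parts (e) and (i) can be compared side by side in a few lines. This is a sketch only, not the Example2.m file mentioned above; the variable names are made up for illustration. It also evaluates the fourth-derivative error bound about x0 = π, which applies because the cubic term of that expansion is zero, the same trick used in part (d).

```matlab
% Sketch: approximate cos(3) with degree-2 polynomials about 0 and about pi.
x = 3;
P2_mac = 1 - x^2/2;                    % Maclaurin polynomial (expansion about 0)
P2_pi  = -1 + (x - pi)^2/2;            % Taylor polynomial about x0 = pi

err_mac = abs(cos(x) - P2_mac);        % part (e)
err_pi  = abs(cos(x) - P2_pi);         % part (i)

% Error bound about pi using |f^(4)| <= 1 (valid since P2 = P3 there).
bound_pi = abs(x - pi)^4 / factorial(4);

fprintf('about 0 : P2(3) = %9.5f  error = %.5f\n', P2_mac, err_mac);
fprintf('about pi: P2(3) = %9.5f  error = %.2e  bound = %.2e\n', ...
        P2_pi, err_pi, bound_pi);
```

The point x = 3 is close to π and far from 0, so the factor (x − x0)^(n+1) in the error term is tiny for the expansion about π and large for the Maclaurin expansion.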


1.4 Types of Errors

• Truncation Error happens when you use a Taylor Polynomial to evaluate a function because you can’t keep infinitely many terms.

$$e^x \approx 1 + x + \frac{x^2}{2} + \frac{x^3}{6}$$

• Round-Off Error

When the true value of a number is not stored exactly by a computer’s representation (either by conversion to a nonterminating fraction in the computer’s base, or by input beyond its precision), the error associated with this imperfection is called Round-Off Error.

• Propagated Errors:

Suppose a real number $x$ is represented by the machine as $x^*$ where $x^* = x(1 + \delta)$ and $\delta$ (the initial relative error) is small.

Suppose I want to calculate $x^2$. In the machine I’ll get $(x^*)^2 = x^2(1 + \delta)^2$. Now I get an error $E = (x^*)^2 - x^2 = x^2\left[(1 + \delta)^2 - 1\right] \approx x^2(2\delta)$ which can be very big, especially if $x$ is big. If I look at relative error $= E/x^2$, I still get a relative error of $2\delta$. Notice, the relative error doubled. This is an example of an error being propagated.

• Computer Arithmetic

– machine numbers: Three parts: the sign, the fraction part called the mantissa or significand, and the exponent called the characteristic.

$$\pm\, .d_1 d_2 d_3 \dots d_p \times B^e$$

∗ $B$ = the base that is used, generally 2, 16, or 10.

∗ $d_i$ = the digits or bits, with $0 < d_1 < B$ and $0 \le d_i < B$ for $i \ne 1$.

∗ $p$ = the precision, the number of significand digits (bits when $B = 2$).

∗ $e$ = the characteristic (the exponent).

∗ In double precision format there are 64 bits used to store a floating point number: 1 sign bit, 11 exponent bits, and 52 stored significand bits. Because the leading bit of a normalized binary significand is always 1 and need not be stored, the precision is 53 bits. This gives about 15 to 17 significant decimal digits worth of precision.

– Min, Max, and Machine Zero in MATLAB® (default is double precision). These are calculated in the file machine zero.m.

∗ The maximum positive number is $\approx 10^{308}$.

∗ The smallest positive number is $\approx 10^{-308}$.

∗ The machine zero is $\approx 10^{-16}$. Simplest Definition: Machine zero ($\varepsilon$) is the largest number such that $1 + \varepsilon = 1$.

– Errors in conversion (round-off error)

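In base 2, $\frac{1}{10}$ is the repeating binary fraction $0.0\overline{0011}_2 = 0.000110011\ldots_2$. It will never be stored exactly in a base 2 machine.

In MATLAB: $\left| 10{,}000 - \sum_{k=1}^{100{,}000} \frac{1}{10} \right| \approx 2 \cdot 10^{-8}$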


– Relative and Absolute error

If $p^*$ is the machine’s approximation to $p$:

∗ absolute error $= |p - p^*|$

∗ relative error $= \dfrac{|p - p^*|}{|p|}$, provided $p \ne 0$.

– Arithmetic Accuracy (loss of significance)

Suppose we are using base 10, five digit (chopping) arithmetic.

    variable   actual value         computer value
    x          1/3                  0.33333 × 10^0
    y          5/7                  0.71428 × 10^0
    u          0.714251             0.71425 × 10^0
    v          98765.9              0.98765 × 10^5
    w          0.111111 × 10^-4     0.11111 × 10^-4

    computer     computer           actual             absolute          relative
    operation    result             value              error             error
    x + y        0.10476 × 10^1     22/21              0.190 × 10^-4     0.182 × 10^-4
    y − x        0.38095 × 10^0     8/21               0.238 × 10^-5     0.625 × 10^-5
    x · y        0.23809 × 10^0     5/21               0.524 × 10^-5     0.220 × 10^-4
    y ÷ x        0.21428 × 10^1     15/7               0.571 × 10^-4     0.267 × 10^-4
    y − u *      0.30000 × 10^-4    0.34714 × 10^-4    0.471 × 10^-5     0.136
    u + v **     0.98765 × 10^5     0.98766 × 10^5     1.614             0.163 × 10^-4

lessons:

* Subtracting similar values creates a large relative error. This is called a loss of significance.

** Adding two numbers of disparate size causes large absolute error. For example in MATLAB: 1,000,000 + 10^-11 = 1,000,000.
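The round-off effects described in this item can be reproduced directly. The following is a small sketch, not the machine zero.m file referenced above: it repeats the sum of 100,000 copies of 1/10, the 1,000,000 + 10^-11 example, and a test against MATLAB's built-in eps. The explicit for-loop is a deliberate choice so that the additions happen one after another.

```matlab
% Sketch: three round-off experiments in double precision.

% (1) 0.1 has no exact base-2 representation, so the error accumulates.
s = 0;
for k = 1:100000
    s = s + 0.1;                       % each addition is rounded
end
err_sum = abs(10000 - s);
fprintf('|10000 - sum| = %.3e\n', err_sum);

% (2) Adding numbers of disparate size: the small one is absorbed.
big = 1e6;  tiny = 1e-11;
absorbed = (big + tiny == big);
fprintf('1e6 + 1e-11 equals 1e6: %d\n', absorbed);

% (3) Machine zero: eps is the gap between 1 and the next larger double,
%     so adding half of it to 1 leaves 1 unchanged.
half_lost = (1 + eps/2 == 1);
full_seen = (1 + eps   >  1);
fprintf('eps = %.3e, 1 + eps/2 == 1: %d, 1 + eps > 1: %d\n', ...
        eps, half_lost, full_seen);
```

Note that MATLAB's eps (about 2.2 · 10^-16) is the spacing of doubles at 1, roughly twice the "largest number with 1 + ε = 1" used as the simplest definition above.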

• Error analysis of algorithms generally assumes perfect precision, i.e., no round-off error. However, it is there and is worth keeping in mind, especially if you are doing many sequential calculations where the output from one is input into another. In this way, errors can be propagated and your final answer can be garbage. The two items that follow show ways to avoid error build-up.


• Avoiding loss of significance in the quadratic formula

Recall the roots of $ax^2 + bx + c$ can be found by

$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \quad \text{and} \quad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a} \tag{1.1}$$

Suppose $b^2 - 4ac > 0$, $b > 0$ and $b \approx \sqrt{b^2 - 4ac}$. There is thus a good chance of $x_1$ containing a large relative error, and this can turn into a large error if $a$ is small. We resolve this by changing the form of $x_1$ by rationalizing the numerator

$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \left( \frac{-b - \sqrt{b^2 - 4ac}}{-b - \sqrt{b^2 - 4ac}} \right) = \frac{-2c}{b + \sqrt{b^2 - 4ac}} \tag{1.2}$$

With the above form for $x_1$ the subtraction of nearly equal numbers is eliminated. Therefore, it would be wise to calculate $x_1$ by equation (1.2) and $x_2$ by equation (1.1). A similar revision is needed if $b < 0$, in which case $x_2$ from equation (1.1) is at risk of losing precision due to subtracting similar numbers.
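To see the effect in double precision, the two formulas for x1 can be compared on a quadratic where b is almost equal to the square root of the discriminant. The coefficients a = 1, b = 10^8, c = 1 are an assumption chosen for illustration; for them the small root is very nearly −10^-8.

```matlab
% Sketch: x1 from equation (1.1) versus the rationalized form (1.2).
a = 1;  b = 1e8;  c = 1;               % illustrative coefficients, b > 0
d = sqrt(b^2 - 4*a*c);                 % nearly equal to b

x1_naive  = (-b + d)/(2*a);            % (1.1): subtracts nearly equal numbers
x1_stable = -2*c/(b + d);              % (1.2): no cancellation
x2        = (-b - d)/(2*a);            % (1.1) is fine for x2 when b > 0

x1_ref = -1e-8;                        % small root, accurate to about 1e-16 relative
fprintf('naive  x1 = %.16e  rel. error = %.2e\n', ...
        x1_naive,  abs(x1_naive  - x1_ref)/abs(x1_ref));
fprintf('stable x1 = %.16e  rel. error = %.2e\n', ...
        x1_stable, abs(x1_stable - x1_ref)/abs(x1_ref));
fprintf('       x2 = %.16e\n', x2);
```

The naive form loses most of its significant digits because −b and the square root cancel almost completely, while the rationalized form only adds two numbers of the same sign.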

• Avoiding round off error with nested multiplication

Even in the absence of subtracting similar numbers, there will be round-off error and the best way to minimize this is to reduce the number of computer operations. A good example of this is nested multiplication.

Consider the function $f(x) = x^3 - 6x^2 + 3x - 0.149$ evaluated at $x = 4.71$. We will assume three digit base 10 arithmetic for the machine computations.

– traditional evaluation: 5 multiplies and 3 adds/subtracts.

The relative error with either chopping or rounding is ≈ 0.04

– nested multiplication: 2 multiplies and 3 adds/subtracts.

$$f(x) = ((x - 6)x + 3)x - 0.149 \tag{1.3}$$

The relative error with chopping is ≈ 0.0093

The relative error with rounding is ≈ 0.0025


1.5 Nested Multiplication

The Remainder Theorem, Synthetic Division, Nested Multiplication

1. The Remainder Theorem:

Let $P_n(x)$ be an n’th degree polynomial:

$$P_n(x) = a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \dots + a_1 x + a_0$$

Dividing $P_n(x)$ by the linear factor $(x - x_1)$ yields

$$\frac{P_n(x)}{x - x_1} = Q_{n-1}(x) + \frac{R}{x - x_1}$$

rearranging:

$$P_n(x) = (x - x_1)\, Q_{n-1}(x) + R$$

where $R = P_n(x_1)$.

2. Synthetic Division

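$$\begin{array}{c|cccccc}
x_1 & a_n & a_{n-1} & a_{n-2} & \dots & a_1 & a_0 \\
 & & x_1 b_{n-1} & x_1 b_{n-2} & \dots & x_1 b_1 & x_1 b_0 \\
\hline
 & a_n & a_{n-1} + x_1 b_{n-1} & a_{n-2} + x_1 b_{n-2} & \dots & a_1 + x_1 b_1 & a_0 + x_1 b_0 \\
 & \downarrow & \downarrow & \downarrow & & \downarrow & \downarrow \\
 & b_{n-1} & b_{n-2} & b_{n-3} & \dots & b_0 & R = P_n(x_1)
\end{array}$$

and

$$Q_{n-1}(x) = b_{n-1} x^{n-1} + b_{n-2} x^{n-2} + \dots + b_1 x + b_0$$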

3. Nested Multiplication is the same as synthetic division. Notice

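$$\begin{aligned}
b_{n-1} &= a_n \\
b_{n-2} &= a_{n-1} + x_1 b_{n-1} = a_{n-1} + x_1 a_n \\
b_{n-3} &= a_{n-2} + x_1 b_{n-2} = a_{n-2} + x_1 (a_{n-1} + x_1 a_n) \\
&\ \ \vdots \\
b_0 &= a_1 + x_1 b_1 \\
R &= a_0 + x_1 b_0
\end{aligned}$$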

example: Let $P_5(x) = 9x^5 - 3x^4 + 2x^3 + 13x^2 - 7x + 2$. Evaluate $P_5(x_1)$ using nested multiplication:

$$P_5(x_1) = ((((9x_1 - 3)x_1 + 2)x_1 + 13)x_1 - 7)x_1 + 2$$

Using this to evaluate $P_5(x_1)$ reduces the number of multiplications from 5+4+3+2+1=15 using the original form to 5 multiplications using the nested form. The number of additions is unchanged.

Generalization: For an arbitrary n’th degree polynomial, nested form reduces the number of multiplications from $n(n + 1)/2$ to $n$. In other words, the number of multiplications goes from $O(n^2)$ to $O(n)$. This is a significant increase in efficiency.

Lesson: Always use nested multiplication to evaluate polynomials.
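As a quick check of the synthetic division recursion on the example polynomial, the sketch below evaluates P5 at x1 = 2. The evaluation point is an arbitrary choice made here, and the coefficients are stored in decreasing order (a_5 first) to match the synthetic division tableau and MATLAB's built-in polyval, which is used only as a cross-check.

```matlab
% Sketch: synthetic division / nested multiplication for
% P5(x) = 9x^5 - 3x^4 + 2x^3 + 13x^2 - 7x + 2 at x1 = 2 (arbitrary point).
a  = [9 -3 2 13 -7 2];                 % a_5, a_4, ..., a_0 (decreasing order)
x1 = 2;
n  = length(a) - 1;                    % degree of the polynomial

b = zeros(1, n);                       % b(1) = b_{n-1}, ..., b(n) = b_0
b(1) = a(1);                           % b_{n-1} = a_n
for k = 2:n
    b(k) = a(k) + x1*b(k-1);           % one multiplication per step
end
R = a(n+1) + x1*b(n);                  % remainder R = P5(x1)

fprintf('quotient coefficients b = [%s]\n', num2str(b));
fprintf('R = P5(%g) = %g, polyval gives %g\n', x1, R, polyval(a, x1));
```

The loop performs 4 multiplications and the remainder line one more, for the 5 multiplications counted above.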


A MATLAB program for the nested multiplication

Once in MATLAB, click on file - new - Mfile. A window will open and you should make the following function file and save it as nested.m in a directory where you can access it. MATLAB can access it anywhere, but you may have to change directories in MATLAB to get to the file location. Ask me if you need help doing this. Everything after a % (per line) is ignored by MATLAB; however, the first set of comment lines will be printed if you type help nested.

The first line, function R = nested(a,x), tells MATLAB that this is a function file, all variables remain local, it accepts two input variables, and returns R to the calling location.

• Input:

– a = $[a_0, a_1, a_2, \dots, a_n]$, a vector of the $n + 1$ coefficients of the polynomial in increasing order

– x = a real number at which to evaluate the polynomial

• Output: $P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$

    function R = nested(a,x)
    % Input  - a is a vector of coefficients of a polynomial P in increasing order
    %        - x is the value at which to evaluate P
    % Output - P(x)
    %
    n = length(a);                  % this finds the length of the a vector
    b = zeros(1,n);                 % this builds an n-vector of all zeros
    i = n-1;                        % initializes the index variable i
    while i > 0                     % starts the while loop
        b(i) = a(i+1) + x*b(i+1);   % computes b(i) values
        i = i - 1;                  % reduces the index by 1
    end                             % ends the while loop
    Q = b(1:n-1);                   % the coefficients of the quotient Q
    R = a(1) + x * b(1);            % the remainder term = P(x)

Go back to MATLAB and use this function to evaluate a few polynomials. Also type help nested. You should get the first set of comments. Notice that the line Q = ..., which gets the first n-1 terms in b and places them in Q, is unnecessary. However, if you replace the first line with function [Q,R] = nested(a,x), the function would then return two outputs: the first being a vector of the coefficients of $Q_{n-1}$ and the second being $P_n(x)$. This is how you return multiple elements from a function.

Now download the file PolyEval.m from the class website and put it in the same location as nested.m. Open PolyEval.m in MATLAB and click on the green arrow. This will run the program file PolyEval.m, which requests user input and calculates $P_n(x_1)$ using traditional and nested multiplication (calling the above function). It then returns the values of $P_n(x_1)$ calculated by the two different techniques and the difference.

There is a difference between a program file such as PolyEval.m and a function file such as nested.m.

• A program file is just a list of commands and MATLAB goes through these commands as if they had been typed in the console. Specifically, the variables are not local: they share the workspace of whoever runs the file, typically the console. Therefore, the command clear generally precedes all other commands in a program file so that previous results do not interfere with current calculations.

• A function file always starts with the line

function output_variable = function_name(input_variables)

or

function [output_variables] = function_name(input_variables)

Here, the function name must be the same as the filename (without the .m). The input variable(s) remain local to the function evaluations and only the output variable(s) are returned to the calling program. If you try to run a function file with the green arrow, you will probably get an error unless there are no input variables. You call the function file from the console or a calling program with the command

function_name(input_variables)

Chapter 2

Roots of Equations

In this chapter we seek to find the roots (or zeros) of a function. That is, we seek $x$ such that

$$f(x) = 0 \tag{2.1}$$

This is far more useful than it first appears. In prior classes all the way back to middle school you have been asked to solve for x or find x. What they didn’t always tell you was that you usually can’t solve for x with just a pencil and paper. There are only a few types of equations where you are certain to be able to do it. Fortunately for you, when you were asked to solve for x, it turned out that you could. That is misleading.

For example, if you were asked to solve for $x$ in

$$x^3 - \sin(x) = 3x + 5 \tag{2.2}$$

you could try and try without any luck. With numerical methods, you can solve for $x$. There is probably more than one solution. Numerical methods will allow you to find at least one of them if you start with a close enough first guess. You do this by casting the problem as a root-finding problem. For example, you can solve for $x$ in the above equation by defining the function

$$f_1(x) = x^3 - \sin(x) - 3x - 5 \quad \text{or} \quad f_2(x) = 3x + 5 - x^3 + \sin(x). \tag{2.3}$$

If you can find the roots of either of these functions you will have found the solutions to equation (2.2). In that case we also say that you have found the roots of equation (2.2).
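As a preview of what casting an equation as a root-finding problem buys you, the sketch below hands equation (2.2) to MATLAB's built-in root finder fzero. This is a black-box illustration, not one of the methods developed in this chapter; the bracket [2, 3] is an assumption found by checking that the function changes sign there.

```matlab
% Sketch: solve x^3 - sin(x) = 3x + 5 by finding a root of
% (left side) - (right side) with the built-in fzero.
f = @(x) (x.^3 - sin(x)) - (3*x + 5);  % zero exactly where (2.2) holds

fprintf('f(2) = %.3f, f(3) = %.3f\n', f(2), f(3));   % opposite signs

r = fzero(f, [2 3]);                   % root inside the bracket [2, 3]
fprintf('root r = %.10f\n', r);
fprintf('left side  = %.10f\n', r^3 - sin(r));
fprintf('right side = %.10f\n', 3*r + 5);
```

Multiplying the function by −1 gives the other choice of root-finding function and leaves the root unchanged.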

Definitions, Notation, and Terminology

• I will generally denote a root of a function $f(x)$ by the Greek letter $\alpha$ (alpha). I.e.,

$$f(\alpha) = 0 \tag{2.4}$$

• We will try to create a sequence of x-values

$$x_0, x_1, x_2, \dots \quad \text{denoted by } \{x_n\}_{n=0}^{\infty} \tag{2.5}$$

where $x_0$ is an initial guess and the sequence of x values converges to the root $\alpha$. That is, as $n$ gets larger, $x_n$ gets closer to $\alpha$. This notion of convergence is expressed as a limit statement like

$$\lim_{n \to \infty} x_n = \alpha \tag{2.6}$$

spoken: the limit as $n$ goes to infinity of $x_n$ equals $\alpha$. This notion of equality (or convergence) has a formal definition.


• The statement that the sequence $\{x_n\}_{n=0}^{\infty}$ converges to $\alpha$, or $\lim_{n \to \infty} x_n = \alpha$, is defined as follows.

For any $\varepsilon > 0$ (think small) there is an integer $N$ (think large) such that $|x_n - \alpha| < \varepsilon$ for all $n \ge N$.

In other words, for any small number ($\varepsilon$) you can find a large integer ($N$) so that all of the $x_n$’s after $x_N$ will be within $\varepsilon$ of $\alpha$. This means you can get arbitrarily close to $\alpha$ with your sequence.

• Error and Order of Convergence
The error at the n’th iteration is denoted by $e_n$:

$$e_n = x_n - \alpha \tag{2.7}$$

Assuming that the sequence $\{x_n\}_{n=0}^{\infty}$ converges to α, we define how quickly it converges by the order of convergence.

If there exists a number k and a positive constant C such that

$$\lim_{n\to\infty} \frac{|e_{n+1}|}{|e_n|^k} = C$$

then k is called the order of convergence.

This definition does not make the relationship between errors immediately obvious, so here is a slightly less formal definition.

If there exists a number k and a positive constant C such that $|e_{n+1}| \to C\,|e_n|^k$ as $n \to \infty$, then k is called the order of convergence.

This second version shows that if $e_n$ is small then the next error is like $C\,|e_n|^k$, which is even smaller than $|e_n|$ (provided k > 1 and C is not too large).

• Stopping Criterion
We will be generating the sequence of $x_n$ with a computer loop of some kind. As such we need to know when to stop because we do not want an infinite loop. So we introduce a few ways to stop the loop. Here, tol represents some tolerance (small number). You will usually want to use a combination of the following.

– If we can put a bound on the error at each step, we can stop when we know the error is less than some tolerance. We can’t always do this.

– If successive iterates get very close together, such as $|x_n - x_{n-1}| < \text{tol}$.

– If $|f(x_n)| < \text{tol}$.

– Emergency stop if the number of iterations exceeds some predetermined maximum.


2.1 The Bisection Method

The bisection method is a type of bracketing method. We know the root is between two numbers, so we keep shrinking the bracket, which must always contain the root.

1. This method is based on the Intermediate Value Theorem (IVT).

If f ∈ C[a, b] (continuous on [a, b]) and K is any number between f(a) and f(b), then there exists c ∈ (a, b) for which f(c) = K.

Specifically, if f ∈ C[a, b] and f(a) · f(b) < 0, there exists c ∈ (a, b) for which f(c) = 0.

2. The Bisection Method of Bolzano (Interval Halving)

• Start with an initial interval [a0, b0] where f(a0) and f(b0) have opposite signs. (graph or search)

By the Intermediate value theorem, there exists α ∈ (a0, b0) such that f(α) = 0.

• Let $x_0 = \dfrac{a_0 + b_0}{2}$.

– If $f(a_0)$ and $f(x_0)$ have opposite signs: zero is in $[a_0, x_0]$.

– If $f(x_0)$ and $f(b_0)$ have opposite signs: zero is in $[x_0, b_0]$.

– If $f(x_0) = 0$, then a zero occurs at $x_0$. (not likely to happen)

In either of the first two cases, the new interval is one half the width of the original. Label this new interval $[a_1, b_1]$ and do it again.

first interval is $[a_0, b_0]$ and $x_0 = (a_0 + b_0)/2$

second interval is $[a_1, b_1]$ and $x_1 = (a_1 + b_1)/2$, where $a_1 = x_0$ and $b_1 = b_0$, or $a_1 = a_0$ and $b_1 = x_0$

n’th interval is $[a_n, b_n]$ and $x_n = (a_n + b_n)/2$, where $a_n = x_{n-1}$ and $b_n = b_{n-1}$, or $a_n = a_{n-1}$ and $b_n = x_{n-1}$

$\{a_n\}_{n=0}^{\infty}$ is a nondecreasing sequence,

$\{b_n\}_{n=0}^{\infty}$ is a nonincreasing sequence,

and $a_n \le \alpha \le b_n$ for all n.

• Pseudocode

Input: f, a, b, max_error, min_f, max_its

check that f(a) · f(b) < 0
x = (a + b)/2;
Do
    If f(a) ∗ f(x) < 0
        b = x;
    else
        a = x;
    end
    x = (a + b)/2;
Until (|b − a| < max_error) or (|f(x)| < min_f) or (iterations > max_its)

Output: x (or an error message if too many iterations)
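The pseudocode above can be sketched in Python roughly as follows. This is one possible reading, not the only one: the default tolerances are values I picked for illustration, and I test |f(x)| before updating the bracket so that an exact zero at the midpoint is returned right away.

```python
def bisection(f, a, b, max_error=1e-10, min_f=1e-12, max_its=200):
    """Bisection on [a, b]; returns (approximate root, iterations used)."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for its in range(1, max_its + 1):
        x = (a + b) / 2            # midpoint of the current bracket
        fx = f(x)
        if abs(fx) < min_f:        # function value small enough
            return x, its
        if fa * fx < 0:            # sign change in [a, x]
            b = x
        else:                      # sign change in [x, b]
            a, fa = x, fx
        if abs(b - a) < max_error: # bracket small enough
            return (a + b) / 2, its
    raise RuntimeError("too many iterations")

root, its = bisection(lambda x: x * x - 2.0, 1.0, 2.0)
print(root, its)
```

Note that only one new function evaluation is needed per pass, since f(a) is carried along with a.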


• Theorem (Bisection Theorem) Assume $f \in C[a_0, b_0]$ and that $f(a_0)$ and $f(b_0)$ are nonzero and of opposite sign. Then there exists a number $\alpha \in (a_0, b_0)$ such that f(α) = 0 and the sequence of $x_n$’s generated by the bisection process satisfies

$$\lim_{n\to\infty} x_n = \alpha.$$

Proof

(a) Existence: Since $f \in C[a_0, b_0]$ and $f(a_0) \cdot f(b_0) < 0$, the IVT (with K = 0) gives an $\alpha \in (a_0, b_0)$ such that f(α) = 0.

(b) Convergence

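$$|e_0| = |x_0 - \alpha| \le \frac{b_0 - a_0}{2}$$

$$|e_1| = |x_1 - \alpha| \le \frac{b_1 - a_1}{2} = \frac{b_0 - a_0}{2^2}$$

... by induction

$$|e_n| = |x_n - \alpha| \le \frac{b_0 - a_0}{2^{n+1}}$$

So, $0 \le |x_n - \alpha| \le \dfrac{b_0 - a_0}{2^{n+1}}$ and $\lim_{n\to\infty} \dfrac{b_0 - a_0}{2^{n+1}} = 0$.

By the squeezing theorem,

$$\lim_{n\to\infty} |x_n - \alpha| = 0.$$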

note: This doesn’t really show that $f(\lim_{n\to\infty} x_n) = 0$. A more complete proof requires a little theorem about continuous functions. Specifically, if a function is continuous on a set containing a sequence $\{x_n\}_{n=0}^{\infty}$ where $\lim_{n\to\infty} x_n = \alpha$, then $\lim_{n\to\infty} f(x_n) = f(\alpha)$.

• Stopping Criterion: The nice thing about this method is that if you have a tolerance on the error, you can determine how many iterations to perform prior to starting the loop. For one of your homework problems you will need to do this.

Suppose you want to approximate α with an error ≤ ε. How many iterations should you perform? Put your answer in terms of ε and the original interval length $(b_0 - a_0)$.


2.2 The Method of False Position

This is very similar to the bisection method, only instead of choosing the midpoint for $x_n$ we find where the line connecting $(a_n, f(a_n))$ and $(b_n, f(b_n))$ hits the x-axis, and this is $x_n$.

1. The Method of False Position or regula falsi.

• Start with an initial interval [a0, b0] where f(a0) and f(b0) have opposite signs. (graph or search)

By the Intermediate value theorem, there exists α ∈ (a0, b0) such that f(α) = 0.

• Let x0 be the location where the line connecting (a0, f(a0)) and (b0, f(b0)) hits the x-axis.

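Writing $a = a_0$ and $b = b_0$:

$$\begin{aligned}
y - f(b) &= \frac{f(b) - f(a)}{b - a}\,(x - b) && \text{point-slope form of our line}\\
y &= \frac{f(b) - f(a)}{b - a}\,(x - b) + f(b) && \text{our line}\\
0 &= \frac{f(b) - f(a)}{b - a}\,(x - b) + f(b) && \text{set } y = 0\\
x &= b - f(b)\,\frac{b - a}{f(b) - f(a)} && \text{solving for } x\\
x &= b\,\frac{f(b) - f(a)}{f(b) - f(a)} - f(b)\,\frac{b - a}{f(b) - f(a)} && \text{common denominator}\\
x &= \frac{b\,f(b) - b\,f(a) - b\,f(b) + a\,f(b)}{f(b) - f(a)} && \text{distributing}\\
x &= \frac{a\,f(b) - b\,f(a)}{f(b) - f(a)} && \text{simplifying}
\end{aligned}$$

• Let $x_0 = \dfrac{a_0\,f(b_0) - b_0\,f(a_0)}{f(b_0) - f(a_0)}$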

– If f(a0) and f(x0) have opposite signs, the zero is in [a0, x0]. Set a1 = a0 and b1 = x0.

– If f(x0) and f(b0) have opposite signs, the zero is in [x0, b0]. Set a1 = x0 and b1 = b0.

– If f(x0) = 0, then a zero occurs at x0. (not likely to happen)

• Continue with this process.

Let $x_n = \dfrac{a_n\,f(b_n) - b_n\,f(a_n)}{f(b_n) - f(a_n)}$

– If f(an) and f(xn) have opposite signs, the zero is in [an, xn]. Set an+1 = an and bn+1 = xn.

– If f(xn) and f(bn) have opposite signs, the zero is in [xn, bn]. Set an+1 = xn and bn+1 = bn.


• Pseudocode

Input: f, a, b, min_f, max_its

check that f(a) · f(b) < 0
x = (a * f(b) - b * f(a))/(f(b) - f(a));
Do
    If f(a) ∗ f(x) < 0
        b = x;
    else
        a = x;
    end
    x = (a * f(b) - b * f(a))/(f(b) - f(a));
Until (|f(x)| < min_f) or (iterations > max_its)

Output: x (or an error message if too many iterations)
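As with bisection, here is a hedged Python sketch of the false position pseudocode. The default tolerance and iteration cap are illustrative assumptions. The only stopping test is on |f(x)|, since the bracket width cannot be trusted for this method.

```python
def false_position(f, a, b, min_f=1e-12, max_its=500):
    """Regula falsi on [a, b]; returns (approximate root, iterations used)."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for its in range(1, max_its + 1):
        # x-intercept of the line through (a, f(a)) and (b, f(b))
        x = (a * fb - b * fa) / (fb - fa)
        fx = f(x)
        if abs(fx) < min_f:
            return x, its
        if fa * fx < 0:       # zero is in [a, x]
            b, fb = x, fx
        else:                 # zero is in [x, b]
            a, fa = x, fx
    raise RuntimeError("too many iterations")

root, its = false_position(lambda x: x * x - 2.0, 1.0, 2.0)
print(root, its)
```

For this f (concave up on the whole bracket) the right endpoint b = 2 never moves, which is the behavior described under Problems.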

• Problems:

– While this method will produce a sequence converging to α, it can do so very slowly. This happens when the graph is concave up or concave down over $(a_n, b_n)$.

– Sometimes (if the graph is concave up or down over $(a_n, b_n)$) one endpoint never moves, so the bracket width never shrinks to zero. So we can’t use a stopping criterion on the interval width.

– So we have to use a stopping criterion such as $|f(x_n)| < \varepsilon$.


2.3 Newton’s Method

We seek α where f(α) = 0.

To use Newton’s method we need f to be continuous and twice differentiable on [a, b] ($f \in C^2[a, b]$) where α ∈ [a, b]. We also need an initial guess ($x_0$) that is close enough to α. We’ll define close enough later.

1. Description Using Tangent Lines

Assume we start with an initial guess $x_0$. We then follow the tangent line to the curve at $(x_0, f(x_0))$ to where it hits the x-axis (y = 0). The x-value of where it hits will be $x_1$, our next guess in the sequence.

$$\begin{aligned}
y - f(x_0) &= f'(x_0)\,(x - x_0) && \text{point-slope form of the tangent line}\\
y &= f'(x_0)\,(x - x_0) + f(x_0) && \text{the tangent line}\\
y &= f'(x_0)\,x + \big(f(x_0) - f'(x_0)\,x_0\big) && \text{tangent line } y = mx + b\\
0 &= f'(x_0)\,x + \big(f(x_0) - f'(x_0)\,x_0\big) && \text{set } y = 0 \text{, solve for } x\\
x &= \frac{1}{f'(x_0)}\,\big(f'(x_0)\,x_0 - f(x_0)\big) && \text{solved for } x\\
x &= x_0 - \frac{f(x_0)}{f'(x_0)} && \text{simplify}\\
x_1 &= x_0 - \frac{f(x_0)}{f'(x_0)} && \text{define } x_1
\end{aligned}$$

2. Description Using the Local Linear Approximation
We need an initial guess called $x_0$ and then find the local linear approximation to f(x) about $x = x_0$. This is the first degree Taylor Polynomial.

$$f(x) \approx f(x_0) + f'(x_0)(x - x_0)$$

Set the right-hand side equal to zero, solve for x, and call the solution $x_1$.

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$

This was much easier to derive.

3. Newton’s Iteration Function

The function $g(x) = x - \dfrac{f(x)}{f'(x)}$ is called the Newton Iteration Function. For Newton’s method:

$$x_{n+1} = g(x_n) = x_n - \frac{f(x_n)}{f'(x_n)} \tag{2.8}$$

4. Stopping Criterion: Keep iterating until $|f(x_n)| < \delta$ or $|x_n - x_{n-1}| < \varepsilon$. Of course, include a maximum number of iterations.


5. Pseudo-Code

Input: f, f′, x0, δ, ε, M

y = feval(f, x0)
for k = 1:M
    x1 = x0 − y / feval(f′, x0)
    err = abs(x1 − x0)
    x0 = x1
    y = feval(f, x0)
    if (err < ε) | (abs(y) < δ), break, end
end

Output: x0

if k = M and neither stopping test was met, display(’method failed’)
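A Python version of this pseudo-code might look like the sketch below. The parameter names `delta`, `eps`, and `max_its` stand in for δ, ε, and M, and the defaults are illustrative assumptions. I also added an explicit check for f′(x) = 0, which the pseudo-code leaves out.

```python
def newton(f, df, x0, delta=1e-12, eps=1e-12, max_its=50):
    """Newton's method; returns (approximate root, iterations used)."""
    y = f(x0)
    for k in range(1, max_its + 1):
        d = df(x0)
        if d == 0:
            raise ZeroDivisionError("f'(x) = 0, Newton step undefined")
        x1 = x0 - y / d          # follow the tangent line to the x-axis
        err = abs(x1 - x0)
        x0 = x1
        y = f(x0)
        if err < eps or abs(y) < delta:
            return x0, k
    raise RuntimeError("method failed: too many iterations")

root, its = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
print(root, its)
```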

6. Pathologies

• Usually Newton’s method converges to the nearest root. Sometimes it converges to a different root. This can happen if $f'(x_n) \approx 0$ at some point in the process.

• If f ′(xn) = 0 for some n, the method will fail.

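• The sequence diverges. For example, $f(x) = x e^{-x}$ with $x_0$ greater than where the maximum occurs (x = 1).

• The sequence fails to converge by going around and around forever (hard to do): $x_{\text{even}} = x_0$ and $x_{\text{odd}} = x_1$. For example,

$$f(x) = x(x - 1)(x + 1), \qquad x_0 = -\frac{1}{\sqrt{5}}, \quad x_1 = \frac{1}{\sqrt{5}}, \quad x_2 = -\frac{1}{\sqrt{5}}, \ \ldots$$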
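The cycling example is easy to check numerically. This short sketch (the function name `newton_step` is mine) takes two Newton steps for f(x) = x(x − 1)(x + 1) starting at −1/√5 and lands back where it started, up to rounding.

```python
import math

def newton_step(x):
    # one Newton step for f(x) = x(x - 1)(x + 1) = x^3 - x
    f = x * (x - 1.0) * (x + 1.0)
    df = 3.0 * x * x - 1.0
    return x - f / df

x0 = -1.0 / math.sqrt(5.0)
x1 = newton_step(x0)   # close to +1/sqrt(5)
x2 = newton_step(x1)   # close to -1/sqrt(5) again
print(x0, x1, x2)
```

In floating point the cycle is not stable, so after enough steps rounding error pushes the iterates off the cycle. That is part of why this pathology is hard to hit in practice.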


7. Convergence

• Will Newton’s method converge?

So long as $x_0$ is chosen close enough to an existing root α and $f \in C^2[a, b]$ where α ∈ [a, b], the sequence of approximations generated by the iteration (2.8) will converge to α. Proof is left for fixed point theory.

• Quadratic Convergence to a Simple Root

First, α is a simple root of f if f(α) = 0 but f′(α) ≠ 0. Once $x_n$ gets close enough to α, Newton’s method will create a sequence of $x_n$’s with order 2 convergence to α. This is called Quadratic Convergence.

$$|e_{n+1}| \to \frac{|f''(\alpha)|}{2\,|f'(\alpha)|}\,|e_n|^2 \tag{2.9}$$

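Proof
Invoke Taylor’s theorem to expand f(x) about $x_k$ to get

$$f(x) = f(x_k) + f'(x_k)(x - x_k) + \frac{1}{2} f''(\eta_k)(x - x_k)^2, \quad \eta_k \text{ is between } x \text{ and } x_k.$$

Set x = α (and use f(α) = 0) to obtain

$$0 = f(x_k) + f'(x_k)(\alpha - x_k) + \frac{1}{2} f''(\eta_k)(\alpha - x_k)^2, \quad \eta_k \text{ is between } \alpha \text{ and } x_k.$$

$$0 = \frac{f(x_k)}{f'(x_k)} + \alpha - x_k + \frac{f''(\eta_k)}{2 f'(x_k)}(\alpha - x_k)^2, \quad \text{because } f'(x_k) \ne 0$$

$$\left(x_k - \frac{f(x_k)}{f'(x_k)}\right) - \alpha = \frac{f''(\eta_k)}{2 f'(x_k)}(x_k - \alpha)^2$$

$$e_{k+1} = \frac{f''(\eta_k)}{2 f'(x_k)}\, e_k^2, \quad \eta_k \text{ is between } \alpha \text{ and } x_k.$$

Now since $x_k \to \alpha$ as $k \to \infty$, this implies that $\eta_k \to \alpha$ as well. Therefore, $f'(x_k) \to f'(\alpha)$ and $f''(\eta_k) \to f''(\alpha)$ and

$$|e_{k+1}| \to \frac{|f''(\alpha)|}{2\,|f'(\alpha)|}\,|e_k|^2 \quad \text{as in equation (2.9)}$$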

• Linear Convergence to a Root of Multiplicity M:

First, α is a root of multiplicity M of f if $f(\alpha) = f'(\alpha) = \cdots = f^{(M-1)}(\alpha) = 0$ but $f^{(M)}(\alpha) \ne 0$. Once $x_n$ gets close enough to α, Newton’s method will create a sequence of $x_n$’s with order 1 convergence to α. This is called Linear Convergence.

$$|e_{n+1}| \to \frac{M - 1}{M}\,|e_n| \tag{2.10}$$

Proof is left to fixed point theory


• Regaining Quadratic convergence at a root of multiplicity M

If f has a root of multiplicity M, then quadratic convergence can be regained by altering Newton’s iteration function to

$$x_{n+1} = x_n - M \cdot \frac{f(x_n)}{f'(x_n)} \tag{2.11}$$

Proof is left to fixed point theory.

But what is M, if we don’t know the solution?

If we suspect there is a multiple root (of unknown multiplicity) we know that

$$e_{n+1} \approx C \cdot e_n \quad \text{where} \quad C = \frac{M - 1}{M}$$

Define

$$\Delta x_n = x_{n+1} - x_n = (x_{n+1} - \alpha) - (x_n - \alpha) = e_{n+1} - e_n \approx e_n (C - 1)$$

then

$$\frac{\Delta x_{n+1}}{\Delta x_n} \to \frac{e_{n+1}(C - 1)}{e_n (C - 1)} = \frac{e_{n+1}}{e_n} = C = \frac{M - 1}{M}$$

and then solve this last equation for M

$$M = \frac{1}{1 - C}$$

and change the iteration function to

$$x_{n+1} = x_n - M\,\frac{f(x_n)}{f'(x_n)}$$
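The recipe above can be tried on a function with a known double root. In this sketch the test function f(x) = (x − 1)²(x + 2), the starting point, and the number of steps are all my own choices for illustration. The helper estimates C from the ratio of the last two steps and returns M = 1/(1 − C).

```python
def estimate_multiplicity(f, df, x0, n_steps=8):
    """Run plain Newton for n_steps and estimate M from
    C ~ (x_{n+2} - x_{n+1}) / (x_{n+1} - x_n), M = 1 / (1 - C)."""
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    c = (xs[-1] - xs[-2]) / (xs[-2] - xs[-3])
    return 1.0 / (1.0 - c), xs[-1]

f = lambda x: (x - 1.0) ** 2 * (x + 2.0)   # double root at x = 1
df = lambda x: 3.0 * x * x - 3.0           # derivative of x^3 - 3x + 2

m_est, x = estimate_multiplicity(f, df, 2.0)
M = round(m_est)

# two steps of the modified iteration x - M f(x)/f'(x)
for _ in range(2):
    d = df(x)
    if d == 0:        # landed exactly on the root; nothing left to do
        break
    x = x - M * f(x) / d
print(m_est, M, x)
```

The estimate is only as good as the linear-convergence approximation, so it should be taken from late iterates and rounded to the nearest integer.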

8. Numerical Evidence for Order of Convergence

Here are some ways to demonstrate a certain rate of convergence for a sequence converging to a root. For this demonstration, you need to know the value of the root (α).

$$e_n = x_n - \alpha$$

• linear: $|e_{n+1}| \to K|e_n|$ where K is a positive constant less than one.

• quadratic: The number of leading zeros in the error (the number of correct digits) roughly doubles each time.

• graphical evidence of order of convergence.
A plot of $\log|e_{n+1}|$ vs $\log|e_n|$ should have slope equal to the order of convergence.

$$\begin{aligned}
|e_{n+1}| &\to C\,|e_n|^k\\
\log(|e_{n+1}|) &\to \log(C\,|e_n|^k)\\
\log(|e_{n+1}|) &\to \log(C) + k \log(|e_n|) \qquad k = \text{slope}
\end{aligned}$$
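Instead of reading the slope off a plot, you can estimate it from three consecutive errors. The sketch below assumes Newton’s method on f(x) = x² − 2 as the test case (my choice), where the root √2 is known, and computes the slope between the last two points of the log-log data.

```python
import math

def estimate_order(errors):
    """Slope of log|e_{n+1}| against log|e_n| from the last three errors."""
    e0, e1, e2 = (abs(e) for e in errors[-3:])
    return (math.log(e2) - math.log(e1)) / (math.log(e1) - math.log(e0))

alpha = math.sqrt(2.0)          # known root of x^2 - 2
x = 1.0
errors = [x - alpha]
for _ in range(4):              # four Newton steps
    x = x - (x * x - 2.0) / (2.0 * x)
    errors.append(x - alpha)

k = estimate_order(errors)
print(k)   # should be near 2 for a simple root
```

Stopping after four steps matters here: one more step drives the error down to rounding level, and the logarithms stop being meaningful.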


9. An interesting example: Fast Inverse Square Root

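Here we try to find a fast way to determine $\dfrac{1}{\sqrt{x}}$, because dividing is harder than multiplying and square roots are also computationally expensive. We look at two methods to find a sequence of y values that converge to $\dfrac{1}{\sqrt{x}}$. I.e., solve the equation

$$y = \frac{1}{\sqrt{x}} \tag{2.12}$$

$$y^2 = \frac{1}{x} \tag{2.13}$$

• Solve $f(y) = \dfrac{1}{y^2} - x = 0$ for y. Notice, $f'(y) = \dfrac{-2}{y^3}$ and

$$y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)} \tag{2.14}$$

$$y_{n+1} = y_n - \frac{\dfrac{1}{y_n^2} - x}{\dfrac{-2}{y_n^3}} \tag{2.15}$$

$$y_{n+1} = y_n + \frac{1}{2}\left(y_n - y_n^3\,x\right) \tag{2.16}$$

$$y_{n+1} = y_n\left(1.5 - 0.5\,x\,y_n^2\right) \tag{2.17}$$

• Solve $f(y) = y^2 x - 1 = 0$ for y. Notice, $f'(y) = 2xy$ and

$$y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)} \tag{2.19}$$

$$y_{n+1} = y_n - \frac{x\,y_n^2 - 1}{2\,x\,y_n} \tag{2.20}$$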

• Which is easier? Which is faster?

http://en.wikipedia.org/wiki/Fast_inverse_square_root
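To see the first iteration, equation (2.17), in action, here is a small Python sketch. The function name, the starting guess y0 = 0.5 for x = 2, and the fixed iteration count are assumptions for illustration. Notice that the loop body uses only multiplication and subtraction, with no division and no square root.

```python
import math

def inv_sqrt(x, y0, n_iter=6):
    """Approximate 1/sqrt(x) with y <- y (1.5 - 0.5 x y^2)."""
    y = y0
    for _ in range(n_iter):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

y = inv_sqrt(2.0, 0.5)
print(y, 1.0 / math.sqrt(2.0))
```

The second iteration, equation (2.20), needs a division by $2 x y_n$ on every pass, which is one way to answer the "which is easier" question.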


2.4 The Secant Method

The secant method is just like Newton’s method, only instead of following the tangent line to the x-axis we follow the secant line based on the previous 2 iterates. Another way of thinking is that the derivative is approximated by

$$f'(x_n) \approx \frac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}} \tag{2.22}$$

so that the iteration looks like Newton’s method with equation (2.22) replacing the $f'(x_n)$ term.

$$x_{n+1} = x_n - \frac{f(x_n)}{\dfrac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}}} = x_n - \frac{f(x_n)}{f(x_n) - f(x_{n-1})}\,(x_n - x_{n-1}) \tag{2.23}$$

• Advantages

– You don’t have to determine or evaluate f ′(xn)

– Fewer function evaluations, because $f(x_{n-1})$ has already been computed.

• Disadvantages

– The order of convergence for the Secant Method is $\dfrac{1 + \sqrt{5}}{2} \approx 1.62$ (proof is left to fixed point theory).

$$|e_n| \to C\,|e_{n-1}|^{1.62}. \tag{2.24}$$

This is lower than the order of Newton’s Method.

– The Secant Method is subject to the same pathologies as Newton’s method.

PseudoCode for Secant Method

Input: x0 and x1

Set f0 = f(x0) and f1 = f(x1)
Repeat
    If |f0| < |f1|
        Swap x0 with x1 and f0 with f1
    Set x2 = x1 − f1 (x1 − x0)/(f1 − f0)
    Set x0 = x1 and f0 = f1
    Set x1 = x2 and f1 = f(x2)
Until |f1| ≤ ε1 or |x1 − x0| < ε2

note: The swapping ensures that |f(x1)| ≤ |f(x0)|. This ensures that the absolute value of the function is non-increasing.
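The pseudocode translates to Python almost line for line. In this sketch the tolerances `eps1` and `eps2` and the iteration cap `max_its` are illustrative defaults that I added; the pseudocode itself has no iteration limit.

```python
def secant(f, x0, x1, eps1=1e-12, eps2=1e-12, max_its=100):
    """Secant method; returns (approximate root, iterations used)."""
    f0, f1 = f(x0), f(x1)
    for its in range(1, max_its + 1):
        if abs(f0) < abs(f1):          # keep the better point in x1
            x0, x1 = x1, x0
            f0, f1 = f1, f0
        if f1 == f0:
            raise ZeroDivisionError("secant line is horizontal")
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)             # the only new function evaluation
        if abs(f1) <= eps1 or abs(x1 - x0) < eps2:
            return x1, its
    raise RuntimeError("too many iterations")

root, its = secant(lambda x: x * x - 2.0, 1.0, 2.0)
print(root, its)
```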


2.5 Fixed-Point Theory

Here we investigate the theory that allows us to conclude that Newton’s method converges and details how you can determine the order of convergence for various methods. These are from older notes where I used r instead of α to denote a root. In other words, we seek r such that f(r) = 0.

1. Fixed-Point Theory

• A solution to the equation

$$x = g(x)$$

is called a fixed point of the function g. Generally g is chosen from f in such a way that f(r) = 0 when r = g(r). For example, the root x = 3 of the equation

$$f(x) = x^2 - 2x - 3 = 0$$

is also a fixed point of

$$x = g(x) = \sqrt{2x + 3}$$

• The iteration

$$x_{n+1} = g(x_n)$$

is called a fixed point iteration and g is called the iteration function.

• Example: Find a root of $f(x) = x^2 - 2x - 3 = (x + 1)(x - 3) = 0$ using fixed point iteration.

iterate   x = g1(x) = √(2x + 3)   x = g2(x) = 3/(x − 2)   x = g3(x) = (x² − 3)/2
x0        4                       4                       4
x1        3.31662                 1.5                     6.5
x2        3.10375                 -6                      19.625
x3        3.03439                 -0.375                  191.070
x4        3.01144                 -1.263158
x5        3.00381                 -0.919355
x6                                -1.02762
x7                                -0.990876
x8                                -1.00305
xn →      monotonically → 3       oscillatory → -1        monotonically divergent

$g_1'(3) = \tfrac{1}{3}$, $g_2'(-1) = -\tfrac{1}{3}$, $g_3'(3) = 3$

[Figure: three panels showing the graphs of g1, g2, and g3.]
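The three columns of the example can be reproduced with a few lines of Python. The helper name `fixed_point` is mine; it simply applies g repeatedly and keeps every iterate.

```python
import math

def fixed_point(g, x0, n):
    """Return [x0, x1, ..., xn] for the iteration x_{k+1} = g(x_k)."""
    xs = [x0]
    for _ in range(n):
        xs.append(g(xs[-1]))
    return xs

g1 = lambda x: math.sqrt(2.0 * x + 3.0)
g2 = lambda x: 3.0 / (x - 2.0)
g3 = lambda x: (x * x - 3.0) / 2.0

xs1 = fixed_point(g1, 4.0, 5)   # creeps down toward 3
xs2 = fixed_point(g2, 4.0, 8)   # oscillates in toward -1
xs3 = fixed_point(g3, 4.0, 3)   # runs away
print(xs1[-1], xs2[-1], xs3[-1])

# ratio of successive errors for g1, to compare with g1'(3) = 1/3
ratio = abs(xs1[5] - 3.0) / abs(xs1[4] - 3.0)
print(ratio)
```

The same starting guess gives three different behaviors, and the size of |g′| at the fixed point is what separates them.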


• Existence and Uniqueness of a Fixed Point: Assume g ∈ C[a, b].

(a) If the range of the mapping y = g(x) satisfies y ∈ [a, b] for all x ∈ [a, b], then g has a fixed point in [a, b].
Proof: (1) If g(a) = a or g(b) = b, done. (2) Otherwise g(a) > a and g(b) < b. Let h(x) = x − g(x). Notice h ∈ C[a, b], h(a) < 0 and h(b) > 0. By the IVT, there exists r ∈ (a, b) such that h(r) = 0, and this implies that r is a fixed point of g.

(b) Furthermore, suppose g′(x) is defined over (a, b) and that a positive constant K < 1 exists with |g′(x)| ≤ K < 1 for all x ∈ (a, b). Then g has a unique fixed point r in [a, b].
Proof: Suppose, for contradiction, that $g(r_1) = r_1$ and $g(r_2) = r_2$ with $r_1 \ne r_2$. Then by the mean value theorem there exists η ∈ (a, b) such that

$$g'(\eta) = \frac{g(r_2) - g(r_1)}{r_2 - r_1} = \frac{r_2 - r_1}{r_2 - r_1} = 1,$$

which contradicts $|g'(\eta)| \le K < 1$.

• Convergence to the Fixed Point: If g(x) and g′(x) are continuous on an interval I = (r − δ, r + δ) about a root r of the equation x = g(x), and if |g′(x)| ≤ K < 1 on I, then the iteration $x_{n+1} = g(x_n)$ will converge to r, provided $x_0$ is chosen in the interval.

Proof

(a) If $x_n \in I$ then by MVT

$$x_{n+1} - r = g(x_n) - g(r) = \frac{g(x_n) - g(r)}{x_n - r}\,(x_n - r) = g'(\eta_n)(x_n - r)$$

where $\eta_n$ is between $x_n$ and r. Therefore

$$|x_{n+1} - r| = |g'(\eta_n)|\,|x_n - r| \le K \cdot |x_n - r| < |x_n - r| < \delta$$

and $x_{n+1} \in I$. This gives g(I) ⊂ I, so r is the unique fixed point in I. By induction all $x_n \in I$.

(b) Let $e_n = x_n - r$. By the argument above applied inductively

$$\begin{aligned}
|e_1| &\le K|e_0|\\
|e_2| &\le K|e_1| \le K^2|e_0|\\
&\ \ \vdots\\
|e_n| &\le K^n|e_0|
\end{aligned}$$

and since 0 ≤ K < 1, $K^n \to 0$ and thus $e_n \to 0$ and $x_n \to r$.

(c) Furthermore the convergence is order 1.

$$|e_{n+1}| = |g'(\eta_n)| \cdot |e_n| \quad \text{where } \eta_n \text{ is between } x_n \text{ and } r$$

therefore $\eta_n \to r$ and

$$|e_{n+1}| \to |g'(r)| \cdot |e_n|$$

and order 1 convergence is achieved.

• Example from $g_1(x) = \sqrt{2x + 3}$, $x_0 = 4$ and $x_n \to 3$:

$|e_5| = .00381$, $|e_4| = .01144$, $\dfrac{|e_5|}{|e_4|} \approx .33304$, and $g_1'(3) = \dfrac{1}{3}$


2. Order of Convergence for Newton’s Method: Expressing Newton’s iterative function as a fixed point function:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = g(x_n)$$

and

$$g'(x) = 1 - \frac{f'(x) \cdot f'(x) - f(x) \cdot f''(x)}{[f'(x)]^2} = \frac{f(x) \cdot f''(x)}{[f'(x)]^2}$$

Order of convergence for Simple Roots (f(r) = 0 and f′(r) ≠ 0) is quadratic.

• Notice, if f(r) = 0, f′(r) ≠ 0 and f′′(r) exists then g(r) = r and g′(r) = 0.

Hence there is an interval I about r where g′(x) is defined and |g′(x)| < 1 for all x ∈ I. Then, if we choose $x_0 \in I$, fixed point theory guarantees $x_n \to r$ (linearly).

• Question Why is the convergence better than linear as fixed point theory suggests?

• Answer Because g′(r) = 0.

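• Proof of Quadratic Convergence: let $e_n = x_n - r$.

$$x_{n+1} - r = g(x_n) - g(r)$$

Expand $g(x_n)$ in a Taylor series in terms of $(x_n - r)$:

$$g(x_n) = g(r) + g'(r)(x_n - r) + \frac{g''(\eta)}{2}(x_n - r)^2$$

where η is between $x_n$ and r. Because g′(r) = 0,

$$\begin{aligned}
g(x_n) &= g(r) + \frac{g''(\eta)}{2}(x_n - r)^2\\
g(x_n) - g(r) &= \frac{g''(\eta)}{2}(x_n - r)^2\\
x_{n+1} - r &= \frac{g''(\eta)}{2}(x_n - r)^2\\
e_{n+1} &= \frac{g''(\eta)}{2}\,e_n^2\\
|e_{n+1}| &= \left|\frac{g''(\eta)}{2}\right| |e_n|^2
\end{aligned}$$

Since η → r as $x_n \to r$,

$$|e_{n+1}| \to \left|\frac{g''(r)}{2}\right| |e_n|^2 = K \cdot |e_n|^2 \quad \text{as } n \to \infty$$

and the convergence is quadratic.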

Note: We showed earlier that Newton’s method converged quadratically as $|e_{n+1}| \to \left|\dfrac{f''(r)}{2f'(r)}\right| |e_n|^2$ and above we showed that $|e_{n+1}| \to \left|\dfrac{g''(r)}{2}\right| |e_n|^2$.

Show that $g''(r) = \dfrac{f''(r)}{f'(r)}$ provided that f′′′(x) exists at x = r.

We showed earlier that Newton’s method converges quadratically without this restriction, so that proof is better. However, the above proof illustrates how quadratic convergence is achieved when g′(r) = 0.


Newton’s Method: Order of convergence for Multiple Roots is linear.

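Example: Suppose f has a double root at r, so f(r) = f′(r) = 0 but f′′(r) ≠ 0. Look at g′(r):

$$\begin{aligned}
g'(r) &= \lim_{x\to r} \frac{f(x) \cdot f''(x)}{[f'(x)]^2} && \text{indeterminate form of type 0/0}\\
&= \lim_{x\to r} \frac{f'(x) f''(x) + f(x) f'''(x)}{2 f'(x) f''(x)} && \text{by L'Hopital's Rule}\\
&= \frac{1}{2} + \lim_{x\to r} \frac{f(x) f'''(x)}{2 f'(x) f''(x)} && \text{indeterminate form of type 0/0}\\
&= \frac{1}{2} + \lim_{x\to r} \frac{f(x) f^{(4)}(x) + f'(x) f'''(x)}{2\left(f'(x) f'''(x) + [f''(x)]^2\right)} && \text{by L'Hopital's Rule}\\
&= \frac{1}{2} + \frac{0 + 0}{2\left(0 + [f''(r)]^2\right)} && \text{with } [f''(r)]^2 \ne 0\\
&= \frac{1}{2}
\end{aligned}$$

In general, if r is a root of multiplicity M, $g'(r) = \dfrac{M - 1}{M}$ and $|e_{n+1}| \to \dfrac{M - 1}{M}|e_n|$ from a similar analysis as that for quadratic convergence using g(x).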

Regaining Quadratic Convergence

If $g(x) = x - f(x)/f'(x)$ then

$$g'(r) = 1 - \left(f/f'\right)'(r) = \frac{M - 1}{M} = 1 - \frac{1}{M}$$

implies

$$\left(f/f'\right)'(r) = 1/M.$$

So if we let

$$g(x) = x - M f(x)/f'(x)$$

then

$$g'(x) = 1 - M \left(f/f'\right)'(x)$$

and

$$g'(r) = 1 - M \cdot \frac{1}{M} = 0$$

and quadratic convergence is retrieved.

Therefore, for a root of multiplicity M use

$$x_{n+1} = x_n - M\,\frac{f(x_n)}{f'(x_n)} \quad \text{for quadratic convergence}$$


3. Secant Method Convergence: This is more difficult than proving convergence for the Regula Falsi Method in that g remains a function of $x_n$ and $x_{n-1}$.

$$x_{n+1} = x_n - \frac{f(x_n)}{f(x_n) - f(x_{n-1})}\,(x_n - x_{n-1})$$

Therefore we need to examine the Taylor series for g as a function of two variables. It turns out that the equivalent to g′(r) does indeed equal zero and the method is better than linear, but

$$|e_{n+1}| \to C \cdot |e_n \cdot e_{n-1}| \quad \text{where} \quad C = \frac{1}{2}\left|\frac{f''(r)}{f'(r)}\right| \tag{2.25}$$

which is not $|e_n|^2$. It turns out the secant method has order of convergence

$$\frac{1 + \sqrt{5}}{2} \approx 1.62$$

This is better than linear but not as good as quadratic (super-linear).

Proof Assume a relationship of the form:

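$$|e_{n+1}| \sim A|e_n|^p,\ A > 0 \quad \Rightarrow \quad |e_n| \sim A|e_{n-1}|^p \quad \text{and} \quad |e_{n-1}| \sim \left(A^{-1}|e_n|\right)^{1/p}$$

and plug these into equation (2.25).

$$A|e_n|^p \sim C\,|e_n|\,A^{-1/p}\,|e_n|^{1/p}$$

$$A^{(1 + 1/p)}\,C^{-1} \sim |e_n|^{(1 - p + 1/p)}$$

Now the term on the right must be constant and so

$$\begin{aligned}
1 - p + \frac{1}{p} &= 0\\
p - p^2 + 1 &= 0\\
p^2 - p - 1 &= 0\\
p &= \frac{1 \pm \sqrt{5}}{2} && \text{choose } +\\
p &= \frac{1 + \sqrt{5}}{2} \approx 1.62
\end{aligned}$$

Now you can easily solve for A:

$$\begin{aligned}
A^{(1 + 1/p)}\,C^{-1} &= 1\\
C &= A^{(1 + 1/p)} = A^p && \text{since } 1 + 1/p = p\\
C^{1/p} &= A\\
C^{(p - 1)} &= A && \text{since } 1/p = p - 1
\end{aligned}$$

So

$$|e_{n+1}| \sim A|e_n|^p, \quad \text{where} \quad A = \left|\frac{f''(r)}{2f'(r)}\right|^{(p - 1)}$$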


4. Bisection Method Convergence: The error bound converges linearly to zero:

$$E_{n+1} = \frac{1}{2} E_n$$

where $E_n$ is the maximum possible error at the n’th iteration. This does not mean the actual error is reduced at every step. For example, if r ≈ (b + a)/2 then $x_0$ could be a lot closer to the actual solution than $x_n$ for many values of n.

5. Summary We have not explicitly gone over all of these methods. They are described on the followingpages.

Method             Special Considerations   Relation between successive error terms

Bisection          Bracket Exists           |E_{n+1}| = (1/2)|E_n|
Regula Falsi       Bracket Exists           |e_{n+1}| ≈ A|e_n|
Secant Method      Simple Root              |e_{n+1}| ≈ A|e_n|^1.62
Muller’s Method    Simple Root              |e_{n+1}| ≈ A|e_n|^1.85
Newton’s Method    Simple Root              |e_{n+1}| ≈ A|e_n|^2
Newton’s Method    Multiple Root            |e_{n+1}| ≈ A|e_n|
Secant Method      Multiple Root            |e_{n+1}| ≈ A|e_n|

Accelerated
Aitken’s           Linear Conv.             |e_{n+1}| ≈ A|e_n|^q, 1 < q ≤ 2
Steffensen         Fixed Point              |e_{n+1}| ≈ A|e_n|^2
Newton             Multiple Root            |e_{n+1}| ≈ A|e_n|^2

Notes

• En is an upper-bound on en

• Regula falsi is a combination of the secant and bisection methods.

• Muller’s Method solves the local quadratic polynomial.

• Aitken’s method accelerates any linearly convergent sequence. Described on the next page.

• Steffensen’s method is an application of Aitken’s method to a fixed-point algorithm.


Accelerating linear convergence with Aitken’s Method: Assume that the sequence $\{x_n\}_{n=0}^{\infty}$ converges linearly to the value r and that $x_n - r \ne 0$ for all n ≥ 0. If there exists a real number A where 0 < |A| < 1 such that

$$\lim_{n\to\infty} \frac{r - x_{n+1}}{r - x_n} = A,$$

then the sequence $\{q_n\}_{n=0}^{\infty}$ defined by

$$q_n = x_n - \frac{(x_{n+1} - x_n)^2}{x_{n+2} - 2x_{n+1} + x_n} \tag{2.26}$$

converges to r faster than $\{x_n\}_{n=0}^{\infty}$, in the sense that

$$\lim_{n\to\infty} \left|\frac{r - q_n}{r - x_n}\right| = 0 \tag{2.27}$$

• Derivation of Equation (2.26)

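$$\frac{r - x_{n+1}}{r - x_n} \approx A \quad \text{and} \quad \frac{r - x_{n+2}}{r - x_{n+1}} \approx A \quad \text{for } n \text{ large}$$

therefore

$$(r - x_{n+1})^2 \approx (r - x_n)(r - x_{n+2})$$

and solving for r you get

$$r \approx \frac{x_{n+2}\,x_n - x_{n+1}^2}{x_{n+2} - 2x_{n+1} + x_n} = q_n$$

which is equivalent to the right hand side of (2.26).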

• Proof of Equation (2.27)

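$$\begin{aligned}
\lim_{n\to\infty} \frac{r - q_n}{r - x_n}
&= \lim_{n\to\infty} \frac{r - \left(x_n - \dfrac{(x_{n+1} - x_n)^2}{x_{n+2} - 2x_{n+1} + x_n}\right)}{r - x_n}\\
&= 1 + \lim_{n\to\infty} \frac{[x_{n+1} - x_n]^2}{(r - x_n)[x_{n+2} - 2x_{n+1} + x_n]}\\
&= 1 + \lim_{n\to\infty} \frac{[(x_{n+1} - r) - (x_n - r)]^2}{(r - x_n)[(x_{n+2} - r) - 2(x_{n+1} - r) + (x_n - r)]} \cdot \frac{1/(r - x_{n+1})^2}{1/(r - x_{n+1})^2}\\
&= 1 + \frac{\left[-1 + \frac{1}{A}\right]^2}{\frac{1}{A}\left[-\frac{1}{A} + 2 - A\right]}
= 1 + \frac{\frac{1}{A^2} - \frac{2}{A} + 1}{-\frac{1}{A^2} + \frac{2}{A} - 1}\\
&= 1 + \frac{1 - 2A + A^2}{-1 + 2A - A^2}
= 1 - \frac{A^2 - 2A + 1}{A^2 - 2A + 1}\\
&= 0 \quad \text{if } A^2 - 2A + 1 = (A - 1)^2 \ne 0, \text{ i.e. } A \ne 1
\end{aligned}$$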
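Equation (2.26) is short enough to try directly. The sketch below applies it to the linearly convergent fixed-point iterates of g(x) = √(2x + 3) from the earlier example (that pairing is my choice, and `aitken` is a name I made up). Each $q_n$ is built from $x_n$, $x_{n+1}$, $x_{n+2}$.

```python
import math

def aitken(xs):
    """Aitken's delta-squared sequence q_n from a list of iterates x_n."""
    qs = []
    for n in range(len(xs) - 2):
        denom = xs[n + 2] - 2.0 * xs[n + 1] + xs[n]
        if denom == 0:      # sequence has stalled; stop accelerating
            break
        qs.append(xs[n] - (xs[n + 1] - xs[n]) ** 2 / denom)
    return qs

# x_{n+1} = sqrt(2 x_n + 3), x_0 = 4, converging linearly to r = 3
xs = [4.0]
for _ in range(6):
    xs.append(math.sqrt(2.0 * xs[-1] + 3.0))

qs = aitken(xs)
for n, q in enumerate(qs):
    print(n, abs(xs[n + 2] - 3.0), abs(q - 3.0))
```

In each row the accelerated value is closer to 3 than the newest iterate that went into it.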


• Steffensen’s Method

If Aitken acceleration is applied after every second iteration of a fixed-point iteration, and the improved estimate is used to begin each set of iterates, this method is known as Steffensen’s method.

$x_1 = g(x_0)$, $x_2 = g(x_1)$, $p_1 = q_0$

set $x_0 = p_1$ and $x_1 = g(x_0)$, $x_2 = g(x_1)$, $p_2 = q_0$

set $x_0 = p_2$ and repeat to get $p_3, p_4, \ldots$

This generates a sequence $\{p_n\}_{n=1}^{\infty}$ that converges quadratically:

$$|r - p_{n+1}| \to K \cdot |r - p_n|^2$$

• Example Steffensen’s method

Considerxn+1 = g(xn) = xn + f(xn)

Notice a zero of f(x) is associated with a fixed point of g. However, g′(r) = 1 + f′(r), therefore it is unlikely that this iteration will converge because |g′(r)| may be ≥ 1. If it does, it will most likely converge linearly because g′(r) probably does not equal zero. There is a theorem that states if g′(r) ≠ 1 and $g \in C^3[a, b]$ then Steffensen’s method will converge quadratically if $x_0$ is chosen close enough to r.

Applying Aitken’s accelerator via Steffensen’s method we start with x0 then

x1 = x0 + f(x0) (2.28)

x2 = x1 + f(x1) = x0 + f(x0) + f(x0 + f(x0)) (2.29)

plugging these into equation (2.26) we get:

$$q_0 = x_0 - \frac{(x_1 - x_0)^2}{x_0 + f(x_0) + f(x_0 + f(x_0)) - 2\big(x_0 + f(x_0)\big) + x_0}$$

setting $p_1 = q_0$ and simplifying we get

$$p_1 = x_0 - \frac{[f(x_0)]^2}{f(x_0 + f(x_0)) - f(x_0)}.$$

We can do away with the p and q notation to create the sequence

$$x_{n+1} = x_n - \frac{[f(x_n)]^2}{f(x_n + f(x_n)) - f(x_n)} = G(x_n)$$

and G defines our new fixed-point iteration.
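The iteration $x_{n+1} = G(x_n)$ needs no derivative, only two evaluations of f per step. Here is a hedged Python sketch; the tolerance, iteration cap, and the test function x² − 2 with $x_0 = 1.5$ are my own choices for illustration.

```python
def steffensen(f, x0, tol=1e-12, max_its=50):
    """Iterate x <- x - f(x)^2 / (f(x + f(x)) - f(x)).
    Returns (approximate root, number of passes through the loop)."""
    x = x0
    for its in range(1, max_its + 1):
        fx = f(x)
        if abs(fx) < tol:
            return x, its
        denom = f(x + fx) - fx
        if denom == 0:
            raise ZeroDivisionError("f(x + f(x)) equals f(x)")
        x = x - fx * fx / denom
    raise RuntimeError("too many iterations")

root, its = steffensen(lambda x: x * x - 2.0, 1.5)
print(root, its)
```

Like Newton’s method, this is only expected to work when $x_0$ is close enough to a simple root.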


Does our new G(x) converge quadratically? Check G′(r):

Notice

$$G'(x) = 1 - \frac{2 f(x) f'(x)}{f(x + f(x)) - f(x)} + \frac{[f(x)]^2\,\big(f'(x + f(x))(1 + f'(x)) - f'(x)\big)}{\big(f(x + f(x)) - f(x)\big)^2}$$

$$G'(r) = 1 - 2f'(r) \lim_{x\to r} \frac{f(x)}{f(x + f(x)) - f(x)} + \big(f'(r)(1 + f'(r)) - f'(r)\big) \lim_{x\to r} \frac{[f(x)]^2}{[f(x + f(x)) - f(x)]^2}$$

because f(r) = 0. Simplifying and carrying the limit inside the continuous function we get

$$G'(r) = 1 - 2f'(r) \left(\lim_{x\to r} \frac{f(x)}{f(x + f(x)) - f(x)}\right) + [f'(r)]^2 \left(\lim_{x\to r} \frac{f(x)}{f(x + f(x)) - f(x)}\right)^2.$$

Now evaluating the limit inside the large parentheses:

$$\begin{aligned}
\lim_{x\to r} \frac{f(x)}{f(x + f(x)) - f(x)}
&= \lim_{x\to r} \frac{f'(x)}{f'(x + f(x))(1 + f'(x)) - f'(x)} && \text{L'Hopital}\\
&= \frac{f'(r)}{f'(r)(1 + f'(r)) - f'(r)}\\
&= \frac{1}{f'(r)}
\end{aligned}$$

and putting this result back into G′(r) yields

$$G'(r) = 1 - 2 + 1 = 0$$

and this new iteration converges quadratically provided r is a simple root of f .

Notes:

Aitken’s method can be applied to any linearly convergent sequence and improves the order of convergence. When applied to a fixed-point algorithm in the fashion described above, it is called Steffensen’s method and results in quadratic convergence.

Chapter 3

Solving Linear Systems

Often in mathematics we are concerned with solving equations. Specifically, if we want to solve an equation for a particular variable we are guaranteed to be able to do it provided the equation is linear in that variable. For example, consider the equation

$$x^5 + \sin x - 72y = 105 - 12y. \tag{3.1}$$

This equation is linear in y and it would be easy to solve this equation for y. However, the equation is not linear in x and hence solving this equation for x would be far more difficult and, in this case, impossible to solve algebraically. In order to solve this equation for x we would have to use a numerical method. In this chapter, we will focus on solving equations for variables that appear linearly.

Before you get too excited about this simple task, the situation is more complicated than you might first think. For example, consider solving the simple equation ax = b for x. If a ≠ 0, then x = b/a. If a = 0 there are two options. What are they? This same situation occurs with systems of linear equations, but it is harder to tell whether we have one, infinitely many, or no solutions. For example, use whatever method you can to find the solutions to the following systems of equations.

(1)  x + y = 4
     3x + y = 4

(2)  x + y = 4
     3x + 3y = 12

(3)  x + y = 4
     3x + 3y = 10

There are many ways you could have solved the previous systems of equations. Some ways are better than others depending on the equations involved. In this chapter we look at ways to have a computer solve a system of equations. We start with Naive Gaussian Elimination and move on to more sophisticated methods after that. The problem with Naive Gaussian Elimination is that it doesn’t always work. The previous systems of equations were two-by-two systems. This means there were two equations and two unknowns (x and y). We will generally be dealing with larger systems. Since we don’t want to run out of letters, we will just order our variables with subindexes. For now, we will look to solve n x n systems of equations written in the form

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n &= b_2\\
&\ \ \vdots\\
a_{n1}x_1 + a_{n2}x_2 + a_{n3}x_3 + \cdots + a_{nn}x_n &= b_n
\end{aligned} \tag{3.2}$$

Before we get started we need to describe matrices and matrix operations.


3.1 Matrices & Operations

1. Some Definitions about Matrices

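(a) An m x n matrix is a rectangular array of numbers with m rows and n columns.

$$A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n}\\
\vdots & \vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix} = [a_{ij}]$$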

(b) The dimension of A is m by n or m x n.

(c) Two matrices are equal if they have the same dimension and corresponding entries are equal.

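(d) The matrix of all zeros is called the zero matrix, denoted 0. For example,

$$0 = \begin{bmatrix} 0 & 0\\ 0 & 0 \end{bmatrix} \qquad
0 = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix} \qquad
0 = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}$$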

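(e) The n x n matrix with ones along the diagonal and zeros elsewhere is called the identity matrix and is denoted $I_n$. For example,

$$I_2 = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \qquad
I_3 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix} \qquad
I_4 = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}$$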

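(f) If $A = [a_{ij}]$, the transpose of A, denoted $A^T$, is defined by $a^T_{ij} = a_{ji}$ (switch rows and columns).

For example,

$$\text{if } A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{bmatrix} \text{ then } A^T = \begin{bmatrix} 1 & 4 & 7\\ 2 & 5 & 8\\ 3 & 6 & 9 \end{bmatrix}$$

(g) If $A^T = A$, then A is called symmetric. For example, $A = \begin{bmatrix} 1 & 2 & 3\\ 2 & 5 & 6\\ 3 & 6 & 9 \end{bmatrix}$ is symmetric.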

(h) Other types of Matrices:

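$$\underset{\text{Upper Triangular}}{\begin{bmatrix} x & x & x & x\\ 0 & x & x & x\\ 0 & 0 & x & x\\ 0 & 0 & 0 & x \end{bmatrix}} \qquad
\underset{\text{Lower Triangular}}{\begin{bmatrix} x & 0 & 0 & 0\\ x & x & 0 & 0\\ x & x & x & 0\\ x & x & x & x \end{bmatrix}} \qquad
\underset{\text{Diagonal}}{\begin{bmatrix} x & 0 & 0 & 0\\ 0 & x & 0 & 0\\ 0 & 0 & x & 0\\ 0 & 0 & 0 & x \end{bmatrix}}$$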


2. Some Operations on Matrices

(a) Scalar Multiplication: When a matrix is multiplied by a number (scalar) this is called scalar multiplication. Each term in the matrix is multiplied by the scalar.

$$\text{if } A = \begin{bmatrix} 1 & -1 & 1\\ 0 & 1 & 6 \end{bmatrix} \text{ then } 3A = \begin{bmatrix} 3 & -3 & 3\\ 0 & 3 & 18 \end{bmatrix}$$

(b) Matrix Addition: Matrices of the same dimensions are added term by term.

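$$\text{if } A = \begin{bmatrix} 1 & -1 & 1\\ 0 & 1 & 6 \end{bmatrix} \text{ and } B = \begin{bmatrix} 2 & 1 & 0\\ 1 & 0 & -1 \end{bmatrix} \text{ then } A + B = \begin{bmatrix} 3 & 0 & 1\\ 1 & 1 & 5 \end{bmatrix}$$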

3. Properties of Matrix Addition and Scalar Multiplication:

Assume A, B, and C are matrices of the same dimension and r is a scalar (number).

(a) Commutative Property of Addition: A+B = B + A

(b) Associative Property of Addition: (A+B) + C = A+ (B + C)

(c) Identity property of Addition: A+ 0 = A

(d) Distributive property of scalar multiplication: r(A+B) = rA+ rB

4. Row and Column Vectors

A row vector is a 1 by n matrix:

x = [x1, x2, . . . , xn]

A column vector is an m by 1 matrix:

$$y = \begin{bmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_m \end{bmatrix}$$

You can get a column vector from a row vector by taking the transpose. For example,

$$[1, 2, 3]^T = \begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix}.$$

This notation is regularly used to save space.


5. Some More Operations on Vectors and Matrices

(a) The Dot Product

Two vectors of the same length may be multiplied by the dot product:

$$x \cdot y = [x_1, x_2, \ldots, x_n] \cdot [y_1, y_2, \ldots, y_n] = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$

Example: Find x · y for x = [2, 3, 4] and y = [−1, 7,−3] Answer: x · y = −2 + 21− 12 = 7

(b) Matrix Multiplication

Two matrices can be multiplied as follows.

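If A is m x n given by $A = [a_{ij}]$ and B is n by p given by $[b_{ij}]$ then AB = C is m by p where

$C_{ij}$ = i’th row of A dotted with the j’th column of B

Example: Find AB when $A = \begin{bmatrix} 1 & -1 & 1\\ 0 & 1 & 6 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & -1\\ 0 & 1\\ 2 & 3 \end{bmatrix}$

Answer: $AB = \begin{bmatrix} 1 + 0 + 2 & -1 - 1 + 3\\ 0 + 0 + 12 & 0 + 1 + 18 \end{bmatrix} = \begin{bmatrix} 3 & 1\\ 12 & 19 \end{bmatrix}$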
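The "row dotted with column" rule can be written out directly. This is a plain-Python sketch using nested lists (the function name `matmul` is my own), meant only to mirror the definition, not to replace the MATLAB commands used in the worksheets.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("inner dimensions must agree")
    # C[i][j] = (i'th row of A) dotted with (j'th column of B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, -1, 1],
     [0, 1, 6]]
B = [[1, -1],
     [0, 1],
     [2, 3]]

print(matmul(A, B))   # 2 x 2 result
print(matmul(B, A))   # 3 x 3 result, so AB and BA are not even the same size
```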

6. Properties of Matrix Multiplication

(a) If A is an m x n matrix and B is an n x p matrix, then AB is an m x p matrix.

(b) Associative: A(BC) = (AB)C

(c) Distributive: A(B + C) = AB + AC and (A+B)C = AC +BC

(d) Identity: If A is n x n then AIn = InA = A.

(e) Zero: A 0 = 0 and 0A = 0 where 0 is the zero matrix of appropriate dimension.

(f) (AB)T = BT AT

7. Warning: Order Matters!

(a) No Commutative Law! AB ≠ BA unless you get real lucky.

(b) By definition, the dot product of two vectors always yields a scalar (x · y = y · x). However, on most software, the dot product is just a form of matrix multiplication. It depends on how you define your vectors, but quite often x · y ≠ y · x.


8. Linear Systems ⇐⇒ Matrix Equations

Any linear system can be expressed as a matrix equation.

(a) Example: The system of equations defined by

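$$\begin{aligned}
x_1 - x_2 + x_3 + 2x_4 &= 1\\
x_2 + 6x_3 + 2x_4 &= 0\\
x_1 + 7x_3 + 5x_4 &= 3
\end{aligned}$$

can be expressed as Ax = b where

$$\underbrace{\begin{bmatrix} 1 & -1 & 1 & 2\\ 0 & 1 & 6 & 2\\ 1 & 0 & 7 & 5 \end{bmatrix}}_{A}\ \underbrace{\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4 \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} 1\\ 0\\ 3 \end{bmatrix}}_{b}.$$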

(b) Example: Going the other way, the matrix equation Ax = b:

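$$\underbrace{\begin{bmatrix} 1 & 0 & -3\\ 2 & 2 & 9\\ 0 & 1 & 5 \end{bmatrix}}_{A}\ \underbrace{\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} 8\\ 7\\ -2 \end{bmatrix}}_{b}$$

represents the linear system

$$\begin{aligned}
x_1 - 3x_3 &= 8\\
2x_1 + 2x_2 + 9x_3 &= 7\\
x_2 + 5x_3 &= -2
\end{aligned}$$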

9. Solving Linear Systems: In MATLAB® you can solve the system of equations defined by Ax = b with:¹

x = A\b

If there is a unique solution, this works perfectly. If there are infinitely many solutions or no solutions, it gets a little tricky. We’ll investigate all three situations in the worksheet.

¹ Note: the backslash in A\b is not the division symbol.
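As a quick illustration, the 3 x 3 matrix equation from example 8(b) can be solved with the backslash operator. This is a sketch only; the residual check at the end is an addition made here.

```matlab
% System from example 8(b): x1 - 3x3 = 8, 2x1 + 2x2 + 9x3 = 7, x2 + 5x3 = -2
A = [1 0 -3; 2 2 9; 0 1 5];
b = [8; 7; -2];

x = A \ b;            % solve Ax = b (unique solution in this case)
r = A*x - b;          % should be (numerically) the zero vector
```

Substituting by hand gives x1 = 5, x2 = 3, x3 = −1, which is what the backslash solve should return up to round-off.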


• MATLAB®: Vectors and Vector Operations (Vectors.m)


• MATLAB®: Matrices, Vectors as Matrices, Solving Linear Systems (Matrices.m)


Some MATLAB® commands for vectors and matrices:

• Vectors

– v = [2 4 6] or v = [2, 4, 6] yields a row vector.

– v = [2;4;6] or v = [2 4 6]' yields a column vector.

– v(2) yields the second element in v.

– v(2:3) yields elements 2 through 3 of v.

– v(1) = 0 replaces the first element (the 2) with a zero.

– v(4) = 0 appends a zero to v, so that now v = [2 4 6 0].

– [m,n] = size(v) yields (m = 1 and n = 3) or (m = 3 and n = 1)

– v = 0:0.5:2 yields v = [0 0.5 1 1.5 2]

• Matrices

– A = [1 2 3; 4 5 6; 7 8 9] or [1,2,3;4,5,6;7,8,9] (semicolon separates rows)

– [m,n] = size(A) yields m = number of rows in A and n = the number of columns in A.

– A(1,2) yields the element in row 1 and column 2 of A.

– A(:,2) yields the second column of A.

– A(2,:) yields the second row of A.

– A(1:2,3:4) yields rows 1 to 2 and columns 3 to 4 of A.

– A([1 3 2],[1 3]) yields rows 1, 3, 2 (in that order) and columns 1, 3 of A.

– A + B yields term by term addition (appropriate dimensions required)

– A * B yields normal matrix multiplication (appropriate dimensions required)

– A^2 = A * A

– A.^2 squares each entry in A.

– A./2 divides each entry in A by 2.

– cos(A) takes the cosine of each term in A.

– eye(n) yields the n x n identity matrix

– zeros(n,m) yields an n x m zero matrix

– ones(n,m) yields an n x m matrix of all ones.

– transpose(A) yields the transpose of A.

– A' = conjugate transpose (or just transpose if real)

– inv(A) yields the inverse of A if one exists.

– det(A) yields the determinant of A.

– x = A\b produces a solution to Ax = b. (back-slash).


3.2 Gaussian-Elimination by Hand

If this were a course in Linear Algebra we would start by solving systems of equations by hand using Gaussian Elimination. We will focus on a few types of methods in this course but we will start with Gaussian Elimination because that provides a nice starting point. First, create the augmented matrix representing the system of equations. This consists of the coefficient matrix, a vertical line, and the column of constants.

System of Equations (left) and Augmented Matrix (right):

\[ \begin{aligned} 2x_1 + 4x_2 + 6x_3 &= 18 \\ 4x_1 + 5x_2 + 6x_3 &= 24 \\ 3x_1 + x_2 - 2x_3 &= 4 \end{aligned} \qquad \left[\begin{array}{rrr|r} 2 & 4 & 6 & 18 \\ 4 & 5 & 6 & 24 \\ 3 & 1 & -2 & 4 \end{array}\right] \]

We will perform Gaussian elimination on the augmented matrix. The goal is to perform various row operations which result in equivalent systems until the coefficient matrix has ones on the diagonal and zeros below the diagonal.

The Idea Behind Gaussian Elimination

1. Work from the top left to the bottom right of the coefficient matrix.

2. At each column get a 1 on the diagonal and all zeros below it.

3. Continue this and try to get 1’s along the diagonal and zeros below it.

4. This is called row-echelon form.

\[ \left[\begin{array}{cccc|c} 1 & * & * & * & * \\ 0 & 1 & * & * & * \\ 0 & 0 & 1 & * & * \\ 0 & 0 & 0 & 1 & * \end{array}\right] \]

Once the augmented matrix is in row echelon form, we can use back substitution to solve for the variables. We’ll get to this later. There are only three types of row operations required to get the augmented matrix into row echelon form.

ROW OPERATIONS [notation]:

1. Multiply a row by a number. [Ri → a Ri]

2. Add/subtract a multiple of one row to/from another and replace it. [Ri → Ri ± aRj]

3. Switch any two rows. [Ri ↔ Rj]


Example 1

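Each step below shows the System of Equations, the Augmented Matrix, and the Row Operation(s) applied to reach the next step.

\[ \begin{aligned} 2x_1 + 4x_2 + 6x_3 &= 18 \\ 4x_1 + 5x_2 + 6x_3 &= 24 \\ 3x_1 + x_2 - 2x_3 &= 4 \end{aligned} \qquad \left[\begin{array}{rrr|r} 2 & 4 & 6 & 18 \\ 4 & 5 & 6 & 24 \\ 3 & 1 & -2 & 4 \end{array}\right] \qquad R_1 \to \tfrac{1}{2} R_1 \]

\[ \begin{aligned} x_1 + 2x_2 + 3x_3 &= 9 \\ 4x_1 + 5x_2 + 6x_3 &= 24 \\ 3x_1 + x_2 - 2x_3 &= 4 \end{aligned} \qquad \left[\begin{array}{rrr|r} 1 & 2 & 3 & 9 \\ 4 & 5 & 6 & 24 \\ 3 & 1 & -2 & 4 \end{array}\right] \qquad \begin{array}{l} R_2 \to R_2 - 4R_1 \\ R_3 \to R_3 - 3R_1 \end{array} \]

\[ \begin{aligned} x_1 + 2x_2 + 3x_3 &= 9 \\ -3x_2 - 6x_3 &= -12 \\ -5x_2 - 11x_3 &= -23 \end{aligned} \qquad \left[\begin{array}{rrr|r} 1 & 2 & 3 & 9 \\ 0 & -3 & -6 & -12 \\ 0 & -5 & -11 & -23 \end{array}\right] \qquad R_2 \to -\tfrac{1}{3} R_2 \]

\[ \begin{aligned} x_1 + 2x_2 + 3x_3 &= 9 \\ x_2 + 2x_3 &= 4 \\ -5x_2 - 11x_3 &= -23 \end{aligned} \qquad \left[\begin{array}{rrr|r} 1 & 2 & 3 & 9 \\ 0 & 1 & 2 & 4 \\ 0 & -5 & -11 & -23 \end{array}\right] \qquad R_3 \to R_3 - (-5) R_2 \]

\[ \begin{aligned} x_1 + 2x_2 + 3x_3 &= 9 \\ x_2 + 2x_3 &= 4 \\ -x_3 &= -3 \end{aligned} \qquad \left[\begin{array}{rrr|r} 1 & 2 & 3 & 9 \\ 0 & 1 & 2 & 4 \\ 0 & 0 & -1 & -3 \end{array}\right] \qquad R_3 \to -1\, R_3 \]

\[ \begin{aligned} x_1 + 2x_2 + 3x_3 &= 9 \\ x_2 + 2x_3 &= 4 \\ x_3 &= 3 \end{aligned} \qquad \left[\begin{array}{rrr|r} 1 & 2 & 3 & 9 \\ 0 & 1 & 2 & 4 \\ 0 & 0 & 1 & 3 \end{array}\right] \qquad \text{This is Row Echelon Form} \]

Gaussian Elimination Stops Here. You solve for the variables using back substitution. This means you start at the last equation and solve for the last variable and work your way to the first equation, substituting the values you find along the way.

• The third equation is 1x3 = 3 or x3 = 3.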

• The second equation is x2 + 2x3 = 4 or x2 + 6 = 4 or x2 = −2.

• The first equation is x1 + 2x2 + 3x3 = 9 or x1 − 4 + 9 = 9 or x1 = 4.

• The solution is x1 = 4, x2 = −2, x3 = 3.
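The row operations of Example 1 can be replayed on the augmented matrix in MATLAB. This is a sketch for checking the hand computation, not a general elimination routine; the name Ab for the augmented matrix is chosen here.

```matlab
% Augmented matrix [A | b] of Example 1
Ab = [2 4 6 18; 4 5 6 24; 3 1 -2 4];

Ab(1,:) = Ab(1,:) / 2;            % R1 -> 1/2 R1
Ab(2,:) = Ab(2,:) - 4*Ab(1,:);    % R2 -> R2 - 4 R1
Ab(3,:) = Ab(3,:) - 3*Ab(1,:);    % R3 -> R3 - 3 R1
Ab(2,:) = Ab(2,:) / (-3);         % R2 -> -1/3 R2
Ab(3,:) = Ab(3,:) + 5*Ab(2,:);    % R3 -> R3 - (-5) R2
Ab(3,:) = -Ab(3,:);               % R3 -> -1 R3  (row echelon form)

% Back substitution, last equation first
x3 = Ab(3,4);
x2 = Ab(2,4) - Ab(2,3)*x3;
x1 = Ab(1,4) - Ab(1,2)*x2 - Ab(1,3)*x3;
```

The final Ab should be the row echelon form shown above, and (x1, x2, x3) should reproduce the solution found by hand.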


Example 2

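Each step below shows the System of Equations, the Augmented Matrix, and the Row Operation(s) applied to reach the next step.

\[ \begin{aligned} -3x_1 - 8x_2 - 8x_3 &= -7 \\ 2x_1 + 6x_2 + 10x_3 &= 9 \\ 1x_1 + 3x_2 + 4x_3 &= 3 \end{aligned} \qquad \left[\begin{array}{rrr|r} -3 & -8 & -8 & -7 \\ 2 & 6 & 10 & 9 \\ 1 & 3 & 4 & 3 \end{array}\right] \qquad R_1 \leftrightarrow R_3 \]

\[ \begin{aligned} 1x_1 + 3x_2 + 4x_3 &= 3 \\ 2x_1 + 6x_2 + 10x_3 &= 9 \\ -3x_1 - 8x_2 - 8x_3 &= -7 \end{aligned} \qquad \left[\begin{array}{rrr|r} 1 & 3 & 4 & 3 \\ 2 & 6 & 10 & 9 \\ -3 & -8 & -8 & -7 \end{array}\right] \qquad \begin{array}{l} R_2 \to R_2 - 2R_1 \\ R_3 \to R_3 + 3R_1 \end{array} \]

\[ \begin{aligned} 1x_1 + 3x_2 + 4x_3 &= 3 \\ 2x_3 &= 3 \\ x_2 + 4x_3 &= 2 \end{aligned} \qquad \left[\begin{array}{rrr|r} 1 & 3 & 4 & 3 \\ 0 & 0 & 2 & 3 \\ 0 & 1 & 4 & 2 \end{array}\right] \qquad R_2 \leftrightarrow R_3 \]

\[ \begin{aligned} 1x_1 + 3x_2 + 4x_3 &= 3 \\ x_2 + 4x_3 &= 2 \\ 2x_3 &= 3 \end{aligned} \qquad \left[\begin{array}{rrr|r} 1 & 3 & 4 & 3 \\ 0 & 1 & 4 & 2 \\ 0 & 0 & 2 & 3 \end{array}\right] \qquad R_3 \to \tfrac{1}{2} R_3 \]

\[ \begin{aligned} 1x_1 + 3x_2 + 4x_3 &= 3 \\ x_2 + 4x_3 &= 2 \\ x_3 &= 3/2 \end{aligned} \qquad \left[\begin{array}{rrr|r} 1 & 3 & 4 & 3 \\ 0 & 1 & 4 & 2 \\ 0 & 0 & 1 & 3/2 \end{array}\right] \qquad \text{This is Row Echelon Form} \]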

Gaussian Elimination Stops Here. You solve for the variables using back substitution.

• The third equation is 1x3 = 3/2 or x3 = 3/2.

• The second equation is x2 + 4x3 = 2 or x2 + 6 = 2 or x2 = −4.

• The first equation is x1 + 3x2 + 4x3 = 3 or x1 − 12 + 6 = 3 or x1 = 9.

• The solution is x1 = 9, x2 = −4, x3 = 3/2.

From now on, the preceding sequence of row operations will be denoted simply by

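\[ \left[\begin{array}{rrr|r} -3 & -8 & -8 & -7 \\ 2 & 6 & 10 & 9 \\ 1 & 3 & 4 & 3 \end{array}\right] \sim \left[\begin{array}{rrr|r} 1 & 3 & 4 & 3 \\ 2 & 6 & 10 & 9 \\ -3 & -8 & -8 & -7 \end{array}\right] \sim \left[\begin{array}{rrr|r} 1 & 3 & 4 & 3 \\ 0 & 0 & 2 & 3 \\ 0 & 1 & 4 & 2 \end{array}\right] \sim \left[\begin{array}{rrr|r} 1 & 3 & 4 & 3 \\ 0 & 1 & 4 & 2 \\ 0 & 0 & 2 & 3 \end{array}\right] \sim \left[\begin{array}{rrr|r} 1 & 3 & 4 & 3 \\ 0 & 1 & 4 & 2 \\ 0 & 0 & 1 & 3/2 \end{array}\right] \]

and it is up to you to properly infer the associated row operations. If you can get one matrix from another by performing any number of sequential row operations, the matrices are called row equivalent. We use the symbol ∼ to represent row equivalence.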


In both of the previous examples we were lucky. Why?

1. We didn’t have to deal with fractions. In most problems there will inevitably be more complicated calculations. In the future, we will be using a computer to perform Gaussian elimination. For now, the problems won’t get too messy.

2. We were dealing with a square system of equations. This means the number of equations is the same as the number of variables, resulting in a square coefficient matrix. It is the most common type of system of equations. This represents our best shot at getting a unique solution. We’ll deal with non-square systems later.

3. We were able to get all ones down the diagonal of the coefficient matrix with all zeros below it. As such, we were able to obtain a unique solution. This doesn’t always happen. When it doesn’t, we can get different solution options. This is especially frequent when dealing with non-square systems but it can happen with square systems as well.

There are two types of systems of equations and three types of solutions.

1. A consistent system has at least one solution:

(a) A unique solution like in the previous examples, or

(b) infinitely many solutions which occur from a dependent system.

2. An inconsistent system has no solutions.

Example 3: No Solutions → An inconsistent system

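\[ \begin{aligned} 2x_2 + 3x_3 &= 4 \\ 2x_1 - 6x_2 + 7x_3 &= 15 \\ x_1 - 2x_2 + 5x_3 &= 10 \end{aligned} \]

\[ \left[\begin{array}{rrr|r} 0 & 2 & 3 & 4 \\ 2 & -6 & 7 & 15 \\ 1 & -2 & 5 & 10 \end{array}\right] \sim \left[\begin{array}{rrr|r} 1 & -2 & 5 & 10 \\ 2 & -6 & 7 & 15 \\ 0 & 2 & 3 & 4 \end{array}\right] \sim \left[\begin{array}{rrr|r} 1 & -2 & 5 & 10 \\ 0 & -2 & -3 & -5 \\ 0 & 2 & 3 & 4 \end{array}\right] \sim \left[\begin{array}{rrr|r} 1 & -2 & 5 & 10 \\ 0 & -2 & -3 & -5 \\ 0 & 0 & 0 & -1 \end{array}\right]. \]

The third equation says 0x1 + 0x2 + 0x3 = −1. This equation has no solutions, so the system of equations has no solution and we call the system inconsistent.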


Example 4: Infinite Number of Solutions → Consistent and Dependent System

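\[ \begin{aligned} 2x_1 + 4x_2 + 6x_3 &= 18 \\ 4x_1 + 5x_2 + 6x_3 &= 24 \\ 2x_1 + 7x_2 + 12x_3 &= 30 \end{aligned} \]

\[ \left[\begin{array}{rrr|r} 2 & 4 & 6 & 18 \\ 4 & 5 & 6 & 24 \\ 2 & 7 & 12 & 30 \end{array}\right] \sim \left[\begin{array}{rrr|r} 1 & 2 & 3 & 9 \\ 4 & 5 & 6 & 24 \\ 2 & 7 & 12 & 30 \end{array}\right] \sim \left[\begin{array}{rrr|r} 1 & 2 & 3 & 9 \\ 0 & -3 & -6 & -12 \\ 0 & 3 & 6 & 12 \end{array}\right] \sim \left[\begin{array}{rrr|r} 1 & 2 & 3 & 9 \\ 0 & 1 & 2 & 4 \\ 0 & 3 & 6 & 12 \end{array}\right] \sim \left[\begin{array}{rrr|r} 1 & 2 & 3 & 9 \\ 0 & 1 & 2 & 4 \\ 0 & 0 & 0 & 0 \end{array}\right] \]

Here, the third equation has essentially disappeared - it is meaningless: 0x1 + 0x2 + 0x3 = 0. This has infinitely many solutions. So, we turn to the second equation, x2 + 2x3 = 4, and let x3 be a free variable. This is done by setting x3 = t where t represents any real number.

Let x3 = t.

From the second equation:

\[ x_2 + 2x_3 = 4 \;\Rightarrow\; x_2 = 4 - 2x_3 \;\Rightarrow\; x_2 = 4 - 2t \]

From the first equation:

\[ x_1 + 2x_2 + 3x_3 = 9 \;\Rightarrow\; x_1 = 9 - 2x_2 - 3x_3 = 9 - 2(4 - 2t) - 3t = 9 - 8 + 4t - 3t = 1 + t \]

The general solution is given in the form

\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 + t \\ 4 - 2t \\ t \end{bmatrix} \quad \text{for } -\infty < t < \infty. \]

This actually represents a line in 3-space. A particular solution is found by assigning any number to the parameter t. For example, if we set t = 0, a particular solution is x1 = 1, x2 = 4, and x3 = 0.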
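The general solution of Example 4 can be spot-checked numerically. In this sketch, the rank comparison and the sample values of t are choices made here, not part of the example.

```matlab
A = [2 4 6; 4 5 6; 2 7 12];
b = [18; 24; 30];

rA  = rank(A);         % rank of the coefficient matrix
rAb = rank([A b]);     % rank of the augmented matrix

xt = @(t) [1 + t; 4 - 2*t; t];   % general solution found above

% Largest residual over a few arbitrary values of the parameter t
maxres = 0;
for t = [-2 0 1 3.5]
    maxres = max(maxres, norm(A*xt(t) - b));
end
```

Equal ranks that are smaller than the number of unknowns correspond to a consistent, dependent system, and maxres should be zero up to round-off.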


3.3 Systems of Equations - Direct Methods

Here we investigate numerical methods for solving systems of equations which try to solve the system directly. I.e., these methods do not require iterating on a sequence of approximations. Since MATLAB® has a great direct solving function utilizing the back-slash command:

Ax = b → x = A \ b,

we will focus on the problems that can occur even with this tool at your disposal.

• Residuals:

Unlike analytic solutions done by hand, numerical solutions will contain some error. We might not know what that error is but we can always check our answer to a certain extent with the residual. We are seeking the solution x to the equation Ax = b. If $\hat{x}$ is our computed solution then we would hope that $\hat{x} \approx x$, right? But we usually don’t know x. However, it should also be true that $A\hat{x} \approx b$. We can calculate this difference and it is called the residual.

If $\hat{x}$ is the computed solution to the linear system Ax = b then the residual (r) is given by

\[ \text{residual:} \quad r = A\hat{x} - b. \tag{3.3} \]

A good solution $\hat{x}$ has a small residual, right? Well, what do we mean by small? There are a few ways to measure the size of a vector.

• Norms:

We measure the size of a vector with a norm. You are already familiar with the L2 (Euclidean) norm but there are a couple more worth knowing.

– The L2 norm or Euclidean Norm:

\[ \|x\|_2 = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2} \qquad \text{MATLAB®: norm(x)} \tag{3.4} \]

– The L1 norm:

\[ \|x\|_1 = \sum_{i=1}^{n} |x_i| \qquad \text{MATLAB®: norm(x,1)} \tag{3.5} \]

– The L∞ norm:

\[ \|x\|_\infty = \max(|x_1|, |x_2|, \ldots, |x_n|) \qquad \text{MATLAB®: norm(x,inf)} \tag{3.6} \]

Example: If x = [1, 2, −3] then

\[ \|x\|_2 = \sqrt{14}, \qquad \|x\|_1 = 6, \qquad \|x\|_\infty = 3 \]
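A small sketch tying the residual and the norms together. The linear system used for the residual is the one from Example 1 of Section 3.2, picked here purely for illustration.

```matlab
% The three norms of the example vector
x    = [1, 2, -3];
n2   = norm(x);         % Euclidean norm, sqrt(14)
n1   = norm(x, 1);      % sum of absolute values
ninf = norm(x, inf);    % largest absolute value

% Residual of a computed solution, measured in the infinity norm
A    = [2 4 6; 4 5 6; 3 1 -2];
b    = [18; 24; 4];
xhat = A \ b;           % computed solution
r    = A*xhat - b;      % residual, equation (3.3)
rsize = norm(r, inf);
```

For this well-behaved system rsize should be at the level of round-off.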


• Ill-conditioning

A linear system Ax = b is considered ill-conditioned if small changes in A or b lead to large changes in the solution x. It is not associated with the way in which a system is solved; instead, the conditioning is intrinsic to the matrix A. This can be measured by something called the condition number of the matrix A. A low condition number is good while a large condition number is bad. MATLAB® calculates the condition number of a matrix with

condition number = k = cond(A)

There are actually three options

– cond(A,2) which uses the Euclidean norm (default).

– cond(A,1) which uses L1 norm.

– cond(A,inf) which uses the L∞ norm.

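Example on page 111 of the text. Consider the system of equations

\[ Ax = b \;\to\; \begin{bmatrix} 1.00 & 2.00 \\ 0.49 & 0.99 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3.00 \\ 1.47 \end{bmatrix} \]

The solution and condition number are given by

\[ x = A \backslash b = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \quad \text{and} \quad k = \text{cond}(A) \approx 622 \;\; \text{(this is big)} \]

Now suppose we change the 0.49 to 0.48 in the coefficient matrix.

\[ Ax = b \;\to\; \begin{bmatrix} 1.00 & 2.00 \\ 0.48 & 0.99 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3.00 \\ 1.47 \end{bmatrix} \]

The solution to this slightly different system is

\[ x = A \backslash b = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \]

The large condition number of A indicates that this linear system is ill-conditioned. This can be seen by the very large difference in solutions that results from a small change in A. Loosely speaking, the condition number gives the rate at which the solution changes with respect to changes in A or b. The smallest condition number is one and

\[ \text{cond}\left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right) = 1. \]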

1. Naive Gaussian Elimination: Chapter 4.2 in the text.

This is just like the Gaussian Elimination by Hand except we don’t worry about getting ones along the diagonal (a machine doesn’t need this convenience and it only adds computations).

The Idea Behind Naive Gaussian Elimination

(a) Work from the top left to the bottom right of the coefficient matrix.

(b) Get zeros below each diagonal term (pivot element).

(c) Continue working top to bottom until (hopefully) you have an upper triangular matrix.

\[ \left[\begin{array}{cccc|c} x & x & x & x & x \\ 0 & x & x & x & x \\ 0 & 0 & x & x & x \\ 0 & 0 & 0 & x & x \end{array}\right] \]

Once the augmented matrix is in this form, we use back-substitution to solve for the variables (and hope that none of the diagonal terms are zero).

Only one type of row operation in Naive Gaussian Elimination [notation]:

(a) Subtract a multiple of one row from another and replace it. [$R_j \to R_j - m\,R_i$] for i < j ≤ n, where m is the multiplier that zeros the entry of row j below the pivot of row i.

Row Reduction of the Augmented Matrix

    for k = 1:n-1
        for i = k+1:n
            m = A(i,k)/A(k,k);                 % multiplier for row i
            A(i,k) = 0;
            for j = k+1:n
                A(i,j) = A(i,j) - m*A(k,j);
            end
            b(i) = b(i) - m*b(k);
        end
    end

Back Substitution

    x(n) = b(n)/A(n,n);
    for i = n-1:-1:1
        S = b(i);
        for j = i+1:n
            S = S - A(i,j)*x(j);
        end
        x(i) = S/A(i,i);
    end

• The method fails if any of the diagonal (pivot) elements are zero.

• Theorem: The total number of multiplications and divisions required to obtain a solution of an n x n linear system using Naive Gaussian Elimination is

\[ \frac{n^3}{3} + n^2 - \frac{n}{3} = O(n^3) \]


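Proof that $\frac{n^3}{3} + n^2 - \frac{n}{3} = O(n^3)$:

\[ \begin{aligned} \left| \frac{n^3}{3} + n^2 - \frac{n}{3} \right| &\le \frac{n^3}{3} + n^2 + \frac{n}{3} \\ &= \frac{n}{3}\,(n^2 + 3n + 1) \\ &\le \frac{n}{3}\,(n^2 + 3n^2) \quad \text{for } n \ge 2 \text{ (since } 3n + 1 \le 3n^2 \text{ there)} \\ &= \frac{n}{3}\,(4n^2) \\ &= \frac{4}{3}\, n^3 = K |n^3| \end{aligned} \]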
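The row reduction and back substitution loops above can be run on a small system while counting multiplications and divisions, which gives a spot check of the operation-count theorem. This is a sketch under the assumption that every multiply or divide counts as one operation; the counter ops and the test system (Example 1 of Section 3.2) are choices made here.

```matlab
A = [2 4 6; 4 5 6; 3 1 -2];
b = [18; 24; 4];
n = length(b);
ops = 0;                                  % multiplications + divisions

% Row reduction (no pivoting, so every A(k,k) must be nonzero)
for k = 1:n-1
    for i = k+1:n
        m = A(i,k)/A(k,k);           ops = ops + 1;
        A(i,k) = 0;
        for j = k+1:n
            A(i,j) = A(i,j) - m*A(k,j);   ops = ops + 1;
        end
        b(i) = b(i) - m*b(k);        ops = ops + 1;
    end
end

% Back substitution
x = zeros(n,1);
x(n) = b(n)/A(n,n);                  ops = ops + 1;
for i = n-1:-1:1
    S = b(i);
    for j = i+1:n
        S = S - A(i,j)*x(j);         ops = ops + 1;
    end
    x(i) = S/A(i,i);                 ops = ops + 1;
end

predicted = n^3/3 + n^2 - n/3;            % count given by the theorem
```

For n = 3 both the counter and the formula should give 17 operations.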

2. Gaussian Elimination with Scaled Partial Pivoting: Chapter 4.3 in the text.

Naive Gaussian Elimination has problems. For one, if there is a zero at any of the pivot elements, the method fails. Additionally, there is a lot of round-off error if the pivot term is relatively small compared to the other terms in the row. Scaled partial pivoting helps resolve both of these issues.

\[ Ax = b \;\to\; \begin{bmatrix} 0.0001 & 1000 \\ 2 & -20 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \]

• Naive (start with row 1): Error of $10^{-13}$

• With Scaled Partial Pivoting (start with row 2): Error = 0.

• MATLAB®’s backslash command does this automatically.

• The number of multiplies and divides needed to solve an n x n system with scaled partial pivoting is a small amount bigger than that for naive elimination (due to the comparison requirements) but is still $O(n^3)$.
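The pivot choice in the 2 x 2 example can be sketched as follows. The rule used here (divide each candidate pivot by the largest absolute entry in its row and pick the largest ratio) is the usual description of scaled partial pivoting and is stated as an assumption, since the notes defer the details to the text.

```matlab
A = [0.0001 1000; 2 -20];

s = max(abs(A), [], 2);       % scale factor for each row: largest |entry|
ratio = abs(A(:,1)) ./ s;     % scaled size of each candidate pivot in column 1
[~, p] = max(ratio);          % index of the row chosen as the first pivot row
```

Row 1 has a ratio of about 1e-7 while row 2 has 0.1, so the selected pivot row is p = 2, matching "start with row 2" above.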


3. LU factorizations: Chapter 4.4 in the text.

This method is best if you have to solve Ax = b for many different b’s. The whole idea here is to solve Ax = b as follows.

• Find L and U (lower and upper triangular matrices respectively) such that²

A = LU.

• Now solve LUx = b by the sequence

solve Ly = b for y with forward-substitution (L is lower triangular)

solve Ux = y for x with back-substitution

• However, this decomposition might need some pivoting (row permutations).

• MATLAB®’s LU Decomposition with permutations.³

[L, U, P] = lu(A)

produces L, U, and P, such that

L U = PA

• Process to solve Ax = b:

Ax = b (3.7)

PAx = Pb (3.8)

LUx = Pb (3.9)

then

solve Ly = Pb for y with forward-substitution

solve Ux = y for x with back-substitution

4. Cholesky Factorization: Just like an LU factorization except that $L = U^T$ (there are not necessarily ones along the diagonal). It applies when A is symmetric and positive definite.

² Crout’s method puts ones along the diagonal of U. Doolittle’s method puts ones along the diagonal of L.

³ MATLAB®’s method puts ones along the diagonal of L.
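The process (3.7)-(3.9) can be sketched in MATLAB by factoring once and reusing the factors for several right-hand sides. The matrix is the one from Example 1 of Section 3.2; the second right-hand side is a made-up column added here to show the reuse.

```matlab
A = [2 4 6; 4 5 6; 3 1 -2];
[L, U, P] = lu(A);            % L*U = P*A, computed once

B = [18 1; 24 0; 4 0];        % each column is one right-hand side b
X = zeros(3, 2);
for k = 1:2
    y = L \ (P*B(:,k));       % solve L y = P b (forward substitution)
    X(:,k) = U \ y;           % solve U x = y   (back substitution)
end
```

Because backslash recognizes triangular matrices, each solve inside the loop is only a substitution sweep; the expensive factorization is not repeated.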


3.4 Systems of Equations - Iterative Methods

Here we investigate iterative numerical methods for solving a system of linear equations. These are called iterative because we start with an initial guess and then go through a sequence of iterations to improve upon this guess. These methods are actually preferable in many situations due to the round-off error associated with direct methods. This is Chapter 4.5 in the Kharab text.

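• A Demonstrative Example:

Suppose we start with a 3x3 system Ax = b expressed as

\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= b_3 \end{aligned} \tag{3.10} \]

Now starting at the top row and working down, solving equation i for the term containing $x_i$,

\[ \begin{aligned} a_{11}x_1 &= b_1 - a_{12}x_2 - a_{13}x_3 \\ a_{22}x_2 &= b_2 - a_{21}x_1 - a_{23}x_3 \\ a_{33}x_3 &= b_3 - a_{31}x_1 - a_{32}x_2 \end{aligned} \tag{3.11} \]

and dividing by $a_{ii}$,

\[ \begin{aligned} x_1 &= (b_1 - a_{12}x_2 - a_{13}x_3)/a_{11} \\ x_2 &= (b_2 - a_{21}x_1 - a_{23}x_3)/a_{22} \\ x_3 &= (b_3 - a_{31}x_1 - a_{32}x_2)/a_{33} \end{aligned} \tag{3.12} \]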

• Notation: Since we will be creating a sequence of solution vectors (x), we need to number them. However, a vector x already has terms which include subscripts. Therefore, we will use superscripts to denote the iteration number.

\[ x^{(0)} \to x^{(1)} \to x^{(2)} \to \ldots \to x^{(k)} \]

• Our Two Methods:

1. The Jacobi Method

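Start with an initial guess at the solution: $x^{(0)} = [x_1^{(0)}, x_2^{(0)}, x_3^{(0)}]^T$.

The sequence of solutions is generated by

\[ \begin{aligned} x_1^{(k+1)} &= (b_1 - a_{12}x_2^{(k)} - a_{13}x_3^{(k)})/a_{11} \\ x_2^{(k+1)} &= (b_2 - a_{21}x_1^{(k)} - a_{23}x_3^{(k)})/a_{22} \\ x_3^{(k+1)} &= (b_3 - a_{31}x_1^{(k)} - a_{32}x_2^{(k)})/a_{33} \end{aligned} \tag{3.13} \]

For an arbitrary n x n system,

\[ x_i^{(k+1)} = \frac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)}}{a_{ii}}. \tag{3.14} \]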

2. The Gauss-Seidel Method

This is just like the Jacobi method except you use the updated values of x as they become available.

\[ \begin{aligned} x_1^{(k+1)} &= (b_1 - a_{12}x_2^{(k)} - a_{13}x_3^{(k)})/a_{11} \\ x_2^{(k+1)} &= (b_2 - a_{21}x_1^{(k+1)} - a_{23}x_3^{(k)})/a_{22} \\ x_3^{(k+1)} &= (b_3 - a_{31}x_1^{(k+1)} - a_{32}x_2^{(k+1)})/a_{33} \end{aligned} \tag{3.15} \]

For an arbitrary n x n system,

\[ x_i^{(k+1)} = \frac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)}}{a_{ii}}. \tag{3.16} \]

• Convergence Theorem:

If the matrix A is strictly diagonally dominant, then the Jacobi and Gauss-Seidel methods will converge to the solution for any starting guess $x^{(0)}$.

• Strictly Diagonally Dominant

A matrix A is called strictly diagonally dominant if

\[ |a_{ii}| > \sum_{j \ne i} |a_{ij}|, \quad \text{for } i = 1, 2, \ldots, n \tag{3.17} \]

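• Examples with Matrices

Here are two nearly identical matrices (the first two rows are swapped).

Diagonally Dominant:

\[ A = \begin{bmatrix} -4 & 1 & 2 & 0 & 0 \\ 1 & -6 & 2 & -1 & 1 \\ 0 & -2 & 5 & 1 & 1 \\ 1 & -1 & 1 & -5 & 1 \\ 1 & -1 & 1 & -2 & 7 \end{bmatrix} \]

NOT Diagonally Dominant:

\[ A = \begin{bmatrix} 1 & -6 & 2 & -1 & 1 \\ -4 & 1 & 2 & 0 & 0 \\ 0 & -2 & 5 & 1 & 1 \\ 1 & -1 & 1 & -5 & 1 \\ 1 & -1 & 1 & -2 & 7 \end{bmatrix} \]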
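Condition (3.17) is easy to test in MATLAB. The helper name isSDD below is hypothetical; it uses the fact that |a_ii| exceeds the sum of the other entries in its row exactly when 2|a_ii| exceeds the sum of all absolute entries in that row.

```matlab
% Hypothetical helper: true when A is strictly diagonally dominant
isSDD = @(A) all(2*abs(diag(A)) > sum(abs(A), 2));

A1 = [-4  1 2  0 0;
       1 -6 2 -1 1;
       0 -2 5  1 1;
       1 -1 1 -5 1;
       1 -1 1 -2 7];
A2 = A1([2 1 3 4 5], :);      % same matrix with rows 1 and 2 swapped

d1 = isSDD(A1);
d2 = isSDD(A2);
```

Swapping the first two rows moves small entries onto the diagonal, so only the first matrix passes the test.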

• Examples with Systems

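Here are two identical systems of equations. If you try to solve the system using one of the two iterative methods above, the order of the equations matters.

Sequence of solutions will converge (coefficient matrix is strictly diagonally dominant):

\[ \begin{aligned} 4x_1 + x_2 &= 1 \\ x_1 + 3x_2 &= 3 \end{aligned} \]

Sequence of solutions will diverge:

\[ \begin{aligned} x_1 + 3x_2 &= 3 \\ 4x_1 + x_2 &= 1 \end{aligned} \]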

• Stopping Criteria

When you generate a sequence of solutions you want to know when to stop. There are two ways to do this. Suppose you have a small number ε denoting a tolerance of some type.

– The residual is small. Stop when norm($Ax^{(k)} - b$) < ε

– The difference in sequential approximations is small. Stop when norm($x^{(k)} - x^{(k-1)}$) < ε

– In either case, you should put a cap on the number of iterations you perform in case the sequence does not converge.
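A minimal sketch of both iterations, applied to the convergent ordering of the 2 x 2 system above. It uses the second stopping criterion and an iteration cap; the tolerance, the cap, the zero starting guess, and all variable names are choices made here.

```matlab
A = [4 1; 1 3];
b = [1; 3];
n = length(b);
tol = 1e-10;        % tolerance epsilon
maxit = 200;        % cap on the number of iterations

% Jacobi: every component is updated from the previous iterate xold
x = zeros(n,1);
for kJ = 1:maxit
    xold = x;
    for i = 1:n
        j = [1:i-1, i+1:n];                        % all indices except i
        x(i) = (b(i) - A(i,j)*xold(j)) / A(i,i);
    end
    if norm(x - xold) < tol, break; end
end
xJ = x;

% Gauss-Seidel: updated components are used as soon as they are available
x = zeros(n,1);
for kGS = 1:maxit
    xold = x;
    for i = 1:n
        j = [1:i-1, i+1:n];
        x(i) = (b(i) - A(i,j)*x(j)) / A(i,i);
    end
    if norm(x - xold) < tol, break; end
end
xGS = x;
```

Both sequences should approach the exact solution x1 = 0, x2 = 1, with Gauss-Seidel needing fewer iterations (kGS versus kJ) on this system.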


3.5 Systems of Equations - Least Squares Method

Suppose A in the system

\[ Ax = b \]

is singular or not square. You can find the best solution $\hat{x}$ to this equation by solving

\[ A^T A \hat{x} = A^T b. \]

In this equation $A^T A$ is always square. It is nonsingular whenever the columns of A are linearly independent, and in that case the equation has a unique solution. It is the best solution of the original equation in the following senses.

• If there is a unique solution x, this finds it. I.e., $\hat{x} = x$.

• If there is no solution, it finds the $\hat{x}$ with the smallest residual ($A\hat{x} - b$) using the Euclidean norm.

• If there are infinitely many solutions, then $A^T A$ is singular and this equation also has infinitely many solutions; the best one is taken to be the solution with the smallest Euclidean norm.
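A small sketch of the no-solution case: three equations in two unknowns with linearly independent columns. The data are made up here (fitting a line c1 + c2 t through the points (0,1), (1,2), (2,2)).

```matlab
A = [1 0; 1 1; 1 2];      % 3 x 2, not square
b = [1; 2; 2];            % no exact solution exists

xhat = (A'*A) \ (A'*b);   % solve the normal equations A'A x = A'b
xbs  = A \ b;             % for non-square A, backslash also returns a least squares solution

r = A*xhat - b;           % residual of the least squares solution
g = A'*r;                 % should vanish: the residual is orthogonal to the columns of A
```

Solving the 2 x 2 normal equations by hand gives xhat = [7/6; 1/2], and backslash should agree with it.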

Chapter 4

Interpolation and Approximation

Here we look at two closely related topics: Interpolation and Approximation.

• Interpolation

We are given a table of n + 1 data points $(x_i, y_i)$:

\[ \begin{array}{c|ccccc} x & x_0 & x_1 & x_2 & \ldots & x_n \\ \hline y & y_0 & y_1 & y_2 & \ldots & y_n \end{array} \]

and seek a polynomial p of lowest degree such that

\[ p(x_i) = y_i \quad \text{for } 0 \le i \le n \]

Such a polynomial is said to interpolate the data. The values of x ($x_0, x_1, x_2, \ldots$) are called the nodes.

• Approximation

Here we try to approximate transcendental functions on an interval [a, b] with polynomials by interpolating the function values on a set of x-nodes where $x_i \in [a, b]$. The error in the approximation occurs at all of the x-values in [a, b] that are not nodes.


Before we get to the methods involved in these tasks, there is some preliminary theory and terminology required.

• Fundamental Theorem of Algebra:

If P is a polynomial of degree n ≥ 1, then P (x) = 0 has at least one (possibly complex) solution.

– Corollary 1

If P(x) is a polynomial of degree n ≥ 1, then there exist unique constants $x_1, x_2, \ldots, x_k$, possibly complex, and positive integers $m_1, m_2, \ldots, m_k$, such that $\sum_{i=1}^{k} m_i = n$ and

\[ P(x) = a_n (x - x_1)^{m_1} (x - x_2)^{m_2} \cdots (x - x_k)^{m_k} \]

I.e., every polynomial of degree n has exactly n roots counting multiplicities.

– Corollary 2

Let P and Q be polynomials of degree at most n. If $x_1, x_2, \ldots, x_k$, with k > n, are distinct numbers with $P(x_i) = Q(x_i)$ for i = 1, 2, . . ., k, then P(x) = Q(x) for all values of x.

I.e., if P and Q are both polynomials of degree ≤ n, and they agree at n + 1 or more values of x, then they are the same polynomial.

Proof: Look at F(x) = P(x) − Q(x). It is clear that F is a polynomial of degree ≤ n. It is also clear that F equals zero at n + 1 (or more) distinct values of x. By Corollary 1, a polynomial of degree between 1 and n can have at most n roots. So F is not a polynomial of degree ≥ 1. It is constant. Since it equals zero at a few places and is constant, it must be the constant function 0. This means P(x) − Q(x) = 0 or P(x) = Q(x).

• Algebraic Versus Transcendental Functions:

– Algebraic Functions are functions involving algebraic expressions such as polynomials, rational powers, and rational functions, for example

\[ f(x) = 3x^2 - 2x^2 + 5x + 7, \qquad f(x) = \frac{x^2 + 2}{x^3 - 1}, \qquad x^2 + y^2 = 1 \;\; \text{(not even a function)} \]

Computers are pretty good at evaluating algebraic functions.

– Transcendental Functions are functions which transcend algebra. These include exponential, logarithmic, and trigonometric functions and their inverses. Examples include

\[ f(x) = e^x, \quad f(x) = x^{\pi}, \quad f(x) = x^{1/x}, \quad f(x) = \sin(x), \quad f(x) = \cosh(x) \]

Because these functions require more than the standard operations of addition, subtraction, multiplication, and division, these types of functions pose a real problem to computer evaluation.


4.1 Interpolation

• Finding the interpolating polynomial using the Vandermonde matrix.

Here, $p_n(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n$ where the coefficients are found by imposing $p(x_i) = y_i$ for each i = 0 to n. The resulting system is

\[ \begin{bmatrix} 1 & x_0 & x_0^2 & \ldots & x_0^n \\ 1 & x_1 & x_1^2 & \ldots & x_1^n \\ 1 & x_2 & x_2^2 & \ldots & x_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \ldots & x_n^n \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}. \tag{4.1} \]

The matrix above is called the Vandermonde matrix. If it were singular, then for some nonzero set of coefficients the associated polynomial of degree ≤ n would have n + 1 zeros (the distinct nodes). This can’t be, so this matrix equation can be solved for the unknown coefficients of the polynomial.

• The Lagrange interpolation polynomial.

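\[ p_n(x) = y_0 \ell_0(x) + y_1 \ell_1(x) + \ldots + y_n \ell_n(x) \quad \text{where} \quad \ell_i(x) = \prod_{\substack{j = 0 \\ j \ne i}}^{n} \frac{x - x_j}{x_i - x_j} \]

The polynomials $\ell_i(x)$ are called the Lagrange basis functions (or cardinal functions). They have the following property

\[ \ell_i(x_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} \tag{4.2} \]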

• The Newton interpolation polynomial (Divided Differences - next section)

\[ p_n(x) = c_0 + c_1(x - x_0) + c_2(x - x_0)(x - x_1) + \ldots + c_n(x - x_0)(x - x_1) \cdots (x - x_{n-1}) \]

where these coefficients will be found using divided differences. It is fairly clear however that $c_0 = y_0$ and $c_1 = \frac{y_1 - y_0}{x_1 - x_0}$. For now, the remaining terms can be found by the recursive relationship described below.

Let $p_0(x) = y_0 = c_0$, and

\[ p_k(x) = p_{k-1}(x) + c_k \prod_{i=0}^{k-1} (x - x_i) \quad \text{where} \quad c_k = \frac{y_k - p_{k-1}(x_k)}{\prod_{i=0}^{k-1} (x_k - x_i)} \quad \text{for } 1 \le k \le n \]

• MATLAB®’s polyfit function

p = polyfit([x0, x1, . . ., xn], [y0, y1, . . ., yn], n)

then

\[ p = [a_n, a_{n-1}, \ldots, a_1, a_0] \]


• Existence and Uniqueness Theorem

If $x_0, x_1, \ldots, x_n$ are n + 1 distinct real numbers, then for arbitrary values $y_0, y_1, \ldots, y_n$, there is a unique polynomial $p_n$ of degree at most n such that

\[ p_n(x_i) = y_i \quad \text{for } 0 \le i \le n. \]

Proof: We have proven the existence by finding such polynomials. The uniqueness property comes from Corollary 2 of the Fundamental Theorem of Algebra.

Conclusion: All three methods of finding an interpolating polynomial result in the same polynomial; they are just expressed differently.

• Benefits of the various interpolation polynomials

– Using the Vandermonde matrix is not a very good method for any situation. The system is ill-conditioned and therefore the coefficients may be calculated very inaccurately. Also the amount of work is excessive.

– Using the Lagrange interpolating polynomial is well suited for using the same set of x-values for various y-values. In this case you could easily change the coefficients of the $\ell_i(x)$ functions to suit the desired y values.

– Using the Newton interpolating polynomial is usually the best choice. It has the advantage that data pairs can be added and interpolated by merely adding one additional term to the previous interpolating polynomial. Under other restrictions the coefficients give information about the derivatives of a function being approximated as well as the error.


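• Example: Find the coefficients for the third degree polynomial that interpolates the following four points.

\[ \begin{array}{c|rrrr} i & 0 & 1 & 2 & 3 \\ \hline x_i & 1 & 3 & 5 & 7 \\ y_i & 12 & 30 & 40 & -6 \end{array} \]

1. Vandermonde Matrix

\[ p_n(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n \]

\[ \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 3 & 9 & 27 \\ 1 & 5 & 25 & 125 \\ 1 & 7 & 49 & 343 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} 12 \\ 30 \\ 40 \\ -6 \end{bmatrix}. \tag{4.3} \]

a = A\y = [15, -10, 8, -1]

\[ p(x) = 15 - 10x + 8x^2 - x^3 \]

2. The Lagrange interpolation polynomial

The coefficients are just the y-values that need to be interpolated.

\[ p(x) = 12\ell_0(x) + 30\ell_1(x) + 40\ell_2(x) - 6\ell_3(x) \quad \text{where} \quad \ell_i(x) = \prod_{\substack{j = 0 \\ j \ne i}}^{3} \frac{x - x_j}{x_i - x_j} \]

for example

\[ \ell_2(x) = \frac{(x - x_0)(x - x_1)(x - x_3)}{(x_2 - x_0)(x_2 - x_1)(x_2 - x_3)} = \frac{(x - 1)(x - 3)(x - 7)}{(5 - 1)(5 - 3)(5 - 7)} \]

3. Newton’s interpolation polynomial

See the next section (4.2).

4. MATLAB®’s polyfit command

p = polyfit([1, 3, 5, 7], [12, 30, 40, −6], 3)

then

p = [−1, 8, −10, 15]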
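The three approaches of this example can be compared in MATLAB. This is a sketch; the evaluation point t = 2 and the variable names are chosen here for illustration.

```matlab
xn = [1 3 5 7];
yn = [12 30 40 -6];

% 1. Vandermonde system (4.3): columns 1, x, x^2, x^3
V = [ones(4,1), xn', (xn').^2, (xn').^3];
a = V \ yn';                       % coefficients a0, a1, a2, a3

% 4. polyfit returns the same coefficients in the opposite order
p = polyfit(xn, yn, 3);            % [a3 a2 a1 a0]

% 2. Lagrange form evaluated at a point t that is not a node
t = 2;
pL = 0;
for i = 1:4
    others = xn([1:i-1, i+1:4]);   % all nodes except node i
    pL = pL + yn(i) * prod((t - others) ./ (xn(i) - others));
end
pV = polyval(p, t);                % same polynomial evaluated from its coefficients
```

Since all forms describe the same polynomial, pL and pV should agree; by hand, p(2) = 15 − 20 + 32 − 8 = 19.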


4.2 Divided Differences - Newton’s Interpolating Polynomial

• The Problem

Interpolate a function f at n + 1 distinct values of x using the Newton Interpolation Polynomial by calculating the coefficients of this polynomial using the divided differences of f.

The divided difference polynomial is just Newton’s interpolating polynomial applied to this type of problem.

• Newton Interpolation Polynomial

\[ p_n(x) = c_0 + c_1(x - x_0) + c_2(x - x_0)(x - x_1) + \ldots + c_n(x - x_0)(x - x_1) \cdots (x - x_{n-1}) \]

• Divided Difference Polynomial:

We define the first two divided differences as follows

\[ f[x_0] = c_0 = f(x_0) \tag{4.4} \]

\[ f[x_0, x_1] = c_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0} \tag{4.5} \]

These first two are simple and you can see why the phrase divided differences is used. We will define the remaining divided differences by

\[ f[x_0, x_1, x_2, \ldots, x_n] = c_n \]

and the divided difference polynomial becomes

\[ p_n(x) = f[x_0] + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1) + \ldots + f[x_0, x_1, \ldots, x_n](x - x_0)(x - x_1) \cdots (x - x_{n-1}). \]

• The theorem that makes calculating these divided differences possible

Divided differences satisfy

\[ f[x_0, x_1, x_2, \ldots, x_n] = \frac{f[x_1, x_2, \ldots, x_n] - f[x_0, x_1, \ldots, x_{n-1}]}{x_n - x_0} \tag{4.6} \]

Proof: Let $p_n$ denote the polynomial that interpolates f at $x_0, x_1, \ldots, x_n$ (so $p_{n-1}$ interpolates f at $x_0, x_1, \ldots, x_{n-1}$). Let q denote the polynomial that interpolates f at $x_1, x_2, \ldots, x_n$. I claim

\[ p_n(x) = q(x) + \frac{x - x_n}{x_n - x_0} \left[ q(x) - p_{n-1}(x) \right]. \]

Proof of claim: both sides are polynomials of degree ≤ n and both sides evaluate to $f(x_i)$ for 0 ≤ i ≤ n. Therefore the polynomials must be the same (claim proved). Equating the coefficient of $x^n$ on both sides yields equation (4.6) ♦.

We can now extend this to arbitrary starting values of $x_i$:

\[ f[x_i, x_{i+1}, \ldots, x_{i+j}] = \frac{f[x_{i+1}, x_{i+2}, \ldots, x_{i+j}] - f[x_i, x_{i+1}, \ldots, x_{i+j-1}]}{x_{i+j} - x_i} \tag{4.7} \]


• Defining divided differences.

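\[ \begin{aligned} f[x_0] &= f(x_0) \\ f[x_0, x_1] &= \frac{f[x_1] - f[x_0]}{x_1 - x_0} = \frac{f(x_1) - f(x_0)}{x_1 - x_0} \\ f[x_0, x_1, x_2] &= \frac{f[x_1, x_2] - f[x_0, x_1]}{x_2 - x_0} \\ f[x_0, x_1, x_2, x_3] &= \frac{f[x_1, x_2, x_3] - f[x_0, x_1, x_2]}{x_3 - x_0} \\ &\;\;\vdots \\ f[x_0, x_1, \ldots, x_n] &= \frac{f[x_1, \ldots, x_n] - f[x_0, \ldots, x_{n-1}]}{x_n - x_0} \end{aligned} \tag{4.8} \]

• The divided differences table: Generalized Example.

An example for five data points $(x_i, f(x_i))$, 0 ≤ i ≤ 4:

\[ \begin{array}{c|lllll} x_0 & f(x_0) & f[x_0, x_1] & f[x_0, x_1, x_2] & f[x_0, x_1, x_2, x_3] & f[x_0, x_1, x_2, x_3, x_4] \\ x_1 & f(x_1) & f[x_1, x_2] & f[x_1, x_2, x_3] & f[x_1, x_2, x_3, x_4] & \\ x_2 & f(x_2) & f[x_2, x_3] & f[x_2, x_3, x_4] & & \\ x_3 & f(x_3) & f[x_3, x_4] & & & \\ x_4 & f(x_4) & & & & \end{array} \tag{4.9} \]

You are given the data in the first two columns and have to determine the remaining columns in order by the algorithm defined by equation (4.7), repeated below:

\[ f[x_i, x_{i+1}, \ldots, x_{i+j}] = \frac{f[x_{i+1}, x_{i+2}, \ldots, x_{i+j}] - f[x_i, x_{i+1}, \ldots, x_{i+j-1}]}{x_{i+j} - x_i} \]

The coefficients in the divided difference polynomial, and hence the Newton Interpolation Polynomial, are given by the first row in the above table.

• In Simpler form:

\[ \begin{array}{c|lllll} x_0 & c_{00} & c_{01} & c_{02} & c_{03} & c_{04} \\ x_1 & c_{10} & c_{11} & c_{12} & c_{13} & \\ x_2 & c_{20} & c_{21} & c_{22} & & \\ x_3 & c_{30} & c_{31} & & & \\ x_4 & c_{40} & & & & \end{array} \tag{4.10} \]

Again, you are given the data in the first two columns and the first row of c values are the coefficients of the divided difference polynomial.

Here’s an algorithm for creating the above table given the vector x and the first column of c. The indices start at 0 to match the table, and n = 4 here. MATLAB® arrays start at index 1, so every index must be shifted by one in actual code.

    for j = 1:n
        for i = 0:n-j
            c(i,j) = (c(i+1,j-1) - c(i,j-1)) / (x(i+j) - x(i))
        end   (i loop)
    end   (j loop)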


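• The divided differences table: Specific Example.

Find the coefficients for the third degree Newton interpolating polynomial for the following four points.

\[ \begin{array}{c|rrrr} i & 0 & 1 & 2 & 3 \\ \hline x_i & 1 & 3 & 5 & 7 \\ y_i & 12 & 30 & 40 & -6 \end{array} \]

Divided Difference Chart.

\[ \begin{array}{cc|llll} i & x_i & f[x_i] & f[x_i, x_{i+1}] & f[x_i, x_{i+1}, x_{i+2}] & f[x_i, x_{i+1}, x_{i+2}, x_{i+3}] \\ \hline 0 & 1 & 12 & \frac{30 - 12}{3 - 1} = 9 & \frac{5 - 9}{5 - 1} = -1 & \frac{-7 - (-1)}{7 - 1} = -1 \\ 1 & 3 & 30 & \frac{40 - 30}{5 - 3} = 5 & \frac{-23 - 5}{7 - 3} = -7 & \\ 2 & 5 & 40 & \frac{-6 - 40}{7 - 5} = -23 & & \\ 3 & 7 & -6 & & & \end{array} \]

For the Newton interpolating polynomial

\[ p_n(x) = c_0 + c_1(x - x_0) + c_2(x - x_0)(x - x_1) + c_3(x - x_0)(x - x_1)(x - x_2) \]

the coefficients are

\[ \begin{array}{l|rrrr} & i = 0 & i = 1 & i = 2 & i = 3 \\ \hline \text{Newton } (c_i) & 12 & 9 & -1 & -1 \end{array} \]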
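The table algorithm can be written in runnable MATLAB by shifting every index up by one. This is a sketch for the specific example above; the nested evaluation at the end and the point t = 2 are additions made here.

```matlab
x = [1 3 5 7];
y = [12 30 40 -6];
n = length(x) - 1;               % polynomial degree

% c(i,j) holds f[x_i, ..., x_{i+j-1}] using 1-based indices
c = zeros(n+1, n+1);
c(:,1) = y(:);                   % first column: the function values
for j = 2:n+1
    for i = 1:n+2-j
        c(i,j) = (c(i+1,j-1) - c(i,j-1)) / (x(i+j-1) - x(i));
    end
end
coef = c(1,:);                   % first row: Newton coefficients c0 ... cn

% Evaluate the Newton form at t by nested multiplication
t = 2;
pt = coef(n+1);
for k = n:-1:1
    pt = pt*(t - x(k)) + coef(k);
end
```

The first row of c should reproduce the coefficients 12, 9, −1, −1 from the chart, and pt should equal the value 15 − 10t + 8t² − t³ = 19 of the same polynomial in its Section 4.1 form.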

• Theorem on derivatives and divided differences (useful later in the course)

If $f \in C^n[a, b]$ and if $x_0, x_1, \ldots, x_n$ are distinct points in [a, b] then there is an η in (a, b) such that

\[ f[x_0, x_1, \ldots, x_n] = \frac{1}{n!} f^{(n)}(\eta) \]

Proof: Let p be the polynomial of degree ≤ n − 1 that interpolates f at the nodes $x_0, x_1, \ldots, x_{n-1}$. By the approximation error theorem there exists an η in (a, b) such that

\[ f(x_n) - p(x_n) = \frac{1}{n!} f^{(n)}(\eta) \prod_{j=0}^{n-1} (x_n - x_j). \]

By the above theorem this can be expressed as

\[ f(x_n) - p(x_n) = f[x_0, x_1, \ldots, x_n] \prod_{j=0}^{n-1} (x_n - x_j) \]

Setting these equal to each other yields the desired result ♦.

This theorem will help us out later when trying to approximate nth order derivatives.


4.3 Chebyshev Nodes

• Chebyshev Polynomials

\[ T_0(x) = 1, \qquad T_1(x) = x, \qquad T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x) \]

It can be shown (not easily) that

\[ T_n(x) = \cos(n \cos^{-1}(x)) \quad \text{for } x \in [-1, 1] \]

• Minimizing Error by Choosing Chebyshev Nodes

In the Approximation Error Theorem we saw

\[ E_n(x) = \frac{f^{(n+1)}(\eta)}{(n+1)!} \prod_{i=0}^{n} (x - x_i) \]

The second factor on the right hand side (the product) can be minimized by choosing scaled values of the Chebyshev Nodes. These x values are chosen as the roots of $T_{n+1}(x)$ with the result that

\[ x_i = \cos\left( \frac{2i + 1}{2(n + 1)}\, \pi \right) \quad \text{for } 0 \le i \le n, \text{ where } x_i \in [-1, 1] \tag{4.11} \]

If we are dealing with a different interval, say [a, b] instead of [−1, 1], we make the transformation:

\[ t_i = a + \frac{b - a}{2} (x_i + 1). \]

If we now choose the $t_i$ as the nodes over which we interpolate our function, the error in the approximation should be minimized.

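• Example: Suppose we want to interpolate a function over the interval [0, 4] with 5 evenly spaced nodes and 5 Chebyshev nodes.

– Evenly spaced nodes: $[x_0, x_1, x_2, x_3, x_4] = [0, 1, 2, 3, 4]$

– Chebyshev Nodes: In equation (4.11), n = 4. The final nodes for interpolation are the $t_i$’s.

\[ \begin{array}{c|c|r|r} i & x_i & \approx x_i & t_i \\ \hline 0 & \cos\left(\frac{\pi}{10}\right) & 0.9511 & 3.9021 \\ 1 & \cos\left(\frac{3\pi}{10}\right) & 0.5878 & 3.1756 \\ 2 & \cos\left(\frac{5\pi}{10}\right) & 0 & 2.0000 \\ 3 & \cos\left(\frac{7\pi}{10}\right) & -0.5878 & 0.8244 \\ 4 & \cos\left(\frac{9\pi}{10}\right) & -0.9511 & 0.0979 \end{array} \]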
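The table above can be regenerated in MATLAB. As an extra check added here, the sketch also evaluates T5 at the nodes through the three-term recurrence, since the nodes are supposed to be its roots.

```matlab
n = 4;  a = 0;  b = 4;
i = 0:n;

x = cos((2*i + 1)*pi / (2*(n + 1)));   % Chebyshev nodes on [-1, 1], equation (4.11)
t = a + (b - a)/2 * (x + 1);           % nodes transformed to [a, b]

% Evaluate T_{n+1} at the nodes with T_{k+1} = 2 x T_k - T_{k-1}
Tprev = ones(size(x));                 % T_0
Tcur  = x;                             % T_1
for k = 1:n
    Tnext = 2*x.*Tcur - Tprev;
    Tprev = Tcur;
    Tcur  = Tnext;                     % after the loop this is T_{n+1}
end
```

The vector t should reproduce the last column of the table to four decimals, and Tcur should be zero up to round-off.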


4.4 Cubic Spline Interpolation

The previous sections dealt with interpolating points on the interval $[a, b]$ with a single polynomial of degree one less than the number of points being interpolated. Here we break up the interval $[a, b]$ into subintervals and perform a piecewise polynomial interpolation. A cubic spline ($S$) has the following properties.

1. $S$ interpolates the points $(x_i, y_i)$.

2. On each subinterval $[x_i, x_{i+1}]$, $S$ is represented by a cubic polynomial $S_i$.

3. $S$, $S'$, and $S''$ are continuous on $[a, b]$.

Suppose you break up the interval $[a,b]$ into $n$ subintervals defined by the nodes $a = x_1, x_2, \ldots, x_n, x_{n+1} = b$. This gives you $n+1$ nodes and $n$ intervals. On each subinterval $[x_i, x_{i+1}]$ we define

$$S_i(x) = a_i(x - x_i)^3 + b_i(x - x_i)^2 + c_i(x - x_i) + d_i \quad \text{for } i = 1, 2, \ldots, n. \tag{4.12}$$

Then the cubic spline $S$ is defined by

$$S(x) = S_i(x) \quad \text{for } x \in [x_i, x_{i+1}].$$

We now have $4n$ unknown coefficients and we impose the following $4n-2$ constraints.

1. Interpolating property at left end: $n$ equations

$$S_i(x_i) = y_i \quad \text{for } i = 1, 2, \ldots, n \tag{4.13}$$

2. Interpolating property at right end: $n$ equations

$$S_i(x_{i+1}) = y_{i+1} \quad \text{for } i = 1, 2, \ldots, n \tag{4.14}$$

3. Continuity of first derivatives at interior nodes: $n-1$ equations

$$S_i'(x_{i+1}) = S_{i+1}'(x_{i+1}) \quad \text{for } i = 1, 2, \ldots, n-1 \tag{4.15}$$

4. Continuity of second derivatives at interior nodes: $n-1$ equations

$$S_i''(x_{i+1}) = S_{i+1}''(x_{i+1}) \quad \text{for } i = 1, 2, \ldots, n-1 \tag{4.16}$$

Making the following definitions will help out later:

$$R_i = S_i''(x_i) \quad \text{and} \quad h_i = x_{i+1} - x_i \quad \text{for } i = 1, 2, \ldots, n$$

and finally $R_{n+1} = S_n''(x_{n+1})$. We get the following forms for the coefficients.

• From interpolating on the left (equation (4.13)):

$$d_i = y_i \tag{4.17}$$

• Differentiating $S_i(x)$ twice and evaluating at $x_i$:

$$b_i = \frac{R_i}{2} \tag{4.18}$$

• From continuity of the second derivative (equation (4.16)):

$$a_i = \frac{R_{i+1} - R_i}{6h_i} \tag{4.19}$$

• From interpolating on the right (equation (4.14)):

$$c_i = \frac{y_{i+1} - y_i}{h_i} - \frac{2h_iR_i + h_iR_{i+1}}{6} \tag{4.20}$$

• From continuity of the first derivative (equation (4.15)) at the $n-1$ internal nodes:

$$h_{i-1}R_{i-1} + 2(h_{i-1} + h_i)R_i + h_iR_{i+1} = 6\left(\frac{y_{i+1} - y_i}{h_i} - \frac{y_i - y_{i-1}}{h_{i-1}}\right) \quad \text{for } i = 2, 3, \ldots, n \tag{4.21}$$

Equation (4.21) represents $n-1$ nodal equations. There are $n+1$ unknown $R_i$’s (including $R_{n+1}$). Two more constraints will be imposed later. After all $R_i$’s are determined, the coefficients of each polynomial are determined by equations (4.17)-(4.20).

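Casting equation (4.21) into matrix form yields the $(n-1)$ by $(n+1)$ system

$$
\begin{pmatrix}
h_1 & 2(h_1+h_2) & h_2 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & h_2 & 2(h_2+h_3) & h_3 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & & & & \ddots & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & h_{n-2} & 2(h_{n-2}+h_{n-1}) & h_{n-1} & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & h_{n-1} & 2(h_{n-1}+h_n) & h_n
\end{pmatrix}
\begin{pmatrix} R_1 \\ R_2 \\ \vdots \\ R_n \\ R_{n+1} \end{pmatrix}
= 6
\begin{pmatrix}
\frac{y_3-y_2}{h_2} - \frac{y_2-y_1}{h_1} \\
\frac{y_4-y_3}{h_3} - \frac{y_3-y_2}{h_2} \\
\vdots \\
\frac{y_n-y_{n-1}}{h_{n-1}} - \frac{y_{n-1}-y_{n-2}}{h_{n-2}} \\
\frac{y_{n+1}-y_n}{h_n} - \frac{y_n-y_{n-1}}{h_{n-1}}
\end{pmatrix}
$$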

We need 2 more constraints (equations). There are many different ways of imposing these extra constraints; we will consider two.

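• Natural Spline (most popular): $R_1 = 0$ and $R_{n+1} = 0$

This makes the end cubics approach linearity (zero curvature) at their extremities. It removes the first and last column from the matrix and removes $R_1$ and $R_{n+1}$ from the unknowns. This results in the tri-diagonal $(n-1)$ by $(n-1)$ system:

$$
\begin{pmatrix}
2(h_1+h_2) & h_2 & 0 & 0 & \cdots & 0 & 0 \\
h_2 & 2(h_2+h_3) & h_3 & 0 & \cdots & 0 & 0 \\
0 & h_3 & 2(h_3+h_4) & h_4 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & & \vdots \\
0 & 0 & \cdots & 0 & h_{n-2} & 2(h_{n-2}+h_{n-1}) & h_{n-1} \\
0 & 0 & \cdots & 0 & 0 & h_{n-1} & 2(h_{n-1}+h_n)
\end{pmatrix}
\begin{pmatrix} R_2 \\ R_3 \\ \vdots \\ R_{n-1} \\ R_n \end{pmatrix}
= 6
\begin{pmatrix}
\frac{y_3-y_2}{h_2} - \frac{y_2-y_1}{h_1} \\
\frac{y_4-y_3}{h_3} - \frac{y_3-y_2}{h_2} \\
\vdots \\
\frac{y_n-y_{n-1}}{h_{n-1}} - \frac{y_{n-1}-y_{n-2}}{h_{n-2}} \\
\frac{y_{n+1}-y_n}{h_n} - \frac{y_n-y_{n-1}}{h_{n-1}}
\end{pmatrix}
$$

And all of the $4n$ coefficients can be found in terms of these $R_i$’s by equations (4.17)-(4.20).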


• Periodic Spline (good for closed curves): Here it is required beforehand that $y_{n+1} = y_1$. If this is not the case, you can add a data point so that it is. Recall the standard nodal equations forcing continuity of the derivatives:

$$h_{i-1}R_{i-1} + 2(h_{i-1} + h_i)R_i + h_iR_{i+1} = 6\left(\frac{y_{i+1} - y_i}{h_i} - \frac{y_i - y_{i-1}}{h_{i-1}}\right) \quad \text{for } i = 2, 3, \ldots, n \tag{4.22}$$

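This equation in matrix form:

$$
\begin{pmatrix}
h_1 & 2(h_1+h_2) & h_2 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & h_2 & 2(h_2+h_3) & h_3 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & & & & \ddots & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & h_{n-2} & 2(h_{n-2}+h_{n-1}) & h_{n-1} & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & h_{n-1} & 2(h_{n-1}+h_n) & h_n
\end{pmatrix}
\begin{pmatrix} R_1 \\ R_2 \\ \vdots \\ R_n \\ R_{n+1} \end{pmatrix}
= 6
\begin{pmatrix}
\frac{y_3-y_2}{h_2} - \frac{y_2-y_1}{h_1} \\
\frac{y_4-y_3}{h_3} - \frac{y_3-y_2}{h_2} \\
\vdots \\
\frac{y_n-y_{n-1}}{h_{n-1}} - \frac{y_{n-1}-y_{n-2}}{h_{n-2}} \\
\frac{y_{n+1}-y_n}{h_n} - \frac{y_n-y_{n-1}}{h_{n-1}}
\end{pmatrix}
\tag{4.23}
$$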

For a smooth curve to run from $y_n$ to $y_{n+1} = y_1$ and then on to $y_2$ again we will require that

$$S_1'(x_1) = S_n'(x_{n+1}) \quad \text{and} \quad S_1''(x_1) = S_n''(x_{n+1})$$

The second of these means $R_1 = R_{n+1}$. Making this substitution in the last equation moves $h_n$ to the bottom of the first column, which leaves the last column of the matrix all zeros, so that column and $R_{n+1}$ can be eliminated.

Applying equation (4.22) for continuity of derivatives at $i = 1$ (with the indices wrapping around, so that $i - 1$ refers to $n$) you get

$$h_nR_n + 2(h_n + h_1)R_1 + h_1R_2 = 6\left(\frac{y_2 - y_1}{h_1} - \frac{y_1 - y_n}{h_n}\right) \tag{4.24}$$

Now make this a new equation added at the top. This yields the $n$ by $n$ almost tri-diagonal symmetric system:

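$$
\begin{pmatrix}
2(h_n+h_1) & h_1 & 0 & \cdots & 0 & 0 & 0 & h_n \\
h_1 & 2(h_1+h_2) & h_2 & 0 & \cdots & 0 & 0 & 0 \\
0 & h_2 & 2(h_2+h_3) & h_3 & \cdots & 0 & 0 & 0 \\
\vdots & & & & \ddots & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & h_{n-2} & 2(h_{n-2}+h_{n-1}) & h_{n-1} \\
h_n & 0 & 0 & 0 & \cdots & 0 & h_{n-1} & 2(h_{n-1}+h_n)
\end{pmatrix}
\begin{pmatrix} R_1 \\ R_2 \\ \vdots \\ R_n \end{pmatrix}
= 6
\begin{pmatrix}
\frac{y_2-y_1}{h_1} - \frac{y_1-y_n}{h_n} \\
\frac{y_3-y_2}{h_2} - \frac{y_2-y_1}{h_1} \\
\frac{y_4-y_3}{h_3} - \frac{y_3-y_2}{h_2} \\
\vdots \\
\frac{y_n-y_{n-1}}{h_{n-1}} - \frac{y_{n-1}-y_{n-2}}{h_{n-2}} \\
\frac{y_{n+1}-y_n}{h_n} - \frac{y_n-y_{n-1}}{h_{n-1}}
\end{pmatrix}
$$

This represents $n$ equations in the $n$ unknowns $R_1$ to $R_n$. The last unknown follows from

$$R_{n+1} = R_1$$

And all of the $4n$ coefficients can be found in terms of these $R_i$’s by equations (4.17)-(4.20).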


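1. Example: Interpolate the following points with a natural cubic spline and a periodic cubic spline.

  i     1   2   3   4   5   6   7
  x_i   0   1   2   3   4   5   6
  y_i   4   2   1   0   1   3   4

Here, $n = 6$.

(a) Natural Cubic Spline: $R_1 = 0$ and $R_7 = 0$; the other $R$’s are found by solving the 5 x 5 system:

$$
\begin{pmatrix}
4 & 1 & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & 0 \\
0 & 1 & 4 & 1 & 0 \\
0 & 0 & 1 & 4 & 1 \\
0 & 0 & 0 & 1 & 4
\end{pmatrix}
\begin{pmatrix} R_2 \\ R_3 \\ R_4 \\ R_5 \\ R_6 \end{pmatrix}
= 6
\begin{pmatrix} 1 \\ 0 \\ 2 \\ 1 \\ -1 \end{pmatrix}
$$

And all of the $4n$ coefficients can be found in terms of these $R_i$’s by equations (4.17)-(4.20). Notice, the slopes do not match up at the extremities.

(b) Periodic Cubic Spline: Notice $y_7$ already equals $y_1$ as needed. The first 6 $R$’s come from solving the 6 x 6 system

$$
\begin{pmatrix}
4 & 1 & 0 & 0 & 0 & 1 \\
1 & 4 & 1 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & 0 & 0 \\
0 & 0 & 1 & 4 & 1 & 0 \\
0 & 0 & 0 & 1 & 4 & 1 \\
1 & 0 & 0 & 0 & 1 & 4
\end{pmatrix}
\begin{pmatrix} R_1 \\ R_2 \\ R_3 \\ R_4 \\ R_5 \\ R_6 \end{pmatrix}
= 6
\begin{pmatrix} -3 \\ 1 \\ 0 \\ 2 \\ 1 \\ -1 \end{pmatrix}
$$

and $R_7 = R_1$. The cubic coefficients can be found in terms of these $R_i$’s by equations (4.17)-(4.20). Notice, the slopes do match up at the extremities.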
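Both systems in this example can be assembled and solved programmatically. The following Python sketch is illustrative only: the helper names `solve` and `spline_second_derivatives` are assumptions made here, and a small dense Gaussian elimination is used for simplicity instead of a dedicated tri-diagonal solver. It builds the nodal equations from the data and returns $R_1, \ldots, R_{n+1}$.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def spline_second_derivatives(x, y, periodic=False):
    """Return [R_1, ..., R_{n+1}] for a natural or periodic cubic spline."""
    n = len(x) - 1                                 # number of subintervals
    h = [x[i + 1] - x[i] for i in range(n)]        # h[i] stores h_{i+1}
    slope = [(y[i + 1] - y[i]) / h[i] for i in range(n)]
    if periodic:
        # Unknowns R_1..R_n; indices wrap because R_{n+1} = R_1.
        A = [[0.0] * n for _ in range(n)]
        rhs = []
        for i in range(n):
            prev = (i - 1) % n
            A[i][prev] += h[prev]
            A[i][i] += 2 * (h[prev] + h[i])
            A[i][(i + 1) % n] += h[i]
            rhs.append(6 * (slope[i] - slope[prev]))
        R = solve(A, rhs)
        return R + [R[0]]
    # Natural spline: unknowns R_2..R_n, with R_1 = R_{n+1} = 0.
    m = n - 1
    A = [[0.0] * m for _ in range(m)]
    rhs = []
    for k in range(m):
        i = k + 1                                  # 0-based interior node
        if k > 0:
            A[k][k - 1] = h[i - 1]
        A[k][k] = 2 * (h[i - 1] + h[i])
        if k < m - 1:
            A[k][k + 1] = h[i]
        rhs.append(6 * (slope[i] - slope[i - 1]))
    return [0.0] + solve(A, rhs) + [0.0]

x = [0, 1, 2, 3, 4, 5, 6]
y = [4, 2, 1, 0, 1, 3, 4]
R_natural = spline_second_derivatives(x, y)
R_periodic = spline_second_derivatives(x, y, periodic=True)
print(R_natural)
print(R_periodic)
```

Once the $R_i$ are known, the coefficients of each cubic follow directly from equations (4.17)-(4.20).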

2. Example: Closed Curve: Use a periodic cubic spline to smoothly connect the points given below as a closed curve.

  i     1    2   3   4   5    6    7
  x_i  −2    0   2   1   0   −1   −2
  y_i   7    5   7   8   10   8    7

Notice, the first and last point are the same. You might have to force this if it is not already the case. This is done by adding an additional point that is the same as the first.

Details on the closed curve periodic cubic spline: Here you need the last point $(x_7, y_7) = (x_1, y_1)$. If that is not the case then you have to add a last point that is the same as the first. After that, introduce a new independent variable ($t$) and let it take on any arbitrary increasing values such as $t_1 = 1, t_2 = 2, \ldots, t_7 = 7$, and consider $x$ and $y$ functions of $t$. Then you create two periodic splines, one for $x$ called $S_x(t)$ and one for $y$ called $S_y(t)$, that interpolate the given values.

• Create $t_i$’s and interpolate $x_i$’s with periodic $S_x(t)$:

  i     1    2   3   4   5    6    7
  t_i   1    2   3   4   5    6    7
  x_i  −2    0   2   1   0   −1   −2

$$
\begin{pmatrix}
4 & 1 & 0 & 0 & 0 & 1 \\
1 & 4 & 1 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & 0 & 0 \\
0 & 0 & 1 & 4 & 1 & 0 \\
0 & 0 & 0 & 1 & 4 & 1 \\
1 & 0 & 0 & 0 & 1 & 4
\end{pmatrix}
\begin{pmatrix} Rx_1 \\ Rx_2 \\ Rx_3 \\ Rx_4 \\ Rx_5 \\ Rx_6 \end{pmatrix}
= 6
\begin{pmatrix} 3 \\ 0 \\ -3 \\ 0 \\ 0 \\ 0 \end{pmatrix}
$$

Then $Rx_7 = Rx_1$ and the coefficients of $Sx_i$ are determined by equations (4.17)-(4.20).

• Create $t_i$’s and interpolate $y_i$’s with periodic $S_y(t)$:

  i     1   2   3   4   5    6   7
  t_i   1   2   3   4   5    6   7
  y_i   7   5   7   8   10   8   7

$$
\begin{pmatrix}
4 & 1 & 0 & 0 & 0 & 1 \\
1 & 4 & 1 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & 0 & 0 \\
0 & 0 & 1 & 4 & 1 & 0 \\
0 & 0 & 0 & 1 & 4 & 1 \\
1 & 0 & 0 & 0 & 1 & 4
\end{pmatrix}
\begin{pmatrix} Ry_1 \\ Ry_2 \\ Ry_3 \\ Ry_4 \\ Ry_5 \\ Ry_6 \end{pmatrix}
= 6
\begin{pmatrix} -1 \\ 4 \\ -1 \\ 1 \\ -4 \\ 1 \end{pmatrix}
$$

(The last entry is $(y_7 - y_6) - (y_6 - y_5) = (7 - 8) - (8 - 10) = 1$.)

Then $Ry_7 = Ry_1$ and the coefficients of $Sy_i$ are determined by equations (4.17)-(4.20).

• Now instead of plotting $(t, x(t))$ and $(t, y(t))$, just plot $(x(t), y(t))$ to get the closed curve.

Chapter 5

Numerical Differentiation

In Calculus you learned many ways to differentiate a function. However, sometimes we don’t have an explicit function to differentiate and sometimes we just have data points. But more importantly we will estimate derivatives when we start to solve differential equations of the form

$$y' = f(y) \quad \text{such as} \quad y' = 3y$$

$$y' = f(x) \quad \text{such as} \quad y' = e^{x^2}$$

$$y' = f(x, y) \quad \text{such as} \quad y' = 3y + e^{x^2}$$

$$ay'' + by' + cy = f(x) \quad \text{such as} \quad y'' + 2y' + y = \cos(x)$$

In solving differential equations we will want to approximate the derivatives involved. All such derivatives are defined in terms of difference quotients.

• Difference Quotients: $\dfrac{f(x) - f(a)}{x - a}$.

The derivative is, by definition, the limit of difference quotients:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

provided this limit exists.

So it is reasonable to expect that

$$f'(x) \approx \frac{f(x+h) - f(x)}{h} \quad \text{for small } h. \tag{5.1}$$

The above formula is called a forward difference formula. We have two main questions to ask about such difference formulas.

1. How small shall we make $h$?

2. How accurate is the approximation?

Bad answer to Question (1): The smaller the better.

Roundoff error and loss of significance can occur in the numerator of equation (5.1).


• Example: Let’s start with a simple function.

$$f(x) = x^4, \qquad f'(x) = 4x^3, \qquad f'(2) = 32$$

We will investigate the accuracy of the forward difference approximation to this derivative.

Let $h_k = 10^{-k}$ and define the difference approximation to the derivative by $D_k$ and the error by $E_k$.

$$h_k = 10^{-k}$$

$$D_k = \frac{f(2 + h_k) - f(2)}{h_k}$$

$$\text{error} = E_k = |D_k - 32|$$

The graph below shows that the error decreases as $h$ decreases from $10^{-1}$ to $10^{-8}$ but then increases when $h$ is smaller than $10^{-8}$. Moral: smaller $h$ is not necessarily better! This graph was generated using MATLAB® working at full precision; this is not a contrived machine with small precision.

Goal: Develop formulas that give good accuracy for larger $h$ values.
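The experiment described above is easy to repeat. The short Python sketch below is an illustration, not the original MATLAB script (which is not reproduced in these notes); the variable names are assumptions. It tabulates $D_k$ and $E_k$ in ordinary double precision so the turnaround in the error can be seen directly.

```python
def f(x):
    return x ** 4

exact = 32.0  # f'(2) = 4 * 2**3

errors = []
for k in range(1, 15):
    h = 10.0 ** (-k)
    D = (f(2 + h) - f(2)) / h          # forward difference, equation (5.1)
    errors.append(abs(D - exact))
    print(f"k = {k:2d}   D_k = {D:.10f}   E_k = {errors[-1]:.3e}")
```

The printed errors first shrink roughly in proportion to $h$ and then grow again once roundoff in the numerator dominates.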


5.1 Differentiation Formulas

Here is a list of various numerical differentiation formulas. In these, we are evaluating a derivative at $x_i$ using function evaluations at $x_i$ and other nearby points defined by

$$x_{i+1} = x_i + h, \quad x_{i-1} = x_i - h, \quad x_{i+2} = x_i + 2h, \quad x_{i-2} = x_i - 2h, \quad x_{i \pm k} = x_i \pm kh$$

and finally, the function evaluated at these values of $x$ is denoted by

$$f_i = f(x_i)$$

The order of the method describes the error in terms of the power of the step size ($h$). If $D$ is the difference approximation to the $m$th derivative $f^{(m)}(x_i)$, the error is denoted $E$, with

$$E = |f^{(m)}(x_i) - D|,$$

then a method is called $O(h^n)$ if

$$E \approx Ch^n \quad \text{as } h \to 0$$

where $C$ is a positive constant. It should be pointed out that this relationship only holds until $h$ gets so small that roundoff error ruins it.

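• Forward-Difference Formulas of Order $h$: $O(h)$

$$f'(x_i) \approx \frac{f_{i+1} - f_i}{h}$$

$$f''(x_i) \approx \frac{f_{i+2} - 2f_{i+1} + f_i}{h^2}$$

$$f^{(3)}(x_i) \approx \frac{f_{i+3} - 3f_{i+2} + 3f_{i+1} - f_i}{h^3}$$

$$f^{(4)}(x_i) \approx \frac{f_{i+4} - 4f_{i+3} + 6f_{i+2} - 4f_{i+1} + f_i}{h^4}$$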

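• Central-Difference Formulas of Order $h^2$: $O(h^2)$

$$f'(x_i) \approx \frac{f_{i+1} - f_{i-1}}{2h}$$

$$f''(x_i) \approx \frac{f_{i+1} - 2f_i + f_{i-1}}{h^2}$$

$$f^{(3)}(x_i) \approx \frac{f_{i+2} - 2f_{i+1} + 2f_{i-1} - f_{i-2}}{2h^3}$$

$$f^{(4)}(x_i) \approx \frac{f_{i+2} - 4f_{i+1} + 6f_i - 4f_{i-1} + f_{i-2}}{h^4}$$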


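• Forward-Difference Formulas of Order $h^2$: $O(h^2)$

$$f'(x_i) \approx \frac{-f_{i+2} + 4f_{i+1} - 3f_i}{2h}$$

$$f''(x_i) \approx \frac{-f_{i+3} + 4f_{i+2} - 5f_{i+1} + 2f_i}{h^2}$$

$$f^{(3)}(x_i) \approx \frac{-3f_{i+4} + 14f_{i+3} - 24f_{i+2} + 18f_{i+1} - 5f_i}{2h^3}$$

$$f^{(4)}(x_i) \approx \frac{-2f_{i+5} + 11f_{i+4} - 24f_{i+3} + 26f_{i+2} - 14f_{i+1} + 3f_i}{h^4}$$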

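• Backward-Difference Formulas of Order $h^2$: $O(h^2)$

$$f'(x_i) \approx \frac{3f_i - 4f_{i-1} + f_{i-2}}{2h}$$

$$f''(x_i) \approx \frac{2f_i - 5f_{i-1} + 4f_{i-2} - f_{i-3}}{h^2}$$

$$f^{(3)}(x_i) \approx \frac{5f_i - 18f_{i-1} + 24f_{i-2} - 14f_{i-3} + 3f_{i-4}}{2h^3}$$

$$f^{(4)}(x_i) \approx \frac{3f_i - 14f_{i-1} + 26f_{i-2} - 24f_{i-3} + 11f_{i-4} - 2f_{i-5}}{h^4}$$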

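• Central-Difference Formulas of Order $h^4$: $O(h^4)$

$$f'(x_i) \approx \frac{-f_{i+2} + 8f_{i+1} - 8f_{i-1} + f_{i-2}}{12h}$$

$$f''(x_i) \approx \frac{-f_{i+2} + 16f_{i+1} - 30f_i + 16f_{i-1} - f_{i-2}}{12h^2}$$

$$f^{(3)}(x_i) \approx \frac{-f_{i+3} + 8f_{i+2} - 13f_{i+1} + 13f_{i-1} - 8f_{i-2} + f_{i-3}}{8h^3}$$

$$f^{(4)}(x_i) \approx \frac{-f_{i+3} + 12f_{i+2} - 39f_{i+1} + 56f_i - 39f_{i-1} + 12f_{i-2} - f_{i-3}}{6h^4}$$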


5.2 Numerical Differentiation Examples

Here we demonstrate some methods on $f(x) = x^4$ at $x = 2$ where $h = 0.1$ and

$$f_{i \pm n} = f(2 \pm nh)$$

• First Derivatives: $f'(x) = 4x^3$ and $D$ approximates $f'(2) = 32$

– Forward-Difference Formula of Order $h$: $O(h)$

$$D = \frac{f_{i+1} - f_i}{h} = 34.4810$$

– Central-Difference Formula of Order $h^2$: $O(h^2)$

$$D = \frac{f_{i+1} - f_{i-1}}{2h} = 32.0800$$

– Forward-Difference Formula of Order $h^2$: $O(h^2)$

$$D = \frac{-f_{i+2} + 4f_{i+1} - 3f_i}{2h} = 31.8340$$

– Backward-Difference Formula of Order $h^2$: $O(h^2)$

$$D = \frac{3f_i - 4f_{i-1} + f_{i-2}}{2h} = 31.8460$$

– Central-Difference Formula of Order $h^4$: $O(h^4)$

$$D = \frac{-f_{i+2} + 8f_{i+1} - 8f_{i-1} + f_{i-2}}{12h} = 32.0000$$

• Second Derivatives: $f''(x) = 12x^2$ and $D$ approximates $f''(2) = 48$

– Forward-Difference Formula of Order $h$: $O(h)$

$$D = \frac{f_{i+2} - 2f_{i+1} + f_i}{h^2} = 52.9400$$

– Central-Difference Formula of Order $h^2$: $O(h^2)$

$$D = \frac{f_{i+1} - 2f_i + f_{i-1}}{h^2} = 48.0200$$

– Forward-Difference Formula of Order $h^2$: $O(h^2)$

$$D = \frac{-f_{i+3} + 4f_{i+2} - 5f_{i+1} + 2f_i}{h^2} = 47.7800$$

– Backward-Difference Formula of Order $h^2$: $O(h^2)$

$$D = \frac{2f_i - 5f_{i-1} + 4f_{i-2} - f_{i-3}}{h^2} = 47.7800$$

– Central-Difference Formula of Order $h^4$: $O(h^4)$

$$D = \frac{-f_{i+2} + 16f_{i+1} - 30f_i + 16f_{i-1} - f_{i-2}}{12h^2} = 48.0000$$
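The values listed in this section can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic; the helper `fi` and the dictionary names are assumptions introduced here.

```python
def f(x):
    return x ** 4

x0, h = 2.0, 0.1

def fi(n):
    """Return f_{i+n} = f(x0 + n*h)."""
    return f(x0 + n * h)

first = {
    "forward O(h)":    (fi(1) - fi(0)) / h,
    "central O(h^2)":  (fi(1) - fi(-1)) / (2 * h),
    "forward O(h^2)":  (-fi(2) + 4 * fi(1) - 3 * fi(0)) / (2 * h),
    "backward O(h^2)": (3 * fi(0) - 4 * fi(-1) + fi(-2)) / (2 * h),
    "central O(h^4)":  (-fi(2) + 8 * fi(1) - 8 * fi(-1) + fi(-2)) / (12 * h),
}

second = {
    "forward O(h)":    (fi(2) - 2 * fi(1) + fi(0)) / h ** 2,
    "central O(h^2)":  (fi(1) - 2 * fi(0) + fi(-1)) / h ** 2,
    "forward O(h^2)":  (-fi(3) + 4 * fi(2) - 5 * fi(1) + 2 * fi(0)) / h ** 2,
    "backward O(h^2)": (2 * fi(0) - 5 * fi(-1) + 4 * fi(-2) - fi(-3)) / h ** 2,
    "central O(h^4)":  (-fi(2) + 16 * fi(1) - 30 * fi(0)
                        + 16 * fi(-1) - fi(-2)) / (12 * h ** 2),
}

for name, D in first.items():
    print(f"f'(2)  {name:16s} {D:.4f}")
for name, D in second.items():
    print(f"f''(2) {name:16s} {D:.4f}")
```

The $O(h^4)$ central formulas are exact here (up to roundoff) because their truncation terms involve $f^{(5)}$ and $f^{(6)}$, which vanish for $f(x) = x^4$.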


• Numerically verifying the order of the method: Verify the central difference formula for $f'(x)$ is $O(h^2)$

$$f(x) = x^4, \qquad f'(x) = 4x^3, \qquad f'(2) = 32$$

Let $h_k = 10^{-k}$ and define the difference approximation to the derivative by $D_k$ and the error by $E_k$:

$$D_k = \frac{f(2 + h_k) - f(2 - h_k)}{2h_k}$$

$$E_k = |D_k - 32|$$

We claim that $E_k \approx Ch_k^2$ as $h_k \to 0$.

Consider $y = \log_{10}(E_k)$:

$$y = \log_{10}(E_k) \tag{5.2}$$

$$\approx \log_{10}(Ch_k^2) \tag{5.3}$$

$$= \log_{10}C + 2\log_{10}(h_k) \tag{5.4}$$

So, if the method is truly $O(h^2)$ then when you plot $\log_{10}(E_k)$ -vs- $\log_{10}(h_k)$ the slope of this relationship should be 2. The graph below shows that as $h$ decreases from $10^{-1}$ to $10^{-5}$ the slope is about 2. However, for $0 < h_k < 10^{-6}$, the roundoff error and resulting loss of precision ruin the relationship. Again, smaller is not always better.
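The slope can also be estimated without a plot by comparing the error at two consecutive step sizes. The Python sketch below does this for the central difference on $f(x) = x^4$; the function name `central_error` and the list `slopes` are assumptions made for illustration.

```python
import math

def f(x):
    return x ** 4

def central_error(h):
    """Error of the O(h^2) central difference for f'(2) = 32."""
    D = (f(2 + h) - f(2 - h)) / (2 * h)
    return abs(D - 32.0)

slopes = []
for k in range(1, 4):
    h1, h2 = 10.0 ** (-k), 10.0 ** (-(k + 1))
    # Slope of log10(E) versus log10(h) between two neighboring step sizes
    slope = ((math.log10(central_error(h1)) - math.log10(central_error(h2)))
             / (math.log10(h1) - math.log10(h2)))
    slopes.append(slope)
    print(f"h from 1e-{k} to 1e-{k + 1}: slope = {slope:.3f}")
```

Extending the loop to smaller step sizes shows the slope drifting away from 2 once roundoff takes over.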

• Homework: Demonstrate that the $O(h^2)$ central difference approximation to $f''(x)$ is $O(h^2)$ until the loss of precision ruins it for $h$ too small. You can use the same function $f(x) = x^4$ and approximate $f''(2) = 48$.

• Student Demonstration 1: Demonstrate the order of error for one of the central difference $O(h^4)$ formulas.

• Student Demonstration 2: Describe, develop, and demonstrate Richardson’s $O(h^4)$ method described in the text.


5.3 Difference Formula Derivations

• Central Difference Formula for $f'(x_i)$ of order $h^2$. (Two function evaluations)

Assume $f \in C^3[a, b]$ and $x - h$, $x$, and $x + h \in [a, b]$. Expanding $f(x+h)$ and $f(x-h)$ about $x$:

$$f(x+h) = f(x) + f'(x)h + \frac{f''(x)h^2}{2!} + \frac{f^{(3)}(\xi_1)h^3}{3!} \quad \text{where } \xi_1 \in [x, x+h] \tag{5.5}$$

$$f(x-h) = f(x) - f'(x)h + \frac{f''(x)h^2}{2!} - \frac{f^{(3)}(\xi_2)h^3}{3!} \quad \text{where } \xi_2 \in [x-h, x] \tag{5.6}$$

Subtracting (5.6) from (5.5) yields

$$f(x+h) - f(x-h) = 2f'(x)h + \frac{\left(f^{(3)}(\xi_1) + f^{(3)}(\xi_2)\right)h^3}{3!}.$$

Since $f^{(3)}$ is continuous, the intermediate value theorem implies

$$\frac{f^{(3)}(\xi_1) + f^{(3)}(\xi_2)}{2} = f^{(3)}(\xi) \quad \text{where } \xi \in [x-h, x+h].$$

Now solving for $f'(x)$ yields

$$f'(x) = \frac{f(x+h) - f(x-h)}{2h} - \frac{f^{(3)}(\xi)}{3!}h^2$$

$$f'(x_i) = \frac{f_{i+1} - f_{i-1}}{2h} - \frac{f^{(3)}(\xi)}{3!}h^2$$

$$f'(x_i) = \frac{f_{i+1} - f_{i-1}}{2h} + O(h^2)$$

as desired.

• Central Difference Formula for $f''(x_i)$ of order $h^2$. (Three function evaluations)

Adding (5.6) and (5.5) (where one more term is retained) yields

$$f(x+h) + f(x-h) = 2f(x) + f''(x)h^2 + \frac{\left(f^{(4)}(\xi_1) + f^{(4)}(\xi_2)\right)h^4}{4!}$$

$$= 2f(x) + f''(x)h^2 + \frac{2f^{(4)}(\xi)h^4}{4!} \quad \text{where } \xi \in [x-h, x+h].$$

Now solving for $f''(x)$ yields

$$f''(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} - \frac{f^{(4)}(\xi)}{12}h^2$$

$$f''(x_i) = \frac{f_{i+1} - 2f_i + f_{i-1}}{h^2} + O(h^2)$$


• Central Difference Formula for $f'(x_i)$ of order $h^4$. (Four function evaluations)

Expanding $f(x+h)$, $f(x-h)$, $f(x+2h)$, and $f(x-2h)$ in Taylor series about $x$ and subtracting the results yields

$$f(x+h) - f(x-h) = 2f'(x)h + \frac{2f^{(3)}(x)h^3}{3!} + \frac{2f^{(5)}(\xi_1)h^5}{5!} \quad \text{where } \xi_1 \in [x-h, x+h] \tag{5.7}$$

$$f(x+2h) - f(x-2h) = 4f'(x)h + \frac{16f^{(3)}(x)h^3}{3!} + \frac{64f^{(5)}(\xi_2)h^5}{5!} \quad \text{where } \xi_2 \in [x-2h, x+2h] \tag{5.8}$$

Multiplying equation (5.7) by 8 and subtracting equation (5.8) yields

$$-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h) = 12f'(x)h + \frac{\left(16f^{(5)}(\xi_1) - 64f^{(5)}(\xi_2)\right)h^5}{120}$$

or

$$f'(x) = \frac{-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)}{12h} + \frac{\left(64f^{(5)}(\xi_2) - 16f^{(5)}(\xi_1)\right)h^4}{12 \cdot 120}$$

$$f'(x_i) = \frac{-f_{i+2} + 8f_{i+1} - 8f_{i-1} + f_{i-2}}{12h} + \frac{\left(64f^{(5)}(\xi_2) - 16f^{(5)}(\xi_1)\right)h^4}{12 \cdot 120}$$

$$f'(x_i) = \frac{-f_{i+2} + 8f_{i+1} - 8f_{i-1} + f_{i-2}}{12h} + O(h^4).$$

Note:

$$64f^{(5)}(\xi_2) - 16f^{(5)}(\xi_1) = 16\left(4f^{(5)}(\xi_2) - f^{(5)}(\xi_1)\right) = 48\left(\frac{4}{3}f^{(5)}(\xi_2) - \frac{1}{3}f^{(5)}(\xi_1)\right)$$

Since $f^{(5)}$ is continuous, if $h$ is chosen small enough, $\varepsilon$ will be such that

$$\frac{4}{3}f^{(5)}(\xi_2) - \frac{1}{3}f^{(5)}(\xi_1) = \frac{4}{3}\left(f^{(5)}(\xi_1) + \varepsilon\right) - \frac{1}{3}f^{(5)}(\xi_1) = f^{(5)}(\xi_1) + \frac{4\varepsilon}{3} = f^{(5)}(\xi)$$

and the difference formula (with truncation error) can be written:

$$f'(x_i) = \frac{-f_{i+2} + 8f_{i+1} - 8f_{i-1} + f_{i-2}}{12h} + \frac{f^{(5)}(\xi)}{30}h^4 \quad \text{where } \xi \in [x_{i-2}, x_{i+2}]$$


5.4 Optimal Step Size

• Analytic Method for a Specific Difference Scheme.

Example: central difference for $f'(x_i)$, $O(h^2)$.

$$f'(x_i) \approx \frac{f_{i+1} - f_{i-1}}{2h}$$

$$f'(x_i) = \frac{f_{i+1} - f_{i-1}}{2h} + E(f, h)$$

where $E(f, h)$ is the total error:

$$E(f, h) = E_{round}(f, h) + E_{trunc}(f, h) = \frac{e_{i+1} - e_{i-1}}{2h} - \frac{h^2 f^{(3)}(\xi)}{6}$$

Let $\varepsilon$ be a maximum of the rounding and/or measurement error. Then

$$|e_{i-1}| \le \varepsilon \quad \text{and} \quad |e_{i+1}| \le \varepsilon.$$

Let $M$ be the max of the 3rd derivative of $f$ on $[a,b]$:

$$M = \max_{a \le x \le b} \{|f^{(3)}(x)|\}.$$

Now we can say

$$|E(f, h)| \le \frac{\varepsilon}{h} + \frac{Mh^2}{6} = g(h).$$

The goal is then to minimize $g(h)$ by differentiating it and setting the result equal to zero.

$$g'(h) = -\frac{\varepsilon}{h^2} + \frac{Mh}{3} = 0$$

yields

$$h_{opt} = \left(\frac{3\varepsilon}{M}\right)^{1/3}$$

• Numerical Method for an Arbitrary Difference Scheme.

Define a decreasing sequence of step sizes $\{h_k\}_{k=1}^{n}$ such that $h_k \to 0$, and let

$$D_k = \text{approximation to the derivative using step size } h_k.$$

$D_k$ should be computed until

$$|D_{N+1} - D_N| \ge |D_N - D_{N-1}|.$$

Then $h_N \approx$ the optimum step size $= h_{opt}$.
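Both approaches are short enough to sketch in Python. The function names below, the choice of $f(x) = e^x$ at $x = 1$, and the half-decade sequence of step sizes are assumptions made for illustration; the stopping test is the one stated above, applied to the $O(h^2)$ central difference.

```python
import math

def optimal_step_analytic(eps, M):
    """h_opt = (3*eps/M)^(1/3) for the O(h^2) central difference."""
    return (3 * eps / M) ** (1 / 3)

def optimal_step_numeric(f, x, h_values):
    """Return h_N for the first N with |D_{N+1} - D_N| >= |D_N - D_{N-1}|."""
    D = [(f(x + h) - f(x - h)) / (2 * h) for h in h_values]
    for N in range(1, len(D) - 1):
        if abs(D[N + 1] - D[N]) >= abs(D[N] - D[N - 1]):
            return h_values[N]
    return h_values[-1]  # the differences never stopped shrinking

# Analytic estimate with eps = 1e-15 and M = e^2 (an example choice)
h_analytic = optimal_step_analytic(1e-15, math.exp(2))

# Numerical estimate with step sizes 10^-1, 10^-1.5, 10^-2, ...
steps = [10.0 ** (-k / 2) for k in range(2, 30)]
h_numeric = optimal_step_numeric(math.exp, 1.0, steps)

print(f"analytic: log10(h_opt) = {math.log10(h_analytic):.2f}")
print(f"numeric:  log10(h_opt) = {math.log10(h_numeric):.2f}")
```

The two estimates need not agree exactly: the analytic value minimizes an upper bound on the error, while the numerical test reacts to the actual roundoff encountered.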


Example: Approximate the optimal step size for estimating the derivative of $f(x) = e^x$ over the interval [0,2] using second order ($O(h^2)$) central differencing, and assume machine round-off error to have a maximum of $10^{-15}$ (appropriate for MATLAB, where machine precision is roughly $10^{-16}$). Check this numerically with a graph of log(h) -vs- log(error) at $x_i = 1$.

$$f'(x_i) = \frac{y_{i+1} - y_{i-1}}{2h} + E(f, h)$$

$$|e_{i-1}| \le \varepsilon = 10^{-15}, \qquad |e_{i+1}| \le \varepsilon = 10^{-15}$$

$$M = \max_{0 \le x \le 2} \{|f^{(3)}(x)|\} = \max_{0 \le x \le 2} \{|e^x|\} = e^2$$

$$h_{opt} = \left(\frac{3\varepsilon}{M}\right)^{1/3} \approx 10^{-5.13}$$

[Figure: log(error) -vs- log(h) at $x_i = 1$]

Notes:

– The slope of log(h) -vs- log(error) at $x_i = 1$ is 2, as should be the case with an order $h^2$ method.

– This slope holds only until round-off error starts affecting the results, i.e., it breaks down for $h < h_{opt}$.

– If there were no round-off error, the slope would be 2 for the entire domain of $h$ values.

– You can’t do much better than $10^{-10}$ error with our resources and an order $h^2$ method.

– If you want better accuracy, you should use a higher order differencing scheme.

– The numerical method of determining $h_{opt}$ suggests $10^{-5.5}$ is the optimal value.

Chapter 6

Numerical Integration

Calculus is generally broken into two parts: Differential Calculus (Derivatives) and Integral Calculus (Integrals). The two parts are linked by The Fundamental Theorem of Calculus. When you study Integral Calculus you generally study two forms of integration.

• Indefinite Integrals, also called anti-derivatives.

$$\int f(x)\,dx = F(x) + C \quad \text{where } F'(x) = f(x).$$

Here, $F(x)$ is called an antiderivative of $f(x)$. For example

$$\int (x^2 + 5)\,dx = \frac{x^3}{3} + 5x + C$$

• Definite Integrals

$$\int_a^b f(x)\,dx = \text{the signed area between the graph of } f(x) \text{ and the } x\text{-axis from } x = a \text{ to } x = b$$

The Fundamental Theorem of Calculus tells us

$$\int_a^b f(x)\,dx = F(b) - F(a) \quad \text{where } F'(x) = f(x)$$

for example

$$\int_0^2 (x^2 + 5)\,dx = \left[\frac{x^3}{3} + 5x\right]_0^2 = \left(\frac{2^3}{3} + 5(2)\right) - \left(\frac{0^3}{3} + 5(0)\right) = \frac{8}{3} + 10 = \frac{38}{3}$$

Problem: Sometimes you can’t find an antiderivative of $f(x)$, so the Fundamental Theorem of Calculus can’t help us with the definite integral. This happens in a case such as

$$\int_{-2}^{2} e^{-x^2}\,dx$$


This chapter details a couple of ways to numerically approximate a definite integral.

• Question: What is $\int_a^b f(x)\,dx$?

Suppose the interval $[a, b]$ is divided into $n$ subintervals by a partition $a = x_0 < x_1 < x_2 < \ldots < x_n = b$, where $\Delta x_i = x_i - x_{i-1}$ and $x_i^*$ is in the $i$’th subinterval. Then the Riemann Sum is given by

$$\text{Riemann Sum} = S_n = \sum_{i=1}^{n} f(x_i^*)\,\Delta x_i \tag{6.1}$$

Now, assume that as $n \to \infty$, $\Delta x_i \to 0$ for all $i$; then

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i^*)\,\Delta x_i \tag{6.2}$$

provided this limit exists. If this limit exists then the function is called integrable on $[a, b]$. In this chapter we use uniform $\Delta x_i$’s called $\Delta x$ and then choose $\Delta x$ to be small and approximate the definite integral. As always, there are good and not-so-good ways to do this.

• Definition: Quadrature

Suppose $a = x_0 < x_1 < \ldots < x_M = b$. A formula of the form

$$Q[f] = \sum_{k=0}^{M} \omega_k f(x_k) = \omega_0 f(x_0) + \ldots + \omega_M f(x_M) \tag{6.3}$$

with the property that

$$\int_a^b f(x)\,dx = Q[f] + E[f] \tag{6.4}$$

is called a numerical integration or quadrature formula. The term $E[f]$ is called the truncation error. The values $\{x_k\}_{k=0}^{M}$ are called the quadrature nodes, and $\{\omega_k\}_{k=0}^{M}$ are called the weights.

• Definition: Order of precision. The order (or degree) of precision of a quadrature formula is the maximum $n$ for which $E[P_k] = 0$ for all $k \le n$, where $P_k$ denotes any polynomial of degree $k$.

It should not be surprising then that the truncation error for a quadrature formula with order of precision $n$ is of the form

$$E[f] = K f^{(n+1)}(\xi) \tag{6.5}$$

for some constant $K$. This way, if $f = P_n$, $E[f] = 0$.

• Definition: Newton-Cotes Formulas. If the quadrature formula is derived by interpolating $M+1$ evenly spaced points $\{(x_k, f_k)\}_{k=0}^{M}$ with $P_M(x)$ and then integrating $P_M(x)$, the formula is called a Newton-Cotes formula.

Most quadrature formulas are Newton-Cotes formulas (even if other derivations produce the same formula).


• Definition: Closed -vs- Composite Formulas

If $x_0 = a$ and $x_M = b$ the associated Newton-Cotes formula is called a closed Newton-Cotes formula.

A composite formula is a simplified sum of closed formulas. Most formulas in Calculus books are composite formulas such as

Trapezoid Rule:

$$\int_a^b f(x)\,dx \approx \frac{\Delta x}{2}\left[f(x_0) + 2f(x_1) + 2f(x_2) + \ldots + 2f(x_{M-1}) + f(x_M)\right]$$

Simpson’s Rule:

$$\int_a^b f(x)\,dx \approx \frac{\Delta x}{3}\left[f(x_0) + 4f(x_1) + 2f(x_2) + \ldots + 2f(x_{M-2}) + 4f(x_{M-1}) + f(x_M)\right]$$

(Notes on this: $M$ is even and the coefficient pattern follows 1,4,2,4,2,4,...,2,4,1)

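• Closed Newton-Cotes Quadrature Formulas, with $x_k = x_0 + kh$ and $f_k = f(x_k)$.

– Trapezoid Rule:

$$\int_a^b f(x)\,dx \approx \frac{h}{2}(f_0 + f_1) \qquad \text{Local Error} = -\frac{1}{12}h^3 f''(\xi)$$

– Simpson’s 1/3 Rule:

$$\int_a^b f(x)\,dx \approx \frac{h}{3}(f_0 + 4f_1 + f_2) \qquad \text{Local Error} = -\frac{1}{90}h^5 f^{(4)}(\xi)$$

– Simpson’s 3/8 Rule:

$$\int_a^b f(x)\,dx \approx \frac{3h}{8}(f_0 + 3f_1 + 3f_2 + f_3) \qquad \text{Local Error} = -\frac{3}{80}h^5 f^{(4)}(\xi)$$

– Boole’s Rule:

$$\int_a^b f(x)\,dx \approx \frac{2h}{45}(7f_0 + 32f_1 + 12f_2 + 32f_3 + 7f_4) \qquad \text{Local Error} = -\frac{8}{945}h^7 f^{(6)}(\xi)$$

• Orders of precision:

– Trapezoid Rule: 1

– Simpson’s 1/3 rule: 3

– Simpson’s 3/8 rule: 3

– Boole’s Rule: 5

Notice the bonus precision and local truncation error with Simpson’s 1/3 rule (and, likewise, with Boole’s rule): each uses an even number of subintervals and gains one extra degree of exactness.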
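The order of precision of each closed rule can be checked directly from the definition by integrating the monomials $x^p$ on $[0, M]$ with $h = 1$. The Python sketch below uses exact rational arithmetic so that "exact" really means zero error; the dictionary `RULES` and the function name `order_of_precision` are assumptions introduced for this illustration.

```python
from fractions import Fraction

# Each rule: (leading factor multiplying h, coefficients of f_0 ... f_M)
RULES = {
    "trapezoid":   (Fraction(1, 2),  [1, 1]),
    "simpson 1/3": (Fraction(1, 3),  [1, 4, 1]),
    "simpson 3/8": (Fraction(3, 8),  [1, 3, 3, 1]),
    "boole":       (Fraction(2, 45), [7, 32, 12, 32, 7]),
}

def order_of_precision(scale, coeffs, max_degree=10):
    """Largest n such that x^p is integrated exactly for all p <= n."""
    M = len(coeffs) - 1              # nodes x_k = k on [0, M], so h = 1
    degree = -1
    for p in range(max_degree + 1):
        exact = Fraction(M ** (p + 1), p + 1)       # integral of x^p on [0, M]
        approx = scale * sum(c * Fraction(k) ** p for k, c in enumerate(coeffs))
        if approx != exact:
            break
        degree = p
    return degree

orders = {name: order_of_precision(*rule) for name, rule in RULES.items()}
for name, n in orders.items():
    print(f"{name:12s} order of precision = {n}")
```

Because a quadrature formula is linear in $f$, exactness on every monomial up to degree $n$ is equivalent to exactness on every polynomial of degree at most $n$.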


6.1 Trapezoid and Simpson’s Rules

In this section we just give the formulas for the two different methods and their errors. Derivations are described in the next section.

Trapezoid Rule

This is derived by approximating $f(x)$ with a line across each subinterval from the left endpoint to the right endpoint.

• Trapezoid Rule: Closed Form. Let $h = (b - a)$; then

$$\int_a^b f(x)\,dx \approx \frac{h}{2}\left(f(a) + f(b)\right) \tag{6.6}$$

Example: Consider

$$\int_0^1 e^{2x}\,dx = \left[\frac{e^{2x}}{2}\right]_0^1 = \frac{e^2}{2} - \frac{e^0}{2} = 3.1945$$

$$\int_0^1 e^{2x}\,dx \approx \frac{h}{2}\left(e^0 + e^2\right) = \frac{1}{2}\left(1 + e^2\right) = 4.1945 \qquad \text{error} = 1$$

• Trapezoid Rule: Local Truncation Error

$$E_{local} = \frac{-f''(\xi)}{12}h^3 \quad \text{where } \xi \in (a, b) \tag{6.7}$$

• Trapezoid Rule: Composite Form. Let $N$ be a fixed positive integer (number of subintervals). Let $h = (b - a)/N$ and $x_i = a + ih$ for $0 \le i \le N$. Then the composite form of the trapezoid rule with equally spaced nodes is

$$\int_a^b f(x)\,dx = \int_{x_0}^{x_N} f(x)\,dx \approx \frac{h}{2}\sum_{i=1}^{N}\left[f(x_{i-1}) + f(x_i)\right] = \frac{h}{2}\left[f(a) + \left(2\sum_{i=1}^{N-1} f(x_i)\right) + f(b)\right]. \tag{6.8}$$

Example: Consider $\int_0^1 e^{2x}\,dx = 3.1945$ and use 4 subintervals so that $h = \frac{1}{4} = 0.25$.

$$\int_0^1 e^{2x}\,dx \approx \frac{0.25}{2}\left(e^0 + 2e^{0.5} + 2e^{1} + 2e^{1.5} + e^2\right) = 3.2608 \qquad \text{error} = 0.0663$$

• Trapezoid Rule: Composite Truncation Error

$$E_{composite} = \frac{-(b - a)h^2}{12}f''(\eta) \quad \text{for } \eta \in (a, b) \tag{6.9}$$
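The closed and composite trapezoid rules differ only in the number of subintervals, so one routine covers both. The Python sketch below follows the right-hand form of equation (6.8); the function name `trapezoid` and its signature are assumptions made for illustration.

```python
import math

def trapezoid(f, a, b, N=1):
    """Composite trapezoid rule with N equal subintervals (N = 1 is the closed form)."""
    h = (b - a) / N
    interior = sum(f(a + i * h) for i in range(1, N))   # f(x_1) + ... + f(x_{N-1})
    return h / 2 * (f(a) + 2 * interior + f(b))

def g(x):
    return math.exp(2 * x)

exact = (math.exp(2) - 1) / 2
for N in (1, 4):
    approx = trapezoid(g, 0.0, 1.0, N)
    print(f"N = {N}: approx = {approx:.4f}, error = {abs(approx - exact):.4f}")
```

Each interior node is evaluated once, which is the point of the simplified right-hand form.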


Simpson’s 1/3 Rule

This is derived by approximating $f(x)$ with a quadratic across each subinterval at 3 evenly spaced nodes.

• Simpson’s 1/3 Rule: Closed Form. This is accomplished with 3 function evaluations evenly spaced in $[a, b]$. Let $x_0 = a$, $x_1 = (a + b)/2$, $x_2 = b$, and $h = (b - a)/2$.

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\left[f(x_0) + 4f(x_1) + f(x_2)\right] \tag{6.10}$$

Example: Consider

$$\int_0^1 e^{2x}\,dx = \left[\frac{e^{2x}}{2}\right]_0^1 = \frac{e^2}{2} - \frac{e^0}{2} = 3.1945$$

$$\int_0^1 e^{2x}\,dx \approx \frac{0.5}{3}\left(e^0 + 4e^1 + e^2\right) = 3.2104 \qquad \text{error} = 0.0158$$

• Simpson’s 1/3 Rule: Local Truncation Error

$$E_{local} = -\frac{1}{90}h^5 f^{(4)}(\eta) \quad \text{where } \eta \in (a, b) \tag{6.11}$$

• Simpson’s 1/3 Rule: Composite Form. This requires that $N$ is even, $h = (b - a)/N$ and $x_i = a + ih$ for $0 \le i \le N$. Then

$$\int_a^b f(x)\,dx = \int_{x_0}^{x_2} f(x)\,dx + \int_{x_2}^{x_4} f(x)\,dx + \ldots + \int_{x_{N-2}}^{x_N} f(x)\,dx = \sum_{i=1}^{N/2} \int_{x_{2i-2}}^{x_{2i}} f(x)\,dx$$

and apply the closed form of Simpson’s rule to each subinterval

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\sum_{i=1}^{N/2}\left[f(x_{2i-2}) + 4f(x_{2i-1}) + f(x_{2i})\right]$$

In order to minimize function evaluations and floating point operations, the following form is desirable

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\left[f(x_0) + 2\sum_{i=2}^{N/2} f(x_{2i-2}) + 4\sum_{i=1}^{N/2} f(x_{2i-1}) + f(x_N)\right] \tag{6.12}$$

Example: Consider $\int_0^1 e^{2x}\,dx = 3.1945$ and use 4 subintervals so that $h = \frac{1}{4} = 0.25$.

$$\int_0^1 e^{2x}\,dx \approx \frac{0.25}{3}\left(e^0 + 4e^{0.5} + 2e^{1} + 4e^{1.5} + e^2\right) = 3.1956 \qquad \text{error} = 0.0011$$

• Simpson’s 1/3 Rule: Composite Truncation Error

$$E_{composite} = -\frac{1}{180}(b - a)\,h^4 f^{(4)}(\eta) \quad \text{for } \eta \in (a, b) \tag{6.13}$$
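Equation (6.12) translates almost line for line into code. The Python sketch below is illustrative only; the function name `simpson` and its signature are assumptions, and it raises an error when $N$ is odd, since the composite rule pairs up subintervals.

```python
import math

def simpson(f, a, b, N=2):
    """Composite Simpson's 1/3 rule with N (even) equal subintervals."""
    if N < 2 or N % 2 != 0:
        raise ValueError("N must be a positive even integer")
    h = (b - a) / N
    # Even-indexed interior nodes x_2, x_4, ..., x_{N-2} (weight 2)
    even = sum(f(a + 2 * i * h) for i in range(1, N // 2))
    # Odd-indexed nodes x_1, x_3, ..., x_{N-1} (weight 4)
    odd = sum(f(a + (2 * i - 1) * h) for i in range(1, N // 2 + 1))
    return h / 3 * (f(a) + 2 * even + 4 * odd + f(b))

def g(x):
    return math.exp(2 * x)

exact = (math.exp(2) - 1) / 2
for N in (2, 4):
    approx = simpson(g, 0.0, 1.0, N)
    print(f"N = {N}: approx = {approx:.4f}, error = {abs(approx - exact):.4f}")
```

With $N = 2$ the routine reduces to the closed form (6.10).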


6.2 Derivations and Error

• Consider the Lagrange interpolating polynomial and the associated basis polynomials $\ell_i(x)$:

$$p_n(x) = y_0\ell_0(x) + y_1\ell_1(x) + \ldots + y_n\ell_n(x) \quad \text{where} \quad \ell_i(x) = \prod_{\substack{j=0 \\ j \ne i}}^{n} \frac{x - x_j}{x_i - x_j}$$

If a function $f(x)$ is interpolated at $n+1$ nodes ($a = x_0 < x_1 < \ldots < x_n = b$) by a polynomial of degree $\le n$ then we have

$$p(x) = \sum_{i=0}^{n} f(x_i)\,\ell_i(x)$$

If we assume that $f(x) \approx p(x)$ on $x \in [a, b]$, and hence that $\int_a^b f(x)\,dx \approx \int_a^b p(x)\,dx$, we arrive at the approximation

$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{n} A_i f(x_i)$$

where

$$A_i = \int_a^b \ell_i(x)\,dx$$

A formula of this form is called a Newton-Cotes formula if the nodes are equally spaced.

• Trapezoid Rule: Closed Form Newton-Cotes with two nodes $x_0 = a$ and $x_1 = b$ (so $n = 1$).

$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{1} A_i f(x_i)$$

$$A_0 = \int_a^b \ell_0(x)\,dx \quad \text{and} \quad A_1 = \int_a^b \ell_1(x)\,dx$$

$$\ell_0(x) = \frac{b - x}{b - a} \quad \text{and} \quad \ell_1(x) = \frac{x - a}{b - a}$$

It’s not hard to show that $A_0 = A_1 = \frac{b-a}{2}$ and the quadrature formula becomes

$$\int_a^b f(x)\,dx \approx \frac{b - a}{2}\left(f(a) + f(b)\right) \tag{6.14}$$

It is exact if $f$ is a polynomial of degree $\le 1$ because then the interpolating polynomial has no error.


• Trapezoid Rule: Local Truncation Error. From the approximation error theorem, assuming $f \in C^2[a, b]$,

$$f(x) - P_1(x) = \frac{f''(\eta)}{2}(x - a)(x - b) \quad \text{where } \eta \in (a, b)$$

and integrating both sides yields

$$\int_a^b f(x)\,dx - \frac{b - a}{2}\left(f(a) + f(b)\right) = \frac{1}{2}\int_a^b f''(\eta)(x - a)(x - b)\,dx.$$

If we can show that $f''(\eta)$, viewed as a function of $x$, is continuous on $[a, b]$ and that $(x - a)(x - b)$ does not change sign on $[a, b]$, then we can exploit the weighted mean value theorem for integrals (6.22). This is done by noticing from above that

$$f''(\eta) = 2\,\frac{f(x) - P_1(x)}{(x - a)(x - b)}.$$

Letting $x \to a$ and using L’Hopital’s rule we see that the limit exists. Likewise for $x \to b$. Obviously $(x - a)(x - b)$ does not change sign on $[a, b]$. So the right hand side (truncation error) is, by the weighted mean value theorem for integrals,

$$E_{trunc} = \frac{f''(\xi)}{2}\int_a^b (x - a)(x - b)\,dx = \frac{f''(\xi)}{2} \cdot \frac{-1}{6}(b - a)^3 = \frac{-f''(\xi)}{12}h^3 \quad \text{where } \xi \in (a, b) \tag{6.15}$$

• Trapezoid Rule: Composite Form and Composite Error. Let $N$ be a fixed positive integer $\ge 1$. Let $h = (b - a)/N$ and $x_k = a + kh$ for $0 \le k \le N$. Then the composite form of the trapezoid rule with equally spaced nodes is

$$\int_a^b f(x)\,dx = \int_{x_0}^{x_N} f(x)\,dx \approx \frac{h}{2}\sum_{i=1}^{N}\left(f(x_{i-1}) + f(x_i)\right) = \frac{h}{2}\left[f(a) + 2\sum_{i=1}^{N-1} f(a + ih) + f(b)\right]. \tag{6.16}$$

Here the right-most expression minimizes function evaluations and floating point operations. To find the composite form error, let $E_i$ denote the closed form truncation error on each subinterval:

$$E_i = \frac{-f''(\eta_i)}{12}h^3 \quad \text{where } \eta_i \in (x_{i-1}, x_i).$$

Summing these and denoting the sum by $E$:

$$E = \sum_{i=1}^{N} E_i = \frac{-h^3}{12}\sum_{i=1}^{N} f''(\eta_i) = \frac{-Nh^3}{12}\left(\frac{1}{N}\sum_{i=1}^{N} f''(\eta_i)\right).$$

This last term is an average which is $\le$ the maximum value of $f''(x)$ and $\ge$ the minimum of $f''(x)$ on $[a, b]$ and thus by the intermediate value theorem is equal to $f''(\eta)$ for some $\eta \in (a, b)$. Additionally $N$ can be written as $(b - a)/h$ and the final form for the composite error is:

$$E_{composite} = \frac{-(b - a)h^2}{12}f''(\eta) \quad \text{for } \eta \in (a, b) \tag{6.17}$$

Note: It is not necessary that each subinterval have equal width to use a trapezoid method. If $h$ varies, the composite form of the truncation error is more complex.


• Simpson’s 1/3 Rule: Closed Form. The derivation of the closed form rule follows in a similar fashion to that of the trapezoid rule, where the interpolating polynomial is a quadratic and interpolates $f$ at 3 equally spaced nodes. However, deriving the closed form truncation error is much more complicated. Both results are restated without proof. Let $x_0 = a$, $x_1 = (a + b)/2$, $x_2 = b$, and $h = (b - a)/2$.

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\left[f(x_0) + 4f(x_1) + f(x_2)\right] \tag{6.18}$$

• Simpson’s 1/3 Rule: Local Truncation Error

$$E_{trunc} = -\frac{1}{90}h^5 f^{(4)}(\eta) \quad \text{where } \eta \in (a, b) \tag{6.19}$$

Once these are obtained, getting the composite form and composite error is fairly straightforward.

• Simpson’s 1/3 Rule: Composite Form and Composite Error

Requires that $N$ is even, $h = (b - a)/N$ and $x_k = a + kh$ for $0 \le k \le N$. Then

$$\int_a^b f(x)\,dx = \int_{x_0}^{x_2} f(x)\,dx + \int_{x_2}^{x_4} f(x)\,dx + \ldots + \int_{x_{N-2}}^{x_N} f(x)\,dx = \sum_{i=1}^{N/2} \int_{x_{2i-2}}^{x_{2i}} f(x)\,dx$$

and apply the closed form of Simpson’s rule to each subinterval

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\sum_{i=1}^{N/2}\left[f(x_{2i-2}) + 4f(x_{2i-1}) + f(x_{2i})\right]$$

In order to minimize function evaluations and floating point operations, the following form is desirable

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\left[f(x_0) + 2\sum_{i=2}^{N/2} f(x_{2i-2}) + 4\sum_{i=1}^{N/2} f(x_{2i-1}) + f(x_N)\right] \tag{6.20}$$

and the composite error is given by

$$E_{composite} = -\frac{1}{180}(b - a)\,h^4 f^{(4)}(\eta) \quad \text{for } \eta \in (a, b) \tag{6.21}$$

• Weighted Mean Value Theorem for Integrals

If $f \in C[a, b]$, $g$ is integrable on $[a, b]$, and $g(x)$ does not change sign on $[a, b]$, then there exists a number $c \in (a, b)$ such that

$$\int_a^b f(x)\,g(x)\,dx = f(c)\int_a^b g(x)\,dx. \tag{6.22}$$

Chapter 7

Differential Equations

Differential equations come in many forms and orders. We will stick with ordinary differential equations here(no partial differential equations).

• Initial Value Problems

– First Order: $y' = f(x, y)$, $y(x_0) = y_0$

– Second Order: $y'' = f(x, y, y')$, $y(x_0) = y_0$ and $y'(x_0) = y'_0$

– Second Order Linear: $ay'' + by' + cy = f(x)$, $y(x_0) = y_0$ and $y'(x_0) = y'_0$

• Boundary Value Problems

$$ay'' + by' + cy = f(x), \quad y(a) = y_a \text{ and } y(b) = y_b$$

• Specifically we will deal with ordinary, first-order, initial value problems like:

$$y' = f(y) \quad \text{such as} \quad y' = 3y, \quad y(0) = 10$$

$$y' = f(x) \quad \text{such as} \quad y' = e^{x^2}, \quad y(0) = 3$$

$$y' = f(x, y) \quad \text{such as} \quad y' = -2x - y, \quad y(0) = -1$$

• Example: $y' = -2x - y$, $y(0) = -1$

Analytic Solution: $y(x) = -3e^{-x} - 2x + 2$. Verify this is indeed a solution to the differential equation.

• Euler’s Method to solve $y' = f(x, y)$, $y(x_0) = y_0$

Let $x_k = x_0 + kh$. Euler’s method constructs a sequence $y_k$ where $y_k \approx y(x_k)$ by

$$y_{k+1} = y_k + hy'_k \quad \text{where } y'_k = f(x_k, y_k) \text{ and } y_0 = y(x_0) \tag{7.1}$$

• Derivation of Euler’s Method

1. Taylor Series Perspective

2. Graphical Perspective

3. Numerical Integration Perspective (left-hand approximation)


7.1 Euler’s Method - Example

We start with an example of Euler’s method used to approximate the solution to a differential equation. Forthe sake of comparison, we start by giving the analytic (exact) solution and its graph.

Example Differential Equation:

y′ = −2x− y, y(0) = −1

Analytic Solution:

y(x) = −3e−x − 2x+ 2

Euler’s Approximation: Here we will approximate the analytic solution with Euler’s method using a stepsize h = 0.5. This will create a sequence of x values x0 = 0, xk = x0 + hk until xn = 2.

The approximation and the analytic solution will be denoted by

• Exact Solution: y(xk)

• Approximation: yk

The sequence of approximations is given by

• y0 = y(x0) given in the differential equation

• yk+1 = yk + hy′k where y′k = f(xk, yk) = −2xk − yk

This generates the following values and graph →xk yk (process) yk y(xk) (Exact)x0 = 0 y0 -1 -1x1 = 0.5 y0 + hf(x0, y0) -0.5000 -0.8196x2 = 1.0 y1 + hf(x1, y1) -0.7500 -1.1036x3 = 1.5 y2 + hf(x2, y2) -1.3750 -1.6694x4 = 2.0 y3 + hf(x3, y3) -2.1875 -2.4060

The error is clearly large with this step size, but it can be reduced by using a smaller step size.
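The computation for this example can be sketched in a few lines of Python. This is a minimal illustration, not code from these notes: the function names `f`, `exact`, and `euler` are chosen here for convenience.

```python
import math

def f(x, y):
    """Right-hand side of the example ODE y' = -2x - y."""
    return -2.0 * x - y

def exact(x):
    """Analytic solution y(x) = -3 e^{-x} - 2x + 2."""
    return -3.0 * math.exp(-x) - 2.0 * x + 2.0

def euler(f, x0, y0, h, n_steps):
    """Return the lists [x_0..x_n] and [y_0..y_n] produced by Euler's method."""
    xs, ys = [x0], [y0]
    for k in range(n_steps):
        # y_{k+1} = y_k + h f(x_k, y_k), using the most recent point
        ys.append(ys[-1] + h * f(xs[-1], ys[-1]))
        xs.append(x0 + (k + 1) * h)
    return xs, ys

xs, ys = euler(f, 0.0, -1.0, 0.5, 4)
for x, y in zip(xs, ys):
    print(f"x = {x:3.1f}   Euler = {y:8.4f}   exact = {exact(x):8.4f}")
```

Computing `x0 + (k + 1) * h` rather than repeatedly adding `h` avoids accumulating rounding error in the x values.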


• Homework Problem: Approximate solutions to the differential equation

y′ = −cos(x) y, y(0) = 2

using the step sizes (h) given below. Plot the graph of the approximation points and the exact solution given by $y = 2e^{-\sin(x)}$.

– Use a step size h = 0.20.

– Use a step size h = 0.10.

Your graphs should look like the ones below.


7.2 Euler’s Method

This is a much more elaborate description and derivation of Euler’s method.

• Theory on Initial Value Problems of the form: dy/dx = f(x, y), y(x0) = y0.

– Lipschitz Condition

A function f(t, y) is said to satisfy a Lipschitz condition in the variable y on a set D ⊂ R² if a constant L > 0 exists with

$$|f(t, y_1) - f(t, y_2)| \le L\,|y_1 - y_2|$$

whenever (t, y1) and (t, y2) ∈ D. The constant L is called the Lipschitz constant for f.

A Lipschitz condition is stronger than continuity in y but weaker than differentiability in y.

– If |fy(t, y)| ≤ L for all (t, y) ∈ D, then f satisfies the Lipschitz condition with respect to y on D (by the Mean Value Theorem).

– If f is continuously differentiable with respect to y on a closed, bounded interval of y-values, then f is Lipschitz in y (by the Mean Value Theorem).

– However, f(x, y) = |y| is Lipschitz but not differentiable at y = 0.

– Theorem (Existence and Uniqueness)

Assume f(t, y) is continuous on D = {(t, y) | t0 ≤ t ≤ b, −∞ < y < ∞}. If f satisfies a Lipschitz condition on D in y, then the initial value problem

y′(t) = f(t, y), t0 ≤ t ≤ b, y(t0) = y0

has a unique solution y(t) for t0 ≤ t ≤ b.

Counter-example: $y' = y^{1/3}$, $y(0) = 0$ has more than one solution, for example $y = \pm\left(\tfrac{2}{3}x\right)^{3/2}$ for $x \ge 0$ (and also $y \equiv 0$). Here f is not Lipschitz in y near y = 0.

• Euler’s Method for an IVP of the form dy/dx = f(x, y), y(x0) = y0, with conditions ensuring a unique solution.

– Let xk = x0 + kh

– Euler’s method constructs a sequence yk where yk ≈ y(xk) by

yk+1 = yk + hy′k where y′k = f(xk, yk) and y0 = y(x0)

– Local Discretization Error (Local Truncation Error)

Assume y ∈ C²[a, b]. If yk = y(xk), then the local error εk in going from yk to yk+1 by yk+1 = yk + hy′k is

$$\varepsilon_k = y(x_{k+1}) - y_{k+1} = O(h^2)$$

or

$$y(x_{k+1}) = y_k + h y'_k + O(h^2) = y_{k+1} + O(h^2),$$

proven by expanding y in a Taylor series about xk:

$$y(x_{k+1}) = y_k + h y'_k + \frac{y''(\xi_k)}{2}h^2 \quad \text{where } \xi_k \in (x_k, x_{k+1}).$$

We say Euler’s method has a local discretization error of order h².


– Global Discretization Error: The global discretization error is the total accumulated error at step k, defined for all k:

Ek = y(xk)− yk for all k

– Theorem Euler’s method has a global discretization error of order h.

– Intuitive perspective:

If you are going from x0 to xN, then h = (xN − x0)/N and you take N steps, each with a local error of order h² = h(xN − x0)/N. Adding this N times yields a global discretization error of order h. This is not a formal proof, because the local error at each step assumes that you are exactly correct at the current x position, which is not true. Errors from earlier steps can accumulate as well.
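The order-h behavior can also be observed numerically. The sketch below is an illustration using the example problem y′ = −2x − y, y(0) = −1 from Section 7.1, with hypothetical helper names. It halves h repeatedly and prints the ratio of successive errors at x = 2; for a first-order method the ratios should approach 2.

```python
import math

def f(x, y):
    """Right-hand side of the example ODE y' = -2x - y."""
    return -2.0 * x - y

def exact(x):
    """Analytic solution y(x) = -3 e^{-x} - 2x + 2."""
    return -3.0 * math.exp(-x) - 2.0 * x + 2.0

def euler_final(f, x0, y0, x_end, n_steps):
    """Euler approximation of y(x_end) using n_steps equal steps."""
    h = (x_end - x0) / n_steps
    x, y = x0, y0
    for k in range(n_steps):
        y = y + h * f(x, y)
        x = x0 + (k + 1) * h
    return y

step_counts = [4, 8, 16, 32, 64]
errors = [abs(exact(2.0) - euler_final(f, 0.0, -1.0, 2.0, n)) for n in step_counts]
# Ratio of the error with step h to the error with step h/2
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]

for n, err in zip(step_counts, errors):
    print(f"N = {n:3d}   h = {2.0 / n:.5f}   |E_N| = {err:.6f}")
print("error ratios when h is halved:", [round(r, 3) for r in ratios])
```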

• Formal Proof: Consider dy/dt = f(t, y), y(t0) = y0 with a unique solution, Lipschitz constant L.

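– Lemma 1 For all x ≥ 0 and any positive m, $0 \le (1 + x)^m \le e^{mx}$.

Proof: By Taylor’s theorem applied to $e^x$,

$$e^x = 1 + x + \tfrac{1}{2}x^2 e^{\xi} \quad \text{for some } \xi \text{ between } 0 \text{ and } x,$$

and clearly

$$0 \le 1 + x \le 1 + x + \tfrac{1}{2}x^2 e^{\xi} = e^x,$$

so

$$0 \le (1 + x)^m \le (e^x)^m = e^{mx}.$$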

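– Lemma 2 If s and t are positive real numbers and $\{a_i\}_{i=0}^{k}$ is a sequence satisfying $a_0 = 0$ and

$$a_{i+1} \le (1 + s)a_i + t, \quad \text{for each } i = 0, 1, 2, \ldots, k,$$

then

$$a_{i+1} \le \frac{t}{s}\left(e^{(i+1)s} - 1\right).$$

Proof:

$$\begin{aligned}
a_{i+1} &\le (1 + s)a_i + t \\
&\le (1 + s)\left[(1 + s)a_{i-1} + t\right] + t \\
&\le (1 + s)\left\{(1 + s)\left[(1 + s)a_{i-2} + t\right] + t\right\} + t \\
&\;\;\vdots \\
&\le (1 + s)^{i+1}a_0 + \left[1 + (1 + s) + (1 + s)^2 + \cdots + (1 + s)^i\right]t \\
&= \left[1 + (1 + s) + (1 + s)^2 + \cdots + (1 + s)^i\right]t \quad (\text{since } a_0 = 0),
\end{aligned}$$

but the term in the brackets is a geometric series summing to

$$\sum_{j=0}^{i}(1 + s)^j = \frac{1 - (1 + s)^{i+1}}{1 - (1 + s)} = \frac{1}{s}\left[(1 + s)^{i+1} - 1\right].$$

Thus

$$a_{i+1} \le \frac{1}{s}\left[(1 + s)^{i+1} - 1\right]t \le \frac{t}{s}\left(e^{(i+1)s} - 1\right) \quad \text{by Lemma 1.}$$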


– Theorem Suppose the IVP satisfies the conditions ensuring a unique solution and |y′′(t)| ≤ M for all t ∈ [a, b]. Then the sequence generated by Euler’s method has a global discretization error satisfying

$$|E_k| = |y(t_k) - y_k| \le \frac{hM}{2L}\left[e^{L(t_k - t_0)} - 1\right] \quad \text{for all } k.$$

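∗ When k = 0 the result is clearly true, since E0 = 0.

∗ Take the Taylor expansion of y(tk+1) and subtract yk+1 from Euler’s method:

$$E_{k+1} = y(t_{k+1}) - y_{k+1} = y(t_k) - y_k + h\left[f(t_k, y(t_k)) - f(t_k, y_k)\right] + \frac{h^2}{2}y''(\xi_k) \tag{7.2}$$

$$|E_{k+1}| = |y(t_{k+1}) - y_{k+1}| \le |y(t_k) - y_k| + h\,|f(t_k, y(t_k)) - f(t_k, y_k)| + \frac{h^2}{2}|y''(\xi_k)| \tag{7.3}$$

$$|E_{k+1}| = |y(t_{k+1}) - y_{k+1}| \le (1 + hL)\,|y(t_k) - y_k| + \frac{h^2}{2}M \tag{7.4}$$

$$|E_{k+1}| = |y(t_{k+1}) - y_{k+1}| \le (1 + hL)\,|E_k| + \frac{h^2}{2}M \tag{7.5}$$

Letting s = hL, t = h²M/2, and ak = |Ek| (with E0 = 0) in Lemma 2 gives

$$|E_{k+1}| \le \frac{hM}{2L}\left(e^{(k+1)hL} - 1\right) \tag{7.6}$$

Now (k + 1)h = tk+1 − t0, so

$$|E_{k+1}| \le \frac{hM}{2L}\left[e^{L(t_{k+1} - t_0)} - 1\right].$$

In other words:

$$|E_k| = |y(t_k) - y_k| = O(h) \quad \text{for all } k.$$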
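As a sanity check on the bound hM/(2L)·[e^{L(t_k − t_0)} − 1], the sketch below compares it with the actual Euler error for the example y′ = −2x − y, y(0) = −1 on [0, 2]. The constants are assumptions worked out for this particular example, not values given in the notes: L = 1 because |∂f/∂y| = 1, and M = 3 because y′′(x) = −3e^{−x} so |y′′| ≤ 3 for x ≥ 0.

```python
import math

def f(x, y):
    """Right-hand side of the example ODE y' = -2x - y."""
    return -2.0 * x - y

def exact(x):
    """Analytic solution y(x) = -3 e^{-x} - 2x + 2."""
    return -3.0 * math.exp(-x) - 2.0 * x + 2.0

L = 1.0  # Lipschitz constant: |df/dy| = 1 for this f (assumed from the example)
M = 3.0  # bound on |y''(x)| = 3 e^{-x} for x >= 0 (assumed from the example)

def euler_errors_and_bounds(h, n_steps, x0=0.0, y0=-1.0):
    """Return (x_k, |E_k|, theoretical bound) for k = 0..n_steps."""
    rows = []
    x, y = x0, y0
    for k in range(n_steps + 1):
        error = abs(exact(x) - y)
        bound = h * M / (2.0 * L) * (math.exp(L * (x - x0)) - 1.0)
        rows.append((x, error, bound))
        y = y + h * f(x, y)       # advance one Euler step
        x = x0 + (k + 1) * h
    return rows

rows = euler_errors_and_bounds(h=0.5, n_steps=4)
for x, error, bound in rows:
    print(f"x = {x:3.1f}   |E_k| = {error:.4f}   bound = {bound:.4f}")
```

The bound is valid but pessimistic for this problem, since it allows for exponential error growth while the true solution here is decaying.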


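• Stability of Euler’s Method: Consider the IVP

y′ = λy, y(x0) = y0 ≠ 0

$$\begin{align}
y_0 &= \text{given} \tag{7.7} \\
y_1 &= y_0 + \lambda y_0 h \tag{7.8} \\
&= y_0(1 + \lambda h) \tag{7.9} \\
y_2 &= y_1 + \lambda y_1 h \tag{7.10} \\
&= y_1(1 + \lambda h) \tag{7.11} \\
&= y_0(1 + \lambda h)^2 \tag{7.12} \\
&\;\;\vdots \tag{7.13} \\
y_n &= (1 + \lambda h)^n y_0 \tag{7.14}
\end{align}$$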

• Notice that if λ < 0, the true solution $y = y_0 e^{\lambda(x - x_0)}$ → 0 as x → ∞.

• However, yn → 0 ⇐⇒ |1 + λh| < 1 ⇐⇒ −2 < λh < 0.

• If λh < −2, Euler’s method oscillates with greater and greater amplitude.

• Example

dy/dx = −20y, y(0) = 1

Here λ = −20, so stability requires −2 < −20h < 0, that is, 0 < h < 0.1. If h > 0.1 you get instability.
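The threshold can be seen directly by running Euler's method on y′ = −20y, y(0) = 1 with one step size on each side of h = 0.1. This is a small illustrative sketch: the function name `euler_linear` and the particular step sizes 0.09 and 0.11 are choices made here, not part of the notes.

```python
def euler_linear(lam, y0, h, n_steps):
    """Euler iterates y_0..y_n for the test equation y' = lam * y."""
    ys = [y0]
    for _ in range(n_steps):
        ys.append(ys[-1] + h * lam * ys[-1])  # y_{k+1} = (1 + lam*h) y_k
    return ys

lam = -20.0
stable = euler_linear(lam, 1.0, 0.09, 50)     # lam*h = -1.8, |1 + lam*h| = 0.8
unstable = euler_linear(lam, 1.0, 0.11, 50)   # lam*h = -2.2, |1 + lam*h| = 1.2

print("h = 0.09:", [round(y, 4) for y in stable[:5]], "... |y_50| =", abs(stable[-1]))
print("h = 0.11:", [round(y, 4) for y in unstable[:5]], "... |y_50| =", abs(unstable[-1]))
```

In both runs the iterates alternate in sign because 1 + λh is negative. With h = 0.09 the oscillation decays, while with h = 0.11 it grows without bound even though the true solution e^{−20x} decays.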

• Example

dy/dx = −2xy

requires

−2 < −2xh < 0, that is, 0 < xh < 1.

This inequality determines h if you are given a domain of x-values, and determines the domain of validity if you are given h.


7.3 Runge Kutta Methods

• Consider Initial Value Problems of the form: dy/dx = f(x, y), y(x0) = y0.

• Modified Euler Method (Euler Predictor-Corrector Method, Trapezoid Method)

– Local Discretization error = O(h³)

– Global Discretization error = O(h²)

– Average of Slopes

– FTC, Trapezoid Rule

– Compare with Euler’s Method on

dy/dx = −2x − y, y(0) = −1

– Predicting and Correcting
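The comparison with Euler's method can be sketched as follows. This is an illustrative implementation under the usual reading of the method: predict with an Euler step, then correct using the average of the slopes at both ends. The helper names are chosen here and do not come from the notes.

```python
import math

def f(x, y):
    """Right-hand side of the example ODE y' = -2x - y."""
    return -2.0 * x - y

def exact(x):
    """Analytic solution y(x) = -3 e^{-x} - 2x + 2."""
    return -3.0 * math.exp(-x) - 2.0 * x + 2.0

def euler_step(f, x, y, h):
    return y + h * f(x, y)

def modified_euler_step(f, x, y, h):
    slope_start = f(x, y)
    y_pred = y + h * slope_start          # predictor: plain Euler step
    slope_end = f(x + h, y_pred)          # slope at the predicted point
    return y + 0.5 * h * (slope_start + slope_end)  # corrector: average of slopes

def integrate(step, f, x0, y0, h, n_steps):
    """Apply a one-step method n_steps times and return the final y."""
    x, y = x0, y0
    for k in range(n_steps):
        y = step(f, x, y, h)
        x = x0 + (k + 1) * h
    return y

y_euler = integrate(euler_step, f, 0.0, -1.0, 0.5, 4)
y_modified = integrate(modified_euler_step, f, 0.0, -1.0, 0.5, 4)
print(f"exact y(2)     = {exact(2.0):.6f}")
print(f"Euler          = {y_euler:.6f}   error = {abs(exact(2.0) - y_euler):.6f}")
print(f"modified Euler = {y_modified:.6f}   error = {abs(exact(2.0) - y_modified):.6f}")
```

With the same step size h = 0.5, the modified method costs two function evaluations per step instead of one but gives a noticeably smaller error at x = 2.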

• Runge-Kutta Methods

Runge and Kutta were two German mathematicians who developed algorithms for numerically solving a differential equation with approximations based on matching the first n terms of the Taylor series expansion of the solution.

• RK2: Second Order Runge Kutta Method

– The increment to y is a weighted average of two estimates of the increment called k1 and k2.

yn+1 = yn + ak1 + bk2 (7.15)

k1 = hf(xn, yn)

k2 = hf(xn + αh, yn + βk1)

NOTES

∗ k1 and k2 are estimates of the change in y when x changes by h.

∗ The first estimate is always the Euler estimate

∗ The second estimate is taken with x stepped up by α of h, and y stepped up by β of k1.

∗ Four Parameters: a, b, α, β.

∗ Chosen so that the approximation matches the Taylor expansion of y(x) best.

∗ The derivatives of y in the Taylor expansion are given by one lower TOTAL derivative of f.

Note: df/dx = fx + fy (dy/dx) = fx + fy f.

– Derivation:

∗ Expand (7.15) with all terms included.

∗ Expand bk2 in a two-variable Taylor series about (xn, yn).

∗ Expand y(xn + h) in a Taylor series about xn, where y′ = f and y′′ = f′ = df/dx.

∗ Solve for a, b, α, β by matching terms through h²:

a + b = 1, αb = 1/2, βb = 1/2

∗ Letting a = 1/2 gives b = 1/2 and α = β = 1, which is the modified Euler method.
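The general two-stage step (7.15) can be written once and specialized by the choice of parameters. The sketch below is illustrative only: it uses a = 1/2 (modified Euler) and, as a second valid choice assumed here, a = 0, b = 1, α = β = 1/2 (often called the midpoint method). The function and dictionary names are hypothetical.

```python
def f(x, y):
    """Right-hand side of the example ODE y' = -2x - y."""
    return -2.0 * x - y

def rk2_step(f, x, y, h, a, b, alpha, beta):
    """One step of y_{n+1} = y_n + a*k1 + b*k2 as in (7.15)."""
    k1 = h * f(x, y)
    k2 = h * f(x + alpha * h, y + beta * k1)
    return y + a * k1 + b * k2

def satisfies_order_conditions(a, b, alpha, beta, tol=1e-12):
    """Check a + b = 1, alpha*b = 1/2, beta*b = 1/2."""
    return (abs(a + b - 1.0) < tol
            and abs(alpha * b - 0.5) < tol
            and abs(beta * b - 0.5) < tol)

modified_euler = dict(a=0.5, b=0.5, alpha=1.0, beta=1.0)
midpoint = dict(a=0.0, b=1.0, alpha=0.5, beta=0.5)

for name, params in (("modified Euler", modified_euler), ("midpoint", midpoint)):
    y1 = rk2_step(f, 0.0, -1.0, 0.5, **params)
    print(f"{name:15s} conditions ok: {satisfies_order_conditions(**params)}   y1 = {y1}")
```

For this particular f, which is linear in both x and y, every parameter choice that satisfies the three conditions produces the same step, because the two-variable Taylor expansion of k2 terminates after the first-order terms. For a general nonlinear f the variants differ in their higher-order error terms.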


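• RK2 (Modified Euler, Trapezoid) Stability

dy/dx = λy = f(x, y)

$$\begin{align}
y_{n+1} &= y_n + \frac{h}{2}\left(f(x_n, y_n) + f(x_{n+1}, y_{n+1})\right) \tag{7.16} \\
y_{n+1} &= y_n + \frac{h}{2}\left(f(x_n, y_n) + f(x_{n+1}, y_n + h f(x_n, y_n))\right) \tag{7.17} \\
y_{n+1} &= y_n + \frac{h}{2}\left(\lambda y_n + \lambda(y_n + h f(x_n, y_n))\right) \tag{7.18} \\
y_{n+1} &= y_n + \frac{h}{2}\left(\lambda y_n + \lambda(y_n + h\lambda y_n)\right) \tag{7.19} \\
y_{n+1} &= y_n\left(1 + \lambda h + \frac{(\lambda h)^2}{2}\right) \tag{7.20}
\end{align}$$

yields

$$y_n = y_0\left(1 + \lambda h + \frac{(\lambda h)^2}{2}\right)^n,$$

therefore stability requires

$$-1 < 1 + \lambda h + \frac{(\lambda h)^2}{2} < 1$$

and −2 < λh < 0, the same as with Euler’s method.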

• RK4

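– Same technique, only you compare terms through h⁴.

– LDE = O(h⁵)

– GDE = O(h⁴)

– Very Popular

– Algorithm

$$\begin{align}
y_{n+1} &= y_n + \frac{1}{6}\left(k_1 + 2k_2 + 2k_3 + k_4\right) \tag{7.21} \\
k_1 &= h f(x_n, y_n) \tag{7.22} \\
k_2 &= h f\left(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}k_1\right) \tag{7.23} \\
k_3 &= h f\left(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}k_2\right) \tag{7.24} \\
k_4 &= h f(x_n + h,\; y_n + k_3) \tag{7.25}
\end{align}$$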

– Show graphical interpretation of 4 slopes.

– 4 function evaluations (as compared to 2 for RK2)

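If h for RK2 < 1/4, then RK4 with twice the step size is better for the same number of function evaluations. Heuristically, the RK2 error behaves like h² while the RK4 error with step 2h behaves like (2h)⁴ = 16h⁴, which is smaller when h < 1/4.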
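The RK4 step (7.21)–(7.25) and the equal-cost comparison with RK2 can be sketched as below, using the example y′ = −2x − y, y(0) = −1 on [0, 2]. This is an illustration with helper names chosen here. RK2 (modified Euler) takes 8 steps of h = 0.25 and RK4 takes 4 steps of h = 0.5, so both use 16 function evaluations.

```python
import math

def f(x, y):
    """Right-hand side of the example ODE y' = -2x - y."""
    return -2.0 * x - y

def exact(x):
    """Analytic solution y(x) = -3 e^{-x} - 2x + 2."""
    return -3.0 * math.exp(-x) - 2.0 * x + 2.0

def rk2_step(f, x, y, h):
    """Modified Euler step: 2 function evaluations."""
    k1 = h * f(x, y)
    k2 = h * f(x + h, y + k1)
    return y + 0.5 * (k1 + k2)

def rk4_step(f, x, y, h):
    """Classical RK4 step (7.21)-(7.25): 4 function evaluations."""
    k1 = h * f(x, y)
    k2 = h * f(x + 0.5 * h, y + 0.5 * k1)
    k3 = h * f(x + 0.5 * h, y + 0.5 * k2)
    k4 = h * f(x + h, y + k3)
    return y + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def integrate(step, f, x0, y0, h, n_steps):
    """Apply a one-step method n_steps times and return the final y."""
    x, y = x0, y0
    for k in range(n_steps):
        y = step(f, x, y, h)
        x = x0 + (k + 1) * h
    return y

rk2_error = abs(exact(2.0) - integrate(rk2_step, f, 0.0, -1.0, 0.25, 8))
rk4_error = abs(exact(2.0) - integrate(rk4_step, f, 0.0, -1.0, 0.5, 4))
print(f"RK2, h = 0.25 (16 evaluations): error = {rk2_error:.2e}")
print(f"RK4, h = 0.50 (16 evaluations): error = {rk4_error:.2e}")
```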


• Step Size

– Keep decreasing h by 1/2 until two consecutive approximations are about the same.

– RK45 uses two different methods: order 4 and order 5. Benefit: many of the same calculations areused. If the order 4 approximation is much different than the order 5 approximation, reduce h.

– Matlab RK45 (ode45)
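The step-halving strategy can be sketched as follows. This is a simple illustration, not how `ode45` works internally: it re-solves the whole problem with RK4 using twice as many steps each time and stops when two consecutive approximations of y(x_end) agree to within a tolerance. The names `solve_rk4` and `halve_until_converged`, the tolerance, and the iteration cap are all assumptions made here.

```python
import math

def f(x, y):
    """Right-hand side of the example ODE y' = -2x - y."""
    return -2.0 * x - y

def exact(x):
    """Analytic solution y(x) = -3 e^{-x} - 2x + 2."""
    return -3.0 * math.exp(-x) - 2.0 * x + 2.0

def solve_rk4(f, x0, y0, x_end, n_steps):
    """RK4 approximation of y(x_end) using n_steps equal steps."""
    h = (x_end - x0) / n_steps
    x, y = x0, y0
    for k in range(n_steps):
        k1 = h * f(x, y)
        k2 = h * f(x + 0.5 * h, y + 0.5 * k1)
        k3 = h * f(x + 0.5 * h, y + 0.5 * k2)
        k4 = h * f(x + h, y + k3)
        y = y + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        x = x0 + (k + 1) * h
    return y

def halve_until_converged(f, x0, y0, x_end, tol=1e-8, max_halvings=20):
    """Halve h until two consecutive approximations differ by less than tol."""
    n_steps = 1
    previous = solve_rk4(f, x0, y0, x_end, n_steps)
    for _ in range(max_halvings):
        n_steps *= 2                      # halving h doubles the step count
        current = solve_rk4(f, x0, y0, x_end, n_steps)
        if abs(current - previous) < tol:
            return current, n_steps
        previous = current
    raise RuntimeError("no convergence within max_halvings")

y_end, n_used = halve_until_converged(f, 0.0, -1.0, 2.0)
print(f"accepted y(2) = {y_end:.10f} with {n_used} steps; exact = {exact(2.0):.10f}")
```

Adaptive pairs such as RK45 avoid re-solving from the start by estimating the error within each step, which is why they are usually preferred in practice.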


7.4 Multi-Step Methods

dy/dx = f(x, y), y(x0) = y0

Idea: Use past values of y and/or y′ to construct a polynomial that interpolates the derivative function f(x, y), and extrapolate this into the next interval.

• Adams Methods:

$$\int_{x_n}^{x_{n+1}} dy = y_{n+1} - y_n = \int_{x_n}^{x_{n+1}} f(x, y)\,dx$$

to get

$$y_{n+1} = y_n + \int_{x_n}^{x_{n+1}} f(x, y)\,dx.$$

Then approximate the definite integral by interpolating $\{(x_k, f_k)\}_{k=n-j}^{n}$ with a j’th degree polynomial and integrating this from xn to xn+1.

– Order h³

Approximate the definite integral by interpolating $\{(x_k, f_k)\}_{k=n-2}^{n}$ with a quadratic and integrating this from xn to xn+1. Yields:

$$y_{n+1} = y_n + \frac{h}{12}\left[23f_n - 16f_{n-1} + 5f_{n-2}\right] + O(h^4)$$

– Order h⁴

Approximate the definite integral by interpolating $\{(x_k, f_k)\}_{k=n-3}^{n}$ with a cubic and integrating this from xn to xn+1. Yields:

$$y_{n+1} = y_n + \frac{h}{24}\left[55f_n - 59f_{n-1} + 37f_{n-2} - 9f_{n-3}\right] + O(h^5)$$

– Notes

∗ h must be constant: xi’s are equally spaced.

∗ You must get started with a technique of the same order.
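A minimal sketch of the order-h⁴ Adams formula, started with three RK4 steps so that the starting values have matching order. The function names and the choice h = 0.1 are assumptions for illustration, and the test problem is the example y′ = −2x − y, y(0) = −1.

```python
import math

def f(x, y):
    """Right-hand side of the example ODE y' = -2x - y."""
    return -2.0 * x - y

def exact(x):
    """Analytic solution y(x) = -3 e^{-x} - 2x + 2."""
    return -3.0 * math.exp(-x) - 2.0 * x + 2.0

def rk4_step(f, x, y, h):
    """One classical RK4 step, used only to generate starting values."""
    k1 = h * f(x, y)
    k2 = h * f(x + 0.5 * h, y + 0.5 * k1)
    k3 = h * f(x + 0.5 * h, y + 0.5 * k2)
    k4 = h * f(x + h, y + k3)
    return y + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def adams4(f, x0, y0, h, n_steps):
    """Four-step Adams method with constant h; requires n_steps >= 3."""
    xs = [x0 + k * h for k in range(n_steps + 1)]
    ys = [y0]
    for k in range(3):                        # starting values y_1, y_2, y_3
        ys.append(rk4_step(f, xs[k], ys[k], h))
    fs = [f(x, y) for x, y in zip(xs, ys)]    # f_0..f_3 (zip stops at len(ys))
    for n in range(3, n_steps):
        y_next = ys[n] + h / 24.0 * (55.0 * fs[n] - 59.0 * fs[n - 1]
                                     + 37.0 * fs[n - 2] - 9.0 * fs[n - 3])
        ys.append(y_next)
        fs.append(f(xs[n + 1], y_next))       # one new evaluation per step
    return xs, ys

xs, ys = adams4(f, 0.0, -1.0, 0.1, 20)
error = abs(exact(xs[-1]) - ys[-1])
print(f"Adams order h^4 at x = {xs[-1]:.1f}: y = {ys[-1]:.8f}, error = {error:.2e}")
```

After the start-up phase the method needs only one new function evaluation per step, compared with four for RK4, because the earlier values of f are reused.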

• Milne’s Method (you need 4 starting values for xn and fn). Order h⁴.

$$\int_{x_{n-3}}^{x_{n+1}} dy = y_{n+1} - y_{n-3} = \int_{x_{n-3}}^{x_{n+1}} f(x, y)\,dx \approx \int_{x_{n-3}}^{x_{n+1}} P_2(x)\,dx$$

where P2(x) interpolates $\{(x_k, f_k)\}_{k=n-2}^{n}$. This yields a predicted value for yn+1, called yn+1,p, by

$$y_{n+1,p} = y_{n-3} + \frac{4h}{3}\left[2f_n - f_{n-1} + 2f_{n-2}\right] + O(h^5)$$

Now interpolate (xn−1, fn−1), (xn, fn), and (xn+1, f(xn+1, yn+1,p)) with a new P2 and integrate from xn−1 to xn+1 by Simpson’s 1/3 rule to get a corrected value for yn+1, called yn+1,c. This yields

$$y_{n+1,c} = y_{n-1} + \frac{h}{3}\left[f_{n+1,p} + 4f_n + f_{n-1}\right] + O(h^5)$$

where fn+1,p = f(xn+1, yn+1,p).

Note: Milne’s method has instability problems for some differential equations that cannot be overcome by changing the step size.

• Adams-Moulton Method (need initial four values). Order h⁴.

Predictor from Adams order h⁴:

$$y_{n+1,p} = y_n + \frac{h}{24}\left[55f_n - 59f_{n-1} + 37f_{n-2} - 9f_{n-3}\right] + O(h^5)$$

With this we can get an approximation for fn+1. Now interpolate a cubic over (xn−2, fn−2), (xn−1, fn−1), (xn, fn), and (xn+1, f(xn+1, yn+1,p)) and get the corrected value:

$$y_{n+1,c} = y_n + \frac{h}{24}\left[9f_{n+1} + 19f_n - 5f_{n-1} + f_{n-2}\right] + O(h^5)$$

Note: Does not suffer from instabilities of Milne’s method.
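The Adams-Moulton predictor-corrector pair can be sketched as below. This is an illustration under a few assumptions that the notes do not specify: RK4 supplies the three extra starting values, the corrector is applied once per step, and h = 0.1 on the example y′ = −2x − y, y(0) = −1. All function names are hypothetical.

```python
import math

def f(x, y):
    """Right-hand side of the example ODE y' = -2x - y."""
    return -2.0 * x - y

def exact(x):
    """Analytic solution y(x) = -3 e^{-x} - 2x + 2."""
    return -3.0 * math.exp(-x) - 2.0 * x + 2.0

def rk4_step(f, x, y, h):
    """One classical RK4 step, used only to generate starting values."""
    k1 = h * f(x, y)
    k2 = h * f(x + 0.5 * h, y + 0.5 * k1)
    k3 = h * f(x + 0.5 * h, y + 0.5 * k2)
    k4 = h * f(x + h, y + k3)
    return y + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def adams_moulton_pc(f, x0, y0, h, n_steps):
    """Adams predictor with one Adams-Moulton correction; requires n_steps >= 3."""
    xs = [x0 + k * h for k in range(n_steps + 1)]
    ys = [y0]
    for k in range(3):                        # starting values y_1, y_2, y_3
        ys.append(rk4_step(f, xs[k], ys[k], h))
    fs = [f(x, y) for x, y in zip(xs, ys)]    # f_0..f_3
    for n in range(3, n_steps):
        # Predictor: extrapolate the cubic through f_{n-3}..f_n
        y_pred = ys[n] + h / 24.0 * (55.0 * fs[n] - 59.0 * fs[n - 1]
                                     + 37.0 * fs[n - 2] - 9.0 * fs[n - 3])
        f_pred = f(xs[n + 1], y_pred)
        # Corrector: interpolate through f_{n-2}..f_n and the predicted f_{n+1}
        y_corr = ys[n] + h / 24.0 * (9.0 * f_pred + 19.0 * fs[n]
                                     - 5.0 * fs[n - 1] + fs[n - 2])
        ys.append(y_corr)
        fs.append(f(xs[n + 1], y_corr))
    return xs, ys

xs, ys = adams_moulton_pc(f, 0.0, -1.0, 0.1, 20)
error = abs(exact(xs[-1]) - ys[-1])
print(f"predictor-corrector at x = {xs[-1]:.1f}: y = {ys[-1]:.8f}, error = {error:.2e}")
```

The difference between the predicted and corrected values at each step is also available as a cheap indicator of whether h is small enough, although this sketch does not use it.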
