
Page 1: Gantep - EP 208 Books

E E E 4 8 4

COMPUTATIONAL METHODS

Course Notebook
February 3, 2009

................................................................

.........aaaaaaaaaaaaaaaa..............AAAAAAAAAAAAAAAA.........

.......aaaaaaaaaaaaaaaaaaaa..........AAAAAAAAAAAAAAAAAAAA.......

.....aaaaaaaaaaaaaaaaaaaaaaaa......AAAAAAAAAAAAAAAAAAAAAAAA.....

...aaaaaaaaaabbbbbbbbaaaaaaaa......AAAAAAAABBBBBBBBAAAAAAAAAA...

...aaaaaabbbbbbbbbbbbbbbbaaaa......AAAABBBBBBBBBBBBBBBBAAAAAA...

...aaaabbbbbbbbbbccbbbbbbbbaaaa..AAAABBBBBBBBCCBBBBBBBBBBAAAA...

...aaaabbbbccccccccccccbbbbaaaa..AAAABBBBCCCCCCCCCCCCBBBBAAAA...

.aaaabbbbccccddddddddddccbbbbaa..AABBBBCCDDDDDDDDDDCCCCBBBBAAAA.

.aaaabbccccddddeeeeeeddddccbbaa..AABBCCDDDDEEEEEEDDDDCCCCBBAAAA.

.aaaabbccddddeeffffffffeeddccaa..AACCDDEEFFFFFFFFEEDDDDCCBBAAAA.

.aabbbbccddeeffgghhhhggffeeccbb..BBCCEEFFGGHHHHGGFFEEDDCCBBBBAA.

.aabbccddeeffggiijjkkiiggeeddbb..BBDDEEGGIIKKJJIIGGFFEEDDCCBBAA.

.aabbccddeeffhhjjmmoolliiffddbb..BBDDFFIILLOOMMJJHHFFEEDDCCBBAA.

.aabbccddeeffhhkkpp--oojjggddbb..BBDDGGJJOO++PPKKHHFFEEDDCCBBAA.

.aabbccddeeffhhjjmmoolliiffddbb..BBDDFFIILLOOMMJJHHFFEEDDCCBBAA.

.aabbccddeeffggiijjkkiiggeeddbb..BBDDEEGGIIKKJJIIGGFFEEDDCCBBAA.

.aabbbbccddeeffgghhhhggffeeccbb..BBCCEEFFGGHHHHGGFFEEDDCCBBBBAA.

.aaaabbccddddeeffffffffeeddccaa..AACCDDEEFFFFFFFFEEDDDDCCBBAAAA.

.aaaabbccccddddeeeeeeddddccbbaa..AABBCCDDDDEEEEEEDDDDCCCCBBAAAA.

.aaaabbbbccccddddddddddccbbbbaa..AABBBBCCDDDDDDDDDDCCCCBBBBAAAA.

...aaaabbbbccccccccccccbbbbaaaa..AAAABBBBCCCCCCCCCCCCBBBBAAAA...

...aaaabbbbbbbbbbccbbbbbbbbaaaa..AAAABBBBBBBBCCBBBBBBBBBBAAAA...

...aaaaaabbbbbbbbbbbbbbbbaaaa......AAAABBBBBBBBBBBBBBBBAAAAAA...

...aaaaaaaaaabbbbbbbbaaaaaaaa......AAAAAAAABBBBBBBBAAAAAAAAAA...

.....aaaaaaaaaaaaaaaaaaaaaaaa......AAAAAAAAAAAAAAAAAAAAAAAA.....

.......aaaaaaaaaaaaaaaaaaaa..........AAAAAAAAAAAAAAAAAAAA.......

.........aaaaaaaaaaaaaaaa..............AAAAAAAAAAAAAAAA.........

................................................................

http://www1.gantep.edu.tr/~andrew/eee484/

Dr Andrew Beddall, [email protected]

Department of Electric and Electronic Engineering, University of Gaziantep, Turkey.


Preamble

This notebook presents notes, exercises and example exam questions for the course EEE484. Fortran and C++ solutions can be found in the downloads section of the course web-site. The content of this document is automatically built from the course web-site (this build is dated Tue Feb 3 11:45:18 EET 2009). You can download the latest version, in postscript or pdf format, from the course web-site. Only 8 topics are present in this version (more coming soon):

• Lecture 1 - Numerical Truncation, Precision and Overflow

• Lecture 2 - Numerical Differentiation

• Lecture 3 - Roots, Maxima, Minima (closed methods)

• Lecture 4 - Roots, Maxima, Minima (open methods)

• Lecture 5 - Numerical Integration: Trapezoidal and Simpson’s formulae

• Lecture 6 - Solution of D.E.s: Runge-Kutta, and Finite-Difference

• Lecture 7 - Random Variables and Frequency Experiments

• Lecture 8 - Monte-Carlo Methods

Title page figure: Numerical solution for the potential around a dipole.


Contents

1 Numerical Truncation, Precision and Overflow                       1
  1.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . .   1
  1.2 Lecture Notes  . . . . . . . . . . . . . . . . . . . . . . .   1
  1.3 Lab Exercises  . . . . . . . . . . . . . . . . . . . . . . .   6
  1.4 Lab Solutions  . . . . . . . . . . . . . . . . . . . . . . .   7
  1.5 Example exam questions  . . . . . . . . . . . . . . . . . .    9

2 Numerical Differentiation                                         10
  2.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . .  10
  2.2 Lecture Notes  . . . . . . . . . . . . . . . . . . . . . . .  10
  2.3 Lab Exercises  . . . . . . . . . . . . . . . . . . . . . . .  15
  2.4 Lab Solutions  . . . . . . . . . . . . . . . . . . . . . . .  16
  2.5 Example exam questions  . . . . . . . . . . . . . . . . . .   19

3 Roots, Maxima, Minima (closed methods)                            21
  3.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . .  21
  3.2 Lecture Notes  . . . . . . . . . . . . . . . . . . . . . . .  21
  3.3 Lab Exercises  . . . . . . . . . . . . . . . . . . . . . . .  27
  3.4 Lab Solutions  . . . . . . . . . . . . . . . . . . . . . . .  28
  3.5 Example exam questions  . . . . . . . . . . . . . . . . . .   30

4 Roots, Maxima, Minima (open methods)                              31
  4.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . .  31
  4.2 Lecture Notes  . . . . . . . . . . . . . . . . . . . . . . .  31
  4.3 Lab Exercises  . . . . . . . . . . . . . . . . . . . . . . .  37
  4.4 Lab Solutions  . . . . . . . . . . . . . . . . . . . . . . .  38
  4.5 Example exam questions  . . . . . . . . . . . . . . . . . .   41

5 Numerical Integration: Trapezoidal and Simpson's formulae         43
  5.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . .  43
  5.2 Lecture Notes  . . . . . . . . . . . . . . . . . . . . . . .  43
  5.3 Lab Exercises  . . . . . . . . . . . . . . . . . . . . . . .  50
  5.4 Lab Solutions  . . . . . . . . . . . . . . . . . . . . . . .  51
  5.5 Example exam questions  . . . . . . . . . . . . . . . . . .   54

6 Solution of D.E.s: Runge-Kutta, and Finite-Difference             55
  6.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . .  55
  6.2 Lecture Notes  . . . . . . . . . . . . . . . . . . . . . . .  55
  6.3 Lab Exercises  . . . . . . . . . . . . . . . . . . . . . . .  64
  6.4 Lab Solutions  . . . . . . . . . . . . . . . . . . . . . . .  66
  6.5 Example exam questions  . . . . . . . . . . . . . . . . . .   68

7 Random Variables and Frequency Experiments                        70
  7.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . .  70
  7.2 Lecture Notes  . . . . . . . . . . . . . . . . . . . . . . .  70
  7.3 Lab Exercises  . . . . . . . . . . . . . . . . . . . . . . .  82
  7.4 Lab Solutions  . . . . . . . . . . . . . . . . . . . . . . .  83
  7.5 Example exam questions  . . . . . . . . . . . . . . . . . .   85


8 Monte-Carlo Methods                                               86
  8.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . .  86
  8.2 Lecture Notes  . . . . . . . . . . . . . . . . . . . . . . .  86
  8.3 Lab Exercises  . . . . . . . . . . . . . . . . . . . . . . .  93
  8.4 Lab Solutions  . . . . . . . . . . . . . . . . . . . . . . .  94
  8.5 Example exam questions  . . . . . . . . . . . . . . . . . .   96

A Linux Tutorial 97


1 Numerical Truncation, Precision and Overflow

1.1 Topics Covered

o Introduction to numerical methods; Taylor's expansion and truncation errors; round-off errors and overflow; precision of data types in Fortran and C++.

1.2 Lecture Notes

Introduction

It is important to understand that, in general, numerical methods are not exact; neither are the machines (computers) that perform the numerical calculations for us. In this lecture, we will look at the nature of truncation errors and round-off errors. An understanding of these sources of error in numerical methods is as important as an understanding of the methods themselves.

Numerical Methods

We apply numerical techniques to solve numerical problems when analytical solutions are difficult or inconvenient. A simple example is the computation of the first derivative of a function f(x). Calculus gives us an analytical method for forming an expression for the derivative; however, such analysis for some functions may be difficult, impossible, or inconvenient. A simple numerical solution uses the Forward-Difference Approximation (FDA), which approximates the derivative by taking the gradient of the function f(x) in the region x to x+h:

FDA = ( f(x+h) - f(x) ) / h

where h is small but not zero.

For example, if f(x) = 2x^2 + 4x + 6 and we wish to determine the first derivative evaluated at x=3, the FDA (using h=0.01) gives:

( (2 x 3.01^2 + 4 x 3.01 + 6) - (2 x 3.00^2 + 4 x 3.00 + 6) ) / 0.01 = 16.02

Of course this is only an approximation (the true value, by calculus, is 16).

gnuplot> plot [0:4] 2*x**2+4*x+6

Truncation Errors

The error in the above approximation can be written as FDA - f'(x) = 16.02 - 16 = 0.02. This is called a truncation error as it is due to the truncation of higher orders in the exact expression for the first derivative. We can see the form of the truncation error in the FDA by considering Taylor's expansion:

f(x+h) = f(x) + h.f‘(x)/1! + h^2.f‘‘(x)/2! + h^3.f‘‘‘(x)/3! + ....

Rearrange for the FDA:

( f(x+h) - f(x) ) / h = f‘(x) + h.f‘‘(x)/2 + h^2.f‘‘‘(x)/6 + ....

=> FDA = f‘(x) + (h/2).f‘‘(x) + O(h^2)

----- ---------------------

/ \

the derivative the truncation error in the FDA


We see that the FDA gives the first derivative plus some extra terms in the series. The error in the approximation, FDA - f'(x), is therefore (h/2).f''(x) + O(h^2). This can be checked numerically with the above example: (h/2).f''(x) = (0.01/2) x (4) = 0.02 (as found above). The truncation error in the FDA is proportional to h; the FDA is therefore called a first-order approximation. Higher-order methods have truncation errors that are proportional to higher powers of h and therefore yield smaller truncation errors (when h is less than one). We will investigate the round-off error in the above calculation at the end of the next section.

Computer Precision (Round-off Errors)

Numerical methods are implemented in computer programs where the numerical calculations can be performed quickly and conveniently. However, numbers are stored in computer memory with a limited precision; the loss of precision of a value is called a round-off error. Round-off errors can occur when a value is initially assigned, and can be compounded when values are combined in arithmetic operations. Iteration is common in computational methods and so it is important to minimise compound round-off. As round-off errors can be a significant source of error in a numerical method (in addition to the truncation error), we will look more closely at the nature of the round-off error and how it can be reduced. A binary representation is used to store numbers in computer memory. For example the binary number 11.011 represents exactly the decimal number 3.375:

1 1 . 0 1 1

1x2 + 1x1 + 0x1/2 + 1x1/4 + 1x1/8 = 3.375

Similarly the decimal value 0.3125 can be expanded to 0.25 + 0.0625 = 1/4 + 1/16, which can be stored exactly in binary as 0.0101. However, given a limited number of binary digits, it is possible that even a rational decimal number might not be stored precisely in binary. For example there is no precise representation for 0.3; the nearest representation with 8 bits is 0.01001101, which gives 0.30078125. The precision increases as more binary digits are used, but there is always a round-off error. In general, the only real numbers that can be represented exactly in the computer's memory are those that can be written in the form m/2^k where m and k are integers; however, again there is a limit to the set of numbers included in this group due to the limited number of binary digits used to store the value.

Floating-Point Representation

Computers store REAL numbers (as opposed to INTEGER numbers) in the floating-point representation value = m x b^e, where m is the mantissa, b is the base (= 2 in computers) and e is the exponent. In Fortran, a type "real" number is stored in 32 binary bits (4 bytes) [this is equivalent to a "float" in C/C++]. To allow for a large exponent range the binary bits available for storage are shared between the mantissa and the exponent of the number. For example the number 413.26 is represented by a mantissa part and an exponent part as 0.41326 x 10^3. The division of the 32 binary bits is as follows: 8 bits are used to store the exponent, 1 bit for the sign, and 23 bits for the mantissa. The precision of the storage of real data is therefore limited by the 23 bits used to store the mantissa. In Fortran the number of binary bits used to store type real numbers can be increased from the default 32 to 64 or 128 by declaring the type "real" data with the kind specifier. The default single-precision data has kind=4, where each datum is stored in 4 bytes (32 binary bits) of memory. Double-precision data (kind=8) is allocated 8 bytes (64 binary bits) [this is equivalent to a "double" in C/C++] and quad-precision (kind=16) 16 bytes (128 binary bits). Double precision has about twice the precision of single precision and a much larger range in the exponent; quad precision has more than four times the precision and a very large range in the exponent. The three real kinds are illustrated in the table below.


+------------------+---------------------+------------------+---------+---------

| Type and Kind | Memory allocation | Precision | Range | C/C++

+------------------+---------------------+------------------+---------+---------

| real (kind=4 )* | 4 bytes ( 32 bits) | 7 s.f. (Single) | 10^38 | "float"

| real (kind=8 ) | 8 bytes ( 64 bits) | 15 s.f. (Double) | 10^308 | "double"

| real (kind=16) | 16 bytes (128 bits) | 34 s.f. (Quad) | 10^4931 | "long double"+

+------------------+---------------------+------------------+---------+---------

* default kind in Fortran.

s.f. = "significant figures".

+ only on 64 bit platforms.

A limitation is also placed on the range of values that can be stored; this is illustrated for single-precision type real data below:

                                        underflow
overflow <------------------------->       --       <-------------------------> overflow
       -10^38                   -10^-45          +10^-45                   +10^38

If a number exceeds the permitted range, for example -10^38 to +10^38, then it cannot be stored; such a situation results in the program continuing with wrong values or terminating with an overflow error. There is also a limit to the representation of very small real numbers: for single-precision real data the smallest magnitude that can be stored is about 10^-45; attempting to store a smaller non-zero value results in an underflow error. Similarly, integer type data can be stored in 1, 2, 4 or 8 bytes, each giving a larger range of values that can be represented. As an integer number is exact there is no corresponding precision; the only limitation is then that of range (integer overflow). The four kind types for integers, and the corresponding ranges, are summarised in the table below.

+------------------+-------------------+----------------------------+-------

| Type and Kind | Memory allocation | Range | C/C++ (signed)

+------------------+-------------------+----------------------------+-------

| integer (kind=1) | 1 byte ( 8 bits) | -128 to 127 | "char"

| integer (kind=2) | 2 bytes (16 bits) | -32768 to 32767 | "short"

| integer (kind=4)*| 4 bytes (32 bits) | -2147483648 to 2147483647 | "int"

| integer (kind=8) | 8 bytes (64 bits) | about +- 9x10^18 | "long"+

+------------------+-------------------+----------------------------+

* default kind in Fortran.

+ only on 64 bit platforms.

Kind specification in Fortran [you can investigate the C/C++ equivalent in your own time]

Examples of the declaration of data of different kinds:

real :: A ! Default (single precision)

real (kind=4) :: B ! Single precision

real (kind=8) :: C ! Double precision

real (kind=16) :: D ! Quad precision

Examples of assignments:

A=1.2345678_4 or simply 1.2345678 (Single precision)

C=1.234567890123456_8 (Double precision)

D=1.2345678901234567890123456789012345_16 (Quad precision)


Note that the underscore symbol is used to define the precision of the constant; if this is not used then some precision might be lost, or unpredictable values assigned to some of the least significant digits. For example:

real(kind=8) :: C = 1.11111111111111_8 assigns C with 1.11111111111111

whereas

real(kind=8) :: C = 1.11111111111111 assigns C with 1.11111116409302

and

real(kind=8) :: C = 1.111111 assigns C with 1.11111104488373

The last 7 or 8 digits have been assigned garbage from residual values in memory. The E symbol can be used for exponentiation:

A = 1.234568E38 or A = 1.234568E38_4

C = 1.23456789012346E308_8

D = 1.234567890123456789012345678901235E4931_16

We will see later in the course how double- and quad-precision can greatly reduce round-off errors in numerical methods. Remember: although double- and quad-precision can reduce round-off errors, they have no effect on the size of truncation errors; truncation errors are inherent to the numerical method and not to the internal representation of numbers in a computer.

Examples

1. The expression ( (a+b)^2 - 2ab - b^2 ) / a^2 reduces to a^2 / a^2 = 1. But computed on a machine with limited precision it can give unexpected results:

[Fortran]

real(kind=8) :: a=0.00001_8, b=88888.0_8, c
c = ( (a+b)**2 - 2*a*b - b**2 ) / a**2
print *, c
end

[C++]

#include <iostream>
int main() {
  double a=0.00001, b=88888, c;
  c = ( (a+b)*(a+b) - 2*a*b - b*b ) / (a*a);
  std::cout << c << std::endl;
}

The result is 4.65661 in both cases! This is an extreme example of a calculation that is sensitive to round-off. Note that quad precision gives the correct result 1.0000000000000172.

2. A test for precision. The following programs (in Fortran and C++) implement the Forward-Difference Approximation algorithm using single precision.

[Fortran]

real :: h = 0.1
print *, "FDA = ", (f(3.0+h)-f(3.0))/h
contains
  real function f(x)
    real :: x
    f = 2*x**2 + 4*x + 6
  end function f
end

[C++]

#include <iostream>
float f(float x) { return 2*x*x + 4*x + 6; }
int main() {
  float h = 0.1;
  std::cout << "FDA = "
            << (f(3.0+h)-f(3.0))/h
            << std::endl;
}

Running the above programs for decreasing values of h reveals a decreasing truncation error (t.e.) but an increasing round-off error (r.e.). Remember that the correct result should be 16.


h=0.1       FDA = 16.199990   t.e. = 0.2,       r.e. = -0.000011444092
h=0.01      FDA = 16.019821   t.e. = 0.02,      r.e. = -0.00017929077
h=0.001     FDA = 16.002655   t.e. = 0.002,     r.e. =  0.0006542206
h=0.0001    FDA = 15.983582   t.e. = 0.0002,    r.e. = -0.016618729
h=0.00001   FDA = 16.021729   t.e. = 0.00002,   r.e. =  0.021709442
h=0.000001  FDA = 15.258789   t.e. = 0.000002,  r.e. = -0.74121284

The optimal value occurs when h=0.001 where both truncation and round-off errors are relatively small.

3. Tests for overflow and underflow:

! test integer overflow | Result:

integer :: i, j=1e9 | 1 1000000000

do i = 1, 5 | 2 2000000000

print *, i, j | 3 -294967296

j = j * 2 | 4 -589934592

end do | 5 -1179869184

end |

! test real overflow  | Result:
integer :: i          | 1 1.E+37
real :: r=1.0E37      | 2 1.E+38
do i = 1, 5           | 3 +Inf
  print *, i, r       | 4 +Inf
  r = r * 10.         | 5 +Inf
end do                |
end                   |

! test real underflow | Result:
integer :: i          | 1 1.E-42
real :: r=1.0E-42     | 2 1.E-43
do i = 1, 6           | 3 1.E-44
  print *, i, r       | 4 1.E-45
  r = r / 10.         | 5 0.
end do                | 6 0.
end                   |

Some compilers provide options that give different behavior with respect to overflow and underflow. For example in the g95 compiler (www.g95.org) the following environment variables can be set:

G95_FPU_OVERFLOW=1 and G95_FPU_UNDERFLOW=1

In this case the above two programs abort with a "Floating point exception" message instead of continuing with bogus values.


1.3 Lab Exercises

Task 1 - Truncation Errors

The first derivative (gradient) of a function can be approximated by the Forward Difference Approximation:

FDA = ( F(x+h) - F(x) ) / h

where h is small but not zero. In theory, the truncation error in this approximation is given by:

Error = h/2 F‘‘(x) + O(h^2)

Write a program that computes, using the FDA with h=0.01, the first derivative at x=4.7 of the function:

F(x) = 3.4 + 18.7x - 1.6x^2

Hint: use double precision to avoid significant round-off errors confusing the results: for example

real(kind=8) :: h, and h=0.01_8

Questions

Compare your result with the exact result determined by calculus. Compare the error in the result with the predicted truncation error. Do your comparisons make sense?

Task 2 - Precision (Round-off Errors)

1. What is the result of your FDA program with all the variables in single precision?

2. Write a Fortran program that declares variables A, B, C and D as type double precision real, and determine the result of the assignments:

A = 1.11111111111111_8

B = 1.11111111111111

C = 1.111111

D = 1.111111_8

Explain your findings.

3. What do you expect to be the output of the following program? Run the program to see if you are right, and explain your findings. Hint: press [Ctrl][C] to break out of a program that does not terminate.

real :: a=0.0

do

a = a + 0.1

print *, a

if ( a == 1.0 ) exit

end do

end

Task 3 - Series expansion

Write a program that computes e^x by the series expansion: e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + ... + x^i/i! + ... Terminate the expansion when a term is less than 0.000001. Check your results against the library function exp(x) or use your pocket calculator. Hint: factorials can be problematic due to integer overflow; you can avoid factorials by observing that the (i+1)th term in the series is equal to the (i)th term times x/i.


1.4 Lab Solutions

Task 1 - Truncation Errors

The first derivative (gradient) of a function can be approximated by the Forward Difference Approximation:

FDA = ( F(x+h) - F(x) ) / h

where h is small but not zero. In theory, the truncation error in this approximation is given by:

Error = h/2 F‘‘(x) + O(h^2)

Write a program that computes, using the FDA with h=0.01, the first derivative at x=4.7 of the function:

F(x) = 3.4 + 18.7x - 1.6x^2

Hint: use double precision to avoid significant round-off errors confusing the results: for example

real(kind=8) :: h, and h=0.01_8

Questions

Compare your result with the exact result determined by calculus. Compare the error in the result with the predicted truncation error. Do your comparisons make sense?

Solution

Program: eee484ex1a (see the downloads page)

The output is:

fda = 3.644000

true = 3.660000

error_fda = -0.016000

true error = -0.016000

Note that double precision variables, real(kind=8), are used; otherwise the results would include significant round-off errors making the analysis less clear. From the output, we can see that the fda value is similar to the true value, but not exactly the same, as it is only an estimate. According to theory the truncation error in this estimate is

(h/2).F''(x) = (0.01/2) x (-3.2) = -0.016

This is the same as the true error = fda - true = 3.644 - 3.660 = -0.016. The conclusion: the expression for the truncation error is correct.

Task 2 - Precision (Round-off Errors)

1. What is the result of your FDA program with all the variables in single precision?

Solution

Simply replace kind=8 with kind=4, and _8 with _4 [or double with float], and rerun the program; the result is

fda = 3.64423

true = 3.66000

error_fda = -0.01600

true error = -0.01577


The result for the fda is different in the fourth decimal place: as well as the truncation error there is now an additional round-off error.

2. Write a program that declares variables A, B, C and D as type double precision real, and determine the result of the assignments:

A = 1.11111111111111_8

B = 1.11111111111111

C = 1.111111

D = 1.111111_8

Explain your findings.

Solution

A = 1.11111111111111_8

correctly assigns the value 1.11111111111111 to A.

B = 1.11111111111111 assigns 1.11111116409302 to B because the assignment is equivalent to

1.11111111111111_4 = 1.1111111

and so the last 7 digits contain garbage.

C = 1.111111 assigns 1.11111104488373 for the same reason as in B.

D = 1.111111_8

correctly assigns 1.11111100000000 to D.

3. What do you expect to be the output of the following program? Run the program to see if you are right, and explain your findings. Hint: press [Ctrl][C] to break out of a program that does not terminate.

real :: a=0.0

do

a = a + 0.1

print *, a

if ( a == 1.0 ) exit

end do

end

Solution

You might expect the program to output the numbers 0.1, 0.2, ..., 0.9, 1.0 and then terminate. But you might actually find that, due to round-off errors, A does not take exactly the value 1.0 and therefore the program fails the test (A==1.0) and continues to count without end (press [Ctrl][C] to stop the program). A fix for this would be to replace the equality "==" with ">=" (greater than or equal to); the loop might then end with A=1.0 or A=1.1.

Task 3 - Series expansion

Write a program that computes e^x by the series expansion: e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + ... + x^i/i! + ... Terminate the expansion when a term is less than 0.000001. Check your results against the library function exp(x) or use your pocket calculator. Hint: factorials can be problematic due to integer overflow; you can avoid factorials by observing that the (i+1)th term in the series is equal to the (i)th term times x/i.

Solution

Program: eee484ex1b (see the downloads page)


1.5 Example exam questions

Question

a) Explain the term ’truncation error’, and give an example.

How can truncation errors be reduced?

b) Explain the term ’round-off error’.

How can round-off errors be reduced?

c) Explain the terms ’overflow’ and ’underflow’.

How can overflow and underflow be avoided?


2 Numerical Differentiation

2.1 Topics Covered

o Numerical Differentiation:
  - Forward Difference Approximation (first derivative): FDA
  - Central Difference Approximation (first derivative): CDA
  - Richardson Extrapolation (first derivative): REA
  - Central Difference Approximation (second derivative): CDA2
  - Richardson Extrapolation (second derivative): REA2

The student should be able to derive (or prove) the FDA, CDA, REA, CDA2 and REA2 (the formulae are given) from Taylor's Expansion, and show the form of the error in each approximation. The student should be able to use the formulae by hand, and implement them in a computer program. A basic understanding of the meaning of "Truncation Error" and "Round-off Error" is expected.

2.2 Lecture Notes

Introduction

When an analytical solution to the derivative of a given function is difficult or inconvenient, a numerical method can be used to provide an approximate solution. It is important, however, to understand the truncation and round-off errors involved in these numerical methods. We will look first at the most basic method for the first derivative of a function, the FDA, and then move on to higher-order methods, the CDA and REA. We will also look at approximations for the second derivative: the CDA2 and REA2.

The Forward-Difference Approximation (FDA)

The FDA method for the numerical differentiation of a function can be derived by considering differentiation from first principles, or alternatively by considering Taylor's Expansion.

1. Differentiation from first principles:

f'(x) =  limit   ( f(x+dx) - f(x) ) / dx
        dx -> 0

As a computer cannot divide by zero, the computed (finite) version of this expression is an approximation where dx is small but not zero; I denote this value by h. Now f'(x) is approximated by ( f(x+h) - f(x) ) / h. This is the Forward-Difference Approximation for the numerical derivative of a function; it has the most basic form for a numerical derivative and is the least accurate:

+--------------------------------+

| FDA = ( f(x+h) - f(x) ) / h |

+--------------------------------+ h is small but not zero

Example: Compute the first derivative of f(x) = 3x^3 + 2x^2 + x at x=3 and x=10 using the FDA with h = 0.01.

FDA(3) = ( f(3.01) - f(3) ) / 0.01 = 94.2903

by calculus f‘(3) = 94.0000 => error is 0.2903

(0.3%)

FDA(10) = ( f(10.01) - f(10) ) / 0.01 = 941.9203

by calculus f‘(10) = 941.0000 => error is 0.9203

(0.1%)


2. Taylor's Expansion:

The FDA can also be obtained by rearranging the Taylor Expansion:

f(x+h) = f(x) + h f‘(x)/1! + h^2 f‘‘(x)/2! + h^3 f‘‘‘(x)/3! + ....

Rearrange for the FDA:

( f(x+h) - f(x) ) / h = f‘(x) + h f‘‘(x)/2 + h^2 f‘‘‘(x)/6 + ....

the left hand side is the FDA

=> FDA = f‘(x) + (h/2) f‘‘(x) + O(h^2)
         -----   ---------------------
           /                \
  the derivative    the truncation error in the FDA

Consider the example of the numerical first derivative of f(x) = 3x^3 + 2x^2 + x at x=3 and x=10, with h = 0.01. We obtained the results:

FDA(3) = 94.2903 and error = 0.2903

FDA(10) = 941.9203 and error = 0.9203

We can check that the error is (h/2).f‘‘(x): f‘‘(x) = 18x + 4, so error(x) = h(9x+2), giving error(3) = 0.2900 and error(10) = 0.9200 as above. The small difference between the results is due to the omission of the O(h^2) term in the expression for the error.

Summary:
o The first derivative of a function f(x) is approximated by FDA = ( f(x+h) - f(x) ) / h where h is small but not zero.
o The error is approximately (h/2).f‘‘(x), i.e. proportional to h - to minimise the error choose a small value of h.
o The error in the FDA is called a truncation error as it is due to the truncation of the higher-order terms in the Taylor expansion.

Note that h should not be too small as round-off errors in the machine arithmetic increase as h decreases (always use double precision variables! the ”kind=8” specifier in Fortran, and the ”double” declaration in C/C++).

The Central-Difference Approximation (CDA)
The CDA gives an improved (higher-order) method:

+--------------------------------------+
| CDA = ( f(x+h) - f(x-h) ) / (2h)     |
+--------------------------------------+   h is small but not zero

It can be shown (see the lecture) from Taylor’s Expansion that

CDA = f‘(x) + (h^2/6) f‘‘‘(x) + O(h^4)
      -----   ------------------------
        /                \
 the derivative    the truncation error in the CDA

The CDA is a higher-order method than the FDA as it gives an error which is proportional to h^2 (the error is therefore much smaller). Also, the error is proportional to the third derivative, f‘‘‘(x), which may further reduce the error with respect to the FDA error, which has an f‘‘(x) dependence.

Richardson Extrapolation Approximation (REA)
A higher-order method is given by the Richardson Extrapolation Approximation:

+-------------------------------------------------+
| REA = (f(x-2h)-8f(x-h)+8f(x+h)-f(x+2h))/(12h)   |
+-------------------------------------------------+   h is small but not zero

It can be shown from Taylor’s expansion that

REA = f‘(x) - (h^4/30) f‘‘‘‘‘(x) + O(h^6)
      -----   ---------------------------
        /                  \
 the 1st derivative   the truncation error in the REA

The truncation error is proportional to the fifth derivative and to h^4. The results below compare the performance of the above three methods.

f(x) = 3x^3 + 2x^2 + x , first derivative at x=3 and x=10 , h = 0.01

+------------------------------------+--------------------------------------+
| FDA(3)=94.290300 (error=0.290300)  | FDA(10)=941.920300 (error=0.920300)  |
| CDA(3)=94.000300 (error=0.000300)  | CDA(10)=941.000300 (error=0.000300)  |
| REA(3)=94.000000 (error=0.000000)  | REA(10)=941.000000 (error=0.000000)  |
+------------------------------------+--------------------------------------+

The results illustrate that the CDA can give reasonably accurate results and so is worth implementing as a simple method. The REA in this case is exact as the truncation error is proportional to the fifth derivative, which is zero.

Implementation
Implementation of the above methods is simple. The program requires a definition of f(x) and the two inputs h and x.

Algorithm 2

! Program to compute the first derivative of a function f(x)
! The FDA, CDA and REA methods are implemented for comparison.
input "input the value of x ", x
input "input the value of h ", h
fda = (f(x+h)-f(x))/h
cda = (f(x+h)-f(x-h))/(2*h)
rea = (f(x-2*h)-8*f(x-h)+8*f(x+h)-f(x+2*h))/(12*h)
output "FDA = ", fda
output "CDA = ", cda
output "REA = ", rea
function definition f(x) = 3x^3 + 2x^2 + x

Note: you should use double precision variables to avoid large round-off errors.


The Central-Difference Approximation for a Second Derivative (CDA2)
The second derivative of a function f(x) can be approximated by:

+----------------------------------------------+
| CDA2 = ( f(x-h) - 2f(x) + f(x+h) ) / h^2     |
+----------------------------------------------+   h is small but not zero

It can be shown from Taylor’s Expansion that

CDA2 = f‘‘(x) + (h^2/12) f‘‘‘‘(x) + O(h^4)
       ------   --------------------------
         /                  \
 the 2nd derivative   the truncation error in the CDA2

The Richardson Extrapolation Approximation for a Second Derivative (REA2)
The second derivative of a function f(x) can be approximated by:

+----------------------------------------------------------------+
| REA2 = (-f(x-2h)+16f(x-h)-30f(x)+16f(x+h)-f(x+2h)) / (12h^2)   |
+----------------------------------------------------------------+   h is small but not zero

It can be shown from Taylor’s Expansion that

REA2 = f‘‘(x) - (h^4/90) f‘‘‘‘‘‘(x) + O(h^6)
       ------   ----------------------------
         /                  \
 the 2nd derivative   the truncation error in the REA2

Summary of Methods

FDA  = (f(x+h)-f(x))/h                                     = f‘(x)  + (h/2)    f‘‘(x)    + ....
CDA  = (f(x+h)-f(x-h))/(2h)                                = f‘(x)  + (h^2/6)  f‘‘‘(x)   + ....
REA  = (f(x-2h)-8f(x-h)+8f(x+h)-f(x+2h))/(12h)             = f‘(x)  - (h^4/30) f‘‘‘‘‘(x) + ....
CDA2 = (f(x-h)-2f(x)+f(x+h))/h^2                           = f‘‘(x) + (h^2/12) f‘‘‘‘(x)  + ....
REA2 = (-f(x-2h)+16f(x-h)-30f(x)+16f(x+h)-f(x+2h))/(12h^2) = f‘‘(x) - (h^4/90) f‘‘‘‘‘‘(x) + ....

Errors - the truncation error and the round-off error
The approximation methods FDA, CDA, and REA can be used to demonstrate the effect of truncation errors and round-off errors. The error inherent to the CDA method, for example (h^2/6).f‘‘‘(x), is an example of a truncation error; i.e. by truncating higher-order terms in the Taylor expansion the method becomes only approximate. Another source of error exists when the FDA, CDA or REA are computed: the round-off error due to limited precision in numerical arithmetic (numerical values are stored in the computer with a limited number of binary bits). Round-off errors are compounded in arithmetic operations. The total error is therefore a combination of the two error sources:

Total Error = Truncation Error + Round-off Error


The important parameter here is the value of h: the truncation error increases with increasing h, while the round-off error decreases with increasing h. Given a particular method, for example the CDA, the most accurate computed derivative is obtained by minimising the total error; this corresponds to finding the optimal value of h. This optimal value will differ depending on

1. The numerical method (FDA, CDA, REA, etc).
2. The function being differentiated, and the value of x.
3. The precision of the arithmetic (single-, double-, quad-precision).

To arrive at the optimal value some study of the output of your program is needed. The total error in the CDA is given by:

|------------------------|
| Error = CDA(x) - f‘(x) |   where f‘(x) is the
|------------------------|   unknown first derivative.

A plot of |Error| against h will have the form indicated qualitatively in the figure below. The rise on the right (as h increases) is due to the truncation error, which has the form of h^2; the rise on the left (as h decreases) is due to round-off errors.

 log(|Error|)
      |
      | \                /
      |  \              /
      |   \     _     /
      |____________________
       -10  -8  -6  -4  -2   log(h)

A minimum error exists at some intermediate value of h corresponding to a minimum in the plot. If f‘(x) is unknown we can only plot CDA versus h, but, as f‘(x) is a constant, the plot will have the same shape (only shifted up or down). In this case again a minimum (or stationary) value in the plot will be observed corresponding to a minimum error. The situation is less clear when the round-off error has the opposite sign to the truncation error; the CDA may vary erratically about the true value, though again a relatively stable stationary value in a plot of CDA versus h corresponds to a solution close to a minimum error. This procedure of outputting the CDA with different values of h and then interpreting the results is an example of step 8 in the section Errors and Problem Solving. The lab exercise will require you to perform such an analysis.


2.3 Lab Exercises

We will estimate the derivative of the function f(x) = -1/x at x=3 and check the result against the exact result f‘(x) = 1/x^2, which gives f‘(3) = 1/3^2 = 0.111 111 111 111...; i.e. Error = Estimate - 0.111 111 111 111.

Task 1
Compare the accuracy of FDA(3), CDA(3) and REA(3). For this use h = 0.01, and double precision data (”kind=8” or ”double”).

Task 2
For CDA(3) investigate the effect of varying h; try h = 10^-1, 10^-2, 10^-3, ...., 10^-12. Use double precision data (”kind=8” or ”double”). Which value of h gives the most accurate estimate?

Task 3
Repeat task 2 replacing CDA with REA. Which value of h gives the most accurate estimate?

Task 4
Repeat task 2 with single-, double-, and quad-precision. Comment on the results.

Additional Tasks
Investigate the CDA2 and REA2 expressions for finding the second derivative of a function.


2.4 Lab Solutions

We will estimate the derivative of the function f(x) = -1/x at x=3 and check the result against the exact result f‘(x) = 1/x^2, which gives f‘(3) = 1/3^2 = 0.111 111 111 111...; i.e. Error = Estimate - 0.111 111 111 111.

Task 1
Compare the accuracy of FDA(3), CDA(3) and REA(3). For this use h = 0.01, and double precision data (”kind=8” or ”double”).

Solution
Program: eee484ex2a (see the downloads page).

FDA = 0.110741971207   Err = -0.000369139904
CDA = 0.111112345693   Err =  0.000001234582
REA = 0.111111111056   Err = -0.000000000055
Tru = 0.111111111111

For the same value of h and x, the accuracy increases as the order of the method increases. Remember that the truncation errors for the FDA, CDA, and REA are proportional to h, h^2, and h^4 respectively.

Task 2
For CDA(3) investigate the effect of varying h; try h = 10^-1, 10^-2, 10^-3, ...., 10^-12. Use double precision data (”kind=8” or ”double”). Which value of h gives the most accurate estimate?

Solution
Program: eee484ex2b (see the downloads page).

h                CDA(3)            Error=CDA(3)-Tru
0.100000000000   0.111234705228     0.000123594117
0.010000000000   0.111112345693     0.000001234582
0.001000000000   0.111111123457     0.000000012346
0.000100000000   0.111111111235     0.000000000124
0.000010000000   0.111111111113     0.000000000002
0.000001000000   0.111111111123     0.000000000012
0.000000100000   0.111111110900    -0.000000000211
0.000000010000   0.111111110929    -0.000000000182
0.000000001000   0.111111129574     0.000000018463
0.000000000100   0.111111212868     0.000000101757
0.000000000010   0.111112045942     0.000000934831
0.000000000001   0.111108659166    -0.000002451946

As h decreases in size the truncation error is seen to decrease in the expected form; i.e. the error is proportional to h^2, so each decrease in h by a factor of 10 decreases the error by a factor of 10^2. The minimum error is obtained with h=0.00001, after which round-off error dominates. The round-off error increases as h decreases in size. Remember that the total error is the sum of the truncation error and the round-off error. Further studies reveal that the optimal value of h depends on the function and the value of x.

Task 3
Repeat task 2 replacing CDA with REA. Which value of h gives the most accurate estimate?


Solution
Program: eee484ex2c (see the downloads page).

h                REA(3)            Error=REA(3)-Tru
0.100000000000   0.111110559352    -0.000000551759
0.010000000000   0.111111111056    -0.000000000055
0.001000000000   0.111111111111    -0.000000000000
0.000100000000   0.111111111112     0.000000000000
0.000010000000   0.111111111113     0.000000000002
0.000001000000   0.111111111116     0.000000000005
0.000000100000   0.111111110725    -0.000000000386
0.000000010000   0.111111113438     0.000000002327
0.000000001000   0.111111107981    -0.000000003130
0.000000000100   0.111110996954    -0.000000114157
0.000000000010   0.111109886799    -0.000001224312
0.000000000001   0.111104541456    -0.000006569655

Replacing CDA with REA results in much smaller truncation errors. The round-off errors, however, are similar to those generated in the CDA method. Consequently, the optimal value of h occurs earlier, at about h = 0.001.

Task 4
Repeat task 2 with single, double, and quad precision. Comment on the results.

Solution
Program: eee484ex2d (see the downloads page).

CDA(3)           (float)           (double)          (long double)
h                Error (kind=4)    Error (kind=8)    Error (kind=16)
0.100000000000    0.000123433769    0.000123594117    0.0001235941169200346063527
0.010000000000    0.000000916421    0.000001234582    0.0000012345816188081102136
0.001000000000   -0.000009402633    0.000000012346    0.0000000123456803840879439
0.000100000000   -0.000079058111    0.000000000124    0.0000000001234567902606310
0.000010000000    0.000647775829    0.000000000002    0.0000000000012345679012483
0.000001000000   -0.003491610289    0.000000000012    0.0000000000000123456790123
0.000000100000   -0.160781651735   -0.000000000211    0.0000000000000001234567901
0.000000010000   -0.607816457748    0.000000000182    0.0000000000000000012345679
0.000000001000   -5.078164577484    0.000000018463    0.0000000000000000000123457
0.000000000100   -49.78163909912    0.000000101757    0.0000000000000000000001235
0.000000000010   -496.8164062500    0.000000934831    0.0000000000000000000000010
0.000000000001   -4967.164062500   -0.000002451946    0.0000000000000000000000106

Single precision (”kind=4” or ”float”) is clearly not appropriate for numerical differentiation; the round-off error dominates early and so the optimal value of h is large, resulting in poor accuracy. Double precision (”kind=8” or ”double”) performs well, but one should be careful not to choose a value of h that is very small as this will result in significant round-off errors. Quad precision (”kind=16” or ”long double”) again dramatically reduces round-off errors, the errors becoming significant only at h = 10^-12 for this case. The use of quad-precision, however, is not common as double precision is often sufficient and quad-precision arithmetic takes significantly longer to compute on 32-bit platforms [64 bit becoming common? - this statement is outdated?].


Conclusion
The FDA method gives poor results and should not be used. The CDA method gives reasonable results if you require a simple (easy to remember) method and do not require very high precision. The REA gives the best result of the three methods and should be used in applications where precision is important. In this kind of numerical work it is advisable to use double precision data to avoid large round-off errors. Although quad precision is sometimes available (depending on the platform and compiler) it is not (yet) a commonly used precision. The choice of the value of h can be important; the optimal value will depend on the function, where it is being evaluated, and the method used; one should not choose an arbitrary value.

Additional Tasks
Investigate the CDA2 and REA2 expressions for finding the second derivative of a function.

Solution
This is left to the student. Feel free to discuss your results with your teacher.


2.5 Example exam questions

Question

a) Write a computer program to evaluate the first derivative of a

function f(x) using the Central Difference Approximation method:

CDA = ( f(x+h) - f(x-h) ) / 2h

b) Using Taylor’s expansion show that the truncation error in this

approximation is given by: error = (h^2/6).f’’’(x) + O(h^4)

c) Theoretically, how can the error in the CDA be minimised?

In practice, what other type of error exists in this method?

d) i. Using the CDA with h=0.1, evaluate the first derivative of

f(x) = x^4 at x=3.2

ii. Using calculus, determine the value for the error in your result

and show that it equals (h^2/6).f’’’(x)

Question

a) Write a computer program to evaluate the first derivative of a

function f(x) using the Richardson Extrapolation Approximation method:

REA = ( f(x-2h) - 8f(x-h) + 8f(x+h) - f(x+2h) ) / 12h

b) Using Taylor’s expansion show that the truncation error in this

approximation is given by: error = -(h^4/30).f’’’’’(x) + O(h^6)

c) Theoretically, how can the error in the REA be minimised?

In practice, what other type of error exists in this method?

d) i. Using the REA with h=0.1, evaluate the first derivative of

f(x) = x^6 at x=3.2

ii. Using calculus, determine the value for the error in your result

and show that it equals -(h^4/30).f’’’’’(x)

Question

a) Write a computer program to evaluate the second derivative of a

function f(x) using the Central Difference Approximation method:

CDA2 = ( f(x-h) - 2f(x) + f(x+h) ) / h^2

b) Using Taylor’s expansion show that the truncation error in this

approximation is given by: error = (h^2/12).f’’’’(x) + O(h^4)

c) Theoretically, how can the error in the CDA2 be minimised?

In practice, what other type of error exists in this method?

d) i. Using the CDA2 with h=0.1, evaluate the second derivative of

f(x) = x^5 at x=3.2

ii. Using calculus, determine the value for the error in your result

and show that it equals (h^2/12).f’’’’(x)


Question

a) Write a computer program to evaluate the second derivative of a

function f(x) using the Richardson Extrapolation Approximation method:

REA2 = (-f(x-2h)+16f(x-h)-30f(x)+16f(x+h)-f(x+2h))/(12h^2)

b) Using Taylor’s expansion show that the truncation error in this

approximation is given by: error = -(h^4/90).f’’’’’’(x) + O(h^6)

c) Theoretically, how can the error in the REA2 be minimised?

In practice, what other type of error exists in this method?

d) i. Using the REA2 with h=0.1, evaluate the second derivative of

f(x) = x^7 at x=1.5

ii. Using calculus, determine the value for the error in your result

and show that it equals -(h^4/90).f’’’’’’(x)


3 Roots, Maxima, Minima (closed methods)

3.1 Topics Covered

gnuplot> plot [0:10] exp(-x)*(x**3-6*x**2+8*x)

http://www1.gantep.edu.tr/~andrew/eee484/images/extrema-test-function.gif

o The sequential search method for finding roots; the student should remember the method, and be able to derive an expression for the number of iterations required to obtain a given accuracy.
o The bisection method for finding roots; the student should remember the method, and be able to derive an expression for the number of iterations required to obtain a given accuracy.

http://www1.gantep.edu.tr/~andrew/eee484/images/bisection_method.png

o The sequential search method for maxima and minima; the student should remember the method, and be able to derive an expression for the number of iterations required to obtain a given accuracy.

http://www1.gantep.edu.tr/~andrew/eee484/images/extrema_example.png

3.2 Lecture Notes

Introduction
Numerical methods for finding roots and extrema (maxima and minima) of functions are used when analytical solutions are difficult (or impossible), or when a calculation is part of a larger numerical algorithm. We will study a number of basic numerical methods, starting from very simple (and inefficient) sequential searches and moving to very powerful Newton's methods. The algorithms are divided into two groups: closed methods [this week] (where the solution is initially bracketed), and open methods [next week] (where the solution is not bracketed).

Root finding (closed methods)
Definition: the root, xo, of a function f(x) is such that f(xo)=0. For example if f(x) = x^3 - 28, then the root xo is 28^(1/3) = 3.0365889718756625194208095785... In general we will not find an exact solution, especially given that roots tend to be irrational. Our strategy will be to define how accurate we want the solution to be and then compute the result approximately to this accuracy. This is called a tolerance; for example, Tolerance = 0.0001 means that the root is required to be correct within plus or minus 0.0001 (four decimal place accuracy). For this to work we also need to be able to determine an error estimate with which the tolerance is compared; the algorithm terminates when |error estimate| < tolerance.

The sequential search method (closed method)
In the sequential search, first the position of the root is estimated such that a bracket a,b can be formed placing a lower- and upper-bound on the root. For this some initial analysis of the function is required. Note that if a single root (or odd number of roots) is bracketed by a and b then there will be a sign change between f(a) and f(b). During this search (scan) of the function we can identify a root as follows:


Search the function in the range a ≤ x ≤ b in steps of dx until we see a sign change. An estimate of the root can then be given as the centre of the last inspected step, with a maximum error of dx/2 (and a mean error of about dx/4).

                           sign change
 i=0 i=1 i=2 i=3 i=4 ...       /              i=n
-----|---|---|---|---|---|---|-o-|---|---|---|---- x
     a                     root   dx              b

if the sign change occurs between x and x+dx
then root estimate = x + dx/2, maximum error = dx/2

Algorithm 3a
Sequential search method for finding the root of f(x). All roots between a and b are found.

input a, b, tolerance
dx = 2*tolerance
n = nint((b-a)/dx)
do i = 0, n-1
  x = a + i*dx
  if ( f(x)*f(x+dx) < 0 ) output "root = ", x+dx/2
end do
function definition f = x**3-28

Results for a=3.0, b=3.1 and different values of tolerance.

x0                      error      tolerance  n
root = 3.03            -0.66E-2    1.E-2      5
root = 3.037            0.41E-3    1.E-3      50
root = 3.0365          -0.89E-4    1.E-4      500
root = 3.03659          0.10E-5    1.E-5      5000
root = 3.036589         0.28E-6    1.E-6      50000
root = 3.0365889       -0.72E-7    1.E-7      500000
root = 3.03658897      -0.19E-8    1.E-8      5000000
root = 3.036588971     -0.88E-9    1.E-9      50000000   in 0.25 seconds!

We obtain a high accuracy (tolerance = 1e-9) in less than one second (50 million steps). However, if the initial bracket were a=0, b=10 then we would need 2 billion steps taking 30 seconds. We can see that the error is proportional to 1/n and the run-time is proportional to 1/tolerance. We can do this much more efficiently using the following bisection method.

The Bisection method (closed method)
In the Bisection method, first the position of the root is estimated such that a bracket can be formed placing a lower- and upper-bound on the root. For this some initial analysis of the function is required. A first estimate of the root is then computed as the mid-point between the two bounds.


                     /
 LowerBound         /          UpperBound
 -----x------------/----o----------x------
                  /  MidPoint
                 /

 MidPoint = ( LowerBound + UpperBound ) / 2

Consider the function F(x) = x^3 - 28; the root lies somewhere between x=3.0 and x=3.1. This can be shown by evaluating the function at these two values: F(3.0) = 27 - 28 = -1 and F(3.1) = 29.791 - 28 = +1.791; the function changes sign, implying that the root is bracketed between x=3.0 and x=3.1. The first estimate of the root is then MidPoint = (3.0+3.1)/2 = 3.05. We can improve on this estimate by determining which side of MidPoint the root lies and then moving the bracket accordingly and re-evaluating MidPoint:

if F(LowerBound) . F(MidPoint) is negative
then the root is to the left of MidPoint  => move UpperBound to MidPoint
else the root is to the right of MidPoint => move LowerBound to MidPoint

                   /
 LowerBound       /         UpperBound
 -----x----------/-----o---------x------
                /   MidPoint
    -ve        /   +ve          +ve
              /
             / <- root is this way

MidPoint is recalculated and the procedure iterated until HalfBracket is less than Tolerance, where HalfBracket = (UpperBound-LowerBound)/2 is the maximum possible error in our estimate. Each iteration halves (bisects) the bracket (and therefore halves the maximum possible error), hence the term ”Bisection”. The following algorithm represents a Bisection search for the root of F(x) = x^3 - 28.

Algorithm 3b

input lb, ub, tolerance
do
  hb = (ub-lb)/2   ! the error estimate
  mp = (ub+lb)/2   ! the new root estimate
  output mp, hb
  if ( hb < tolerance ) exit   ! terminate if tolerance is satisfied
  if ( f(lb)*f(mp) < 0 ) ub=mp else lb=mp
end do
function definition f = x**3-28

For inputs 3.0, 3.1, 0.001 the result of the algorithm is:


MidPoint  HalfBracket
3.0500    0.0500
3.0250    0.0250
3.0375    0.0125
3.0313    0.0062
3.0344    0.0031
3.0359    0.0016
3.0367    0.0008

The algorithm terminates after six iterations, when the value of HalfBracket (the error estimate) is smaller than the value of Tolerance; i.e. 0.0008 is less than 0.001. The final value of MidPoint (the root estimate) for this tolerance is 3.037. A trace of values is shown below:

iteration  F(MidPoi.)  LowerB.  MidPoint  UpperB.  F(L)*F(M)  HalfBracket
0           0.3726     3.0000   3.0500    3.1000     -ve       0.0500
1          -0.3194     3.0000   3.0250    3.0500     +ve       0.0250
2           0.0252     3.0250   3.0375    3.0500     -ve       0.0125
3          -0.1474     3.0250   3.0313    3.0375     +ve       0.0062
4          -0.0612     3.0313   3.0344    3.0375     +ve       0.0031
5          -0.0180     3.0344   3.0359    3.0375     +ve       0.0016
6           0.0036     3.0359  *3.0367*   3.0375     -ve       0.0008

Notice that each iteration halves the size of the search region, hence the term Bisection. The table below gives the number of iterations required to satisfy a given tolerance (the values in brackets are explained later).

Tolerance  MidPoint(root)  HalfBracket     true error      iterations
10^-1      3.0500000000    0.0500000000    +0.0134110281    0 (-1.0)
10^-2      3.0312500000    0.0062500000    -0.0053389719    3 ( 2.3)
10^-3      3.0367187500    0.0007812500    +0.0001297781    6 ( 5.6)
10^-4      3.0366210937    0.0000976562    +0.0000321219    9 ( 9.0)
10^-5      3.0365905762    0.0000061035    +0.0000016043   13 (12.3)
10^-6      3.0365882874    0.0000007629    -0.0000006845   16 (15.6)
10^-7      3.0365889549    0.0000000954    -0.0000000170   19 (18.9)
10^-8      3.0365889728    0.0000000060    +0.0000000009   23 (22.3)
10^-9      3.0365889720    0.0000000007    +0.0000000002   26 (25.6)

As expected, a greater number of iterations is required to achieve a greater accuracy; for this method the convergence is exponential (3 or 4 iterations increase the accuracy by a factor of ten). The value of HalfBracket is the largest possible error in the calculated root. This is illustrated in the table by comparing this value with the true error = MidPoint - 28^(1/3); the true error is similar to, but always smaller than, the value of HalfBracket. An expression for the relationship between the error and the number of iterations can be derived as follows: given an initial error ei, after one iteration the error is ei/2 and after n iterations the error is ei/2^n = ef (the final error), and so taking logs and rearranging for n we have: n = log(ei / ef) / log(2). This is the number of iterations required to achieve an accuracy of ef given an initial accuracy of ei. In the above example the initial accuracy is (3.1-3.0)/2 = 0.05, and the final accuracy ef must be less than the tolerance. The expression becomes: n = log(0.05/Tolerance) / log(2). The results from this expression are shown in the brackets in the above table. The number of iterations performed by the algorithm is the same as that indicated by the above expression (rounded up to the nearest integer). We see that the error is proportional to 2^-n and the run-time is proportional to log(1/tolerance).

The Bisection method is similar to the way we search for a word in a dictionary. The upper and lower bounds are the first and last page respectively and we open the book at the centre page. The word lies either to the left or the right of the current page; if it is to the right then we turn to the page half way through the book to the right (bisecting the pages to the right). We continue the search in the appropriate direction, converging exponentially towards the required page. In this way a page can be found in a 1000-page dictionary in only n = log(500/1) / log(2) = 9 bisections (the tolerance here is 1 page, the initial HalfBracket is 1000/2=500 pages). Try it for yourself.

Maxima and Minima [extremum] (closed methods)

[See the figure given in the lecture (URL)]

For some functions we can use differential calculus to find extrema; we know that a minimum occurs when f’(x)=0 and f”(x)>0, and a maximum when f’(x)=0 and f”(x)<0. For example, f(x) = x^2 - 8x + 19 gives f’(x) = 2x - 8 = 0 and so an extremum occurs at x = 4. And f”(x) = 2 (+ve), so this is a minimum. Also, by inspection, f(x) = x^2 - 8x + 19 = (x-4)^2 + 3 and so f(4) is a minimum. However, often it may be difficult, or impossible, to treat a function analytically and we must use a numerical method for finding extrema. Also, we must be careful not to mistake local extrema for global extrema.

[See the figure given in the lecture (URL)]

We can attempt to avoid making this mistake by inspecting the function graphically (or equivalently performing a sequential search) or re-running our algorithm a number of times with a broad variety of different inputs. For investigating methods for finding extrema, our test function is f(x) = e^-x (x^3 - 6x^2 + 8x), and we are interested in x ≥ 0.

http://www1.gantep.edu.tr/~andrew/eee484/images/extrema-test-function.gif

Sequential Search (closed method)
If we plot our test function, say in the range 0 < x < 10, then we are performing a sequential search. During this search (scan) of the function we can identify extrema as follows. Search the function in the range a ≤ x ≤ b in steps of dx, with x as the current position; the following conditions are tested:

if [f(x) > f(x-dx) and f(x) > f(x+dx)] then f(x) is a local or global maximum

if [f(x) < f(x-dx) and f(x) < f(x+dx)] then f(x) is a local or global minimum

[See the figure given in the lecture (URL)]

Here dx defines the tolerance, and the number of points inspected is nint[(b-a)/dx] + 1; i.e. we loop over i = 0 to n and define x_i = a + i*dx.

Algorithm 3c
Sequential search method for finding the minima and maxima of f(x). All global and local minima and maxima between a and b are found.


input a, b
input dx
n = nint((b-a)/dx)
do i=0,n
  x = a + i*dx
  if [f(x) < f(x-dx) and f(x) < f(x+dx)] output "minima ", f(x), "at x = ", x
  if [f(x) > f(x-dx) and f(x) > f(x+dx)] output "maxima ", f(x), "at x = ", x
end do
end
define f(x) = exp(-x) (x^3 - 6x^2 + 8x)

With a=0, b=10 and dx=10^-6 the output of this algorithm is:

maxima  1.592547 at x = 0.510 711   (actual error is 0.4x10^-6)
minima -0.165150 at x = 2.710 831   (actual error is 0.4x10^-6)
maxima  0.120121 at x = 5.778 457   (actual error is 0.1x10^-6)

Note that the results are given to 6 decimal places as this is the limit of the accuracy defined by dx = 10^-6. To achieve this accuracy the algorithm needs to inspect 10,000,001 points; this takes about 5 seconds on my 2.4 GHz CPU, which may be considered much too slow for general purposes (to increase the accuracy to 10^-9 the algorithm would take 5000 seconds!). We can see that the error is proportional to 1/n and the run-time is proportional to 1/tolerance. A more efficient method is the Golden Section search; this however is much more complex to derive and implement. We will look at another powerful (but simple) extremum finder in the next lecture (open methods).


3.3 Lab Exercises

Task 1
Implement algorithms 3a, 3b and 3c given in the lecture as computer programs. Check the outputs of your programs against the solutions given in the lecture.

Task 2
Using the bisection method, evaluate to at least 6 decimal place accuracy the root of the following function: f(x) = x^2 + log_e(x) - 3.73

gnuplot> plot [1:2] x**2 + log(x) - 3.73

http://www1.gantep.edu.tr/~andrew/eee484/images/lab3-fig1.gif

For each method write down:
- The evaluated root (to the appropriate number of decimal places).
- The estimated error; explain how you arrive at your value.
- The number of iterations performed.
- The theoretically expected number of iterations for the required accuracy.

Task 3
Using any of your computer programs, find all extrema and roots of the function f(x) = e^x - 3x^2 (for 0 < x < 4) to 6 decimal place accuracy.

gnuplot> plot [0:4] exp(x) - 3*x**2

http://www1.gantep.edu.tr/~andrew/eee484/images/lab3-fig2.gif

If you have time, experiment with some more functions.


3.4 Lab Solutions

Task 1
Implement algorithms 3a, 3b, 3c given in the lecture as computer programs. Check the outputs of your programs against the solutions given in the lecture.

Solutions
See eee484ex3a, eee484ex3b, eee484ex3c on the course downloads page.

Task 2
Using the bisection method, evaluate to at least 6 decimal place accuracy the root of the following function: f(x) = x^2 + log_e(x) - 3.73

gnuplot> plot [1:2] x**2 + log(x) - 3.73

http://www1.gantep.edu.tr/~andrew/eee484/images/lab3-fig1.gif

For each method write down:
- The evaluated root (to the appropriate number of decimal places).
- The estimated error; explain how you arrive at your value.
- The number of iterations performed.
- The theoretically expected number of iterations for the required accuracy.

Solutions
With an initial approximate analysis, the root is determined to be between 1.0 and 2.0, i.e. f(1.0) = -2.73 and f(2.0) = +0.96.
Program eee484ex3b (see the downloads page):

MidPoint HalfBracket

1.500 000 00 0.500 000 00 - initial estimate

1.750 000 00 0.250 000 00 - iteration 1

1.875 000 00 0.125 000 00

1.812 500 00 0.062 500 00

1.781 250 00 0.031 250 00

1.765 625 00 0.015 625 00

1.773 437 50 0.007 812 50

1.777 343 75 0.003 906 25

1.775 390 62 0.001 953 12

1.776 367 19 0.000 976 56

1.775 878 91 0.000 488 28

1.776 123 05 0.000 244 14

1.776 245 12 0.000 122 07

1.776 306 15 0.000 061 04

1.776 336 67 0.000 030 52

1.776 351 93 0.000 015 26

1.776 359 56 0.000 007 63

1.776 355 74 0.000 003 81

1.776 353 84 0.000 001 91

1.776 354 79 0.000 000 95 - iteration 19

The program terminates after 19 iterations because HalfBracket (the error estimate) is less than 0.000 001


(the tolerance). The result for the root is the final value of MidPoint = 1.776 355 (6 dp accuracy). The theoretical number of iterations required is log(e_i/e_f)/log(2) = log(0.5/0.000001)/log(2) = 18.9, i.e. 19 (as above).
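The bisection run above can be reproduced with a short Python sketch (an illustration, not the course's Fortran program eee484ex3b):

```python
import math

def f(x):
    return x * x + math.log(x) - 3.73

def bisect(f, lo, hi, tol):
    """Algorithm 3b: halve the bracket until its half-width is below tol."""
    iterations = 0
    while True:
        mid = (lo + hi) / 2
        half = (hi - lo) / 2              # the error estimate
        if half < tol:
            return mid, iterations
        if f(lo) * f(mid) <= 0:           # root is in the lower half
            hi = mid
        else:                             # root is in the upper half
            lo = mid
        iterations += 1

root, n = bisect(f, 1.0, 2.0, 1e-6)
n_theory = math.log(0.5 / 1e-6) / math.log(2)   # theoretical count, about 18.9
print(f"root = {root:.6f} after {n} iterations (theory: {n_theory:.1f})")
```

The loop performs 19 bisections, matching the theoretical estimate.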

Task 3
Using any of your computer programs, find all extrema and roots of the function f(x) = e^x - 3x^2 (for 0 < x < 4) to 6 decimal place accuracy.

gnuplot> plot [0:4] exp(x) - 3*x**2

http://www1.gantep.edu.tr/~andrew/eee484/images/lab3-fig2.gif

Solutions
The plot indicates that there is a maximum at about 0.2, a root at about 1.0, a minimum at about 2.8 and a second root at about 3.7. We will use eee484ex3a and eee484ex3c to perform a sequential search in the range (0,4) with a tolerance of 0.000 001.
Results:

eee484ex3a

root = 0.910007

root = 3.733079

check with eee484ex3b with two brackets

{0.90, 0.92} root = 0.910008

{3.73, 3.74} root = 3.733079

eee484ex3c

maxima at x = 0.204481

minima at x = 2.833148


3.5 Example exam questions

Question 1 (Bisection Method)

a) Show that, for the bisection root-finding method, the number of

iterations, n, required to reduce the error from an initial value

of e_i to a final value of e_f is given by: n = log( e_i / e_f ) / log(2)

b) Given that a root of the function f(x) = e^x - 3x^2 is near 1.0,

estimate the number of iterations required to achieve an accuracy

of at least 6 decimal places.

Question 2 (Sequential search: root)

Explain how a sequential search can be performed to find roots

of a function f(x). Include in your answer an explanation of

the relationship between the number of function evaluations and the

accuracy of the solution.

Question 2 (Sequential search: extremum)

Explain how a sequential search can be performed to find maxima and

minima of a function f(x). Include in your answer an explanation of

the relationship between the number of function evaluations and the

accuracy of the solution.


4 Roots, Maxima, Minima (open methods)

4.1 Topics Covered

gnuplot> plot [0:10] exp(-x)*(x**3-6*x**2+8*x)

http://www1.gantep.edu.tr/~andrew/eee484/images/extrema-test-function.gif

o The Newton-Raphson root-finding method; the student should be able to derive the iterative formula, write a computer program implementing it, and use the formula to find the root of a function by hand (using a pocket calculator).
o Newton's Square-root; the student should be able to derive the Newton's Square-root iterative formula from the Newton-Raphson iterative formula, write a computer program, and use the method to calculate by hand (using a pocket calculator) the square-root of a positive number.
o The Secant root-finding method.
o Newton's method and modified Newton's method for finding extrema.

4.2 Lecture Notes

Introduction
Continuing from last week, we now look at open methods (not requiring the solution to be bracketed) for finding roots and extrema of functions.

The Newton-Raphson root-finding method (open method)
The Newton-Raphson method for finding the root of a function f(x) uses information about the first derivative f'(x) to estimate how far (and in which direction) the root lies from the current position.

Theory:
Let x be an approximation to the root x0; the error e is defined as e = x - x0, and so we can write the root as x0 = x - e.

Taylor's expansion gives:
f(x0) = f(x-e) = f(x) - e f'(x)/1! + e^2 f''(x)/2! - e^3 f'''(x)/3! + ...

Ignoring terms of order e^2 and higher we arrive at an approximation to the root:
f(x0) = f(x) - e f'(x) (approximately).
f(x0) = 0, and so we can write 0 = f(x) - e f'(x), giving e = f(x) / f'(x) (approximately), i.e. we have an estimate of the error in the approximation x. We can now correct the root estimate x for this error and arrive at a value closer to the root: x0 = x - e, and so x0 = x - f(x) / f'(x) (approximately).
This is the Newton-Raphson improved estimate of the root x0 given an initial estimate x. This improved estimate is still not exact as we have not included all the terms in the Taylor expansion (it has a truncation error), but by iterating the procedure we can repeat the improvements; the iteration is written as:
x[i+1] = x[i] - f(x[i]) / f'(x[i])
where x[i] is the current estimate and x[i+1] is the next, improved, estimate. This is the Newton-Raphson iterative formula for the root of a function; the method can be represented by the following algorithm.


Algorithm 4a

input x ! input the initial root estimate

input Tolerance ! input the tolerance (required accuracy)

do

Error = f(x) / f’(x) ! the error estimate

output x, Error ! output current estimates

if ( |Error| < Tolerance ) terminate ! terminate if tolerance is satisfied

x = x - Error ! subtract the error estimate

end do

define f(x) = x^3 - 28

define f’(x) = 3 x^2

Notes:
- The algorithm is very simple!
- It terminates when a tolerance is satisfied.
- No bracket is required, only an initial estimate of the root.
- The algorithm requires the first derivative of the function.
- The error estimate can be negative, so its absolute value is compared to the tolerance.
- If f'(x) is close to zero (i.e. near a turning point in the function f(x)) then the estimate of the error, f(x)/f'(x), can be very large, launching the solution far away from the root. The algorithm might crash with an overflow error, or take a long time to recover.
Convergence for Newton-Raphson is very rapid. The error in the error estimate is proportional to the square of the error; this vanishes quickly for an error << 1: on each iteration the number of correct significant figures doubles. This is demonstrated by executing the above algorithm for:
f(x) = x^3 - 28, f'(x) = 3x^2, Tolerance = 10^-12, and an initial estimate of x = 3.0:

iteration Root estimate (x) Error estimate

0 3.000000000000000000 -0.037 037 037 037 037 037

1 3.037037037037037037 0.000 447 999 059 936 399

2 3.036589037977100638 0.000 000 066 101 436 680

3 3.036588971875663958 0.000 000 000 000 001 439

The program converges in just 3 iterations giving an accuracy of the order of 10^-15.
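A minimal Python sketch of Algorithm 4a, applied to the same test function (an illustration; the course programs are written in Fortran):

```python
def newton_raphson(f, fprime, x, tol, max_iter=50):
    """Algorithm 4a: iterate x -> x - f(x)/f'(x) until |error| < tol."""
    for i in range(max_iter):
        error = f(x) / fprime(x)     # the error estimate
        if abs(error) < tol:
            return x, i
        x = x - error                # subtract the error estimate
    raise RuntimeError("did not converge")

root, iterations = newton_raphson(lambda x: x**3 - 28,
                                  lambda x: 3 * x**2,
                                  3.0, 1e-12)
print(root, iterations)
```

Run with x = 3.0 and tolerance 10^-12 this reproduces the 3-iteration convergence shown in the table above.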

Newton's Square-Root
A special case of the Newton-Raphson method can be written for the square-root of a positive number p: let x = p^(1/2); then the root of the function f(x) = x^2 - p gives the square-root of p. In the Newton-Raphson iterative formula x[i+1] = x[i] - f(x[i]) / f'(x[i]), the first derivative is simply 2x, and so the formula can be written as:
x[i+1] = x[i] - (x[i]^2 - p) / (2 x[i])    or    x[i+1] = x[i] - (x[i] - p/x[i]) / 2

Algorithm 4b

input p ! input the number we want the sqrt of

input Tolerance ! input the tolerance (required accuracy)

x = p ! initial sqrt estimate

do

Error = (x-p/x)/2 ! the error estimate

output x, Error ! output current estimates

if ( |Error| < Tolerance ) terminate ! terminate if tolerance is satisfied

x = x - Error ! subtract the error estimate

end do


Notes:
1. The estimate x is initially set to p, though this could be p/2 (but not zero).
2. We need a tolerance to provide a termination condition.
3. The functions f(x) = x^2 - p and f'(x) = 2x do not need to be defined; they are absorbed directly into the error expression.

Example
For p = 2 and a tolerance of 10^-9 the output of the algorithm is:

2.000 000 000 000 0.500 000 000 000

1.500 000 000 000 0.083 333 333 333

1.416 666 666 667 0.002 450 980 392

1.414 215 686 275 0.000 002 123 900

1.414 213 562 375 0.000 000 000 002

The square root of 2 can therefore be written as 1.414213562 (9 dp) or, subtracting the final error estimate, as 1.414213562373 (12 dp).
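Algorithm 4b translates almost directly into Python; the sketch below (an illustration, not the course program eee484ex4c) also subtracts the final error estimate, as done above:

```python
def newton_sqrt(p, tol=1e-9):
    """Algorithm 4b: x -> x - (x - p/x)/2 converges to the square-root of p."""
    x = p                            # initial estimate (p/2 also works, not 0)
    while True:
        error = (x - p / x) / 2      # the error estimate
        if abs(error) < tol:
            return x - error         # subtract the final error estimate too
        x = x - error

vals = {p: newton_sqrt(float(p)) for p in range(2, 11)}
print(vals[2])
```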

The Secant Method
The main disadvantage of the Newton-Raphson method is that it requires a knowledge of the first derivative. If the first derivative is not known, or is inconvenient to implement, then it can be approximated numerically by the iterative form of the Forward Difference Approximation; this leads to the Secant Method: we have the Newton-Raphson iterative formula x[i+1] = x[i] - f(x[i]) / f'(x[i]), and replacing f'(x[i]) with
( f(x[i]) - f(x[i-1]) ) / ( x[i] - x[i-1] )
we have
x[i+1] = x[i] - f(x[i]) ( x[i] - x[i-1] ) / ( f(x[i]) - f(x[i-1]) )
where x[i-1] is the previous estimate, x[i] is the current estimate, and x[i+1] is the next, improved, estimate. This is the Secant iterative formula for the root of a function; the method can be represented by the following algorithm.

Algorithm 4c

input x0 ! input the lower bracket (previous estimate)

input x1 ! input the upper bracket (current estimate)

input Tolerance ! input the tolerance (required accuracy)

do

Error = f(x1) * (x1-x0) / (f(x1)-f(x0)) ! the error estimate

output x1, Error ! output the current values

if ( |Error| < Tolerance ) terminate ! terminate if the tolerance is satisfied

x0 = x1 ! reassign the previous

x1 = x1 - Error ! subtract the error estimate

end do

define f(x) = x^3 - 28

Notes:
- The algorithm is similar to Newton-Raphson, but does not require a knowledge of the first derivative.
- As with the Bisection method, a lower and upper bracket is required, but this bracket does not necessarily need to contain the root.
- Unlike the Bisection method, convergence is not guaranteed.
Convergence for the Secant method is very rapid, almost as fast as Newton-Raphson. This is demonstrated by executing the above algorithm for:
f(x) = x^3 - 28, Tolerance = 10^-12, and initial bracket x0 = 3.0 and x1 = 3.1:


iteration Root estimate (x) Error estimate

0 3.100000000000000000 0.064 170 548 190 612 684

1 3.035829451809387316 -0.000 743 875 474 020 715

2 3.036573327283408032 -0.000 015 648 505 989 373

3 3.036588975789397405 0.000 000 003 913 755 049

4 3.036588971875642356 -0.000 000 000 000 020 164

The program converges in 4 iterations giving an accuracy of the order of 10^-14.

Conclusion
Below is a comparison of the Bisection, Newton-Raphson, and Secant methods for finding the root of f(x) = x^3 - 28 with a tolerance of 10^-12.

method           root estimate           error estimate   true error   iterations

Bisection        3.036 588 971 876 336   728E-15          673E-15      36

Secant           3.036 588 971 875 642   20E-15           20E-15       4

Newton-Raphson   3.036 588 971 875 664   1E-15            1E-15        3

True root        3.036 588 971 875 663
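The Secant iteration (Algorithm 4c) can be sketched in Python as follows (an illustrative translation of the pseudocode, applied to the same test function):

```python
def secant(f, x0, x1, tol, max_iter=50):
    """Algorithm 4c: Newton-Raphson with f'(x) replaced by the slope
    through the two most recent estimates."""
    for i in range(max_iter):
        error = f(x1) * (x1 - x0) / (f(x1) - f(x0))  # the error estimate
        if abs(error) < tol:
            return x1, i
        x0, x1 = x1, x1 - error      # shift estimates and correct
    raise RuntimeError("did not converge")

root, iterations = secant(lambda x: x**3 - 28, 3.0, 3.1, 1e-12)
print(root, iterations)
```

With the bracket (3.0, 3.1) and tolerance 10^-12 this reproduces the 4-iteration convergence shown above.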

The Newton-Raphson method has the fastest convergence with the Secant method a close second. An additional advantage of the Newton-Raphson method over the Bisection and Secant methods is that it does not require upper and lower bounds as inputs. However, a disadvantage is that it requires a knowledge of the first derivative (which is not always available). Also, the Newton-Raphson method can fail at or close to turning points in the function (why?). The Bisection method guarantees convergence whereas both the Newton-Raphson and Secant methods can fail to converge on a root.

More than one root?
Functions may contain more than one root; the algorithms discussed above will only find one root at a time, and so the user will need to guide the root-finder to find the other roots. This involves giving different brackets (Bisection and Secant cases) or initial values of x (Newton-Raphson case) until all expected roots are found.

Hybrid algorithms
By combining the two methods a hybrid algorithm can be constructed which combines the rapid convergence of the Newton-Raphson or Secant method with the robustness of the Bisection method. Try this for yourself and think about the advantages of your hybrid program.

Extrema (open methods)
Last week we studied the Sequential Search method for finding extrema (minima and maxima) of a function; this is a closed method, i.e. it requires the extremum to be bracketed. This week we continue the study using open methods.

Newton's method (open method)
This method is very closely related to the Newton-Raphson root-finding method. It provides very fast convergence. However, it is an open method and so involves some uncertainty about which extremum is found.
In the Newton-Raphson method we have a target root x0 = x - e, where x is a root estimate and e is the error. An estimate of e is obtained by truncating Taylor's expansion: f(x-e) = f(x) - e f'(x) = 0 (condition for a root), and so e = f(x)/f'(x). The estimate x is then improved iteratively with x[i+1] = x[i] - f(x[i])/f'(x[i]).
Similarly, differentiating the truncated Taylor expansion we have f'(x-e) = f'(x) - e f''(x) = 0 (condition for an extremum), and so e = f'(x)/f''(x). The estimate x is then improved iteratively with
x[i+1] = x[i] - f'(x[i])/f''(x[i]).
This is Newton's iterative formula for finding extrema of f(x).


Clearly we need the first and second derivatives of f(x); for example, for our test function:
f(x) = e^-x (x^3 - 6x^2 + 8x)
and so [using d/dx(u.v) = u.dv/dx + v.du/dx]
f'(x) = e^-x (-x^3 + 9x^2 - 20x + 8)
f''(x) = e^-x (x^3 - 12x^2 + 38x - 28)

Algorithm 4d
Newton's method for finding accurately and quickly one minimum or maximum of f(x).

input x, tol ! initial estimate and required accuracy

do

e = f’(x) / f’’(x) ! The error estimate

output x, e ! output the current values

if ( |e| < tol ) exit ! terminate if tolerance is satisfied

x = x - e ! subtract the error estimate

end do

if [f’’(x) < 0] output "maxima ", f(x), "at x = ", x

if [f’’(x) > 0] output "minima ", f(x), "at x = ", x

end

define f(x) = exp(-x) (x^3 - 6x^2 + 8x)

define f’(x) = exp(-x) (-x^3 + 9x^2 -20x + 8)

define f’’(x) = exp(-x) (x^3 - 12x^2 + 38x - 28)

With a tolerance of 10^-9 the output of this algorithm is:

[x=0.5] maxima 1.592547 at x = 0.510 711 428 189 916

[x=2.7] minima -0.165150 at x = 2.710 831 453 551 690

[x=5.7] maxima 0.120121 at x = 5.778 457 118 251 383

The solutions converge in just 4 iterations with an error << 10^-9.
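Algorithm 4d, with the derivatives of the test function given above, can be sketched in Python (an illustration; the starting points 0.5, 2.7 and 5.7 are those indicated in the output above):

```python
import math

def fp(x):   # f'(x) = e^-x (-x^3 + 9x^2 - 20x + 8), derived above
    return math.exp(-x) * (-x**3 + 9*x**2 - 20*x + 8)

def fpp(x):  # f''(x) = e^-x (x^3 - 12x^2 + 38x - 28)
    return math.exp(-x) * (x**3 - 12*x**2 + 38*x - 28)

def newton_extremum(x, tol=1e-9, max_iter=50):
    """Algorithm 4d: iterate x -> x - f'(x)/f''(x) until |e| < tol."""
    for _ in range(max_iter):
        e = fp(x) / fpp(x)           # the error estimate
        if abs(e) < tol:
            return x
        x = x - e                    # subtract the error estimate
    raise RuntimeError("did not converge")

for x0 in (0.5, 2.7, 5.7):
    x = newton_extremum(x0)
    kind = "maxima" if fpp(x) < 0 else "minima"
    print(kind, "at x =", x)
```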

Newton's modified method (an open method)
The requirement that we need to know the first and second derivatives is a major disadvantage of Newton's method. However, as with the Secant root-finding method, the derivatives can be replaced with numerical approximations.
The CDA approximation for the first derivative requires two x values, x0 and x1. The CDA2 approximation of the second derivative requires three values; in this case the third value m is taken as the mean of the first two, i.e. m = (x0 + x1)/2.

[See the figure given in the lecture (URL)]

With dx = (x1 - x0)/2 and x = m we can write
CDA = ( f(x+dx) - f(x-dx) ) / (2 dx) = ( f(x1) - f(x0) ) / (x1 - x0)
and
CDA2 = ( f(x-dx) + f(x+dx) - 2 f(x) ) / dx^2 = 4 ( f(x0) + f(x1) - 2 f(m) ) / (x1 - x0)^2

The error estimate in Algorithm 4d can be rewritten with the above approximations as:

( x1 - x0 )( f(x1) - f(x0) )

e = -----------------------------

4 ( f(x0) + f(x1) - 2 f(m) )

Applying m = m - e improves the estimate, and then after the reassignments x0 = m - e and x1 = m + e, the procedure is iterated, causing m to converge quickly on an extremum.


|<- e ->|<- e ->|

----+-------+-------+---> x

x0 m x1

Algorithm 4e
Newton's modified method for finding accurately and quickly one minimum or maximum of f(x).

input x0, x1 ! initial estimates of the extremum

input tol ! required accuracy

do

m = (x0+x1)/2

e = (x1-x0)*(f(x1)-f(x0))/(f(x0)+f(x1)-2*f(m))/4

output f(m), m, e ! output the current values

if ( |e| < tol ) exit ! terminate if tolerance is satisfied

m = m - e ! improve the extremum estimate

x0 = m - e ! modify x0

x1 = m + e ! and x1

end do

define f(x) = exp(-x) (x^3 - 6x^2 + 8x)

You can define f''(x) to determine whether the solution is a maximum or a minimum. With a tolerance of 10^-9 the output of this algorithm is:

[x0=0.4 x1=0.6] f(m) = 1.592547 x = 0.510 711 428 296 [actual error 0.1x10^-9]

[x0=2.7 x1=2.8] f(m) = -0.165150 x = 2.710 831 454 653 [actual error 1.1x10^-9]

[x0=5.7 x1=5.8] f(m) = 0.120121 x = 5.778 457 122 806 [actual error 4.6x10^-9]

The solutions converge in about 7 iterations (slightly slower than Newton's method) with an error of the order of 10^-9. The modified Newton's method avoids the need for derivatives but at the expense of less accuracy.
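A Python sketch of Algorithm 4e (an illustrative translation of the pseudocode; the brackets follow the output above, with a 10^-6 tolerance here for brevity):

```python
import math

def f(x):
    return math.exp(-x) * (x**3 - 6*x**2 + 8*x)

def modified_newton(f, x0, x1, tol=1e-6, max_iter=100):
    """Algorithm 4e: derivative-free Newton extremum search built from
    central-difference approximations at x0, x1 and their mean m."""
    for _ in range(max_iter):
        m = (x0 + x1) / 2
        e = (x1 - x0) * (f(x1) - f(x0)) / (f(x0) + f(x1) - 2*f(m)) / 4
        if abs(e) < tol:
            return m
        m = m - e                    # improve the extremum estimate
        x0, x1 = m - e, m + e        # re-centre the bracket
    raise RuntimeError("did not converge")

print(modified_newton(f, 0.4, 0.6))
```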


4.3 Lab Exercises

Task 1
Implement the Newton-Raphson and Secant root-finding algorithms given in the lecture as computer programs. Using your computer programs, evaluate to at least 6 decimal place accuracy the root of the following function: f(x) = x^2 + log_e(x) - 3.73

gnuplot> plot [1:2] x**2 + log(x) - 3.73

http://www1.gantep.edu.tr/~andrew/eee484/images/lab4-fig1.gif

For each method write down:
- The evaluated root (to the appropriate number of decimal places).
- The estimated error; explain how you arrive at your value.
- The number of iterations performed.
- The theoretically expected number of iterations for the required accuracy.

Task 2
Implement the Newton's Square-Root algorithm given in the lecture as a Fortran program. Use your program to evaluate, to at least 9 decimal place accuracy, the square roots of 2, 3, 4, ..., 10. Check the results against your pocket calculator.

Task 3
Implement Newton's method and the modified Newton's method for finding extrema of a function, given in the lecture, as computer programs. Using your computer programs, find all extrema of the function f(x) = e^x - 3x^2 (for 0 < x < 4) to 6 decimal place accuracy.

gnuplot> plot [0:4] exp(x) - 3*x**2

http://www1.gantep.edu.tr/~andrew/eee484/images/lab4-fig2.gif

If you have time, experiment with some more functions.


4.4 Lab Solutions

Task 1
Implement the Bisection, Newton-Raphson and Secant algorithms given in the lecture as computer programs. Use your programs to evaluate, to at least 6 decimal place accuracy, the root of the following function: f(x) = x^2 + log_e(x) - 3.73

gnuplot> plot [1:2] x**2 + log(x) - 3.73

http://www1.gantep.edu.tr/~andrew/eee484/images/lab4-fig1.gif

For each method write down:- The evaluated root (to the appropriate number of decimal places).- The estimated error, explain how you arrive at your value.- The number of iterations performed.- The theoretically expected number of iterations for the required accuracy.

Solutions
With an initial approximate analysis, the root is determined to be between 1.0 and 2.0, i.e. f(1.0) = -2.73 and f(2.0) = +0.96.

Newton-Raphson Method
Program eee484ex4a (see the downloads page). Here the root estimate is the current value of x and the error estimate is f(x)/f'(x).

x Error estimate

1.500 000 000 -0.293 054 971 - initial estimate

1.793 054 971 0.016 643 345 - iteration 1

1.776 411 626 0.000 056 771

1.776 354 855 0.000 000 001 - iteration 3

With an initial root estimate of 1.5 and a tolerance of 10^-6 the program terminates after 3 iterations; the final root estimate is 1.776355 (6 dp accuracy). Double precision (kind=8) is used to avoid round-off errors. The theoretical number of iterations is estimated as follows: for the Newton-Raphson method the number of correct significant figures is said to double on each iteration; we start with one correct s.f. and want 7 correct s.f.; 2^2.8 = 7, which implies about 3 iterations (as above).

Secant Method
Program eee484ex4b (see the downloads page). Here we provide an initial bracket x0=1, x1=2; the initial root estimate is x1=2.

x2 Error estimate

2.000 000 000 0.260 793 067 - initial estimate

1.739 206 933 -0.035 492 822 - iteration 1

1.774 699 754 -0.001 667 737

1.776 367 491 0.000 012 641

1.776 354 850 -0.000 000 004 - iteration 4

With an initial root estimate of 2.0 and a tolerance of 10−6 the program terminates after 4 iterations, the


final root estimate is 1.776355 (6 dp accuracy). Double precision (kind=8) is used to avoid round-off errors. Theoretically the number of required iterations is similar to that for the Newton-Raphson method, plus one or two more iterations.

Discussion
The Newton-Raphson method converges much more rapidly than the Bisection method and tends to overshoot the required tolerance, giving a much higher accuracy than requested. Also, this method does not require an initial bracket, only an initial estimate of the root. However, the Bisection method does not need a knowledge of the first derivative; this is an advantage when the first derivative is difficult to derive. The Secant method also does not require the first derivative and is much faster than the Bisection method.

Task 2
Implement the Newton's Square-Root algorithm given in the lecture as a Fortran program. Use your program to evaluate, to at least 9 decimal place accuracy, the square roots of 2, 3, 4, ..., 10. Check the results against your pocket calculator.

Solution

Newton's Square-Root
Program eee484ex4c (see the downloads page). With a tolerance of 10^-9 the results are summarised below:

P Estimate True square-root

2.0 1.414213562 1.414213562

3.0 1.732050808 1.732050808

4.0 2.000000000 2.000000000

5.0 2.236067977 2.236067977

6.0 2.449489743 2.449489743

7.0 2.645751311 2.645751311

8.0 2.828427125 2.828427125

9.0 3.000000000 3.000000000

10.0 3.162277660 3.162277660

All final estimates are accurate when compared to the true root values. This is not surprising as the computations are based on the powerful Newton-Raphson algorithm. To obtain 9 decimal place accuracy, 11 significant figures are required; single precision is not sufficient for this and so double precision (kind=8) is employed.

Task 3
Implement Newton's method and the modified Newton's method for finding extrema of a function, given in the lecture, as computer programs. Using your computer programs, find all extrema of the function f(x) = e^x - 3x^2 (for 0 < x < 4) to 6 decimal place accuracy.

gnuplot> plot [0:4] exp(x) - 3*x**2

http://www1.gantep.edu.tr/~andrew/eee484/images/lab4-fig2.gif

Solutions
The plot indicates that there is a maximum at about 0.2 and a minimum at about 2.8. We will use eee484ex4d and eee484ex4e with a tolerance of 0.000 001.
Results:


eee484ex4d (Newton’s method)

input 0.2: maxima at x = 0.204481 (in 2 iterations)

input 2.8: minima at x = 2.833148 (in 3 iterations)

eee484ex4e (modified Newton’s method)

inputs 0.2,0.3: maxima at x = 0.204481 (in 3 iterations)

inputs 2.8,2.9: minima at x = 2.833148 (in 3 iterations)
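Newton's method for this function can be sketched in Python (an illustration; the derivatives f'(x) = e^x - 6x and f''(x) = e^x - 6 follow from f(x) = e^x - 3x^2):

```python
import math

def fp(x):   # f'(x) for f(x) = e^x - 3x^2
    return math.exp(x) - 6*x

def fpp(x):  # f''(x)
    return math.exp(x) - 6

def newton_extremum(x, tol=1e-6, max_iter=50):
    """Iterate x -> x - f'(x)/f''(x) until |e| < tol (Algorithm 4d)."""
    for _ in range(max_iter):
        e = fp(x) / fpp(x)
        if abs(e) < tol:
            return x
        x = x - e
    raise RuntimeError("did not converge")

xmax = newton_extremum(0.2)   # f''(xmax) < 0: a maximum
xmin = newton_extremum(2.8)   # f''(xmin) > 0: a minimum
print("maxima at", xmax, "minima at", xmin)
```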


4.5 Example exam questions

Question 1 (Newton-Raphson Method)

a) Using Taylor’s expansion derive the following Newton-Raphson iterative

formula for finding the root of a function f(x):

x[i+1] = x[i] - f(x[i]) / f’(x[i])

b) Write a computer program to implement the Newton-Raphson method for

the evaluation of the root of f(x) = e^x - 3x^2. Your program should

include a tolerance as an input.

c) Using the Newton-Raphson method evaluate the root of the

function f(x) = e^x - 3x^2, which is near 1.0, to an accuracy

of at least 6 decimal places. Show the result of each iteration.

Question 3 (Secant Method)

a) Using Taylor’s expansion derive the following Secant iterative

formula for finding the root of a function f(x):

x[i+1] = x[i] - f(x[i]) ( x[i] - x[i-1] ) / ( f(x[i]) - f(x[i-1]) )

b) Write a computer program to implement the Secant method for

the evaluation of the root of f(x) = e^x - 3x^2. Your program should

include a tolerance as an input.

c) Using the Secant method evaluate the root of the

function f(x) = e^x - 3x^2, which is near 1.0, to an accuracy

of at least 6 decimal places. Show the result of each iteration.

Question 5 (Newton’s Square-root)

a) Using the Newton-Raphson iterative formula for the root of a function f(x):

x[i+1] = x[i] - f(x[i]) / f’(x[i]) show that the iterative formula:

x[i+1] = x[i] - (x[i]-p/x[i])/2 converges to the square-root of p.

b) Write a computer program to implement the above formula. Your program should

include a tolerance as an input.

c) Using the above formula evaluate the square-root of 45.6 to an accuracy of at

least 6 decimal places. Show the result of each iteration.

d) Generalise Newton’s Square root to compute the n’th root of a number p.

Question 6

a) Using Taylor’s expansion, derive Newton’s iterative formula

for finding the extremum of a function f(x):

x[i+1] = x[i] - f’(x[i]) / f’’(x[i])


b) Write a computer program to implement Newton’s method.

Your program should include a tolerance as an input.

c) Using the Newton’s iterative formula find the maximum of the

function f(x) = e^x - 3x^2, which is near 0.2, to an accuracy

of at least 6 decimal places. Show the result of each iteration.


5 Numerical Integration: Trapezoidal and Simpson’s formulae

5.1 Topics Covered

o Numerical Integration: the Extended Trapezoidal Formula (ETF) and the Extended Simpson's Formula (ESF). The student should remember the formulae for the ETF and ESF, be able to compute results by hand, and implement the formulae in computer programs. The student should understand the significance of the 1/n^2 and 1/n^4 terms in the truncation error.

5.2 Lecture Notes

Introduction
In this lecture we investigate numerical integration using the Newton-Cotes formulas (also called the Newton-Cotes rules). These are a group of formulas for numerical integration based on evaluating the integrand at n+1 equally-spaced points. They are named after Isaac Newton and Roger Cotes. From the Newton-Cotes group of rules the two simplest will be studied: the Trapezoid rule and Simpson's rule. The aim is to perform, numerically, the integral of a function F(x) over the limits a to b. The basic idea is to evaluate the function at equally spaced locations between the limits; summing the values in an appropriate manner gives an approximation to the integral. The approach taken here is to take simple (and therefore less accurate) formulae and implement them in an intelligent way to form basic but practical integration algorithms.

The Trapezoidal Rule (the building block)
Consider integrating a known function F(x) over the interval x = a to b. The Trapezoidal Rule gives the following expression for the exact integral I:

I = (h/2) ( F(a) + F(b) ) - (h^3/12) F’’(z) + higher order terms

where h is the interval b-a, and F''(z) is the second derivative of the function evaluated at some unknown point z between a and b. You can find the derivation of this rule elsewhere. Rearranging the Trapezoidal Rule gives:

(h/2) ( F(a) + F(b) ) = I + (h^3/12) F’’(z) + higher order terms

The expression to the left of the equality can be calculated numerically; it approximates the true integral with a truncation error O(h^3). The value of F''(z) is generally not known (z is unknown, though bounded between a and b) and so this term is omitted from the solution when we perform the numerical calculation.
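As a quick numerical illustration of the single-interval rule, the Python sketch below applies it to the cubic F(x) = x^3 - 3x^2 + 5 used in the Example later in this section (the comparison value 6.640625 is its analytic integral over [0, 2.5]); with only one interval the truncation error is large:

```python
def F(x):
    return x**3 - 3*x**2 + 5             # cubic used in the Example below

a, b = 0.0, 2.5
h = b - a
trapezoid = (h / 2) * (F(a) + F(b))      # single-interval approximation
exact = 6.640625                         # analytic integral over [0, 2.5]
print(trapezoid, trapezoid - exact)      # the error is large for one interval
```

This motivates the extension to many intervals that follows.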

The Extended Trapezoidal Formula (ETF)
We will now extend the Trapezoidal Rule to increase the accuracy of the numerical integral. The expression on the left-hand side gives our numerical approximation for the integral I (which is what we want to know). This expression is simply the area under the straight line between the points [a,F(a)] and [b,F(b)]. Note that two function evaluations are performed. Of course we do not expect this straight line to give an exact representation of the curve F(x) (unless the curve is a straight line; then the function has the form F(x) = m x + c, and so F''(x) = 0, and so the LHS is exact). The right-hand side is the exact integral I plus the unknown term which represents the truncation error in the approximation. To increase the accuracy of the numerical integration we can divide the single interval up into n intervals and perform n Trapezoidal Rules (n+1 function evaluations). For example, for n = 5 intervals:


(h/2) ( F(x0) + F(x1) )

+ (h/2) ( F(x1) + F(x2) )

+ (h/2) ( F(x2) + F(x3) )

+ (h/2) ( F(x3) + F(x4) )

+ (h/2) ( F(x4) + F(x5) ) = I1 + (h^3/12) F’’(z1)

+ I2 + (h^3/12) F’’(z2)

+ I3 + (h^3/12) F’’(z3)

+ I4 + (h^3/12) F’’(z4)

+ I5 + (h^3/12) F’’(z5)

where h = (b-a)/n = (b-a)/5

x0 = a, x1 = a+h, x2 = a+2h, x3 = a+3h, x4 = a+4h, x5 = a+5h = b

(i.e. xi = a + i*h , for i = 0 to 5)

The expression reduces to:

h ( F(x0)/2 + F(x1) + F(x2) + F(x3) + F(x4) + F(x5)/2 ) = I + 5 (h^3/12) F’’(z)

where I = I1+I2+I3+I4+I5 is the exact integral over the full range, and F''(z) represents the average of F''(z1), F''(z2), F''(z3), F''(z4), F''(z5); the term 5 (h^3/12) F''(z) is unknown and represents the error in the numerical integration. For n intervals the expression becomes:

+-------------------------------------------------------------------+

| ETF = h ( F(x0)/2 + F(x1) + F(x2) + .... + F(x[n-1]) + F(xn)/2 ) |

| |

| = I + n.(h^3/12).F’’ + higher order terms |

| |

| where h = (b-a)/n , xi = a + i*h , for i = 0 to n |

+-------------------------------------------------------------------+

Extended (Compound) Trapezoidal Formula (ETF) for n intervals.

It is useful to rearrange the formula as follows:

(h/2) ( F(x0) + F(xn) ) + h.( F(x1) + F(x2) + ... + F(x[n-1]) )

| |

a b

Replacing h with (b-a)/n in the right hand side gives:

n (h^3/12) F’’ = (b-a)^3 F’’/(12n^2) + higher order terms

giving for n intervals the numerical integral

+-------------------------------------------------------+

| ETF = h ( F(a) + F(b) ) / 2 |

| + h ( F(x1) + F(x2) + ... + F(x[n-1]) ) |

| |

| = I + (b-a)^3 F’’/(12 n^2) + higher order terms |

| |

| where h=(b-a)/n , xi=a+i*h , for i = 1 to n-1 |

+-------------------------------------------------------+

Extended (Compound) Trapezoidal Formula for n intervals

Inspecting the truncation error term we can now see that the accuracy of the numerical integral can be increased by increasing the number of intervals n; the truncation error in the ETF is inversely proportional


to the square of the number of intervals, i.e. doubling the number of intervals gives four times the accuracy. Note that the value of (b-a) is constant (the region of integration). Also, if the second derivative of F(x) is small then the error is small; if F'' is zero then the formula is exact, i.e. for the form F(x) = m x + c the error is zero (as expected).

Example
Using the Extended Trapezoidal Formula (ETF) integrate the function F(x) = x^3 - 3x^2 + 5 over the range x = 0.0 to 2.5 using 5 intervals.
Solution:
First we can see that the second derivative, 6x - 6, is not zero, and so we expect the ETF to contain a non-zero truncation error.

h = (b-a)/n = (2.5-0.0)/5 = 0.5,  xi = 0.0 + i*0.5

         i    x     F(x)
       ------------------
x0=a     0   0.0   5.000
x1       1   0.5   4.375
x2       2   1.0   3.000
x3       3   1.5   1.625
x4       4   2.0   1.000
x5=b     5   2.5   1.875

ETF = 0.25*( 5.000 + 1.875 ) + 0.5*( 4.375 + 3.000 + 1.625 + 1.000 ) = 6.718750.

Comparing with the analytical result 6.640625, we see the error E5 = 0.078125. Repeating the integral over 10 intervals (n=10) gives ETF = 6.660156, the error is E10 = 0.019531 and so E5/E10 = 4.0000512, as predicted by the form of the error term in the Trapezoidal formula. Note that the 1/n^2 relation is not exact due to higher order terms in the truncation error.

Implementation
First consider a basic implementation. The inputs to the numerical integration are:
1. The function, F(x), to be integrated
2. The limits of the integration, a and b
3. The number of intervals of the integration, n.

Algorithm 5a

input a, b ! input the lower and upper limits

input n ! input the number of intervals

h = (b-a) / n ! the interval size

etf = ( f(a) + f(b) ) / 2 ! sum the end points

do i = 1 to n-1

x = a + i*h ! calculate the evaluation position

etf = etf + f(x) ! sum over the remaining points

end do

etf = etf*h ! complete the ETF

print ’The integral = ’,etf ! output the result

end

define the function f(x) = x^3 - 3*x^2 + 5
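Algorithm 5a can be sketched in Python (an illustrative translation of the pseudocode, not part of the original notes; the function and limits are the worked-example values):

```python
def f(x):
    # the example integrand F(x) = x^3 - 3x^2 + 5
    return x**3 - 3*x**2 + 5

def etf(f, a, b, n):
    """Extended Trapezoidal Formula over [a, b] with n intervals."""
    h = (b - a) / n                # the interval size
    total = (f(a) + f(b)) / 2      # sum the end points
    for i in range(1, n):          # sum over the remaining points
        total += f(a + i*h)
    return total * h               # complete the ETF

print("The integral =", etf(f, 0.0, 2.5, 5))   # 6.71875, as in the worked example
```

Python's double-precision floats satisfy the "use double precision" note below automatically.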


Note: use double precision variables to avoid round-off errors.
This algorithm gives the following outputs for the indicated inputs (the true error is given in brackets):

a=0.0, b=2.5, n= 5 integral = 6.718750 (0.08)

a=0.0, b=2.5, n= 10 integral = 6.660156 (0.02)

a=0.0, b=2.5, n= 100 integral = 6.640820 (0.0002)

a=0.0, b=2.5, n=1000 integral = 6.640627 (0.000002)

As expected, the error in the approximation reduces as the square of the number of intervals, n. Here the error is calculated by comparing the numerical result with the analytical evaluation; of course the analytical result might not be available in practice.

Tolerance and Error Estimation
A desirable property of a numerical integration algorithm is that the accuracy of the result is determined by the user, i.e. a tolerance is an input to the algorithm. In our algorithm (above):

replace:

input n ! input the number of intervals

with:

input tolerance ! the required accuracy of the result

The algorithm then has to decide what value of n corresponds to the required accuracy (tolerance). For this the algorithm needs to form an estimate E for the error in the ETF and terminate the algorithm when abs(E) is less than Tolerance. We use abs(E) because E can be negative.

An error estimate
An error estimate can be formulated by making use of the fact that the error has a 1/n^2 form. Consider n intervals giving an approximation ETFn and error En. Now run the algorithm again with 2n intervals. The result is ETF2n with an error E2n = En/4 (to a good approximation). The difference between the two results is (ETFn - ETF2n) = (En - E2n) = (4E2n - E2n) = 3 E2n and so we can write E2n = (ETFn - ETF2n)/3. We therefore have an estimate of the error in the final result ETF2n. This error estimate can be used as a termination condition in the algorithm: repeat the ETF, doubling the number of intervals, until abs(E2n) is less than the value of Tolerance.
Also we can use the error estimate to improve our final approximation:
ETFimproved = ETF2n - E2n (this is applied after the termination condition).
The estimated error is subtracted from the final approximation. This works well as long as the assumption that E2n = En/4 is accurate.
Example using the above data:
E10 = (ETF5 - ETF10)/3 = (6.718750 - 6.660156)/3 = 0.019531
ETFimproved = ETF10 - E10 = 6.660156 - 0.019531 = 6.640625 (exact to 6dp).
The modification of Algorithm 5a for the input of a tolerance is left to the student (see lab exercise).
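One possible shape for the tolerance-driven loop is sketched below (an illustration only, not the lab solution; the etf helper is a restatement of Algorithm 5a and the starting n=4 is an arbitrary choice):

```python
def etf(f, a, b, n):
    # Extended Trapezoidal Formula (Algorithm 5a) with n intervals
    h = (b - a) / n
    total = (f(a) + f(b)) / 2
    for i in range(1, n):
        total += f(a + i*h)
    return total * h

def etf_tolerance(f, a, b, tolerance, n=4):
    """Double n until the error estimate E2n = (ETFn - ETF2n)/3 is small enough."""
    old = etf(f, a, b, n)
    while True:
        n *= 2
        new = etf(f, a, b, n)
        e2n = (old - new) / 3        # error estimate for the new result
        if abs(e2n) < tolerance:
            return new - e2n         # ETFimproved = ETF2n - E2n
        old = new

result = etf_tolerance(lambda x: x**3 - 3*x**2 + 5, 0.0, 2.5, 1e-6)
print(result)   # converges to the analytical value 6.640625
```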

Simpson’s Rule (a higher-order method)
The second method in the Newton-Cotes group is Simpson’s Rule, which provides a higher-order method:

I = (h/3) ( F(a) + 4 F((a+b)/2) + F(b) ) - (h^5/90) F’’’’(z) + higher order terms

Rearranging Simpson’s Rule gives:

(h/3) ( F(a) + 4 F((a+b)/2) + F(b) ) = I + (h^5/90) F’’’’(z) + higher order terms

The left hand side can be calculated; there are three function evaluations (i.e. n=2, h=(b-a)/2).
Extending Simpson’s Rule for n intervals (n must be even) gives:


[n=2] (h/3) ( F(x0) + 4 F(x1) + F(x2) ) = I + (h^5/90) F’’’’(z) + higher order terms

[n=4] (h/3) ( F(x0) + 4 F(x1) + F(x2) ) + (h/3) ( F(x2) + 4 F(x3) + F(x4) )

= (h/3) ( F(x0) + 4 F(x1) + 2 F(x2) + 4 F(x3) + F(x4) ) = I + 2(h^5/90) F’’’’(z)

[n even] (h/3) ( F(x0) + 4 F(x1) + F(x2) +

F(x2) + 4 F(x3) + F(x4) +

F(x4) + 4 F(x5) + F(x6) +

.

.

F(x[n-2]) + 4 F(x[n-1]) + F(x[n]) ) = I + (n/2)(h^5/90) F’’’’(z)

and combining terms gives:

+-------------------------------------------------------------+

| ESF = (h/3) ( F(x0) + 4 F(x1) + 2 F(x2) + 4 F(x3) + .... |

| + 2 F(x[n-2]) + 4 F(x[n-1]) + F(x[n]) ) |

| |

| = I + (n/2)(h^5/90) F’’’’ + higher order terms |

| |

| where h = (b-a)/n , xi = a + i*h , for i = 0 to n |

+-------------------------------------------------------------+

Extended Simpson’s Formula (ESF) for n(even) intervals.

Replacing h with (b-a)/n in the error term gives

(n/2)(h^5/90) F’’’’ = (b-a)^5 F’’’’/(180 n^4) + higher order terms

and grouping the odd- and even-indexed points into separate sums, we have

+-------------------------------------------------------------+

| ESF = (h/3) ( F(x0) + F(x[n]) ) |

| + 4 (h/3) ( F(x1) + F(x3) + .... + F(x[n-1]) ) |

| + 2 (h/3) ( F(x2) + F(x4) + .... + F(x[n-2]) ) |

| |

| = I + (b-a)^5 F’’’’/(180 n^4) + higher order terms |

| |

| where h = (b-a)/n , xi = a + i*h , for i = 0 to n |

+-------------------------------------------------------------+

Extended Simpson’s Formula (ESF) for n(even) intervals.

Inspecting the truncation error for the ESF we see that the error is proportional to 1/n^4 and the fourth derivative of F. We can therefore expect a much smaller truncation error than the ETF.

Example (using the previous ETF example)
Using the Extended Simpson’s Formula (ESF) integrate the function F(x) = x^3 - 3x^2 + 5 over the range x = 0.0 to 2.5 using 2 intervals.
Solution:
First we can see that the fourth derivative is zero and so even with just n=2 we expect the truncation error to be zero.


h = (b-a)/n n=2 i x F(x)

= (2.5-0.0)/2 ---------------------

= 1.25 x0=a 0 0.00 5.000

x1 1 1.25 2.265625

x2=b 2 2.50 1.875

ESF = (h/3) ( F(x0) + 4 F(x1) + F(x2) )

= 1.25/3 * ( 5.000 + 4*2.265625 + 1.875 ) = 6.640625

The result is exact as expected.
To test the ESF further we need a function that has a non-zero fourth derivative. For example, integrate the function F(x) = x^5 over the same interval:

h = (b-a)/n n=2 i x F(x)

= (2.5-0.0)/2 ---------------------

= 1.25 x0=a 0 0.00 0.00

x1 1 1.25 3.0517578125

x2=b 2 2.50 97.65625

ESF = (h/3) ( F(x0) + 4 F(x1) + F(x2) )

= 1.25/3 * ( 0 + 4*3.0517578125 + 97.65625 ) = 45.7763671875

The exact result is 2.5^6/6 = 40.690104166666667 and so the error E2 = 45.7763672 - 40.690104166666667 = 5.086263020833333.
Repeating for n=4 we get ESF = 41.00799560547 and so E4 = 41.00799560547 - 40.690104166666667 = 0.3178914388021.
We expect an error proportional to 1/n^4, i.e. E2/E4 = 2^4 = 16, and 5.086263020833333/0.3178914388021 = 16.00000, which is consistent with the 1/n^4 expectation.

Implementation
The inputs to the numerical integration are: 1. The function, F(x), to be integrated; 2. The limits of the integration, a and b; 3. The number of intervals of the integration, n.

Algorithm 5b

input a, b ! input the lower and upper limits

input n ! input the number of intervals

h = (b-a) / n ! the interval size

esf = f(a) + f(b) ! sum the end points

do i = 1, n-1,2

x = a + i*h ! calculate the evaluation position

esf = esf + 4 * f(x) ! sum the odd points

end do

do i = 2, n-2,2

x = a + i*h ! calculate the evaluation position

esf = esf + 2 * f(x) ! sum the even points

end do

esf = esf * h/3

print ’The integral = ’,esf ! output the result

end

define the function f(x) = x^5
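Algorithm 5b can be sketched in Python (an illustrative translation, using the x^5 test function from the example above):

```python
def f(x):
    # the test integrand F(x) = x^5
    return x**5

def esf(f, a, b, n):
    """Extended Simpson's Formula over [a, b]; n must be even."""
    assert n % 2 == 0, "Simpson's rule needs an even number of intervals"
    h = (b - a) / n
    total = f(a) + f(b)              # sum the end points
    for i in range(1, n, 2):         # sum the odd points (weight 4)
        total += 4 * f(a + i*h)
    for i in range(2, n - 1, 2):     # sum the even points (weight 2)
        total += 2 * f(a + i*h)
    return total * h / 3

print("The integral =", esf(f, 0.0, 2.5, 2))   # 45.7763671875, as in the worked example
```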


This ESF algorithm is compared with the ETF algorithm for F(x) = x^5, a=0.0, b=2.5:

intervals ETF error ESF error

n= 4 46.968460083 6.278355916 41.007995605 0.317891439

n= 8 42.274594307 1.584490140 40.709972382 0.019868215

n= 16 41.087158024 0.397053858 40.691345930 0.001241763

n= 32 40.789425838 0.099321672 40.690181777 0.000077610

n=320 40.691097575 0.000993409 40.690104174 0.000000008

The ESF method is clearly more accurate than the ETF method.

Tolerance and Error Estimation
As discussed earlier for the ETF algorithm, it is desirable to replace the input n with a tolerance; for this the algorithm needs to form an estimate E for the error in the ESF and terminate the algorithm when abs(E) is less than the value of Tolerance.

An error estimate
An error estimate can be formulated by making use of the fact that the truncation error has a 1/n^4 form. Consider n intervals giving an approximation ESFn and error En. Now run the algorithm again with 2n intervals. The result is ESF2n with an error E2n = En/16 (to a good approximation). The difference between the two results is (ESFn - ESF2n) = (En - E2n) = (16E2n - E2n) = 15 E2n and so we can write E2n = (ESFn - ESF2n)/15. We therefore have an estimate of the error in the final result ESF2n.
This error estimate can be used as a termination condition in the algorithm: repeat the ESF, doubling the number of intervals, until abs(E2n) is less than the value of Tolerance.
Also we can use the error estimate to improve our final approximation: ESFimproved = ESF2n - E2n (this is applied after the termination condition). The estimated error is subtracted from the final approximation.
Example using the above data:
E32 = (ESF16 - ESF32)/15 = (40.691345930 - 40.690181777)/15 = 0.000077610.
ESFimproved = ESF32 - E32 = 40.690181777 - 0.000077610 = 40.690104167 (exact to 9dp).
However in some cases, where the higher derivatives of a function are significant, this error estimate is not accurate. In these cases it is better to use an alternative error estimate formed simply as E2n = ESFn - ESF2n; in this way we repeat the calculations until the difference between the previous result and the new result is smaller than a tolerance. This avoids the possible underestimation of the above error estimate (see the lab exercise).
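The doubling loop with the conservative estimate E2n = ESFn - ESF2n can be sketched as follows (an illustration; the esf helper restates Algorithm 5b and the starting n=4 is arbitrary):

```python
def esf(f, a, b, n):
    # Extended Simpson's Formula (Algorithm 5b); n must be even
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n, 2):
        total += 4 * f(a + i*h)
    for i in range(2, n - 1, 2):
        total += 2 * f(a + i*h)
    return total * h / 3

def esf_tolerance(f, a, b, tolerance, n=4):
    """Double n until the conservative estimate |ESFn - ESF2n| is below tolerance."""
    old = esf(f, a, b, n)
    while True:
        n *= 2
        new = esf(f, a, b, n)
        if abs(old - new) < tolerance:   # E2n = ESFn - ESF2n (the safer estimate)
            return new
        old = new

result = esf_tolerance(lambda x: x**5, 0.0, 2.5, 1e-6)
print(result)   # converges to the analytical value 2.5^6/6
```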

Adaptive methods
The Newton-Cotes formulas we have considered in this lecture perform numerical integration based on evaluating the integrand at n+1 equally-spaced points. This may not be an effective approach in cases where the function varies greatly over the region of integration. For example, a Gaussian function has long (infinite) tails that reduce asymptotically towards zero. In such cases the spacing between function evaluations needs to vary. This is the subject of adaptive numerical integration methods. You can read about these methods elsewhere.

Summary
We have investigated two numerical methods for integrating a function F(x) over the range x = a to b based on evaluating the integrand at n+1 equally-spaced points. The Extended Trapezoidal Formula (ETF) has a truncation error proportional to the second derivative and 1/n^2. The Extended Simpson’s Formula (ESF) has a truncation error proportional to the fourth derivative and 1/n^4.
Error estimates can be formed by considering the 1/n^2 and 1/n^4 forms for the truncation error, or simply by comparing the difference between the previous result and the new result as the number of intervals is increased. These error estimates can be used as a termination condition in the numerical integration algorithm.


5.3 Lab Exercises

The task
Write a computer program to integrate the following functions to an accuracy of at least 6 decimal places using the Extended Trapezoidal Formula and the Extended Simpson’s Formula.
F(x) = ( 1 - x^2 )^(1/2) over the range x=-1 to x=1
F(t) = 894 t / ( 1.76 + 3.21 t^2 )^3 over the range t=0 to t=1.61
Estimate the integrals from a rough sketch of the function.

Questions
1. What are the results of your program - do they look reasonable?
2. What is the approximate error in the numerical integral?
Note: To determine the integral to at least 6 decimal places you should run the program with say 16 intervals, then run it again with 32 intervals and 64 and 128 etc until the error estimate is less than 0.000001. But be careful with the way you choose the error estimate - you might run into problems! Also, if you can think of a way to instruct the computer to perform this procedure automatically then you will save yourself a lot of time (and have a useful program!).


5.4 Lab Solutions

Write a computer program to integrate the following functions to an accuracy of at least 6 decimal places using the Extended Trapezoidal Formula and the Extended Simpson’s Formula.
F(t) = 894 t / ( 1.76 + 3.21 t^2 )^3 over the range t=0 to t=1.61
F(x) = ( 1 - x^2 )^(1/2) over the range x=-1 to x=1
Estimate the integrals from a rough sketch of the function.
1. What are the results of your program - do they look reasonable?
2. What is the approximate error in the numerical integral?

Solutions:
Programs: eee484ex5a and eee484ex5b (see download page).
To determine the integral to at least 6 decimal places you should run the program with say 16 intervals, then run it again with 32 intervals and 64 and 128 etc until the error estimate is less than 0.000001. The results shown below use the following error estimates: E2n = (ETFn-ETF2n)/3 and E2n = (ESFn-ESF2n)/15. We have already seen in the lecture notes that these estimates can be very accurate. However, we will see that they are not so good for the second function given in this exercise, and so later we will try again with the error estimates E2n = ETFn-ETF2n and E2n = ESFn-ESF2n.

First it is good practice to estimate the integral from a rough sketch to make sure that the computed result is not completely wrong (due to a programming error).
Results for F(t) = 894 t / ( 1.76 + 3.21 t^2 )^3

First a rough sketch of the integral shows that we expect the result to be approximately 21.8.
Next, computing the ETF and ESF for n=16 and then doubling until the tolerance is satisfied gives:

n etf Error Estimate True E. esf Error estimate True E.

16 21.650237 (ETFn-ETF2n)/3 21.795657 (ESFn-ESF2n)/15

32 21.756923 -0.035562 -0.035367 21.792484 0.000211 0.000195

64 21.783457 -0.008845 -0.008833 21.792302 0.000012 0.000012

128 21.790082 -0.002208 -0.002208 *21.792290 0.0000007 0.0000007

256 21.791738 -0.000552 -0.000552 21.792290 0.00000005 0.00000005

512 21.792152 -0.000138 -0.000138 21.792290 0.000000003 0.000000003

1024 21.792255 -0.000034 -0.000034 21.792290 0.000000000 0.000000000

2048 21.792281 -0.000009 -0.000009

4096 21.792287 -0.000002 -0.000002

8192 *21.792289 -0.0000005 -0.0000005

Here the Error Estimate and True Error are compared; in this case to 6dp the error estimates are accurate. Iterations are terminated (*) when the error estimate is less than 10^-6.
For the ETF a 6 decimal place accuracy is achieved for n=8192 (the error estimate is -0.000 000 5) with the numerical integral = 21.792 289. The actual error by calculus is also 0.000 000 5; in this case the error estimate is accurate. The ESF gives the same result but in fewer iterations. Again the error estimate is accurate.


Results for F(x) = ( 1 - x^2 )^(1/2)

First a rough sketch of the integral shows that we expect the result to be approximately 1.57.

n etf Error estimate True error esf Error estimate True error

16 1.544910 (ETFn-ETF2n)/3 1.560595 (ESFn-ESF2n)/15

32 1.561627 -0.005572 -0.009170 1.567199 -0.000440 -0.003597

64 1.567551 -0.001975 -0.003245 1.569526 -0.000155 -0.001270

128 1.569648 -0.000699 -0.001148 1.570348 -0.000055 -0.000449

256 1.570390 -0.000247 -0.000406 1.570638 -0.000019 -0.000159

512 1.570653 -0.000087 -0.000144 1.570740 -0.000007 -0.000056

1024 1.570746 -0.000031 -0.000051 1.570777 -0.000002 -0.000020

2048 1.570778 -0.000011 -0.000018 *1.570789 -0.0000009 -0.0000070

4096 1.570790 -0.000004 -0.000006 1.570794 -0.0000003 -0.0000025

8192 1.570794 -0.000001 -0.000002 1.570795 -0.0000001 -0.0000009

16384 *1.570796 -0.0000005 -0.0000008 1.570796 -0.00000004 -0.00000031

For the ETF 6 decimal place accuracy is achieved with n=16384, giving a numerical integral = 1.570796. The error estimate is -0.000 000 5 while the actual error by calculus is 0.000 000 9, about twice the estimated error but still less than 10^-6. For this function the error estimate is not very accurate but still serves well for the termination condition.
The ESF performs only slightly better than the ETF for this function, except for the error estimate which is of the order of 10 times less than the true error. Consequently the algorithm terminates too soon, giving the result 1.570 789 that has only 5 dp accuracy! This problematic behavior can be explained by the fact that the higher derivatives of this function are significant. The solution to this problem is to define the error estimate in a more reliable way as E2n = ESFn - ESF2n; in this case the algorithm terminates at n=16384, giving the result 1.570796 that is now correct to 6 decimal places (see below).

Discussion
The error estimates (ETFn-ETF2n)/3 and (ESFn-ESF2n)/15 in some cases give very good estimates of the truncation error and in other cases not so good; this varies from function to function (and where the function is being evaluated). A much more reliable error estimate is E2n = ESFn - ESF2n. This will guarantee the correct result!
It would be practical to start with, for example, n=1024; the solution is then only three or four program runs away. Alternatively you could write the program such that it automatically repeats the ETF (doubling n each time) until the termination condition is met. This is implemented in eee484ex5a-auto and eee484ex5b-auto (see the downloads page). The more reliable error estimate of E2n = ESFn - ESF2n is also used in these programs.
Results for F(t) = 894 t / ( 1.76 + 3.21 t^2 )^3.

n ETF Error ESF Error

32 21.756923 -0.106686 21.792484 0.003172

64 21.783457 -0.026534 21.792302 0.000183

128 21.790082 -0.006625 21.792290 0.000011

256 21.791738 -0.001656 *21.792290 <0.000001

512 21.792152 -0.000414

1024 21.792255 -0.000103

2048 21.792281 -0.000026

4096 21.792287 -0.000006

8192 21.792289 -0.000002

16384 *21.792289 -0.000000


Results for F(x) = ( 1 - x^2 )^(1/2) over the range x=-1 to x=1.

n ETF Error ESF Error

32 1.561627 -0.016717 1.567199 -0.006604

64 1.567551 -0.005925 1.569526 -0.002327

128 1.569648 -0.002097 1.570348 -0.000821

256 1.570390 -0.000742 1.570638 -0.000290

512 1.570653 -0.000262 1.570740 -0.000103

1024 1.570746 -0.000093 1.570777 -0.000036

2048 1.570778 -0.000033 1.570789 -0.000013

4096 1.570790 -0.000012 1.570794 -0.000005

8192 1.570794 -0.000004 1.570795 -0.000002

16384 1.570796 -0.000001 *1.570796 <-0.000001

32768 *1.570796 <-0.000001

In this case the 6 decimal place accuracy is guaranteed.

Conclusion
We can integrate functions numerically to a predefined accuracy. Higher-order methods do not necessarily give more accurate results! The Extended Trapezoidal Formula and Extended Simpson’s Formula are simple to implement and work well when used intelligently with a termination condition based on an error estimate and a tolerance. Be careful when forming error estimates; E2n = ESFn - ESF2n is safer.


5.5 Example exam questions

Question

a) Using the ETF(or ESF) perform the following integral by dividing

the region of integration into ten equally spaced intervals:

Integral of ( 6 x^2 - e^x ) dx from 1.0 to 5.0

b) Write a computer program to implement this method for 100 intervals.

Answers

a) The ETF is given by: I = h ( f0/2 + f1 + f2 + ..... + fn-1 + fn/2 )

where fi = f(xi) and xi = a + i*h and h = (b-a)/n.

Note that there are (n+1) function evaluations.

For ten intervals (n=10), and a range of 1.0 to 5.0

Evaluating with a pocket calculator f(x) = 6x^2 - e^x gives:

sum = 252.51921 ; I = 0.4*sum = 101.00768

Repeating with the ESF: I = h/3 ( f0 + 4 f1 + 2 f2 + ..... + 2 fn-2 + 4 fn-1 + fn )

Evaluating with a pocket calculator f(x) = 6x^2 - e^x gives:

sum = 767.1359 ; I = 0.4/3*sum = 102.285

A quick check with the analytical solution 102.305 indicates that

the answer seems reasonable.

b) See the lab exercise.
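The hand calculation in part (a) can be checked with a short self-contained script (an illustrative sketch, not part of the original answer):

```python
import math

def f(x):
    # the exam integrand f(x) = 6x^2 - e^x
    return 6*x**2 - math.exp(x)

a, b, n = 1.0, 5.0, 10
h = (b - a) / n   # 0.4

# ETF: I = h ( f0/2 + f1 + ... + fn-1 + fn/2 )
I_etf = h * ((f(a) + f(b)) / 2 + sum(f(a + i*h) for i in range(1, n)))

# ESF: I = h/3 ( f0 + 4 f1 + 2 f2 + ... + 2 fn-2 + 4 fn-1 + fn )
I_esf = (h/3) * (f(a) + f(b)
                 + 4 * sum(f(a + i*h) for i in range(1, n, 2))
                 + 2 * sum(f(a + i*h) for i in range(2, n - 1, 2)))

print("ETF =", I_etf)   # about 101.008, matching the pocket-calculator value
print("ESF =", I_esf)   # about 102.285
```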


6 Solution of D.E.s: Runge-Kutta, and Finite-Difference

6.1 Topics Covered

o Runge-Kutta: the student should be able to write down Euler (first-order RK) steps representing the time evolution of a given (simple) physical system, and calculate the truncation error by considering Taylor’s series. The student should be able to write a computer program implementing the formulae for the given physical system.
o Laplace and Jacobi Relaxation: the student should be able to derive the finite difference form for Laplace’s equation using the ”CDA2”, and solve for the potential V(x,y,z).

6.2 Lecture Notes

Introduction
The numerical solution of differential equations is a very large subject spanning many types of problems and solutions. In this lecture we will look at just two simplified topics: the solution of ordinary differential equations using Euler’s method (first-order Runge-Kutta), and the solution of partial differential equations using the finite-difference method and Jacobi Relaxation. You can read about many other types of problems and solutions in your course text book (”Ordinary Differential Equations” and ”Partial Differential Equations”).

The Euler Method (first-order Runge-Kutta)
Many physical systems can be expressed in terms of first-order or second-order differential equations. The time-evolution of these systems can be approximated with an Euler method. We will apply the Euler method to simulate a body in free-fall, the charging and discharging of a simple R-C circuit, and the motion of a mass-on-a-spring system.

To introduce the ideas of Euler methods we will first look at a simple ’freshman physics’ problem of a body in free-fall; here air resistance is ignored and the acceleration due to gravity is a constant 9.81 m/s^2. First the problem is solved analytically and then we will develop Euler methods to solve the problem numerically. We will then study Euler methods further.

Free-fall - analytical solution
We will determine the displacement of a body in free-fall. The boundary conditions for the solution are that at t=0 the initial displacement, y, and the initial velocity, v, are both zero. The system is governed by the second-order differential equation y’’ = -g; integrating with the above boundary conditions gives
y’ = - g t and so y = - 0.5 g t^2.
For t = 10 seconds we therefore have a displacement of -0.5 x 9.81 x 10^2 = -490.500 m

Free-fall - numerical solution (by the Euler method)
We have the second-order differential equation y’’ = -g; this can be written as two first-order equations: dv/dt = -g and dy/dt = v. Rearranging these equations we have dv = -g dt and dy = v dt, which can be written as:

v(t+dt) = v(t) - g dt

y(t+dt) = y(t) + v(t) dt

These expressions simply state that the velocity a time dt later is the current velocity advanced by acceleration multiplied by dt, and the displacement a time dt later is the current displacement advanced by velocity multiplied by dt. The expressions are exact if dt takes the calculus form of ”dt tends to zero”. However, for a numerical solution dt is small but not zero; with a finite value for dt the above equations become Euler’s method, and the expressions now, in general, contain truncation errors (see later).
Writing the expressions in the form of an algorithm:

v1 = v0 - g dt Euler step evolving velocity and

y1 = y0 + v0 dt displacement in a finite time dt

Three versions of the Euler step exist:

Simple Euler Euler-Cromer Improved Euler

v1 = v0 - g dt v1 = v0 - g dt v1 = v0 - g dt

y1 = y0 + v0 dt y1 = y0 + v1 dt y1 = y0 + (v0+v1)/2 dt

Each version uses a different velocity to evolve y:

Simple Euler : v0 - generally poor accuracy

Euler-Cromer : v1 - works well for oscillating systems

Improved Euler : (v0+v1)/2 - exact for the free-fall system.

Algorithm 6a
Implementation of the Simple Euler method to evolve the displacement of a body under free-fall (g=9.81) for 10 seconds. The system evolves in time steps of 0.1 seconds (100 iterations).

dt = 0.1 ! Time step is 0.1 seconds

t = 0 ! Time is initially zero

y = 0 ! Displacement is initially zero

v1 = 0 ! Velocity is initially zero

do 100 iterations

v0 = v1 ! Record the previous velocity

t = t + dt ! Evolve time

v1 = v0 - 9.81*dt ! Evolve velocity (a = -g)

y = y + v0 dt ! Evolve displacement (Simple Euler method)

end do

output y ! Output the result

Result: displacement at t = 10.000 seconds is -485.595 m
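A Python sketch of Algorithm 6a (an illustrative translation of the pseudocode above):

```python
g = 9.81     # acceleration due to gravity (m/s^2)
dt = 0.1     # time step (s)
t = 0.0      # time is initially zero
y = 0.0      # displacement is initially zero
v1 = 0.0     # velocity is initially zero

for _ in range(100):
    v0 = v1             # record the previous velocity
    t += dt             # evolve time
    v1 = v0 - g*dt      # evolve velocity
    y += v0*dt          # evolve displacement (Simple Euler method)

print(f"displacement at t = {t:.3f} seconds is {y:.3f} m")   # -485.595 m, as above
```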

The exact result (from calculus) is -g t^2/2 = -9.81 x 10^2/2 = -490.500 m. This difference, +4.905 m, between the numerical result and the analytical result is due to truncation errors in the Simple Euler method. We can investigate this by considering Taylor’s expansion: y(t+dt) = y(t) + y’(t) dt + y’’(t) dt^2/2 + y’’’(t) dt^3/6 + ...
In free-fall y’(t)=v(t), y’’(t)=-g, and y’’’(t)=0, and so we can write: y(t+dt) = y(t) + v(t) dt - g dt^2/2.
The first two terms in the above equation are the Euler step for evolving displacement (with the evolution of velocity v(t+dt) = v(t) - g dt being exact). The last term, g dt^2/2, represents the truncation error in each Simple Euler step. In the above algorithm the Euler step is iterated 10s/0.1s = 100 times, which implies the total error is 100 g dt^2/2 = 100 x 9.81 x 0.1^2/2 = 4.905 m, as seen in the results of the algorithm.
Again by considering Taylor’s expansion it can be shown that the Improved Euler method is exact for the case of free-fall (see homework). Replace y = y + v0 dt with y = y + (v0+v1)/2 dt in the above algorithm to convert it to the Improved Euler method.
Euler methods give a general tool for solving systems governed by first- or second-order differential equations - in many cases analytical solutions are not available. Euler methods are generally not exact, though with careful choice of the version of the method and by using a small enough value for dt Euler methods can give good results.
The general form for the Simple Euler method:

First-order: dy/dt = f(), Euler step: y1 = y0 + f() dt

where f() is any function of t and y and t,y can represent any parameters.

Example, a marble falling in oil: dv/dt = g - bv/m

Euler step: v = v + (g-bv/m) dt (only velocity is evolved)
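Iterating that step, the marble approaches the terminal velocity mg/b. A minimal sketch (the values of m and b are hypothetical, chosen only for illustration):

```python
g = 9.81      # gravity (m/s^2)
m = 0.01      # marble mass (kg) - illustrative value
b = 0.05      # drag coefficient (kg/s) - illustrative value
dt = 0.001    # time step (s), small compared with the time constant m/b = 0.2 s
v = 0.0       # released from rest

for _ in range(5000):            # evolve for 5 seconds
    v = v + (g - b*v/m) * dt     # Simple Euler step (only velocity is evolved)

print(v)   # approaches the terminal velocity m*g/b = 1.962 m/s
```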

Second-order: y’’ = f()

where f() is any function of t and y and t,y can represent any parameters.

Replace this with two first-order equations: dv/dt = f() and dy/dt = v

Euler steps: v1 = v0 + f() dt and y = y + v0 dt

Example, a simple pendulum:

let y = theta and v = omega => theta’’ = - g Sin(theta)/L

replace with two first-order equations:

d(omega)/dt = -g Sin(theta)/L and d(theta)/dt = omega

Euler steps: omega1 = omega0 - (g Sin(theta)/L) dt and theta = theta + omega0 dt

For systems governed by second-order differential equations the Euler-Cromer and Improved Euler methodscan be obtained with simple modifications of the above formulae.

Summary so far
It has been shown that an Euler method can be implemented to evolve, in time, the motion of a free-falling body. No analytical inputs are required and so this method can be employed in the study of more complex systems where analytical solutions are difficult or impossible. The type of Euler method should be chosen carefully; the Improved Euler method gives an exact result in the case of free-fall, the Euler-Cromer method is more suitable for oscillating systems.

Examples
The following are examples of employing the Simple Euler method to solve some physical systems (each system has an analytical solution with which you can check the result of the numerical solution).

A first-order system

A charging R-C circuit.

A simple R-C circuit is governed by the 1st-order D.E.:

i = dq/dt , where i is the current in the circuit: i = (V0-V)/R

V0 is the charging voltage, and V is the potential

difference across the capacitor V = q/C.

The analytical result for potential difference across the capacitor

as it charges is given by: V = V0 (1-exp(-t/RC))

The Simple Euler simulation is as follows:

The system should be initialised:


R = 1000 ! Circuit resistance (Ohms)

C = 1E-6 ! Circuit capacitance (Farads)

V0 = 12 ! Charging voltage (Volts)

V = 0 ! Initial potential of the capacitor (Volts)

q = 0 ! Initially uncharged (Coulombs)

dt = 1E-5 ! Euler time step (seconds)

t = 0 ! Start at t=0 seconds

The Simple Euler steps are

i = (V0-V)/R ! Calculate the circuit current

t = t + dt ! Advance the time

q = q + i dt ! Add a small amount of charge dq = i dt

V = q/C ! Recalculate the voltage

The system evolves by iterating the above Euler steps.
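Iterating those steps in Python and comparing against the analytical curve V = V0 (1 - exp(-t/RC)) (an illustrative sketch using the initialisation values above):

```python
import math

R = 1000.0    # circuit resistance (Ohms)
C = 1e-6      # circuit capacitance (Farads)
V0 = 12.0     # charging voltage (Volts)
V = 0.0       # initial potential across the capacitor (Volts)
q = 0.0       # initially uncharged (Coulombs)
dt = 1e-5     # Euler time step (seconds)
t = 0.0       # start at t=0 seconds

for _ in range(100):       # evolve for one time constant (RC = 1 ms)
    i = (V0 - V) / R       # calculate the circuit current
    t += dt                # advance the time
    q += i * dt            # add a small amount of charge dq = i dt
    V = q / C              # recalculate the voltage

exact = V0 * (1 - math.exp(-t / (R * C)))
print(V, exact)   # the Euler result tracks the analytical value closely
```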

A second-order system

The displacement of a mass-on-a-spring

The restoring force of a mass on a spring is given by F = -k.x

=> the motion of the body is governed by the 2nd-order D.E.:

x’’ = -kx/m

where k is the spring constant (N/m),

x is the displacement,

m is the inertial mass.

two 1st-order D.E.s are formed:

dv/dt = -kx/m and

dx/dt = v

The analytical result for the displacement is: x = x0 Cos(w.t)

where x0 is the amplitude and w is the angular frequency = SQRT(k/m)

The Euler simulation is as follows (here I show the Euler-Cromer

method - it is more accurate for oscillating systems and is more

concise to write down):

The system should be initialised:

m = 0.1 ! Mass (kg)

k = 1.0 ! Spring constant (N/m)

x = 0.1 ! Initial displacement (amplitude) (m)

v = 0 ! The mass is initially at rest.


dt = 0.01 ! Time step (seconds)

t = 0 ! Start at t=0

The Euler-steps are

t = t + dt ! Advance the time

a = -kx/m ! Calculate the acceleration

v = v + a dt ! Advance the speed

x = x + v dt ! Advance the displacement (using the new speed)

The system evolves by iterating the above Euler steps.

Algorithm 6b
As an example, a simulation of the displacement of a mass-on-a-spring is implemented in the algorithm below. The 0.1 kg mass is initially at rest at a displacement of 0.1 m. The spring constant is 1 N/m and the simulation evolves in time steps of 0.01 seconds. The algorithm terminates after 100 iterations (1 second).

m = 0.1 ! Mass !

k = 1.0 ! Spring constant ! Parameters

x0 = 0.1 ! Amplitude !

dt = 0.01 ! Time step !

x = x0 ! Initial displacement !

v = 0.0 ! Initial velocity ! Initial state

t = 0.0 ! Initial time !

do 100 iterations

t = t + dt ! Advance time

a = -k x/m ! Calculate the acceleration

v = v + a dt ! Advance the velocity

x = x + v dt ! Advance the displacement (Euler-Cromer)

! Output, and compare the displacement with the exact expression

output x, x0*COS(SQRT(k/m)*t)

end do
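Algorithm 6b in Python (an illustrative translation), checking the Euler-Cromer trace against x0*cos(sqrt(k/m)*t) at the end of the run:

```python
import math

m = 0.1      # mass (kg)
k = 1.0      # spring constant (N/m)
x0 = 0.1     # amplitude (m)
dt = 0.01    # time step (s)

x, v, t = x0, 0.0, 0.0    # initial state: at rest at the amplitude
for _ in range(100):
    t += dt               # advance time
    a = -k * x / m        # calculate the acceleration from F = -kx
    v += a * dt           # advance the velocity
    x += v * dt           # advance the displacement (Euler-Cromer)

exact = x0 * math.cos(math.sqrt(k / m) * t)
print(x, exact)   # the two values stay close over the run
```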

4th-Order Runge-Kutta
For ”serious” calculations, higher-order methods are employed. One very popular method is the 4th-order Runge-Kutta method; the truncation error is greatly reduced, though at the expense of a more complex algorithm. You can read more about this in your course text book.

Finite-Difference Methods
Many physical systems can be represented by partial differential equations. Such systems can be solved numerically using finite-difference methods; i.e. the PDEs are replaced with finite-difference approximations. For example dV/dt can be approximated by the forward-difference approximation (V(t+dt) - V(t))/dt, and d^2V/dx^2 can be approximated by the central-difference approximation ( V(x-dx) - 2V(x) + V(x+dx) ) / dx^2. In this lecture we will look at just one such method; you can refer to the course text book for an introduction to a number of other finite-difference methods.


Finite-Difference: Laplace
For a region of space which does not contain any electric charge, the electric potential V(x,y,z) in that region must obey Laplace’s equation: d^2V/dx^2 + d^2V/dy^2 + d^2V/dz^2 = 0; here d^2/dx^2 etc. are partial derivatives.
Laplace’s equation can be solved analytically for simple (symmetric) configurations; however, if a more complex configuration is to be solved then a numerical method must be employed. The numerical solution involves four steps:
1. The region of space is represented by a three-dimensional lattice where the potential is defined at discrete points.
2. Laplace’s Equation is written numerically as a finite difference equation.
3. The numerical equation is solved.
4. A relaxation algorithm is employed to apply the solution until Laplace’s equation is satisfied. The simplest method for this is the Jacobi Method.

1. A 3-d lattice
The potential V(x,y,z) in a region of space can be mapped by a lattice V(i,j,k) where i, j, and k specify points in the lattice:

x = i.dx , y = j.dy , z = k.dz

where dx, dy, dz are the spacings between the lattice points in the x-, y-, and z-directions respectively. The lattice is initialised with the boundary conditions (which are fixed) and with an initial approximation to the solution (which will be relaxed to the solution). Note: as the lattice spacing tends to zero the lattice becomes continuous space. However, the lattice spacing must be finite in this computed solution; the model is therefore not exact.

2. A Finite-Difference equation for Laplace's equation
Recall that the central-difference approximation for the second derivative of a function F(x) is given by:

CDA2 = ( F(x-h) - 2F(x) + F(x+h) ) / h^2

and so we can write (approximately):

d^2V/dx^2 = ( V(x-dx,y,z) - 2V(x,y,z) + V(x+dx,y,z) ) / dx^2

d^2V/dy^2 = ( V(x,y-dy,z) - 2V(x,y,z) + V(x,y+dy,z) ) / dy^2

d^2V/dz^2 = ( V(x,y,z-dz) - 2V(x,y,z) + V(x,y,z+dz) ) / dz^2

and Laplace’s equation becomes

[ V(x-h,y,z) - 2V(x,y,z) + V(x+h,y,z) +

V(x,y-h,z) - 2V(x,y,z) + V(x,y+h,z) +

V(x,y,z-h) - 2V(x,y,z) + V(x,y,z+h) ] / h^2 = 0

where, for convenience, the lattice spacings are set equal

i.e. dx = dy = dz = h

3. Solution
Solving for V(x,y,z) [see homework] gives


V(x,y,z) = [ V(x-dx,y,z) + V(x+dx,y,z) +

V(x,y-dy,z) + V(x,y+dy,z) +

V(x,y,z-dz) + V(x,y,z+dz) ] / 6

The equation simply states that the value of the potential at any point is the average of the potential at neighboring points. The solution for V(x,y,z) is the function that satisfies this condition at all points simultaneously and satisfies the boundary conditions.

4. Jacobi Relaxation
Applying the above solution to the region of space modifies the field so that it is in better agreement with Laplace's equation. The solution must be applied many times, each iteration giving better agreement with Laplace's equation. The solution is reached when further iterations yield insignificant modifications to the potential field. The difference between the old and new lattices can be expressed as:

Delta = sum( |V1 - V2| ) / (number of points)

Iteration can be terminated when Delta is less than some small value that corresponds to an insignificant change in the solution. This method of relaxation of the potential field is one of many techniques which can be used to solve for V(x,y,z). The Jacobi relaxation method is the simplest form of relaxation; other methods are employed to speed up the relaxation process, especially for large lattices.

Algorithm 6c
The following algorithm implements Jacobi relaxation for a dipole potential. It employs a 33-by-33 two-dimensional lattice and iterates until Delta is less than 10^-6. Note that the matrix is centered at (0,0).

Declare matrix V1(-16:+16,-16:+16)

Declare matrix V2(-16:+16,-16:+16)

V2=0.0 ! Set all grid points to zero potential

V2(-6,0)=-1.0 ! Create the -ve pole

V2(+6,0)=+1.0 ! Create the +ve pole

do

V1=V2 ! Make a copy of the old lattice

! Apply the solution to Laplace’s equation

! but don’t modify the boundary.

do i=-15,+15

do j=-15,+15

V2(i,j) = ( V1(i-1,j) + V1(i+1,j) + V1(i,j-1) + V1(i,j+1) )/4

end do

end do

V2(-6,0)=-1.0 ! Reset the dipoles

V2(+6,0)=+1.0


! Compute the difference between the old and new solution

Delta = sum(abs(V1-V2)) / (33*33)

if (Delta < 0.000001) exit ! Terminate if Delta is small

end do

The following are results for a dipole and a parallel-plate capacitor; values of the potential are represented by characters:

-zyxwvutsrqponmlkjihgfedcba.ABCDEFGHIJKLMNOPQRSTUVWXYZ+

| | | | |

-1V -0.5V 0 V +0.5V 1.0V


A Dipole (dipole.f90)

Initial field Final field after 473 iterations

................................................................ ................................................................

................................................................ ................................................................

................................................................ ................................................................

................................................................ .........aaaaaaaaaaaaaaaa..............AAAAAAAAAAAAAAAA.........

................................................................ .......aaaaaaaaaaaaaaaaaaaa..........AAAAAAAAAAAAAAAAAAAA.......

................................................................ .....aaaaaaaaaaaaaaaaaaaaaaaa......AAAAAAAAAAAAAAAAAAAAAAAA.....

................................................................ ...aaaaaaaaaabbbbbbbbaaaaaaaa......AAAAAAAABBBBBBBBAAAAAAAAAA...

................................................................ ...aaaaaabbbbbbbbbbbbbbbbaaaa......AAAABBBBBBBBBBBBBBBBAAAAAA...

................................................................ ...aaaabbbbbbbbbbccbbbbbbbbaaaa..AAAABBBBBBBBCCBBBBBBBBBBAAAA...

................................................................ ...aaaabbbbccccccccccccbbbbaaaa..AAAABBBBCCCCCCCCCCCCBBBBAAAA...

................................................................ .aaaabbbbccccddddddddddccbbbbaa..AABBBBCCDDDDDDDDDDCCCCBBBBAAAA.

................................................................ .aaaabbccccddddeeeeeeddddccbbaa..AABBCCDDDDEEEEEEDDDDCCCCBBAAAA.

................................................................ .aaaabbccddddeeffffffffeeddccaa..AACCDDEEFFFFFFFFEEDDDDCCBBAAAA.

................................................................ .aabbbbccddeeffgghhhhggffeeccbb..BBCCEEFFGGHHHHGGFFEEDDCCBBBBAA.

................................................................ .aabbccddeeffggiijjkkiiggeeddbb..BBDDEEGGIIKKJJIIGGFFEEDDCCBBAA.

................................................................ .aabbccddeeffhhjjmmoolliiffddbb..BBDDFFIILLOOMMJJHHFFEEDDCCBBAA.

...................--......................++................... .aabbccddeeffhhkkpp--oojjggddbb..BBDDGGJJOO++PPKKHHFFEEDDCCBBAA.

................................................................ .aabbccddeeffhhjjmmoolliiffddbb..BBDDFFIILLOOMMJJHHFFEEDDCCBBAA.

................................................................ .aabbccddeeffggiijjkkiiggeeddbb..BBDDEEGGIIKKJJIIGGFFEEDDCCBBAA.

................................................................ .aabbbbccddeeffgghhhhggffeeccbb..BBCCEEFFGGHHHHGGFFEEDDCCBBBBAA.

................................................................ .aaaabbccddddeeffffffffeeddccaa..AACCDDEEFFFFFFFFEEDDDDCCBBAAAA.

................................................................ .aaaabbccccddddeeeeeeddddccbbaa..AABBCCDDDDEEEEEEDDDDCCCCBBAAAA.

................................................................ .aaaabbbbccccddddddddddccbbbbaa..AABBBBCCDDDDDDDDDDCCCCBBBBAAAA.

................................................................ ...aaaabbbbccccccccccccbbbbaaaa..AAAABBBBCCCCCCCCCCCCBBBBAAAA...

................................................................ ...aaaabbbbbbbbbbccbbbbbbbbaaaa..AAAABBBBBBBBCCBBBBBBBBBBAAAA...

................................................................ ...aaaaaabbbbbbbbbbbbbbbbaaaa......AAAABBBBBBBBBBBBBBBBAAAAAA...

................................................................ ...aaaaaaaaaabbbbbbbbaaaaaaaa......AAAAAAAABBBBBBBBAAAAAAAAAA...

................................................................ .....aaaaaaaaaaaaaaaaaaaaaaaa......AAAAAAAAAAAAAAAAAAAAAAAA.....

................................................................ .......aaaaaaaaaaaaaaaaaaaa..........AAAAAAAAAAAAAAAAAAAA.......

................................................................ .........aaaaaaaaaaaaaaaa..............AAAAAAAAAAAAAAAA.........

................................................................ ................................................................

................................................................ ................................................................

................................................................ ................................................................

A parallel plate capacitor (capacitor.f90)

For a capacitor simply replace: with:

V2(-6,0)=-1.0 V2(-6,-8:+8)=-1.0

V2(+6,0)=+1.0 V2(+6,-8:+8)=+1.0

Initial field Final field after 330 iterations

................................................................ ................................................................

................................................................ .....aaaaaaaaaaaaaaaaaaaaaaaa......AAAAAAAAAAAAAAAAAAAAAAAA.....

................................................................ ...aaaabbbbbbccccccccccbbbbaaaa..AAAABBBBCCCCCCCCCCBBBBBBAAAA...

................................................................ .aaaabbccccddddeeeeeeddddccbbaa..AABBCCDDDDEEEEEEDDDDCCCCBBAAAA.

................................................................ .aabbccccddeeffffggggffeeddccaa..AACCDDEEFFGGGGFFFFEEDDCCCCBBAA.

................................................................ .aabbccddffgghhiiiiiihhggeeddbb..BBDDEEGGHHIIIIIIHHGGFFDDCCBBAA.

................................................................ .aaccddeegghhjjkkllllkkiiggeebb..BBEEGGIIKKLLLLKKJJHHGGEEDDCCAA.

................................................................ .bbcceeffhhjjllnnppqqoolliiffcc..CCFFIILLOOQQPPNNLLJJHHFFEECCBB.

...................--......................++................... .bbcceeggiikkmmpptt--ssnnjjggcc..CCGGJJNNSS++TTPPMMKKIIGGEECCBB.

...................--......................++................... .bbddffhhjjlloorruu--ttookkggdd..DDGGKKOOTT++UURROOLLJJHHFFDDBB.

...................--......................++................... .bbddffhhkkmmppssvv--uuppllhhdd..DDHHLLPPUU++VVSSPPMMKKHHFFDDBB.

...................--......................++................... .bbddffiikknnppssww--uuqqllhhdd..DDHHLLQQUU++WWSSPPNNKKIIFFDDBB.

...................--......................++................... .bbddggiillnnqqttww--uuqqmmhhdd..DDHHMMQQUU++WWTTQQNNLLIIGGDDBB.

...................--......................++................... .bbeeggiillooqqttww--vvqqmmhhdd..DDHHMMQQVV++WWTTQQOOLLIIGGEEBB.

...................--......................++................... .bbeeggjjllooqqttww--vvqqmmiidd..DDIIMMQQVV++WWTTQQOOLLJJGGEEBB.

...................--......................++................... .bbeeggjjlloorrttww--vvqqmmiidd..DDIIMMQQVV++WWTTRROOLLJJGGEEBB.

...................--......................++................... .bbeeggjjlloorrttww--vvqqmmiidd..DDIIMMQQVV++WWTTRROOLLJJGGEEBB.

...................--......................++................... .bbeeggjjlloorrttww--vvqqmmiidd..DDIIMMQQVV++WWTTRROOLLJJGGEEBB.

...................--......................++................... .bbeeggjjllooqqttww--vvqqmmiidd..DDIIMMQQVV++WWTTQQOOLLJJGGEEBB.

...................--......................++................... .bbeeggiillooqqttww--vvqqmmhhdd..DDHHMMQQVV++WWTTQQOOLLIIGGEEBB.

...................--......................++................... .bbddggiillnnqqttww--uuqqmmhhdd..DDHHMMQQUU++WWTTQQNNLLIIGGDDBB.

...................--......................++................... .bbddffiikknnppssww--uuqqllhhdd..DDHHLLQQUU++WWSSPPNNKKIIFFDDBB.

...................--......................++................... .bbddffhhkkmmppssvv--uuppllhhdd..DDHHLLPPUU++VVSSPPMMKKHHFFDDBB.

...................--......................++................... .bbddffhhjjlloorruu--ttookkggdd..DDGGKKOOTT++UURROOLLJJHHFFDDBB.

...................--......................++................... .bbcceeggiikkmmpptt--ssnnjjggcc..CCGGJJNNSS++TTPPMMKKIIGGEECCBB.

................................................................ .bbcceeffhhjjllnnppqqoolliiffcc..CCFFIILLOOQQPPNNLLJJHHFFEECCBB.

................................................................ .aaccddeegghhjjkkllllkkiiggeebb..BBEEGGIIKKLLLLKKJJHHGGEEDDCCAA.

................................................................ .aabbccddffgghhiiiiiihhggeeddbb..BBDDEEGGHHIIIIIIHHGGFFDDCCBBAA.

................................................................ .aabbccccddeeffffggggffeeddccaa..AACCDDEEFFGGGGFFFFEEDDCCCCBBAA.

................................................................ .aaaabbccccddddeeeeeeddddccbbaa..AABBCCDDDDEEEEEEDDDDCCCCBBAAAA.

................................................................ ...aaaabbbbbbccccccccccbbbbaaaa..AAAABBBBCCCCCCCCCCBBBBBBAAAA...

................................................................ .....aaaaaaaaaaaaaaaaaaaaaaaa......AAAAAAAAAAAAAAAAAAAAAAAA.....

................................................................ ................................................................

The Fortran programs for these two simulations can be found in the downloads section of the course website. Results for various configurations of charges, and for larger matrix sizes, can be found at:

http://www1.gantep.edu.tr/~andrew/eee484/downloads/laplace/

The program sources are also available at that URL.


6.3 Lab Exercises

Task 1: Investigation of the potential fields for various configurations
First download the program source codes dipole.f90 and capacitor.f90 from the course web site downloads page. Compile and run them. If you prefer then try to translate these programs to C or C++ etc. The two programs are the same except for the definitions of the dipole and capacitor plates. The dipole potential is defined with the assignments:

V2(-6,0)=-1.0

V2(+6,0)=+1.0

The capacitor plates are defined with the assignments:

V2(-6,-8:+8)=-1.0

V2(+6,-8:+8)=+1.0

Note that the assignments appear twice in each program. To simulate other configurations of potentials the above assignments are simply replaced by the appropriate potential distribution (the rest of the program remains unchanged). For example a + (plus) shape arrangement can be achieved with the assignments:

V2(0,-8:+8)=-1.0

V2(-8:+8,0)=+1.0

----------------++----------------

----------------++----------------

----------------++----------------

----------------++----------------

----------------++----------------

----------------++----------------

----------------++----------------

----------------++----------------

++++++++++++++++++++++++++++++++++

----------------++----------------

----------------++----------------

----------------++----------------

----------------++----------------

----------------++----------------

----------------++----------------

----------------++----------------

----------------++----------------

Modify the dipole program for the following configurations, run your programs and check the outputs:
1. A dipole with equally signed potentials
2. A monopole
3. A quadrupole
4. A box potential

After you have investigated the above configurations try some others.

Task 2: Implementation of Euler simulations of a charging R-C circuit and a mass-on-a-spring
Implement the following Euler simulations given in the lecture into computer programs and compare the outputs with the analytical results. If you have time compare the performance of the Simple Euler, Euler-Cromer and Improved Euler methods for these systems (the formulations can be tricky!).
1. A charging R-C circuit governed by the 1st-order D.E. dq/dt = (Vo-V)/R, where R = 1000 Ohms, C = 1 micro Farad, and the charging voltage Vo is 12 Volts. With the capacitor initially uncharged and a time step dt = 10^-5 seconds, evolve the system for 1 millisecond. Compare your result for the circuit voltage with the analytical solution: V(t) = Vo (1 - e^(-t/RC)).
2. A mass-on-a-spring governed by the 2nd-order D.E. x'' = -kx/m, where m = 0.1 kg, k = 1 N/m. With an initial displacement xo = 0.1 m and the body initially at rest, using a time step of 0.01 s, evolve the system for 10 seconds. Compare your result for the displacement of the mass with the analytical solution: x(t) = xo cos(wt) where w = sqrt(k/m).


6.4 Lab Solutions

Task 1: Investigation of the potential fields for various configurations
Modify the dipole program for the following configurations, run your programs and check the outputs:
1. A dipole with equally signed potentials
2. A monopole
3. A quadrupole
4. A box potential

Solutions: (See the downloads page):

Sources: eee484ex6a1 eee484ex6a2 eee484ex6a3 eee484ex6a4

Outputs: eee484ex6a1.out eee484ex6a2.out eee484ex6a3.out eee484ex6a4.out

The potential definitions are:

A (+ +) dipole A monopole A quadrupole A box potential

V2(-6,0)=+1.0 V2(0,0)=+1.0 V2(-6,-6)=+1.0 V2(-8,-8:+8)=+1.0

V2(+6,0)=+1.0 V2(+6,+6)=+1.0 V2(+8,-8:+8)=+1.0

V2(-6,+6)=-1.0 V2(-8:+8,-8)=+1.0

V2(+6,-6)=-1.0 V2(-8:+8,+8)=+1.0

Task 2: Implementation of Euler simulations of a charging R-C circuit and a mass-on-a-spring
1. A charging R-C circuit governed by the 1st-order D.E. dq/dt = (Vo-V)/R, where R = 1000 Ohms, C = 1 micro Farad, and the charging voltage Vo is 12 Volts. With the capacitor initially uncharged and a time step dt = 10^-5 seconds, evolve the system for 1 millisecond. Compare your result for the circuit voltage with the analytical solution V(t) = Vo (1 - e^(-t/RC)).

Solution eee484ex6b1 (see the downloads page).
For this Simple Euler simulation a 10 micro-second time step yields, initially, a 0.5 percent error; in this case the error reduces slightly as the system evolves (to 0.3 percent after 1 ms). Reducing the time-step to 1 micro-second reduces this initial error to 0.05 percent. Again, it can be shown that the error is proportional to the size of the time-step. A much more accurate simulation is gained with the Improved Euler method, eee484ex6b1improved; here the initial error of 0.5 percent is quickly reduced to near zero.
2. A mass-on-a-spring governed by the 2nd-order D.E. x'' = -kx/m, where m = 0.1 kg, k = 1 N/m. With an initial displacement xo = 0.1 m and the body initially at rest, using a time step of 0.01 s, evolve the system for 10 seconds. Compare your result for the displacement of the mass with the analytical solution x(t) = xo cos(wt) where w = SQRT(k/m).

Solution eee484ex6b2 (see the downloads page).
With the Simple Euler simulation, although the period of the system is well reproduced, the amplitude of the system increases (thereby not conserving energy); you can check this by comparing the simulated displacement with the expected theoretical displacement over a few periods. The Euler-Cromer method, eee484ex6b2cromer.f90, is much more accurate; it reproduces well both the period and amplitude of the system. Euler-Cromer is generally better for oscillating systems.

Conclusion
For the above simulations, and in general, we see that the Simple Euler method does not perform well. The Improved Euler method can give much greater accuracy except in the case of oscillating systems where the Euler-Cromer method is the preferred choice. Errors are proportional to the size of the time-step, dt, i.e. the error in each simulation can be reduced by reducing the value of dt. However, reducing dt makes it necessary to perform more iterations, increasing the run-time of the simulation; also, round-off errors may become large (in this case double precision should be used).

Final note
In practice higher-order methods such as the 4th-order Runge-Kutta method are often employed. See rk-shm.f90 in the downloads page.


6.5 Example exam questions

Question 1

Using the central-difference approximation to the second derivative

of a function F(x): CDA2 = ( F(x-h) - 2F(x) + F(x+h) ) / h^2

show that for a region of space that does not contain any

electric charge the following expression satisfies Laplace's equation.

V(i,j,k) = [ V(i-1,j,k) + V(i+1,j,k) +

V(i,j-1,k) + V(i,j+1,k) +

V(i,j,k-1) + V(i,j,k+1) ] / 6

where the matrix V represents the potential at

discrete points i,j,k in a three-dimensional lattice.

Your answer should include an explanation of the mapping of

x, y, and z space onto i, j, and k points in the lattice.

Hint: d^2 V d^2 V d^2 V Laplace’s equation for

----- + ----- + ----- = 0 chargeless region of

dx^2 dy^2 dz^2 space

Question 2

Using the central-difference approximation to the second derivative

of a function F(x): CDA2 = ( F(x-dx) - 2F(x) + F(x+dx) ) / dx^2

and the forward-difference approximation to the first derivative

of a function F(t): FDA = ( F(t+dt) - F(t) ) / dt, show that the

finite-difference solution to the heat equation for a thin rod is

given by:

U(i,j+1) = r U(i-1,j) + (1-2r) U(i,j) + r U(i+1,j)

where the matrix U represents the temperature at

discrete points i,j in a two-dimensional lattice.

Your answer should include an explanation of the mapping of

x and t space onto i and j points in the lattice.

Hint: dU/dt = c d^2 U / dx^2

Heat equation for a one-dimensional conductor.

Question 3

a) Write down the formulae representing the following methods for the

numerical evolution of the displacement, y(t), of a body governed

by a second-order differential equation y’’(t) = -g

i. Simple Euler method

ii. Improved Euler method

iii. Euler-Cromer method


b) Show that the Improved Euler method yields an exact result.

c) Write a computer program implementing the Improved Euler

method for the above system.

Question 4

a) A simple R-C circuit is governed by the 1st-order D.E:

i = dq/dt , where i is the current in the circuit: i = V/R

and V is the p.d. across the capacitor V = q/C.

Write down Euler steps representing the time-evolution of the potential difference

across the capacitor. Implement your formulae in a computer program.

b) A simple R-C circuit is governed by the 1st-order D.E.:

i = dq/dt , where i is the current in the circuit: i = (V0-V)/R

V0 is the charging voltage, and V is the potential

difference across the capacitor V = q/C.

Write down Euler steps representing the time-evolution of the potential difference

across the capacitor. Implement your formulae in a computer program.

c) The restoring force in a mass-on-a-spring system is given by

F = -k.x, the motion of the body is governed by the 2nd-order D.E.:

x’’ = -k.x/m where k is the spring constant (N/m), x is the displacement, m is

the inertial mass. Two 1st-order D.E.s can be formed: dv/dt = -k.x/m and dx/dt = v

Write down Euler steps representing the time-evolution of the

displacement of the mass. Implement your formulae in a computer program.

d) (In this question you are not given the differential equations describing the

system, so you need to build them yourself). Write down Euler steps representing

the time-evolution of the voltage V recorded by the voltmeter in the system shown

below. Implement your formulae in a computer program.

The diagram is | | C = 1 micro Farad

an R-C circuit +------| |-------+

| | | | R = 1000 Ohms

| C |

| | At t = 0 V = 12 volts

| +------+ |

+----| R |----+

| +------+ |

| | (V) is a voltmeter, you

| | can assume it has infinite

+------(V)-------+ internal resistance.


7 Random Variables and Frequency Experiments

7.1 Topics Covered

o Review of Probability and Random Variables: the student should be familiar with the topics taught in EEE 283 (operations on pdfs and pmfs, probabilities and conditional probabilities);
o Generation of pseudo-random numbers: the student should know how to generate random numbers (e.g. by using the rand() function) and write computer programs to perform frequency experiments;
o Transformation of a uniform pdf to a non-uniform pdf: given a uniform pdf fX(x)=1 with 0<x<1 and the transformation function y=T(x) the student should be able to determine and sketch the resultant non-uniform pdf fY(y); the student should also be familiar with the "rejection method" for generating a non-uniform pdf from a uniform pdf.

7.2 Lecture Notes

Introduction
Probability, random variables and random processes are important topics in science and engineering. These topics are covered in the course EEE 283 (check out my web site for that course). Key to this subject are the ideas of the random variable, probability density functions (pdfs) and probability mass functions (pmfs), and operations on them. This is treated theoretically in EEE 283. However, all the results given in that course can be reproduced experimentally by taking a frequency interpretation of probability. [A brief summary of Probability and Random Variables is given in class.]
For example, consider the tossing of a coin; we know that the probability of the outcome being "heads" is 0.5; in probability theory this is written P("heads") = 0.5. The frequency interpretation is P("heads") = nheads/n in the limit n goes to infinity, where n is the number of tosses of the coin (the number of trials) and nheads is the number of outcomes that give "heads", i.e. we perform an experiment where the coin is tossed an infinite number of times and we count the number of times the coin comes up "heads". In reality a good approximation to the probability can be obtained with a large finite value of n. Such frequency experiments allow us to verify that a theoretical result is true (a great help when writing exam questions for EEE 283!) and to find results for cases where the theoretical calculations are difficult to evaluate (i.e. many problems in the real world).
To perform a frequency experiment one needs to generate a large number of trials. Often we have to do this by hand (e.g. to test the effect of a new drug a clinical trial is performed where a large number of patients is given the drug while another large number of patients is not; the outcomes are analysed statistically). However, if we know the underlying probabilities that govern a system (e.g. P("heads") = 0.5) then we can simulate an experiment using a computer. For this we need to be able to generate a probability density function (pdf), i.e. lists of numbers X = (x1, x2, x3, ..., xn) that are randomly distributed according to some function fX(x). The most basic pdf is fX(x)=1 with 0<x<1, i.e. a uniform distribution.

Generating Random Numbers
In Fortran 90 the intrinsic subroutine random_number provides the programmer with lists of random numbers uniformly distributed in the range 0<r<1. In the following example, array r is filled with random numbers:

Algorithm 7a1 (Fortran syntax)

real :: r(8)

call random_number(r)

print *, r


Example result:

0.983900 0.699951 0.275312 0.661102 0.809842 0.910005 0.304463 0.484259

The equivalent in C++, using the intrinsic ”rand()” function, is:

Algorithm 7a2 (c++ syntax)

for (int i=1; i<=8; i++) {

double r = rand()/(double(RAND_MAX)+1);

cout << r << " ";

}

cout << endl;

Example result:

0.840188 0.394383 0.783099 0.798440 0.911647 0.197551 0.335223 0.768230

These programs output 8 pseudo-random numbers. Random number generators create a sequence of pseudo-random numbers, usually distributed uniformly between 0 and 1. The numbers are not truly random; they are created by a deterministic algorithm, hence the term pseudo-random. There are various algorithms for producing large sequences of random numbers with varying qualities. The quality of a random number generator relates to four main properties:
1. The apparent randomness of the sequence.
2. The size of the period of the sequence, i.e. how many numbers are generated before the sequence repeats; this varies from 10^9 in a minimal standard generator to 10^43 or more in high quality generators.
3. The uniformity of the distribution of random numbers; is the distribution flat? does it have gaps?
4. The distribution should pass some statistical/spectral tests.

|_________________|

| | Uniform (flat) distribution

| | of a set of random numbers

+-----------------+- R

0 1

|________-___-____|

| | Distribution of a set of random

| | numbers with some non-uniformity

+-----------------+- R

0 1

|________ ___ ____|

| | Distribution of a set of random

| | numbers with gaps.

+-----------------+- R

0 1


A popular primitive algorithm is the multiplicative linear congruential generator, first used in 1948; with carefully chosen constants this generator provides a good basic generator:
R(i+1) = ( a R(i) + b ) MOD m, where MOD means modulo.
Constants a, b and m are chosen carefully such that the sequence of numbers becomes chaotic and evenly distributed. Park and Miller proposed a minimal standard with which more complex generators can be compared; the constants are taken as:
a = 7^5 = 16807, b = 0, and m = 2^31 - 1 = 2147483647.
The range of values is 1 to m (divide by m to convert to 0<r<1). The period of this generator is m-1, about 2 billion.
A computer implementation of this algorithm using 32-bit integers is not straightforward as a times R can be out of the integer range; we have to apply a trick (approximate factorisation of m). The algorithm is implemented below in Fortran 90 (see also the downloads page on the course website for the Fortran 77, C and C++ versions of this program). The algorithm is in the form of a function ran() to which a seed is passed. The function returns a random number; the seed is returned modified so that the next call of the function returns the next random number in the sequence. Before the first call to the function the seed needs to be initialised with any value from 1 to 2147483647 (not zero). Different initial seed values result in different sequences of random numbers.

ran.f90

integer :: i, iseed

real :: r

iseed=314159265 ! Initialise the seed

do i=1,10

r=ran(iseed) ! ran() is a function that

print *, r ! returns a random number.

end do

contains ! ran() is defined as an internal function:

real function ran(iseed)

!--------------------------------------------------------------

! Returns a uniform random deviate between 0.0 and 1.0.

! Based on: Park and Miller’s "Minimal Standard" random number

! generator (Comm. ACM, 31, 1192, 1988)

!--------------------------------------------------------------

implicit none

integer, intent(inout) :: iseed

integer, parameter :: IM=2147483647, IA=16807, IQ=127773, IR= 2836

real, parameter :: AM=128.0/IM

integer :: K

K = iseed/IQ

iseed = IA*(iseed-K*IQ) - IR*K

IF (ISEED < 0) iseed = iseed+IM

ran = AM*(iseed/128)

end function ran

end

For the given initial seed 314159265 the program gives the following sequence of values:

0.7264141 0.8427828 0.6508798 0.3372238 0.7214876

0.0418309 0.0521500 0.4857842 0.5747718 0.1894919


As mentioned above, the period of this algorithm is m-1 = 2147483646. This is actually not large; for example my 2.4 GHz CPU takes only 23 seconds to generate the complete sequence of random numbers! Compare this to, for example, simulations of high energy particle reactions where farms of computers generate datasets over days; the period of this generator is clearly not sufficient. Improved algorithms are available providing uniform distributions of random numbers with periods of 10^12, 10^18, 10^43 and even 10^171; these algorithms, however, are much more complex.

Frequency Experiments
We now have a method for generating large numbers of (pseudo-)random numbers: "call random_number(r)" in Fortran 90, and "double r = rand()/(double(RAND_MAX)+1);" in C++; and we can return to the frequency experiments. Consider again the tossing of a coin. We can create an experiment by generating a large number of random values (uniformly distributed between 0 and 1), calling any value less than 0.5 a "head", and counting the number of times this occurs. This is illustrated in the algorithm below where a coin is tossed one million times (n=1000000) and the fraction nheads/n is output.

Algorithm 7b (Fortran syntax [mostly])
The second (concise) form of the algorithm makes use of Fortran whole-array processing and intrinsics.

n = 1000000
nheads = 0
do i = 1, n
call random_number(r)
if (r<0.5) nheads=nheads+1
end do
output nheads/n

and the concise form:

real :: r(1000000)
call random_number(r)
print *, count(r<0.5)/1000000.
end

Example output: 0.499687

The result is close to, but not exactly, the expected value of 0.5 because the process of generating outcomes is random. Repeating the above experiment with different sample sizes, n, gives the following results:

n nheads/n

100 0.47

1000 0.499

10000 0.5030

100000 0.50067

1000000 0.499687

Note that the difference between the value of nheads/n and 0.5 gets smaller as n increases, i.e. the experiment becomes more accurate as the statistics increase.

Operations on Random Variables

Continuing the subject of random variables, an important topic is that of operations on random variables. Basic operations include the calculation of the expectation value and variance of a probability density function. The expectation value E[X] is the first moment about the origin (denoted by m1); it can be viewed as the center of mass or arithmetic mean of a distribution and is defined as E[X] = m1 = the integral of the product "x f(x)". The variance is the second moment about the mean (denoted by mu2); it represents a measure of the size of the spread of the distribution about the mean m1, and is defined as E[(X-m1)^2] = mu2 = the integral of the product "(x-m1)^2 f(x)". The variance can also be equated by simple algebraic arguments as mu2 = m2 - m1^2, where m2 is the second moment about the origin defined as E[X^2] = m2 = the integral of the product "x^2 f(x)". We will now calculate the expectation value and variance for the uniform pdf both theoretically and via a frequency experiment as follows.


Theory:
We have the random variable X with a pdf f(x)=1 in the range 0<x<1. E[X] = m1 = the integral of the product "x f(x)" = 1/2 (see class notes for the integral), and the variance E[(X-1/2)^2] = mu2 = the integral of the product "(x-1/2)^2 f(x)" = 1/12 (see class notes for the integral). Alternatively mu2 = m2-m1^2; with m2 = the integral of the product "x^2 f(x)" = 1/3, mu2 = 1/3 - (1/2)^2 = 1/12.

Experiment:
We now generate n random variables X = (x1, x2, x3, ..., xn) from a set of uniformly distributed values 0<x<1, and calculate experimentally the expectation value and the variance. For this we can use directly the random number generator intrinsic to the Fortran or C++ compiler. In this frequency experiment the calculation of the expectation value E[X] = m1 becomes the sum of the values of x normalised to the number of values; i.e. m1 = (x1+x2+x3+...+xn)/n, which is simply the arithmetic mean. For the variance mu2, it is convenient to use the equality mu2 = m2-m1^2, which requires m2 = (x1^2+x2^2+x3^2+...+xn^2)/n.

Algorithm 7c (Fortran syntax [mostly])

n = 1000000
m1=0, m2=0
do i = 1, n
  call random_number(x)
  m1 = m1 + x
  m2 = m2 + x^2
end do
m1 = m1/n
m2 = m2/n
output " mean, m1 = ", m1
output "variance, mu2 = ", m2-m1^2

The second (concise) form of the algorithm makes use of Fortran whole-array processing and intrinsics.

integer, parameter :: n = 1000000
real(kind=8) :: x(1000000), m1, m2
call random_number(x)
m1 = sum(x)/n
m2 = sum(x**2)/n
print *, " mean, m1 = ", m1
print *, "variance, mu2 = ", m2 - m1**2
end

Example output (for n = 1,000,000):

mean, m1 = 0.5000423 [1/1.9998]

variance, mu2 = 0.0832316 [1/12.015]

Increasing the number of trials to n = 1,000,000,000 (the program takes less than 1 minute to run!) gives:

mean, m1 = 0.500005976 [1/1.99998]

variance, mu2 = 0.083332809 [1/12.00008]

While we do not obtain the exact values, it is clear that the expectation value and variance tend toward the theoretical values for large n.

The above demonstration can be repeated for the triangular pdf: f(x) = 2x with 0<x<1. Here, E[X] = m1 = the integral of the product "x 2x" = 2/3 (see class notes for the integral), and the variance E[(X-2/3)^2] = mu2 = the integral of the product "(x-2/3)^2 2x" = 1/18 (see class notes for the integral). Alternatively mu2 = m2-m1^2; with m2 = the integral of the product "x^2 2x" = 1/2, mu2 = 1/2 - (2/3)^2 = 1/18.

For the experiment, Algorithm 7c only needs to be modified such that random numbers are distributed in the form of a triangular pdf. This is obtained with the transformation x=sqrt(x); i.e. replace "call random_number(x)" with "call random_number(x); x=sqrt(x)" [the transformation of distributions will be studied later in this topic]. The result for n = 1,000,000,000 is:

mean, m1 = 0.6666715428590047 [2/2.999978]

variance, mu2 = 0.055555029965474394 [1/18.013]

Again the experimental results are in agreement with the theoretical results.


Total Probability and Conditional Probability

Total probabilities are obtained by integrating the pdf. The probability of obtaining a value between a and b is defined as P(a<X<b) = integral of f(x) over the limits a to b. For example, for the triangular pdf f(x) = 2x with 0<x<1, the probability of obtaining a value between 0.5 and 0.9 is P(0.5<X<0.9) = the integral of 2x over the limits 0.5 to 0.9 = 0.56.

Experimentally, the probability is obtained by simply counting the number of values that appear within the given range. This is illustrated below:

Algorithm 7d1 (Fortran syntax [mostly])

n = 1000000
m = 0
do i = 1, n
  call random_number(x); x=sqrt(x)
  if (x>0.5 .and. x<0.9) m = m+1
end do
output "P{0.5<X<0.9} = ", m/n

The concise whole-array form is:

integer, parameter :: n = 1000000
real :: x(n), m
call random_number(x); x=sqrt(x)
m = count(x>0.5 .and. x<0.9)/real(n)
print *, "P{0.5<X<0.9} = ", m

(Remember that the statement "x=sqrt(x)" transforms the uniform pdf to a triangular pdf.) The result is shown below for various values of n.

n = 1000 P{0.5<X<0.9} = 0.574

n = 1000000 P{0.5<X<0.9} = 0.560993

n = 1000000000 P{0.5<X<0.9} = 0.560003037

For a large number of trials the probability tends toward the theoretical result.

From probability theory, the conditional probability P(A|B) = P(A intersect B)/P(B), where P(A|B) reads "the probability of A given that B has occurred". For example P(0.5<X<0.9|X>0.6) = P(0.5<X<0.9 intersect X>0.6) / P(X>0.6) = P(0.6<X<0.9) / P(X>0.6) = 0.45/0.64 = 0.703125. See the lecture for the full calculations.

Experimentally, the probability is obtained by simply counting the number of values that appear within the given range after first requiring that x>0.6. This is illustrated below:

Algorithm 7d2 (Fortran syntax [mostly])

m = 0; n = 0
do
  call random_number(x); x=sqrt(x)    ! a triangular pdf
  if ( x<0.6 ) cycle                  ! condition X>0.6 has occurred
  n = n+1                             ! increase the trial count
  if ( x>0.5 .and. x<0.9 ) m = m+1    ! condition 0.5<X<0.9
  if ( n==1000000 ) exit              ! exit when there are enough trials
end do
output "P{0.5<X<0.9|X>0.6} = ", m/n

Here, "cycle" means return to the top of the loop, and "exit" means drop out of the loop. The result is shown below for various values of n.

n = 1000       P{0.5<X<0.9|X>0.6} = 0.691

n = 1000000    P{0.5<X<0.9|X>0.6} = 0.703951

n = 1000000000 P{0.5<X<0.9|X>0.6} = 0.703116218

As the number of trials increases the result moves closer to the theoretical value obtained from the rule P(A|B) = P(A intersect B)/P(B).


The Binomial probability mass function

If the probability of success for a single trial is p, then the probability of k successes out of n trials is given by the Binomial pmf f(k) = C(n,k) p^k (1-p)^(n-k), where the binomial coefficient is C(n,k) = n! / (k!(n-k)!). A program to generate this pmf can be found on the download page (binomial-pmf.f90). For example, if p=0.25 and n=6, the pmf is:

k P{k} Experimental

0 0.177979 0.177988

1 0.355957 0.356018

2 0.296631 0.296520

3 0.131836 0.131854

4 0.032959 0.032981

5 0.004395 0.004395

6 0.000244 0.000244

The values in the column indicated by "Experimental" are obtained with the following algorithm.

Algorithm 7d3 (Fortran syntax [mostly])

In this algorithm m is an integer vector with 7 elements indexed 0 to 6, and x is a real vector with 6 elements.

n = 100000000

real x(6); integer m(0:6) = 0

do i = 1, n

call random_number(x) ! generate 6 random values

k = count(x<0.25) ! count how many are < 0.25

m(k) = m(k)+1

end do

output "P{k} = ", m/real(n)

Note that the output statement contains 7 values from the vector array m (see the above table). For a large number of trials the probability tends toward the theoretical result. This experimental result has two consequences: first, it demonstrates the correctness of the expression for the binomial distribution, and second, it demonstrates a computational method for simulating stochastic processes.

In the next topic we will look in more detail at performing computational simulations of systems that involve stochastic processes; this field of study is called Monte Carlo Simulation. Before we can do this we need to know how to generate pdfs of any required form.

Generation of Non-uniform Random Distributions

In simulations of random processes we often require a non-uniform distribution of random numbers. For example, radioactive decay is characterised by an exponential pdf: fX(x) = a e^(-ax) with x>=0. There are two useful methods for generating such non-uniform distributions:

1. The transformation method.
2. The rejection method.

The aim of both these methods is to convert a uniform distribution of random numbers of the form fX(x)=1 with 0<x<1 into a non-uniform distribution of the form fY(y) with a<y<b; this is illustrated below.

fx|

|_________________|

| | Uniform (flat) distribution

| | of a set of random numbers.

+-----------------+- x

0 1


fy| _____

| / \ Non-uniform distribution

| / \______ of a set of random numbers.

| / \ The shape of the pdf is arbitrary.

+-+-----------------+- y

a b

1. The Transformation Method

Consider a collection of variables X = (x1, x2, x3, ...) that are distributed according to the pdf fX(x); then the probability of finding a value that lies between x and x+dx is fX(x) dx. If y is some function of x then we can write:

|fX(x) dx| = |fY(y) dy|

where fY(y) is the pdf that describes the collection Y = (y1, y2, y3, ...).

Now let fX(x)=1 with 0<x<1, e.g. the uniformly distributed random numbers that we generate via the "random_number()" intrinsic subroutine in Fortran; then we can write

dx = |fY(y) dy| and so fY(y) = |dx/dy|

fx|

1|_________________|

| | fx(x)=1 and so fy(y) = |dx/dy|

| |

+-----------------+- x

0 1

And so in order to obtain a sequence characterised by the distribution fY(y) we must find a transformation function y = T(x) that satisfies:

|dx/dy| = fy(y)

Example 1:
Consider that we want the exponential distribution fY(y) = a e^(-ay); the transformation function is then y = T(x) = -ln(x)/a.
Proof: y = -ln(x)/a and so x = e^(-ay), and so |dx/dy| = |-a e^(-ay)| = a e^(-ay) = fY(y).

Example 2:
Consider that we want the distribution fY(y) = 2y

fx| fy|

1|________ 2| /

| | /

| |/

+-------+ x +---+- y

0 1 0 1

the transformation function is then y = T(x) = x^(1/2).

Proof: y = x^(1/2) and so x = y^2, and so dx/dy = 2y = fY(y). A quick check for the correctness of the transformation is that the integrals of the two distributions are equal: integral [fX(x) dx] = 1 times 1 = 1; integral [fY(y) dy] = 0.5 times 1 times 2 = 1; i.e. the integrals of the two distributions are both 1 (remember that any pdf must always integrate to unity, i.e. the total probability is one).


Example 3:
We wish to obtain the pdf fY(y) = 0.5 Sin(y) with 0<y<pi. The transformation function is y = T(x) = Cos^(-1)(1-2x).
Proof: y = Cos^(-1)(1-2x) and so x = 0.5 (1-Cos(y)), and dx/dy = 0.5 Sin(y) = fY(y). The range of y is T(0)<y<T(1) = 0<y<pi as required. One check for correctness is that the integral should be one: the integral of 0.5 Sin(y) over the limits 0<y<pi = 1.

Algorithm 7e (Fortran syntax)

The following algorithm implements the transformation given in Example 2.

real :: x(8000), y(8000)

call random_number(x); y = sqrt(x)

Array x is filled with random numbers from a uniform distribution in the range 0<x<1. Array y is assigned the transformation of these numbers, giving a distribution fY(y) = 2y with the range sqrt(0)=0 < y < sqrt(1)=1. We can use the "minval" and "maxval" intrinsic functions to inspect the ranges:

print *, minval(x), maxval(x), minval(y), maxval(y)

gives:

0.0000973344 0.99984 0.0098658195 0.99992

The two distributions are illustrated in the histograms below. First the uniform distribution of random numbers from fX(x); here, 8000 random numbers have been generated and placed into a 20-bin histogram (the histogram is turned on its side).

1

0+-------------------+---> fx(x)

|#####################

|#################### The range of values is from 0 to 1.

|#################### Each ’#’ symbol represents 20 numbers.

|################### The average number of entries per bin

|#################### is 8000 values / 20 bins = 400 values

|###################### => 400/20 = 20 ’#’s.

|###################

|#################### Given that the values are created

|##################### randomly, statistical theory tells us

|################### that we expect a variation from

|#################### bin-to-bin of: sigma = sqrt(n)

|##################### = sqrt(400) = 20 = 1 ’#’

|####################

|#################### i.e. we expect a variation of 1 or 2 ’#’

|#################### as is seen in this histogram.

|######################

|###################

|###################

|#####################

|###################

1+

|

x


Next, each value x is transformed to y = T(x) = x^(1/2); the resultant distribution, fY(y), is shown in the histogram below.

2

0+------------------------------------+----> fy(y)

|#

|###

|#####

|########

|#########

|############ The distribution is the required

|############# fy(y)=2y in the range 0 < y < 1.

|###############

|################ The number of entries is 8000

|################### (as each y value corresponds to

|####################### an x value).

|######################

|######################### The function is not exact due

|########################## to statistical variations in

|############################# the distribution fx(x).

|################################

|################################

|#####################################

|###################################

|#######################################

1+

|

y

2. The Rejection Method

The transformation method is useful when the transformation function y=T(x) can be derived easily. The rejection method provides an alternative to the transformation method; it has the advantage of being able to create any required distribution. In this method a sequence of random numbers X = (x1, x2, x3, ..., xn) is generated with a uniform distribution in the range of interest, ymin to ymax. Now suppose that our goal is to produce a sequence of numbers distributed according to the function fY(y):

fy(y)

|

|_____________________________________ fmax

| / \

| _________/ \

| / \

| ____/ \

| / \__

| / \

| / \

+--+-------------------------------+--- y

y_min y_max

We proceed through the sequence (x1, x2, x3, ..., xn) and accept values with a probability proportional to


fY(x). This is achieved as follows: for each value of x a new random number, ptest, distributed uniformly in the range 0<ptest<fmax, is generated. If fY(x) is greater than ptest then the number x is kept (added to set Y); otherwise it is removed (rejected) from the sequence. The probability of number xi passing the test is proportional to fY(xi). The resultant set Y = (y1, y2, y3, ..., ym) [m<=n] is therefore distributed according to the function fY(y).

Algorithm 7f (Fortran syntax)

Consider that we want a distribution fY(y) = 0.5 + Sin^2(y) in the range pi<y<3pi. fmax is therefore 0.5 + 1.0^2 = 1.5, and so ptest is generated from 0 to 1.5.

integer, parameter :: n = 8000
integer :: i, m = 0
real :: x(n), y(n), Pi = 3.141593, fmax = 1.5, Ptest

call random_number(x)

x = pi+2*pi*x ! x is random and uniform in the range pi < x < 3pi

do i = 1, n

call random_number(Ptest)

Ptest = Ptest*fmax ! range 0 < Ptest < fmax

if (0.5+sin(x(i))**2 > Ptest) then

m = m+1 ! entry x(i) passed the test

y(m) = x(i) ! so we record it in y(m)

end if

end do

Result:

Before Rejection

0

pi+----------------------> fx(x)

|#####################

|###################

|####################

|####################

|###################

|####################

|################### The original sequence contains

|#################### 8000 entries distributed uniformly

|##################### in the range pi < x < 3pi

|###################

|###################

|####################

|#####################

|####################

|#####################

|#####################

|####################

|###################

|#####################

|###################

3pi+

x


After Rejection

pi+---------------------> fy(y)

|#######

|#########

|############# The distribution of sequence y

|################# (after rejection of some entries)

|################### shows a sine-squared function of

|#################### amplitude 1.0 on a base of 0.5.

|################

|############# The number of entries is 5321

|########## which is approximately equal to:

|#######

|######## integral fy(y) [y=pi,3pi]

|########## 8000 ---------------------------

|############## fmax * (3pi-pi)

|#################

|#################### = 8000 2pi / (1.5*2pi) = 5333

|####################

|################# The number of rejected entries is

|############ 8000-5321 = 2679

|##########

|######

3pi+

y

Discussion:
The rejection method is less efficient than the transformation method as it requires two random numbers to be generated for each entry, and some numbers are wasted (rejected). Implementation is also more difficult. However, the advantage is that any distribution can be generated, whereas the transformation method is limited to distributions where the transformation function can be calculated.

Monte Carlo

Computer simulations of systems that involve some random process are called Monte Carlo simulations. In such simulations random numbers are generated with the appropriate distribution corresponding to the physical random process. There is a vast array of Monte Carlo applications in science and engineering. We will look at some basic Monte Carlo simulations next week.


7.3 Lab Exercises

1. Implement Algorithm 7b into a computer program, compile and run the program. Check that P("heads") = nheads/n tends to 0.5 for large n. Repeat this experiment replacing the coin with two dice. Determine the probability that a double six is thrown. Compare this with the theoretically expected outcome.

2. Implement Algorithm 7c (with the transformation to a triangular pdf) into a computer program, compile and run the program. Check that m1 = 2/3 and mu2 = 1/18. Modify the experiment to verify the identity E[20X+30] = 20 E[X] + 30.

3. Modify the above program to compute the expectation value and variance of the pdf fX(x) = 0.5 Sin(x) for 0<x<pi. Compare your results with the theoretical solutions. Hint: the transformation function can be found in the lecture notes.

4. A nucleus of an atom has a probability of decaying that is described by the pdf fT(t) = 0.1 exp(-0.1t) where t>=0 is the time in seconds. Write a program to perform a stochastic experiment to determine a) the probability that the nucleus survives 20 seconds, b) assuming the nucleus has already survived 10 seconds, what is the probability that the nucleus survives 20 seconds? Compare your results with the theoretical solutions. Hint: Algorithm 7d1 will be helpful for part a, and Algorithm 7d2 for part b; the required transformation function can be found in the lecture notes.


7.4 Lab Solutions

1. Implement Algorithm 7b into a computer program, compile and run the program. Check that P("heads") = nheads/n tends to 0.5 for large n. Repeat this experiment replacing the coin with two dice. Determine the probability that a double six is thrown. Compare this with the theoretically expected outcome.

Solution: eee484ex7a (see the downloads page).
This program counts the number of times two random numbers are both < 1/6, repeating this for 100,000,000 trials. The result is 2780456/100000000 = 0.02780456 = 1/35.965; the theoretical expectation is 1/6 * 1/6 = 1/36.

2. Implement Algorithm 7c (with the transformation to a triangular pdf) into a computer program, compile and run the program. Check that m1 = 2/3 and mu2 = 1/18. Modify the experiment to verify the identity E[20X+30] = 20 E[X] + 30.

Solution: eee484ex7b (see the downloads page).
The result for [n=100000000 and call random_number(x); x=sqrt(x)] is

E[X] = 0.6666696399773584

To find E[20X+30] for this pdf we simply transform each generated value of x as follows: x = 20x + 30. The result for [n=100000000 and call random_number(x); x=sqrt(x); x = 20*x + 30] is

E[20X+30] = 43.33339279955304

Note that 20 E[X] + 30 = 20 * 0.6666696399773584 + 30 = 43.33339279954717, and so E[20X+30] = 20 E[X] + 30 is shown experimentally. If you are not convinced then you can repeat the experiment with different values and different pdfs.

3. Modify the above program to compute the expectation value and variance of the pdf fX(x) = 0.5 Sin(x) for 0<x<pi. Compare your results with the theoretical solutions. Hint: the transformation function can be found in the lecture notes.

Solution: eee484ex7c (see the downloads page).
From the lecture notes, the transformation function is y = T(x) = Cos^(-1)(1-2x); i.e. to generate the pdf fX(x) = 0.5 Sin(x) for 0 < x < pi the Fortran code is:

call random_number(x) ! x is uniform {0 < x < 1}

x = acos(1-2*x) ! x = 0.5 Sin(y) {0 < x < pi}

The output of the program is:

mean, m1 = 1.5707902846841562 = pi/2 - 0.000006

variance, mu2 = 0.4673394386775112 = pi^2/4 - 2 - 0.00006

From theory: m1 = pi/2, m2 = pi^2/2 - 2, and so mu2 = pi^2/2 - 2 - (pi/2)^2 = pi^2/4 - 2 = 0.4674011002723394; the experimental results are in agreement with the theory.

4. A nucleus of an atom has a probability of decaying that is described by the pdf fT(t) = 0.1 exp(-0.1t) where t>=0 is the time in seconds. Write a program to perform a stochastic experiment to determine a) the probability that the nucleus survives 20 seconds, b) assuming the nucleus has already survived 10 seconds, what is the probability that the nucleus survives 20 seconds? Compare your results with the theoretical solutions. Hint: Algorithm 7d1 will be helpful for part a, and Algorithm 7d2 for part b; the required transformation function can be found in the lecture notes.


Solution: eee484ex7d1, eee484ex7d2 (see the downloads page).
From the lecture notes, the transformation is x = -log(x)/0.1.
Part a asks for P(X>20 seconds); the theoretical result is 1/e^2. The output of the program for n = 100000000 is P(X>20) = 0.13533197 = 1/e^2 - 0.000003.
Part b asks for P(X>20|X>10); the theoretical result is 1/e. The output of the program for n = 100000000 is P(X>20|X>10) = 0.36785103 = 1/e - 0.00003.
The experimental results are in agreement with the theoretical.


7.5 Example exam questions

Question 1

For the following transformation functions transforming a uniform pdf

fx(x) in the range 0 < x < 1 into non-uniform pdf fy(y):

a) y=SQRT(3+x), b) y=ArcCosine(1-2x), c) y=1/(x+0.5)

i. Determine the transformed probability density functions fy(y).

ii. Write down the range of y values.

iii. Sketch the distribution fy(y).

iv. Show that the integral of the transformed pdf is equal

to the integral of the original uniform pdf.

Question 2

Write a computer program that performs a frequency experiment

to determine the probability that the outcome of throwing a

coin and die is (a "head" and a "3").


8 Monte-Carlo Methods

8.1 Topics Covered

o Monte Carlo integration; the student should be able to write a computer program to integrate a function f(x) using the Monte Carlo method.

o Monte Carlo simulation; the student should be able to write a simple Monte Carlo simulation to solve a problem that involves some underlying random process.

8.2 Lecture Notes

Introduction

Armed with methods that allow us to generate any pdf we can now attempt to simulate less trivial physical processes. Such simulations are called Monte Carlo simulations. But first, we will use the Monte Carlo method as an alternative method for integration (Monte Carlo integration).

Monte Carlo Integration

In Monte Carlo integration numerical integration of a function is performed by making use of random numbers. We will see that for simple one-dimensional functions MC integration is not as effective as other numerical integration methods (though for high-dimensional integrals the MC method can be more efficient). To introduce Monte Carlo integration we will compute the area of a circle and hence a value for pi as follows: generate n pairs of random numbers, x and y, each uniformly distributed between 0 and 1; count the number of pairs m which satisfy the condition x^2 + y^2 < 1 (i.e. they lie inside a circle of unit radius); then the ratio m/n = pi/4 and so pi = 4m/n.

Algorithm 8a (Fortran syntax [mostly])

m=0, n=10000000

do i = 1, n

call random_number(x)

call random_number(y)

if ( x**2+y**2 < 1.0 ) m = m+1

end do

output n, 4*m/real(n)

Results for different values of n are tabulated below; estimates for pi are given in column 2 and the errors in column 3.

n 4*m/n Error = 4*m/n -pi

1,000 3.156 0.014407259

10,000 3.1308 -0.010792741

100,000 3.14768 0.006087259

1,000,000 3.143304 0.001711259

10,000,000 3.1420207 0.000428059

100,000,000 3.1417096 0.00011685899

1,000,000,000 3.1415832 -0.000009637012

The method works, but clearly it is not computationally efficient as it requires one billion random numbers to achieve only 4 or 5 decimal places of accuracy. As n increases the accuracy of the computed value of pi improves (except for some statistical variations). It can be shown, by repeating the experiment many times (with different seeds), that the error is generally proportional to 1/n^(1/2).


We will now look at a simple MC method for the integral I of a function f(x); the method is as follows:

1. Enclose the function in a box of area A and determine ymax.

y=f(x)

|

| ______________________________ y_max

| | / \ |

| | A _________/ \ |

| | / \ |

| | ____/ \ |

| | / I \_ |

| |/ \|

0 +--+----------------------------+--- x

a b

2. Uniformly populate the box with n random points: generate two random numbers r1, r2; a random point in the box is then x = a + (b-a) r1 and y = ymax r2.

3. Count the number of points m that lie below the curve f(x).

4. The integral is then estimated from: I/A = m/n, and so I = A (m/n) = (b-a) ymax m/n.

Example: in a previous lecture, "Numerical Integration", we used the Extended Trapezoidal Formula to integrate the function f(x) = x^3 - 3x^2 + 5 over the range x = 0.0 to 2.5. With n=1000 (1000 intervals) the result is 6.640627 (error = 0.000002). The MC integration is as follows: for this function, in the integration range, we have turning points at x=0.0 (maximum) and x=2 (minimum), and so ymax = f(0.0) = 5. So we have a = 0, b = 2.5, ymax = 5.0.

Algorithm 8b (Fortran syntax [mostly])

m = 0
input a, b, Ymax, n
do i = 1, n
  call random_number(r1)
  call random_number(r2)
  x = a + (b-a)*r1
  y = Ymax*r2
  if ( y < f(x) ) m = m+1
end do
output (b-a)*Ymax*m/real(n)
define function F(x) = x**3 - 3*x**2 + 5

The second program is a complete concise Fortran source using whole-arrays.

integer, parameter :: n = 100000000
integer :: m
real :: a=0.0, b=2.5, Ymax=5.0
real :: x(n), y(n)
call random_number(x); x = a + (b-a)*x
call random_number(y); y = Ymax*y
m = count(y < x**3-3*x**2+5)
print *, (b-a)*Ymax*m/real(n)
end

The result for n=1000 is 6.450 (error = -0.190625). Repeating for increasing values of n gives:

n m integral Error

1,000 516 6.450000 -0.190625

10,000 5320 6.650000 0.009375

100,000 53194 6.649250 0.008625

1,000,000 530692 6.633650 -0.006975

10,000,000 5309750 6.637187 -0.003437

100,000,000 53121655 6.640207 -0.000418

1,000,000,000 531243958 6.640550 -0.000076


Again the error reduces as 1/n^(1/2) (you have to perform many experiments, with different seeds, to see this more clearly). The accuracy of this simple Monte Carlo integration method is not good when compared to other numerical methods such as Trapezoidal or Simpson integration; however, the MC integration method can be refined to give improved accuracy (see text books for the details). Moreover, for high-dimensional integrations the MC method can be more efficient than other integration methods. MC integration is also useful for discontinuous shapes, for example a torus with a slice cut out of it; the shape is expressed more easily in a MC program than in Trapezoidal or Simpson integration algorithms.

Monte Carlo Simulation

We will look at two example simulations: 1. a binary communication system, and 2. propagation of errors through a measurement system.

1. A binary communication system

A binary communication system consists of a transmitter that sends a binary "0" or "1" over a channel to a receiver. On average, the transmitter transmits a "0" with a probability of 0.4 and a "1" with a probability of 0.6. The channel occasionally causes errors to occur, flipping a "0" to a "1" and a "1" to a "0"; the probability of this error occurring is 0.1.

Using this information, we wish to calculate the following:
a) The probability of a "1" being transmitted without error.
b) The probability of a "0" being transmitted without error.
c) If a "1" is observed at the receiver, what is the probability of it being correct?
d) If a "0" is observed at the receiver, what is the probability of it being correct?

The first two calculations are trivial, but the second two require Bayes' theorem. The solutions to these questions are given below.

P(A0)=0.4 P(A1)=0.6 Bayes’ Theorem

P(Y|X) = P(X|Y) P(Y) / P(X)

A0 A1

+ + Transmitter (A) a) asks for P(B1|A1) = 0.90000

|\ /| b) asks for P(B0|A0) = 0.90000

| \ / |

| \ / | c) asks for P(A1|B1) =

| \ / | P(B1|A1) P(A1) / P(B1) =

| \ / | 0.9 0.6 / (0.9 P(A1) + 0.1 P(A0)) =

P(B0|A0) | \ / | P(B1|A1) 0.54 / (0.9 0.6 + 0.1 0.4) =

= 0.9 | Channel | = 0.9 0.54 / (0.54 + 0.04) = 0.93103

| /\ |

| / \ | d) asks for P(A0|B0) =

| / \ | P(B0|A0) P(A0) / P(B0) =

| / \ | 0.9 0.4 / (0.9 P(A0) + 0.1 P(A1)) =

| / \ | 0.36 / (0.9 0.4 + 0.1 0.6) =

| / \ | 0.36 / (0.36 + 0.06) = 0.85714

|/ \|

+ + Receiver (B) Answers: a) 0.90000 b) 0.90000

B0 B1 c) 0.93103 d) 0.85714

Note that the probability of observing a correct "0" is less than that of a "1" because more 1's flip to 0's, as P(A1) > P(A0).


The above questions can be solved via a MC simulation; the simulation generates the correct fractions of zeros and ones, flips them with a probability of 10 percent, and counts the resulting number of zeros and ones and their history.

Algorithm 8c (Fortran syntax [mostly])

n = 10000000

correct0=0, correct1=0

incorrect0=0, incorrect1=0

do i = 1, n

call random_number(r) ! generate a binary "1" or "0"

if (r<0.4) then

bit=0 ! P{0}=0.4

else

bit=1 ! P{1}=0.6

end if

call random_number(r) ! give a 10% probability of an error

if (r<0.1) then ! flip the bit

if (bit==0) incorrect1=incorrect1+1

if (bit==1) incorrect0=incorrect0+1

else ! no error

if (bit==0) correct0=correct0+1

if (bit==1) correct1=correct1+1

end if

end do

output "a) P(of a 1 being transmitted without error)", correct1/(correct1+incorrect0)

output "b) P(of a 0 being transmitted without error)", correct0/(correct0+incorrect1)

output "c) P(1 is observed correctly)", correct1/(correct1+incorrect1)

output "d) P(0 is observed correctly)", correct0/(correct0+incorrect0)

The output for 10 million trials is

a) P(1 being transmitted without error) 0.90003

b) P(0 being transmitted without error) 0.90008

c) P(1 is observed correctly) 0.93116

d) P(0 is observed correctly) 0.85707

Note that "correct1+incorrect0" is the number of 1's at the transmitter, and "correct1+incorrect1" is the number of 1's at the receiver.
The values obtained for parts a and b provide a basic validation of the simulation; the values obtained for parts c and d provide a validation of Bayes' theorem.
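Algorithm 8c translates almost line for line into other languages; here is an illustrative Python version (the course code itself is Fortran):

```python
import random

# Illustrative Python translation of Algorithm 8c (binary channel simulation).
n = 200_000
correct0 = correct1 = incorrect0 = incorrect1 = 0
for _ in range(n):
    bit = 0 if random.random() < 0.4 else 1   # P{0}=0.4, P{1}=0.6
    if random.random() < 0.1:                 # 10% chance the channel flips the bit
        if bit == 0:
            incorrect1 += 1                   # a 0 arrives as a 1
        else:
            incorrect0 += 1                   # a 1 arrives as a 0
    else:
        if bit == 0:
            correct0 += 1
        else:
            correct1 += 1

print("a)", correct1 / (correct1 + incorrect0))   # ~0.900
print("b)", correct0 / (correct0 + incorrect1))   # ~0.900
print("c)", correct1 / (correct1 + incorrect1))   # ~0.931
print("d)", correct0 / (correct0 + incorrect0))   # ~0.857
```

For large n the four fractions converge to the analytical Bayes results above.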


2. Propagation of errors through a measurement system

Consider, for example, a pressure measurement system. Such a system may have several components that process a signal that is initially generated by a pressure transducer. For example, the system may contain the following components:

dT1 | dVs | dT2 |

| | |

v v v

+------------+ +------------+ +-----------+ +----------+

---->| Pressure |----->| Deflection |------>| Amplifier |----->| Recorder |----->

P | transducer | R | bridge | V1 | | V2 | | Pm

+------------+ +------------+ +-----------+ +----------+

A resistance R (Ohms) is output from the pressure transducer in response to an input pressure P (Pascals). The deflection bridge converts the resistance into a voltage V1 (mV) which in turn is amplified to the voltage V2 (mV) by the amplifier. Finally, the recorder outputs a reading Pm (Pascals).
Suppose that the response of the last three components is affected by random variations in the environment: variations from standard ambient temperature dT affect the gain of the deflection bridge and create a bias in the recorder output, and variations from the standard supply voltage dVs cause a bias in the output of the amplifier. The output of each component can be modeled as follows:

R = 0.0001 P

V1 = ( 0.04 + 0.00003 dT1 ) R

V2 = 1000 V1 + 0.13 dVs

Pm = 250 V2 + 2.7 dT2

Here dT has a Gaussian distribution centered at zero with a standard deviation sT = 3.0 C, and dVs has a Gaussian distribution centered at zero with a standard deviation sVs = 0.23 V [see sketches in the class].
We can first consider the output of the system for various inputs given standard environmental conditions; i.e. with dT and dVs set to their average values of zero. The model of the measurement system becomes:

R = 0.0001 P

V1 = ( 0.04 ) R

V2 = 1000 V1

Pm = 250 V2 = 250 (1000 V1) = 250000 (0.04 R) = 10000 R = 10000 (0.0001 P) = P

i.e. Pm = P and so the system is perfectly calibrated.
However, dT and dVs are randomly non-zero, causing a random variation in the outputs of each component. These random variations propagate through the system to the final output. Assuming Gaussian random variations, the standard deviation sO in the output O of a component for an input I is given by (here d denotes a partial derivative):

sO^2 = (sI dO/dI)^2 + (sA dO/dA)^2 + (sB dO/dB)^2 + (sC dO/dC)^2 + ....

Here, O is dependent on the input I and the random variables A, B, C, ... . Note that input I has a standard deviation sI due to random errors in the output of the previous component; the random errors therefore propagate through the system to the final output.
Including these random errors, the response of the system is shown below for an input of 5000 Pa.

R = 0.0001 P = 0.0001 5000 = 0.5 Ohms

V1 = ( 0.04 + 0.00003 dT1 ) 0.5

= 0.04 0.5 + 0.00003 dT1 0.5

= 0.02 mV + 0.000015 dT1


dT1 is Gaussian with sT = 3.0

and so sV1^2 = (sT dV1/dT1)^2 = ( 3.0 0.000015 )^2

and so sV1 = 0.000045 mV

with V1 = 0.02 mV (with dT1 = 0 on average)

V2 = 1000 V1 + 0.13 dVs

= 1000 0.02 + 0.13 dVs

= 20 mV + 0.13 dVs

dVs is Gaussian with sVs = 0.23 and we have also sV1 = 0.000045

and so sV2^2 = (sV1 dV2/dV1)^2 + (sVs dV2/dVs)^2

= (0.000045 1000)^2 + (0.23 0.13)^2

= (0.045)^2 + (0.029)^2

and so sV2 = 0.05403 mV

with V2 = 20 mV (with dVs = 0 on average)

Pm = 250 V2 + 2.7 dT2 [here we assume this dT2 is independent of the above dT1]

= 250 20 + 2.7 dT2

= 5000 Pa + 2.7 dT2

dT2 is Gaussian with sT = 3.0

and so sPm^2 = (sV2 dPm/dV2)^2 + (sT dPm/dT2)^2

= (0.05403 250)^2 + (3.0 2.7)^2

and so sPm = 15.75 Pa

with Pm = 5000 Pa (with dT2 = 0 on average)
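The chain of calculations above can be checked with a few lines of code; this Python sketch (illustrative, not part of the course downloads) reproduces the propagated standard deviations:

```python
import math

# Propagate the standard deviations component by component, as in the
# derivation above, for an input of P = 5000 Pa.
sT, sVs = 3.0, 0.23
R = 0.0001 * 5000.0                                    # = 0.5 Ohms

sV1 = sT * 0.00003 * R                                 # dV1/dT1 = 0.00003 R = 0.000015
sV2 = math.sqrt((sV1 * 1000.0)**2 + (sVs * 0.13)**2)   # dV2/dV1 = 1000, dV2/dVs = 0.13
sPm = math.sqrt((sV2 * 250.0)**2 + (sT * 2.7)**2)      # dPm/dV2 = 250, dPm/dT2 = 2.7

print(sV1)             # 4.5e-05 mV
print(round(sV2, 5))   # 0.05403 mV
print(round(sPm, 2))   # 15.75 Pa
```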

Note that again the system is perfectly calibrated, with an average output of 5000 Pa for an input of 5000 Pa, but the output is Gaussian distributed with a standard deviation of 15.75 Pa.
We can verify this calculation by performing a Monte Carlo simulation of the system. For this we need to be able to transform a uniform random variable into a Gaussian (normal) random variable. This can be achieved by applying the Box-Muller transformation (see rnrm.f90 and rnrm.c++ in the downloads page):

real function rnrm()

!----------------------------------------------------------------------------

! Returns a normally distributed deviate with zero mean and unit variance

! The routine uses the Box-Muller transformation of uniform deviates.

!----------------------------------------------------------------------------

real :: r, x, y

do

call random_number(x)

call random_number(y)

x = 2.0*x - 1.0

y = 2.0*y - 1.0

r = x**2 + y**2

if (r>0.0 .and. r<1.0) exit ! reject points outside the unit circle, and r=0 (log(0))

end do

rnrm = x*sqrt(-2.0*log(r)/r)

end function rnrm
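For comparison, the same polar (rejection) form of the Box-Muller method can be sketched in Python; this is an illustrative translation, not the rnrm.c++ file from the downloads page:

```python
import math
import random

def rnrm():
    """One normally distributed deviate (zero mean, unit variance)
    via the polar Box-Muller rejection method."""
    while True:
        x = 2.0 * random.random() - 1.0     # uniform in (-1, 1)
        y = 2.0 * random.random() - 1.0
        r = x * x + y * y
        if 0.0 < r < 1.0:                   # accept points inside the unit circle
            return x * math.sqrt(-2.0 * math.log(r) / r)

# Sanity check: the sample mean should be near 0 and the sd near 1.
samples = [rnrm() for _ in range(100_000)]
mean = sum(samples) / len(samples)
sd = (sum((s - mean)**2 for s in samples) / len(samples)) ** 0.5
print(mean, sd)
```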


The above measurement system is simulated in the following algorithm:

Algorithm 8d (Fortran syntax [mostly])

m1=0, m2=0, n = 100000000

sT=3.0, sVs=0.23, P=5000.

do i = 1, n

! random variables

dT1 = sT * rnrm() ! Gaussian pdf with standard deviation sT

dT2 = sT * rnrm() ! Gaussian pdf with standard deviation sT

dVs = sVs * rnrm() ! Gaussian pdf with standard deviation sVs

! model of the measurement system

R = 0.0001 * P ! the pressure transducer

V1 = ( 0.04 + 0.00003 * dT1 ) * R ! the deflection bridge

V2 = 1000 * V1 + 0.13 * dVs ! the amplifier

Pm = 250 * V2 + 2.7 * dT2 ! the recorder

! statistical variables

m1 = m1+Pm ! E[Pm]

m2 = m2+Pm**2 ! E[Pm^2]

end do

m1 = m1/n

m2 = m2/n

output "mean, m1 = ", m1

output "sd = sqrt(mu2) = ", sqrt(m2-m1**2)

The output for n = 100000000 is

mean, m1 = 4999.99994

sd = sqrt(mu2) = 15.75136

Notes:
1. The simulation is very simple and can be expanded easily to include more components and environmental effects.
2. The result is in very good agreement with the theoretical result (given large enough n).
3. Although the environmental effects dT1, dVs, dT2 are included in the models, and they are non-zero, the net result is a zero contribution giving a mean output of 5000.
4. In this treatment (and in the theoretical treatment) changes in the ambient temperature (dT1 and dT2) are considered independent of each other. However, in reality these quantities are the same, i.e. dT1 = dT2, and so are fully correlated; the actual standard deviation should therefore be larger. It is trivial to incorporate this situation in the algorithm (replace dT2 with dT1, and remove dT2); see the lab exercise.
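Algorithm 8d can likewise be sketched in Python; here random.gauss stands in for the rnrm() generator (an illustrative translation, not the course solution code):

```python
import random

# Illustrative Python version of Algorithm 8d (measurement-system MC).
n = 200_000
sT, sVs, P = 3.0, 0.23, 5000.0
m1 = m2 = 0.0
for _ in range(n):
    dT1 = random.gauss(0.0, sT)       # deflection-bridge temperature variation
    dT2 = random.gauss(0.0, sT)       # recorder temperature variation (independent here)
    dVs = random.gauss(0.0, sVs)      # supply-voltage variation
    R  = 0.0001 * P                   # the pressure transducer
    V1 = (0.04 + 0.00003 * dT1) * R   # the deflection bridge
    V2 = 1000.0 * V1 + 0.13 * dVs     # the amplifier
    Pm = 250.0 * V2 + 2.7 * dT2       # the recorder
    m1 += Pm                          # accumulate E[Pm]
    m2 += Pm * Pm                     # accumulate E[Pm^2]

m1 /= n
m2 /= n
sd = (m2 - m1**2) ** 0.5
print(m1, sd)   # roughly 5000 and 15.75
```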


8.3 Lab Exercises

Part A:

1. MC estimation of the volume of a sphere

The following Fortran program estimates the area of a circle of unit radius.

implicit none

integer :: i, m=0, n=10000000

real :: x, y

do i = 1, n

call random_number(x)

call random_number(y)

if ( x**2+y**2 < 1.0 ) m=m+1

end do

print *, 4.0*m/real(n)

end

Copy the program (or rewrite it in your language of choice), run and check it, and then modify it to estimate the volume of a sphere of unit radius. How many random trials (n) does it take to achieve an accuracy of three decimal places?

2. MC integration of a function f(x)

a) Write a Monte Carlo integration program to integrate the function f(x) = (1-x^2)^(1/2) over the range x = -1 to x = 1.
b) Sketch the integral and determine ymax.
c) Write, compile and run your program.
d) Compare your computed result with the analytical result: π/2.
e) How many trials (n) are required to achieve an accuracy of three decimal places?

Part B:

Propagation of errors through a measurement system

a) Code Algorithm 8d into a computer program, compile and run the program, and check that the output agrees with the theoretical result.
b) Verify that the mean output of the measurement system is equal to the input for the input values 1000 Pa, 2000 Pa, 4000 Pa, and 8000 Pa; comment on the size of the standard deviation for each input value.
c) In this model we assume that dT is independent for the deflection bridge and the recorder; modify your program such that dT is not independent (this is more realistic). What is the effect of this on the standard deviation of the output?


8.4 Lab Solutions

Part A:

1. MC estimation of the volume of a sphere

The following Fortran program estimates the area of a circle of unit radius.

implicit none

integer :: i, m=0, n=10000000

real :: x, y

do i = 1, n

call random_number(x)

call random_number(y)

if ( x**2+y**2 < 1.0 ) m=m+1

end do

print *, 4.0*m/real(n)

end

Copy the program (or rewrite it in your language of choice), run and check it, and then modify it to estimate the volume of a sphere of unit radius. How many random trials (n) does it take to achieve an accuracy of three decimal places?

Solution eee484ex8a (see the downloads page).
We simply add another dimension z and test x^2 + y^2 + z^2 < 1. This defines one eighth of the volume of the sphere (an octant), so the volume estimate is 8m/n (the true volume is 4π/3). The error in the estimate is therefore 8m/n - 4π/3, and it can be inspected for increasing values of n:

n 8*m/n Error = 8*m/n - 4pi/3

10 2.4 -1.7887903

100 4.4 0.21120968

1000 4.104 -0.08479032

10000 4.1768 -0.011990322

100000 4.20456 0.01576968

1000000 4.191408 0.0026176786

10000000 4.1885104 -0.00027992134 | 3 d.p.

100000000 4.1886897 -0.00010080135 | accuracy

It appears that we need of the order of n = 10^7 trials to gain a 3 decimal place accuracy! The number of trials required depends on the initial seed of the generator - there are significant statistical fluctuations.
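The modification described above can be sketched as follows (in Python for illustration; the actual solution eee484ex8a is Fortran):

```python
import math
import random

# MC estimate of the unit-sphere volume: sample one octant and scale by 8.
n = 1_000_000
m = 0
for _ in range(n):
    x, y, z = random.random(), random.random(), random.random()
    if x*x + y*y + z*z < 1.0:   # point falls inside the octant of the sphere
        m += 1

estimate = 8.0 * m / n
print(estimate, estimate - 4.0 * math.pi / 3.0)   # estimate and its error
```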

2. MC integration of a function f(x)

a) Write a Monte Carlo integration program to integrate the function f(x) = (1-x^2)^(1/2) over the range x = -1 to x = 1.
b) Sketch the integral and determine ymax.
c) Write, compile and run your program.
d) Compare your computed result with the analytical result: π/2.
e) How many trials (n) are required to achieve an accuracy of three decimal places?


Solution eee484ex8b (see the downloads page).
We have a = -1.0, b = +1.0, Ymax = 1.0 (by simple inspection), and the integral estimate (b-a) Ymax m/n. The error in the estimate is (b-a) Ymax m/n - π/2; we can investigate the effect of varying n as follows:

n m Estimate Error

10 5 1.000000 -0.570796

100 66 1.320000 -0.250796

1,000 793 1.586000 0.015204

10,000 7855 1.571000 0.000204

100,000 78674 1.573480 0.002684

1,000,000 785723 1.571446 0.000650 | 3 d.p.

10,000,000 7856866 1.571373 0.000577 | accuracy

100,000,000 78547805 1.570956 0.000160

1,000,000,000 785407068 1.570814 0.000018

So it appears (after investigating other seeds) that one requires of the order of n = 10^7 trials to gain a 3 decimal place accuracy!
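A sketch of such a hit-and-miss integration program (in Python for illustration; the course solution eee484ex8b is Fortran):

```python
import math
import random

# Hit-and-miss MC integration of f(x) = sqrt(1 - x^2) over [-1, 1].
a, b, ymax = -1.0, 1.0, 1.0
n = 1_000_000
m = 0
for _ in range(n):
    x = a + (b - a) * random.random()   # uniform x in [a, b)
    y = ymax * random.random()          # uniform y in [0, ymax)
    if y < math.sqrt(1.0 - x*x):        # hit: the point lies under f(x)
        m += 1

estimate = (b - a) * ymax * m / n
print(estimate, estimate - math.pi / 2.0)   # estimate and its error
```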

Part B:

Propagation of errors through a measurement system

a) Code Algorithm 8d into a computer program, compile and run the program, and check that the output agrees with the theoretical result.
b) Verify that the mean output of the measurement system is equal to the input for the input values 1000 Pa, 2000 Pa, 4000 Pa, and 8000 Pa; comment on the size of the standard deviation for each input value.
c) In this model we assume that dT is independent for the deflection bridge and the recorder; modify your program such that dT is not independent (this is more realistic). What is the effect of this on the standard deviation of the output?

Solution eee484ex8c (see the downloads page).
a) For n = 10000000 and P = 5000 the output is: mean = 5000.005, sd = 15.758
b) For n = 10000000:
P = 1000 the output is: mean = 1000.001, sd = 11.251
P = 2000 the output is: mean = 2000.002, sd = 11.909
P = 4000 the output is: mean = 4000.004, sd = 14.237
P = 8000 the output is: mean = 8000.008, sd = 21.118
The relative error reduces as the input increases; this can be seen by inspecting the mathematical model of the measurement system.
c) To make dT not independent, simply replace "dT2 = sT * rnrm()" with "dT2 = dT1"; the result is an increase in the output standard deviation from 15.75 to 20.75 (when dT is independent the two values sometimes partially cancel, if they have opposite signs).


8.5 Example exam questions

Question 1

Write a computer program to perform a Monte Carlo integration of the

function y = 5 sqrt(x) - x over the interval x=1.0 to x=9.0.


A Linux Tutorial

Linux Tutorial - Simple data manipulation programs under Linux

Your central interactive computer account uses the Linux operating system. Linux is one of

several variants of Unix. Many basic though very useful data manipulation programs exist

under Linux. We will look at the following programs:

cat - concatenate files

sort - sort lines of text files

uniq - remove duplicate lines from a sorted file

diff - find differences between two files

echo - display a line of text

sed - a Stream EDitor

tr - translate or delete characters

grep - search for lines in a file matching a pattern

head - output the first part of files

tail - output the last part of files

wc - print the number of bytes, words, and lines in files

cut - remove sections from each line of files

For detailed information about these commands type:

man 'command' or info 'command' or 'command' --help

Additionally, we can use "output redirection" (the ’>’ symbol) to redirect the output of

a program to a file instead of the screen, "append" (the ’>>’ symbol) to add(append) to

files, and "piping" (the ’|’ symbol) to "pipe" the output of one program into another

program.

Copy the following data files to your file space; you can do this with the
command get-EEE484-unix. Starting with files file1.dat file2.dat file3.dat we can

perform the following operations (you can view the contents of a file with the command

"cat filename"):

The file contents are:

file1.dat file2.dat file3.dat

--------- --------- ---------

cake house bed

hat pen fence

pool fish tool

ten cake one

tool comb

Join the contents into a single file:

$ cat file1.dat file2.dat file3.dat > file4.dat

the output is sent to the file file4.dat, view it with 'cat file4.dat'

Sort into alphabetical order:

$ sort file4.dat > file5.dat

output to file5.dat, view it with 'cat file5.dat'


Remove multiple entries:

$ uniq file5.dat > file6.dat

We can combine the last two operations with:

$ sort file4.dat | uniq > file6.dat

(note that there is no file5.dat required)

Look at the difference:

$ diff file5.dat file6.dat

Append the word 'dog' to the file:

$ echo dog >> file6.dat

Again sort into alphabetical order:

$ sort file6.dat > file7.dat

Replace the word 'house' with the word 'home':

$ sed "s/house/home/g" file7.dat > file8.dat

Translate all letters between a and z with their upper case values:

$ cat file8.dat | tr a-z A-Z > file9.dat

Search for the word 'home' in the file:

$ grep home file9.dat

Search for the word 'HOME' in the file:

$ grep HOME file9.dat

Search for the word 'home' ignoring case:

$ grep -i home file9.dat

And again giving the line number of any occurrences:

$ grep -i -n home file9.dat

Display the first 8 lines:

$ head -n8 file9.dat

Display the last 5 lines:

$ tail -n5 file9.dat

Count the number of lines, words and characters in the file:

$ wc file9.dat

Display the first three characters of each line:

$ cut -b1-3 file9.dat

Display the second to fourth characters of each line:

$ cut -b2-4 file9.dat


Lab Exercise - Linux

Download the file lep.dat and perform the following analysis.

1. Use wc to determine how many particles are there in the list.

2. Use cut, sort and uniq to form a unique list of particle species;

how many species of particles are there?

3. Use grep and wc to determine how many particles there are of each species.

4. Use cut, sort, head, and tail to determine the maximum and minimum

particle momenta. Which particles are they?

5. Use piping; i.e. don't waste time creating intermediate files. Repeat the exercise
(except part 3) with the file lep2.dat; the file contains many more particles.

Solution for Lab Exercise Linux

1. Use wc to determine how many particles are there in the list.

Answer:

$ wc lep.dat

26 52 442 lep.dat

26 lines implies 26 particles in the list.

2. Use cut, sort and uniq to form a unique list of particle species;

how many species of particles are there?

Answer:

$ cut -b1-7 lep.dat | sort | uniq

KAON-

NEUTRON

PHOTON

PION+

PION-

PROTON

There are six particle species in the list.

3. Use grep and wc to determine how many particles there are of each species.

Answer:

We search for each particle type (6 of them) and count.

$ grep KAON- lep.dat | wc

1 2 17

$ grep NEUTRON lep.dat | wc

2 4 34

$ grep PHOTON lep.dat | wc

10 20 170

$ grep PION+ lep.dat | wc

7 14 119

$ grep PION- lep.dat | wc

5 10 85


$ grep PROTON lep.dat | wc

1 2 17

The result is: 1 Kaon, 2 Neutrons, 10 Photons, 7 positive Pions,

5 negative Pions, and 1 proton; (26 in total)

4. Use cut, sort, head, and tail to determine the maximum and minimum particle momenta.

Which particles are they?

Answer:

$ cut -b11-16 lep.dat | sort -n | head -n1

0.120

$ cut -b11-16 lep.dat | sort -n | tail -n1

22.284

The minimum momentum is 0.120 GeV/c, the maximum is 22.284 GeV/c

$ grep 0.120 lep.dat

PHOTON 0.120

$ grep 22.284 lep.dat

PION- 22.284

The minimum momentum belongs to a Photon. The maximum momentum belongs to a

negative Pion.

5. Use piping; i.e. don't waste time creating intermediate files. Repeat the exercise
(except part 3) with the file lep2.dat; the file contains many more particles.

Answer:

$ wc -l lep2.dat

980 lep2.dat

There are 980 particles in the list.

$ cut -b1-8 lep2.dat | sort | uniq | wc -l

87

There are 87 particle species in the list.

$ cut -b11-19 lep2.dat | sort -n | head -n1

0.001432

$ cut -b11-19 lep2.dat | sort -n | tail -n1

40.19147

The minimum momentum is 0.001 GeV/c, maximum is 40.191 GeV/c.

$ grep 0.001432 lep2.dat

GAMMA 0.00143232674

$ grep 40.191 lep2.dat

B*- 40.1914749

The minimum momentum belongs to a GAMMA. The maximum momentum belongs to a B*-.
