Copyright © 1978, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.



A METHOD FOR COMPUTING FUNCTIONS OF A MATRIX

BASED ON NEWTON'S DIVIDED DIFFERENCE FORMULA

by

A.C. McCurdy

Memorandum No. UCB/ERL M78/69

October 1978

ELECTRONICS RESEARCH LABORATORY

College of Engineering, University of California, Berkeley

94720


A Method for Computing Functions of a Matrix Based on

Newton's Divided Difference Formula

Contents

1. Definitions and Representations of Matrix Functions

1.1 Definition of f(A)

1.2 Other representations of f(A), the Newton polynomial

2. Divided Differences

2.1 Basic definitions and properties of divided differences

2.2 Divided differences for some special functions

2.3 Difficulties in computing divided differences

2.4 Power series for Δ_0^n f

2.5 Inadequacy of the series and recursive formulas, the question of "close" abscissae

2.6 The divided difference table as a matrix

2.7 A scaling and squaring method for the exponential

2.8 The complete divided difference table

3. Computing the Newton Polynomial of f(A)

3.1 Complex matrices

3.2 Real matrices

4. Use of the Newton Polynomial Method

4.1 Other methods for computing f(A)

Appendix

A. Error bound for Algorithm 2.2

B. Error bound for Algorithm 2.4

References

Acknowledgment

The author is most grateful to Prof. Beresford Parlett for his comments and help in getting this report written.

Research sponsored by the Office of Naval Research Grant N00014-76-C-0013.


A Method for Computing Functions of a Matrix Based on Newton's Divided Difference Formula

A function of a constant square matrix A, written f(A), may be defined as a polynomial in A. One particular polynomial representation for f(A) is based on Newton's divided difference formula for interpolating polynomials. The coefficients of the polynomial are divided differences of f at the eigenvalues of A. Several methods for computing divided differences will be investigated, and it will also be shown that a table of divided differences is, itself, a function of a very special matrix. Finally, some details in using the Newton polynomial approach for computing f(A) will be discussed in order to point out possible ways to make the computation more efficient.

1. Definitions and Representations of Matrix Functions

1.1 Definition of f(A).

Let A be an (n+1) × (n+1) complex matrix and let Λ_A = {λ_0, ..., λ_0, λ_1, ..., λ_1, ..., λ_l, ..., λ_l} be the set of eigenvalues of A, l+1 of which are distinct. Each distinct eigenvalue λ_i occurs m_i+1 times, i = 0, 1, ..., l, and Λ_A has

Σ_{i=0}^{l} (m_i + 1) = n + 1

elements. The elements of Λ_A are the roots of the characteristic polynomial of A,

χ_A(λ) = (λ − λ_0)^{m_0+1} (λ − λ_1)^{m_1+1} ··· (λ − λ_l)^{m_l+1}.   (1.1.1)

When the eigenvalues of A are distinct, the definition of f(A) simply requires that f(λ) be defined for each λ ∈ Λ_A. However, for greater generality we must be a bit more careful in order to allow for multiple eigenvalues. With this in mind, f will be required to satisfy the following.

Definition: The function f is said to be "defined on the characteristic values of A" when f(λ_i), f′(λ_i), ..., f^(m_i)(λ_i) for i = 0, 1, ..., l are all defined and finite. This set of values is denoted, for brevity, by f(Λ_A).

With f satisfying the above, f(A) may be defined as a polynomial in the matrix A.

Definition: Function of a matrix. If f is defined on the characteristic values of A and p is any polynomial such that

p(Λ_A) = f(Λ_A)

then

f(A) = p(A).

The polynomial p is an osculating interpolation polynomial for f on Λ_A. That is, p(λ_i) = f(λ_i), p′(λ_i) = f′(λ_i), ..., p^(m_i)(λ_i) = f^(m_i)(λ_i) for i = 0, 1, ..., l. When the eigenvalues are distinct, the definition of f(A) becomes particularly simple, as then p is just the ordinary


interpolation polynomial for f at the elements of Λ_A. Some authors choose to define f(A) in terms of a power series in A and A's Jordan canonical form [3,18]; however, the above definition is most natural for the purposes of this paper.

The rationale behind the definition of f(A) is that for any two functions f and g, f(A) is indistinguishable from g(A) when f(Λ_A) = g(Λ_A). The set of zeros of f(λ) − g(λ) includes the roots of χ_A(λ), and χ_A(A) = 0 by the Cayley-Hamilton theorem. The interpolating polynomial p has degree at least n, as it must satisfy the n+1 conditions given in the definition.† Interpolating polynomials p may be chosen to satisfy additional conditions; however, the degree of the polynomial will be increased.

The characteristic polynomial χ_A is an annihilating polynomial for A because χ_A(A) = 0. However, for some matrices A there are polynomials of smaller degree which are also annihilating polynomials. The minimal polynomial μ_A is the non-trivial annihilating polynomial of least degree. If μ_A(λ) has degree m+1, then m ≤ n. It is possible to define f(A) in terms of a polynomial p_m of degree m which interpolates f at the m+1 roots of the minimal polynomial μ_A. Gantmacher [4] uses this slightly more general approach in his definition of f(A). The set of roots of μ_A(λ) is contained in the set Λ_A of eigenvalues of A. For m < n, fewer derivatives of f need be specified; however, μ_A and the multiplicities of its roots are generally difficult and costly to obtain. Thus, we shall not try to form f(A) = p_m(A) for the smallest possible degree m, but shall consider other polynomials. This may lead to polynomials of significantly higher degree than p_m in a few cases (see Fig. 1.2.1), but it also leads to greater simplicity in that less need be known in advance about the matrix A.

A = [  0  1  2
       0  1  0
      -1  1  3 ]

μ_A(λ) = (λ−1)(λ−2)

χ_A(λ) = (λ−1)^2 (λ−2)

Fig. 1.1.1: Degree of μ_A may be less than degree of χ_A.

That f(A) is representable as a polynomial in A leads to two elementary, but very important, consequences.

(1). For any (n+1) × (n+1) nonsingular matrix P,

f(PAP^{-1}) = P·f(A)·P^{-1}.   (1.1.2)

In theory, this permits all work to be performed on the simplest matrix similar to A, e.g. the Jordan canonical form. In practice, P may be difficult to compute accurately‡ and some less simple form may be required.

(2). Commutativity.

A·f(A) = f(A)·A   (1.1.3)

Parlett [15] has developed a fast method for computing f(A) based on this property (§4.1).

†A polynomial of degree k can interpolate at, at most, k+1 points. In general, k+1 points uniquely determine a polynomial of degree k through the points; higher degree polynomials are not uniquely determined.
‡Kågström [10] gives a method for computing the Jordan form. Golub and Wilkinson [6] discuss limitations in computing the Jordan canonical form.


1.2 Other representations of f(A), the Newton polynomial.

Additional representations of f(A) are obtained from other ways of writing interpolating polynomials p which satisfy p(Λ_A) = f(Λ_A). One such p is the "Newton polynomial" p_n, which is derived directly from Newton's divided difference formula for the interpolating polynomial [12], namely†

p_n(λ) = Σ_{k=0}^{n} Δ_0^k f ∏_{j=0}^{k-1} (λ − λ_j).   (1.2.1)

The first few terms of this polynomial are

f(λ_0) + Δ_0^1 f·(λ−λ_0) + Δ_0^2 f·(λ−λ_0)(λ−λ_1) + Δ_0^3 f·(λ−λ_0)(λ−λ_1)(λ−λ_2) + ··· .

This polynomial satisfies p_n(Λ_A) = f(Λ_A), where Λ_A = {λ_0, λ_1, ..., λ_n}, the eigenvalues having been renumbered.

Newton polynomial for f(A). If f is defined on the characteristic values of A, then taking Λ_A as the set of abscissae for the divided differences,

f(A) = Σ_{k=0}^{n} Δ_0^k f ∏_{j=0}^{k-1} (A − λ_j I).   (1.2.2)

A: a diagonalizable matrix with

μ_A(λ) = (λ−1)(λ−2),   χ_A(λ) = (λ−1)^4 (λ−2)

B = [ 1  1
         1  1
            1  1
               1
                  2 ]

μ_B(λ) = (λ−1)^4 (λ−2),   χ_B(λ) = μ_B(λ)

p_m(A) = f(1)·I + [f(2) − f(1)]·(A − I)

p_m(B) = Σ_{k=0}^{3} [f^(k)(1)/k!]·(B − I)^k + Δ_0^4 f·(B − I)^4

p_n(A) = Σ_{k=0}^{3} [f^(k)(1)/k!]·(A − I)^k + Δ_0^4 f·(A − I)^4,   p_n(B) = p_m(B)

Fig. 1.2.1: p_m depends on the eigenspaces of the matrix.

Series representations of f(A) are also possible. Following Gantmacher [4], when Σ_{k=0}^{∞} u_k(λ) is defined (converges) on the characteristic values of A to f(λ), that is,

f(λ_i) = Σ_{k=0}^{∞} u_k(λ_i), ..., f^(m_i)(λ_i) = Σ_{k=0}^{∞} u_k^(m_i)(λ_i)   for i = 0, 1, ..., l,

denoted f(Λ_A) = Σ_{k=0}^{∞} u_k(Λ_A), then

f(A) = Σ_{k=0}^{∞} u_k(A).   (1.2.3)

In particular, if f can be expanded in a power series about λ = a,

f(λ) = Σ_{k=0}^{∞} γ_k (λ − a)^k   (1.2.4)

†See §2.1 for details on divided differences and divided difference notation.


with circle of convergence |λ − a| < R, then when Λ_A ⊂ {λ : |λ − a| < R},

f(A) = Σ_{k=0}^{∞} γ_k (A − aI)^k.   (1.2.5)

Within the circle of convergence γ_k = f^(k)(a)/k!, the Taylor coefficients.

Taylor series representation for f(A). When f is representable by a Taylor series about a on a domain Ω and Λ_A ⊂ Ω, then

f(A) = Σ_{k=0}^{∞} [f^(k)(a)/k!] (A − aI)^k.   (1.2.6)

One other seldom seen representation derives from Newton's divided difference series [5],

f(λ) = Σ_{k=0}^{∞} Δ_0^k f ∏_{j=0}^{k-1} (λ − μ_j),   (1.2.7)

where the set of abscissae for the divided differences, M = {μ_j}_{j=0}^{∞}, is contained in the domain of convergence of the series.

Newton series representation for f(A). When f is representable by a Newton divided difference series about M on a domain Ω and Λ_A ⊂ Ω, then

f(A) = Σ_{k=0}^{∞} Δ_0^k f ∏_{j=0}^{k-1} (A − μ_j I).   (1.2.8)

If the first n+1 elements of M are the set Λ_A, i.e. μ_0 = λ_0, ..., μ_n = λ_n, the Newton series (1.2.8) reduces to the Newton polynomial (1.2.2) because ∏_{j=0}^{n} (A − λ_j I) = 0 by the Cayley-Hamilton theorem.

For some of the elementary functions, such as exp, sin, cosh, the Taylor series and the Newton polynomial representations of f(A) recommend themselves for numerical computation. The Taylor coefficients are easy to compute and are independent of A. Unfortunately, the series may converge slowly for some A. The Newton polynomial is exact after only n+1 terms, so convergence is no problem. This great savings in work is diminished by the need to compute the eigenvalues of A. However, if the dimension of A is not large, say n ≤ 100, efficient and accurate algorithms now exist for finding all the eigenvalues [17]. Until recently the eigenvalue problem was thought to be very difficult, but now the need to find the eigenvalues of a "small" matrix is no drawback for a method. The difficulty in employing the Newton polynomial lies in computing its coefficients (i.e. the divided differences). Often, divided differences of f can be obtained quickly and accurately. When this is the case, the Newton polynomial becomes a practical method for computing functions of a matrix. The next section will study the divided difference problem.


2. Divided Differences

2.1 Basic definitions and properties of divided differences.

Divided differences were studied extensively in classical precomputer numerical analysis as part of the finite difference calculus. They saw great use in the tabulation of tables of function values. This subject, along with the general topic of interpolation, is treated in Milne-Thomson [12].

Quite different purposes are envisioned here; however, much of the classical theory is still relevant. Before proceeding to develop formulas for the calculation of divided differences, a few well-known definitions and consequences will be presented. The notation used will be somewhat different from that of other authors, but it is felt to be an improvement. Once understood, it should cause no confusion to those already familiar with divided differences.

Most common notations for divided differences are cumbersome. For clarity we shall begin with such a notation, but later reduce it to a more compact form by suppressing unneeded information. Let f be a function of a single variable ζ and let f(ζ) be defined, at least, on the set Z = {ζ_0, ζ_1, ..., ζ_n} of distinct complex numbers. Z is called the set of abscissae, or sometimes the set of data points or the set of nodes. The 0-th divided difference of f at ζ_0 is defined as

(Δ^0 f)(ζ_0) = f(ζ_0).   (2.1.1)

The first divided difference of f at ζ_0 is a function of the two variables (abscissae) ζ_0 and ζ_1 and is formed from the 0-th divided difference by the familiar formula

(Δ^1 f)(ζ_0, ζ_1) = [(Δ^0 f)(ζ_1) − (Δ^0 f)(ζ_0)] / (ζ_1 − ζ_0) = [f(ζ_1) − f(ζ_0)] / (ζ_1 − ζ_0).   (2.1.2)

The k-th order divided difference of f at ζ_0 is a function of the k+1 abscissae ζ_0, ζ_1, ..., ζ_k and is defined recursively from (k−1)-st order divided differences, which are functions of k abscissae.

First recursive definition of divided differences. When f is defined on Z, the k-th order, 0 < k ≤ n, divided differences of f at ζ_j for j = 0, 1, ..., n−k are

(Δ^k f)(ζ_j, ζ_{j+1}, ..., ζ_{j+k}) = [(Δ^{k-1} f)(ζ_{j+1}, ..., ζ_{j+k}) − (Δ^{k-1} f)(ζ_j, ..., ζ_{j+k-1})] / (ζ_{j+k} − ζ_j).   (2.1.3)

(Δ^k f)(ζ_j, ζ_{j+1}, ..., ζ_{j+k}) has no dependence on abscissae with indices < j or > j+k, and so no generality is lost by considering just (Δ^n f)(ζ_0, ζ_1, ..., ζ_n).

Divided differences are very special functions of the data points in Z. Not only does the number of data points used increase with the order of the difference, but the divided difference is symmetric in its arguments. This may be seen by employing an equivalent representation of the divided difference in terms of determinants [12].

(Δ^n f)(ζ_0, ζ_1, ..., ζ_n) = det[ f(ζ_i)  ζ_i^{n-1}  ···  ζ_i  1 ]_{i=0}^{n} / det[ ζ_i^n  ζ_i^{n-1}  ···  ζ_i  1 ]_{i=0}^{n},   (2.1.4)

the quotient of two (n+1) × (n+1) determinants; the denominator is the Vandermonde determinant, and the numerator is obtained from it by replacing the column of ζ_i^n by the values f(ζ_i).

The abscissae Z = {ζ_0, ζ_1, ..., ζ_n} may be arranged in any order without changing the value of (Δ^n f)(ζ_0, ζ_1, ..., ζ_n). This is the very important symmetry property.


Symmetry property. Let π be a permutation on the set of indices 0, 1, ..., n. Then

(Δ^n f)(ζ_0, ζ_1, ..., ζ_n) = (Δ^n f)(ζ_{π(0)}, ζ_{π(1)}, ..., ζ_{π(n)}).   (2.1.5)

The primary defect in definition (2.1.3) is that the data points must be distinct. However, if f is differentiable, (2.1.3) may still be defined even for confluent (i.e. equal) abscissae. In particular, when Z = {ζ_0, ζ_0, ..., ζ_0}, (2.1.3) is defined if f^(n)(ζ_0) exists. For confluent abscissae the divided difference reduces to

(Δ^n f)(ζ_0, ζ_0, ..., ζ_0) = f^(n)(ζ_0) / n!.   (2.1.6)

Since, in theory, the data points can be arranged in any order without changing the value of the divided difference, it is clear that for suitable f (2.1.3) will always be defined if (2.1.6) is used when confluent abscissae occur. So, the requirement that the abscissae be distinct may be removed.

Definition: Let Z = {ζ_0, ..., ζ_0, ζ_1, ..., ζ_1, ..., ζ_l, ..., ζ_l} be the set of abscissae (just a renumbering of the previous Z) where each ζ_i appears m_i+1 times, Σ_{i=0}^{l} (m_i + 1) = n + 1. The function f is said to be "defined on the set of abscissae Z" when f(ζ_i), f′(ζ_i), ..., f^(m_i)(ζ_i) for i = 0, 1, ..., l are all defined and finite. This set of values is denoted f(Z).

Before rewriting the definition (2.1.3) in more generality, a more compact notation will be introduced. In the work here, the set of data points Z = {ζ_0, ζ_1, ..., ζ_n} is given and, usually, in a fixed order. Hence, reference to Z may be suppressed. Thus,

Δ_j^k f = (Δ^k f)(ζ_j, ζ_{j+1}, ..., ζ_{j+k}).   (2.1.7)

In the event that the set Z must be emphasized, Δ_Z^k f will be written for Δ_j^k f.†

Recursive definition of divided differences. When f is defined on the set of abscissae Z, then for k = 1, 2, ..., n

Δ_j^k f = (Δ_{j+1}^{k-1} f − Δ_j^{k-1} f) / (ζ_{j+k} − ζ_j),   j = 0, 1, ..., n−k,   (2.1.8)

where Δ_j^0 f = f(ζ_j).

This definition of divided differences and the definition of the Newton polynomial in §1.2 are seen to be consistent. Indeed, if Z = Λ_A, "defined on the set of abscissae Z" and "defined on the characteristic values of A" are the same.

Divided differences have many useful representations and properties. Several of these arenow listed.

Divided difference tables. Divided differences are most conveniently presented in tables. Traditionally, they are arranged in a display like that in Fig. 2.1.1. Each divided difference is computed from its two immediate neighbors in the column to its left. For our purposes, it turns out to be most useful to define the table as an upper triangular matrix.

†Milne-Thomson [12] writes Δ_0^n f as [ζ_0, ζ_1, ..., ζ_n], suppressing the function; Davis [2] uses f[ζ_0, ζ_1, ..., ζ_n]; and Kahan [11] uses Δf(ζ_0, ζ_1, ..., ζ_n), which suggested the notation used here.


ζ_0   f(ζ_0)
                Δ_0^1 f
ζ_1   f(ζ_1)              Δ_0^2 f
                Δ_1^1 f             Δ_0^3 f
ζ_2   f(ζ_2)              Δ_1^2 f             Δ_0^4 f
                Δ_2^1 f             Δ_1^3 f
ζ_3   f(ζ_3)              Δ_2^2 f
                Δ_3^1 f
ζ_4   f(ζ_4)

Fig. 2.1.1: Standard divided difference table.

Δf ≡ [ f(ζ_0)  Δ_0^1 f  Δ_0^2 f  ···  Δ_0^n f
               f(ζ_1)   Δ_1^1 f  ···  Δ_1^{n-1} f
                        f(ζ_2)   ···  Δ_2^{n-2} f
                                  ⋱       ⋮
                                      f(ζ_n)      ]   (2.1.9)

The symbol Δf, without the superscript, is used here as a matrix, not a scalar. The elements of the matrix depend on their immediate neighbors in the diagonal below.

f(ζ_j)      →  Δ_j^1 f      →  Δ_j^2 f      →  Δ_j^3 f      →  Δ_j^4 f
                   ↑               ↑               ↑               ↑
f(ζ_{j+1})  →  Δ_{j+1}^1 f  →  Δ_{j+1}^2 f  →  Δ_{j+1}^3 f
                   ↑               ↑               ↑
f(ζ_{j+2})  →  Δ_{j+2}^1 f  →  Δ_{j+2}^2 f
                   ↑               ↑
f(ζ_{j+3})  →  Δ_{j+3}^1 f
                   ↑
f(ζ_{j+4})

Fig. 2.1.2: Pattern of dependence in a divided difference table.

This causes a "pattern of dependence" where Δ_j^k f is independent of all table entries in rows before the j-th and columns after the (j+k)-th. This pattern of dependence is characteristic of triangular matrices.
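The triangular table (2.1.9) and its pattern of dependence translate directly into a short program. The following Python sketch (our illustration; `dd_table` is a hypothetical name) fills an upper triangular matrix T with T[j, j+k] = Δ_j^k f, each entry computed from its two neighbors in the diagonal below:

```python
import numpy as np

def dd_table(f, zs):
    # Upper triangular divided difference table (2.1.9):
    # T[j, j+k] holds Delta_j^k f, built diagonal by diagonal
    # following the pattern of dependence of Fig. 2.1.2.
    n = len(zs)
    T = np.zeros((n, n))
    for j in range(n):
        T[j, j] = f(zs[j])                 # Delta_j^0 f = f(zeta_j)
    for k in range(1, n):
        for j in range(n - k):
            T[j, j + k] = (T[j + 1, j + k] - T[j, j + k - 1]) / (zs[j + k] - zs[j])
    return T

T = dd_table(np.exp, np.array([0.0, 1.0, 2.0, 3.0, 4.0]))
```

The strictly lower part of T stays zero, mirroring the fact that the table is a genuinely triangular object.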

Linearity. If α and β are scalars,

Δ_0^n (αf + βg) = α Δ_0^n f + β Δ_0^n g.   (2.1.10)

Translation invariance. For Z−a = {ζ_0−a, ζ_1−a, ..., ζ_n−a} and f_a(ζ) = f(ζ−a),

Δ_Z^n f_a = Δ_{Z−a}^n f.   (2.1.11)


Δexp = [ 1.000  1.718  1.476  .8455  .3632
                2.718  4.671  4.013  2.298
                       7.389  12.70  10.91
                              20.09  34.51
                                     54.60 ]

Fig. 2.1.3: Divided difference table for f = exp, Z = {0, 1, 2, 3, 4}.

For example,

Δ_Z^1 f_a = [f_a(ζ_1) − f_a(ζ_0)] / (ζ_1 − ζ_0) = [f(ζ_1−a) − f(ζ_0−a)] / [(ζ_1−a) − (ζ_0−a)] = Δ_{Z−a}^1 f.

Mean value representation. When the abscissae are real, then [12]

Δ_0^n f = f^(n)(ξ) / n!,   min ζ_i < ξ < max ζ_i,   (2.1.12)

for any f having n continuous derivatives in the interval containing the data points. This representation has no equivalent for the case of complex abscissae. For example, if f = exp and ζ_0 = ζ, ζ_1 = ζ + 2πi, then

Δ_0^1 exp = (e^{ζ+2πi} − e^ζ) / ((ζ + 2πi) − ζ) = 0 ≠ e^ξ

for any finite ξ.

Integral representation. Another representation for Δ_0^n f, when f has a bounded n-th order derivative on a closed convex domain Ω containing Z, is [5]

Δ_0^n f = ∫_0^1 ∫_0^{t_1} ··· ∫_0^{t_{n-1}} f^(n)(ζ_0 + t_1(ζ_1−ζ_0) + ··· + t_n(ζ_n−ζ_{n-1})) dt_n ··· dt_2 dt_1.   (2.1.13)

Bound. If f has a bounded n-th derivative on the closed convex domain Ω containing the set of data points Z, then [5]

|Δ_0^n f| ≤ (1/n!) max_{ζ∈Ω} |f^(n)(ζ)|.   (2.1.14)

Δ_0^n f may be represented in other ways, such as by contour integrals [5]; however, these are not needed here. The reader is referred to books on the finite difference calculus by Milne-Thomson [12], Gel'fond [5], or Jordan [7] for more details on divided differences and their uses.
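The mean value representation (2.1.12) and the bound (2.1.14) are easy to check numerically for real abscissae. The sketch below (ours) uses f = exp on Z = {0, 1, 2, 3, 4}, the data of Fig. 2.1.3, and recovers a ξ strictly inside the interval containing the data points:

```python
import math

# Numerical check of (2.1.12) and (2.1.14) for f = exp on Z = {0,1,2,3,4} (n = 4).
zs = [0.0, 1.0, 2.0, 3.0, 4.0]
d = [math.exp(z) for z in zs]
for k in range(1, len(zs)):
    # one sweep of the recursive formula (2.1.8)
    d = [(d[j + 1] - d[j]) / (zs[j + k] - zs[j]) for j in range(len(d) - 1)]
dd4 = d[0]                                  # Delta_0^4 exp, cf. Fig. 2.1.3
xi = math.log(math.factorial(4) * dd4)      # the xi with exp(xi)/4! = Delta_0^4 exp
bound = math.exp(4.0) / math.factorial(4)   # (1/n!) max |exp^(4)| on [0, 4]
```

Since exp is its own n-th derivative, solving (2.1.12) for ξ is a single logarithm here; for a general f one only knows that such a ξ exists.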


2.2 Divided differences for some special functions.

Before proceeding to develop other methods for calculating divided differences of general functions f, it is worthwhile to see how the classical formula (2.1.8) can be employed on some specific functions. Later, these results will be used to develop formulas for more general functions.

Following the notation of Davis [2], let f = ↑^p be the p-th power function, ↑^p(ζ) = ζ^p. Because the notation introduced in the previous section suppresses variables, clarity demands that every function have a name. The linearity property of the divided difference allows


immediate extension of results for ↑^p, any p ≥ 0, to general polynomials. The well-known formula [12] which results from (2.1.8) is

Δ_0^n ↑^p = Σ ζ_0^{k_0} ζ_1^{k_1} ··· ζ_n^{k_n}   (sum over k_0 + k_1 + ··· + k_n = p − n, k_s ≥ 0),   p ≥ n.   (2.2.1)

In particular,

Δ_0^n ↑^p = 0 if p < n,

Δ_0^n ↑^n = 1,

Δ_0^n ↑^{n+1} = Σ_{j=0}^{n} ζ_j.

Writing (2.2.1) as a system of equations for Δ_0^k ↑^{p+k}, 0 ≤ k ≤ n, yields

[ Δ_0^0 ↑^p      ]   [ ζ_0             ] [ Δ_0^0 ↑^{p-1}   ]
[ Δ_0^1 ↑^{p+1}  ] = [ 1   ζ_1         ] [ Δ_0^1 ↑^p       ]   (2.2.2)
[     ⋮          ]   [     ⋱   ⋱       ] [     ⋮           ]
[ Δ_0^n ↑^{p+n}  ]   [         1   ζ_n ] [ Δ_0^n ↑^{p+n-1} ]

In fact, all the differences Δ_j^k ↑^{p+k} for k = 0, 1, ..., n−j and j = 0, 1, ..., n can be generated together: each entry of the triangular table satisfies

Δ_j^k ↑^{p+k} = ζ_{j+k}·Δ_j^k ↑^{p+k-1} + Δ_j^{k-1} ↑^{p+k-1}   (2.2.3)

or, equivalently,

Δ_j^k ↑^{p+k} = ζ_j·Δ_j^k ↑^{p+k-1} + Δ_{j+1}^{k-1} ↑^{p+k-1}.   (2.2.4)

The alternative form (2.2.4) results from the symmetry property. The matrix representation idea will be employed extensively in §2.6, where another expression similar to (2.2.3) and (2.2.4) will be developed.

Formula (2.2.2) leads directly to a simple, but elegant, procedure for computing the Δ_0^k ↑^{p+k}, 0 ≤ k ≤ n, recursively in p, and so divided differences of any polynomial may be formed.


Algorithm 2.1: Recursive computation of Δ_0^k ↑^{p+k}.

1. Initialize Δ_0^k ↑^k = 1, k = 0, 1, ..., n.
2. For p = 1, 2, ...
   Δ_0^k ↑^{p+k} = ζ_k·Δ_0^k ↑^{p+k-1} + Δ_0^{k-1} ↑^{p+k-1},   k = 0, 1, ..., n
   (Δ_j^k ↑ = 0 for k < 0).

For p = 0,

Δ_0^0 ↑^0 = Δ_0^1 ↑^1 = Δ_0^2 ↑^2 = 1

For p = 1,

Δ_0^0 ↑^1 = ζ_0·Δ_0^0 ↑^0 + Δ_0^{-1} ↑^0 = ζ_0
Δ_0^1 ↑^2 = ζ_1·Δ_0^1 ↑^1 + Δ_0^0 ↑^1 = ζ_1 + ζ_0
Δ_0^2 ↑^3 = ζ_2·Δ_0^2 ↑^2 + Δ_0^1 ↑^2 = ζ_2 + ζ_1 + ζ_0

For p = 2,

Δ_0^0 ↑^2 = ζ_0·Δ_0^0 ↑^1 + Δ_0^{-1} ↑^1 = ζ_0^2
Δ_0^1 ↑^3 = ζ_1·Δ_0^1 ↑^2 + Δ_0^0 ↑^2 = ζ_1^2 + ζ_1 ζ_0 + ζ_0^2
Δ_0^2 ↑^4 = ζ_2·Δ_0^2 ↑^3 + Δ_0^1 ↑^3 = ζ_2^2 + ζ_2 ζ_1 + ζ_2 ζ_0 + ζ_1^2 + ζ_1 ζ_0 + ζ_0^2

Fig. 2.2.1: First couple of steps of Algorithm 2.1 for n = 2.

The computation requires n+1 multiplications for each p. A companion procedure for Δ_{n-k}^k ↑^{p+k}, 0 ≤ k ≤ n, comes from (2.2.4).

Algorithm 2.1′: Recursive computation of Δ_{n-k}^k ↑^{p+k}.

1. Initialize Δ_{n-k}^k ↑^k = 1, k = 0, 1, ..., n.
2. For p = 1, 2, ...
   Δ_{n-k}^k ↑^{p+k} = ζ_{n-k}·Δ_{n-k}^k ↑^{p+k-1} + Δ_{n-k+1}^{k-1} ↑^{p+k-1},   k = 0, 1, ..., n.

For p = 0,

Δ_2^0 ↑^0 = Δ_1^1 ↑^1 = Δ_0^2 ↑^2 = 1

For p = 1,

Δ_2^0 ↑^1 = ζ_2·Δ_2^0 ↑^0 + Δ_3^{-1} ↑^0 = ζ_2
Δ_1^1 ↑^2 = ζ_1·Δ_1^1 ↑^1 + Δ_2^0 ↑^1 = ζ_1 + ζ_2
Δ_0^2 ↑^3 = ζ_0·Δ_0^2 ↑^2 + Δ_1^1 ↑^2 = ζ_0 + ζ_1 + ζ_2

For p = 2,

Δ_2^0 ↑^2 = ζ_2·Δ_2^0 ↑^1 + Δ_3^{-1} ↑^1 = ζ_2^2
Δ_1^1 ↑^3 = ζ_1·Δ_1^1 ↑^2 + Δ_2^0 ↑^2 = ζ_1^2 + ζ_1 ζ_2 + ζ_2^2
Δ_0^2 ↑^4 = ζ_0·Δ_0^2 ↑^3 + Δ_1^1 ↑^3 = ζ_0^2 + ζ_0 ζ_1 + ζ_0 ζ_2 + ζ_1^2 + ζ_1 ζ_2 + ζ_2^2

Fig. 2.2.2: First couple of steps of Algorithm 2.1′ for n = 2.


2.3 Difficulties in computing divided differences.

Divided differences have acquired a poor reputation in numerical analysis. That this reputation is to some extent justified is illustrated by the following example.

Divided differences by recursive formula

ζ      Δ^0 exp   Δ^1 exp   Δ^2 exp    Δ^3 exp    Δ^4 exp
0.00   1.000     1.136     6.480E-1   2.347E-1   8.530E-2
0.25   1.284     1.460     8.240E-1   3.200E-1
0.50   1.649     1.872     1.064
0.75   2.117     2.404
1.00   2.718

Correct value of divided differences to 4 digits

ζ      Δ^0 exp   Δ^1 exp   Δ^2 exp    Δ^3 exp    Δ^4 exp
0.00   1.000     1.136     6.454E-1   2.444E-1   6.942E-2
0.25   1.284     1.459     8.287E-1   3.138E-1
0.50   1.649     1.873     1.064
0.75   2.117     2.405
1.00   2.718

Fig. 2.3.1: Example of loss of accuracy in computing divided differences.

The difficulty is one of finite precision arithmetic. That is, f is smooth enough and the abscissae are close enough together that the implementation of the recursive definition (2.1.8) involves not only subtraction of nearly equal quantities, but also division by small quantities. Computed divided differences which are not small and contain few, or even no, correct digits result. The possibility of computing divided differences using schemes other than (2.1.8) will be investigated following a discussion of the above remarks.

Subtractive cancellation. Suppose Δ_0^{k+1} f is computed using (2.1.8). Let the abscissae Z = {ζ_0, ζ_1, ..., ζ_n} and Δ_0^k f be representable exactly in the computer, and assume all arithmetic is performed exactly. Let fl(Δ_1^k f) = Δ_1^k f·(1+ε), for some small relative error ε, be the available or previously computed floating-point value of Δ_1^k f. Then

fl(Δ_0^{k+1} f) = [Δ_1^k f·(1+ε) − Δ_0^k f] / (ζ_{k+1} − ζ_0) = Δ_0^{k+1} f + ε·Δ_1^k f / (ζ_{k+1} − ζ_0).   (2.3.1)

When |Δ_1^k f − Δ_0^k f| = 2^{-t} |Δ_1^k f|, the result fl(Δ_0^{k+1} f) = Δ_0^{k+1} f·(1+ε_1) has |ε_1| ≈ 2^t |ε|. For t ≤ 0, fl(Δ_0^{k+1} f) has at least as much relative accuracy as the data it is formed from. However, when t is significantly greater than 0, the relative accuracy of fl(Δ_0^{k+1} f) is worse than that of fl(Δ_1^k f): Δ_1^k f and Δ_0^k f have leading digits in common and subtractive cancellation occurs. That is, about t leading binary digits cancel, and when the resulting answer is normalized to obtain a non-zero leading digit, t zeros are appended at the right. If Δ_1^k f is given to p correct digits, Δ_0^{k+1} f is computed to about p−t correct digits.

This phenomenon may have devastating effect on higher order divided differences. When an early difference has already lost t_1 binary digits, |ε_1| ≈ 2^{t_1}|ε|, and this difference is used to


Subtractive cancellation with small divisors, 6 digit arithmetic

ζ             Δ^0 tan   fl(Δ^1 tan)   fl(Δ^2 tan)     Δ^1 tan (correct)   Δ^2 tan (correct)
0             0.00000   1.27324       9.24163E-1      1.27324             9.26709E-1
π/4           1.00000   2.00000                       2.00200
π/4 + .001    1.00200

Subtractive cancellation but no small divisors

ζ             Δ^0 tan   fl(Δ^1 tan)   fl(Δ^2 tan)     Δ^1 tan (correct)   Δ^2 tan (correct)
0             0.00000   1.27324       -3.23983E-1     1.27324             -3.23983E-1
π/4           1.00000   6.36417E-4                    6.37055E-4
5π/4 + .001   1.00200

Fig. 2.3.2: Large relative errors in small differences may not be serious.

compute a subsequent divided difference where t_2 binary digits are lost, t_1 + t_2 digits may be lost in the computed result, |ε_2| ≈ 2^{t_1+t_2}|ε|. Clearly, all accuracy in the recursive computation may be quickly lost.

Loss of relative accuracy may, or may not, be a disaster. If these differences are required as an end in themselves, there is little hope. Usually, however, they are used in additional computations; and errors, even relatively large ones, in quantities which are small relative to others involved in the calculation at hand need not be serious. If a computed divided difference is small relative to other divided differences of the same order in the table, or is small relative to some given criterion, this difference may be acceptable, even if inaccurate. Subtraction between nearly equal numbers always leads to a result smaller than either of the original numbers; so subtractive cancellation in (2.1.8) tends to create inaccurate, but small, divided differences.

Small divisors. When can relatively large, but inaccurate, divided differences be obtained? The division by the difference of the abscissae is clearly the culprit. Specifically, if |ζ_{k+1} − ζ_0| = 2^{-q}, then

|fl(Δ_0^{k+1} f) − Δ_0^{k+1} f| = 2^q |ε| |Δ_1^k f|.

For q > 0, the absolute error in fl(Δ_0^{k+1} f) is large relative to Δ_1^k f, and if t > 0 also, the error is large relative to Δ_0^{k+1} f as well. Because q > 0 requires Δ_0^{k+1} f to be larger in absolute value than |Δ_1^k f| when cancellation is absent, the combination of close abscissae and subtractive cancellation of the intermediate divided differences is what is to be feared.

The devastating effect of close data points is most apparent for divided differences of smooth, slowly varying functions. The divided differences, like the derivatives, of such functions are also slowly varying under small changes in the data points; so formula (2.1.8) will experience subtractive cancellation between the intermediate divided differences in its numerator. Fig. 2.3.3 contrasts the effect of the same "close" abscissae for the "rapidly" varying function f = exp_10, exp_10(ζ) = e^{10ζ}, with the less rapidly varying f = exp of Fig. 2.3.1. Formula (2.1.8) gives results correct to a full four digits for exp_10. Hence for smooth, slowly varying


Divided differences of exp_10 by the recursive formula

ζ      Δ^0 exp_10   Δ^1 exp_10   Δ^2 exp_10   Δ^3 exp_10   Δ^4 exp_10
0.00   1.000        4.472E+1     1.000E+3     1.492E+4     1.668E+5
0.25   1.218E+1     5.449E+2     1.219E+4     1.817E+5
0.50   1.484E+2     6.638E+3     1.485E+5
0.75   1.808E+3     8.089E+4
1.00   2.203E+4

Fig. 2.3.3: The recursive scheme works well for rapidly varying functions.

functions, close abscissae lead to subtractive cancellation; but exactly what "close" is depends on the function.

There are two fundamental situations in which divided differences appear. (1) Given the abscissae and a table of function values at the abscissae, compute the divided differences. No knowledge of the expression for the function is required; but also, if the data are not given to sufficient precision, (2.1.8) will fail and there is no way to obtain the desired accuracy in the result. (2) Given an expression for the function f and the ability to compute f's value and derivatives at any point in its domain, compute the divided differences of f. The question must be asked, "What advantage can be taken of this knowledge about f?" The following section will show that for some elementary functions, Δ_0^n f can be computed accurately in such a way as to avoid the difficulties indicated in this section.
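The loss of accuracy of Fig. 2.3.1 can be reproduced by simulating low-precision arithmetic. The Python sketch below (ours; rounding to 4 significant digits via string formatting is an assumption about the rounding mode, so individual digits may differ slightly from the figure) applies the recursive formula (2.1.8) to f = exp on {0, .25, .5, .75, 1} with and without rounding:

```python
import math

def r4(x):
    # Round to 4 significant decimal digits -- a stand-in for the
    # 4-digit arithmetic of Fig. 2.3.1 (exact rounding mode assumed).
    return float(f"{x:.4g}")

zs = [0.0, 0.25, 0.50, 0.75, 1.00]

def top_dd(vals, rnd):
    # Recursive formula (2.1.8), applying `rnd` after every operation.
    d = list(vals)
    for k in range(1, len(zs)):
        d = [rnd(rnd(d[j + 1] - d[j]) / (zs[j + k] - zs[j]))
             for j in range(len(d) - 1)]
    return d[0]

exact = top_dd([math.exp(z) for z in zs], lambda x: x)    # Delta_0^4 exp
rounded = top_dd([r4(math.exp(z)) for z in zs], r4)       # 4-digit version
rel_err = abs(rounded - exact) / exact
```

With full precision the recursion is fine; in simulated 4-digit arithmetic the fourth difference loses most of its digits, just as in the figure.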

2.4 Power series for Δ_0^n f.

For a special class of functions f, an alternative method to (2.1.8) for forming divided differences may be employed. This new method works particularly well when the abscissae are nearly confluent. The special class of functions includes many of the elementary functions. This method will now be presented and error bounds given.

In this section, f is a function which is represented as a power series about the point ζ = a for all ζ inside the open disk |ζ − a| < R. Expanding on the notation of Davis [2],

f = Σ_{j=0}^{∞} a_j ↑_a^j.   (2.4.1)

Here ↑_a^j(ζ) = ↑^j(ζ − a) = (ζ − a)^j. In complex variable theory it is well known that on the disk Ω_r = {ζ : |ζ − a| < r}, r < R:

(1) the power series (2.4.1) converges absolutely and uniformly to f;
(2) a_j = f^(j)(a)/j! for j = 0, 1, 2, ...; the power series is the Taylor series of f;
(3) f is holomorphic;
(4) term by term differentiation is possible, giving a power series converging to f′ on Ω_r.

This last fact extends directly to divided differences, and yields a formula similar to that in Brent [1], for which Taylor's theorem with remainder was used.


Theorem 2.1: Power series for divided differences. Let f have the power series representation (2.4.1) on the disk |ζ − a| < R. Then if the set of abscissae Z = {ζ_0, ζ_1, ..., ζ_n} lies entirely within the disk,

Δ_0^n f = Σ_{j=0}^∞ a_j Δ_0^n π_a^j.   (2.4.2)

proof: Let s_m = Σ_{j=0}^m a_j π_a^j. By fact (4), for any r < R,

max_{ζ∈Ω_r} |f^(n)(ζ) − s_m^(n)(ζ)| → 0 as m → ∞.

As r may be chosen such that Z ⊂ Ω_r, we have by (2.1.13)

|Δ_0^n f − Δ_0^n s_m| ≤ ∫_0^1 ∫_0^{t_1} ··· ∫_0^{t_{n−1}} |f^(n)(ω) − s_m^(n)(ω)| dt_n ··· dt_2 dt_1
                     ≤ (1/n!)·max_{ζ∈Ω_r} |f^(n)(ζ) − s_m^(n)(ζ)| → 0

as m → ∞, where ω = ζ_0 + t_1(ζ_1 − ζ_0) + ··· + t_n(ζ_n − ζ_{n−1}) ∈ Ω_r because Ω_r is convex. □

The computation of divided differences for functions such as exp, sin, and cosh is quite straightforward because the power series coefficients are easily obtained. (2.4.2) may also be used to form divided differences of other functions, such as the square root and log; however, great care is required to find a series representation whose circle of convergence includes all the abscissae. This limitation to the circle of convergence cannot be over-emphasized. The same power series representation must converge to f at each abscissa.

Algorithm. Algorithm 2.1 of §2.2 and Theorem 2.1 lead immediately to a method for computing Δ_0^k f, 0 ≤ k ≤ n. Let s_m = Σ_{j=0}^m a_j π_a^j be the partial sums of f. Also recall the translation invariance property (§2.1).

Algorithm 2.2: Power series for Δ_0^k f.

1. Initialize Δ_0^k π_a^k = 1, Δ_0^k s_k = a_k, k = 0, 1, ..., n.

2. For p = 1, 2, ...

   Δ_0^k π_a^{p+k} = (ζ_k − a)·Δ_0^k π_a^{p+k−1} + Δ_0^{k−1} π_a^{p+k−1}

   Δ_0^k s_{p+k} = Δ_0^k s_{p+k−1} + a_{p+k}·Δ_0^k π_a^{p+k},   k = 0, 1, ..., n.

A companion algorithm may also be derived from Algorithm 2.1′.

Algorithm 2.2′: Power series for Δ_{n−k}^k f.

1. Initialize Δ_{n−k}^k π_a^k = 1, Δ_{n−k}^k s_k = a_k, k = 0, 1, ..., n.

2. For p = 1, 2, ...

   Δ_{n−k}^k π_a^{p+k} = (ζ_{n−k} − a)·Δ_{n−k}^k π_a^{p+k−1} + Δ_{n−k+1}^{k−1} π_a^{p+k−1}

   Δ_{n−k}^k s_{p+k} = Δ_{n−k}^k s_{p+k−1} + a_{p+k}·Δ_{n−k}^k π_a^{p+k},   k = 0, 1, ..., n.

Exclusive of the coefficient evaluations, the scheme requires 2n + 2 multiplications per p-step.
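As a concrete illustration, the recurrence of Algorithm 2.2 is easy to express in code. The sketch below (Python; the report itself gives no code, and the function name and fixed term count are assumptions) computes the top row Δ_0^k f, k = 0, ..., n, from the Taylor coefficients of f, and can be checked against the recursive definition (2.1.8) when the abscissae are well enough separated:

```python
import math

def dd_top_row_series(zs, coeff, a, terms=60):
    """Top row Delta_0^k f, k = 0..n, of the divided difference table of
    f = sum_j coeff(j) * (z - a)**j, by the recurrence of Algorithm 2.2."""
    n = len(zs) - 1
    bump = [1.0] * (n + 1)                 # bump[k] = Delta_0^k pi_a^(p+k); equals 1 at p = 0
    s = [coeff(k) for k in range(n + 1)]   # Delta_0^k s_k = a_k
    for p in range(1, terms):
        for k in range(n + 1):
            # Delta_0^k pi^(p+k) = (z_k - a)*Delta_0^k pi^(p+k-1) + Delta_0^(k-1) pi^(p+k-1);
            # bump[k-1] was already updated this pass, so it holds the value needed here
            bump[k] = (zs[k] - a) * bump[k] + (bump[k - 1] if k > 0 else 0.0)
            s[k] += coeff(p + k) * bump[k]
        # (in practice one would stop once the added terms are negligible)
    return s

# divided differences of exp about a = mean of the abscissae
zs = [0.10, 0.15, 0.21]
a = sum(zs) / len(zs)
top = dd_top_row_series(zs, lambda j: math.exp(a) / math.factorial(j), a)
```

For these mildly separated abscissae the result agrees with the recursion (2.1.8) to near machine precision; the point of the algorithm is that it keeps this accuracy as the abscissae become confluent, where (2.1.8) does not.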

Error bound. A bound on the computational error in Algorithm 2.2 may be given. Let δ = max_i |ζ_i − a|. To keep the form of the bounds simple, assume the arithmetic is such that (2.4.3) below holds.


For p = 0:

   Δ_0^0 π_a^0 = Δ_0^1 π_a^1 = Δ_0^2 π_a^2 = 1;  Δ_0^0 s_0 = a_0, Δ_0^1 s_1 = a_1, Δ_0^2 s_2 = a_2.

For p = 1:

   Δ_0^0 π_a^1 = (ζ_0 − a)·Δ_0^0 π_a^0 = ζ_0 − a;  Δ_0^0 s_1 = Δ_0^0 s_0 + a_1·Δ_0^0 π_a^1 = a_0 + a_1(ζ_0 − a);

   Δ_0^1 π_a^2 = (ζ_1 − a)·Δ_0^1 π_a^1 + Δ_0^0 π_a^1 = ζ_1 + ζ_0 − 2a;  Δ_0^1 s_2 = Δ_0^1 s_1 + a_2·Δ_0^1 π_a^2 = a_1 + a_2(ζ_1 + ζ_0 − 2a);

   Δ_0^2 π_a^3 = (ζ_2 − a)·Δ_0^2 π_a^2 + Δ_0^1 π_a^2 = ζ_2 + ζ_1 + ζ_0 − 3a;  Δ_0^2 s_3 = Δ_0^2 s_2 + a_3·Δ_0^2 π_a^3 = a_2 + a_3(ζ_2 + ζ_1 + ζ_0 − 3a).

Fig. 2.4.1: First step of Algorithm 2.2 for n = 2.

|fl(Σ_j g_j) − Σ_j g_j| ≤ ε Σ_j |g_j|.   (2.4.3)

In keeping with Wilkinson [20], ε < 1.06 × (machine precision) and fl(g) is the computed value of g. The following error bound can be developed (Appendix A):

|fl(Δ_0^n f) − Δ_0^n f| ≤ ε Σ_{p=0}^∞ (p+1)·C(n+p, n)·|a_{n+p}|·δ^p.   (2.4.4)

For example, when f = exp_t, exp_t(ζ) = e^{tζ},

|fl(Δ_0^n exp_t) − Δ_0^n exp_t| ≤ ε e^{ta}(1 + tδ)e^{tδ}·t^n/n!.   (2.4.5)

This bound is small relative to |f^(n)(a)|/n! = t^n|e^{ta}|/n! when tδ is small; tδ can be made small for nearly confluent abscissae by choosing a so as to minimize δ.

Close data points and smooth functions f constitute the difficult case mentioned in §2.3. Algorithm 2.2 appears to fill the gap in computing divided differences for nearly confluent abscissae. One expects the recursive definition (2.1.8) and the results of this section to be sufficient to form divided difference tables for the elementary functions. That this expectation cannot quite be realized is demonstrated in the next section.

2.5 Inadequacy of the series and recursive formulas; the question of "close" abscissae.

The arguments of §2.3 indicate that divided differences computed using the recursive definition (2.1.8) may be unacceptably inaccurate when f is smooth and the abscissae are too "close". The Taylor series algorithm 2.2 of the previous section appears to remove this difficulty for some f because the error bound (2.4.4) on the algorithm is smallest when the data points are close. The idea is to order, at least crudely, the abscissae such that close abscissae are clustered together in the ordering, and well-separated abscissae are well-separated in the ordering. Then, recalling the pattern of dependence of elements of the divided difference table, the portions of the table depending on the clustered data points may be computed using the Taylor series algorithm and the remainder of the table filled in using the recursive definition. This process is discussed in detail in §2.8 and is illustrated in Fig. 2.8.2.

There are two difficulties here. (1) There must be freedom to order the data points as desired. The amount of freedom depends on the use contemplated for the divided difference


 k   ζ_k     Δ_0^k exp          Δ_0^k exp using      Δ_0^k exp using      Δ_0^k exp using
             correct to 7       recursive scheme     Taylor series        Algorithm 2.4
             decimal digits     (2.1.8) after k = 8  Algorithm 2.2        of §2.7

 0  -13.0    2.260329E-06                                                 2.260332E-06
 1  -12.5    2.932648E-06                            2.917451E-06         2.932650E-06
 2  -12.0    1.902471E-06                            1.905751E-06         1.902472E-06
 3  -11.5    8.227822E-07                            8.215427E-07         8.227824E-07
 4  -11.0    2.668782E-07                            2.669856E-07         2.668782E-07
 5  -10.5    6.925181E-08                            6.925365E-08         6.925180E-08
 6  -10.0    1.497504E-08                            1.496362E-08         1.497505E-08
 7   -9.5    2.775608E-09                            2.777049E-09         2.775609E-09
 8   -9.0    4.501490E-10     4.501490E-10           4.501040E-10         4.501491E-10
 9   -8.5    6.489361E-11     6.489360E-11           6.490737E-11         6.489364E-11
10   -8.0    8.419572E-12     8.419580E-12           8.420343E-12         8.419573E-12
11   -7.5    9.930829E-13     9.930800E-13           9.930225E-13         9.930829E-13
12   -7.0    1.073723E-13     1.073727E-13           1.073735E-13         1.073724E-13
13   -6.5    1.071611E-14     1.071609E-14           1.071603E-14         1.071611E-14
14   -6.0    9.931098E-16     9.931271E-16           9.930869E-16         9.931100E-16
15   -5.5    8.590019E-17     8.590612E-17           8.590039E-17         8.590021E-17
16   -5.0    6.965660E-18     6.951898E-18           6.965612E-18         6.965660E-18
17   -4.5    5.316202E-19     5.402473E-19           5.316202E-19         5.316201E-19
18   -4.0    3.831926E-20     3.486964E-20           3.831920E-20         3.831926E-20
19   -3.5    2.616686E-21     3.650387E-21           2.616686E-21         2.616687E-21
20   -3.0    1.697500E-22    -6.986900E-23           1.697499E-22         1.697500E-22
21   -2.5    1.048766E-23     5.010054E-23           1.048766E-23         1.048766E-23
22   -2.0    6.185062E-25    -1.792933E-24           6.185061E-25         6.185063E-25
23   -1.5    3.489027E-26    -1.236419E-24           3.489027E-26         3.489027E-26
24   -1.0    1.886172E-27     6.496526E-25           1.886172E-27         1.886172E-27
25   -0.5    9.788799E-29    -1.931645E-25           9.788798E-29         9.788797E-29

Fig. 2.5.1: Top row of Δexp using several methods.

table. For the matrix function problem, this point is considered in §3.1. (2) It is not clear what "close" means when speaking of clustering the abscissae. This can be illustrated by the example in Fig. 2.5.1.

The divided differences in the example were computed, from the 9-th difference on, using formula (2.1.8). All divided differences before the 9-th were formed exactly by the formula

Δ_0^k exp = (1/k!)·e^a·[(e^α − 1)/α]^k   (2.5.1)

where the set of abscissae Z = {a, a+α, ..., a+kα, ..., a+nα} is evenly spaced. Even with a spread δ_n = |ζ_n − ζ_0| of 12.5, Δ_0^25 exp cannot be computed accurately by (2.1.8) given the number of digits available. The Taylor series algorithm 2.2 might be employed instead, but the error bound (2.4.5) when δ = δ_n/2 = 6.25 is unacceptable. Actually, Algorithm 2.2 yields an excellent value for Δ_0^25 exp because the abscissae are symmetrically spaced about the Taylor series expansion point. However, some of the lower order divided differences are not nearly as accurate. These differences may be computed using Algorithm 2.2 about other expansion points, but the advantage of obtaining all Δ_0^k exp, k = 0, 1, ..., 25, in one computation is lost. So, ζ_n is too close to ζ_0 to use (2.1.8) here; but the abscissae are too well-separated to use Algorithm 2.2


efficiently. To illustrate why this difficulty arises, consider the following two special cases.

1. Suppose the abscissae are real, increasing, and evenly spaced, and formula (2.1.8) is used. If Δ_0^k exp = (1/k!)·e^a·[(e^α − 1)/α]^k and Δ_1^k exp = (1/k!)·e^{a+α}·[(e^α − 1)/α]^k are given exactly, then

Δ_0^{k+1} exp = [Δ_1^k exp − Δ_0^k exp]/((k+1)α) = (e^a/(k+1)!)·[(e^α − 1)/α]^k·(e^α − 1)/α,   (2.5.2)

where common factors have been removed. In binary arithmetic, subtractive cancellation will occur when α < log 2. [In principle, e^α − 1 can be computed very accurately for small α, e.g. by its Taylor series. However, (2.1.8) avoids explicit use of the function exp and forces subtraction of (possibly) nearly equal quantities.] Thus, even if all k-th differences are given to full machine accuracy, formula (2.1.8) will suffer a loss of information at each further step when α < log 2. If n is large enough, δ = δ_n/2 = nα/2 will also yield too large a bound on the Taylor series error (2.4.5) to permit safe use of that method. Δ_0^n exp can then be computed by neither of the methods yet discussed.

2. Suppose the abscissae are Z = {0, 0, ..., 0, δ_n} where δ_n > 0. Then

Δ_0^n exp = [Δ_1^{n−1} exp − Δ_0^{n−1} exp]/δ_n
          = (1/δ_n)·[Σ_{k=n−1}^∞ δ_n^{k−n+1}/k! − 1/(n−1)!] = Σ_{k=n}^∞ δ_n^{k−n}/k!.   (2.5.3)

The power series formula (2.4.2) was used to expand Δ_1^{n−1} exp. In binary arithmetic, subtractive cancellation will occur in the numerator when

Σ_{k=n−1}^∞ δ_n^{k−n+1}/k! < 2/(n−1)!.   (2.5.4)

 n     10    20    30    50    100
 δ_n   <6    <11   <16   <26   <51

Fig. 2.5.2: Upper bounds on δ_n for possible cancellation in example 2.

The table in Fig. 2.5.2 indicates that when δ_n < ½n + 1, approximately, subtractive cancellation will occur in using formula (2.1.8).

The examples illustrate that, in general, there is no simple a priori answer to when (2.1.8) will, or will not, fail. In practice, this forces a choice: (1) try to find an a priori criterion, say a lower bound on |ζ_i − ζ_j|, for when (2.1.8) is assumed to work; or (2) compute the divided difference table by (2.1.8) and monitor the error, and if failure occurs for some difference (i.e. an error estimate becomes too large), do something else to get that difference. The former is simple, but unsure; the latter is sure, but more complicated and possibly more costly. §2.8 expands these ideas and deals with the specifics of computing the entire divided difference table.

The above examples also show that another method for computing divided differencesmay sometimes be needed. The next sections attempt to supply one.


2.6 The divided difference table as a matrix.

The divided difference table (§2.1) of f for the abscissae Z = {ζ_0, ζ_1, ..., ζ_n} may be expressed as an (n+1)×(n+1) upper triangular matrix.†

       | f(ζ_0)  Δ_0^1 f  Δ_0^2 f  ···  Δ_0^n f      |
       |         f(ζ_1)   Δ_1^1 f  ···  Δ_1^{n−1} f  |
Δf =   |                  f(ζ_2)   ···  Δ_2^{n−2} f  |   (2.6.1)
       |                            ⋱   ⋮            |
       |                                f(ζ_n)       |

Define also the special (n+1)×(n+1) bidiagonal matrix

       | ζ_0  1                  |
       |      ζ_1  1             |
Z =    |           ⋱   ⋱        |   (2.6.2)
       |             ζ_{n−1}  1  |
       |                  ζ_n    |

Opitz [14] calls this a "Steigungsmatrix".

Theorem 2.2: "The divided difference table is a matrix function."

Δf = f(Z).   (2.6.3)

proof: The conditions for the existence of the divided difference table Δf (§2.1) and the Newton polynomial of f(Z) (§1.2) are identical.

f(Z) = f(ζ_0)·I + Δ_0^1 f·(Z − ζ_0 I) + ··· + Δ_0^n f·∏_{k=0}^{n−1}(Z − ζ_k I)

Because the Z − ζ_k I, all k, are bidiagonal, ∏_{k=0}^{m−1}(Z − ζ_k I) for m ≤ n−1 has (0,n) element zero. The (0,n) element of ∏_{k=0}^{n−1}(Z − ζ_k I) is one; so, the (0,n) element of f(Z) is Δ_0^n f, which is also the (0,n) element of Δf. Because of the pattern of dependence of the elements of the difference table, the result holds for every element of the table. □

The theorem has several important and useful consequences.

1. If Z = {ζ_0, ζ_0, ..., ζ_0},

† Δ_Z f will be written when the set of abscissae must be emphasized. Recall that Δf, with no superscript, is a matrix and Δ_0^n f is a scalar.


       | f(ζ_0)  f′(ζ_0)  (1/2!)f″(ζ_0)  ···  (1/n!)f^(n)(ζ_0)          |
       |         f(ζ_0)   f′(ζ_0)        ···  (1/(n−1)!)f^(n−1)(ζ_0)    |
Δf =   |                  f(ζ_0)          ⋱   ⋮                         |   (2.6.4)
       |                                      f′(ζ_0)                   |
       |                                      f(ζ_0)                    |

This is the well-known special form for the function of a Jordan block.

2. Define f_t(ζ) = f(tζ); then

Δf_t = f(tZ).   (2.6.5)

3. Define D = diag(1, t, t², ..., t^n), a diagonal matrix, and tZ = {tζ_0, tζ_1, ..., tζ_n}; then

Δ_{tZ} f = D·Δ_Z f_t·D^{−1}.   (2.6.6)

proof: Δ_{tZ} f = f(Z_t) = f(D(tZ)D^{−1}) = D·f(tZ)·D^{−1} = D·Δ_Z f_t·D^{−1}, where

        | tζ_0  1                   |
        |       tζ_1  1             |
Z_t =   |            ⋱    ⋱        |
        |              tζ_{n−1}  1  |
        |                   tζ_n    |

4. When f = π^p, π^p(ζ) = ζ^p, then

Δπ^p = Z^p.   (2.6.7)

This leads immediately to algorithms that resemble those of §2.2.

Algorithm 2.3: Recursive computation of Δ_0^k π^p.

1. Initialize Δ_0^0 π^0 = 1, Δ_0^k π^0 = 0, k = 1, 2, ..., n.

2. For p = 1, 2, ...

   Δ_0^k π^p = ζ_k·Δ_0^k π^{p−1} + Δ_0^{k−1} π^{p−1},   k = 0, 1, ..., n.

(Δ_0^k π^p = 0 for k < 0.)

There is also a companion algorithm.

Algorithm 2.3′: Recursive computation of Δ_{n−k}^k π^p.

1. Initialize Δ_n^0 π^0 = 1, Δ_{n−k}^k π^0 = 0, k = 1, 2, ..., n.

2. For p = 1, 2, ...

   Δ_{n−k}^k π^p = ζ_{n−k}·Δ_{n−k}^k π^{p−1} + Δ_{n−k+1}^{k−1} π^{p−1},   k = 0, 1, ..., n.

Algorithm 2.3 produces the top row of Δπ^p for p = 1, 2, ..., while Algorithm 2.3′ gives the right-hand column of Δπ^p.
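Theorem 2.2 is easy to check numerically. The sketch below (Python; illustrative only, not from the report) forms the bidiagonal matrix Z of (2.6.2) for three abscissae, evaluates exp(Z) by a plain Taylor series (adequate for such a tiny, well-scaled matrix), and compares its top row with divided differences of exp obtained from the recursive definition:

```python
import math

def expm_taylor(M, terms=60):
    """exp(M) by the plain Taylor series; fine for a small, well-scaled matrix."""
    n = len(M)
    F = [[float(i == j) for j in range(n)] for i in range(n)]   # identity
    term = [row[:] for row in F]
    for p in range(1, terms):
        term = [[sum(term[i][l] * M[l][j] for l in range(n)) / p
                 for j in range(n)] for i in range(n)]
        F = [[F[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return F

zs = [0.1, 0.4, 0.7]
Z = [[zs[0], 1.0, 0.0],
     [0.0, zs[1], 1.0],
     [0.0, 0.0, zs[2]]]
F = expm_taylor(Z)

# recursive divided differences of exp at the same abscissae
d1a = (math.exp(zs[1]) - math.exp(zs[0])) / (zs[1] - zs[0])
d1b = (math.exp(zs[2]) - math.exp(zs[1])) / (zs[2] - zs[1])
dd2 = (d1b - d1a) / (zs[2] - zs[0])
# per Delta f = f(Z): F[0][1] should equal d1a and F[0][2] should equal dd2
```

The agreement is to roundoff, which is exactly the content of (2.6.3) for f = exp.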


2.7 A scaling and squaring method for the exponential.

Theorem 2.2 suggests that special properties of the function f may be used to form the divided difference table. When f = exp_t, exp_t(ζ) = e^{tζ}, consequence 2 of the previous section becomes

Δexp_t = exp(tZ) = e^{tZ}.   (2.7.1)

The expression on the right of (2.7.1) immediately suggests trying to power the difference table:

Δexp_t = [Δexp_{2^{−j}t}]^{2^j} for integer j ≥ 0.   (2.7.2)

If the table Δexp_{2^{−j}t} is obtainable to suitable accuracy by some method for some j > 0, while Δexp_t is not, a "scaling and squaring" procedure may be employed. Ward [19] suggests this in general for e^{tA}, but here its use is confined to special matrices.

The table Δexp_{2^{−j}t} may be computed using the Taylor series algorithm 2.2 on f = exp_{2^{−j}t}. A relative error tolerance τ_j·ε = ε·e^{2^{−j}tδ}(1 + 2^{−j}tδ)/(2 − e^{2^{−j}tδ}) may be given for Algorithm 2.2, where j is chosen as the smallest non-negative integer such that (Appendix B)

2^{−j}tδ ≤ 0.2209.   (2.7.3)

Algorithm 2.2 produces only the 0-th (i.e. the top) row of the divided difference table. However, it is not necessary to employ the algorithm on every row, as sufficient information is now available to fill out the rest of the table by running the recursive scheme (2.1.8) backwards.

Backfilling the table. If the divided differences Δ_0^k f for k = 0, 1, ..., n are known, the Δ_i^k f for k = 0, 1, ..., n−i and i = 1, 2, ..., n may be computed recursively by

Δ_i^k f = (ζ_{i+k} − ζ_{i−1})·Δ_{i−1}^{k+1} f + Δ_{i−1}^k f.   (2.7.4)

X  X  X  X  X  X
5  4  3  2  1
9  8  7  6
⋮

Fig. 2.7.1: Possible order of backfilling using (2.7.4) given 0-th row of table (X).

As use of the backfill scheme is contemplated only when the forward scheme (2.1.8) fails, there is no danger of introducing large errors due to subtractive cancellation. The reason for this is quite simple. If (2.1.8) fails, then (§2.3) subtractive cancellation between some Δ_i^k f and Δ_{i−1}^k f has occurred. This means that Δ_i^k f ≈ Δ_{i−1}^k f, at least in the first few digits. But then by (2.7.4), (ζ_{i+k} − ζ_{i−1})·Δ_{i−1}^{k+1} f must be significantly smaller in magnitude than Δ_{i−1}^k f. Hence, subtractive cancellation in (2.7.4) cannot occur when subtractive cancellation occurs in (2.1.8).
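A direct transcription of the backfill scheme may look as follows (Python; a sketch, not the report's code). Given the abscissae and the top row Δ_0^k f, it generates each lower row from the one above it via (2.7.4); the diagonal of the resulting table reproduces the function values f(ζ_i).

```python
import math

def backfill(zs, top_row):
    """rows[i][k] = Delta_i^k f, built from the 0-th row via (2.7.4):
    Delta_i^k f = (z_{i+k} - z_{i-1}) * Delta_{i-1}^{k+1} f + Delta_{i-1}^k f."""
    n = len(zs) - 1
    rows = [list(top_row)]
    for i in range(1, n + 1):
        prev = rows[-1]
        rows.append([(zs[i + k] - zs[i - 1]) * prev[k + 1] + prev[k]
                     for k in range(n - i + 1)])
    return rows

# demo: top row of the table of exp at 0, 0.5, 1.0, 1.5 via the recursion (2.1.8)
zs = [0.0, 0.5, 1.0, 1.5]
d = [math.exp(z) for z in zs]
top = [d[0]]
for k in range(1, 4):
    d = [(d[i + 1] - d[i]) / (zs[i + k] - zs[i]) for i in range(len(d) - 1)]
    top.append(d[0])
rows = backfill(zs, top)   # rows[i][0] reproduces exp(zs[i])
```

When the abscissae are real and f = exp, all the quantities added in (2.7.4) are positive, so no cancellation can occur during backfilling.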

A method for computing Δexp_t for any set of abscissae Z = {ζ_0, ζ_1, ..., ζ_n} is now evident.


Algorithm 2.4: Scaling and squaring for Δexp_t.

1. Find j to satisfy (2.7.3).

2. Compute the Δ_0^k exp_{2^{−j}t}, k = 0, 1, ..., n, by Algorithm 2.2.

3. Backfill the table Δexp_{2^{−j}t} using (2.7.4).

4. Square the divided difference table matrix Δexp_{2^{−j}t} j times.

This method requires O((j/6)·n³) multiplications. Most of the work is in step 4; the first three steps require only O(n²) operations. A similar algorithm may be developed from Algorithm 2.2′.

Error bound. A quite satisfactory error bound is possible for Algorithm 2.4 when all the abscissae are real. In this case, Δexp_t > 0.† If ε is as defined in §2.4 and the arithmetic satisfies the following condition, also given in §2.4,

|fl(Σ_i g_i) − Σ_i g_i| ≤ ε Σ_i |g_i|,   (2.7.5)

the following element-by-element bound on the error in the computed divided difference table fl(Δexp_t) is possible (Appendix B):

|fl(Δexp_t) − Δexp_t| ≤ ε[2^j(τ_j + 1) − 1]·Δexp_t.   (2.7.6)

In particular, the relative error for each computed divided difference in the table is bounded by

ε[2^j(τ_j + 1) − 1].   (2.7.7)

This indicates a loss of about j binary digits in the squaring operation. Fig. 2.5.1 gives an example of Algorithm 2.4 in the real case. Here j = 4 and τ_j = 3.936. In arithmetic to seven decimal digits, the relative error in using Algorithm 2.4 is always less than 78ε < 4×10⁻⁵.

If the abscissae are complex, such uniformly satisfactory bounds are not possible. The bound analogous to (2.7.6) is

|fl(Δexp_t) − Δexp_t| ≤ ε[2^j(τ_j + 1) − 1]·|Δexp_{2^{−j}t}|^{2^j}.   (2.7.8)

This is satisfactory only when the elements of |Δexp_{2^{−j}t}|^{2^j} are not much larger than the corresponding elements of |Δexp_t|.

When Z = {ζ_0, ζ_1, ..., ζ_n} and X = {Re(ζ_0), Re(ζ_1), ..., Re(ζ_n)}, the integral representation (2.1.13) yields

|Δ_Z^k exp_t| ≤ Δ_X^k exp_t.   (2.7.9)

This gives another bound on the error,

|fl(Δ_Z exp_t) − Δ_Z exp_t| ≤ ε[2^j(τ_j + 1) − 1]·Δ_X exp_t.   (2.7.10)

† Matrix inequalities B ≤ C hold element by element; |B| is the matrix whose elements are the absolute values of the elements of B.


2.8 The complete divided difference table.

The previous sections have presented methods which permit the computation of the complete divided difference table for some elementary functions f. This section now outlines the entire computation.

(1) Ordering the abscissae. As discussed in §2.5, the abscissae should be ordered such that close abscissae are also close in the ordering, and well-separated abscissae are well-separated in the ordering. This is to facilitate both the use of the Taylor series algorithm 2.2 on the clusters of close abscissae and the use of the recursive formula (2.1.8) on the remainder of the table. Only a crude ordering (clustering) is needed; as best orderings of complex numbers are seldom obvious, this leeway is valuable.

If the given problem requires the abscissae to be in an order where close and well-separated abscissae may interlace, none of the presented methods is sure to work well. However, if it is possible to form a divided difference table for some desirable ordering of the abscissae, then either (2.1.8) or (2.7.4) may be used with the symmetry property to construct, element by element, the required table from the one at hand.

(2) Natural clustering. If there is a simple a priori estimate of when (2.1.8) may be used safely, say |ζ_i − ζ_j| > g(i−j, f) for some non-negative function g, the procedure becomes quite straightforward. Given an ordering of the abscissae, let all abscissae ζ_j, ζ_{j+1}, ..., ζ_i satisfying |ζ_i − ζ_j| ≤ g(i−j, f) be a cluster upon which (2.1.8) must not be used. Instead, the Taylor series algorithm 2.2 and the backfill formula (2.7.4) may be employed to form the block of the divided difference table corresponding to the cluster. [If the spread of the abscissae in the cluster is so great that the series error bound (2.4.4) is excessive, the ideas of §2.6 may sometimes be employed.] Each cluster can be treated in this manner. The divided difference table will now resemble the matrix in Fig. 2.8.1.

[Diagram: the upper triangular divided difference table with triangular blocks along the diagonal, one block per cluster of close abscissae; these blocks are formed by Algorithm 2.2 and (2.7.4), and the rest of the table is filled by (2.1.8).]

Fig. 2.8.1: Block structure of Δf corresponding to clusters.

(3) Analytic computation of the diagonal. The diagonal (0-th order divided differences) of the table may always be computed directly from the function; i.e. Δ_i^0 f = f(ζ_i) for i = 0, 1, ..., n. For some functions, the first super-diagonal may also be computed analytically. For example,


           Given Divided Difference Table
abscissae
  0.00   1.000   1.136   6.454E-1  1.183     4.503
  0.25           1.284   1.459     6.559     4.622E+1
  0.50                   1.649     3.261E+1  4.572E+2
  5.00                             1.484E+2  4.376E+3
 10.00                                       2.203E+4

           Correct Rearranged Divided Difference Table
abscissae
  0.00   1.000   1.297   6.263     4.509E+1  4.503
  0.50           1.649   3.261E+1  4.572E+2  4.622E+1
  5.00                   1.484E+2  4.376E+3  4.456E+2
 10.00                             2.203E+4  2.259E+3
  0.25                                       1.284

           Rearranged Table Using Recursive Scheme Only
abscissae
  0.00   1.000   1.298   6.262     4.509E+1  3.640
  0.50           1.649   3.261E+1  4.572E+2  4.600E+1
  5.00                   1.484E+2  4.376E+3  4.457E+2
 10.00                             2.203E+4  2.259E+3
  0.25                                       1.284

           Rearranged Table Using Both Schemes
abscissae
  0.00   1.000   1.298   6.262     4.509E+1  4.503*
  0.50           1.649   3.261E+1  4.572E+2  4.622E+1
  5.00                   1.484E+2  4.376E+3  4.457E+2
 10.00                             2.203E+4  2.259E+3
  0.25                                       1.284

*The starred element was computed by the backfill scheme (2.7.4), using the given table and the symmetry property, to obtain 4.503.

Fig. 2.8.2: Rearranging the divided difference table Δexp.

Δ_i^1 exp_t = e^{t(ζ_i + ζ_{i+1})/2}·sinh[t(ζ_{i+1} − ζ_i)/2] / [(ζ_{i+1} − ζ_i)/2],   ζ_i ≠ ζ_{i+1}   (2.8.1)
            = t·e^{tζ_i},   ζ_i = ζ_{i+1}.

(4) Recursive formula. The results of (2) and (3) permit the remainder of the table to be filled by the recursive formula (2.1.8), working outwards towards the upper right. The diagram in Fig. 2.8.3 illustrates one way of proceeding.


Fig. 2.8.3: Filling out the table by the recursive formula.

Summary. The above ideas are summarized in the following algorithm.

Algorithm 2.5: The divided difference table Δf.

1. Order the abscissae Z = {ζ_0, ζ_1, ..., ζ_n} such that close abscissae are close in the ordering, well-separated abscissae are well-separated in the ordering.

2. Use an a priori estimate to determine clusters of "close" abscissae. Use Algorithm 2.2 and formula (2.7.4) (or the ideas of §2.6) to form the blocks of the table corresponding to the clusters. [If f = exp_t, use Algorithm 2.4.]

3. Compute all 0-th (and first) order divided differences analytically. [If f = exp_t, use (2.8.1).]

4. Fill the remainder of the table by the recursive formula (2.1.8).

Adaptive computation of Δf. If the a priori estimate of step 2 is not employed, step-by-step monitoring of error growth for (2.1.8) is required. For example, if

|fl(Δ_i^k f) − Δ_i^k f| ≤ ε_i^k and |fl(Δ_{i+1}^k f) − Δ_{i+1}^k f| ≤ ε_{i+1}^k,   (2.8.2)

where ε_i^k = 0 for k < 0, then

|fl(Δ_i^{k+1} f) − Δ_i^{k+1} f| ≤ ε_i^{k+1} = (ε_{i+1}^k + ε_i^k)/|ζ_{i+k+1} − ζ_i| + ε·|fl(Δ_i^{k+1} f)|   (2.8.3)

for i = 0, 1, ..., n−k−1 and k = 0, 1, ..., n.

The idea is as follows. The diagonal (and first super-diagonal) of the table is formed analytically and the initial error bounds ε_i^0 (ε_i^1) are obtained. The recursive formulas (2.1.8) and (2.8.3) are now used.† If at any step, say for Δ_i^k f, ε_i^k exceeds some prescribed error tolerance, the computed Δ_i^k f is rejected and recomputed using Algorithm 2.2 (or the ideas of §2.6). ε_i^k is reset to the error bound (2.4.4) of the series and the computation is resumed with formulas (2.1.8) and (2.8.3). [If f = exp_t, Algorithm 2.4 may be used to get Δ_i^k exp_t and ε_i^k reset according to (2.7.8).] Algorithms 2.2 and 2.4 reproduce some already computed differences as a bonus. These new values, if more accurate, may replace the old

† Work outwards by diagonals, or rightwards by rows to the top, or upwards by columns to the right, as illustrated in Fig. 2.8.3.


ones. If ε_i^k is still too large, the computation may have failed; however, as only the error bound is too large, there is no assurance that the computed divided differences are in fact unacceptable. These ideas may now be summarized.

Algorithm 2.5′: The divided difference table Δf with error bound.

1. Order the abscissae (as before).

2. Compute all 0-th (and first) divided differences analytically. Obtain the initial error bounds, ε_i^0 = |Δ_i^0 f|·ε for i = 0, 1, ..., n.

3. Fill the table by (2.1.8) and compute an error bound for each element by (2.8.3). If the error bound exceeds a given tolerance for some difference, recompute that difference by Algorithm 2.2 (or the ideas of §2.6); reset the error bound by (2.4.4) and resume filling the table.

Whether the clustering approach or the error monitoring approach is more efficient depends on the problem at hand. The former may fail without warning, and if the clusters are larger than required or great accuracy is not needed, the very fast recursion (2.1.8) may be avoided unnecessarily. The latter approach may indicate failure when, in fact, failure has not occurred. Also, it is very fast, O(n²) operations, when (2.1.8) may be used exclusively, but can be very slow, up to O(n⁴) operations, when recomputation of the differences is required at each step.

The discussion of divided differences shows that for some elementary functions divided differences may be computed. This is what is required to make the Newton polynomial a practical approach for computing matrix functions. Now that forming the Newton polynomial coefficients has been considered, the remainder of this paper is devoted to other details of the computation of f(A).


3. Computing the Newton Polynomial of f(A)

3.1 Complex matrices.

The previous discussion indicated that for some elementary functions f, the coefficients of the Newton polynomial for f(A) may be computed. Indeed, these coefficients are just the 0-th (i.e. top) row of the divided difference table of f at the set of abscissae Λ_A = {λ_0, λ_1, ..., λ_n}, the eigenvalues of A.

       | f(λ_0)  Δ_0^1 f  Δ_0^2 f  Δ_0^3 f  Δ_0^4 f  Δ_0^5 f  Δ_0^6 f |
       |         f(λ_1)   Δ_1^1 f  Δ_1^2 f  Δ_1^3 f  Δ_1^4 f  Δ_1^5 f |
       |                  f(λ_2)   Δ_2^1 f  Δ_2^2 f  Δ_2^3 f  Δ_2^4 f |
Δf =   |                           f(λ_3)   Δ_3^1 f  Δ_3^2 f  Δ_3^3 f |
       |                                    f(λ_4)   Δ_4^1 f  Δ_4^2 f |
       |                                             f(λ_5)   Δ_5^1 f |
       |                                                      f(λ_6)  |

Fig. 3.1.1: Newton polynomial coefficients (the top row) for Λ_A = {λ_0, λ_1, ..., λ_6}.

The basic algorithm to obtain f(A) is now clear.

Algorithm 3.1: Newton polynomial algorithm for f(A).

1. Find the eigenvalues of A.

2. Order the abscissae (eigenvalues) as described in §2.8 and form the divided difference table for f.

3. Pick out the Newton polynomial coefficients from the table and compute

   f(A) = Σ_{k=0}^n Δ_0^k f·∏_{j=0}^{k−1}(A − λ_j I).

This is an O(n⁴) procedure, the bulk of the work being in the matrix multiplications of step 3. Only three (n+1)×(n+1) storage arrays are required: one for A, one to accumulate f(A), and one for the matrix products. The divided difference table may be formed in the, as yet, unused array for f(A). If error bounds ε_0^k, k = 0, 1, ..., n, on the coefficients are available (§2.8),

‖fl[f(A)] − f(A)‖ ≤ Σ_{k=0}^n ε_0^k ∏_{j=0}^{k−1} ‖A − λ_j I‖   (3.1.1)

bounds the norm of the error in the final result if the matrix products are formed accurately.

In practice, f often depends on a non-negative parameter t, f(tA) = f_t(A), where A remains fixed but many different values of t are given. Only the coefficients of the Newton polynomial for f_t(A) depend on t; so, it seems worthwhile to study how the computation of f_t(A) can be arranged to lessen the work at each value of t. Such economies, and others, are now discussed in order that a more practical algorithm may be presented.

(1) The Schur form. When n is not large, say n ≤ 100, the eigenvalues of A are usually found by the QR algorithm, which performs a sequence of unitary similarity transformations on A in order to transform it into upper (lower) triangular form T. Wilkinson [21] gives a treatment of


the theory behind the QR algorithm; it is implemented in the EISPACK [17] package of computer algorithms. T is the Schur form of A, and the eigenvalues of A appear on the diagonal of T.

A = P*TP or T = PAP*,  P* = P^{−1}.   (3.1.2)

From (1.1.2),

f_t(A) = P*·f_t(T)·P or f_t(T) = P·f_t(A)·P*.   (3.1.3)

Because Λ_A (= Λ_T), P, and T are independent of the parameter t, the Schur factorization need be performed only once at the beginning of the computation for f_t(A). f_t(T) may be computed at each t-step and f_t(A) formed by (3.1.3) only when required. Thus, the problem may be reduced to computing a function of a triangular matrix.

(2) Ordering the eigenvalues. Typically, the QR algorithm produces a matrix T whose eigenvalues appear in almost monotonically decreasing order of absolute value along the diagonal. Parlett [16] has developed routines for reordering the eigenvalues of a triangular matrix into any desired order by using unitary similarity transformations. Hence, T's eigenvalues may be put in any order, and the transformations necessary to achieve this may be absorbed into P.

(3) Products of triangular matrices. The problem is now to compute

f_t(T) = Σ_{k=0}^n Δ_0^k f_t·∏_{j=0}^{k−1}(T − λ_j I).   (3.1.4)

The matrix factors ∏_{j=0}^{k−1}(T − λ_j I) for k = 0, 1, ..., n are all upper triangular; and thus, f_t(T) is also. The matrix products need O(n³/6) operations per multiplication, as opposed to O(n³) for full matrices.

The matrix multiplications for the factors ∏_{j=0}^{k−1}(T − λ_j I) for k = 2, 3, ..., n do not depend on the parameter t and may, if storage is available, be performed once and stored for future use at each t-step. This requires an additional ½(n³ + 2n² − n − 2) storage locations. Then at each t-step, only ½(n³ + 3n² + 2n) multiplications are required to form f_t(T), once the coefficients of the polynomial are known. As shown below, this figure for storage locations can be reduced considerably. If the reduction is still not sufficient, the matrix multiplications may be performed at each t-step.

(4) Analytic computation of the diagonal. The diagonal elements of f_t(T) can be computed analytically. As T_jj = λ_j,

f_t(T)_jj = f_t(λ_j) for j = 0, 1, ..., n.   (3.1.5)

For some f, the first super-diagonal of f_t(T) can also be formed analytically. For example, for j = 0, 1, ..., n−1,

exp_t(T)_{j,j+1} = T_{j,j+1}·Δ_j^1 exp_t = T_{j,j+1}·e^{t(λ_j + λ_{j+1})/2}·sinh[t(λ_{j+1} − λ_j)/2] / [(λ_{j+1} − λ_j)/2],   λ_j ≠ λ_{j+1}   (3.1.6)
                 = T_{j,j+1}·t·e^{tλ_j},   λ_j = λ_{j+1}.

This reduces the additional storage requirements and the number of multiplications at each t-step (exclusive of the above evaluations and the coefficient evaluations) to ½(n³ + n²).

(5) Special form of the ∏(T − λ_j I). The matrix factors ∏_{j=0}^{k−1}(T − λ_j I) for k = 2, 3, ..., n are not full upper triangular matrices. This is one of the features that makes the Newton polynomial so very attractive, and may be demonstrated as follows. Define S^(1) = T − λ_0 I. S^(1) has (0,0) element S^(1)_00 = 0. S^(2) = (T − λ_0 I)(T − λ_1 I) has not only S^(2)_00 = 0, but also S^(2)_11 = 0 and S^(2)_01 = 0. In general, the upper triangular matrix

S^(k) = ∏_{j=0}^{k−1}(T − λ_j I) has S^(k)_{ij} = 0, j < k.   (3.1.7)

0 0 0 0 x x x x x
  0 0 0 x x x x x
    0 0 x x x x x
      0 x x x x x
        x x x x x
          x x x x
            x x x
              x x
                x

Fig. 3.1.2: Matrix S^(4) when n = 8.

With this knowledge, only ⅙(n³ − n) storage locations and ⅙(n³ + 3n² + 2n) multiplications   (3.1.8)

are needed. This nice property requires that the divided difference abscissae used to form the polynomial coefficients be in exactly the same order as the eigenvalues along the diagonal of T. The remarks under (2) show that the eigenvalues may be ordered to best advantage.

Summary. The complete algorithm may now be summarized.

Algorithm 3.2: Newton polynomial algorithm for f_t(A), with ample storage.

1. Use the QR algorithm to compute T and P.

2. Use Parlett's swap routines to order the eigenvalues of T to aid in forming the divided difference table of f; update T and P.

3. Form and store the matrix products ∏_{j=0}^{k−1}(T − λ_j I) for k = 2, 3, ..., n, using the economies indicated in (4) and (5).

At each t-step,

4. Compute the Newton polynomial coefficients (§2).

5. Use the results of steps 3 and 4 to compute the off-diagonal elements of f_t(T),

   f_t(T)_{ij} = Σ_{k=1}^n Δ_0^k f_t·[∏_{l=0}^{k−1}(T − λ_l I)]_{ij} for i < j.

6. Fill the diagonal (and first super-diagonal) of f_t(T) analytically; e.g. use (3.1.5) and (3.1.6).

7. If desired, transform P*·f_t(T)·P = f_t(A).


The entire algorithm requires, with care, about (1/3)n^3 + 3n^2 storage locations and O(n^3) operations per t-step (exclusive of step 7).

3.2 Real matrices.

All the remarks of the previous section apply equally well when the matrix A is real. However, if the methods presented there are used without change, complex arithmetic is usually required. This section discusses how complex arithmetic can be avoided when computing f_t(A) for real A.

(1) The real Schur form. When A is real, its eigenvalues are either real or occur in complex conjugate pairs. The QR algorithm may be used to reduce A to a nearly upper triangular form T by orthogonal (real) similarity transformations [21],

    A = P^T T P or T = P A P^T,  P^T = P^{−1} (real).   (3.2.1)

T is real and upper triangular except for some non-consecutive non-zero elements on its first sub-diagonal.

Fig. 3.2.1: Example of nearly upper triangular real Schur form.

We may speak of T as being block upper triangular, with 1×1 or 2×2 blocks along its diagonal. The 1×1 blocks are the real eigenvalues of A (and T); that is, if T_{ii} is a 1×1 block, λ_i = T_{ii}.

The 2×2 blocks correspond to the conjugate pair eigenvalues. Let

    [ T_{ii}      T_{i,i+1}   ]
    [ T_{i+1,i}   T_{i+1,i+1} ]

be such a block. Orthogonal P may be chosen such that T_{ii} = T_{i+1,i+1}, and this normalization will be assumed. Then the corresponding conjugate pair eigenvalues λ_i and λ_{i+1} = λ̄_i are the roots of

    λ^2 − 2 T_{ii} λ + (T_{ii}^2 − T_{i+1,i} T_{i,i+1}) = 0.   (3.2.2)

It follows that T_{ii} = T_{i+1,i+1} = Re(λ_i) and Im(λ_i) = ±[−T_{i+1,i} T_{i,i+1}]^{1/2}.

The product of two real Schur matrices of the same structure is again a real Schur matrix of that structure. Hence f_t(T) is a real Schur matrix with the same structure as T. Since

    f_t(A) = P^T f_t(T) P or f_t(T) = P f_t(A) P^T,   (3.2.3)

f_t(T) may be computed in place of f_t(A), avoiding work with the full matrix A.

(2) The real Newton polynomial. The set of abscissae λ_i now consists of real and conjugate pair elements. Generally, the divided differences formed from the λ_i and f will be complex; however, because of the symmetry property there are some important relationships.


Fig. 3.2.2: Divided difference table for Λ_t = {λ_0, λ̄_0, λ_1, λ_2, λ̄_2} is the conjugate of the table for Λ̄_t = {λ̄_0, λ_0, λ_1, λ̄_2, λ_2}. (real elements underlined)

The above example shows that (Δ^4 f)(λ_0, λ̄_0, λ_1, λ_2, λ̄_2) = (Δ^4 f)(λ̄_0, λ_0, λ_1, λ̄_2, λ_2) is real, while (Δ^3 f)(λ_0, λ̄_0, λ_1, λ_2) is the conjugate of (Δ^3 f)(λ̄_0, λ_0, λ_1, λ̄_2). This suggests forming the Newton polynomial of f_t(T) using both the given ordering and its conjugate ordering, and then averaging the two resulting polynomials. Using the abscissae of the above example,

    f_t(T) = f(λ_0) I + Δ_0^1 f (T − λ_0 I) + Δ_0^2 f (T − λ_0 I)(T − λ̄_0 I) + Δ_0^3 f (T − λ_0 I)(T − λ̄_0 I)(T − λ_1 I)
             + Δ_0^4 f (T − λ_0 I)(T − λ̄_0 I)(T − λ_1 I)(T − λ_2 I),

    f_t(T) = f(λ̄_0) I + Δ̄_0^1 f (T − λ̄_0 I) + Δ̄_0^2 f (T − λ̄_0 I)(T − λ_0 I) + Δ̄_0^3 f (T − λ̄_0 I)(T − λ_0 I)(T − λ_1 I)
             + Δ̄_0^4 f (T − λ̄_0 I)(T − λ_0 I)(T − λ_1 I)(T − λ̄_2 I),

and then the real Newton polynomial is

    f_t(T) = Re[f(λ_0)] I + Δ_0^1 f [T − Re(λ_0) I] + Δ_0^2 f (T − λ_0 I)(T − λ̄_0 I)
             + Re(Δ_0^3 f) (T − λ_0 I)(T − λ̄_0 I)(T − λ_1 I)   (3.2.4)
             + Δ_0^4 f (T − λ_0 I)(T − λ̄_0 I)(T − λ_1 I)[T − Re(λ_2) I].

The coefficients of the real Newton polynomial are just the real parts of the coefficients of the general Newton polynomial.
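These symmetry relations are easy to confirm numerically. The snippet below is illustrative (the abscissae are arbitrarily chosen, not from the report): it computes complex divided differences of exp and checks which come out real.

```python
import cmath

def divided_difference(f, xs):
    """Divided difference f[x0,...,xk] by the standard recurrence
    (distinct abscissae assumed)."""
    d = [f(x) for x in xs]
    for k in range(1, len(xs)):
        for i in range(len(xs) - 1, k - 1, -1):
            d[i] = (d[i] - d[i - 1]) / (xs[i] - xs[i - k])
    return d[-1]

lam0, lam1, lam2 = 1 + 2j, 0.5, -1 + 1j          # lam1 is real
xs = [lam0, lam0.conjugate(), lam1, lam2, lam2.conjugate()]

d2 = divided_difference(cmath.exp, xs[:3])  # abscissae closed under conjugation
d3 = divided_difference(cmath.exp, xs[:4])  # not closed: generally complex
d4 = divided_difference(cmath.exp, xs)      # closed again: real
```

Only the divided differences whose abscissa sets are closed under conjugation are real (up to round-off), exactly the pattern shown in the table of Fig. 3.2.2.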

(3) Ordering the abscissae. In order for T to remain real, the conjugate pair eigenvalues must be ordered together. Parlett [16] has also developed routines for swapping the 1×1 and 2×2 blocks along the diagonal of the real Schur T by orthogonal similarity transformations; so the eigenvalue pairs may be ordered as desired. If some of the eigenvalues have large imaginary parts, the abscissae for the divided differences cannot be taken in the same order without making a reasonable clustering impossible. It is suggested that the conjugate pair abscissae be ordered separately as in Fig. 3.2.3. The idea is that if λ_i and λ_j are close, so are λ̄_i and λ̄_j. The differences giving the Newton coefficients no longer appear in the top row of the table, but go almost diagonally up to the right.

(4) Products of real Schur matrices. The matrix factors, e.g. S^(4) = (T − λ_0 I)(T − λ̄_0 I)(T − λ_1 I)[T − Re(λ_2) I], are all real Schur matrices; so their formation requires little more work than for triangular matrices. Also, they exhibit the same special


Fig. 3.2.3: Divided difference table for conjugate pairs, abscissae Λ_t = {λ̄_2, λ̄_0, λ_0, λ_1, λ_2}. (Newton coefficients are underlined)

property, the zeroing out of columns, that was seen in (5) of §3.1. In (3.2.4), S^(2) = (T − λ_0 I)(T − λ̄_0 I) has S^(2)_{ij} = 0 for j < 2 (i, j = 0, 1, ..., 4); S^(3) = (T − λ_0 I)(T − λ̄_0 I)(T − λ_1 I) has S^(3)_{ij} = 0 for j < 3; and S^(4) has S^(4)_{ij} = 0 for j < 3. The slight difference from the purely triangular case of §3.1 is due to the presence of the 2×2 blocks.

(5) Analytic computation of the diagonal blocks. As before, the diagonal blocks of f_t(T) may be computed analytically. If T_{ii} = λ_i is a 1×1 block,

    f_t(T)_{ii} = f_t(λ_i).   (3.2.5)

If

    [ T_{ii}      T_{i,i+1}   ]
    [ T_{i+1,i}   T_{i+1,i+1} ],

where T_{ii} = T_{i+1,i+1} = Re(λ_i), is a 2×2 block corresponding to the eigenvalues λ_i, λ_{i+1} = λ̄_i, then

    f_t(T)_{ii} = f_t(T)_{i+1,i+1} = Re[f_t(λ_i)]   (3.2.6)

and

    f_t(T)_{i,i+1} = T_{i,i+1} Im[f_t(λ_i)] / Im(λ_i),
    f_t(T)_{i+1,i} = T_{i+1,i} Im[f_t(λ_i)] / Im(λ_i).   (3.2.7)

In particular, if f_t = exp_t and λ_i = α_i + iβ_i, then

    exp_t(T)_{ii} = exp_t(T)_{i+1,i+1} = e^{tα_i} cos(tβ_i),
    exp_t(T)_{i,i+1} = T_{i,i+1} e^{tα_i} sin(tβ_i)/β_i,   (3.2.8)
    exp_t(T)_{i+1,i} = T_{i+1,i} e^{tα_i} sin(tβ_i)/β_i;

and if λ_i = T_{ii} and λ_{i+1} = T_{i+1,i+1} are two adjacent distinct real eigenvalues,

    exp_t(T)_{i,i+1} = T_{i,i+1} e^{t(λ_{i+1}+λ_i)/2} sinh[t(λ_{i+1} − λ_i)/2] / [(λ_{i+1} − λ_i)/2].   (3.2.9)

With the modifications suggested above, Algorithm 3.2 may be carried over directly to the case of real matrices A. The arithmetic is now real, except for the divided difference table formation. The only complication is that special attention must be paid to the peculiar form of T.
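Formulas (3.2.8) translate directly into code. The function below is an illustrative sketch (the name and interface are assumptions, not the report's): it handles one normalized 2×2 block [[a, b], [c, a]] with bc < 0, i.e. eigenvalues a ± iβ.

```python
import math

def exp_block(t, block):
    """Analytic exp_t of a normalized 2x2 real Schur block [[a, b], [c, a]]
    with b*c < 0; implements (3.2.8)."""
    a, b = block[0]
    c, _ = block[1]
    beta = math.sqrt(-c * b)           # Im(lambda) = [-T_{i+1,i} T_{i,i+1}]^(1/2)
    ea_cos = math.exp(t * a) * math.cos(t * beta)
    s = math.exp(t * a) * math.sin(t * beta) / beta
    return [[ea_cos, b * s],
            [c * s, ea_cos]]
```

For the rotation generator [[0, 1], [−1, 0]] this gives [[cos t, sin t], [−sin t, cos t]], the classical plane rotation, as (3.2.8) predicts.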


4. Use of the Newton Polynomial Method

4.1 Other methods for computing f(A).

There are many possible ways to compute f(A). Moler and Van Loan [13] discuss nineteen different ways to compute exp_t(A); unfortunately, many of the methods are unsatisfactory for numerical computation, and all of them prove unsatisfactory in some cases. Nevertheless, a combination of methods may prove the best way to proceed. One such possibility will be explored briefly here.

The Newton polynomial method presented in §§3.1 and 3.2 is not intended as a general routine for computing f_t(A), but as a part of a larger program. When the matrix A has n = 100, or even n = 50, the Newton polynomial approach by itself is likely to be very costly. Sufficient fast storage for the matrix products is unlikely to be available (for n = 50, (1/3)n^3 ≈ 41,667; for n = 100, (1/3)n^3 ≈ 333,333), and O(n^4) operations will be required for each value of t. Only when the desired accuracy can be obtained from just the first few terms of the polynomial can this approach be considered attractive.

Parlett [15] has proposed a fast recursive method for computing f_t(A) using the fact that A and f_t(A) commute. In brief, A is transformed to an upper triangular Schur matrix T and the diagonal of f_t(T) is obtained directly, as in (3.1.5). The off-diagonal elements of f_t(T) may be solved for uniquely by working outwards super-diagonal by super-diagonal in the equation

    T f_t(T) − f_t(T) T = 0.   (4.1.1)

The general recursion is

    T_{ii} f_t(T)_{ij} − f_t(T)_{ij} T_{jj} = Σ_{k=0}^{j−i−1} [f_t(T)_{i,i+k} T_{i+k,j} − T_{i,j−k} f_t(T)_{j−k,j}],   (4.1.2)

where i < j. The pattern of dependence here is exactly the same as that described in §2.1. In fact, when T = Z, the "steigungsmatrix" of §2.6, the recursion (4.1.2) reduces to recursion (2.1.8), giving another way to define the divided difference table matrix.

Recursion (4.1.2) requires only about (1/6)n^3 multiplications to compute f_t(T) and storage for just the three matrices T, f_t(T), and the transformation matrix P, where T = P A P^*. When T is a real Schur matrix, (4.1.2) may be interpreted as a block matrix equation.

Formula (4.1.2) fails numerically whenever T has confluent or nearly confluent eigenvalues. Kågström [8] points this out in his analysis of this method. However, since the diagonal elements of T may be ordered as desired (§3.1), the eigenvalues may be arranged such that close ones are clustered in the ordering along T's diagonal and well-separated ones are well-separated in the ordering. If a suitable clustering is decided upon, the blocks of f_t(T) corresponding to the eigenvalue clusters may be computed by the Newton polynomial method; the remainder of f_t(T) may be formed by the fast recursion (4.1.2). The basic idea is exactly the same as that proposed in (2) of §2.8 for divided difference tables.

Finally, the discussion to this point has assumed that the given data, i.e. the matrix A or A's eigenvalues, are exact. However, this is seldom true. For example, A is often determined from experiment, and the QR algorithm produces a matrix T which is correct for some slightly perturbed matrix. So we actually seek to compute f(Ã) for some Ã close to A. The question of the condition of the problem, that is, how sensitive f(A) is to small perturbations in A, is important, as we wish f(Ã) to be close to f(A). The condition does not depend on the method selected to compute f(A); however, the reliability of the computed solution depends strongly upon the condition. Kågström [9] develops some criteria for deciding on the condition of exp_t(A).


Appendix

A. Error bound for Algorithm 2.2.

In order that the bounds to be developed remain simple in form, the arithmetic will be assumed to satisfy

    |fl(Σ_i g_i) − Σ_i g_i| ≤ ε Σ_i |g_i|,   (A.1)

where ε ≤ 1.06 × (machine precision) and fl(g) is the computed floating-point value of the expression g. Condition (A.1) holds when double precision accumulation of sums and inner products is available.† The important simplification in this case is that the error bound on the sum is independent of the number of quantities summed. The leeway given in ε allows all error terms containing powers of ε to be absorbed into terms linear in ε.

Algorithm 2.2 is based on Theorem 2.1,

    Δ_0^n f = Σ_{p=0}^{∞} a_{n+p} Δ_0^n ↑^{n+p},   (A.2)

where ↑_a^j(ζ) = (ζ − a)^j and where the set of abscissae Z = {ζ_0, ζ_1, ..., ζ_n} is within the circle of convergence of the power series f = Σ_{j=0}^{∞} a_j ↑_a^j about the expansion point ζ = a. Let

    δ = max_i |ζ_i − a|.   (A.3)

We first develop bounds on the divided differences of the power functions Δ_0^k ↑^{p+k}, k = 0, 1, ..., n, and on the error in using Algorithm 2.1. These results are then used to bound the error in computing (A.2) by Algorithm 2.2.

Lemma 1: For p = 1, 2, ...,

    |Δ_0^k ↑^{p+k}| ≤ [(p+k) ··· (k+1)/p!] δ^p,  k = 0, 1, ..., n.   (A.4)

proof: For p = 1, Δ_0^k ↑^{k+1} = Σ_{i=0}^{k} (ζ_i − a), and so

    |Δ_0^k ↑^{k+1}| ≤ (k+1) δ,  k = 0, 1, ..., n.

If for some p > 1

    |Δ_0^k ↑^{p+k−1}| ≤ [(p+k−1) ··· (k+1)/(p−1)!] δ^{p−1}

for k = 0, 1, ..., n, then since Δ_0^k ↑^{p+k} = Σ_{i=0}^{k} (ζ_i − a) Δ_0^i ↑^{p+i−1},

    |Δ_0^k ↑^{p+k}| ≤ [δ^p/(p−1)!] Σ_{i=0}^{k} [(p+i−1) ··· (i+1)] = [(p+k) ··· (k+1)/p!] δ^p

for k = 0, 1, ..., n. □

†See Wilkinson [20] for a general treatment of rounding error analysis.
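The identity used in the induction, Δ_0^k ↑^{p+k} = Σ_{i=0}^{k} (ζ_i − a) Δ_0^i ↑^{p+i−1}, can be exercised directly. The sketch below is an illustration of that recurrence (not the report's Algorithm 2.1 itself; the function name is assumed): it tabulates the divided differences of the shifted powers and can be checked against hand values.

```python
def dd_powers(zs, a, pmax):
    """D[p][k] = divided difference over z_0..z_k of (z - a)^(p+k),
    built from D[p][k] = sum_{i<=k} (z_i - a) * D[p-1][i]; for p = 0 the
    divided difference of a monic degree-k polynomial over k+1 points is 1."""
    n = len(zs)
    D = [[1.0] * n]
    for p in range(1, pmax + 1):
        D.append([sum((zs[i] - a) * D[p - 1][i] for i in range(k + 1))
                  for k in range(n)])
    return D
```

With zs = [0.5, 1.5, 2.0] and a = 1, the table reproduces, e.g., the second divided difference of (z − 1)^3, namely z_0 + z_1 + z_2 − 3 = 1, and each |D[p][k]| stays within the Lemma 1 bound.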


Lemma 2: If for p = 1, 2, ... the Δ_0^k ↑^{p+k} are computed according to Algorithm 2.1, then for k = 0, 1, ..., n,

    |fl(Δ_0^k ↑^{p+k}) − Δ_0^k ↑^{p+k}| ≤ [(p+k) ··· (k+1)/(p−1)!] δ^p ε.   (A.5)

proof: By condition (A.1), for p = 1,

    |fl(Δ_0^k ↑^{k+1}) − Δ_0^k ↑^{k+1}| ≤ ε Σ_{i=0}^{k} |ζ_i − a| ≤ (k+1) δ ε,  k = 0, 1, ..., n.

If for some p > 1

    |fl(Δ_0^k ↑^{p+k−1}) − Δ_0^k ↑^{p+k−1}| ≤ [(p+k−1) ··· (k+1)/(p−2)!] δ^{p−1} ε

for k = 0, 1, ..., n, then

    |fl(Δ_0^k ↑^{p+k}) − Δ_0^k ↑^{p+k}| ≤ |fl(Σ_{i=0}^{k} (ζ_i − a) fl(Δ_0^i ↑^{p+i−1})) − Σ_{i=0}^{k} (ζ_i − a) fl(Δ_0^i ↑^{p+i−1})|
        + Σ_{i=0}^{k} |ζ_i − a| |fl(Δ_0^i ↑^{p+i−1}) − Δ_0^i ↑^{p+i−1}|
    ≤ ε δ^p [1/(p−1)! + 1/(p−2)!] Σ_{i=0}^{k} [(p+i−1) ··· (i+1)]
    = [(p+k) ··· (k+1)/(p−1)!] δ^p ε

for k = 0, 1, ..., n. □

Theorem: If the arithmetic satisfies condition (A.1), then the error in using Algorithm 2.2 satisfies

    |fl(Δ_0^n f) − Δ_0^n f| ≤ ε Σ_{p=0}^{∞} (p+1) [(n+p) ··· (n+1)/p!] |a_{n+p}| δ^p.   (A.6)

proof: Let Δ_0^n s_{n+m} = Σ_{p=0}^{m} a_{n+p} Δ_0^n ↑^{n+p} be the partial sums of (A.2). The error may be bounded by three terms,

    |fl(Δ_0^n f) − Δ_0^n f| ≤ |fl(Δ_0^n f) − Σ_{p=0}^{m} a_{n+p} fl(Δ_0^n ↑^{n+p})|
        + |Σ_{p=0}^{m} a_{n+p} fl(Δ_0^n ↑^{n+p}) − Σ_{p=0}^{m} a_{n+p} Δ_0^n ↑^{n+p}| + |Δ_0^n s_{n+m} − Δ_0^n f|.

In the first term, fl(Δ_0^n f) = fl[Σ_{p=0}^{m} a_{n+p} fl(Δ_0^n ↑^{n+p})], and so by Lemma 1 and condition (A.1),

    |fl(Δ_0^n f) − Σ_{p=0}^{m} a_{n+p} fl(Δ_0^n ↑^{n+p})| ≤ ε Σ_{p=0}^{m} |a_{n+p}| |fl(Δ_0^n ↑^{n+p})|
        ≤ ε Σ_{p=0}^{m} [(n+p) ··· (n+1)/p!] |a_{n+p}| δ^p.

By Lemma 2 and condition (A.1), the second term satisfies

    |Σ_{p=0}^{m} a_{n+p} fl(Δ_0^n ↑^{n+p}) − Σ_{p=0}^{m} a_{n+p} Δ_0^n ↑^{n+p}| ≤ Σ_{p=0}^{m} |a_{n+p}| |fl(Δ_0^n ↑^{n+p}) − Δ_0^n ↑^{n+p}|
        ≤ ε Σ_{p=1}^{m} [(n+p) ··· (n+1)/(p−1)!] |a_{n+p}| δ^p.

Finally, m may be chosen so large that the truncation error is as small as we like, and so it may be considered insignificant compared with the round-off error bounds. Adding the bounds gives (A.6). □

Corollary: If f = exp_t, exp_t(ζ) = e^{tζ}, then the error in using Algorithm 2.2 satisfies

    |fl(Δ_0^n exp_t) − Δ_0^n exp_t| ≤ ε e^{tδ} (1 + tδ) t^n e^{ta}/n!.   (A.7)

proof: For f = exp_t, a_{n+p} = t^{n+p} e^{ta}/(n+p)!. Inserting this into (A.6) yields

    |fl(Δ_0^n exp_t) − Δ_0^n exp_t| ≤ ε (t^n e^{ta}/n!) Σ_{p=0}^{∞} (p+1)(tδ)^p/p!
        = ε e^{tδ} (1 + tδ) t^n e^{ta}/n!. □

B. Error bound for Algorithm 2.4.

Algorithm 2.4 computes

    Δexp_t = [Δexp_{2^{−j}t}]^{2^j},   (B.1)

given that Δexp_{2^{−j}t} has already been computed according to Algorithm 2.2. A bound of the form†

    |fl(Δexp_t) − Δexp_t| ≤ ε τ |Δexp_{2^{−j}t}|^{2^j}   (B.2)

is possible for some τ > 0 and integer j ≥ 0. In the event that the divided difference abscissae are real, every element of Δexp_t and Δexp_{2^{−j}t} is non-negative; and so the bound (B.2) becomes a relative error bound

    |fl(Δexp_t) − Δexp_t| ≤ ε τ Δexp_t.   (B.3)

Bound (B.2) will be developed here under the same conditions used in Appendix A.
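The identity behind Algorithm 2.4, Δexp_t = [Δexp_{t/2}]^2 as table matrices, can be demonstrated concretely. The sketch below is illustrative (distinct real abscissae assumed; names are not the report's): it builds the divided difference table matrix of exp_t directly and confirms the squaring step.

```python
import math

def exp_table(t, lam):
    """Upper triangular table matrix with (i, j) entry the divided
    difference exp_t[lam_i, ..., lam_j] (distinct abscissae assumed)."""
    n = len(lam)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = math.exp(t * lam[i])
    for d in range(1, n):
        for i in range(n - d):
            j = i + d
            M[i][j] = (M[i + 1][j] - M[i][j - 1]) / (lam[j] - lam[i])
    return M

def square(M):
    n = len(M)
    return [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

Squaring exp_table(t/2, lam) reproduces exp_table(t, lam) to round-off; this is why the scaled table, computed accurately by Algorithm 2.2 when 2^{−j}tδ is small, can be squared back up to the full table.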

Lemma: If tδ < log 2, the relative error in each element of Δexp_t computed by Algorithm 2.2 is bounded by

    ε e^{tδ} (1 + tδ)/(2 − e^{tδ}).   (B.4)

proof: From equation (A.2) with a_{n+p} = t^{n+p} e^{ta}/(n+p)!,

    Δ_0^n exp_t = t^n e^{ta} Σ_{p=0}^{∞} t^p Δ_0^n ↑^{n+p}/(n+p)!,

and so by (A.4),

    |Δ_0^n exp_t| ≥ t^n |e^{ta}| [1/n! − Σ_{p=1}^{∞} (tδ)^p/(p! n!)] = [t^n |e^{ta}|/n!] (2 − e^{tδ}).

When tδ < log 2, inequality (A.7) gives

    |fl(Δ_0^n exp_t) − Δ_0^n exp_t| ≤ ε e^{tδ} [(1 + tδ)/(2 − e^{tδ})] |Δ_0^n exp_t|,

where the quantity

    ε e^{tδ} (1 + tδ)/(2 − e^{tδ})

bounds the relative error in using Algorithm 2.2. This bound is independent of n and so holds for every element of the divided difference table. □

†Matrix inequalities of the form B ≤ C hold element by element; the matrix |B| is the matrix whose elements are the absolute values of the elements of B.

Under condition (A.1), the operation of squaring a matrix B satisfies

    |fl(B^2) − B^2| ≤ ε |B|^2.   (B.5)

Combining this with the lemma yields the sought-for error bound.

Theorem: For any non-negative integer j such that 2^{−j}tδ < log 2, the error in using Algorithm 2.4 satisfies

    |fl(Δexp_t) − Δexp_t| ≤ ε [2^j (τ_j + 1) − 1] |Δexp_{2^{−j}t}|^{2^j},   (B.6)

where

    τ_j = e^{2^{−j}tδ} (1 + 2^{−j}tδ)/(2 − e^{2^{−j}tδ}).   (B.7)

proof: From the lemma,

    |fl(Δexp_{2^{−j}t}) − Δexp_{2^{−j}t}| ≤ ε τ_j |Δexp_{2^{−j}t}|.

Then the first matrix squaring yields

    |fl(Δexp_{2^{−j+1}t}) − Δexp_{2^{−j+1}t}| ≤ |fl(Δexp_{2^{−j+1}t}) − [fl(Δexp_{2^{−j}t})]^2| + |[fl(Δexp_{2^{−j}t})]^2 − Δexp_{2^{−j+1}t}|
        ≤ ε |Δexp_{2^{−j}t}|^2 + 2 ε τ_j |Δexp_{2^{−j}t}|^2
        = ε [2(τ_j + 1) − 1] |Δexp_{2^{−j}t}|^2.

Repeating the squaring operation j times yields (B.6). □

The non-negative integer j may be chosen to minimize the error coefficient in (B.6),

    c_j = 2^j [e^{2^{−j}tδ} (1 + 2^{−j}tδ)/(2 − e^{2^{−j}tδ}) + 1] − 1.   (B.8)

Coefficient (B.8) has a minimum for 2^{−j}tδ = 0.3297. Since 2^{−j}tδ = 0.3297 will generally not hold for integer j, we ask that j satisfy c_{j+1} ≥ c_j and c_{j−1} ≥ c_j. This condition is true for the largest integer j such that

    2^{−j}tδ > 0.2209.   (B.9)


References

[1] R. P. Brent, Algorithms for Minimization without Derivatives, Prentice-Hall (1973).

[2] C. Davis, Explicit functional calculus, Linear Algebra and its Applications 6 (1973), 193-199.

[3] D. T. Finkbeiner, Introduction to Matrices and Linear Transformations, Freeman, San Francisco (1960).

[4] F. R. Gantmacher, Theory of Matrices, v.1, Chelsea, New York (1959).

[5] A. O. Gel'fond, Calculus of Finite Differences, Hindustan, India (1971).

[6] G. H. Golub and J. H. Wilkinson, Ill-conditioned eigensystems and the computation of the Jordan canonical form, SIAM Rev. 18 (1976), 578-619.

[7] C. Jordan, Calculus of Finite Differences, Chelsea, New York (1950).

[8] B. Kågström, Numerical computation of matrix functions, Revised report UMINF-58.77, Dept. of Information Processing, Umeå Univ., Sweden, July 1977.

[9] B. Kågström, Bounds and perturbation bounds for the matrix exponential, BIT 17 (1977), 39-57.

[10] B. Kågström and A. Ruhe, An algorithm for numerical computation of the Jordan normal form of a complex matrix, Revised report UMINF-59.77, Dept. of Information Processing, Umeå Univ., Sweden, Sept. 1976.

[11] W. Kahan and I. Farkas, Algorithm 167, calculation of confluent divided differences, Commun. Assoc. Comput. Mach. 6 (1963), 164-165.

[12] L. M. Milne-Thomson, The Calculus of Finite Differences, Macmillan, London (1933).

[13] C. B. Moler and C. F. Van Loan, Nineteen ways to compute the exponential of a matrix, Technical Report 76-283, Dept. of Computer Science, Cornell Univ., July 1976.

[14] G. Opitz, Steigungsmatrizen, ZAMM 44 (1964), T52-T54.

[15] B. N. Parlett, A recurrence among the elements of functions of triangular matrices, Linear Algebra and its Applications 14 (1976), 117-121.

[16] B. N. Parlett, A program to swap diagonal blocks, Memo. No. UCB/ERL M77/66, Univ. of Calif. at Berkeley, Nov. 1977.

[17] B. T. Smith et al., Matrix Eigensystem Routines - EISPACK Guide, Lecture Notes in Computer Science, v.6, Springer-Verlag (1974).

[18] H. W. Turnbull and A. C. Aitken, An Introduction to the Theory of Canonical Matrices, Blackie and Son, London (1932).

[19] R. C. Ward, Numerical computation of the matrix exponential with accuracy estimate, SIAM J. Numer. Anal. 14 (1977), 600-609.

[20] J. H. Wilkinson, Rounding Errors in Algebraic Processes, Prentice-Hall (1963).

[21] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford (1965).