Numerical AnalysisAn Advanced Course
Gheorghe Coman, Ioana Chiorean, Teodora Catinas
Contents

1 Preliminaries 9
  1.1 Linear spaces 9
  1.2 Examples of function spaces 20
  1.3 The Peano Kernel Theorems 24
  1.4 Best approximation in Hilbert spaces 27
  1.5 Orthogonal polynomials 31

2 Approximation of functions 41
  2.1 Preliminaries 41
  2.2 Univariate interpolation operators 47
    2.2.1 Polynomial interpolation operators 48
    2.2.2 Polynomial spline interpolation operators 55
    2.2.3 Spline operators of Lagrange type 69
    2.2.4 Spline operators of Hermite type 73
    2.2.5 Spline operators of Birkhoff type 76
    2.2.6 Rational interpolation operators 77
      2.2.6.1 An iterative method to construct a rational interpolation operator 77
      2.2.6.2 Univariate Shepard interpolation 83
        2.2.6.2.1 The univariate Shepard-Lagrange operator 89
        2.2.6.2.2 The univariate Shepard-Hermite operator 94
        2.2.6.2.3 The univariate Shepard-Taylor operator 95
        2.2.6.2.4 The univariate Shepard-Birkhoff operator 96
    2.2.7 Least squares approximation 97
  2.3 Some elements of multivariate interpolation operators 103
    2.3.1 Interpolation on a rectangular domain 103
    2.3.2 Interpolation on a simplex domain 110
    2.3.3 Bivariate Shepard interpolation 112
      2.3.3.1 The bivariate Shepard-Lagrange operator 115
      2.3.3.2 The bivariate Shepard-Hermite operator 120
      2.3.3.3 The bivariate Shepard-Birkhoff operator 126
  2.4 Uniform approximation 128
    2.4.1 The Weierstrass theorems 129
    2.4.2 The Stone-Weierstrass theorem 135
    2.4.3 Positive linear operators 137
      2.4.3.1 Popoviciu-Bohman-Korovkin Theorem 138
      2.4.3.2 Modulus of continuity 139
      2.4.3.3 Popoviciu-Bohman-Korovkin Theorem in a quantitative form 139

3 Numerical integration 145
  3.1 Optimal quadrature formulas 146
    3.1.1 Optimality in the sense of Sard 147
    3.1.2 Optimality in the sense of Nikolski 161
  3.2 Some optimality aspects in numerical integration of bivariate functions 167

4 Numerical linear systems 175
  4.1 Triangular systems of equations 175
    4.1.1 The Blocks version 176
  4.2 Direct methods for solving linear systems of equations 177
    4.2.1 Factorization techniques 177
      4.2.1.1 The Gauss method 178
      4.2.1.2 The Gauss-Jordan method 180
      4.2.1.3 The LU methods 180
      4.2.1.4 The Doolittle algorithm 181
      4.2.1.5 Block LU decomposition 183
      4.2.1.6 The Cholesky decomposition 184
      4.2.1.7 The QR decomposition 186
      4.2.1.8 The WZ decomposition 191
  4.3 Iterative methods 194
    4.3.1 The Jacobi method 196
    4.3.2 The Gauss-Seidel method 196
    4.3.3 The method of relaxation 199
    4.3.4 The Chebyshev method 199
    4.3.5 The method of Steepest Descent 202
    4.3.6 The multigrid method 204

5 Non-linear equations on R 209
  5.1 One-step methods 210
    5.1.1 Successive approximations method 213
    5.1.2 Inverse interpolation method 215
  5.2 Multistep methods 221
  5.3 Combined methods 227

6 Non-linear equations on Rn 231
  6.1 Successive approximation method 231
  6.2 Newton's method 234

7 Parallel calculus 239
  7.1 Introduction 239
  7.2 Parallel methods to solve triangular systems of linear equations 240
    7.2.1 The column-sweep algorithm 240
    7.2.2 The recurrent-product algorithm 242
  7.3 Parallel approaches of direct methods for solving systems of linear equations 243
    7.3.1 The parallel Gauss method 243
    7.3.2 The parallel Gauss-Jordan method 245
    7.3.3 The parallel LU method 246
    7.3.4 The parallel QR method 247
    7.3.5 The parallel WZ method 247
  7.4 Parallel iterative methods 249
    7.4.1 Parallel Jacobi method 249
    7.4.2 Parallel Gauss-Seidel algorithm 251
    7.4.3 The parallel SOR method 253
  7.5 The parallelization of the multigrid method 254
    7.5.1 Telescoping parallelizations 254
    7.5.2 Nontelescoping parallelizations 256
Preface
Numerical Analysis is a branch of Mathematics which deals with the constructive
solution of problems formulated and studied in other fields of Mathematics.
Illustrative examples include the solution of operatorial equations, the
polynomial approximation of real-valued functions and the numerical
integration of functions.
At the same time, Numerical Analysis has a theoretical aspect of its own. When
analyzing a numerical method, one studies the error of the numerical solution
and obtains results concerning convergence and error bounds.
This book presents both theoretical and numerical aspects of basic problems
of Numerical Analysis, with a special emphasis on optimal approximation.
It is addressed, first of all, to undergraduate, master and Ph.D. students
in Mathematics, but also to students of Computer Science and Engineering,
and to all readers interested in Numerical Analysis and Approximation Theory.
The book is structured in 7 chapters.
Chapter 1 is dedicated to some preliminary notions: linear spaces, the Peano
kernel theorems, best approximation and orthogonal polynomials.
Chapter 2 treats the approximation of functions by univariate interpolation
operators, respectively by linear and positive operators. Some elements of
multivariate approximation are also presented.
Chapter 3 provides some applications to numerical integration of functions.
Chapters 4-6 study several methods for solving operatorial equations (linear
systems, nonlinear equations on R, nonlinear equations on Rn). Chapter 7 is a
brief introduction to parallel calculus.
The content of the book is essentially based on the authors' own work and research.
The authors
Chapter 1
Preliminaries
1.1 Linear spaces
Definition 1.1.1. Let K be a field and V a given set. We say that V is a K-linear space (a linear space over K) if there exist an internal operation

"+" : V × V → V, (v1, v2) ↦ v1 + v2,

and an external operation

"·" : K × V → V, (α, v) ↦ αv,

that satisfy the following conditions:

1) (V, +) is a commutative group;
2) a) (α + β)v = αv + βv, ∀α, β ∈ K, ∀v ∈ V,
   b) α(v1 + v2) = αv1 + αv2, ∀α ∈ K, ∀v1, v2 ∈ V,
   c) (αβ)v = α(βv), ∀α, β ∈ K, ∀v ∈ V,
   d) 1 · v = v, ∀v ∈ V.
The elements of V are called vectors and those of K are called scalars.
Definition 1.1.2. Let V be a K-linear space. A subset V1 ⊂ V is called a subspace of V if V1 is stable with respect to the internal and the external operation, i.e.,

∀v1, v2 ∈ V1 =⇒ v1 + v2 ∈ V1,
∀α ∈ K, ∀v ∈ V1 =⇒ αv ∈ V1,
10 Chapter 1. Preliminaries
and V1 is a K-linear space with respect to the induced operations.
Remark 1.1.3. 1) If V1 is a subspace of V then

0 ∈ V1

and

∀v ∈ V1 =⇒ −v ∈ V1;

2) For every linear space V, the subsets {0} and V are subspaces;
3) If V1 and V2 are subspaces of V then

V1 + V2 = {v1 + v2 | v1 ∈ V1, v2 ∈ V2}

is a subspace.
Definition 1.1.4. Let V and V′ be two K-linear spaces. A function f : V → V′ is called a linear transformation or a linear operator if:

1) f(v1 + v2) = f(v1) + f(v2), ∀v1, v2 ∈ V,
2) f(αv) = αf(v), ∀α ∈ K, ∀v ∈ V.
Remark 1.1.5. 1) f : V → V′ is a linear operator if and only if

f(αv1 + βv2) = αf(v1) + βf(v2), ∀α, β ∈ K, ∀v1, v2 ∈ V.

2) If f is a linear operator then

f(0) = 0 and f(−v) = −f(v), ∀v ∈ V.

3) If f : V → V′ is a linear operator then

Ker f = {v ∈ V | f(v) = 0}

is a subspace of V, called the kernel or the null space of f, and

Im f := f(V) = {f(v) | v ∈ V}

is a subspace of V′, called the image of f.
Remark 1.1.6. An R-linear space (K = R) is called a real linear space and a C-linear space (K = C) is called a complex linear space.
Definition 1.1.7. If V is a K-linear space then an application f : V → K is called a functional (a real functional if K = R, respectively a complex functional if K = C).
Definition 1.1.8. A nonnegative real functional p defined on a linear space V, i.e.,

p : V → [0, ∞),

with the properties

1) p(v1 + v2) ≤ p(v1) + p(v2), ∀v1, v2 ∈ V,
2) p(αv) = |α| p(v), ∀α ∈ R, ∀v ∈ V,

is called a seminorm on V. If, in addition,

3) p(v) = 0 =⇒ v = 0,

then p is called a norm and V is called a normed linear space.
A norm is denoted by ‖·‖.

Remark 1.1.9. If V is a normed linear space then we have:

a) The properties:

1) and 3) =⇒ (‖v‖ = 0 ⇔ v = 0),
2) =⇒ |‖v1‖ − ‖v2‖| ≤ ‖v1 − v2‖;

b) d(v1, v2) = ‖v1 − v2‖, for v1, v2 ∈ V, is a distance on V.

c) A sequence (vn)_{n∈N} of elements of V is said to be convergent in the norm of V to an element v ∈ V if

lim_{n→∞} ‖vn − v‖ = 0.

d) A sequence (vn)_{n∈N} is called a Cauchy sequence if for any ε > 0 there exists a natural number N = N_ε such that

‖vm − vn‖ < ε, for m, n > N_ε.
Remark 1.1.10. A normed linear space is a metric space.
Remark 1.1.11. A convergent sequence is a Cauchy sequence.
Definition 1.1.12. In a normed linear space V, if any Cauchy sequence isconvergent, then V is called a complete space.
Definition 1.1.13. A complete normed linear space is called a Banach space.
Definition 1.1.14. Let V be a K-linear space. An application

〈·, ·〉 : V × V → K

with the properties:

1) 〈v1, v2〉 = \overline{〈v2, v1〉} (the bar denoting complex conjugation),
2) 〈v1 + v2, v3〉 = 〈v1, v3〉 + 〈v2, v3〉,
3) 〈αv1, v2〉 = α 〈v1, v2〉, ∀α ∈ K,
4) v ≠ 0 =⇒ 〈v, v〉 > 0,

is called the inner product of v1 and v2.

Remark 1.1.15. The following statements hold:

1) and 4) =⇒ 〈v, v〉 is real and strictly positive for v ≠ 0;

1) and 3) =⇒ 〈0, v〉 = 〈v, 0〉 = 0; in particular, 〈0, 0〉 = 0;

1) and 2) =⇒ 〈v1, v2 + v3〉 = \overline{〈v2 + v3, v1〉} = \overline{〈v2, v1〉} + \overline{〈v3, v1〉} = 〈v1, v2〉 + 〈v1, v3〉;

1) and 3) =⇒ 〈v1, αv2〉 = \overline{α} 〈v1, v2〉.

If K = R then

〈αv1, v2〉 = 〈v1, αv2〉 = α 〈v1, v2〉.

Definition 1.1.16. A linear space with an inner product is called an inner-product space.
Definition 1.1.17. In an inner-product space, the norm defined by

‖v‖ = √〈v, v〉

is called the norm induced by the inner product.

Definition 1.1.18. A normed linear space with the norm induced by an inner product is called a pre-Hilbert space.

In any pre-Hilbert space the following inequality, called the Schwarz inequality, holds with respect to the inner-product norm:

|〈v1, v2〉| ≤ ‖v1‖ · ‖v2‖.

The inner-product norm also satisfies the triangle inequality:

‖v1 + v2‖ ≤ ‖v1‖ + ‖v2‖.
Definition 1.1.19. A complete pre-Hilbert space is called a Hilbert space.
A Banach space with the norm induced by an inner-product is a Hilbertspace.
In any Hilbert space V the following equality, called the parallelogram identity (or parallelogram law), holds:

‖v1 + v2‖² + ‖v1 − v2‖² = 2(‖v1‖² + ‖v2‖²), ∀v1, v2 ∈ V. (1.1.1)

Moreover, a Banach space whose norm satisfies the parallelogram identity is a Hilbert space.
Definition 1.1.20. Two elements v1, v2 in an inner-product space V are orthogonal if

〈v1, v2〉 = 0.

A set V′ ⊂ V is called an orthogonal set if

〈v1, v2〉 = 0, ∀v1, v2 ∈ V′, v1 ≠ v2.

If, in addition,

〈v, v〉 = 1, ∀v ∈ V′,

then V′ is called an orthonormal set.
Definition 1.1.21. Let V be a linear space over R or C. A linear operator P : V → V is called a projector if

P ∘ P = P (or, shortly, P² = P).

For example, the identity operator

I : V → V, I(v) = v,

and the null operator

0 : V → V, 0(v) = 0,

are projectors.
Definition 1.1.22. Let P1, P2 : V → V be projectors. If

P1 ∘ P2 = P2 ∘ P1

then P1 and P2 are called commuting projectors.
Remark 1.1.23. 1) "∘" is also denoted by "·" or is simply omitted, i.e., P1 ∘ P2 is denoted by P1 · P2 or P1P2.

2) If P is a projector then P^C := I − P is a projector. Indeed,

(P^C)² = (I − P)(I − P) = I² − P − P + P² = I − P = P^C.

3) The projectors P and P^C are commuting projectors. We have

PP^C = P(I − P) = P − P² = P − P = 0,
P^C P = (I − P)P = P − P² = P − P = 0,

hence

PP^C = 0 = P^C P.

4) (P^C)^C = P.

5) If P is a projector then Im P^C = Ker P.

6) If P and Q are commuting projectors then PQ is a projector. Indeed,

(PQ)² = PQPQ = PPQQ = P²Q² = PQ.

7) If P, Q are commuting projectors then P^C, Q^C are commuting projectors. We have

P^C Q^C = (I − P)(I − Q) = I² − IQ − PI + PQ = I − Q − P + PQ,
Q^C P^C = (I − Q)(I − P) = I² − IP − QI + PQ = I − P − Q + PQ.

It follows that P^C Q^C = Q^C P^C.

8) If P, Q are commuting projectors then P^C Q^C and (P^C Q^C)^C are projectors, with

(P^C Q^C)^C = P + Q − PQ := P ⊕ Q,

which is the Boolean sum of P and Q. Indeed, we have already seen that

P^C Q^C = I − P − Q + PQ,

so

(P^C Q^C)^C = I − (I − P − Q + PQ) = P + Q − PQ.
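A concrete check of items 6)-8) (the 3×3 matrices and helper routines below are illustrative choices, not from the text): take for P and Q the commuting coordinate projections of R³ onto the x-y and the y-z plane, and verify that (P^C Q^C)^C equals the Boolean sum P + Q − PQ and is itself a projector.

```python
# Commuting projectors of R^3 represented as 3x3 matrices (illustrative example):
# P projects onto the x-y plane, Q onto the y-z plane, and PQ = QP.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(3)] for i in range(3)]

def sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(3)] for i in range(3)]

I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
P = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
Q = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]

Pc, Qc = sub(I, P), sub(I, Q)              # complementary projectors P^C, Q^C
bool_sum = sub(add(P, Q), matmul(P, Q))    # P (+) Q = P + Q - PQ
lhs = sub(I, matmul(Pc, Qc))               # (P^C Q^C)^C = I - P^C Q^C

print(lhs == bool_sum)                         # True
print(matmul(bool_sum, bool_sum) == bool_sum)  # True: P (+) Q is a projector
```

Here P ⊕ Q turns out to be the identity, since the two coordinate planes together span R³.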
Let V be a linear space over R or C and let

L = {P_j | j ∈ J}

be a set of commuting projectors of V, i.e., the P_j : V → V are linear operators with

P_j² = P_j,
P_j P_k = P_k P_j, for j, k ∈ J.
Proposition 1.1.24. The set

L′ = {P_j P_k | j, k ∈ J}

includes L, i.e.,

L ⊆ L′, (1.1.2)

and it consists of commuting projectors.

Proof. For all j ∈ J we have P_j² = P_j, i.e.,

P_j P_j = P_j,

which implies P_j ∈ L′, so (1.1.2) is proved. For any j, k ∈ J denote

P_{jk} := P_j P_k,

so P_{jk} is a projector, being the composition of two commuting projectors. For every j, k, i, m ∈ J we have

P_{jk} P_{im} = P_j P_k P_i P_m = P_j P_i P_k P_m = P_i P_j P_k P_m = P_i P_j P_m P_k = P_i P_m P_j P_k = P_{im} P_{jk},

i.e., the projectors of L′ commute.
Proposition 1.1.25. The set

L″ = {P_{jk} ⊕ P_{im} | j, k, i, m ∈ J}

includes L′, i.e.,

L′ ⊆ L″, (1.1.3)

and it consists of commuting projectors.
Proof. For all j, k ∈ J we have

P_{jk} ⊕ P_{jk} = P_{jk} + P_{jk} − P_{jk} P_{jk} = P_{jk} + P_{jk} − P_{jk}² = P_{jk},

which implies P_{jk} ∈ L″, so (1.1.3) is proved. From the commutativity of the projectors of L′ it follows that the elements of L″ are projectors. For the commutativity of the projectors of L″ we have to prove that if Pi, i = 1, ..., 4, are commuting projectors then

(P1 ⊕ P2)(P3 ⊕ P4) = (P3 ⊕ P4)(P1 ⊕ P2). (1.1.4)

Indeed, we get

(P1 ⊕ P2)(P3 ⊕ P4) = (P1 + P2 − P1P2)(P3 + P4 − P3P4)
= P1P3 + P1P4 − P1P3P4 + P2P3 + P2P4 − P2P3P4 − P1P2P3 − P1P2P4 + P1P2P3P4.

On the other hand,

(P3 ⊕ P4)(P1 ⊕ P2) = (P3 + P4 − P3P4)(P1 + P2 − P1P2)
= P3P1 + P3P2 − P3P1P2 + P4P1 + P4P2 − P4P1P2 − P3P4P1 − P3P4P2 + P3P4P1P2.

Since Pi, i = 1, ..., 4, are commuting projectors, we have

(P3 ⊕ P4)(P1 ⊕ P2) = P1P3 + P1P4 − P1P3P4 + P2P3 + P2P4 − P2P3P4 − P1P2P3 − P1P2P4 + P1P2P3P4,

and (1.1.4) is proved.
Remark 1.1.26. From the definitions of L′ and L′′, it follows that if L isfinite then L′ and L′′ are also finite.
Starting from a set L of commuting projectors, we have built L′ and L″, which are sets of commuting projectors with

L ⊆ L′ ⊆ L″.

Denoting

L_1 = L″

and inductively defining

L_{n+1} = L″_n, (n ≥ 1),

one obtains an increasing sequence of sets of commuting projectors:

L_1 ⊆ L_2 ⊆ ... ⊆ L_n ⊆ L_{n+1} ⊆ .... (1.1.5)

Let us denote

L̄ = ⋃_{n=1}^{∞} L_n. (1.1.6)
Proposition 1.1.27. The set of commuting projectors L̄ is stable with respect to composition and Boolean sum, i.e.,

∀P, Q ∈ L̄ =⇒ PQ ∈ L̄,
∀P, Q ∈ L̄ =⇒ P ⊕ Q ∈ L̄.

Proof. If P, Q ∈ L̄ then, from (1.1.6), it follows that there exist n1, n2 ∈ N∗ such that P ∈ L_{n1}, Q ∈ L_{n2}. Assuming n1 ≤ n2 (the other case is symmetric), from (1.1.5) one obtains

L_{n1} ⊆ L_{n2} =⇒ P, Q ∈ L_{n2} =⇒ PQ ∈ L′_{n2} ⊆ L″_{n2} = L_{n2+1}

and

P ⊕ Q ∈ L″_{n2} = L_{n2+1}.

So PQ = QP and PQ, P ⊕ Q ∈ L̄.

Remark 1.1.28. The set L̄ is the smallest set of commuting projectors that includes L and is stable with respect to the operations of composition and Boolean sum.
Remark 1.1.29. For P1, P2, P3 ∈ L̄, we have

P1P2 = P2P1,
P1 ⊕ P2 = P2 ⊕ P1,
(P1P2)P3 = P1(P2P3),
(P1 ⊕ P2) ⊕ P3 = P1 ⊕ (P2 ⊕ P3),

which are easy to verify.
Proposition 1.1.30. The relation "≤", defined on L̄ by

P1 ≤ P2 ⇔ P1P2 = P1,

is an order relation (reflexive, transitive and antisymmetric).

Proof. Reflexivity: for all P ∈ L̄, PP = P (P is a projector), so P ≤ P.

Transitivity: for all P1, P2, P3 ∈ L̄ we have

P1 ≤ P2, P2 ≤ P3 =⇒ P1P2 = P1, P2P3 = P2 =⇒ P1P2P2P3 = P1P2
=⇒ P1P2P3 = P1P2 =⇒ P1P3 = P1 =⇒ P1 ≤ P3.

Antisymmetry: for all P1, P2 we have

P1 ≤ P2, P2 ≤ P1 =⇒ P1P2 = P1, P2P1 = P2 =⇒ P1 = P2 (since P1P2 = P2P1).
Proposition 1.1.31. (L̄, ≤) is a lattice, and

inf(P1, P2) = P1P2, (1.1.7)
sup(P1, P2) = P1 ⊕ P2, ∀P1, P2 ∈ L̄. (1.1.8)

Proof. We have

P1(P1P2) = P1²P2 = P1P2 =⇒ P1P2 ≤ P1,
P2(P1P2) = P1P2² = P1P2 =⇒ P1P2 ≤ P2,

so P1P2 is a lower bound for P1 and P2. Let P3 ∈ L̄ be another lower bound of P1 and P2. It follows that

P3 ≤ P1, P3 ≤ P2 =⇒ P3P1 = P3, P3P2 = P3 =⇒ (P3P1)(P3P2) = P3²
=⇒ P3²(P1P2) = P3² =⇒ P3(P1P2) = P3 =⇒ P3 ≤ P1P2.

Hence, P1P2 is the greatest lower bound of P1 and P2, which implies that

inf(P1, P2) = P1P2,

and (1.1.7) is proved.
To prove (1.1.8), we consider

P1(P1 ⊕ P2) = P1(P1 + P2 − P1P2) = P1² + P1P2 − P1²P2 = P1 + P1P2 − P1P2 = P1 =⇒ P1 ≤ P1 ⊕ P2.

In the same way it can be proved that

P2 ≤ P1 ⊕ P2.

So P1 ⊕ P2 is an upper bound for P1 and P2. Now, let P3 ∈ L̄ be another upper bound of P1 and P2. We have

P1 ≤ P3 =⇒ P1P3 = P1,
P2 ≤ P3 =⇒ P2P3 = P2,

and, respectively,

(P1 ⊕ P2)P3 = (P1 + P2 − P1P2)P3 = P1P3 + P2P3 − P1P2P3 = P1 + P2 − P1P2 = P1 ⊕ P2.

Therefore,

(P1 ⊕ P2)P3 = P1 ⊕ P2 =⇒ P1 ⊕ P2 ≤ P3,

i.e., P1 ⊕ P2 is the least upper bound of P1 and P2, and (1.1.8) is also proved. In conclusion, (L̄, ≤) is a lattice.
Proposition 1.1.32. The lattice (L̄, ≤) is distributive, i.e.,

P1(P2 ⊕ P3) = (P1P2) ⊕ (P1P3), ∀P1, P2, P3 ∈ L̄. (1.1.9)

Proof. We have

P1(P2 ⊕ P3) = P1(P2 + P3 − P2P3) = P1P2 + P1P3 − P1P2P3,
(P1P2) ⊕ (P1P3) = P1P2 + P1P3 − P1P2P1P3 = P1P2 + P1P3 − P1P2P3,

and the distributivity is proved.
Remark 1.1.33. The relation (1.1.9) represents the distributivity of inf with respect to sup. In any lattice this distributivity is equivalent to the distributivity of sup with respect to inf, so (1.1.9) is equivalent to

P1 ⊕ (P2P3) = (P1 ⊕ P2)(P1 ⊕ P3), ∀P1, P2, P3 ∈ L̄.

Remark 1.1.34. The lattice (L̄, ≤) is generated by L.

Definition 1.1.35. The elements of the set L are called the generators of the lattice (L̄, ≤).
1.2 Examples of function spaces

1. The space C^p[a, b], p ∈ N, of functions f : [a, b] → R continuously differentiable up to order p, inclusively. For p = 0 one gets the space C[a, b] of continuous functions on [a, b].

C^p[a, b] is a linear space over R with respect to the usual addition of functions and multiplication of functions by real numbers. It is also normed, with the norm defined by

‖f‖ = Σ_{k=0}^{p} max_{a≤x≤b} |f^{(k)}(x)|.

Moreover, C^p[a, b] is a complete space, thus it is a Banach space.

2. The space P_m of polynomials of degree at most m. One has P_m ⊂ C^∞[a, b]. We denote by P^n_m the set of polynomials in n variables of global degree at most m, and by P^n_{m_1,...,m_n} the set of polynomials in the n variables x_1, ..., x_n of degree m_k with respect to the variable x_k, for k = 1, ..., n.

3. The space L^p[a, b], p ∈ R, p ≥ 1, of p-Lebesgue integrable functions on [a, b].

Recall that a function f : [a, b] → R is said to be p-Lebesgue integrable on [a, b] if |f|^p is Lebesgue integrable on [a, b]. The class f̃ of a p-Lebesgue integrable function f on [a, b] is the set of all functions g which are p-Lebesgue integrable on [a, b] and equivalent to f (denoted g ∼ f), i.e., g = f almost everywhere on [a, b].

The set of equivalence classes defined above is a linear space with respect to the addition

f̃_1 + f̃_2 = g̃ iff f_1 + f_2 ∼ g,

and the multiplication

λf̃ = h̃ iff λf ∼ h.

Identifying the class f̃ with its representative f ∈ f̃, we define the norm

‖f‖ = ( ∫_a^b |f(x)|^p dx )^{1/p}. (1.2.1)

The space L^p[a, b] is the linear space of the equivalence classes introduced above, and it is normed with the norm given by (1.2.1).
Moreover, L^p[a, b] is a complete space, thus it is a Banach space.

For p = 2, i.e., on L²[a, b], one can define the inner product

〈f, g〉_2 = ∫_a^b f(x) g(x) dx.

It follows that L²[a, b] is a Hilbert space with respect to the norm induced by this inner product, namely

‖f‖_2 = ( ∫_a^b f²(x) dx )^{1/2}.
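A quick numerical illustration of the Schwarz inequality in L²[0, 1] (the pair f(x) = x, g(x) = eˣ and the midpoint-rule integrator are illustrative choices, not from the text):

```python
# Check |<f, g>_2| <= ||f||_2 ||g||_2 in L^2[0, 1] for f(x) = x, g(x) = exp(x).
import math

def midpoint_integral(h, a=0.0, b=1.0, n=100_000):
    # composite midpoint rule, accurate enough for this smooth integrand
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))

f = lambda x: x
g = lambda x: math.exp(x)

inner_fg = midpoint_integral(lambda x: f(x) * g(x))   # exact value is 1 here
norm_f = math.sqrt(midpoint_integral(lambda x: f(x) ** 2))
norm_g = math.sqrt(midpoint_integral(lambda x: g(x) ** 2))

print(abs(inner_fg) <= norm_f * norm_g)   # True
```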
4. Let w be a positive integrable function on [a, b], with

∫_a^b w(x) dx < ∞.

Let L²_w[a, b] be the set of functions f such that

∫_a^b w(x) |f(x)|² dx < ∞.

On L²_w[a, b] one defines the inner product

〈f, g〉_{w,2} = ∫_a^b w(x) f(x) g(x) dx, f, g ∈ L²_w[a, b],

and the induced norm

‖f‖_{w,2} = ( ∫_a^b w(x) |f(x)|² dx )^{1/2}.

(L²_w[a, b], ‖·‖_{w,2}) is a Hilbert space.
5. The space H^m[a, b], m ∈ N∗, of functions f ∈ C^{m−1}[a, b] with f^{(m−1)} absolutely continuous on [a, b].

Notice that every function f ∈ H^m[a, b] can be represented using Taylor's formula with the remainder in integral form:

f(x) = Σ_{k=0}^{m−1} (x−a)^k / k! · f^{(k)}(a) + ∫_a^x (x−t)^{m−1} / (m−1)! · f^{(m)}(t) dt. (1.2.2)

So it immediately follows that for f, g ∈ H^m[a, b] and λ ∈ R one has f + g, λf ∈ H^m[a, b].
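Formula (1.2.2) can be checked numerically; in the sketch below the choices f(x) = eˣ, m = 3, [a, b] = [0, 1] and the midpoint-rule integrator are illustrative assumptions, not from the text.

```python
# Taylor's formula with integral remainder for f = exp on [0, 1], a = 0, m = 3:
#   f(x) = sum_{k=0}^{2} (x-a)^k/k! f^{(k)}(a) + \int_a^x (x-t)^2/2! f'''(t) dt.
import math

def midpoint_integral(g, a, b, n=100_000):
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

a, x, m = 0.0, 1.0, 3
# every derivative of exp at a = 0 equals 1
poly_part = sum((x - a) ** k / math.factorial(k) for k in range(m))
remainder = midpoint_integral(
    lambda t: (x - t) ** (m - 1) / math.factorial(m - 1) * math.exp(t), a, x)

print(poly_part + remainder, math.exp(x))   # both ~ 2.718281828
```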
H^m[a, b] is a linear space.

6. The space H^{m,2}[a, b], m ∈ N∗, of functions f ∈ H^m[a, b] with f^{(m)} ∈ L²[a, b].

From (1.2.2) it follows that a function f ∈ H^{m,2}[a, b] is uniquely determined by the values f^{(k)}(a), k = 0, ..., m−1, and by f^{(m)}. Consequently, one can consider the expression
〈f, g〉_{m,2} = ∫_a^b f^{(m)}(x) g^{(m)}(x) dx + Σ_{k=0}^{m−1} f^{(k)}(a) g^{(k)}(a)

as an inner product, since the four properties of the inner product are verified in this case.

Let ‖·‖_{m,2} be the norm induced by this inner product, namely

‖f‖²_{m,2} = ‖f^{(m)}‖²_2 + Σ_{k=0}^{m−1} (f^{(k)}(a))².

In H^{m,2}[a, b] consider a Cauchy sequence (f_n)_{n∈N}. Since (f_n^{(m)})_{n∈N} is a Cauchy sequence in L²[a, b] and each (f_n^{(k)}(a))_{n∈N}, k = 0, ..., m−1, is a Cauchy sequence in R, it follows that (f_n^{(m)})_{n∈N} converges to a function g ∈ L²[a, b] and every sequence (f_n^{(k)}(a))_{n∈N} converges to a number c_k ∈ R, k = 0, ..., m−1 (L²[a, b] and R being complete spaces). Let

f(x) = Σ_{k=0}^{m−1} c_k (x−a)^k / k! + ∫_a^x (x−t)^{m−1} / (m−1)! · g(t) dt.
Since f ∈ H^m[a, b] and f^{(m)} = g ∈ L²[a, b], it follows that f ∈ H^{m,2}[a, b]. But

lim_{n→∞} ‖f_n − f‖²_{m,2} = lim_{n→∞} ( ‖f_n^{(m)} − f^{(m)}‖²_2 + Σ_{k=0}^{m−1} [f_n^{(k)}(a) − f^{(k)}(a)]² ) = 0,

so

lim_{n→∞} f_n = f

in the ‖·‖_{m,2} norm. It follows that H^{m,2}[a, b] is complete, thus it is a Hilbert space.

7. The Sard-type space B_{pq}(a, c) (p, q ∈ N, p + q = m) of the functions f : D → R, D = [a, b] × [c, d], satisfying:
1. f^{(p,q)} ∈ C(D);
2. f^{(m−j,j)}(·, c) ∈ C[a, b], j < q;
3. f^{(i,m−i)}(a, ·) ∈ C[c, d], i < p.

8. The Sard-type space B^r_{pq}(a, c) (p, q ∈ N, p + q = m, r ∈ R, r ≥ 1) of the functions f : D → R, D = [a, b] × [c, d], satisfying:
1. f^{(i,j)} ∈ C(D), i < p, j < q;
2. f^{(m−j−1,j)}(·, c) is absolutely continuous on [a, b] and f^{(m−j,j)}(·, c) ∈ L^r[a, b], j < q;
3. f^{(i,m−i−1)}(a, ·) is absolutely continuous on [c, d] and f^{(i,m−i)}(a, ·) ∈ L^r[c, d], i < p;
4. f^{(p,q)} ∈ L^r(D).

As in the univariate case, if f ∈ B_{pq}(a, c) then this function admits the Taylor representation:
f(x, y) = Σ_{i+j<m} (x−a)^i / i! · (y−c)^j / j! · f^{(i,j)}(a, c) + (R_m f)(x, y), (1.2.3)

with

(R_m f)(x, y) = Σ_{j<q} (y−c)^j / j! ∫_a^b (x−s)_+^{m−j−1} / (m−j−1)! · f^{(m−j,j)}(s, c) ds
+ Σ_{i<p} (x−a)^i / i! ∫_c^d (y−t)_+^{m−i−1} / (m−i−1)! · f^{(i,m−i)}(a, t) dt
+ ∫∫_D (x−s)_+^{p−1} / (p−1)! · (y−t)_+^{q−1} / (q−1)! · f^{(p,q)}(s, t) ds dt,

where

z_+ = z, when z ≥ 0, and z_+ = 0, when z < 0.
Remark 1.2.1. The Taylor-type formula (1.2.3) holds for any domain Ω having the property that there exists a point (a, c) ∈ Ω such that the rectangle [a, x] × [c, y] ⊆ Ω for all (x, y) ∈ Ω. In this case formula (1.2.3) is written as

f(x, y) = Σ_{i+j<m} (x−a)^i / i! · (y−c)^j / j! · f^{(i,j)}(a, c) + (R_m f)(x, y),

with

(R_m f)(x, y) = Σ_{j<q} (y−c)^j / j! ∫_{I_1} (x−s)_+^{m−j−1} / (m−j−1)! · f^{(m−j,j)}(s, c) ds
+ Σ_{i<p} (x−a)^i / i! ∫_{I_2} (y−t)_+^{m−i−1} / (m−i−1)! · f^{(i,m−i)}(a, t) dt
+ ∫∫_Ω (x−s)_+^{p−1} / (p−1)! · (y−t)_+^{q−1} / (q−1)! · f^{(p,q)}(s, t) ds dt,

where

I_1 = {(x, y) ∈ R² | y = c} ∩ Ω,
I_2 = {(x, y) ∈ R² | x = a} ∩ Ω.

Examples of such domains are the triangle

T_h = {(x, y) ∈ R² | x ≥ 0, y ≥ 0, x + y ≤ h},

with (a, c) = (0, 0), and any disc centered at (a, c).
1.3 The Peano Kernel Theorems
Theorem 1.3.1. Let L : H^m[a, b] → R be a linear functional which commutes with the definite integral operator, for example of the form

L(f) = Σ_{i=0}^{m−1} ∫_a^b f^{(i)}(x) dμ_i(x),

where the μ_i are functions of bounded variation on [a, b].
If Ker L = P_{m−1} then

L(f) = ∫_a^b K_m(t) f^{(m)}(t) dt, (1.3.1)

where

K_m(t) = L_x( (x−t)_+^{m−1} / (m−1)! ).

The notation L_x(f) means that L is applied to f with respect to the variable x.
Proof. Since f ∈ H^m[a, b], one can apply Taylor's formula and get

f = P_{m−1} + R_{m−1},

where P_{m−1} ∈ P_{m−1} and

R_{m−1}(x) = ∫_a^b (x−t)_+^{m−1} / (m−1)! · f^{(m)}(t) dt.

Thus,

L(f) = L(P_{m−1}) + L(R_{m−1}).

Since L(P_{m−1}) = 0 (Ker L = P_{m−1}), we obtain

L(f) = L_x( ∫_a^b (x−t)_+^{m−1} / (m−1)! · f^{(m)}(t) dt ),

and, as L commutes with the definite integral operator, we have

L(f) = ∫_a^b L_x( (x−t)_+^{m−1} / (m−1)! ) f^{(m)}(t) dt.

The function K_m is called the Peano kernel.
Corollary 1.3.2. If the kernel K_m does not change sign on the interval [a, b] and f^{(m)} is continuous on [a, b], then

L(f) = (1/m!) L(e_m) f^{(m)}(ξ), a ≤ ξ ≤ b, (1.3.2)

where e_k(x) = x^k.
Proof. Applying the mean value theorem to (1.3.1), one obtains the representation

L(f) = f^{(m)}(ξ) ∫_a^b K_m(t) dt, (1.3.3)

in which the integral does not depend on f. Taking f = e_m and using e_m^{(m)} = m!, (1.3.3) yields

∫_a^b K_m(t) dt = (1/m!) L(e_m).

Substituting the integral in (1.3.3) by the above expression, one gets (1.3.2).
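As a worked illustration of Theorem 1.3.1 and Corollary 1.3.2 (the functional below, the trapezoidal-rule error on [0, 1], is an illustrative choice, not from the text), take L(f) = ∫_0^1 f(x) dx − (f(0) + f(1))/2. Then Ker L = P_1, so m = 2, and K_2(t) = L_x((x − t)_+) = (1 − t)²/2 − (1 − t)/2 = −t(1 − t)/2 ≤ 0, so the kernel does not change sign.

```python
# Peano kernel check for the trapezoidal-rule error on [0, 1] (m = 2):
#   L(f) = \int_0^1 f - (f(0) + f(1))/2 = \int_0^1 K_2(t) f''(t) dt,
#   K_2(t) = -t(1 - t)/2.

def midpoint_integral(g, a, b, n=100_000):
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def L(f):
    return midpoint_integral(f, 0.0, 1.0) - (f(0.0) + f(1.0)) / 2

K2 = lambda t: -t * (1 - t) / 2

f = lambda x: x ** 3          # test function, with f''(x) = 6x
d2f = lambda x: 6 * x

lhs = L(f)                                                    # 1/4 - 1/2 = -1/4
rhs = midpoint_integral(lambda t: K2(t) * d2f(t), 0.0, 1.0)   # also -1/4

print(lhs, rhs)
```

Since K_2 ≤ 0, Corollary 1.3.2 also applies here: L(f) = (1/2!) L(e_2) f″(ξ) = −f″(ξ)/12 for some ξ ∈ [0, 1], the classical trapezoidal-rule error term.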
Theorem 1.3.3. Let f ∈ B_{pq}(a, c) and let L be the linear functional given by

L(f) = Σ_{i+j<m} ∫∫_D f^{(i,j)}(x, y) dμ_{i,j}(x, y) (1.3.4)
+ Σ_{j<q} ∫_a^b f^{(m−j,j)}(x, c) dμ_{m−j,j}(x)
+ Σ_{i<p} ∫_c^d f^{(i,m−i)}(a, y) dμ_{i,m−i}(y),

where the μ_{i,j} are functions of bounded variation on D, [a, b] and, respectively, [c, d]. If Ker L = P²_{m−1} then

L(f) = Σ_{j<q} ∫_a^b K_{m−j,j}(s) f^{(m−j,j)}(s, c) ds (1.3.5)
+ Σ_{i<p} ∫_c^d K_{i,m−i}(t) f^{(i,m−i)}(a, t) dt
+ ∫∫_D K_{p,q}(s, t) f^{(p,q)}(s, t) ds dt,

where

K_{m−j,j}(s) = L_{x,y}( (x−s)_+^{m−j−1} / (m−j−1)! · (y−c)^j / j! ),
K_{i,m−i}(t) = L_{x,y}( (x−a)^i / i! · (y−t)_+^{m−i−1} / (m−i−1)! ),
K_{p,q}(s, t) = L_{x,y}( (x−s)_+^{p−1} / (p−1)! · (y−t)_+^{q−1} / (q−1)! ).
Proof. Since f ∈ B_{pq}(a, c), we can consider formula (1.2.3). Applying the functional L to each member of this formula and taking into account the hypothesis

L(p) = 0, ∀p ∈ P²_{m−1},

we are led to

L(f) = L(R_m f).

Then, from (1.3.4), we get (1.3.5).
1.4 Best approximation in Hilbert spaces
Definition 1.4.1. Let B be a normed linear space, A ⊂ B and f ∈ B. An element g∗ ∈ A with the property that

‖f − g∗‖ ≤ ‖f − g‖, ∀g ∈ A,

is called a best approximation of f from A.

Next, we are interested in conditions for the existence and uniqueness, and in a characterization, of the best approximation in the case when B is a Hilbert space.

Theorem 1.4.2. Let B be a Hilbert space. If A ⊂ B is a nonempty, convex and closed set then for every f ∈ B there exists a unique best approximation g∗ ∈ A.
Proof. If f ∈ A then g∗ = f and the proof is over. Suppose that f ∉ A. Let

d∗ = d(f, A) = inf_{g∈A} ‖f − g‖,

and let (g_n)_{n∈N} ⊂ A be such that

lim_{n→∞} ‖f − g_n‖ = d∗.

Using the parallelogram identity (1.1.1) with v1 = f − g_n and v2 = f − g_m, one obtains

‖2f − (g_m + g_n)‖² + ‖g_m − g_n‖² = 2(‖f − g_m‖² + ‖f − g_n‖²),

or

‖g_m − g_n‖² = 2(‖f − g_m‖² + ‖f − g_n‖²) − 4 ‖f − (g_m + g_n)/2‖².

Since A is a convex set and g_m, g_n ∈ A, it follows that

(g_m + g_n)/2 ∈ A,

which implies

‖f − (g_m + g_n)/2‖² ≥ (d∗)².

So,

‖g_m − g_n‖² ≤ 2(‖f − g_m‖² + ‖f − g_n‖²) − 4(d∗)².

As

lim_{m→∞} ‖f − g_m‖ = d∗ and lim_{n→∞} ‖f − g_n‖ = d∗,

it follows that (g_n)_{n∈N} is a Cauchy sequence, hence a convergent one in B. Let g∗ = lim_{n→∞} g_n. Since A is closed, g∗ ∈ A, and d∗ = ‖f − g∗‖. Hence g∗ is a best approximation of f and the existence is proved.
Suppose now that there exist two elements g∗, h∗ ∈ A such that

‖f − g∗‖ = d∗ and ‖f − h∗‖ = d∗.

Since A is convex and g∗, h∗ ∈ A, it follows that

(g∗ + h∗)/2 ∈ A,

hence

‖2f − (g∗ + h∗)‖ = 2 ‖f − (g∗ + h∗)/2‖ ≥ 2d∗.

Using again the parallelogram identity, with v1 = f − g∗, v2 = f − h∗, one obtains

‖g∗ − h∗‖² = 2(‖f − g∗‖² + ‖f − h∗‖²) − ‖2f − (g∗ + h∗)‖² ≤ 4(d∗)² − 4(d∗)² = 0.

Therefore ‖g∗ − h∗‖ = 0, i.e., g∗ = h∗.
Theorem 1.4.3. Under the conditions of Theorem 1.4.2, g∗ ∈ A is the best approximation of f ∈ B if and only if

〈g∗ − f, g − g∗〉 ≥ 0, ∀g ∈ A. (1.4.1)

Proof. Suppose that g∗ is the best approximation of f, i.e.,

‖g∗ − f‖ ≤ ‖g − f‖, ∀g ∈ A.

The set A being convex, it follows that for all ε ∈ (0, 1], g∗ + ε(g − g∗) ∈ A, hence

‖g∗ − f‖ ≤ ‖g∗ + ε(g − g∗) − f‖.

It yields

‖g∗ − f‖² ≤ ‖g∗ − f + ε(g − g∗)‖²
= 〈g∗ − f + ε(g − g∗), g∗ − f + ε(g − g∗)〉
= ‖g∗ − f‖² + 2ε 〈g∗ − f, g − g∗〉 + ε² ‖g − g∗‖²,

which implies

2 〈g∗ − f, g − g∗〉 + ε ‖g − g∗‖² ≥ 0.

As this inequality holds for all ε ∈ (0, 1], letting ε → 0 we must have

〈g∗ − f, g − g∗〉 ≥ 0.

Now suppose that (1.4.1) holds. For g ∈ A we have

‖g − f‖² = ‖g∗ − f + (g − g∗)‖² = ‖g∗ − f‖² + ‖g − g∗‖² + 2 〈g∗ − f, g − g∗〉,

or

‖g − f‖² − ‖g∗ − f‖² = ‖g − g∗‖² + 2 〈g∗ − f, g − g∗〉.

As (1.4.1) holds, it follows that

‖g − f‖² − ‖g∗ − f‖² ≥ 0,

i.e.,

‖g − f‖ ≥ ‖g∗ − f‖, ∀g ∈ A,

and the theorem is completely proved.
Remark 1.4.4. Property (1.4.1) is called the property of pseudo-orthogonality.

Corollary 1.4.5. If A is a linear subspace of B and A_t is a translation of A, then g∗ ∈ A_t is the best approximation of f ∈ B if and only if

〈g∗ − f, g〉 = 0, ∀g ∈ A. (1.4.2)

Proof. According to Theorem 1.4.3, g∗ ∈ A_t is the best approximation of f ∈ B if and only if

〈g∗ − f, g − g∗〉 ≥ 0, ∀g ∈ A_t.

As g ranges over A_t, the element h := g − g∗ ranges over the subspace A, so the condition becomes

〈g∗ − f, h〉 ≥ 0, ∀h ∈ A.

Since A is a linear space, h ∈ A implies −h ∈ A, hence also

〈g∗ − f, −h〉 ≥ 0, ∀h ∈ A.

If 〈g∗ − f, h〉 > 0 for some h, then 〈g∗ − f, −h〉 < 0, a contradiction. Therefore 〈g∗ − f, h〉 = 0 for all h ∈ A, which is (1.4.2).
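A minimal sketch of Corollary 1.4.5 in the Hilbert space R³ with the Euclidean inner product (the subspace and the vector f are illustrative choices, not from the text): for a subspace A = span{u1, u2} with u1, u2 orthonormal, the best approximation of f from A is g∗ = 〈f, u1〉u1 + 〈f, u2〉u2, and the residual g∗ − f satisfies the pseudo-orthogonality condition (1.4.2).

```python
# Best approximation from a subspace of R^3 and the condition <g* - f, g> = 0.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

u1, u2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)   # orthonormal basis of A
f = (3.0, -2.0, 5.0)

# g* = <f, u1> u1 + <f, u2> u2 (valid because u1, u2 are orthonormal)
g_star = tuple(dot(f, u1) * a + dot(f, u2) * b for a, b in zip(u1, u2))
residual = tuple(gs - fi for gs, fi in zip(g_star, f))

print(g_star)                                 # (3.0, -2.0, 0.0)
print(dot(residual, u1), dot(residual, u2))   # 0.0 0.0: condition (1.4.2)
```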
1.5 Orthogonal polynomials
Let P ⊂ L²_w[a, b] be a set of orthogonal polynomials on [a, b] with respect to the weight function w, and let P_n ⊂ P be the set of orthogonal polynomials of degree n with the coefficient of x^n equal to 1 (such a polynomial is denoted by p_n).
Theorem 1.5.1. If p_n ⊥ P_{n−1} then p_n has exactly n real and distinct roots in the open interval (a, b).

Proof. The condition p_n ⊥ P_{n−1} is equivalent to

∫_a^b w(x) p_n(x) x^i dx = 0, i = 0, 1, ..., n−1,

hence

∫_a^b w(x) p_n(x) dx = 0.

It follows that p_n has m ≥ 1 roots in (a, b) at which it changes sign. Let x_1, ..., x_m ∈ (a, b) be these roots and

r_m(x) = (x − x_1) ... (x − x_m).

Then p_n r_m does not change sign on [a, b]; say p_n r_m ≥ 0. As p_n r_m is not identically zero, we have

〈p_n, r_m〉_{w,2} = ∫_a^b w(x) p_n(x) r_m(x) dx > 0.

If m < n then r_m ∈ P_{n−1}, and the condition p_n ⊥ P_{n−1} would give 〈p_n, r_m〉_{w,2} = 0, a contradiction; hence m ≥ n. As a polynomial of degree n cannot have more than n roots, it follows that m = n.
Theorem 1.5.2. Let (p_n)_{n∈N} be a sequence of orthogonal polynomials on [a, b], with respect to the weight function w, and
$$p_n(x) = a_n x^n + b_n x^{n-1} + \cdots.$$
Then we have the recurrence relation
$$\frac{a_n}{a_{n+1}}\,p_{n+1}(x) = \left(\frac{b_{n+1}}{a_{n+1}} - \frac{b_n}{a_n} + x\right)p_n(x) - \frac{a_{n-1}}{a_n}\cdot\frac{\|p_n\|_{w,2}^2}{\|p_{n-1}\|_{w,2}^2}\,p_{n-1}(x), \qquad (1.5.1)$$
for n = 1, 2, ...
Proof. We have
$$x\,p_n(x) = \sum_{k=0}^{n+1} c_{n,k}\,p_k(x), \qquad (1.5.2)$$
where
$$c_{n,k} = \int_a^b w(x)\,x\,p_n(x)p_k(x)\,dx \Big/ \int_a^b w(x)p_k^2(x)\,dx,$$
or
$$c_{n,k} = \langle e_1 p_n, p_k\rangle_{w,2} \big/ \|p_k\|_{w,2}^2, \qquad (e_1(x) = x).$$
We also have
$$c_{n,k} = \langle p_n, e_1 p_k\rangle_{w,2} \big/ \|p_k\|_{w,2}^2.$$
As ⟨p_n, q⟩_{w,2} = 0 for every polynomial q of degree less than n, and e_1 p_k has degree k + 1, it follows that c_{n,k} = 0 for k < n − 1 and (1.5.2) becomes
$$x\,p_n(x) = c_{n,n+1}\,p_{n+1}(x) + c_{n,n}\,p_n(x) + c_{n,n-1}\,p_{n-1}(x). \qquad (1.5.3)$$
In (1.5.3), identifying the coefficients of degree n + 1, respectively n, one obtains
$$c_{n,n+1} = \frac{a_n}{a_{n+1}}, \qquad c_{n,n} = \frac{b_n}{a_n} - \frac{b_{n+1}}{a_{n+1}}.$$
In order to get c_{n,n−1}, one notices that
$$c_{n,k} = \langle e_1 p_n, p_k\rangle_{w,2}\big/\|p_k\|_{w,2}^2,$$
respectively,
$$c_{k,n} = \langle e_1 p_k, p_n\rangle_{w,2}\big/\|p_n\|_{w,2}^2.$$
So,
$$c_{n,k}\,\|p_k\|_{w,2}^2 = c_{k,n}\,\|p_n\|_{w,2}^2.$$
It follows that
$$c_{n,n-1} = c_{n-1,n}\,\frac{\|p_n\|_{w,2}^2}{\|p_{n-1}\|_{w,2}^2},$$
respectively,
$$c_{n,n-1} = \frac{a_{n-1}}{a_n}\cdot\frac{\|p_n\|_{w,2}^2}{\|p_{n-1}\|_{w,2}^2},$$
and from (1.5.3) the proof follows.
Remark 1.5.3. For the (monic) polynomials of P, we have
$$p_{n+1}(x) = (x + b_{n+1} - b_n)\,p_n(x) - \frac{\|p_n\|_{w,2}^2}{\|p_{n-1}\|_{w,2}^2}\,p_{n-1}(x). \qquad (1.5.4)$$
Remark 1.5.4. If (p_n)_{n∈N} is a sequence of orthonormal polynomials then (1.5.1) becomes
$$\frac{a_n}{a_{n+1}}\,p_{n+1}(x) = \left(\frac{b_{n+1}}{a_{n+1}} - \frac{b_n}{a_n} + x\right)p_n(x) - \frac{a_{n-1}}{a_n}\,p_{n-1}(x).$$
Theorem 1.5.5. If p_n ⊥ P_{n−1} on [a, b], with respect to the weight function w, then
$$\|p_n\|_{w,2} = \min_{p\in\mathcal{P}_n}\|p\|_{w,2}.$$
Proof. We have P_n = e_n + P_{n−1}, i.e., P_n is a translation of the linear subspace P_{n−1} of the space L²_w[a, b]. In accordance with Corollary 1.4.5, p* ∈ P_n is the best approximation to 0 ∈ L²_w[a, b] if and only if p* ⊥ P_{n−1}. So, p* = p_n.
Next, we give a characterization of the orthogonal polynomials.
The recurrence relation of Theorem 1.5.2 is not always the most conve-nient method for computing the orthogonal polynomials. Some other usefultechniques come from the following characterization theorem.
Theorem 1.5.6. Let w : [a, b] → R be any continuous function. The function φ_{k+1} ∈ C[a, b] satisfies the orthogonality conditions
$$\int_a^b w(x)\varphi_{k+1}(x)p(x)\,dx = 0, \quad p \in \mathcal{P}_k, \qquad (1.5.5)$$
if and only if there exists a (k+1)-times differentiable function u(x), x ∈ [a, b], that satisfies the relations
$$w(x)\varphi_{k+1}(x) = u^{(k+1)}(x), \quad x \in [a, b], \qquad (1.5.6)$$
and
$$u^{(i)}(a) = u^{(i)}(b) = 0, \quad i = 0, 1, \ldots, k. \qquad (1.5.7)$$
Proof. If equations (1.5.6) and (1.5.7) hold, then integration by parts gives the identity
$$\int_a^b w(x)\varphi_{k+1}(x)p(x)\,dx = (-1)^{k+1}\int_a^b u(x)p^{(k+1)}(x)\,dx.$$
Therefore, as p ∈ P_k, the orthogonality condition (1.5.5) holds.

Conversely, when equation (1.5.5) is satisfied, we let u be defined by expression (1.5.6), where the constants of integration are chosen to give the values
$$u^{(i)}(a) = 0, \quad i = 0, 1, \ldots, k.$$
Expression (1.5.6) is substituted into the integral (1.5.5). For each integer j = 0, 1, ..., k, let p = p_j be the polynomial
$$p_j(x) = (b-x)^j, \quad x \in [a, b],$$
and apply integration by parts j + 1 times to the left-hand side of expression (1.5.5). Thus we obtain
$$\left[(-1)^j u^{(k-j)}(x)\,p_j^{(j)}(x)\right]_a^b + (-1)^{j+1}\int_a^b u^{(k-j)}(x)\,p_j^{(j+1)}(x)\,dx = 0.$$
Because p_j^{(j+1)}(x) = 0, it follows that
$$u^{(k-j)}(b) = 0, \quad j = 0, 1, \ldots, k,$$
which completes the proof.

In order to apply this theorem to generate orthogonal polynomials, it is necessary to identify a function u, satisfying the conditions (1.5.7), such that the function φ_{k+1}, defined by (1.5.6), is a polynomial of degree k + 1. There is no automatic method of identification, but in many important cases the required function u is easy to recognize.
Example 1.5.7. The relations (1.5.7) are satisfied by letting u be the function
$$u(x) = (x-a)^{k+1}(x-b)^{k+1}, \quad x \in [a, b],$$
and then it follows that φ_{k+1} ∈ P_{k+1} when the weight function w is a constant.
In other words, the polynomials
$$\varphi_j(x) = \frac{d^j}{dx^j}\left[(x-a)^j(x-b)^j\right], \quad j = 0, 1, 2, \ldots \qquad (1.5.8)$$
satisfy the orthogonality conditions
$$\int_a^b \varphi_i(x)\varphi_j(x)\,dx = 0, \quad i \ne j,\ i, j = 0, 1, 2, \ldots.$$
Definition 1.5.8. The polynomial l_n given by (1.5.8), i.e.,
$$l_n(x) = C\,\frac{d^n}{dx^n}\left[(x^2-1)^n\right],$$
is called the Legendre polynomial of degree n, relative to the interval [−1, 1], and
$$l_n(x) = \frac{n!}{(2n)!}\,\frac{d^n}{dx^n}\left[(x^2-1)^n\right]$$
is the Legendre polynomial with the coefficient of x^n equal to 1.
From (1.5.4) it follows the recurrence formula
$$l_{n+1}(x) = x\,l_n(x) - \frac{n^2}{(2n-1)(2n+1)}\,l_{n-1}(x), \quad n = 1, 2, \ldots,$$
with l_0(x) = 1 and l_1(x) = x.
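The monic Legendre recurrence above is easy to exercise in exact arithmetic. The following sketch (function names are mine; polynomials are coefficient lists [c₀, c₁, ...] over `Fraction`) builds l_n from the recurrence and checks the orthogonality ∫₋₁¹ l_m l_n dx = 0 exactly for one pair:

```python
from fractions import Fraction as F

def mul_x(p):                       # p(x) -> x * p(x)
    return [F(0)] + p

def axpy(a, p, q):                  # a*p + q, padded to equal length
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p)); q = q + [F(0)] * (n - len(q))
    return [a * pi + qi for pi, qi in zip(p, q)]

def legendre(n):                    # monic Legendre l_n via the recurrence
    l0, l1 = [F(1)], [F(0), F(1)]
    if n == 0:
        return l0
    for k in range(1, n):
        c = F(-k * k, (2 * k - 1) * (2 * k + 1))
        l0, l1 = l1, axpy(c, l0, mul_x(l1))
    return l1

def polymul(p, q):
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def integrate(p):                   # exact integral of p over [-1, 1]
    return sum(c * (1 - (-1) ** (i + 1)) * F(1, i + 1) for i, c in enumerate(p))
```

For instance, `legendre(2)` returns x² − 1/3 and `integrate(polymul(legendre(2), legendre(4)))` is exactly 0, as the orthogonality predicts.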
Example 1.5.9. Let α and β be real constants, both greater than −1. The polynomials (φ_i)_{i∈N} that satisfy the orthogonality conditions
$$\int_{-1}^1 (1-x)^\alpha(1+x)^\beta\,\varphi_i(x)\varphi_j(x)\,dx = 0, \quad i \ne j,\ i, j = 0, 1, 2, \ldots,$$
are called the Jacobi polynomials.

In this case we take
$$u(x) = (1-x)^{\alpha+k+1}(1+x)^{\beta+k+1}, \quad x \in [-1, 1].$$
The conditions (1.5.7) are satisfied, and u^{(k+1)} is the weight function w(x) = (1−x)^α(1+x)^β multiplied by a polynomial of degree k + 1, as (1.5.6) requires. It follows that the Jacobi polynomials are defined by the equation
$$\varphi_i(x) = (1-x)^{-\alpha}(1+x)^{-\beta}\,\frac{d^i}{dx^i}\left[(1-x)^{\alpha+i}(1+x)^{\beta+i}\right], \quad i = 0, 1, 2, \ldots,$$
which is called Rodrigues' formula.
Remark 1.5.10. For α = β = 0 the Jacobi polynomials become the Legendrepolynomials.
Remark 1.5.11. Each family of orthogonal polynomials is characterized bythe weight function w and the interval [a, b].
Example 1.5.12. For the weight function w(x) = e^{−x} and the interval [0, +∞), one obtains the Laguerre orthogonal polynomials g_n, n = 0, 1, ..., i.e.,
$$\int_0^\infty e^{-x}g_i(x)g_j(x)\,dx = \begin{cases}0, & i \ne j,\\ (i!)^2, & j = i.\end{cases}$$
If u is the function
$$u(x) = e^{-x}x^n, \quad 0 \le x < \infty,$$
the conditions (1.5.7) are verified and the function φ_{k+1}, defined by equation (1.5.6), belongs to P_{k+1}. Hence, the Laguerre polynomials are
$$g_n(x) = e^x\,\frac{d^n}{dx^n}\left(e^{-x}x^n\right), \quad n = 0, 1, 2, \ldots,$$
with x ∈ [0, ∞). The recurrence relation verified by these (unnormalized) Laguerre polynomials is
$$g_{n+1}(x) = (2n+1-x)\,g_n(x) - n^2\,g_{n-1}(x), \quad n = 1, 2, \ldots,$$
where g_0(x) = 1 and g_1(x) = 1 − x.
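For the unnormalized g_n of the Rodrigues formula above, the three-term recurrence takes the form g_{n+1} = (2n+1−x)g_n − n²g_{n−1}. A sketch verifying this exactly (names are mine): since d/dx(e^{−x}p) = e^{−x}(p′ − p), differentiating e^{−x}xⁿ n times amounts to iterating p ← p′ − p on coefficient lists.

```python
from fractions import Fraction as F

def deriv(p):
    return [F(i) * c for i, c in enumerate(p)][1:] or [F(0)]

def sub(p, q):                      # p - q, padded to equal length
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p)); q = q + [F(0)] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

def laguerre(n):
    # g_n = e^x d^n/dx^n (e^{-x} x^n): iterate q <- q' - q starting from x^n
    q = [F(0)] * n + [F(1)]
    for _ in range(n):
        q = sub(deriv(q), q)
    return q
```

For every tested n, the polynomial `laguerre(n + 1)` coincides coefficient by coefficient with (2n+1−x)·`laguerre(n)` − n²·`laguerre(n − 1)`.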
Example 1.5.13. The Hermite polynomials given by
$$h_n(x) = (-1)^n e^{x^2}\,\frac{d^n}{dx^n}\left(e^{-x^2}\right), \quad n = 0, 1, 2, \ldots$$
fulfill the orthogonality conditions
$$\int_{-\infty}^\infty e^{-x^2}h_m(x)h_n(x)\,dx = \begin{cases}0, & m \ne n,\\ 2^n n!\sqrt{\pi}, & m = n.\end{cases}$$
So h_n, n = 0, 1, ..., are orthogonal polynomials on R, with respect to the weight function w(x) = e^{−x²}. They verify the recurrence relation
$$h_{n+1}(x) = 2x\,h_n(x) - 2n\,h_{n-1}(x),$$
with h_0(x) = 1 and h_1(x) = 2x.
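The same exact-arithmetic device works for the Hermite polynomials (an illustrative sketch; names are mine): d/dx(e^{−x²}q) = e^{−x²}(q′ − 2xq), so the Rodrigues formula reduces to iterating q ← q′ − 2xq, and the recurrence h_{n+1} = 2x h_n − 2n h_{n−1} can be checked coefficient by coefficient.

```python
from fractions import Fraction as F

def deriv(p):
    return [F(i) * c for i, c in enumerate(p)][1:] or [F(0)]

def sub(p, q):                      # p - q, padded to equal length
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p)); q = q + [F(0)] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

def hermite(n):
    # h_n = (-1)^n e^{x^2} d^n/dx^n e^{-x^2}: iterate q <- q' - 2x q
    q = [F(1)]
    for _ in range(n):
        q = sub(deriv(q), [F(0)] + [2 * c for c in q])
    return [((-1) ** n) * c for c in q]
```

For instance, `hermite(2)` returns the coefficients of 4x² − 2, and the recurrence holds exactly for every tested n.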
Example 1.5.14. The polynomial Tn ∈ Pn defined by
Tn(x) = cos(n arccos x), x ∈ [−1, 1],
is called the Chebyshev polynomial of the first kind.
Property 1.5.15. 1) Algebraic form. For x = cos θ we have
$$T_n(\cos\theta) = \cos n\theta = \tfrac{1}{2}\left(e^{in\theta} + e^{-in\theta}\right) = \tfrac{1}{2}\left[(\cos\theta + i\sin\theta)^n + (\cos\theta - i\sin\theta)^n\right].$$
Hence,
$$T_n(x) = \tfrac{1}{2}\left[\left(x + \sqrt{x^2-1}\right)^n + \left(x - \sqrt{x^2-1}\right)^n\right]. \qquad (1.5.9)$$
2) The monic Chebyshev polynomial is $\tilde{T}_n = \frac{1}{2^{n-1}}T_n$.

From the algebraic form (1.5.9), one obtains
$$\lim_{x\to\infty}\frac{T_n(x)}{x^n} = \frac{1}{2}\lim_{x\to\infty}\left[\left(1 + \sqrt{1-\tfrac{1}{x^2}}\right)^n + \left(1 - \sqrt{1-\tfrac{1}{x^2}}\right)^n\right] = 2^{n-1}.$$
So the coefficient of x^n in T_n is 2^{n−1}. It follows that
$$\tilde{T}_n = \frac{1}{2^{n-1}}\,T_n.$$
3) The orthogonality property. Starting with
$$\int_0^\pi \cos m\theta\,\cos n\theta\,d\theta = \begin{cases}0, & m \ne n,\\ \frac{\pi}{2}, & m = n \ne 0,\\ \pi, & m = n = 0,\end{cases}$$
and taking cos θ = x, it follows that
$$\int_{-1}^1 \frac{1}{\sqrt{1-x^2}}\,T_m(x)T_n(x)\,dx = \begin{cases}0, & m \ne n,\\ c \ne 0, & m = n.\end{cases}$$
So (T_n)_{n∈N} are orthogonal polynomials on [−1, 1], with regard to the weight function w(x) = 1/√(1−x²).
4) Theorem 1.5.5 implies that
$$\|\tilde{T}_n\|_{w,2} = \min_{P\in\mathcal{P}_n}\|P\|_{w,2}.$$
5) The roots of the polynomial T_n are
$$x_k = \cos\frac{2k-1}{2n}\pi, \quad k = 1, \ldots, n,$$
which are real, distinct and belong to (−1, 1).

6) The recurrence relation:
$$T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x), \quad x \in [-1, 1],$$
with T_0(x) = 1 and T_1(x) = x. This is provided by the identity
$$\cos[(n+1)\theta] + \cos[(n-1)\theta] = 2\cos\theta\,\cos n\theta.$$
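The recurrence in 6) gives an efficient way to evaluate T_n without any trigonometric function. A minimal sketch (the function name is mine) that cross-checks the recurrence against the defining formula T_n(x) = cos(n arccos x), and against the roots from 5):

```python
import math

def cheb_T(n, x):
    # T_n by the three-term recurrence T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x)
    t0, t1 = 1.0, x
    if n == 0:
        return t0
    for _ in range(n - 1):
        t0, t1 = t1, 2.0 * x * t1 - t0
    return t1
```

On any sample of points in [−1, 1] the two definitions agree to machine precision, and T_5 vanishes at x_k = cos((2k−1)π/10), k = 1, ..., 5.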
7) On the interval [−1, 1] we have
$$\|\tilde{T}_n\|_\infty = \min_{P\in\mathcal{P}_n}\|P\|_\infty.$$
Proof. We have
$$\|\tilde{T}_n\|_\infty = \frac{1}{2^{n-1}}$$
and
$$\tilde{T}_n(x'_k) = \frac{(-1)^k}{2^{n-1}},$$
where
$$x'_k = \cos\frac{k\pi}{n}, \quad k = 0, 1, \ldots, n,$$
are the extremal points of T_n on [−1, 1] (for k = 1, ..., n − 1 they are the roots of the derivative
$$T'_n(x) = \frac{n\sin(n\arccos x)}{\sqrt{1-x^2}}\,).$$
Now, suppose that there exists a polynomial P_n ∈ P_n with ‖P_n‖_∞ < 1/2^{n−1}. It follows that
$$P_n = \tilde{T}_n + P_{n-1}, \quad \text{with } P_{n-1} \in \mathcal{P}_{n-1}.$$
Hence,
$$P_{n-1}(x'_k) = P_n(x'_k) - \frac{(-1)^k}{2^{n-1}}.$$
As ‖P_n‖_∞ < 1/2^{n−1}, for k = 0, 1, ..., n we have
$$P_{n-1}(x'_k) < 0 \text{ for } k \text{ even}, \qquad P_{n-1}(x'_k) > 0 \text{ for } k \text{ odd}.$$
It follows that P_{n−1} changes sign n times in [−1, 1], i.e., it has at least n roots. As the degree of P_{n−1} is at most n − 1, we obtain P_{n−1} = 0, hence P_n = T̃_n, which contradicts ‖P_n‖_∞ < 1/2^{n−1}.
Example 1.5.16. The polynomial Q_n ∈ P_n defined by
$$Q_n(x) = \frac{\sin((n+1)\arccos x)}{\sqrt{1-x^2}}, \quad x \in [-1, 1],$$
is called the Chebyshev polynomial of the second kind.
Remark 1.5.17. The following relation is fulfilled:
$$Q_n(x) = \frac{1}{n+1}\,T'_{n+1}(x), \quad x \in [-1, 1].$$
As the coefficient of x^{n+1} in T_{n+1} is 2^n, the coefficient of x^n in Q_n is 2^n, and it follows that
$$\tilde{Q}_n = \frac{1}{2^n}\,Q_n.$$
We remark also that
$$\int_{-1}^1 \sqrt{1-x^2}\,Q_m(x)Q_n(x)\,dx = \begin{cases}0, & m \ne n,\\ \frac{\pi}{2}, & m = n,\end{cases}$$
i.e., (Q_n)_{n∈N} is an orthogonal sequence on [−1, 1], with respect to the weight function w(x) = √(1−x²). From the identity
$$\sin(n+2)\theta + \sin n\theta = 2\cos\theta\,\sin(n+1)\theta,$$
one obtains the recurrence formula
$$Q_{n+1}(x) = 2x\,Q_n(x) - Q_{n-1}(x), \quad n = 1, 2, \ldots,$$
with Q_0(x) = 1 and Q_1(x) = 2x.
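As with the first kind, the recurrence evaluates Q_n cheaply, and the trigonometric definition Q_n(cos θ) = sin((n+1)θ)/sin θ serves as an independent check. A minimal sketch (function name is mine):

```python
import math

def cheb_U(n, x):
    # Q_n by the recurrence Q_{k+1}(x) = 2x Q_k(x) - Q_{k-1}(x),
    # with Q_0 = 1 and Q_1 = 2x
    q0, q1 = 1.0, 2.0 * x
    if n == 0:
        return q0
    for _ in range(n - 1):
        q0, q1 = q1, 2.0 * x * q1 - q0
    return q1
```

For θ away from multiples of π, the recurrence values and sin((n+1)θ)/sin θ agree to rounding error.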
Also,
$$\|\tilde{Q}_n\|_{w,2} = \min_{P\in\mathcal{P}_n}\|P\|_{w,2}.$$
A very important property of the polynomials Q̃_n is that
$$\|\tilde{Q}_n\|_{w,1} = \min_{P\in\mathcal{P}_n}\|P\|_{w,1}, \quad \text{on } [-1, 1],$$
i.e., the minimum norm property holds also in the norm of L¹_w[−1, 1].
Chapter 2
Approximation of functions
2.1 Preliminaries
Functions are the basic mathematical tools for describing (modeling) many physical processes. Very frequently these functions are not known explicitly and it is necessary to construct approximations of them, based on limited information about the underlying process. As a physical process depends on one or more parameters, the problem which appears is to approximate a function of one or several variables based on some limited information about it.
The most common approach to finding approximations to unknown functions is:

1) choose a reasonable class of functions A in which to look for the approximations;

2) select an appropriate approximation process for assigning a specific function to a specific problem.
The success of this approach depends heavily on the existence of a convenient set A of approximating functions. Such a set A should possess at least the following properties:

1) the functions in A should be relatively smooth;

2) the set A should be large enough so that arbitrary smooth functions can be well approximated by elements of A;

3) the functions in A should be easy to store, manipulate and evaluate on a digital computer.
The study of various classes of approximating functions is the content of approximation theory. The computational aspects of such approximations are a major part of numerical analysis.

The most important classes of approximating functions used in this book are: polynomials, polynomial splines and rational functions.
1) Polynomials (A ⊂ P).

Polynomials have always played a central role in approximation theory and numerical analysis. We note that the space P_m, of polynomials of degree at most m, has the following properties:

1. It is a finite dimensional linear space with a convenient basis (e_i, i = 0, 1, ..., m, with e_i(x) = x^i).
2. Polynomials are smooth functions.
3. The derivatives and primitives of polynomials are also polynomials, whose coefficients can be found algebraically.
4. The number of zeros of a polynomial of degree m cannot exceed m.
5. Various matrices arising in interpolation and approximation by polynomials are usually nonsingular and have strong sign-regularity properties.
6. The sign structure and shape of a polynomial are strongly related to the sign structure of its set of coefficients.
7. Given any continuous function on an interval [a, b], there exists a polynomial which is uniformly close to it.
8. Precise rates of convergence can be given for approximation of smooth functions by polynomials.
9. Polynomials are easy to store, manipulate and evaluate on a computer.
2) Polynomial splines (A ⊂ S_m).

The main drawback of the space P_m for the approximation of functions is that the set P_m is relatively inflexible. Polynomials seem to be all right on sufficiently small intervals, but on larger intervals severe oscillations often appear, particularly for large m. This observation suggests that, in order to achieve a class of approximating functions with greater flexibility, we should work with polynomials of relatively low degree and divide the interval into smaller pieces. This is the class of polynomial splines.

The space S_m of polynomial splines of order m has the following properties:

1. Polynomial spline spaces are finite dimensional linear spaces with very convenient bases.
2. Polynomial splines are relatively smooth functions.
3. The derivatives and primitives of polynomial splines are also polynomial splines.
4. Polynomial splines possess nice zero properties, analogous to those of polynomials.
5. Various matrices arising naturally in the use of splines in approximation theory and numerical analysis have convenient sign properties.
6. The sign structure and shape of a polynomial spline can be related to the sign structure of its coefficients.
7. Every continuous function on an interval [a, b] can be well approximated by polynomial splines of fixed order m, provided that a sufficient number of knots are allowed.
8. Precise rates of convergence can be given for approximation of smooth functions by splines.
9. Low-order splines are very flexible and do not exhibit the oscillations usually associated with polynomials.
10. Polynomial splines are easy to store, manipulate and evaluate on a computer.
3) Rational functions (A = R_{p,q}).

Polynomials are not suitable to approximate, for example, a function f : [a, b] → R for which a line of equation x = a − ε or x = a + ε is an asymptote, or the case when f is an exponential function, and so on. In such cases a rational function can give a better approximation than a polynomial.

In a computational sense, a rational function R is one that can be evaluated as the quotient of two polynomials, i.e.,
$$R(x) = \frac{P(x)}{Q(x)} = \frac{a_0 + a_1x + \cdots + a_mx^m}{b_0 + b_1x + \cdots + b_nx^n},$$
where a_m ≠ 0, b_n ≠ 0. When n = 0 and b_0 ≠ 0, R reduces to a polynomial.

A rational function R is finitely representable by the coefficients (a_0, ..., a_m) and (b_0, ..., b_n), so it is computable by a finite number of arithmetic operations, as in the polynomial or polynomial spline case.
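Concretely, evaluating R(x) costs m + n multiplications plus one division when each of P and Q is evaluated by Horner's scheme. A minimal sketch (names are mine, not from the book):

```python
def horner(c, x):
    # evaluate c[0] + c[1]*x + ... + c[k]*x^k with k multiplications
    r = 0.0
    for a in reversed(c):
        r = r * x + a
    return r

def rational(a, b, x):
    # R(x) = P(x)/Q(x) as a quotient of two Horner evaluations;
    # a, b are the coefficient lists (a_0, ..., a_m) and (b_0, ..., b_n)
    return horner(a, x) / horner(b, x)
```

For example, `rational([1.0, 1.0], [1.0, -1.0, 1.0], 2.0)` evaluates (1 + x)/(1 − x + x²) at x = 2, giving 3/3 = 1.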
Definition 2.1.1. Let B be a normed linear space of real-valued functions defined on Ω ⊂ R^n, A ⊂ B and f ∈ B.

• P : B → A is an approximation operator;
• Pf is the approximation of f by the operator P;
• f = Pf + Rf is the approximation formula generated by P, where R is the corresponding remainder operator;
• if P(Pf) = Pf then the operator P is idempotent;
• if P is linear and idempotent then P is a projector;
• the smallest real number, denoted by ‖P‖, with the property that
$$\|Pf\| \le \|P\|\,\|f\|, \quad \text{for all } f \in B,$$
is the norm of the operator P.
Remark 2.1.2. The norm of an approximation operator is also called the Lebesgue constant of the operator.
Definition 2.1.3. The number r ∈ N for which
$$Pf = f, \quad f \in \mathcal{P}^n_r,$$
and there exists g ∈ P^n_{r+1} such that Pg ≠ g, is called the degree of exactness of the operator P (denoted by dex(P)).

Remark 2.1.4. If P is linear then (Pf = f, f ∈ P^n_r, and there exists g ∈ P^n_{r+1} such that Pg ≠ g) is equivalent to (Pe_{i_1,...,i_n} = e_{i_1,...,i_n} for all i_1 + ... + i_n ≤ r and there exists (μ_1, ..., μ_n) ∈ N^n with μ_1 + ... + μ_n = r + 1 such that Pe_{μ_1,...,μ_n} ≠ e_{μ_1,...,μ_n}), where e_{ν_1,...,ν_n}(x_1, ..., x_n) = x_1^{ν_1}···x_n^{ν_n}.
Suppose that B is a set of real-valued functions defined on Ω ⊂ R^n, A ⊂ B, and let
$$\Lambda := \{\lambda_i \mid \lambda_i : B \to \mathbb{R},\ i = 1, \ldots, N\}$$
be a set of linear functionals. The information on f ∈ B corresponding to the set Λ is denoted by
$$\Lambda(f) := \{\lambda_i(f) \mid \lambda_i \in \Lambda,\ i = 1, \ldots, N\}.$$
Definition 2.1.5. The operator P : B → A for which
λi(Pf) = λi(f), i = 1, ..., N, for f ∈ B,
is called an interpolation operator that interpolates the set Λ. The formula
f = Pf +Rf
is the interpolation formula generated by P, and R is the remainder operator.
Definition 2.1.6. If, for a given Λ, the operator P exists, then Λ is called an admissible set of functionals, and Λ(f) an admissible set of information.
Definition 2.1.7. If p_i, i = 1, ..., N, are the fundamental (cardinal) interpolation elements associated to the interpolation operator P, i.e.,
$$\lambda_k(p_i) = \delta_{ki}, \quad i, k = 1, \ldots, N,$$
then
$$Pf = \sum_{i=1}^N p_i\,\lambda_i(f).$$
Now, let Λ^{ν_i}_i ⊂ Λ be a subset of ν_i (ν_i < N) functionals of Λ, associated to λ_i, i = 1, ..., N, respectively. We have
$$\bigcup_{i=1}^N \Lambda_i^{\nu_i} = \Lambda.$$
Let Q^{ν_i}_i be the operator that interpolates the set Λ^{ν_i}_i.
Definition 2.1.8. The operator PQ defined by
$$PQ = \sum_{i=1}^N p_i\,Q_i^{\nu_i}$$
is called the combined operator of P and Q^{ν_i}_i, i = 1, ..., N.

Remark 2.1.9. If P and Q^{ν_i}_i, i = 1, ..., N, are linear operators then the combined operator PQ is also linear.
Definition 2.1.10. A sequence of subsets Λ^{ν_i}_i ⊂ Λ, i = 1, ..., N, for which the corresponding interpolation operators Q^{ν_i}_i exist for all i = 1, ..., N, is called an admissible sequence.

Theorem 2.1.11. Let P and Q^{ν_i}_i, i = 1, ..., N, be the operators that interpolate the sets Λ and Λ^{ν_i}_i, i = 1, ..., N, respectively. If Λ is an admissible set and Λ^{ν_1}_1, ..., Λ^{ν_N}_N is an admissible sequence then the combined operator PQ exists.

Proof. As Λ is an admissible set and Λ^{ν_1}_1, ..., Λ^{ν_N}_N is an admissible sequence, all the operators P and Q^{ν_i}_i, i = 1, ..., N, exist, so the proof follows.
Theorem 2.1.12. Let P and Q^{ν_i}_i, i = 1, ..., N, be some linear operators. If
$$\operatorname{dex}(P) \ge 0 \quad \left(\textstyle\sum_{i=1}^N p_i = 1\right)$$
and dex(Q^{ν_i}_i) = r_i, i = 1, ..., N, then
$$\operatorname{dex}(PQ) = r_m := \min\{r_1, \ldots, r_N\}.$$
Proof. As dex(Q^{ν_i}_i) = r_i and Q^{ν_i}_i is linear, we have
$$Q_i^{\nu_i} e_{i_1,\ldots,i_n} = e_{i_1,\ldots,i_n}$$
for all (i_1, ..., i_n) ∈ N^n with i_1 + ... + i_n ≤ r_i, and there exists (μ_1, ..., μ_n) ∈ N^n with μ_1 + ... + μ_n = r_i + 1 such that
$$Q_i^{\nu_i} e_{\mu_1,\ldots,\mu_n} \ne e_{\mu_1,\ldots,\mu_n},$$
for i = 1, ..., N, respectively. It follows that
$$Q_i^{\nu_i} e_{i_1,\ldots,i_n} = e_{i_1,\ldots,i_n},$$
with i_1 + ... + i_n ≤ r_m (r_m = min{r_1, ..., r_N}), for all i = 1, ..., N, and there exists an index j, 1 ≤ j ≤ N, for which
$$Q_j^{\nu_j} e_{\mu_1,\ldots,\mu_n} \ne e_{\mu_1,\ldots,\mu_n},$$
with μ_1 + ... + μ_n = r_m + 1. We have
$$PQ\,e_{i_1,\ldots,i_n} = \sum_{k=1}^N p_k\,Q_k^{\nu_k} e_{i_1,\ldots,i_n} = e_{i_1,\ldots,i_n}\sum_{k=1}^N p_k = e_{i_1,\ldots,i_n},$$
for all i_1 + ... + i_n ≤ r_m. From
$$Q_j^{\nu_j} e_{\mu_1,\ldots,\mu_n} \ne e_{\mu_1,\ldots,\mu_n}, \quad \text{when } \mu_1 + \ldots + \mu_n = r_m + 1,$$
it follows that
$$PQ\,e_{\mu_1,\ldots,\mu_n} \ne e_{\mu_1,\ldots,\mu_n}.$$
Finally, the linearity of PQ implies that dex(PQ) = r_m.
2.2 Univariate interpolation operators
Suppose that B is a linear space of univariate real-valued functions defined on the interval [a, b] ⊂ R, A ⊂ B,
$$\Lambda = \{\lambda_i \mid \lambda_i : B \to \mathbb{R},\ i = 1, \ldots, N\}$$
is a set of linear functionals, and P : B → A is the interpolation operator corresponding to Λ, i.e.,
$$\lambda_i(Pf) = \lambda_i(f), \quad i = 1, \ldots, N, \text{ for } f \in B.$$
The basic elements in the definition of an interpolation operator are the sets A and Λ. Each given pair (A, Λ) yields a corresponding interpolation operator.

For A we will consider the sets P, S and R of polynomial functions, natural spline functions and rational functions, respectively.

Supposing that x_i ∈ [a, b], i = 0, ..., m, are the interpolation nodes, for Λ we will consider functionals of the following types:

• Lagrange type (Λ_L):
$$\Lambda_L = \{\lambda_i \mid \lambda_i(f) = f(x_i),\ i = 0, \ldots, m\};$$
• Hermite type (Λ_H):
$$\Lambda_H = \{\lambda_{ij} \mid \lambda_{ij}(f) = f^{(j)}(x_i),\ j = 0, \ldots, r_i;\ i = 0, \ldots, m\},$$
with r_i ∈ N, i = 0, ..., m;

• Birkhoff type (Λ_B):
$$\Lambda_B = \{\lambda_{ij} \mid \lambda_{ij}(f) = f^{(j)}(x_i),\ j \in I_i,\ i = 0, \ldots, m\},$$
with I_i ⊂ {0, 1, ..., r_i}, for r_i ∈ N, i = 0, ..., m.
Hence, for example, the pair (P,ΛL) gives rise to the polynomial interpo-lation operators of Lagrange type, the pair (S,ΛB) generates spline interpo-lation operators of Birkhoff type and so on.
2.2.1 Polynomial interpolation operators
Consider the polynomial interpolation operators of Lagrange (P, Λ_L), Hermite (P, Λ_H) and Birkhoff (P, Λ_B) type, briefly called Lagrange (L_m), Hermite (H_n), respectively Birkhoff (B_k) interpolation operators, where m = |Λ_L| − 1, n = |Λ_H| − 1 and k = |Λ_B| − 1, i.e., the degrees of the corresponding polynomials.
Next we give the interpolation formulas generated by these operators.

⋆ Lagrange interpolation formula:
$$f = L_mf + R_mf, \qquad (2.2.1)$$
with the Lagrange interpolation polynomial given by
$$(L_mf)(x) = \sum_{i=0}^m \frac{u_m(x)}{(x-x_i)\,u'_m(x_i)}\,f(x_i), \qquad (2.2.2)$$
where
$$u_m(x) = \prod_{i=0}^m (x - x_i).$$
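Formula (2.2.2) can be evaluated directly; the factor u_m(x)/[(x − x_i)u′_m(x_i)] is the cardinal polynomial ∏_{j≠i}(x − x_j)/(x_i − x_j), which is the form used in this sketch (the function name is mine):

```python
def lagrange(nodes, values, x):
    # (L_m f)(x) = sum_i l_i(x) f(x_i), with the cardinal polynomials
    # l_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j)
    total = 0.0
    for i, (xi, fi) in enumerate(zip(nodes, values)):
        li = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                li *= (x - xj) / (xi - xj)
        total += li * fi
    return total
```

Since dex(L_m) = m, the operator reproduces any polynomial of degree at most m exactly: with nodes 0, 1, 2 and values of f(x) = x², the evaluation at x = 1.5 returns 2.25.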
For f ∈ C^m[a, b] such that f^{(m+1)} exists on (a, b), we have
$$(R_mf)(x) = \frac{u_m(x)}{(m+1)!}\,f^{(m+1)}(\xi), \quad a < \xi < b, \qquad (2.2.3)$$
while if f ∈ C^{m+1}[a, b] we have
$$(R_mf)(x) = \int_a^b \varphi_m(x, t)\,f^{(m+1)}(t)\,dt, \qquad (2.2.4)$$
where
$$\varphi_m(x, t) = R_m\!\left[\frac{(\cdot - t)_+^m}{m!}\right]$$
is the Peano kernel.
Remark 2.2.1. If f^{(m+1)} is also continuous on [a, b], we have
$$\|R_mf\|_\infty \le \frac{\|u_m\|_\infty}{(m+1)!}\,\|f^{(m+1)}\|_\infty.$$
Minimal norm property: the bound on ‖R_mf‖_∞ takes its minimum value when ‖u_m‖_∞ takes its minimum value, i.e., when u_m ≡ T̃_{m+1}(·; a, b), where T̃_{m+1} is the monic Chebyshev polynomial of the first kind, adapted to [a, b]. This means that the interpolation nodes are the zeros of T̃_{m+1} and
$$\|R_mf\|_\infty \le \frac{(b-a)^{m+1}}{(m+1)!\,2^{2m+1}}\,\|f^{(m+1)}\|_\infty.$$
⋆ Hermite interpolation formula:
$$f = H_nf + R_nf,$$
where
$$(H_nf)(x) = \sum_{i=0}^m\sum_{j=0}^{r_i} h_{ij}(x)\,f^{(j)}(x_i),$$
with
$$h_{ij}(x) = u_i(x)\,\frac{(x-x_i)^j}{j!}\sum_{\nu=0}^{r_i-j}\frac{(x-x_i)^\nu}{\nu!}\left[\frac{1}{u_i(x)}\right]^{(\nu)}_{x=x_i}$$
and
$$u_n(x) = \prod_{i=0}^m (x-x_i)^{r_i+1}, \qquad u_i(x) = \frac{u_n(x)}{(x-x_i)^{r_i+1}}.$$
We present two representations of the remainder:

- for f ∈ C^n[a, b] such that f^{(n+1)} exists on (a, b), we have
$$(R_nf)(x) = \frac{u_n(x)}{(n+1)!}\,f^{(n+1)}(\xi), \quad a < \xi < b;$$
- for f ∈ C^{n+1}[a, b] we have
$$(R_nf)(x) = \int_a^b \varphi_n(x, t)\,f^{(n+1)}(t)\,dt,$$
where φ_n is the corresponding Peano kernel.

⋆ Birkhoff interpolation formula:
$$f = B_kf + R_kf,$$
with
$$(B_kf)(x) = \sum_{i=0}^m\sum_{j\in I_i} b_{ij}(x)\,f^{(j)}(x_i), \qquad (2.2.5)$$
where the b_{ij} are obtained from the cardinality conditions
$$b_{ij}^{(p)}(x_\nu) = 0, \quad \nu \ne i,\ p \in I_\nu, \qquad b_{ij}^{(p)}(x_i) = \delta_{jp}, \quad p \in I_i, \qquad (2.2.6)$$
for all j ∈ I_i and i = 0, ..., m. For f ∈ C^{k+1}[a, b] we have
$$(R_kf)(x) = \int_a^b \varphi_k(x, t)\,f^{(k+1)}(t)\,dt,$$
with
$$\varphi_k(x, t) = R_k\!\left[\frac{(\cdot - t)_+^k}{k!}\right].$$
Particular cases of Birkhoff interpolation.
• Abel-Goncharov interpolation formula:
$$f = P_nf + R_nf,$$
with
$$(P_nf)(x) = \sum_{k=0}^n g_k(x)\,f^{(k)}(x_k),$$
where g_k, k = 0, ..., n, are the Goncharov polynomials of degree k, having the expressions
$$g_0(x) = 1, \qquad g_1(x) = x - x_0, \qquad (2.2.7)$$
$$g_k(x) = \int_{x_0}^x dt_1\int_{x_1}^{t_1} dt_2\cdots\int_{x_{k-1}}^{t_{k-1}} dt_k = \frac{1}{k!}\left[x^k - \sum_{j=0}^{k-1} g_j(x)\binom{k}{j}x_j^{k-j}\right], \quad k = 2, \ldots, n.$$
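The closed-form recurrence in (2.2.7) is straightforward to implement in exact arithmetic, and the cardinality relations g_k^{(i)}(x_i) = δ_{ik}, which underlie the Abel-Goncharov interpolation conditions, can then be verified directly. A sketch (names are mine; polynomials are coefficient lists over `Fraction`):

```python
from fractions import Fraction as F
from math import comb, factorial

def goncharov(nodes):
    # g_k(x) = (1/k!)[x^k - sum_{j<k} g_j(x) C(k,j) x_j^{k-j}]
    # for integer or rational nodes x_0, ..., x_n
    gs = [[F(1)]]
    for k in range(1, len(nodes)):
        p = [F(0)] * k + [F(1)]                     # x^k
        for j in range(k):
            c = F(comb(k, j)) * F(nodes[j]) ** (k - j)
            for i, gc in enumerate(gs[j]):
                p[i] -= c * gc
        gs.append([coef / factorial(k) for coef in p])
    return gs

def deriv(p):
    return [F(i) * c for i, c in enumerate(p)][1:] or [F(0)]

def evalp(p, x):
    return sum(c * F(x) ** i for i, c in enumerate(p))
```

With nodes 0, 1, 2 one gets g_1 = x and g_2 = x²/2 − x, and the i-th derivative of g_k at x_i equals 1 exactly when i = k.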
For n ∈ N, a, b ∈ R, a < b, and a function f : [a, b] → R having the derivatives f^{(i)}, i = 1, 2, ..., n, the Abel-Goncharov interpolation problem consists in finding a polynomial P_nf of degree n such that
$$(P_nf)^{(i)}(x_i) = f^{(i)}(x_i), \quad 0 \le i \le n. \qquad (2.2.8)$$
If f ∈ H^{n+1}[a, b] then
$$(R_nf)(x) = \int_a^b \varphi_n(x, s)\,f^{(n+1)}(s)\,ds,$$
with
$$\varphi_n(x, s) = \frac{(x-s)_+^n}{n!} - \sum_{k=0}^n g_k(x)\,\frac{(x_k-s)_+^{n-k}}{(n-k)!}. \qquad (2.2.9)$$
The determinant of the corresponding interpolation system is always nonzero, so the problem (2.2.8) always has a unique solution.

• Lidstone interpolation formula:
$$f = L_m^\Delta f + R_m^\Delta f, \qquad (2.2.10)$$
with
$$(L_m^\Delta f)(x) = \sum_{i=0}^{N+1}\sum_{j=0}^{m-1} r_{m,i,j}(x)\,f^{(2j)}(x_i),$$
where r_{m,i,j}, 0 ≤ i ≤ N + 1, 0 ≤ j ≤ m − 1, are the basis elements of the set L_m(∆) := {h ∈ C[a, b] : h is a polynomial of degree at most 2m − 1 in each subinterval [x_i, x_{i+1}], 0 ≤ i ≤ N}, satisfying
$$D^{2\nu}r_{m,i,j}(x_\mu) = \delta_{i\mu}\delta_{\nu j}, \quad 0 \le \mu \le N+1,\ 0 \le \nu \le m-1, \qquad (2.2.11)$$
and
$$r_{m,i,j}(x) = \begin{cases}\Lambda_j\!\left(\frac{x_{i+1}-x}{h}\right)h^{2j}, & x_i \le x \le x_{i+1},\ 0 \le i \le N,\\[4pt] \Lambda_j\!\left(\frac{x-x_{i-1}}{h}\right)h^{2j}, & x_{i-1} \le x \le x_i,\ 1 \le i \le N+1,\\[4pt] 0, & \text{otherwise},\end{cases}$$
with ∆ denoting a fixed partition of [a, b] with step h.

The Lidstone polynomial Λ_n, of degree 2n + 1, n ∈ N, is defined on the interval [0, 1] by:
$$\Lambda_0(x) = x, \qquad (2.2.12)$$
$$\Lambda_n''(x) = \Lambda_{n-1}(x),$$
$$\Lambda_n(0) = \Lambda_n(1) = 0, \quad n \ge 1.$$
For f ∈ C^{2m−2}[a, b], the Lidstone interpolation polynomial uniquely exists and satisfies the interpolation conditions
$$(L_m^\Delta f)^{(2k)}(x_i) = f^{(2k)}(x_i), \quad 0 \le k \le m-1,\ 0 \le i \le N+1. \qquad (2.2.13)$$
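The defining system (2.2.12) determines each Λ_n by two antiderivatives plus a linear correction enforcing Λ_n(0) = Λ_n(1) = 0. A sketch in exact arithmetic (names are mine):

```python
from fractions import Fraction as F

def deriv(p):
    return [F(i) * c for i, c in enumerate(p)][1:] or [F(0)]

def lidstone(n):
    # Lambda_0(x) = x; Lambda_n'' = Lambda_{n-1}, Lambda_n(0) = Lambda_n(1) = 0
    lam = [F(0), F(1)]                              # Lambda_0
    for _ in range(n):
        once  = [F(0)] + [c / (i + 1) for i, c in enumerate(lam)]
        twice = [F(0)] + [c / (i + 1) for i, c in enumerate(once)]
        # the constant term is already 0; subtract Lambda_n(1)*x so the
        # value at x = 1 (the sum of the coefficients) also vanishes
        twice[1] -= sum(twice)
        lam = twice
    return lam
```

For example, `lidstone(1)` returns the coefficients of (x³ − x)/6, whose second derivative is Λ_0(x) = x, and each Λ_n has degree 2n + 1.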
The remainder is given by
$$(R_m^\Delta f)(x) = \int_a^b g_m(x, s)\,f^{(2m)}(s)\,ds, \qquad (2.2.14)$$
where g_m(x, s) is the corresponding Peano kernel.

Common properties:

1. Evidently, all the operators L_m, H_n, B_k, P_n and L_m^∆ are interpolation operators.

2. Each operator L_m, H_n, B_k, P_n and L_m^∆ has degree of exactness equal to the degree of the corresponding interpolation polynomial, i.e.,
$$\operatorname{dex}(L_m) = m,\ \operatorname{dex}(H_n) = n,\ \operatorname{dex}(B_k) = k,\ \operatorname{dex}(P_n) = n \text{ and } \operatorname{dex}(L_m^\Delta) = 2m-1.$$
3. All the operators L_m, H_n, B_k, P_n and L_m^∆, and their corresponding remainder operators, are projectors.

Existence and uniqueness conditions:

1. If x_i ≠ x_j for i ≠ j, i, j = 0, ..., m, the operators L_m and H_n exist and are unique.
2. A necessary condition for the existence of the Birkhoff operator B_k is given by the Polya condition. Let
$$E = (e_{ij})_{i=0,\ldots,m;\ j\in I_i}$$
be the (m + 1) × (k + 1) matrix associated to Λ_B, in which
$$e_{ij} = \begin{cases}1, & \text{if } j \in I_i,\\ 0, & \text{if } j \notin I_i,\end{cases}$$
for all i = 0, ..., m. One notices that the number of 1's in E is exactly k + 1. The matrix E is called the incidence matrix corresponding to the set Λ_B.

In particular, for Λ_L the (m + 1) × (m + 1) incidence matrix is
$$E_L = \begin{pmatrix}1 & 0 & \cdots & 0\\ 1 & 0 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 1 & 0 & \cdots & 0\end{pmatrix}.$$
Considering Hermite interpolation, the i-th row of the (m + 1) × (n + 1) incidence matrix E_H is
$$(\underbrace{1, \ldots, 1,}_{r_i+1}\ \underbrace{0, \ldots, 0}_{n-r_i}), \quad \text{for } i = 0, 1, \ldots, m.$$
An incidence matrix E satisfies the Polya condition if
$$\sum_{j=0}^r\sum_{i=0}^m e_{ij} \ge r + 1, \quad \text{for all } r = 0, \ldots, k.$$
It can be seen that the incidence matrices E_L and E_H satisfy the Polya condition.
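The Polya condition is a simple column-count check: for each r, the first r + 1 columns of E must together contain at least r + 1 ones. A minimal sketch (the function name is mine):

```python
def polya(E):
    # E: incidence matrix as a list of 0/1 rows; the total number of ones is k+1
    k = sum(map(sum, E)) - 1
    for r in range(k + 1):
        ones = sum(row[j] for row in E for j in range(min(r + 1, len(row))))
        if ones < r + 1:
            return False
    return True
```

The Lagrange matrix E_L (a single column of ones) passes, as does the matrix E_B of Example 2.2.2 below; a matrix concentrating its ones in late columns fails.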
Next consider an example of Birkhoff interpolation.
Example 2.2.2. For f ∈ C³[a, b] one considers the set of information
$$\Lambda_B(f) = \{f'(0),\ f(\tfrac{h}{2}),\ f'(h)\}.$$
The incidence matrix is
$$E_B = \begin{pmatrix}0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1\end{pmatrix},$$
which verifies the Polya condition. Therefore, we look for the interpolation polynomial as a polynomial of degree 2 (k = 2):
$$(B_2f)(x) = Ax^2 + Bx + C, \qquad (2.2.15)$$
that interpolates the information Λ_B(f). One obtains
$$(B_2f)'(0) := B = f'(0),$$
$$(B_2f)(\tfrac{h}{2}) := \tfrac{h^2}{4}A + \tfrac{h}{2}B + C = f(\tfrac{h}{2}),$$
$$(B_2f)'(h) := 2hA + B = f'(h). \qquad (2.2.16)$$
The determinant of this system is D = 2h ≠ 0, hence B_2f exists and is unique. There are two ways to find it.

1) Solving the system (2.2.16) directly. One obtains
$$A = \frac{f'(h)-f'(0)}{2h}, \qquad B = f'(0), \qquad C = f\!\left(\tfrac{h}{2}\right) - \tfrac{h}{8}f'(h) - \tfrac{3h}{8}f'(0),$$
so
$$(B_2f)(x) = \frac{f'(h)-f'(0)}{2h}\,x^2 + f'(0)\,x + f\!\left(\tfrac{h}{2}\right) - \tfrac{h}{8}f'(h) - \tfrac{3h}{8}f'(0),$$
or
$$(B_2f)(x) = \frac{(2x-h)(3h-2x)}{8h}\,f'(0) + f\!\left(\tfrac{h}{2}\right) + \frac{4x^2-h^2}{8h}\,f'(h).$$
2) Using the general form (2.2.5):
$$(B_2f)(x) = b_{01}(x)f'(0) + b_{10}(x)f\!\left(\tfrac{h}{2}\right) + b_{21}(x)f'(h)$$
and finding the cardinal polynomials b_{01}, b_{10} and b_{21}. Each of these polynomials is of degree 2 (as in (2.2.15)) and they must satisfy the cardinality conditions (2.2.6), i.e.:
$$b'_{01}(0) = 1, \quad b_{01}(\tfrac{h}{2}) = 0, \quad b'_{01}(h) = 0;$$
$$b'_{10}(0) = 0, \quad b_{10}(\tfrac{h}{2}) = 1, \quad b'_{10}(h) = 0;$$
$$b'_{21}(0) = 0, \quad b_{21}(\tfrac{h}{2}) = 0, \quad b'_{21}(h) = 1.$$
Solving these systems, we get
$$b_{01}(x) = \frac{(2x-h)(3h-2x)}{8h}, \qquad b_{10}(x) = 1, \qquad b_{21}(x) = \frac{4x^2-h^2}{8h}.$$
Now, regarding the remainder term of the interpolation formula
$$f = B_2f + R_2f,$$
we have
$$(R_2f)(x) = \int_0^h \varphi_2(x, t)\,f^{(3)}(t)\,dt,$$
where
$$\varphi_2(x, t) = \frac{1}{2}\left[(x-t)_+^2 - \left(\tfrac{h}{2}-t\right)_+^2 - \frac{4x^2-h^2}{4h}(h-t)\right].$$
By a straightforward computation one obtains that
$$\varphi_2(x, t) \ge 0, \quad x \in [0, \tfrac{h}{2}],$$
and
$$\varphi_2(x, t) \le 0, \quad x \in [\tfrac{h}{2}, h],$$
for t ∈ [0, h]. As, for a given x ∈ [0, h], φ_2(x, t) does not change sign for t ∈ [0, h], and f^{(3)} is continuous on [0, h], using the mean value theorem one obtains
$$(R_2f)(x) = f^{(3)}(\xi)\int_0^h \varphi_2(x, t)\,dt, \quad 0 \le \xi \le h.$$
Finally, we have
$$(R_2f)(x) = \frac{(2x-h)(2x^2-2hx-h^2)}{24}\,f^{(3)}(\xi), \quad 0 \le \xi \le h,$$
respectively,
$$\|R_2f\|_\infty \le \frac{h^3}{24}\,\|f^{(3)}\|_\infty.$$
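The quadratic B₂f of Example 2.2.2 is easy to check numerically (a sketch; choosing f = sin and h = 1 is an arbitrary illustration, and the helper names are mine): it reproduces the three data f′(0), f(h/2), f′(h), and its error on [0, h] stays within the bound h³/24 · ‖f‴‖∞ ≤ h³/24 for f = sin.

```python
import math

def birkhoff_b2(fp0, fmid, fph, h):
    # coefficients A, B, C of (B_2 f)(x) = A x^2 + B x + C,
    # solved from system (2.2.16) as in the example
    A = (fph - fp0) / (2.0 * h)
    B = fp0
    C = fmid - h / 8.0 * fph - 3.0 * h / 8.0 * fp0
    return A, B, C

h = 1.0
A, B, C = birkhoff_b2(math.cos(0.0), math.sin(h / 2), math.cos(h), h)
p  = lambda x: A * x * x + B * x + C        # B_2 f
dp = lambda x: 2.0 * A * x + B              # (B_2 f)'
err = max(abs(math.sin(x) - p(x)) for x in [i * h / 100 for i in range(101)])
```

The maximum sampled error `err` is below h³/24 ≈ 0.0417, in agreement with the remainder estimate.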
2.2.2 Polynomial spline interpolation operators
Let Λ = {λ_i | λ_i : H^{m,2}[a, b] → R, i = 1, ..., n} be a set of linear functionals, y ∈ R^n and
$$U(y) = \left\{f \in H^{m,2}[a, b] \mid \lambda_i(f) = y_i,\ i = 1, \ldots, n\right\}. \qquad (2.2.17)$$
Definition 2.2.3. The problem of finding the elements s ∈ U(y) with the property that
$$\|s^{(m)}\|_2 = \inf_{u\in U(y)}\|u^{(m)}\|_2$$
is called a Polynomial Spline Interpolation Problem (denoted PSIP).
Remark 2.2.4. A solution s of a (PSIP) is the smoothest function (‖s^{(m)}‖₂ → min) or, in other words, the "closest" function to a polynomial P ∈ P_{m−1} (for which ‖P^{(m)}‖₂ = 0) that also satisfies the conditions λ_i(s) = y_i, i = 1, ..., n.

A solution of a (PSIP) is called a polynomial spline function that interpolates y with respect to Λ, or a natural spline that interpolates y.
Next, we look for existence and uniqueness conditions, basic properties and a structural characterization of the solutions of a (PSIP). For these we need the results given in the following lemmas.
Lemma 2.2.5. Let e_i, i = 0, ..., m − 1, be a basis of P_{m−1} and let k ∈ N, 0 ≤ k ≤ m − 1. There exists a linear functional λ_k, defined on H^{m,2}[a, b], such that
$$\lambda_k(e_i) = \delta_{ki}, \quad i = 0, \ldots, m-1.$$
Proof. Considering f ∈ H^{m,2}[a, b], it is known that it can be represented uniquely in the form
$$f(x) = p_f(x) + \int_a^x \frac{(x-t)^{m-1}}{(m-1)!}\,f^{(m)}(t)\,dt,$$
with p_f ∈ P_{m−1}. Denote by
$$p_f = \sum_{i=0}^{m-1}\alpha_i(f)\,e_i$$
the representation of p_f in terms of e_i, i = 0, ..., m − 1. Then the functional λ_k defined by λ_k(f) = α_k(f) has the desired property.
Lemma 2.2.6. Let f ∈ L^p[c, d], 1 < p < ∞. If
$$\int_c^d f(x)\varphi(x)\,dx = 0, \quad \varphi \in C^\infty(c, d),$$
then f vanishes almost everywhere on (c, d), which is denoted by f ≡ 0 a.e. on (c, d).
Proof. Assume f > 0 on a set I ⊂ (c, d) of positive measure. Denoting by C_I the characteristic function of the set I, we have f^{p−1}C_I ∈ L^{p′}(c, d), where 1/p + 1/p′ = 1. Indeed,
$$\int_c^d\left[(f(x))^{p-1}C_I(x)\right]^{p'}dx = \int_I\left[(f(x))^{p-1}\right]^{p'}dx = \int_I (f(x))^p\,dx.$$
Note that the latter integral is well defined, since f ∈ L^p[c, d]. Taking into account that C^∞[c, d] is dense in L^{p′}[c, d], i.e., for every g ∈ L^{p′}[c, d] there exists a sequence (φ_k)_{k∈N} ⊂ C^∞[c, d] such that g = lim_{k→∞} φ_k, one obtains
$$\int_I (f(x))^p\,dx = \int_c^d f(x)(f(x))^{p-1}C_I(x)\,dx = \lim_{k\to\infty}\int_c^d f(x)\varphi_k(x)\,dx = 0,$$
which contradicts the assumption f(x) > 0, x ∈ I.
Lemma 2.2.7. Let f ∈ L^p[c, d], 1 < p < ∞. If
$$\int_c^d f(x)\varphi^{(m)}(x)\,dx = 0, \quad \varphi \in C^\infty_{m-1}[c, d], \qquad (2.2.18)$$
then there exists a polynomial p_f ∈ P_{m−1} such that f = p_f a.e. on (c, d), where
$$C^\infty_{m-1}[c, d] = \left\{f \in C^\infty[c, d] \mid f^{(j)}(c) = f^{(j)}(d) = 0,\ j = 0, \ldots, m-1\right\}.$$
Proof. We construct p_f ∈ P_{m−1} such that
$$\langle f - p_f, e_i\rangle_2 = 0, \quad i = 0, \ldots, m-1.$$
Taking into account Lemma 2.2.6, it is sufficient to prove that g = f − p_f satisfies the condition
$$\langle g, \psi\rangle_2 = 0, \quad \psi \in C^\infty[c, d].$$
Let ψ ∈ C^∞[c, d] and construct the polynomial p_ψ ∈ P_{m−1} such that
$$\langle \psi - p_\psi, e_i\rangle_2 = 0, \quad i = 0, \ldots, m-1.$$
Let
$$\varphi(x) = \int_c^x \frac{(x-t)^{m-1}}{(m-1)!}\,(\psi - p_\psi)(t)\,dt.$$
We have
$$\varphi^{(j)}(c) = 0, \quad j = 0, \ldots, m-1,$$
and from the orthogonality condition ψ − p_ψ ⊥ P_{m−1} we also get
$$\varphi^{(j)}(d) = 0, \quad j = 0, \ldots, m-1.$$
Thus it follows that φ ∈ C^∞_{m−1}[c, d]. Using f − p_f ⊥ P_{m−1} and ψ − p_ψ ⊥ P_{m−1}, one obtains
$$\langle f - p_f, \psi\rangle_2 = \langle f - p_f, \psi - p_\psi\rangle_2 = \langle f, \psi - p_\psi\rangle_2 = \left\langle f, \varphi^{(m)}\right\rangle_2 = 0,$$
according to (2.2.18). Hence f − p_f = 0 a.e. on (c, d), which implies f = p_f a.e. on (c, d).
Remark 2.2.8. The problem
$$(PSIP)\quad \text{find } s \in U(y) \text{ such that } \|s^{(m)}\|_2 = \inf_{u\in U(y)}\|u^{(m)}\|_2$$
is equivalent to the best approximation problem in L²[a, b], i.e.,
$$(BAP)\quad \text{find } \sigma \in U^{(m)}(y) \text{ such that } \|\sigma\|_2 = \inf_{v\in U^{(m)}(y)}\|v\|_2,$$
where
$$U^{(m)}(y) = \left\{v \mid v = u^{(m)},\ u \in U(y)\right\}.$$
Theorem 2.2.9. (Existence theorem). If the functionals λ_i ∈ Λ, i = 1, ..., n, are bounded and U(y) given in (2.2.17) is a non-empty set, then the (PSIP) has a solution.

Proof. Using the equivalence from Remark 2.2.8 and taking into account that the (BAP) has a unique solution if U^{(m)}(y) ⊂ L²[a, b] is: 1) non-empty, 2) convex and 3) closed, we have to check that U^{(m)}(y) verifies these three conditions.

1) Since U(y) was assumed to be non-empty, U^{(m)}(y) is also non-empty.

2) The functionals λ_i, i = 1, ..., n, are linear. Thus, for u_1, u_2 ∈ U(y) and α ∈ [0, 1], we have
$$\lambda_i(\alpha u_1 + (1-\alpha)u_2) = \alpha\lambda_i(u_1) + (1-\alpha)\lambda_i(u_2) = \alpha y_i + (1-\alpha)y_i = y_i.$$
So αu_1 + (1 − α)u_2 ∈ U(y), which implies that U(y) is convex. Since U(y) is convex and D^m is linear, it follows that U^{(m)}(y) is also a convex set.
3) Let (gk)k∈N be a sequence in U (m)(y) that converges to a functiong ∈ L2[a, b]. We have to prove that g ∈ U (m)(y). We show that there exists apolynomial p ∈ Pm−1 such that the function
f = p + h,
with
h(x) =
∫ x
a
(x−t)m−1
(m−1)!g(t)dt,
belongs to U(y). In this case
f (m) = g ∈ U (m)(y).
Since λ_i, i = 1, ..., n, are bounded, U(y) is a closed set. Thus, it is sufficient to show that f is the limit of a sequence of functions from U(y), in the norm ‖·‖_{m,2} of H^{m,2}[a, b]. Assume that f_k ∈ U(y) are such that

g_k = f_k^{(m)},  k ∈ N.

It follows that

f_k = p_k + h_k,
2.2. Univariate interpolation operators 59
with p_k ∈ P_{m−1} and

h_k(x) = ∫_a^x (x−t)^{m−1}/(m−1)! · g_k(t) dt.
The sequence (h_k)_{k∈N} converges to h. In order to find the limit of the sequence (p_k)_{k∈N}, we first assume that there exist m linearly independent functionals λ_i ∈ Λ, i = 1, ..., m, on P_{m−1}. In particular, we assume that λ_i are such that the matrix

A = (λ_i(v_j))_{i,j=1,...,m},  with v_j(x) = (x − a)^{j−1},

is non-singular. Then,

(p_k(a), p_k′(a), ..., p_k^{(m−1)}(a))^T = A^{−1}(λ_1(p_k), ..., λ_m(p_k))^T.
Since λ_i(f_k) = y_i are bounded and (λ_i(h_k))_{k∈N} converges to λ_i(h), i = 1, ..., n, it follows that λ_i(p_k), i = 1, ..., m, are uniformly bounded. Hence, each of the sequences (p_k^{(j)}(a))_{k∈N}, j = 0, ..., m−1, is uniformly bounded, which implies that each of the m sequences contains a convergent subsequence (p_{k_ν}^{(j)}(a))_{k_ν∈N}. Let

p^{(j)}(a) = lim_{k_ν→∞} p_{k_ν}^{(j)}(a),  j = 0, ..., m−1.

These limits define a polynomial p ∈ P_{m−1}. The sequence (f_k)_{k∈N}, with f_k = p_k + h_k, converges to f = p + h, which completes the proof in this case.
Assume now that there exist only d < m linearly independent functionals λ_i ∈ Λ, i = 1, ..., d, on P_{m−1}, i.e., the rank of the matrix

(λ_i(v_j))_{i=1,...,n; j=1,...,m}

is equal to d. Let λ_i, i = 1, ..., d, and v_j, j = 1, ..., m, be renumberings of the functionals and of the polynomials such that the rank of the matrix (λ_i(v_j))_{i,j=1,...,d} is d. According to Lemma 2.2.5, one can construct functionals λ_i, i = d+1, ..., m, on H^{m,2}[a, b], such that

λ_i(v_j) = δ_{ij},  i = d+1, ..., m;  j = 1, ..., m.
It follows that the functionals λ_i, i = 1, ..., m, are linearly independent on P_{m−1} and the proof is reduced to the previous case. Note that in the representation of f_k, the polynomials p_k have to be chosen such that

λ_i(p_k) = 0,  i = d+1, ..., m.

Thus, the (BAP) has a unique solution. This implies that the equivalent (PSIP) has a solution.
Theorem 2.2.10. If s_1, s_2 are two solutions of (PSIP) then

s_1 − s_2 ∈ P_{m−1}.

Proof. As the equivalent (BAP) has a unique solution σ, it follows that

s_1^{(m)} = s_2^{(m)} = σ,

or (s_1 − s_2)^{(m)} = 0, which is equivalent to s_1 − s_2 ∈ P_{m−1}.
Theorem 2.2.11. The (PSIP) has a unique solution if and only if

P_{m−1} ∩ U(0) = {0}.  (2.2.19)
Proof. Let s_1 be a solution of (PSIP). Suppose that there exists

v ∈ P_{m−1} ∩ U(0),  v ≠ 0.

Then s_2 = s_1 + v is also a solution of (PSIP), since

λ_i(s_2) = λ_i(s_1) + λ_i(v) = λ_i(s_1) = y_i

and ‖s_2^{(m)}‖_2 = ‖s_1^{(m)}‖_2.
As s_2 ≠ s_1, it means that (PSIP) has at least two solutions and the necessity is proved. For the sufficiency, we assume that s_1 and s_2 are solutions of (PSIP). Then, by Theorem 2.2.10, it follows that s_1 − s_2 ∈ P_{m−1} and by

λ_i(s_1 − s_2) = λ_i(s_1) − λ_i(s_2) = y_i − y_i = 0

we get that s_1 − s_2 ∈ U(0). Thus,

s_1 − s_2 ∈ P_{m−1} ∩ U(0)

and from (2.2.19) it follows that s_1 = s_2.
By Theorem 2.2.9, the existence of a solution of (PSIP) is characterized by the set of functionals Λ. The question is whether the set Λ can also characterize the uniqueness of the solution. The answer is given by the following result.
Theorem 2.2.12. If Λ contains at least m functionals of Hermite type, then

P_{m−1} ∩ U(0) = {0}.
Proof. Consider p ∈ P_{m−1} ∩ U(0) and let M = {λ_1, ..., λ_r} ⊂ Λ, with r ≥ m, be a set of Hermite functionals. Since p ∈ U(0), we have

λ_i(p) = 0,  i = 1, ..., r.

Thus, p has at least r zeros, counting multiplicities (r ≥ m). Taking into account that p ∈ P_{m−1}, it follows that p = 0.
The next theorem states one of the most important properties of the spline interpolant, namely the orthogonality property.
Theorem 2.2.13. The function s ∈ U(y) is a solution of (PSIP) if and only if

⟨s^{(m)}, g^{(m)}⟩_2 = 0,  g ∈ U(0).  (2.2.20)
Proof. We use again the equivalence between (PSIP) and (BAP) (Remark 2.2.8). Denote σ := s^{(m)}. Since U^{(m)}(0) is a linear subspace of L^2[a, b] and U^{(m)}(y) is a translation of U^{(m)}(0), it follows that σ ∈ U^{(m)}(y) is a solution of the (BAP) if and only if σ ⊥ U^{(m)}(0), i.e.,

⟨σ, v⟩_2 = 0,  v ∈ U^{(m)}(0),

that is,

⟨s^{(m)}, g^{(m)}⟩_2 = 0,  g ∈ U(0).
According to Theorem 2.2.13, the orthogonality property (2.2.20) completely characterizes the solution of (PSIP), so it is natural to investigate the set

S(Λ) = {f ∈ H^{m,2}[a, b] | ⟨f^{(m)}, g^{(m)}⟩_2 = 0, g ∈ U(0)}.
Theorem 2.2.14. The set S(Λ) is a closed linear subspace of H^{m,2}[a, b].
Proof. Let f_1, f_2 ∈ S(Λ), i.e.,

⟨f_i^{(m)}, g^{(m)}⟩_2 = 0,  g ∈ U(0), for i = 1, 2.

Taking f = α_1 f_1 + α_2 f_2, we have

⟨f^{(m)}, g^{(m)}⟩_2 = α_1 ⟨f_1^{(m)}, g^{(m)}⟩_2 + α_2 ⟨f_2^{(m)}, g^{(m)}⟩_2 = 0, for g ∈ U(0),
hence f ∈ S(Λ). It follows that S(Λ) is a linear subspace of H^{m,2}[a, b].
In order to show that S(Λ) is closed, we consider a sequence (f_k)_{k∈N} ⊂ S(Λ) that converges to f ∈ H^{m,2}[a, b]. We have to prove that f ∈ S(Λ). Since f_k ∈ S(Λ), we have

⟨f_k^{(m)}, g^{(m)}⟩_2 = 0,  g ∈ U(0),
and from

lim_{k→∞} f_k = f, in H^{m,2}[a, b],

one obtains

lim_{k→∞} ⟨f_k − f, g⟩_{m,2} = 0,

which implies

lim_{k→∞} ⟨(f_k − f)^{(m)}, g^{(m)}⟩_2 = 0.

Consequently,

⟨f^{(m)}, g^{(m)}⟩_2 = lim_{k→∞} [⟨f^{(m)} − f_k^{(m)}, g^{(m)}⟩_2 + ⟨f_k^{(m)}, g^{(m)}⟩_2] = 0, for g ∈ U(0),

hence f ∈ S(Λ).
Remark 2.2.15. The orthogonality property (2.2.20) implies that

⟨p^{(m)}, g^{(m)}⟩_2 = 0, for p ∈ P_{m−1},

i.e., P_{m−1} ⊂ S(Λ).
Definition 2.2.16. The space S(Λ) is called the space of spline functions interpolating U(y).
Theorem 2.2.17. Let (v_i)_{i=1,...,d} be a basis of the space P_{m−1} ∩ U(0), and s_i a solution of (PSIP) that interpolates the set

U_i = {f ∈ H^{m,2}[a, b] | λ_j(f) = δ_{ij}, j = 1, ..., n},

for i = 1, ..., n. Then the set (s_i)_{i=1,...,n} ∪ (v_j)_{j=1,...,d} is a basis for S(Λ).
Proof. Let s ∈ S(Λ) and

h = s − ∑_{i=1}^n s_i λ_i(s).  (2.2.21)

Then h ∈ S(Λ) and h ∈ U(0). From (2.2.20) it follows that

⟨h^{(m)}, h^{(m)}⟩_2 = 0.
Thus h^{(m)} = 0, implying h ∈ P_{m−1}. Hence, we have h ∈ P_{m−1} ∩ U(0). Since (v_j)_{j=1,...,d} is a basis of P_{m−1} ∩ U(0), we can write

h = ∑_{j=1}^d c_j v_j

and using (2.2.21), we obtain

s = ∑_{i=1}^n s_i λ_i(s) + ∑_{j=1}^d c_j v_j.
In order to show that the elements of the set (s_i)_{i=1,...,n} ∪ (v_j)_{j=1,...,d} are linearly independent, we start with

∑_{i=1}^n a_i s_i + ∑_{j=1}^d b_j v_j = 0.  (2.2.22)
We have

λ_k(∑_{i=1}^n a_i s_i + ∑_{j=1}^d b_j v_j) = a_k = 0,  k = 1, ..., n,

since λ_k(s_i) = δ_{ki} and λ_k(v_j) = 0, j = 1, ..., d (v_j ∈ U(0)). Substituting a_k = 0 in (2.2.22), it yields

∑_{j=1}^d b_j v_j = 0.

Because v_j, j = 1, ..., d, are linearly independent, we obtain b_j = 0, j = 1, ..., d. Hence, (2.2.22) holds if and only if a_i = 0, i = 1, ..., n, and b_j = 0, j = 1, ..., d. This implies that s_1, ..., s_n, v_1, ..., v_d are linearly independent.
Remark 2.2.18. If

P_{m−1} ∩ U(0) = {0}

(i.e., (PSIP) has a unique solution), then

dim S(Λ) = n

and {s_1, ..., s_n} is a basis of S(Λ).
The functions s_i, i = 1, ..., n, are called fundamental interpolation splines or cardinal splines, due to the property that

λ_k(s_i) = δ_{ki},  k, i = 1, ..., n.
Another very important property of the solutions of a (PSIP) is their structural characterization. Obviously, the structure of a solution of (PSIP) essentially depends on the set Λ that defines the interpolatory set U(y).
In the sequel we suppose that Λ is a set of Birkhoff type functionals,

Λ = {λ_ij | λ_ij f = f^{(j)}(x_i), i = 1, ..., k, j ∈ I_i},  (2.2.23)

where a ≤ x_1 < ... < x_k ≤ b is a partition of the interval [a, b], r_1, ..., r_k ∈ N, with r_i ≤ m−1, and I_i ⊆ {0, 1, ..., r_i}, i = 1, ..., k.
Theorem 2.2.19. (Structural characterization). Let Λ be a set of Birkhoff type functionals given by (2.2.23), y ∈ R^n, and let U(y) be the corresponding interpolatory set. The function s ∈ U(y) is a solution of (PSIP) if and only if:
1) s^{(2m)}(x) = 0, x ∈ [x_1, x_k] \ {x_1, ..., x_k};
2) s^{(m)}(x) = 0, x ∈ (a, x_1) ∪ (x_k, b);
3) s^{(2m−1−µ)}(x_i − 0) = s^{(2m−1−µ)}(x_i + 0), µ ∈ {0, 1, ..., m−1} \ I_i, for i = 1, ..., k.
Proof. Let s ∈ U(y) be a solution of (PSIP).
1) Let I be one of the intervals (x_i, x_{i+1}), i = 0, 1, ..., k (x_0 = a and x_{k+1} = b). Considering ϕ ∈ C^∞_{m−1}(I), we define the function

g(x) = ϕ(x) for x ∈ I,  g(x) = 0 for x ∈ [a, b] \ I.

Since g ∈ U(0) and s is a solution of (PSIP), the orthogonality property gives

∫_a^b s^{(m)}(x) g^{(m)}(x) dx = 0.

As g^{(m)} = ϕ^{(m)} on I, we obtain

∫_a^b s^{(m)}(x) g^{(m)}(x) dx = ∫_I s^{(m)}(x) ϕ^{(m)}(x) dx = 0.
Using Lemma 2.2.7, it follows that

s^{(m)} = p a.e. on I, with p ∈ P_{m−1}.

But s ∈ H^{m,2}[a, b], hence

s(x) = p_s(x) + ∫_a^x (x−t)^{m−1}/(m−1)! · s^{(m)}(t) dt,  p_s ∈ P_{m−1}.

Consequently, s ∈ P_{2m−1} on the interval I. Thus s^{(2m)}(x) = 0, x ∈ I, which implies 1).
2) Now, we denote I := [a, x_1) ≠ ∅. Since s ∈ P_{2m−1} on I, we have s^{(m)} ∈ C(I). Consider the function

s̄(x) = p(x) for x ∈ [a, x_1],  s̄(x) = s(x) for x ∈ (x_1, b],

where p ∈ P_{m−1} is such that

p^{(j)}(x_1) = s^{(j)}(x_1),  j = 0, 1, ..., m−1.
Notice first that s̄ ∈ U(y). Indeed,

λ_i(s̄) = λ_i(s),  i = 1, ..., n.

Assume that there exists ξ ∈ I such that s^{(m)}(ξ) ≠ 0. Since s^{(m)} is continuous on I, there exists a neighborhood V(ξ) of ξ such that

s^{(m)}(x) ≠ 0,  x ∈ V(ξ).

This leads to the inequality

‖s̄^{(m)}‖_2 < ‖s^{(m)}‖_2,

which contradicts the hypothesis that s is a solution of (PSIP). Hence,

s^{(m)}(x) = 0,  x ∈ I.
The case I = (x_k, b] can be treated similarly.
3) Let g ∈ H^{m,2}[a, b]. One has

⟨s^{(m)}, g^{(m)}⟩_2 = ∫_a^b s^{(m)}(x) g^{(m)}(x) dx = ∑_{i=0}^k ∫_{x_i}^{x_{i+1}} s^{(m)}(x) g^{(m)}(x) dx.
Integrating by parts in the above relation, it yields

⟨s^{(m)}, g^{(m)}⟩_2 = ∫_a^{x_1} s^{(m)}(x) g^{(m)}(x) dx + ∫_{x_k}^b s^{(m)}(x) g^{(m)}(x) dx
+ ∑_{i=1}^{k−1} (−1)^m ∫_{x_i}^{x_{i+1}} s^{(2m)}(x) g(x) dx
+ ∑_{i=1}^k ∑_{j=0}^{m−1} (−1)^{m−j} g^{(j)}(x_i) [s^{(2m−1−j)}(x_i + 0) − s^{(2m−1−j)}(x_i − 0)].  (2.2.24)
If x_1 = a, the first integral is missing, and at x_1 the jump is taken to be the one-sided limit from the right. The case x_k = b is treated analogously.
Consider now an ε > 0 such that (x_i − ε, x_i + ε) ∩ {x_1, ..., x_k} = {x_i}, and µ ∈ {0, 1, ..., m−1} \ I_i. We construct h ∈ P_{3m−1} that satisfies the conditions

h^{(j)}(x_i − ε) = h^{(j)}(x_i + ε) = 0,
h^{(j)}(x_i) = δ_{jµ},

for j = 0, 1, ..., m−1. We define the function g by

g(x) = h(x) for x ∈ J = [x_i − ε, x_i + ε] ∩ [a, b],  g(x) = 0 otherwise.
We have g ∈ U(0), hence

∫_a^b s^{(m)}(x) g^{(m)}(x) dx = 0.

Taking into account the latter equality, 1), 2) and (2.2.24), we obtain

∑_{j=0}^{m−1} (−1)^{m−j} g^{(j)}(x_i) [s^{(2m−1−j)}(x_i + 0) − s^{(2m−1−j)}(x_i − 0)] = 0,  i = 1, ..., k.

Since

g^{(j)}(x_i) = δ_{jµ},

we get

s^{(2m−1−µ)}(x_i + 0) − s^{(2m−1−µ)}(x_i − 0) = 0,  i = 1, ..., k.
Conversely, if statements 1)–3) are satisfied, then from (2.2.24) it follows that

⟨s^{(m)}, g^{(m)}⟩_2 = 0,  g ∈ U(0),

thus s is a solution of (PSIP).
Remark 2.2.20. The Structural Characterization Theorem states that the solution s of (PSIP) is a polynomial of degree 2m−1 on each interior interval (x_i, x_{i+1}) and a polynomial of degree m−1 on the intervals [a, x_1) and (x_k, b]. Furthermore, the derivative of order 2m−1−µ is continuous at x_i if the functional f ↦ f^{(µ)}(x_i) does not belong to Λ.
A solution of a (PSIP) is also called a (natural) spline of order 2m−1.
In conclusion, according to Theorems 2.2.13 and 2.2.19, we can state the following result.
Theorem 2.2.21. If Λ is a set of Birkhoff type linear functionals, U(y), y ∈ R^n, is the corresponding interpolation set and S(Λ) is the set of splines that interpolate U(y), then the following three statements are equivalent:
(A) s ∈ S(Λ) ⟺ ‖s^{(m)}‖_2 = inf_{u∈U(y)} ‖u^{(m)}‖_2;

(B) s ∈ S(Λ) ⟺ ⟨s^{(m)}, g^{(m)}⟩_2 = 0, g ∈ U(0);

(C) s ∈ S(Λ) ⟺
1) s^{(2m)}(x) = 0, x ∈ [x_1, x_k] \ {x_i}_{i=1,...,k},
2) s^{(m)}(x) = 0, x ∈ [a, x_1) ∪ (x_k, b],
3) s^{(2m−1−µ)}(x_i + 0) − s^{(2m−1−µ)}(x_i − 0) = 0, with µ ∈ {0, 1, ..., m−1} \ I_i, i = 1, ..., k.

Consequently, each of the three statements above can be used as the definition of the solution s of (PSIP), and the other two can then be proved as theorems. In the present case, (A) was taken as the definition (Definition 2.2.3), while (B) was proved as the Orthogonality Theorem (Theorem 2.2.13) and (C) as the Structural Characterization Theorem (Theorem 2.2.19).
Suppose now that (PSIP) has a unique solution.
Theorem 2.2.22. Consider y = (y_1, ..., y_n) ∈ R^n and let s_i, i = 1, ..., n, be the fundamental interpolation splines. Then the function

s_y = ∑_{i=1}^n y_i s_i

is the solution of (PSIP) corresponding to the set U(y).
Proof. One has

λ_k(s_y) = y_k,  k = 1, ..., n,

thus s_y ∈ U(y). Since s_i are the solutions of (PSIP) that interpolate the sets U_i, i = 1, ..., n, respectively, by Theorem 2.2.13 it follows that

⟨s_i^{(m)}, g^{(m)}⟩_2 = 0,  g ∈ U(0),

for all i = 1, ..., n. Consequently,

⟨s_y^{(m)}, g^{(m)}⟩_2 = ∑_{i=1}^n y_i ⟨s_i^{(m)}, g^{(m)}⟩_2 = 0,  g ∈ U(0),

hence s_y ∈ S(Λ).
Remark 2.2.23. Theorem 2.2.22 can be interpreted as follows: for a given f ∈ H^{m,2}[a, b] and a given set of functionals Λ, the function

Sf = ∑_{i=1}^n s_i λ_i(f)  (2.2.25)

is a spline that interpolates f with respect to Λ, i.e.,

λ_i(Sf) = λ_i(f),  i = 1, ..., n.

Furthermore, Sf is the smoothest function in H^{m,2}[a, b] that interpolates f, in the sense that ‖(Sf)^{(m)}‖_2 → min.
Remark 2.2.24. From (2.2.25) one can notice that S : H^{m,2}[a, b] → S(Λ) is a linear and idempotent operator, hence it is a projector.

Definition 2.2.25. The operator S : H^{m,2}[a, b] → S(Λ) is called the polynomial spline interpolation operator corresponding to Λ.
Next, we consider the spline operator S for particular sets of functionals Λ.
2.2.3 Spline operators of Lagrange type
For a given function f : [a, b] → R and x_i ∈ [a, b], i = 1, ..., n, we consider

Λ := Λ_L = {λ_i | λ_i(f) = f(x_i), i = 1, ..., n}.
By Theorem 2.2.12 it follows that if n ≥ m then, for every f ∈ H^{m,2}[a, b], the interpolation spline function S_L f exists and is unique.
Definition 2.2.26. For Λ := Λ_L, the corresponding spline operator S_L is called the spline operator of Lagrange type.
For determining S_L one can use Theorem 2.2.19, with the remark that the third condition of this theorem becomes 3′) S_L f ∈ C^{2m−2}[a, b]. Writing the function S_L f as

(S_L f)(x) = ∑_{i=0}^{m−1} a_i x^i + ∑_{j=1}^n b_j (x − x_j)_+^{2m−1},  (2.2.26)
it follows that the conditions 1), 2) for x ∈ [a, x_1) and 3′) are verified. Indeed,

(S_L f)(x) = ∑_{i=0}^{m−1} a_i x^i + ∑_{j=1}^k b_j (x − x_j)^{2m−1},

for x ∈ (x_k, x_{k+1}), k = 1, ..., n−1,

(S_L f)(x) = ∑_{i=0}^{m−1} a_i x^i,  x ∈ [a, x_1),

and

(S_L f)^{(ν)}(x_j − 0) = (S_L f)^{(ν)}(x_j + 0),  ν = 0, ..., 2m−2,  j = 1, ..., n.
So, S_L f ∈ P_{2m−1} on each interval (x_j, x_{j+1}), j = 1, ..., n−1, S_L f ∈ P_{m−1} on [a, x_1) and S_L f ∈ C^{2m−2}[a, b]. On the interval (x_n, b] we have

(S_L f)(x) = ∑_{i=0}^{m−1} a_i x^i + ∑_{j=1}^n b_j (x − x_j)^{2m−1},

so S_L f ∈ P_{2m−1}. Hence, condition 2) is not yet verified on this interval.
Writing the polynomial S_L f in Taylor form, i.e.,

(S_L f)(x) = ∑_{k=0}^{2m−1} (x−α)^k/k! · (S_L f)^{(k)}(α),  α > x_n,

it follows that it reduces to a polynomial of degree m−1 on (x_n, b] if we suppose that

(S_L f)^{(p)}(α) = 0,  p = m, ..., 2m−1.  (2.2.27)
Consequently, if the conditions (2.2.27) are verified, then all three properties of a spline function are fulfilled.
Regarding the expression from (2.2.26), we notice that the function S_L f depends on m + n parameters, a_i, b_j ∈ R, i = 0, ..., m−1; j = 1, ..., n. We also notice that (2.2.27) and the interpolation conditions

(S_L f)(x_i) = f(x_i),  i = 1, ..., n,

provide an (m+n) × (m+n) linear algebraic system:

(S_L f)^{(p)}(α) = 0,  p = m, ..., 2m−1,
(S_L f)(x_i) = f(x_i),  i = 1, ..., n.  (2.2.28)
For n ≥ m, the function S_L f may also be written in the form

S_L f = ∑_{k=1}^n s_k f(x_k),
where s_k, k = 1, ..., n, are the fundamental interpolation spline functions. So, the determination of S_L f reduces to the problem of finding the functions s_k, k = 1, ..., n. To do this we write these functions in forms corresponding to (2.2.26), i.e.,

s_k(x) = ∑_{i=0}^{m−1} a^k_i x^i + ∑_{j=1}^n b^k_j (x − x_j)_+^{2m−1},  k = 1, ..., n,
with a^k_i, i = 0, ..., m−1, and b^k_j, j = 1, ..., n, obtained as the solutions of the linear algebraic system

s_k^{(p)}(α) = 0,  p = m, ..., 2m−1, with α > x_n,
s_k(x_ν) = δ_{kν},  ν = 1, ..., n,  (2.2.29)

for k = 1, ..., n. We remark that the matrices of all these systems coincide and only the free terms differ.
Definition 2.2.27. The formula

f = S_L f + R_L f

is called the spline interpolation formula of Lagrange type, where R_L is the remainder operator.
We have dex(S_L) = m−1 and, applying Peano's theorem, we obtain

(R_L f)(x) = ∫_a^b ϕ_L(x, t) f^{(m)}(t) dt,
where

ϕ_L(x, t) = (x − t)_+^{m−1}/(m−1)! − ∑_{i=1}^n s_i(x) (x_i − t)_+^{m−1}/(m−1)!.
If f^{(m)} is continuous on [a, b], then

|(R_L f)(x)| ≤ ‖f^{(m)}‖_∞ ∫_a^b |ϕ_L(x, t)| dt.
Example 2.2.28. Let f ∈ C[0, 1] and

Λ_L(f) = {f(i/n) | i = 0, ..., n}.
We are looking for a linear spline interpolation formula. We have

f = S_1 f + R_1 f,

where

(S_1 f)(x) = ∑_{i=0}^n s_i(x) f(i/n)
and

(R_1 f)(x) = ∫_0^1 ϕ_1(x, t) f′(t) dt,

with

ϕ_1(x, t) = (x − t)_+^0 − ∑_{i=0}^n s_i(x)(i/n − t)_+^0.
We have to find the functions s_i, i = 0, ..., n. We notice that

s_i(x) = a^i_0 + ∑_{j=0}^n b^i_j (x − j/n)_+,  i = 0, ..., n,

and the coefficients a^i_0, b^i_0, ..., b^i_n, i = 0, ..., n, are provided by the conditions:
s_i(k/n) = δ_{ik},  k = 0, ..., n,
s′_i(α) = 0,  α > 1 (we take α = 2).
We obtain

s_0(x) = 1 − nx + n(x − 1/n)_+,
s_1(x) = nx − 2n(x − 1/n)_+ + n(x − 2/n)_+,
s_2(x) = n(x − 1/n)_+ − 2n(x − 2/n)_+ + n(x − 3/n)_+,
...
s_{n−1}(x) = n(x − (n−2)/n)_+ − 2n(x − (n−1)/n)_+ + n(x − 1)_+,
s_n(x) = n(x − (n−1)/n)_+ − n(x − 1)_+.
2.2.4 Spline operators of Hermite type
For a function f ∈ H^{m,2}[a, b] we have the set of Hermite type functionals

Λ_H = {λ_kj | λ_kj f = f^{(j)}(x_k), k = 1, ..., n, j = 0, ..., r_k},

where r_k ∈ N, r_k < m.
Definition 2.2.29. The spline operator S corresponding to the set of Hermite type functionals, Λ_H, is called the spline operator of Hermite type.
The spline function of Hermite type can be written as

(S_H f)(x) = ∑_{i=0}^{m−1} a_i x^i + ∑_{k=1}^n ∑_{j=0}^{r_k} b_kj (x − x_k)_+^{2m−j−1},  (2.2.30)
with the parameters a_i, b_kj given by conditions 1)–3) from Theorem 2.2.19 and by the interpolation conditions:

(S_H f)^{(j)}(x_k) = f^{(j)}(x_k),  k = 1, ..., n;  j = 0, ..., r_k.
It is easy to check that S_H f, written as in (2.2.30), verifies conditions 1), 2) on [a, x_1) and 3) from Theorem 2.2.19. For condition 2) to hold also on (x_n, b], it is sufficient to require

(S_H f)^{(p)}(α) = 0,  p = m, ..., 2m−1,

analogous to (2.2.27). Therefore, the parameters a_i, i = 0, ..., m−1, and b_kj, k = 1, ..., n, j = 0, ..., r_k (their number is N := n + m + ∑_{k=1}^n r_k), are determined as the solution of the N × N linear algebraic system:

(S_H f)^{(j)}(x_k) = f^{(j)}(x_k),  k = 1, ..., n;  j = 0, ..., r_k,
(S_H f)^{(p)}(α) = 0,  p = m, ..., 2m−1.
If |Λ_H| ≥ m, by Theorem 2.2.12 it follows that the spline function S_H f exists and is unique, so the system is compatible.
Also, the function S_H f can be represented using the fundamental spline functions, i.e.,

(S_H f)(x) = ∑_{k=1}^n ∑_{j=0}^{r_k} s_kj(x) f^{(j)}(x_k),  (2.2.31)
where

s_kj(x) = ∑_{µ=0}^{m−1} a^{kj}_µ x^µ + ∑_{µ=1}^{n} ∑_{ν=0}^{r_µ} b^{kj}_{µν} (x − x_µ)_+^{2m−ν−1},

for k = 1, ..., n and j = 0, ..., r_k. Each fundamental spline function s_kj is obtained from the corresponding system of the form

s_kj^{(q)}(x_ν) = 0,  ν = 1, ..., n, ν ≠ k,  q = 0, ..., r_ν,
s_kj^{(q)}(x_k) = δ_{jq},  q = 0, ..., r_k,
s_kj^{(p)}(α) = 0,  p = m, ..., 2m−1, with α > x_n,  (2.2.32)

for k = 1, ..., n and j = 0, ..., r_k. As in the case of Lagrange interpolation, all these systems have the same matrix.
Definition 2.2.30. Let Λ := Λ_H and S_H be the corresponding spline operator. The formula

f = S_H f + R_H f

is called the spline interpolation formula of Hermite type, where R_H is the remainder operator.
Applying Peano's theorem, we obtain

(R_H f)(x) = ∫_a^b ϕ_H(x, t) f^{(m)}(t) dt,
where

ϕ_H(x, t) = (x − t)_+^{m−1}/(m−1)! − ∑_{k=1}^n ∑_{j=0}^{r_k} s_kj(x) (x_k − t)_+^{m−j−1}/(m−j−1)!.
If f^{(m)} is continuous on [a, b], then

|(R_H f)(x)| ≤ ‖f^{(m)}‖_∞ ∫_a^b |ϕ_H(x, t)| dt.
Example 2.2.31. Let f ∈ C^2[0, 1] and Λ_H(f) = {f(0), f′(0), f(1/2), f(1), f′(1)} be given. We have to find the corresponding spline interpolation formula.
We consider the cubic spline function (m = 2) with its interpolation formula:

f = S_3 f + R_3 f.
From (2.2.31) we have

(S_3 f)(x) = s_00(x)f(0) + s_01(x)f′(0) + s_10(x)f(1/2) + s_20(x)f(1) + s_21(x)f′(1),
with the fundamental interpolation spline functions having the expressions

s_kj(x) = a^{kj}_0 + a^{kj}_1 x + b^{kj}_{00} x^3 + b^{kj}_{01} x^2 + b^{kj}_{10} (x − 1/2)_+^3 + b^{kj}_{20} (x − 1)_+^3 + b^{kj}_{21} (x − 1)_+^2,  (2.2.33)

for (k, j) ∈ {(0, 0), (0, 1), (1, 0), (2, 0), (2, 1)}.
In this case the system (2.2.32), for α = 2, becomes:

s_00(0) := a^{00}_0 = 1  (2.2.34)
s′_00(0) := a^{00}_1 = 0
s_00(1/2) := a^{00}_0 + (1/2)a^{00}_1 + (1/8)b^{00}_{00} + (1/4)b^{00}_{01} = 0
s_00(1) := a^{00}_0 + a^{00}_1 + b^{00}_{00} + b^{00}_{01} + (1/8)b^{00}_{10} = 0
s′_00(1) := a^{00}_1 + 3b^{00}_{00} + 2b^{00}_{01} + (3/4)b^{00}_{10} = 0
s″_00(2) := 12b^{00}_{00} + 2b^{00}_{01} + 9b^{00}_{10} + 6b^{00}_{20} + 2b^{00}_{21} = 0
s‴_00(2) := b^{00}_{00} + b^{00}_{10} + b^{00}_{20} = 0.
By solving this system we get

s_00(x) = 1 + 10x^3 − 9x^2 − 16(x − 1/2)_+^3.
The systems that provide the coefficients of the other fundamental spline functions have the same matrix, only the free terms changing successively to (0, 1, 0, 0, 0, 0, 0), (0, 0, 1, 0, 0, 0, 0), (0, 0, 0, 1, 0, 0, 0), (0, 0, 0, 0, 1, 0, 0). This way, we obtain:
s_01(x) = x + 3x^3 − (7/2)x^2 − 4(x − 1/2)_+^3,
s_10(x) = −16x^3 + 12x^2 + 32(x − 1/2)_+^3,
s_20(x) = 6x^3 − 3x^2 − 16(x − 1/2)_+^3,
s_21(x) = −x^3 + (1/2)x^2 + 4(x − 1/2)_+^3.
For the remainder we have:

(R_3 f)(x) = ∫_0^1 ϕ_2(x, t) f″(t) dt,

where

ϕ_2(x, t) = (x − t)_+ − s_10(x)(1/2 − t)_+ − s_20(x)(1 − t) − s_21(x).
2.2.5 Spline operators of Birkhoff type
For a function f ∈ H^{m,2}[a, b] the set of Birkhoff type functionals is given by

Λ_B = {λ_kj | λ_kj f = f^{(j)}(x_k), k = 1, ..., n, j ∈ I_k},

for I_k ⊆ {0, ..., r_k}, r_k ∈ N, r_k < m.
Definition 2.2.32. The spline operator S_B corresponding to the set of Birkhoff type functionals, Λ_B, is called the spline operator of Birkhoff type.
If the spline function of Birkhoff type exists, it has the form

(S_B f)(x) = ∑_{i=0}^{m−1} a_i x^i + ∑_{k=1}^n ∑_{j∈I_k} b_kj (x − x_k)_+^{2m−j−1}

or

(S_B f)(x) = ∑_{k=1}^n ∑_{j∈I_k} s_kj(x) f^{(j)}(x_k),

where s_kj are the fundamental interpolation spline functions. They can be obtained from conditions of the form (2.2.32).
We illustrate this procedure in the following example.
Example 2.2.33. Let f ∈ C^2[0, 1] and Λ_B(f) = {f′(0), f(1/2), f′(1)} be given. We are looking for the corresponding interpolation formula.
We consider the cubic spline function:

(S_3 f)(x) = s_01(x)f′(0) + s_10(x)f(1/2) + s_21(x)f′(1).
We determine the fundamental interpolation spline functions s_01, s_10 and s_21. They have expressions analogous to (2.2.33), their coefficients being the solutions of systems of the form (2.2.34). So, for s_01 we have:

s′_01(0) := a^{01}_1 = 1
s_01(1/2) := a^{01}_0 + (1/2)a^{01}_1 + (1/4)b^{01}_{01} = 0
s′_01(1) := a^{01}_1 + 2b^{01}_{01} + (3/4)b^{01}_{10} = 0
s″_01(2) := 2b^{01}_{01} + 9b^{01}_{10} + 2b^{01}_{21} = 0
s‴_01(2) := 6b^{01}_{10} = 0.
It follows that

s_01(x) = −3/8 + x − (1/2)x^2 + (1/2)(x − 1)_+^2.
In the same way we obtain

s_10(x) = 1,
s_21(x) = −1/8 + (1/2)x^2 − (1/2)(x − 1)_+^2.
For the remainder term we have:

(R_3 f)(x) = ∫_0^1 ϕ_2(x, t) f″(t) dt,

with

ϕ_2(x, t) = (x − t)_+ − (1/2 − t)_+ − s_21(x).
2.2.6 Rational interpolation operators
2.2.6.1 An iterative method to construct a rational interpolation operator
Let F = P_r/P_s, where P_k ∈ P_k. Because F does not change if we replace P_r and P_s by CP_r and CP_s, respectively, where C is a nonzero constant, it follows that F depends on r + s + 1 parameters, so the same number of conditions is necessary for their determination. In this section we use interpolation conditions to determine the function F.
Setting m = r + s, we consider x_i ∈ [a, b], i = 0, 1, ..., m, with x_i ≠ x_j if i ≠ j.
Definition 2.2.34. The problem of determining the function F under the conditions

F(x_i) = f(x_i),  i = 0, 1, ..., m,  (2.2.35)

is called the rational interpolation problem.
We present an iterative method for constructing the function F, based on its representation as a finite continued fraction.
First, we consider the sequence v_k, k = 0, 1, ..., defined by the recurrence relation

v_k(x) = v_k(x_k) + (x − x_k)/v_{k+1}(x).  (2.2.36)
Setting v_0 = f and applying relation (2.2.36) successively, we obtain:

f(x) = f(x_0) + (x − x_0)/(v_1(x_1) + (x − x_1)/(v_2(x_2) + ... + (x − x_{k−1})/(v_k(x_k) + (x − x_k)/v_{k+1}(x)))).  (2.2.37)

The problem is to define the numbers v_i(x_i) such that the function F_m given by

F_m(x) = f(x_0) + (x − x_0)/(v_1(x_1) + (x − x_1)/(v_2(x_2) + ... + (x − x_{m−1})/v_m(x_m)))

satisfies the interpolation conditions

F_m(x_k) = f(x_k),  k = 0, 1, ..., m.
Definition 2.2.35. Let M = {x_i | x_i ∈ R, i = 0, ..., m} and f : M → R. The value

{x_0, x_1, ..., x_{k−1}, x_k; f} = (x_k − x_{k−1}) / ({x_0, ..., x_{k−2}, x_k; f} − {x_0, ..., x_{k−1}; f}),

with {x_0, x_1; f} = [x_0, x_1; f]^{−1}, is called the inverse divided difference of order k.
Remark 2.2.36. The inverse divided difference of order k > 1 differs from the reciprocal of the divided difference. Also, unlike the divided difference, the inverse divided difference depends on the order of the nodes.
Theorem 2.2.37. Let f : M → R. If v_k(x_k) = {x_0, x_1, ..., x_k; f}, k = 1, ..., m, then

F_m(x_k) = f(x_k),  k = 0, 1, ..., m.
Proof. We reduce it to a direct verification. Indeed, for each k = 0, 1, ..., m, we have

F_m(x_k) = f(x_0) + (x_k − x_0)/({x_0, x_1; f} + ... + (x_k − x_{k−2})/({x_0, ..., x_{k−1}; f} + (x_k − x_{k−1})/{x_0, ..., x_k; f})).
Replacing the inverse divided difference of maximal order, {x_0, ..., x_k; f}, by the expression from its definition,

{x_0, x_1, ..., x_k; f} = (x_k − x_{k−1}) / ({x_0, ..., x_{k−2}, x_k; f} − {x_0, ..., x_{k−1}; f}),

we obtain

F_m(x_k) = f(x_0) + (x_k − x_0)/({x_0, x_1; f} + ... + (x_k − x_{k−2})/{x_0, ..., x_{k−2}, x_k; f}).
Using this procedure, after k − 1 substitutions it follows that

F_m(x_k) = f(x_0) + (x_k − x_0)/{x_0, x_k; f}.

Since

{x_0, x_k; f} = (x_k − x_0)/(f(x_k) − f(x_0)),

it follows that F_m(x_k) = f(x_k).
Remark 2.2.38. The function F_m is a finite continued fraction.
We now introduce an iterative procedure which puts the function F_m into rational function form. For this, we write the relation

f(x) = f(x_0) + (x − x_0)/(v_1(x_1) + ... + (x − x_{k−1})/v_k(x))

in the equivalent form

f(x) = (v_k(x)G_k(x) + (x − x_{k−1})P_k(x)) / (v_k(x)H_k(x) + (x − x_{k−1})Q_k(x)).  (2.2.38)
In order to determine the functions G_k, H_k, P_k and Q_k, we first observe that from relation (2.2.37) we also have

f(x) = (v_{k+1}(x)G_{k+1}(x) + (x − x_k)P_{k+1}(x)) / (v_{k+1}(x)H_{k+1}(x) + (x − x_k)Q_{k+1}(x)).  (2.2.39)
On the other hand, by replacing in (2.2.38) the expression (2.2.36) of v_k(x), we obtain:

f(x) = (v_{k+1}(x)[v_k(x_k)G_k(x) + (x − x_{k−1})P_k(x)] + (x − x_k)G_k(x)) / (v_{k+1}(x)[v_k(x_k)H_k(x) + (x − x_{k−1})Q_k(x)] + (x − x_k)H_k(x)),  (2.2.40)

which is identical with (2.2.39). From (2.2.39) and (2.2.40), it follows that
P_{k+1}(x) = G_k(x),  Q_{k+1}(x) = H_k(x),

and

G_{k+1}(x) = v_k(x_k)G_k(x) + (x − x_{k−1})P_k(x),
H_{k+1}(x) = v_k(x_k)H_k(x) + (x − x_{k−1})Q_k(x).
So, we have the recurrence relations

G_{k+1}(x) = v_k(x_k)G_k(x) + (x − x_{k−1})G_{k−1}(x),  (2.2.41)
H_{k+1}(x) = v_k(x_k)H_k(x) + (x − x_{k−1})H_{k−1}(x).

Consequently,

f(x) = (v_k(x)G_k(x) + (x − x_{k−1})G_{k−1}(x)) / (v_k(x)H_k(x) + (x − x_{k−1})H_{k−1}(x)).  (2.2.42)
The functions G_k, H_k, k > 1, are determined from (2.2.41), for given G_0, G_1 and H_0, H_1. In order to obtain these starting functions, we consider the expression

f(x) = (v_1(x)G_1(x) + (x − x_0)G_0(x)) / (v_1(x)H_1(x) + (x − x_0)H_0(x)),

obtained from (2.2.42) for k = 1. But we also have the equivalent representation:

f(x) = f(x_0) + (x − x_0)/v_1(x) = (v_1(x)f(x_0) + (x − x_0))/v_1(x).
Identifying these two last expressions of f, one obtains

G_0(x) = 1;  G_1(x) = f(x_0);  (2.2.43)
H_0(x) = 0;  H_1(x) = 1.

From (2.2.41) and (2.2.43) it follows that G_k and H_k are polynomials, with

G_k ∈ P_{[k/2]},  H_k ∈ P_{[(k−1)/2]},  k = 1, 2, ....
We have proved the following result.
Theorem 2.2.39. If f : M → R, v_k(x_k) = {x_0, ..., x_k; f} for k = 1, ..., m, and

F_m = G_{m+1}/H_{m+1},

with G_{m+1} and H_{m+1} given by (2.2.41), then

F_m(x_i) = f(x_i),  i = 0, 1, ..., m.
Hence, F_m is a rational function, F_m = P_r/Q_s, where r = (m+1)/2 and s = r − 1 if m is odd, and r = s = m/2 if m is even.
We can consider the following rational interpolation formula

f = ρ_m f + r_m f,

where ρ_m, defined by ρ_m f = F_m, is the rational interpolation operator and r_m f is the remainder term.
Theorem 2.2.40. The remainder term has the expression

(r_m f)(x) = (−1)^m u(x) / (H_{m+1}(x)[v_{m+1}(x)H_{m+1}(x) + (x − x_m)H_m(x)]),  (2.2.44)

where u(x) = (x − x_0)...(x − x_m).
Proof. We have

(r_m f)(x) = (v_m(x)G_m(x) + (x − x_{m−1})G_{m−1}(x)) / (v_m(x)H_m(x) + (x − x_{m−1})H_{m−1}(x)) − G_{m+1}(x)/H_{m+1}(x).

Bringing to the same denominator and cancelling like terms, we obtain

(r_m f)(x) = (x − x_{m−1})[v_m(x) − v_m(x_m)][G_m(x)H_{m−1}(x) − H_m(x)G_{m−1}(x)] / ([v_m(x)H_m(x) + (x − x_{m−1})H_{m−1}(x)]H_{m+1}(x)).
Using (2.2.36), it follows that

v_m(x) − v_m(x_m) = (x − x_m)/v_{m+1}(x)

and

v_m(x)H_m(x) + (x − x_{m−1})H_{m−1}(x) = H_{m+1}(x) + (x − x_m)/v_{m+1}(x) · H_m(x).

Then,

(r_m f)(x) = (x − x_{m−1})(x − x_m)[G_m(x)H_{m−1}(x) − H_m(x)G_{m−1}(x)] / ([v_{m+1}(x)H_{m+1}(x) + (x − x_m)H_m(x)]H_{m+1}(x)).
Taking into account that

G_2(x)H_1(x) − H_2(x)G_1(x) = x − x_0

and

G_m(x)H_{m−1}(x) − H_m(x)G_{m−1}(x) = −(x − x_{m−2})[G_{m−1}(x)H_{m−2}(x) − H_{m−1}(x)G_{m−2}(x)],

it follows that

G_m(x)H_{m−1}(x) − H_m(x)G_{m−1}(x) = (−1)^{m−2}(x − x_0)...(x − x_{m−2}),

so relation (2.2.44) is proved.
Remark 2.2.41. For the absolute approximation error we have

|(r_m f)(x)| = |u(x)| / (|H_{m+1}(x)| · |v_{m+1}(x)H_{m+1}(x) + (x − x_m)H_m(x)|),

for given x ∈ [a, b].
To evaluate the approximation ρ_m f at a point α ∈ R, taking into account that

(ρ_m f)(α) = G_{m+1}(α)/H_{m+1}(α),
we can use the recurrence relations (2.2.41), with the starting values (2.2.43). The inverse divided differences that enter the expressions of the polynomials G_k and H_k can be evaluated by the following triangular scheme:

x_0  v_00
x_1  v_10  v_11
x_2  v_20  v_21  v_22
...
x_i  v_i0  v_i1  v_i2  ...  v_ii
...
x_m  v_m0  v_m1  v_m2  ...  v_mi  ...  v_mm,
where

v_i0 = f(x_i),  i = 0, ..., m,  (2.2.45)
v_ik = (x_i − x_{k−1}) / (v_{i,k−1} − v_{k−1,k−1}),  k = 1, ..., i;  i = 1, ..., m.
We observe that the inverse divided differences v_ii appearing in the expressions of the polynomials G_k and H_k are situated on the diagonal (hypotenuse) of the triangular scheme; according to (2.2.45), they can be computed row by row, each row yielding the inverse divided difference of the next higher order.
Thus, using the numbers v_ii, i = 0, 1, ..., one can generate successively the approximations

(ρ_1 f)(α), (ρ_2 f)(α), ..., (ρ_i f)(α), ...  (2.2.46)

until the distance between two consecutive elements of the sequence (2.2.46) is less than or equal to a given ε > 0, i.e.,

|(ρ_i f)(α) − (ρ_{i−1} f)(α)| ≤ ε.

Therefore, we can say that (ρ_i f)(α) approximates f(α) with the given precision ε.
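These recurrences translate directly into code. The sketch below (pure Python, an illustration rather than the book's pseudocode) builds the triangular scheme (2.2.45) and evaluates F_m = G_{m+1}/H_{m+1} through (2.2.41) with the starting values (2.2.43):

```python
def inverse_divided_differences(xs, fs):
    """Triangular scheme (2.2.45): returns the diagonal entries v_00, ..., v_mm."""
    rows = [[f] for f in fs]
    for i in range(1, len(xs)):
        for k in range(1, i + 1):
            # v_ik = (x_i - x_{k-1}) / (v_{i,k-1} - v_{k-1,k-1})
            rows[i].append((xs[i] - xs[k - 1])
                           / (rows[i][k - 1] - rows[k - 1][k - 1]))
    return [rows[i][i] for i in range(len(xs))]

def rational_interpolant(xs, fs):
    """F_m evaluated via the recurrences (2.2.41), starting values (2.2.43)."""
    v = inverse_divided_differences(xs, fs)
    def F(x):
        G0, G1 = 1.0, v[0]        # G_0 = 1, G_1 = f(x_0)
        H0, H1 = 0.0, 1.0         # H_0 = 0, H_1 = 1
        for k in range(1, len(xs)):
            # G_{k+1} = v_k(x_k) G_k + (x - x_{k-1}) G_{k-1}, likewise for H
            G0, G1 = G1, v[k] * G1 + (x - xs[k - 1]) * G0
            H0, H1 = H1, v[k] * H1 + (x - xs[k - 1]) * H0
        return G1 / H1
    return F
```

Note that the scheme can break down (a zero denominator in (2.2.45)) for some data; consistent with Remark 2.2.36, reordering the nodes may then help.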
2.2.6.2 Univariate Shepard interpolation
D. Shepard introduced in 1968 an interpolation procedure which is very efficient for the interpolation of large scattered data sets.
Let f be a real-valued function defined on I = [a, b] ⊂ R, let x_i ∈ I, i = 1, ..., N, be the nodes forming the set X ⊂ I, and consider

Λ := Λ_L = {λ_i | λ_i(f) = f(x_i), i = 1, ..., N},  (2.2.47)

a Lagrange type set of functionals. The univariate Shepard operator S_0 is defined by:
(S_0 f)(x) = ∑_{i=1}^N A_i(x) f(x_i),  (2.2.48)
where

A_i(x) = ∏_{j=1, j≠i}^N |x − x_j|^µ / ∑_{k=1}^N ∏_{j=1, j≠k}^N |x − x_j|^µ,  (2.2.49)
and µ ∈ R_+. The basis functions A_i may be written in barycentric form,

A_i(x) = |x − x_i|^{−µ} / ∑_{k=1}^N |x − x_k|^{−µ},  i = 1, ..., N,  (2.2.50)

and satisfy

∑_{i=1}^N A_i(x) = 1.  (2.2.51)
They have the cardinality property

A_i(x_ν) = δ_{iν},  i, ν = 1, ..., N.  (2.2.52)
The main properties of the operator S_0 are:
• the interpolation property with respect to the set of functionals (2.2.47), i.e.,

(S_0 f)(x_i) = f(x_i),  i = 1, ..., N;

• the degree of exactness

dex(S_0) = 0.
Shepard's method is a particular example of a convex combination, i.e., the functions A_i, i = 1, ..., N, are non-negative and their sum is 1. From this it follows that the scheme has only constant precision.
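In code, the barycentric form (2.2.50) gives a direct implementation. The sketch below (pure Python, illustrative) guards the singular weight at the nodes explicitly:

```python
def shepard(xs, fs, mu=2.0):
    """Univariate Shepard operator S_0 in the barycentric form (2.2.50)."""
    def S(x):
        for xi, fi in zip(xs, fs):
            if x == xi:          # at a node the weight is singular and A_i = 1
                return fi
        w = [abs(x - xi) ** (-mu) for xi in xs]
        return sum(wi * fi for wi, fi in zip(w, fs)) / sum(w)
    return S
```

Because the weights are non-negative and sum to 1, every value (S_0 f)(x) stays between the minimal and maximal data values.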
The Shepard interpolation has two major drawbacks:
• high computational cost;
• low degree of exactness.
These drawbacks can be overcome in two ways.
⋆ Modifying the basis functions. The Shepard method has flat spots at the nodes, its accuracy tends to decrease in areas where the interpolation nodes are sparse, and the evaluation of (S_0 f)(x) requires a considerable amount of work. These disadvantages are avoided by using a local version of the Shepard formula, given by R. Franke and G. Nielson, that consists in replacing the basis functions A_i from (2.2.50) by

w_i(x) = (1 − |x − x_i|/R)_+^ν,  (2.2.53)

where R is a radius of influence about the node x_i, which may vary with i.
⋆ Increasing the degree of exactness by combining the Shepard operator
with another interpolation operator. This is one of the most efficient ways ofgeneralizing the Shepard interpolation. In this way its degree of exactness is
increased and different sets of functionals are used. We notice that in the definition of the Shepard operator S_0 only Lagrange type functionals are used.
Let
\[ \Lambda := \{\lambda_i : i = 1, \ldots, N\} \]
be a set of functionals and P the corresponding interpolation operator. We consider the subsets Λ_i ⊂ Λ associated to the functionals λ_i, i = 1, ..., N. We have
\[ \bigcup_{i=1}^{N} \Lambda_i = \Lambda \quad \text{and} \quad \Lambda_i \cap \Lambda_j \neq \emptyset, \]
excepting the case Λ_i = {λ_i}, i = 1, ..., N, when Λ_i ∩ Λ_j = ∅ for i ≠ j. We associate the interpolation operator P_i to each subset Λ_i, i = 1, ..., N.
The operator SP defined by
\[ (SPf)(x) = \sum_{i=1}^{N} A_i(x)(P_i f)(x) \tag{2.2.54} \]
is the combined operator of S_0 and P.
Remark 2.2.42. If P_i, i = 1, ..., N, are linear operators then SP is a linear operator.
Remark 2.2.43. Let P_i, i = 1, ..., N, be some arbitrary linear operators. If
\[ \operatorname{dex}(P_i) = r_i, \quad i = 1, \ldots, N, \]
then
\[ \operatorname{dex}(SP) = \min\{r_1, \ldots, r_N\}. \]
The function S_0 f and its behavior in the neighborhood of the nodes depend on the size of μ. If 0 < μ ≤ 1 then the function S_0 f has peaks at the nodes. For μ > 1 the first derivatives vanish at the nodes, i.e., S_0 f has flat spots, and if μ is large enough S_0 f becomes a step function. This phenomenon is shown in Figure 2.1.
Example 2.2.44. Let f : [−2, 2] → ℝ,
\[ f(x) = \frac{1}{1 + x^2}, \]
and consider the nodes x_i = −2 + 0.5i, i = 0, ..., 8. We plot S_0 f for μ = 1, 2, 20.
[Figure panels: (a) Function f; (b) Interpolant S_0f, μ = 1; (c) Interpolant S_0f, μ = 2; (d) Interpolant S_0f, μ = 20.]
Figure 2.1: Univariate Shepard interpolants.
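As an illustration (our own sketch, not from the text; the function and variable names are ours), the operator S_0 of (2.2.48) can be written in a few lines of Python using the barycentric form (2.2.50):

```python
# Minimal sketch of the univariate Shepard operator S0, barycentric form (2.2.50).
def shepard(nodes, values, mu):
    def s0(x):
        # at a node, return the nodal value (cardinality property (2.2.52))
        for xi, fi in zip(nodes, values):
            if x == xi:
                return fi
        w = [abs(x - xi) ** (-mu) for xi in nodes]
        return sum(wi * fi for wi, fi in zip(w, values)) / sum(w)
    return s0

# Example 2.2.44: f(x) = 1/(1 + x^2) on [-2, 2], nodes x_i = -2 + 0.5 i
f = lambda x: 1.0 / (1.0 + x * x)
nodes = [-2.0 + 0.5 * i for i in range(9)]
s2 = shepard(nodes, [f(x) for x in nodes], mu=2)    # smooth flat spots
s20 = shepard(nodes, [f(x) for x in nodes], mu=20)  # near step function
```

For large μ the nearest node dominates the weights, which reproduces the step-function behavior seen in panel (d); since dex(S_0) = 0, constants are the only polynomials reproduced exactly.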
The most frequent choice is μ = 2, in which case the basis functions A_i are rational and infinitely differentiable.
In the sequel we study the remainder of the interpolation formula generated by the Shepard operator. The interpolation function S_0 f can be viewed as a projection of f onto the finite-dimensional linear space spanned by the functions A_i, i = 1, ..., N. The Shepard interpolation formula is:
\[ f = S_0 f + R_0 f, \]
where S_0 f was defined in (2.2.48) and R_0 f is the remainder term.
Theorem 2.2.45. If f is absolutely continuous on [a, b], then
\[ (R_0 f)(x) = \int_a^b \varphi(x, s) f'(s)\, ds, \tag{2.2.55} \]
where
\[ \varphi(x, s) = (x - s)^0_+ - \sum_{i=1}^{N} A_i(x)(x_i - s)^0_+. \]
Also,
\[ |(R_0 f)(x)| \le K(x)\, \|f'\|_\infty, \]
where
\[ K(x) = x - \sum_{i=1}^{N} x_i A_i(x) + 2 \sum_{i=1}^{N} A_i(x)(x_i - x)_+. \]
Proof. Using Peano's theorem and taking into account that S_0 e_0 = e_0, where e_k(x) = x^k, formula (2.2.55) follows.
Next, let us suppose that x ∈ (x_{k−1}, x_k) and let
\[ \varphi_j(x; \cdot) = \varphi(x; \cdot)\big|_{[x_{j-1}, x_j]}, \quad j = 1, \ldots, N. \]
We have
\[ \varphi_j(x; s) = (x - s)^0_+ - \sum_{i=j}^{N} A_i(x). \]
Using \(\sum_{i=1}^{N} A_i(x) = 1\) and A_i(x) ≥ 0, i = 1, ..., N, one obtains:
\[ \varphi_j(x; s) = \begin{cases} \displaystyle\sum_{i=1}^{j-1} A_i(x), & s \le x, \\[4pt] -\displaystyle\sum_{i=j}^{N} A_i(x), & s > x. \end{cases} \]
Hence,
\[ \varphi_j(x; s) \ge 0 \text{ for } s \le x, \qquad \varphi_j(x; s) \le 0 \text{ for } s > x. \]
Since x ∈ (x_{k−1}, x_k), it follows that
\[ \varphi_j(x; s) \ge 0, \; j = 1, \ldots, k-1, \qquad \varphi_j(x; s) \le 0, \; j = k+1, \ldots, N, \]
\[ \varphi_k(x; s) \ge 0 \text{ for } s \le x, \qquad \varphi_k(x; s) \le 0 \text{ for } s > x. \]
Finally, one obtains that
\[ \varphi(x; s) \ge 0 \text{ for } s \le x \quad \text{and} \quad \varphi(x; s) \le 0 \text{ for } s > x. \]
From (2.2.55) we have:
\[ |(R_0 f)(x)| \le \|f'\|_\infty \int_a^b |\varphi(x; s)|\, ds, \]
with
\[ \int_a^b |\varphi(x; s)|\, ds = \int_a^x \varphi(x; s)\, ds - \int_x^b \varphi(x; s)\, ds = K(x), \]
and the theorem is proved.
Next, we assume that at each point x_i, i = 1, ..., N, there also exists the derivative f'(x_i), and we consider the operator
\[ (S_1 f)(x) = \sum_{i=1}^{N} A_i(x)\bigl[f(x_i) + (x - x_i) f'(x_i)\bigr]. \]
It is easy to prove that if μ > 1, the operator S_1 interpolates both the function f and its first derivative at the points x_i, i = 1, ..., N.
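The operator S_1 above can be sketched as follows (our own illustration, with hypothetical names); by Lemma 2.2.46 it reproduces e_0 and e_1 exactly:

```python
# Sketch of S1 f = sum_i A_i(x) [f(x_i) + (x - x_i) f'(x_i)].
def shepard_s1(nodes, fvals, dvals, mu=2):
    def s1(x):
        for xi, fi in zip(nodes, fvals):
            if x == xi:
                return fi
        w = [abs(x - xi) ** (-mu) for xi in nodes]
        return sum(wi * (fi + (x - xi) * di)
                   for wi, xi, fi, di in zip(w, nodes, fvals, dvals)) / sum(w)
    return s1

# check S1 e1 = e1: take f(x) = x, so f(x_i) = x_i and f'(x_i) = 1
nodes = [0.0, 0.5, 1.0, 1.5]
s1 = shepard_s1(nodes, fvals=list(nodes), dvals=[1.0] * 4)
```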
Lemma 2.2.46. The operator S_1 is linear and
\[ S_1 e_k = e_k, \quad k = 0, 1. \]
Proof. The proof follows after a straightforward computation.
This operator generates the following interpolation formula:
\[ f = S_1 f + R_1 f. \]
Regarding its remainder term we have the following result.
Theorem 2.2.47. If f ∈ H²[a, b] then
\[ (R_1 f)(x) = \int_a^b \varphi_1(x, s) f''(s)\, ds, \tag{2.2.56} \]
where
\[ \varphi_1(x, s) = (x - s)_+ - \sum_{i=1}^{N} A_i(x)\bigl[(x_i - s)_+ + (x - x_i)(x_i - s)^0_+\bigr]. \]
Moreover, if f'' is continuous on the interval [a, b], then
\[ (R_1 f)(x) = \Bigl[-\frac{x^2}{2} + \frac{1}{2}\sum_{i=1}^{N} A_i(x)\, x_i^2\Bigr] f''(\xi), \quad \text{with } \xi \in (a, b). \tag{2.2.57} \]
Proof. Lemma 2.2.46 and Peano's theorem imply (2.2.56). The function \(\varphi_1^k(x; \cdot) = \varphi_1(x; \cdot)\big|_{[x_{k-1}, x_k]}\) has the form
\[ \varphi_1^k(x, s) = (x - s)_+ - (x - s) \sum_{i=k}^{N} A_i(x), \]
so
\[ \varphi_1^k(x; s) = \begin{cases} (x - s) \displaystyle\sum_{i=1}^{k-1} A_i(x), & s \le x, \\[4pt] -(x - s) \displaystyle\sum_{i=k}^{N} A_i(x), & s > x, \end{cases} \]
i.e.,
\[ \varphi_1^k(x; s) \ge 0, \quad \text{for all } k. \]
It follows that φ₁(x; s) ≥ 0 for x, s ∈ [a, b], and by the mean value theorem one obtains the representation (2.2.57).
2.2.6.2.1 The univariate Shepard-Lagrange operator Let f be a real-valued function defined on X ⊂ ℝ and x_i ∈ X, i = 1, ..., N, some distinct nodes. Consider the set of Lagrange functionals
\[ \Lambda := \Lambda_L(f) = \{\lambda_i \mid \lambda_i(f) = f(x_i), \; i = 1, \ldots, N\} \]
and
\[ \Lambda_i(f) = \{f(x_{i+\nu}) \mid \nu = 0, 1, \ldots, m\}, \quad i = 1, \ldots, N, \]
with x_{N+ν} = x_{N−ν}, ν = 1, ..., m. The operator
\[ (L_m^i f)(x) = \sum_{\nu=0}^{m} \frac{u(x)}{(x - x_{i+\nu})\, u'(x_{i+\nu})}\, f(x_{i+\nu}), \quad u(x) = (x - x_i) \cdots (x - x_{i+m}), \]
is the Lagrange operator corresponding to Λ_i(f).
Definition 2.2.48. The operator S_{L_m} given by
\[ (S_{L_m} f)(x) = \sum_{i=1}^{N} A_i(x) (L_m^i f)(x) \]
is called the univariate Shepard-Lagrange operator.
Theorem 2.2.49. For μ > m, we have the following interpolation properties:
\[ (S_{L_m} f)(x_i) = f(x_i), \quad i = 1, \ldots, N, \tag{2.2.58} \]
and the degree of exactness is:
\[ \operatorname{dex}(S_{L_m}) = m. \tag{2.2.59} \]
Proof. The interpolation properties (2.2.58) follow taking into account relations (2.2.52) and the interpolatory properties of L_m^i f, i = 1, ..., N. Relation (2.2.59) follows directly by Remark 2.2.43.
The Shepard-Lagrange univariate interpolation formula is
\[ f = S_{L_m} f + R_{L_m} f, \]
where R_{L_m} f denotes the remainder term. We have the following error estimation.
Theorem 2.2.50. If f ∈ C^{m+1}(I), then
\[ \|R_{L_m} f\|_\infty \le C M \|f^{(m+1)}\|_\infty\, \varepsilon_\mu^m(r), \]
with
\[ \varepsilon_\mu^m(r) = \begin{cases} |\log r|^{-1}, & \mu = 1, \\ r^{\mu-1}, & 1 < \mu < m+2, \\ r^{\mu-1} |\log r|, & \mu = m+2, \\ r^{m+1}, & \mu > m+2, \end{cases} \]
where C is a positive constant independent of x and X.
Proof. If f ∈ C^{m+1}(I), then the following bound for the interpolation error is known:
\[ |(R_m f)(x; x_i)| = |(L_m f)(x; x_i) - f(x)| \le \frac{|x - x_i| \cdots |x - x_{i+m}|}{(m+1)!}\, \|f^{(m+1)}\|_\infty. \]
We have
\[ |(S_{L_m} f)(x) - f(x)| \le \sum_{i=1}^{N} |(R_m f)(x; x_i)|\, A_i(x) \le C_m \|f^{(m+1)}\|_\infty\, s_\mu^m(x), \]
where
\[ s_\mu^m(x) = \frac{\sum_{i=1}^{N} |x - x_i| \cdots |x - x_{i+m}|\, |x - x_i|^{-\mu}}{\sum_{k=1}^{N} |x - x_k|^{-\mu}}. \]
We show that
\[ s_\mu^m(x) \le C M \varepsilon_\mu^m(r). \]
We introduce the following notations:
\[ B_\rho(x) = [x - \rho, x + \rho], \qquad r = \inf\{\rho > 0 : \forall x \in I \; \exists u \in X, \; u \in B_\rho(x)\} \]
and
\[ M = \sup_y \operatorname{card}(B_r(y) \cap X). \]
Let
\[ n = \Bigl[\frac{b - a}{2r}\Bigr] + 1, \]
let Q_r(u) be the interval (u − r, u + r] and T_j = Q_r(x + 2rj) ∪ Q_r(x − 2rj). The set
\[ \bigcup_{j=-n}^{n} Q_r(x + 2rj) \]
is a covering of I with half open intervals. If x_i ∈ I ∩ T_j we have
\[ (2j - 1) r \le |x - x_i| \le (2j + 1) r, \quad j = 1, \ldots, n, \tag{2.2.60} \]
and
\[ 1 \le \operatorname{card}(X \cap T_j) \le M. \]
Also, N = card(X) = O(r^{-1}) and [2(j − l) − 1] r ≤ |x − x_{i+l}| ≤ [2(j + l) + 1] r. If x_i ∈ T_j we have
\[ |x - x_i| \cdots |x - x_{i+m}| \le r^{m+1} \prod_{l=0}^{m} [2(j + l) + 1]. \tag{2.2.61} \]
Let x_d be the closest point to x. Since
\[ \sum_{i=1}^{N} |x - x_i|^{-\mu} \ge |x - x_d|^{-\mu}, \]
applying (2.2.60) and (2.2.61), we get:
\[ \begin{aligned} s_\mu^m(x) &\le |x - x_d|^{\mu} \Bigl( \sum_{x_i \in T_0} |x - x_i|^{-\mu}\, |x - x_i| \cdots |x - x_{i+m}| + \sum_{j=1}^{n} \sum_{x_i \in T_j} |x - x_i|^{-\mu}\, |x - x_i| \cdots |x - x_{i+m}| \Bigr) \\ &\le r^{\mu} M r^{m+1-\mu} \Bigl( \prod_{l=0}^{m} (2l + 1) + \sum_{j=1}^{n} \prod_{l=0}^{m} [2(j + l) + 1]\, (2j + 1)^{-\mu} \Bigr) \\ &\le m M r^{m+1} \prod_{l=0}^{m} (2l + 1) \Bigl( 1 + C \sum_{j=1}^{n} j^{m+1-\mu} \Bigr). \end{aligned} \]
We have two cases.
Case μ > 1:
• if 1 < μ < m + 2 then
\[ r^{m+1} \Bigl( 1 + C \sum_{j=1}^{n} j^{m+1-\mu} \Bigr) = O(r^{\mu-1}); \]
• if μ = m + 2 then
\[ \sum_{j=1}^{n} j^{m+1-\mu} = \sum_{j=1}^{n} j^{-1} = O(\log n) = O(|\log r|); \]
• if μ > m + 2 then \(\sum_{j=1}^{n} j^{m+1-\mu}\) is bounded.
Case μ = 1: We have
\[ s_\mu^m(x) = s_1^m(x) = \frac{\sum_{i=1}^{N} |x - x_i|^{-1}\, |x - x_i| \cdots |x - x_{i+m}|}{\sum_{i=1}^{N} |x - x_i|^{-1}}. \]
Applying the inequality
\[ \frac{\sum a_i}{\sum b_i} \le \sum \frac{a_i}{b_i}, \]
we get
\[ \begin{aligned} s_1^m(x) &\le \sum_{x_i \in T_0} |x - x_i| \cdots |x - x_{i+m}| + \frac{C_1 r}{|\log r|} \sum_{j=1}^{n} \sum_{x_i \in T_j} |x - x_i|^{-1}\, |x - x_i| \cdots |x - x_{i+m}| \\ &\le m M r^{m+1} \prod_{l=0}^{m} (2l + 1) \Bigl( 1 + \frac{C_2}{|\log r|} \sum_{j=1}^{n} j^{m} \Bigr) \\ &\le C_1 M r^{m+1} \Bigl( 1 + \frac{C_2}{|\log r|}\, O(r^{-m-1}) \Bigr) = O(|\log r|^{-1}). \end{aligned} \]
Particular cases. 1) Consider m = 1. The combined operator S_{L_1} is given by:
\[ (S_{L_1} f)(x) = \sum_{i=1}^{N} A_i(x) (L_1^i f)(x), \]
where
\[ (L_1^i f)(x) = \frac{x - x_{i+1}}{x_i - x_{i+1}} f(x_i) + \frac{x - x_i}{x_{i+1} - x_i} f(x_{i+1}), \quad i = 1, \ldots, N, \]
with x_{N+1} = x_1.
2) Consider m = 2. The combined operator S_{L_2} is given by:
\[ (S_{L_2} f)(x) = \sum_{i=1}^{N} A_i(x) (L_2^i f)(x), \]
where
\[ (L_2^i f)(x) = l_i(x) f(x_i) + l_{i+1}(x) f(x_{i+1}) + l_{i+2}(x) f(x_{i+2}), \]
with
\[ l_i(x) = \frac{(x - x_{i+1})(x - x_{i+2})}{(x_i - x_{i+1})(x_i - x_{i+2})}, \quad l_{i+1}(x) = \frac{(x - x_i)(x - x_{i+2})}{(x_{i+1} - x_i)(x_{i+1} - x_{i+2})}, \quad l_{i+2}(x) = \frac{(x - x_i)(x - x_{i+1})}{(x_{i+2} - x_i)(x_{i+2} - x_{i+1})}, \]
for all i = 1, ..., N, considering x_{N+1} = x_1 and x_{N+2} = x_2.
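The particular case m = 1 can be sketched as follows (our own illustration, with hypothetical names); since dex(S_{L_1}) = 1, the operator reproduces linear functions exactly:

```python
# Sketch of the combined operator S_{L_1}: Shepard weights applied to the
# linear Lagrange polynomials L_1^i on (x_i, x_{i+1}), with x_{N+1} = x_1.
def shepard_lagrange_1(nodes, fvals, mu=2):
    n = len(nodes)
    def interp(x):
        for xi, fi in zip(nodes, fvals):
            if x == xi:
                return fi
        w = [abs(x - xi) ** (-mu) for xi in nodes]
        out = 0.0
        for i in range(n):
            j = (i + 1) % n  # wrap-around convention x_{N+1} = x_1
            xi, xj = nodes[i], nodes[j]
            li = (x - xj) / (xi - xj) * fvals[i] + (x - xi) / (xj - xi) * fvals[j]
            out += w[i] * li
        return out / sum(w)
    return interp

# dex(S_{L_1}) = 1: data from a linear function is reproduced exactly
nodes = [0.0, 1.0, 2.0, 3.0]
g = lambda x: 2.0 * x + 1.0
s = shepard_lagrange_1(nodes, [g(x) for x in nodes])
```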
2.2.6.2.2 The univariate Shepard-Hermite operator Let f be a real-valued function defined on X ⊂ ℝ and x_i ∈ X, i = 1, ..., N, some distinct nodes. Consider the set of Hermite functionals
\[ \Lambda := \Lambda_H(f) = \{\lambda_{ij} \mid \lambda_{ij}(f) = f^{(j)}(x_i), \; i = 1, \ldots, N, \; j = 0, 1, \ldots, r_i\}, \quad r_i \in \mathbb{N}^*, \]
and the subsets
\[ \Lambda_{i,j}(f) = \{f^{(j)}(x_{i+\nu}) \mid \nu = 0, 1, \ldots, m, \; j = 0, 1, \ldots, r_\nu\}, \]
for i = 1, ..., N, with x_{N+ν} = x_{N−ν}, ν = 1, ..., m, m ∈ ℕ*.
Definition 2.2.51. The operator S_{H_q} given by
\[ (S_{H_q} f)(x) = \sum_{i=1}^{N} A_i(x) (H_q^i f)(x) \]
is called the univariate Shepard-Hermite operator, where H_q^i f is the Hermite interpolation polynomial of degree q = m + r_0 + ... + r_m.
Theorem 2.2.52. For μ > q, the following interpolation conditions are fulfilled:
\[ (S_{H_q} f)^{(j)}(x_i) = f^{(j)}(x_i), \quad i = 1, \ldots, N; \; j = 0, \ldots, r_i, \tag{2.2.62} \]
and the degree of exactness is:
\[ \operatorname{dex}(S_{H_q}) = q. \tag{2.2.63} \]
Proof. The interpolation properties (2.2.62) follow taking into account relations (2.2.52) and the interpolatory properties of H_q^i f, i = 1, ..., N. Relation (2.2.63) follows directly by Remark 2.2.43.
The Shepard-Hermite interpolation formula is
\[ f = S_{H_q} f + R_{H_q} f, \]
where R_{H_q} f denotes the remainder term.
Theorem 2.2.53. If f ∈ C^{q+1}(I), then
\[ \|R_{H_q} f\|_I \le C M \|f^{(q+1)}\|_I\, \varepsilon_\mu^q(r), \]
where
\[ \varepsilon_\mu^q(r) = \begin{cases} |\log r|^{-1}, & \mu = 1, \\ r^{\mu-1}, & 1 < \mu < q+2, \\ r^{\mu-1} |\log r|, & \mu = q+2, \\ r^{q+1}, & \mu > q+2, \end{cases} \]
and C is a positive constant independent of x and X.
Proof. The proof is analogous to that of Theorem 2.2.50, with the remark that H_q^i f has the nodes x_{i+ν} of multiplicities r_ν, respectively.
2.2.6.2.3 The univariate Shepard-Taylor operator Let f be a real-valued function defined on X ⊂ ℝ and x_i ∈ X, i = 1, ..., N, some distinct nodes. Consider the set of Taylor functionals
\[ \Lambda := \Lambda_T(f) = \{\lambda_{kj} \mid \lambda_{kj}(f) = f^{(j)}(x_k), \; j = 0, 1, \ldots, m; \; k = 1, \ldots, N\}, \quad m \in \mathbb{N}^*, \]
and Λ_i(f) = {λ_{ij}(f) | j = 0, 1, ..., m}, where Λ_i ⊂ Λ_T is the subset of Λ_T associated to the functional λ_i, such that λ_i ∈ Λ_i for all i = 1, ..., N. We denote by
\[ (T_m^i f)(x) = \sum_{j=0}^{m} \frac{(x - x_i)^j}{j!}\, f^{(j)}(x_i) \]
the Taylor interpolation operator corresponding to the subset of functionals Λ_i.
Definition 2.2.54. The operator S_{T_m} given by
\[ (S_{T_m} f)(x) = \sum_{i=1}^{N} A_i(x) (T_m^i f)(x) \]
is called the univariate Shepard-Taylor operator.
Theorem 2.2.55. For μ > m,
\[ (S_{T_m} f)^{(j)}(x_i) = f^{(j)}(x_i), \quad j = 0, 1, \ldots, m, \; i = 1, \ldots, N, \]
and
\[ \operatorname{dex}(S_{T_m}) = m. \]
Proof. The conclusion follows taking into account relations (2.2.52), the interpolatory properties of T_m^i f, i = 1, ..., N, and Remark 2.2.43.
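A sketch of S_{T_m} (our own illustration, hypothetical names; `derivs[i][j]` holds f^{(j)}(x_i)); by Theorem 2.2.55 it has degree of exactness m, so with m = 2 it reproduces quadratics:

```python
import math

# Sketch of the Shepard-Taylor operator: Shepard weights applied to the
# degree-m Taylor polynomials T_m^i at the nodes.
def shepard_taylor(nodes, derivs, m, mu):
    def st(x):
        for xi, d in zip(nodes, derivs):
            if x == xi:
                return d[0]
        w = [abs(x - xi) ** (-mu) for xi in nodes]
        out = 0.0
        for wi, xi, d in zip(w, nodes, derivs):
            taylor = sum(d[j] * (x - xi) ** j / math.factorial(j)
                         for j in range(m + 1))
            out += wi * taylor
        return out / sum(w)
    return st

# dex(S_{T_2}) = 2: a quadratic is reproduced exactly
f = lambda x: x * x - 3.0 * x + 2.0
nodes = [0.0, 1.0, 2.0]
derivs = [[f(x), 2.0 * x - 3.0, 2.0] for x in nodes]  # f, f', f'' at each node
st = shepard_taylor(nodes, derivs, m=2, mu=3)
```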
A generalization of the operator S_{T_m} is obtained when
\[ \Lambda_T = \{\lambda_{kj} \mid \lambda_{kj}(f) = f^{(j)}(x_k), \; j = 0, 1, \ldots, m_k; \; k = 1, \ldots, N\} \]
and
\[ \Lambda_i = \{\lambda_{ij} \mid j = 0, 1, \ldots, m_i\}. \]
We have
\[ (S_{T_{m_1, \ldots, m_N}} f)(x) = \sum_{i=1}^{N} A_i(x) (T_{m_i}^i f)(x), \]
where the Taylor operator T_{m_i}^i has degree m_i, i = 1, ..., N.
Remark 2.2.56. For μ > M := max{m_1, ..., m_N},
\[ (S_{T_{m_1, \ldots, m_N}} f)^{(j)}(x_i) = f^{(j)}(x_i), \quad j = 0, 1, \ldots, m_i, \; i = 1, \ldots, N, \]
and
\[ \operatorname{dex}(S_{T_{m_1, \ldots, m_N}}) = \min\{m_1, \ldots, m_N\}. \]
Remark 2.2.57. Regarding the approximation error, we note that the Shepard-Taylor, the Shepard-Lagrange and the Shepard-Hermite operators have the same rate of convergence.
2.2.6.2.4 The univariate Shepard-Birkhoff operator Let f be a real-valued function defined on X ⊂ ℝ and x_i ∈ X, i = 1, ..., N, some distinct nodes. Consider the set of Birkhoff functionals
\[ \Lambda := \Lambda_B(f) = \{\lambda_{kj} \mid \lambda_{kj}(f) = f^{(j)}(x_k), \; j \in I_k; \; k = 1, \ldots, N\} \]
and
\[ \Lambda_{i,j}(f) = \{f^{(j)}(x_{i+\nu}) \mid \nu = 0, 1, \ldots, m, \; j \in I_{i+\nu}\}, \]
with x_{N+ν} = x_{N−ν}, ν = 1, ..., m, m ∈ ℕ*.
Definition 2.2.58. The operator S_{B_m} given by
\[ (S_{B_m} f)(x) = \sum_{i=1}^{N} A_i(x) (B_m^i f)(x) \]
is called the univariate Shepard-Birkhoff operator.
Theorem 2.2.59. For μ > m, the following interpolation conditions are fulfilled:
\[ (S_{B_m} f)^{(j)}(x_k) = f^{(j)}(x_k), \quad j \in I_k; \; k = 1, \ldots, N, \tag{2.2.64} \]
and
\[ \operatorname{dex}(S_{B_m}) = m. \tag{2.2.65} \]
Proof. The interpolation properties (2.2.64) follow taking into account relations (2.2.52) and the interpolatory properties of B_m^i f, i = 1, ..., N. Relation (2.2.65) follows directly by Remark 2.2.43.
Remark 2.2.60. In the same way one defines the combined Shepard-Abel-Goncharov and Shepard-Lidstone operators, which are particular cases of the combined Shepard-Birkhoff operator (see, e.g., [22] and [23]).
2.2.7 Least squares approximation
In this section, as an extension of the interpolation problem, we give a short review of least squares approximation.
Recall that when the values of a function f are known for a given set of points x_i, i = 0, ..., m, an interpolation method can be used to determine an approximation F of the function f such that
\[ F(x_i) = f(x_i), \quad i = 0, \ldots, m. \]
If only approximations of the values f(x_i) are available, or if the number of interpolation conditions is too large, then instead of requiring that the approximating function reproduce the given function values exactly, we ask only that it fit the data "as closely as possible". The most popular application involves the least squares principle, where the approximation F is determined such that
\[ \Bigl( \sum_{i=0}^{m} w(x_i)\, [f(x_i) - F(x_i)]^2 \Bigr)^{1/2} \to \min, \]
in the discrete case, and
\[ \Bigl( \int_a^b w(x)\, [f(x) - F(x)]^2\, dx \Bigr)^{1/2} \to \min, \]
in the continuous case, where w ≥ 0 on [a, b] is a weight function.
Remark 2.2.61. Notice that interpolation is a particular case of least squares approximation, namely for
\[ f(x_i) - F(x_i) = 0, \quad i = 0, \ldots, m. \]
In the sequel we briefly describe each of the two cases, starting with thecontinuous one.
Consider the space L²_w[a, b], the corresponding inner product,
\[ \langle f, g \rangle_{w,2} = \int_a^b w(x) f(x) g(x)\, dx, \quad f, g \in L^2_w[a, b], \]
the norm induced by it,
\[ \|f\|_{w,2} = \Bigl( \int_a^b w(x) f^2(x)\, dx \Bigr)^{1/2}, \]
and the distance between two functions f and g, given by
\[ d_w(f, g) = \|f - g\|_{w,2}. \]
For A ⊂ L²_w[a, b] and f ∈ L²_w[a, b], the question is whether there exists an element g* ∈ A such that
\[ d_w(f, g^*) = \min_{g \in A} d_w(f, g). \]
It will be shown that such an element exists and is unique. If A is a finite dimensional linear space and g* exists, then
\[ \langle f - g^*, g \rangle_{w,2} = 0, \quad g \in A. \tag{2.2.66} \]
Let n be the dimension of A and g_1, ..., g_n ∈ A a basis of A. The orthogonality condition (2.2.66), with
\[ g = \sum_{i=1}^{n} a_i g_i \quad \text{and} \quad g^* = \sum_{i=1}^{n} a_i^* g_i, \]
is equivalent to the system
\[ \langle f - g^*, g_k \rangle_{w,2} = 0, \quad k = 1, \ldots, n, \]
with n equations and n unknowns a_i, i = 1, ..., n. The latter system can be explicitly written as
\[ \sum_{i=1}^{n} a_i \langle g_i, g_k \rangle_{w,2} = \langle f, g_k \rangle_{w,2}, \quad k = 1, \ldots, n, \tag{2.2.67} \]
and its determinant is the Gram determinant G(g_1, ..., g_n). Since the functions g_1, ..., g_n are linearly independent, we have
\[ G(g_1, \ldots, g_n) \neq 0, \]
hence
\[ g^* = \sum_{i=1}^{n} a_i^* g_i \]
is uniquely determined.
Next we discuss three particular cases.
Case A. We consider A = P_n and g_k = e_k, k = 0, ..., n. In this case, g* is the nth degree polynomial whose coefficients satisfy the condition
\[ \int_a^b w(x) \Bigl[ f(x) - \sum_{k=0}^{n} a_k^* x^k \Bigr]^2 dx = \min_{a_0, \ldots, a_n} \int_a^b w(x) \Bigl[ f(x) - \sum_{k=0}^{n} a_k x^k \Bigr]^2 dx. \]
According to (2.2.67), the solution of the approximation problem can be obtained by solving the following system of equations
\[ \sum_{k=0}^{n} a_k \langle x^k, x^j \rangle_{w,2} = \langle f(x), x^j \rangle_{w,2}, \quad j = 0, \ldots, n, \]
where
\[ \langle x^k, x^j \rangle_{w,2} = \int_a^b w(x)\, x^{k+j}\, dx, \qquad \langle f(x), x^j \rangle_{w,2} = \int_a^b w(x)\, x^j f(x)\, dx. \]
Case B. We consider A = P_n and g_k = p_k, k = 0, ..., n, where p_0, ..., p_n is a set of orthogonal polynomials on the interval [a, b], relative to the weight function w.
Since ⟨p_i, p_j⟩_{w,2} = 0 for i ≠ j, and
\[ \langle p_k, p_k \rangle_{w,2} = \|p_k\|_{w,2}^2, \]
from (2.2.67) it follows that
\[ a_k^* = \frac{1}{\|p_k\|_{w,2}^2} \int_a^b w(x) f(x) p_k(x)\, dx, \quad k = 0, \ldots, n, \tag{2.2.68} \]
and
\[ g^*(x) = \sum_{k=0}^{n} a_k^* p_k(x). \]
Remark 2.2.62. The numbers a_k^* in (2.2.68) are the coefficients of the expansion of f in terms of the orthogonal polynomials p_k, k = 0, 1, .... If these polynomials are orthonormal (i.e., ‖p_k‖_{w,2} = 1), we have
\[ a_k^* = \int_a^b w(x) f(x) p_k(x)\, dx, \quad k = 0, 1, \ldots. \]
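A numerical sketch of Case B (our own illustration, hypothetical names): on [−1, 1] with w(x) = 1 the Legendre polynomials are orthogonal with ‖p_k‖² = 2/(2k + 1), and the coefficients a_k* of (2.2.68) can be computed by quadrature:

```python
def simpson(g, a, b, n=2000):
    # composite Simpson rule (n even) for the inner-product integrals
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, n))
    return s * h / 3

# Legendre polynomials p0, p1, p2: orthogonal on [-1, 1] for w(x) = 1,
# with ||p_k||^2 = 2 / (2k + 1)
ps = [lambda x: 1.0, lambda x: x, lambda x: 1.5 * x * x - 0.5]
norms = [2.0 / (2 * k + 1) for k in range(3)]

def best_l2(f):
    """Coefficients a_k* = <f, p_k> / ||p_k||^2 of the best L2 approximation."""
    return [simpson(lambda x, p=p: f(x) * p(x), -1.0, 1.0) / nk
            for p, nk in zip(ps, norms)]

# x^3 = (3/5) p1 + (2/5) p3, so truncating at degree 2 keeps only a1* = 3/5
coeffs = best_l2(lambda x: x ** 3)
```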
Case C. We consider that f : ℝ → ℝ is a periodic function with period 2π, and A = T_n (i.e., the set of trigonometric polynomials of degree n, with basis consisting of the functions 1, cos x, sin x, ..., cos nx, sin nx).
Since this set of functions is orthogonal on [0, 2π] with respect to the weight function w(x) = 1, we have
\[ g^*(x) = a_0^* + \sum_{k=1}^{n} (a_k^* \cos kx + b_k^* \sin kx), \]
where
\[ a_0^* = \frac{1}{2\pi} \int_0^{2\pi} f(x)\, dx, \qquad a_k^* = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos kx\, dx, \qquad b_k^* = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin kx\, dx, \]
for k = 1, ..., n. So a_k^* and b_k^* are the Fourier coefficients of f.
The discrete case can be treated similarly. Let f be a given function with available data f(x_i), i = 0, ..., m, and A a set of approximations of f. According to the least squares principle, we have to determine the function g* ∈ A that satisfies
\[ \sum_{i=0}^{m} w(x_i)\, [f(x_i) - g^*(x_i)]^2 = \min_{g \in A} \sum_{i=0}^{m} w(x_i)\, [f(x_i) - g(x_i)]^2. \]
If A is an n-dimensional linear space and g_1, ..., g_n is a basis of A, then such an element g* exists and is given by
\[ g^* = \sum_{k=1}^{n} a_k^* g_k, \]
where (a_1^*, ..., a_n^*) is the solution of the following system of equations:
\[ \sum_{k=1}^{n} a_k \sum_{i=0}^{m} w(x_i)\, g_k(x_i) g_j(x_i) = \sum_{i=0}^{m} w(x_i)\, g_j(x_i) f(x_i), \quad j = 1, \ldots, n. \tag{2.2.69} \]
Since a_k, k = 1, ..., n, are the coordinates of the minimizing point of the function
\[ G(a_1, \ldots, a_n) = \sum_{i=0}^{m} w(x_i) \Bigl[ f(x_i) - \sum_{k=1}^{n} a_k g_k(x_i) \Bigr]^2, \]
the system (2.2.69) is equivalent to
\[ \frac{\partial G}{\partial a_j}(a_1, \ldots, a_n) = 0, \quad j = 1, \ldots, n. \]
Remark 2.2.63. Often n is taken much smaller than m. Notice that for n = m + 1 one has
\[ \sum_{i=0}^{m} w(x_i)\, [f(x_i) - g^*(x_i)]^2 = 0, \]
implying
\[ g^*(x_i) = f(x_i), \quad i = 0, \ldots, m, \]
so g* is a function that interpolates f at the points x_i.
We also consider here two particular cases.
Case C1. A = P_n and g_k = e_k, k = 0, ..., n. We have
\[ g^*(x) = \sum_{k=0}^{n} a_k^* x^k, \]
where (a_0^*, ..., a_n^*) is the solution of the system of equations:
\[ \sum_{k=0}^{n} a_k \sum_{i=0}^{m} w(x_i)\, x_i^{k+j} = \sum_{i=0}^{m} w(x_i)\, x_i^j f(x_i), \quad j = 0, \ldots, n. \tag{2.2.70} \]
Denoting
\[ s_k = \sum_{i=0}^{m} w(x_i)\, x_i^k, \quad k = 0, \ldots, 2n, \qquad t_p = \sum_{i=0}^{m} w(x_i)\, x_i^p f(x_i), \quad p = 0, \ldots, n, \]
system (2.2.70) becomes
\[ \sum_{k=0}^{n} s_{k+j}\, a_k = t_j, \quad j = 0, \ldots, n. \]
Since its matrix is symmetric, special methods can be used to solve the latter system of equations.
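Case C1 can be sketched as follows (our own illustration, hypothetical names), assembling s_k and t_p and solving the normal equations (2.2.70) with a small elimination routine:

```python
def poly_lsq(xs, ys, ws, n):
    """Solve the normal equations sum_k s_{k+j} a_k = t_j, j = 0..n."""
    m = len(xs)
    s = [sum(ws[i] * xs[i] ** k for i in range(m)) for k in range(2 * n + 1)]
    t = [sum(ws[i] * xs[i] ** p * ys[i] for i in range(m)) for p in range(n + 1)]
    # Gauss-Jordan elimination with partial pivoting on the (n+1)x(n+1) system
    A = [[s[k + j] for k in range(n + 1)] + [t[j]] for j in range(n + 1)]
    for col in range(n + 1):
        piv = max(range(col, n + 1), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n + 1):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[j][n + 1] / A[j][j] for j in range(n + 1)]

# data lying exactly on y = 1 + 2x is recovered by the degree-1 fit
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]
a = poly_lsq(xs, ys, ws=[1.0] * 5, n=1)
```

For larger n this matrix (a Hankel matrix) becomes ill-conditioned, which is one motivation for the orthogonal-polynomial approach of Case C2.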
Remark 2.2.64. One of the major applications of least squares approximation is the so-called data smoothing. Whenever it is necessary to approximate a function f on a discrete data set, and no other requirements are given, we can achieve this with a polynomial p ∈ C^∞(ℝ), i.e., a very smooth function.
Case C2. g_0, ..., g_n are orthogonal polynomials on the point set x_i, i = 0, ..., m, with respect to the weights w(x_i), i = 0, ..., m, namely
\[ \sum_{i=0}^{m} w(x_i)\, g_k(x_i) g_j(x_i) = \begin{cases} 0, & k \neq j, \\ c_j > 0, & k = j. \end{cases} \]
In this case, the system of equations (2.2.69) becomes
\[ a_j c_j = \sum_{i=0}^{m} w(x_i)\, g_j(x_i) f(x_i), \quad j = 0, \ldots, n, \]
and its solution is given by
\[ a_k^* = \frac{1}{c_k} \sum_{i=0}^{m} w(x_i)\, g_k(x_i) f(x_i), \quad k = 0, \ldots, n. \]
2.3 Some elements of multivariate interpolation operators
The basic goal of this section is to construct multivariate interpolation operators based on univariate ones.
As we have already seen, most of the univariate interpolation operators (polynomial, spline, Shepard, etc.) are commuting projectors. This property is essential in the construction of multivariate operators.
2.3.1 Interpolation on a rectangular domain
Let Ω_n ⊂ ℝⁿ be a rectangular domain,
\[ \Omega_n = \prod_{i=1}^{n} [a_i, b_i], \quad a_i, b_i \in \mathbb{R}, \; a_i < b_i, \; i = 1, \ldots, n. \]
Let B be a set of real-valued functions defined on Ω_n, and f ∈ B. One considers the univariate projectors P_i : B → G_i, i = 1, ..., n, that interpolate the function f with respect to the variables x_i, i = 1, ..., n, respectively. Of course, each set G_i, i = 1, ..., n, is a set of functions of (n − 1) free variables.
Let P_n be the set of all commuting projectors generated by the projectors P_1, ..., P_n (i.e., L of (1.1.6) generated by L = {P_1, ..., P_n}).
For example,
\[ P_2 = \{P_1, P_2, P_1 P_2, P_1 \oplus P_2\} \]
and
\[ \begin{aligned} P_3 = \{ & P_1, P_2, P_3, P_1 P_2, P_1 P_3, P_2 P_3, P_1 \oplus P_2, P_1 \oplus P_3, P_2 \oplus P_3, \\ & P_1(P_2 \oplus P_3), P_2(P_1 \oplus P_3), P_3(P_1 \oplus P_2), P_1 \oplus P_2 P_3, P_2 \oplus P_1 P_3, \\ & P_3 \oplus P_1 P_2, P_1 P_2 P_3, P_1 \oplus P_2 \oplus P_3 \}. \end{aligned} \]
Let (P_n, ≤) be the lattice generated by P_1, ..., P_n, where "≤" is the order relation defined in Proposition 1.1.30. In this lattice P = P_1 ··· P_n is the minimal element and S = P_1 ⊕ ··· ⊕ P_n is the maximal element.
Let R_i = I − P_i, i = 1, ..., n, be the corresponding remainder operators. The operators P_i being projectors, from Remark 1.1.23 it follows that R_i is also a projector and that P_i and R_i commute, i.e., P_iR_i = R_iP_i. Using the identities
\[ I = P + R^P, \quad \text{with} \quad P = P_1 \cdots P_n, \; R^P = R_1 \oplus \cdots \oplus R_n, \]
respectively,
\[ I = S + R^S, \quad \text{with} \quad S = P_1 \oplus \cdots \oplus P_n, \; R^S = R_1 \cdots R_n, \]
one obtains the algebraic minimal interpolation formula:
\[ f = Pf + R^P f, \]
respectively, the algebraic maximal interpolation formula:
\[ f = Sf + R^S f, \]
the names being given by the extremal properties of P and S in the lattice (P_n, ≤).
The extremal projectors P and S of (P_n, ≤) can also be characterized by their approximation order. From the expressions of the corresponding remainder operators,
\[ R^P = R_1 \oplus \cdots \oplus R_n = R_1 + \cdots + R_n - R_1 R_2 - \cdots - R_{n-1} R_n + \cdots + (-1)^{n-1} R_1 \cdots R_n \]
and
\[ R^S = R_1 \cdots R_n, \]
it follows that
\[ \operatorname{ord}(P) = \min\{\operatorname{ord}(P_1), \ldots, \operatorname{ord}(P_n)\}, \tag{2.3.1} \]
respectively,
\[ \operatorname{ord}(S) = \sum_{i=1}^{n} \operatorname{ord}(P_i) \tag{2.3.2} \]
and
\[ \operatorname{ord}(P) \le \operatorname{ord}(Q) \le \operatorname{ord}(S), \quad \forall Q \in P_n. \]
So, the remarkable property of the boolean sum operator S in P_n is that it has the highest approximation order. On the other hand, taking into account that
\[ S = P_1 + \cdots + P_n - P_1 P_2 - \cdots - P_{n-1} P_n + \cdots + (-1)^{n-1} P_1 \cdots P_n, \]
it follows that the interpolation function Sf is a sum of functions of (n − 1) free variables (P_i f, i = 1, ..., n), of (n − 2) free variables (P_iP_j f, i, j = 1, ..., n; i ≠ j), and so on; only the last one, P_1 ··· P_n f, is a scalar approximation (it contains no free variables), while the operator P, with the lowest approximation order, gives a scalar approximation Pf.
Remark 2.3.1. From the non-scalar approximation Sf, a scalar approximation can be obtained using several levels of approximation.
We illustrate this for the two-dimensional case. Let P_1^1 and P_2^1 be the interpolation operators used in the first approximation level and R_1^1, respectively R_2^1, the corresponding remainder operators (R_1^1 = I − P_1^1; R_2^1 = I − P_2^1). We have
\[ f = (P_1^1 \oplus P_2^1) f + R_1^1 R_2^1 f, \tag{2.3.3} \]
or
\[ f = (P_1^1 + P_2^1 - P_1^1 P_2^1) f + R_1^1 R_2^1 f. \]
As P_1^1 f and P_2^1 f contain a free variable, using in a second level of approximation the operators P_1^2, P_2^2, with R_1^2, R_2^2 the corresponding remainders, one obtains:
\[ f = [P_1^1 (P_2^2 + R_2^2) + P_2^1 (P_1^2 + R_1^2) - P_1^1 P_2^1] f + R_1^1 R_2^1 f, \]
respectively,
\[ f = (P_1^1 P_2^2 + P_1^2 P_2^1 - P_1^1 P_2^1) f + (P_1^1 R_2^2 + P_2^1 R_1^2 + R_1^1 R_2^1) f, \tag{2.3.4} \]
where
\[ Q = P_1^1 P_2^2 + P_1^2 P_2^1 - P_1^1 P_2^1 \tag{2.3.5} \]
is the interpolation operator, while
\[ R_Q = P_1^1 R_2^2 + P_2^1 R_1^2 + R_1^1 R_2^1 \tag{2.3.6} \]
is the remainder operator.
This way, one obtains the formula (2.3.4), which is a scalar interpolation formula, and Q is a scalar interpolation operator.
As can be seen, the remainder operator R_Q contains three terms. It follows that
\[ \operatorname{ord}(Q) = \min\{\operatorname{ord}(P_1^2), \operatorname{ord}(P_2^2), \operatorname{ord}(P_1^1) + \operatorname{ord}(P_2^1)\}. \]
Remark 2.3.2. 1) If
\[ \operatorname{ord}(P_1^2) \ge \operatorname{ord}(P_1^1) + \operatorname{ord}(P_2^1) = \operatorname{ord}(P_1^1 \oplus P_2^1) \]
and
\[ \operatorname{ord}(P_2^2) \ge \operatorname{ord}(P_1^1) + \operatorname{ord}(P_2^1) = \operatorname{ord}(P_1^1 \oplus P_2^1), \]
then (2.3.4) is called a consistent approximation formula.
2) If
\[ \operatorname{ord}(P_1^2) = \operatorname{ord}(P_2^2) = \operatorname{ord}(P_1^1 \oplus P_2^1), \]
then (2.3.4) is called a homogeneous approximation formula.
Next, we give some examples of product and boolean sum formulas. Let us denote by D_h = [0, h] × [0, h], h > 0, the so-called standard rectangle, and let us consider f : D_h → ℝ.
Example 2.3.3. Let L_1^x and L_1^y be the univariate Lagrange-type interpolation operators defined by
\[ (L_1^x f)(x, y) = \frac{h - x}{h} f(0, y) + \frac{x}{h} f(h, y), \tag{2.3.7} \]
respectively,
\[ (L_1^y f)(x, y) = \frac{h - y}{h} f(x, 0) + \frac{y}{h} f(x, h). \tag{2.3.8} \]
It is easy to check that
\[ (L_1^x f)(0, y) = f(0, y), \quad (L_1^x f)(h, y) = f(h, y), \quad y \in [0, h], \tag{2.3.9} \]
\[ (L_1^y f)(x, 0) = f(x, 0), \quad (L_1^y f)(x, h) = f(x, h), \quad x \in [0, h]. \]
Let R_1^x = I − L_1^x and R_1^y = I − L_1^y be the corresponding remainder operators, i.e.,
\[ (R_1^x f)(x, y) = \frac{x(x - h)}{2} f^{(2,0)}(\xi, y), \quad \text{for } f(\cdot, y) \in C^2[0, h], \; y \in [0, h], \tag{2.3.10} \]
\[ (R_1^y f)(x, y) = \frac{y(y - h)}{2} f^{(0,2)}(x, \eta), \quad \text{for } f(x, \cdot) \in C^2[0, h], \; x \in [0, h], \]
with ξ, η ∈ [0, h], and
\[ |(R_1^x f)(x, y)| \le \frac{h^2}{8} \|f^{(2,0)}(\cdot, y)\|_\infty, \qquad |(R_1^y f)(x, y)| \le \frac{h^2}{8} \|f^{(0,2)}(x, \cdot)\|_\infty. \tag{2.3.11} \]
Hence,
\[ \operatorname{ord}(L_1^x) = \operatorname{ord}(L_1^y) = 2. \]
Because in this case P_2 is
\[ P_2 = \{L_1^x, L_1^y, L_1^x L_1^y, L_1^x \oplus L_1^y\}, \]
the bivariate operators are only L_1^xL_1^y and L_1^x ⊕ L_1^y, i.e., the extremal operators of the lattice (P_2, ≤). So, we have the algebraic minimal approximation formula:
\[ f = L_1^x L_1^y f + (R_1^x \oplus R_1^y) f, \tag{2.3.12} \]
respectively, the algebraic maximal approximation formula:
\[ f = (L_1^x \oplus L_1^y) f + R_1^x R_1^y f. \tag{2.3.13} \]
From (2.3.7) and (2.3.8) it follows that
\[ (L_1^x L_1^y f)(x, y) = \frac{(h - x)(h - y)}{h^2} f(0, 0) + \frac{x(h - y)}{h^2} f(h, 0) + \frac{y(h - x)}{h^2} f(0, h) + \frac{xy}{h^2} f(h, h) \tag{2.3.14} \]
and
\[ \begin{aligned} ((L_1^x \oplus L_1^y) f)(x, y) = {} & \frac{h - x}{h} f(0, y) + \frac{x}{h} f(h, y) + \frac{h - y}{h} f(x, 0) + \frac{y}{h} f(x, h) \\ & - \frac{(h - x)(h - y)}{h^2} f(0, 0) - \frac{x(h - y)}{h^2} f(h, 0) - \frac{y(h - x)}{h^2} f(0, h) - \frac{xy}{h^2} f(h, h). \end{aligned} \tag{2.3.15} \]
Also, from (2.3.10), one obtains
\[ ((R_1^x \oplus R_1^y) f)(x, y) = \frac{x(x - h)}{2} f^{(2,0)}(\xi, y) + \frac{y(y - h)}{2} f^{(0,2)}(x, \eta) - \frac{xy(x - h)(y - h)}{4} f^{(2,2)}(\xi_1, \eta_1) \tag{2.3.16} \]
and
\[ (R_1^x R_1^y f)(x, y) = \frac{xy(x - h)(y - h)}{4} f^{(2,2)}(\xi, \eta). \tag{2.3.17} \]
Remark 2.3.4. 1) The approximation L_1^xL_1^y f of f uses the information Λ(f) = {f(0, 0), f(h, 0), f(0, h), f(h, h)}, which is scalar information. So, (2.3.12) is a scalar approximation formula.
The information on f used by the approximation (L_1^x ⊕ L_1^y)f is Λ(f) = {f(0, 0), f(h, 0), f(0, h), f(h, h), f(0, y), f(h, y), f(x, 0), f(x, h)}, for x, y ∈ [0, h], which is not scalar information, i.e., (2.3.13) is a non-scalar approximation formula.
2) L_1^xL_1^y f interpolates the function f at the vertices V_i, i = 1, ..., 4, of D_h:
\[ (L_1^x L_1^y f)(V_i) = f(V_i), \quad i = 1, \ldots, 4, \]
i.e., the product L_1^xL_1^y is a punctual interpolation operator.
(L_1^x ⊕ L_1^y)f interpolates f on the border ∂D_h of D_h, i.e.:
\[ (L_1^x \oplus L_1^y) f \big|_{\partial D_h} = f \big|_{\partial D_h}. \]
So, L_1^x ⊕ L_1^y is a transfinite interpolation operator, also called a blending interpolation operator.
3) Regarding the approximation order, from (2.3.16), (2.3.17) and (2.3.11) it follows that
\[ \|(R_1^x \oplus R_1^y) f\|_\infty \le \frac{h^2}{8} \Bigl( \|f^{(2,0)}\|_\infty + \|f^{(0,2)}\|_\infty + \frac{h^2}{8} \|f^{(2,2)}\|_\infty \Bigr), \]
respectively,
\[ \|R_1^x R_1^y f\|_\infty \le \frac{h^4}{64} \|f^{(2,2)}\|_\infty, \]
i.e.,
\[ \operatorname{ord}(L_1^x L_1^y) = 2, \qquad \operatorname{ord}(L_1^x \oplus L_1^y) = 4, \]
which is in concordance with (2.3.1) and (2.3.2).
Now, using a second approximation level, from the boolean sum interpolation formula (2.3.13) we obtain a homogeneous formula. Taking into account that ord(L_1^x ⊕ L_1^y) = 4, in the second level we must use interpolation operators of order 4. Such operators are, for example, the Hermite operators H_3^x and H_3^y that interpolate the function f at the double nodes 0 and h, i.e.,
\[ (H_3^x f)(x, y) = h_{00}(x) f(0, y) + h_{01}(x) f^{(1,0)}(0, y) + h_{10}(x) f(h, y) + h_{11}(x) f^{(1,0)}(h, y), \]
respectively,
\[ (H_3^y f)(x, y) = h_{00}(y) f(x, 0) + h_{01}(y) f^{(0,1)}(x, 0) + h_{10}(y) f(x, h) + h_{11}(y) f^{(0,1)}(x, h), \]
where
\[ h_{00}(x) = \frac{(x - h)^2 (2x + h)}{h^3}, \quad h_{01}(x) = \frac{x(x - h)^2}{h^2}, \quad h_{10}(x) = \frac{x^2 (3h - 2x)}{h^3}, \quad h_{11}(x) = \frac{x^2 (x - h)}{h^2}. \]
For the remainder terms, we have
\[ (R_3^x f)(x, y) = \frac{x^2 (x - h)^2}{24} f^{(4,0)}(\xi, y), \qquad (R_3^y f)(x, y) = \frac{y^2 (y - h)^2}{24} f^{(0,4)}(x, \eta), \]
with
\[ \operatorname{ord}(H_3^x) = \operatorname{ord}(H_3^y) = 4. \]
So,
\[ \operatorname{ord}(H_3^x) = \operatorname{ord}(H_3^y) = \operatorname{ord}(L_1^x \oplus L_1^y) = 4, \]
hence
\[ f = (L_1^x H_3^y + H_3^x L_1^y - L_1^x L_1^y) f + (L_1^x R_3^y + L_1^y R_3^x + R_1^x R_1^y) f \]
is a homogeneous interpolation formula of order 4.
Remark 2.3.5. From a boolean sum interpolation formula a homogeneous formula can always be obtained. The condition is to choose, in the second level of approximation, interpolation operators of a suitable degree. This is always possible.
2.3.2 Interpolation on a simplex domain
We consider only the bivariate case, with the standard triangle
\[ T_h = \{(x, y) \in \mathbb{R}^2 \mid x \ge 0, \; y \ge 0, \; x + y \le h\}, \quad h > 0. \]
Let us consider the univariate operators P_1, P_2 and P_3 which interpolate a given function f : T_h → ℝ on the sets {(0, y), (h − y, y)}, {(x, 0), (x, h − x)} and {(x + y, 0), (0, x + y)}, respectively, i.e.,
\[ \begin{aligned} (P_1 f)(x, y) &= \frac{h - x - y}{h - y} f(0, y) + \frac{x}{h - y} f(h - y, y), \\ (P_2 f)(x, y) &= \frac{h - x - y}{h - x} f(x, 0) + \frac{y}{h - x} f(x, h - x), \\ (P_3 f)(x, y) &= \frac{x}{x + y} f(x + y, 0) + \frac{y}{x + y} f(0, x + y). \end{aligned} \tag{2.3.18} \]
Using the product and the boolean sum of these operators, we constructbivariate operators with punctual or transfinite interpolation properties.
Example 2.3.6. Let P = P_1P_2P_3. We have
\[ (Pf)(x, y) = \frac{h - x - y}{h} f(0, 0) + \frac{x}{h} f(h, 0) + \frac{y}{h} f(0, h). \]
So, P is a punctual interpolation operator that interpolates f at the vertices of T_h. Also,
\[ \operatorname{dex}(P) = 1, \]
i.e., Pf = f for f ∈ ℙ²₁.
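The operator P above is just linear interpolation at the three vertices of T_h; a short check (our own sketch, hypothetical names) that it reproduces first-degree bivariate polynomials:

```python
# Vertex interpolation operator P = P1 P2 P3 on the standard triangle T_h
def triangle_p(f, h):
    def pf(x, y):
        return ((h - x - y) * f(0.0, 0.0) + x * f(h, 0.0) + y * f(0.0, h)) / h
    return pf

# dex(P) = 1: any linear function g(x, y) = a + bx + cy is reproduced exactly
g = lambda x, y: 1.0 + 2.0 * x - 3.0 * y
pf = triangle_p(g, 2.0)
```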
Let
\[ f = Pf + Rf \tag{2.3.19} \]
be the corresponding interpolation formula, where R denotes the remainder operator.
If f ∈ B_{1,1}[0, 0], then by Peano's theorem one obtains
\[ (Rf)(x, y) = \int_0^h \varphi_{20}(x, y; s) f^{(2,0)}(s, 0)\, ds + \int_0^h \varphi_{02}(x, y; t) f^{(0,2)}(0, t)\, dt + \iint_{T_h} \varphi_{11}(x, y; s, t) f^{(1,1)}(s, t)\, ds\, dt, \]
where
\[ \begin{aligned} \varphi_{20}(x, y; s) &= (x - s)_+ - \tfrac{x}{h}(h - s) \le 0, \quad s \in [0, h], \\ \varphi_{02}(x, y; t) &= (y - t)_+ - \tfrac{y}{h}(h - t) \le 0, \quad t \in [0, h], \\ \varphi_{11}(x, y; s, t) &= (x - s)^0_+ (y - t)^0_+ \ge 0, \quad (s, t) \in T_h. \end{aligned} \]
As the Peano kernels φ₂₀, φ₀₂ and φ₁₁ do not change sign on [0, h], respectively on T_h, by the mean value theorem we obtain:
\[ (Rf)(x, y) = \frac{x(x - h)}{2} f^{(2,0)}(\xi, 0) + \frac{y(y - h)}{2} f^{(0,2)}(0, \eta) + xy\, f^{(1,1)}(\xi_1, \eta_1), \]
where ξ, η ∈ [0, h] and (ξ₁, η₁) ∈ T_h. It follows that:
\[ |(Rf)(x, y)| \le \frac{h^2}{4} \Bigl[ \frac{1}{2} \|f^{(2,0)}(\cdot, 0)\|_{L_\infty[0,h]} + \frac{1}{2} \|f^{(0,2)}(0, \cdot)\|_{L_\infty[0,h]} + \|f^{(1,1)}\|_{L_\infty(T_h)} \Bigr], \]
which implies that (2.3.19) is a homogeneous interpolation formula of order 2 (ord(P) = 2).
Remark 2.3.7. We have
1) \((P_i \oplus P_j) f \big|_{\partial T_h} = f \big|_{\partial T_h}\), for all i, j = 1, 2, 3, i ≠ j;
2) dex(P_i ⊕ P_j) = 2.
These properties are easy to verify.
Example 2.3.8. If f^{(1,0)}(0, y), y ∈ [0, h], exists, one considers the Birkhoff type operator B_1, defined by:
\[ (B_1 f)(x, y) = f(h - y, y) + (x + y - h) f^{(1,0)}(0, y). \]
For y ∈ [0, h], it is easy to check that
\[ (B_1 f)(h - y, y) = f(h - y, y), \qquad (B_1 f)^{(1,0)}(0, y) = f^{(1,0)}(0, y). \]
Let Q := B_1 ⊕ P_2. We have
1)
\[ (Qf)(x, 0) = f(x, 0), \qquad (Qf)(h - y, y) = f(h - y, y), \qquad (Qf)^{(1,0)}(0, y) = f^{(1,0)}(0, y), \quad x, y \in [0, h]; \]
2)
\[ \operatorname{dex}(Q) = 2. \]
Indeed,
\[ \begin{aligned} (Qf)(x, y) = {} & \frac{h - x - y}{h - x} f(x, 0) + \frac{y}{h - x} f(x, h - x) + (x + y - h) f^{(1,0)}(0, y) \\ & + (h - x - y) \Bigl\{ \frac{y}{h^2} [f(0, h) - f(0, 0)] + \frac{h - y}{h} f^{(1,0)}(0, 0) + \frac{y}{h} [f^{(1,0)}(0, h) - f^{(0,1)}(0, h)] \Bigr\}, \end{aligned} \]
and the properties 1) and 2) are verified by a straightforward computation.
One considers the interpolation formula
\[ f = Qf + Rf, \tag{2.3.20} \]
with Rf the remainder term. If f ∈ B_{12}[0, 0], then by Peano's theorem one obtains
\[ |(Rf)(x, y)| \le \frac{h^3}{27} \Bigl[ \frac{2}{3} \|f^{(0,3)}(0, \cdot)\|_{L_\infty[0,h]} + \frac{1}{2} \|f^{(1,2)}\|_{L_\infty(T_h)} \Bigr]. \]
So, (2.3.20) is a homogeneous blending interpolation formula of Birkhoff type and
\[ \operatorname{ord}(Q) = 3. \]
2.3.3 Bivariate Shepard interpolation
Let X ⊂ ℝ² be an arbitrary domain, f : X → ℝ and Z = {z_i | z_i = (x_i, y_i) ∈ X, i = 1, ..., N} the set of interpolation nodes. In 1968, Donald Shepard introduced a new kind of interpolation procedure, where the fundamental interpolation functions at a point z := (x, y) are defined using the distances from the point (x, y) to the interpolation nodes (x_i, y_i), i = 1, ..., N.
The bivariate Shepard operator, denoted by S0, is defined by:
(S_0 f)(x, y) = Σ_{i=1}^{N} A_i(x, y) f(x_i, y_i),   (2.3.21)

with

A_i(x, y) = ( Π_{j=1, j≠i}^{N} ρ_j^µ(x, y) ) / ( Σ_{k=1}^{N} Π_{j=1, j≠k}^{N} ρ_j^µ(x, y) ),   (2.3.22)
2.3. Some elements of multivariate interpolation operators 113
where µ ∈ R_+ and ρ_j(x, y) is the distance, in some metric ρ on R², from (x, y) to the node z_j. Usually,

ρ_j(x, y) = ((x − x_j)² + (y − y_j)²)^{1/2}.
The functions A_i, i = 1, ..., N, can also be written as

A_i(x, y) = (1/ρ_i^µ(x, y)) / ( Σ_{j=1}^{N} 1/ρ_j^µ(x, y) )

and

(S_0 f)(x, y) = ( Σ_{i=1}^{N} f(x_i, y_i)/ρ_i^µ(x, y) ) / ( Σ_{j=1}^{N} 1/ρ_j^µ(x, y) ).
It is easy to verify that

A_i(x_j, y_j) = δ_ij   (2.3.23)

and

Σ_{i=1}^{N} A_i(x, y) = 1.   (2.3.24)
The main properties of the Shepard operator S_0 are:

• Interpolation: (S_0 f)(x_i, y_i) = f(x_i, y_i), i = 1, ..., N.

• Degree of exactness: dex(S_0) = 0.

• The values stay between the extreme data values:

min_{i=1,...,N} f(x_i, y_i) ≤ (S_0 f)(x, y) ≤ max_{i=1,...,N} f(x_i, y_i).
These properties are implied by the relations (2.3.23) and (2.3.24).

We notice that the drawbacks mentioned for the univariate Shepard method are maintained in the bivariate case as well, namely:

• high computational cost;

• low degree of exactness.

Two ways to overcome these drawbacks are:
⋆ Modifying the basis functions, for example by using the local version of the Shepard formula introduced by Franke and Nielson. In this case the Shepard operator is of the form:

(S_w f)(x, y) = ( Σ_{i=1}^{N} W_i(x, y) f(x_i, y_i) ) / ( Σ_{i=1}^{N} W_i(x, y) ),   (2.3.25)

with

W_i(x, y) = [ (R_w − r_i)_+ / (R_w r_i) ]²,   (2.3.26)

where r_i = r_i(x, y) is the distance from (x, y) to the node (x_i, y_i) and R_w is a radius of influence about the node (x_i, y_i), varying with i.
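A sketch of (2.3.25)-(2.3.26), with hypothetical names; the node-dependent radius of influence is passed in as an array:

```python
import numpy as np

def franke_nielson(x, y, nodes, values, radii):
    """Local Shepard interpolant S_w with the Franke-Nielson weights (2.3.26).

    radii[i] is the radius of influence R_w of node i; nodes farther than
    radii[i] from (x, y) receive weight zero, which makes the sum local.
    """
    r = np.hypot(x - nodes[:, 0], y - nodes[:, 1])          # r_i(x, y)
    if np.any(r == 0):
        return values[np.argmin(r)]
    w = (np.maximum(radii - r, 0.0) / (radii * r)) ** 2     # [(R_w - r_i)_+ / (R_w r_i)]^2
    total = w.sum()
    if total == 0.0:
        raise ValueError("no node influences (x, y); enlarge the radii")
    return np.dot(w, values) / total

nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([1.0, 2.0, 3.0, 4.0])
radii = np.full(4, 2.0)
assert abs(franke_nielson(0.0, 0.0, nodes, vals, radii) - 1.0) < 1e-12
```

The truncation (R_w − r_i)_+ is what reduces the computational cost: only the nodes inside the disc of radius radii[i] contribute at a given point.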
⋆ Increasing the degree of exactness by combining the Shepard operator with another interpolation operator. In this way its degree of exactness is increased and other sets of functionals are used. The general procedure is presented in Section 2.2.6.2.

Some results similar to those given in the univariate case follow.
Remark 2.3.9. If P_i, i = 0, ..., N, are linear operators then S_P is also a linear operator.

Remark 2.3.10. Let P_i, i = 0, ..., N, be linear operators. If

dex(P_i) = ρ_i, i = 0, ..., N,

then

dex(S_P) = min{ρ_0, ..., ρ_N}.
Remark 2.3.11. As A_i(x, y) ≥ 0, for (x, y) ∈ X, it follows that S_0 is a positive operator.

Remark 2.3.12. The exponent µ ∈ R_+ can be chosen arbitrarily. If 0 ≤ µ ≤ 1 the function S_0 f has peaks at the nodes, for µ > 1 it is level at the nodes, and for µ large enough S_0 f becomes a step function.
We illustrate this phenomenon by some examples.
Example 2.3.13. Let f : [−1, 1] × [−1, 1] → R, f(x, y) = −(x² + y²), and consider the nodes z_1 = (−1, −1), z_2 = (−1, 1), z_3 = (1, −1), z_4 = (1, 1), z_5 = (−0.5, −0.5), z_6 = (−0.5, 0.5), z_7 = (0.5, −0.5), z_8 = (0.5, 0.5), z_9 = (0, 0). In Figure 2.2 we plot S_0 f for µ = 1, 2, 20.
[Figure 2.2 shows, over [−1, 1]²: (a) the function f; (b) the interpolant S_0 f for µ = 1; (c) the interpolant S_0 f for µ = 2; (d) the interpolant S_0 f for µ = 20.]
Figure 2.2: Bivariate Shepard interpolants.
2.3.3.1 The bivariate Shepard-Lagrange operator
Let f : X → R, X ⊂ R², and Z = {z_i | z_i = (x_i, y_i) ∈ X, i = 1, ..., N} be the set of interpolation nodes. Consider the set of Lagrange-type functionals

Λ_L(f) = {λ_i | λ_i(f) = f(x_i, y_i), i = 1, ..., N}.
Taking into account that dex(S_0) = 0, our goal is to construct Shepard-type operators of higher degree of exactness. To this end, we use a combination of the Shepard operator S_0 with some Lagrange operator for bivariate functions.
Let L_i f be the bivariate Lagrange polynomial of degree n that interpolates the function f at the set of points

Z_{m,i} := {z_i, z_{i+1}, ..., z_{i+m−1}}, i = 1, ..., N, m < N,   (2.3.27)

with z_{N+i} = z_i, i = 1, ..., m − 1, and m := (n + 1)(n + 2)/2 being the number of coefficients of a bivariate polynomial of degree n,

P_n(x, y) = Σ_{i+j≤n} a_ij x^i y^j.
Remark 2.3.14. For given N, one can consider only operators L_i^n with n such that (n + 1)(n + 2)/2 < N, i.e., n ∈ {1, ..., ν}, where ν = ⌊(√(8N + 1) − 3)/2⌋. The existence and uniqueness of the operators L_i^n are assured by the following theorem.
Theorem 2.3.15. Let z_i := (x_i, y_i), i = 1, ..., (n + 1)(n + 2)/2, be distinct points in the plane that do not lie on the same algebraic curve of n-th degree (Σ_{i+j≤n} a_ij x^i y^j = 0). Then, for every function f defined at the points z_i, i = 1, ..., (n + 1)(n + 2)/2, there exists a unique polynomial Q_n of n-th degree that interpolates f at z_i, i.e.,

Q_n(x_i, y_i) = f(x_i, y_i), i = 1, ..., (n + 1)(n + 2)/2.

Hence, if the points z_k, k = i, ..., i + m − 1, i = 1, ..., N, of the set (2.3.27) do not lie on an algebraic curve of n-th degree, then L_i^n exists and is unique for all i = 1, ..., N.
Suppose that the existence and uniqueness conditions of the operators L_i^n, i = 1, ..., N, are satisfied. We have

(L_i^n f)(x, y) = Σ_{k=i}^{i+m−1} l_k(x, y) f(x_k, y_k), i = 1, ..., N,

where l_k are the corresponding cardinal polynomials,

l_k(x_j, y_j) = δ_kj, k, j = i, ..., i + m − 1.
The main properties of the Lagrange interpolation operators L_i^n are:

(L_i^n f)(x_k, y_k) = f(x_k, y_k), k = i, ..., i + m − 1,   (2.3.28)

and

dex(L_i^n) = n, i = 1, ..., N.   (2.3.29)
Definition 2.3.16. The operator S_n^L given by

(S_n^L f)(x, y) = Σ_{i=1}^{N} A_i(x, y) (L_i^n f)(x, y)   (2.3.30)

is called the bivariate Shepard-Lagrange operator.
Remark 2.3.17. As the Lagrange operators L_i^n, i = 1, ..., N, and the Shepard operator S_0 are linear, the combined operator S_n^L is also linear.
Theorem 2.3.18. Let f : D → R be a given function, z_i := (x_i, y_i) ∈ D, i = 1, ..., N, and m := (n + 1)(n + 2)/2. If the points z_i, ..., z_{i+m−1} do not lie on an algebraic curve of n-th degree, for all i = 1, ..., N (z_{N+k} := z_k, k = 1, ..., m − 1), then the combined operator S_n^L exists and has the following properties:

(S_n^L f)(x_j, y_j) = f(x_j, y_j), j = 1, ..., N,   (2.3.31)

and

dex(S_n^L) = n.   (2.3.32)
Proof. From (2.3.28) it follows that

(S_n^L f)(x_j, y_j) = Σ_{i=1}^{N} A_i(x_j, y_j) (L_i^n f)(x_j, y_j) = Σ_{i=1}^{N} A_i(x_j, y_j) f(x_j, y_j),

and (2.3.23) implies the interpolation property (2.3.31). Relation (2.3.32) follows by Remark 2.3.10, taking into account that dex(L_i^n) = n.

Next, one considers two particular cases: n = 1 and n = 2.

1) Case n = 1. We have
(S_1^L f)(x, y) = Σ_{i=1}^{N} A_i(x, y) (L_i^1 f)(x, y),   (2.3.33)

where

(L_i^1 f)(x, y) = l_i(x, y) f(x_i, y_i) + l_{i+1}(x, y) f(x_{i+1}, y_{i+1}) + l_{i+2}(x, y) f(x_{i+2}, y_{i+2}),
for i = 1, ..., N (z_{N+1} := z_1, z_{N+2} := z_2). One obtains

l_i(x, y) = [(y_{i+1} − y_{i+2})x + (x_{i+2} − x_{i+1})y + x_{i+1}y_{i+2} − x_{i+2}y_{i+1}] / [(x_i − x_{i+1})(y_{i+1} − y_{i+2}) − (x_{i+1} − x_{i+2})(y_i − y_{i+1})],

l_{i+1}(x, y) = [(y_{i+2} − y_i)x + (x_i − x_{i+2})y + x_{i+2}y_i − x_i y_{i+2}] / [(x_{i+1} − x_{i+2})(y_{i+2} − y_i) − (x_{i+2} − x_i)(y_{i+1} − y_{i+2})],

l_{i+2}(x, y) = [(y_i − y_{i+1})x + (x_{i+1} − x_i)y + x_i y_{i+1} − x_{i+1}y_i] / [(x_{i+2} − x_i)(y_i − y_{i+1}) − (x_i − x_{i+1})(y_{i+2} − y_i)].
Remark 2.3.19. The existence and uniqueness condition for L_i^1 is that the points z_i, z_{i+1}, z_{i+2} do not lie on a line Ax + By + C = 0; equivalently, they must be the vertices of a non-degenerate triangle ∆_i, for all i = 1, ..., N.

Remark 2.3.20. The functions S_1^L f and S_0 f use the same information about f (the values f(x_i, y_i), i = 1, ..., N), but dex(S_1^L) = 1, while dex(S_0) = 0.
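The case n = 1 is easy to implement: each L_i^1 f is a linear polynomial determined by three interpolation conditions. A sketch with our own names, where the linear coefficients are found by solving a 3 × 3 system instead of using the closed formulas above:

```python
import numpy as np

def shepard_lagrange1(x, y, nodes, values, mu=2.0):
    """Shepard-Lagrange operator S^L_1 of (2.3.33): node i carries the linear
    interpolant L^1_i f on the triangle (z_i, z_{i+1}, z_{i+2}), indices cyclic."""
    N = len(nodes)
    d = np.hypot(x - nodes[:, 0], y - nodes[:, 1])
    if np.any(d == 0):
        return values[np.argmin(d)]
    w = d ** (-mu)
    w /= w.sum()                                    # the weights A_i(x, y)
    s = 0.0
    for i in range(N):
        idx = [i, (i + 1) % N, (i + 2) % N]         # z_{N+1} := z_1, z_{N+2} := z_2
        # coefficients (a, b, c) of L^1_i f = a x + b y + c
        M = np.column_stack([nodes[idx, 0], nodes[idx, 1], np.ones(3)])
        a, b, c = np.linalg.solve(M, values[idx])   # triangle must be non-degenerate
        s += w[i] * (a * x + b * y + c)
    return s

# dex(S^L_1) = 1: a polynomial of first degree is reproduced exactly
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
f = lambda p: 2 * p[0] - 3 * p[1] + 1
vals = np.array([f(p) for p in nodes])
assert abs(shepard_lagrange1(0.3, 0.6, nodes, vals) - f((0.3, 0.6))) < 1e-10
```

The exactness check works because each local linear interpolant reproduces the linear f, and the weights A_i sum to one.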
2) Case n = 2. From (2.3.30), we obtain

(S_2^L f)(x, y) = Σ_{i=1}^{N} A_i(x, y) (L_i^2 f)(x, y),   (2.3.34)

where L_i^2 f are second-degree polynomials that interpolate f at z_i, ..., z_{i+5}, respectively.

In accordance with Theorem 2.3.18, the interpolation nodes z_i, ..., z_{i+5} must not lie, respectively, on an algebraic curve of second degree g_i, i = 1, ..., N. If, for some j, 1 ≤ j ≤ N, this condition is not satisfied, the index order of z_i, i = 1, ..., N, can be changed.
Remark 2.3.21. A Shepard-Lagrange operator with second degree of exactness can also be obtained using more information about f. Roughly speaking, the only requirement is the existence of some second-degree Lagrange-type polynomials which interpolate f at z_k, k = 1, ..., N.

A sufficient condition for the existence and uniqueness of such polynomials, say L_i^2 f, is that the six interpolation nodes be the vertices z_i, z_{i+1}, z_{i+2} of the triangle ∆_i and the midpoints, say ξ_i, ξ_{i+1}, ξ_{i+2}, of the sides of ∆_i, for i = 1, ..., N.
Using the polynomials L_i^2 f, i = 1, ..., N, one can define the Shepard operator S_2^L:

(S_2^L f)(x, y) = Σ_{i=1}^{N} A_i(x, y) (L_i^2 f)(x, y),

which interpolates f at z_i, i = 1, ..., N, and dex(S_2^L) = 2.
Of course, S_2^L f also uses the values of the function f at the midpoints ξ_i, i = 1, ..., N + 2. But the advantage of S_2^L f is that the existence and uniqueness conditions of the polynomials L_i^2 f are simpler to verify.
An extension of the Shepard-Lagrange operator S_n^L is obtained if the Lagrange polynomials L_i f, used in combination with the Shepard operator S_0, are of different degrees, say n_i, i = 1, ..., N, such that (n_i + 1)(n_i + 2)/2 < N.

So, let

Z_{m_i,i} := {z_i, z_{i+1}, ..., z_{i+m_i−1}}, i = 1, ..., N,

be sets of m_i interpolation nodes, respectively, with m_i = (n_i + 1)(n_i + 2)/2. Using the given information on f at the nodes z_i ∈ Z_{m_i,i}, one constructs the corresponding Lagrange interpolation operators L_i^{n_i}, i = 1, ..., N.
Definition 2.3.22. The operator S^L_{n_1,...,n_N} defined by

(S^L_{n_1,...,n_N} f)(x, y) = Σ_{i=1}^{N} A_i(x, y) (L_i^{n_i} f)(x, y)   (2.3.35)

is called a general Shepard-Lagrange operator.
Remark 2.3.23. For n_1 = · · · = n_N = n the operator S^L_{n_1,...,n_N} becomes S_n^L.
Theorem 2.3.24. Suppose that S^L_{n_1,...,n_N} exists. Then

(S^L_{n_1,...,n_N} f)(x_i, y_i) = f(x_i, y_i), i = 1, ..., N,

and

dex(S^L_{n_1,...,n_N}) = min{n_1, ..., n_N}.

Proof. The proof follows by (2.3.23) and Remark 2.2.43.
Example 2.3.25. For f the same as in Example 2.3.13 and the same nodes, in Figure 2.3 we plot S_1^L f for µ = 1, 2.
[Figure 2.3 shows: (a) the interpolant S_1^L f for µ = 1; (b) the interpolant S_1^L f for µ = 2.]
Figure 2.3: Bivariate Shepard-Lagrange interpolants.
2.3.3.2 The bivariate Shepard-Hermite operator
A new extension of the Shepard operator S_0 can be obtained using not only Lagrange-type information on f, but also some of its partial derivatives.

Let X ⊂ R² be an arbitrary domain, f : X → R and Z = {z_i | z_i = (x_i, y_i) ∈ X, i = 1, ..., N} be the set of interpolation nodes. Consider the set of Hermite-type functionals

Λ_H(f) = {λ_i^{p,q} | λ_i^{p,q} f = f^(p,q)(z_i), (p, q) ∈ N², p + q ≤ m_i, i = 1, ..., N},

with m_i ∈ N*, i = 1, ..., N.

First, denote by Λ_{H,k} ⊂ Λ_H the functionals corresponding to the point z_k, i.e.,

Λ_{H,k}(f) := {λ_k^{p,q} | λ_k^{p,q} f = f^(p,q)(z_k), (p, q) ∈ N², p + q ≤ m_k}.
Let T_k^{m_k} be the bivariate Taylor interpolation operator corresponding to the set Λ_{H,k}:

(T_k^{m_k} f)(x, y) = Σ_{i+j≤m_k} [(x − x_k)^i (y − y_k)^j / (i! j!)] f^(i,j)(x_k, y_k).

As Λ_{H,k}(f), k = 1, ..., N, is an admissible sequence of functionals, the operator T_k^{m_k} exists, for all k = 1, ..., N.
Definition 2.3.26. The operator S^T_{m_1,...,m_N} given by

(S^T_{m_1,...,m_N} f)(x, y) = Σ_{i=1}^{N} A_i(x, y) (T_i^{m_i} f)(x, y)   (2.3.36)

is called the bivariate Shepard-Taylor operator.
Remark 2.3.27. The operator S_1^T, given by

(S_1^T f)(x, y) = Σ_{i=1}^{N} A_i(x, y) (T_i^1 f)(x, y),

or

(S_1^T f)(x, y) = Σ_{i=1}^{N} A_i(x, y) [f(x_i, y_i) + (x − x_i) f^(1,0)(x_i, y_i) + (y − y_i) f^(0,1)(x_i, y_i)],

was defined and studied by Shepard himself.

For µ > 1, it is easy to verify that

(S_1^T f)^(p,q)(x_i, y_i) = f^(p,q)(x_i, y_i), p + q ≤ 1, i = 1, ..., N,

and

dex(S_1^T) = 1.
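Since each T_i^1 f uses only the value and the first derivatives at z_i, the operator S_1^T is straightforward to sketch (names are ours; f0, fx, fy are the arrays of f, f^(1,0), f^(0,1) at the nodes):

```python
import numpy as np

def shepard_taylor1(x, y, nodes, f0, fx, fy, mu=2.0):
    """Shepard-Taylor operator S^T_1: Shepard weights A_i applied to the
    first-order Taylor polynomials T^1_i f about each node."""
    d = np.hypot(x - nodes[:, 0], y - nodes[:, 1])
    if np.any(d == 0):
        return f0[np.argmin(d)]
    w = d ** (-mu)
    w /= w.sum()
    # T^1_i f evaluated at (x, y), for every node i at once
    taylor = f0 + (x - nodes[:, 0]) * fx + (y - nodes[:, 1]) * fy
    return np.dot(w, taylor)

# dex(S^T_1) = 1: exact for f(x, y) = x + 2y
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
f0 = nodes[:, 0] + 2 * nodes[:, 1]
fx = np.ones(4)
fy = np.full(4, 2.0)
assert abs(shepard_taylor1(0.3, 0.4, nodes, f0, fx, fy) - 1.1) < 1e-12
```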
Theorem 2.3.28. For µ > max{m_1, ..., m_N} we have

(S^T_{m_1,...,m_N} f)^(p,q)(x_i, y_i) = f^(p,q)(x_i, y_i), p + q ≤ m_i, i = 1, ..., N,   (2.3.37)

respectively,

dex(S^T_{m_1,...,m_N}) = min{m_1, ..., m_N}.   (2.3.38)
Proof. First, we see that

A_i^(p,q)(x_k, y_k) = 0, k = 1, ..., N, 0 ≤ p + q ≤ m_k, i ≠ k,
A_i^(p,q)(x_i, y_i) = 0, p + q ≥ 1,   (2.3.39)

for all i = 1, ..., N. Indeed, let us consider

A_k = g_k / h_k,

with

g_k(x, y) = Π_{j=1, j≠k}^{N} ρ_j^µ(x, y),

h_k(x, y) = Σ_{k=1}^{N} Π_{j=1, j≠k}^{N} ρ_j^µ(x, y).
After a straightforward computation, one obtains

g_k^(p,q)(x_i, y_i) = 0, 1 ≤ p + q ≤ m_i,

for µ > max{m_1, ..., m_N}, i = 1, ..., N, i ≠ k, and

g_k^(p,q)(x_k, y_k) = h_k^(p,q)(x_k, y_k), 1 ≤ p + q ≤ m_k.

So, (2.3.39) follows. Now, relations (2.3.39) imply the interpolation property (2.3.37). The degree of exactness property (2.3.38) is a consequence of Remark 2.3.10, namely of the fact that dex(T_k^{m_k}) = m_k, k = 1, ..., N.

Next, for ν_k ∈ N*, ν_k < N, let us consider the set

Λ_{H,ν_k}(f) := {λ_{k+j}^{p,q} | λ_{k+j}^{p,q} f = f^(p,q)(z_{k+j}), (p, q) ∈ N², p + q ≤ m_k, j = 0, 1, ..., ν_k − 1},

with z_{N+i} = z_i, i ∈ N*, i.e., let Λ_{H,ν_k}(f) ⊂ Λ_H(f) be the set of functionals corresponding to the nodes z_k, z_{k+1}, ..., z_{k+ν_k−1}, regarding the function f. If |Λ_{H,ν_k}(f)| = m_k, let H_k^{n_k} be the Hermite interpolation operator corresponding to the set Λ_{H,ν_k}(f), k = 1, ..., N, with n_k such that m_k = (n_k + 1)(n_k + 2)/2.
Definition 2.3.29. Suppose that H_k^{n_k} exists for all k = 1, ..., N. The operator S^H_{n_1,...,n_N} defined by

(S^H_{n_1,...,n_N} f)(x, y) = Σ_{k=1}^{N} A_k(x, y) (H_k^{n_k} f)(x, y)

is called the bivariate Shepard-Hermite operator.
Theorem 2.3.30. If Λ_{H,ν_k}(f) ⊂ Λ_H(f), k = 1, ..., N, is an admissible sequence, then S^H_{n_1,...,n_N} exists and it has the following interpolation properties:

(S^H_{n_1,...,n_N} f)^(p,q)(x_i, y_i) = f^(p,q)(x_i, y_i), p + q ≤ m_i, i = 1, ..., N,   (2.3.40)

for µ > max{m_1, ..., m_N}, and

dex(S^H_{n_1,...,n_N}) = min{n_1, ..., n_N}.   (2.3.41)
Proof. Relations (2.3.40) follow by (2.3.39), taking into account the interpolation properties of the polynomials H_k^{n_k} f, k = 1, ..., N. Relation (2.3.41) follows by Remark 2.2.43.

Particular cases

• If ν_k = 1, for all k = 1, ..., N, then S^H_{n_1,...,n_N} becomes the Shepard-Taylor operator (2.3.36).

• For n_1 = · · · = n_N = n, all polynomials H_k^n, k = 1, ..., N, are of degree n and one obtains the Shepard-Hermite operator S_n^H:

(S_n^H f)(x, y) = Σ_{k=1}^{N} A_k(x, y) (H_k^n f)(x, y).   (2.3.42)
Example 2.3.31. Let

Λ_H(f) = {λ_i^{p,q} | λ_i^{p,q} f = f^(p,q)(x_i, y_i), (p, q) ∈ N², p + q ≤ 1, i = 1, ..., N}   (2.3.43)

and

Λ_{H,ν_k}(f) = {λ_k^{00}, λ_k^{10}, λ_k^{01}, λ_{k+1}^{00}, λ_{k+2}^{00}, λ_{k+3}^{00}},   (2.3.44)

with ν_k = 4 and λ_{N+1} = λ_1, λ_{N+2} = λ_2, λ_{N+3} = λ_3. The existence and uniqueness of the Hermite operator H_k^2 are based on the following auxiliary result.
Lemma 2.3.32. Let z_{k+i} = (x_{k+i}, y_{k+i}), i = 0, 1, 2, 3, be given points in the plane such that:

• x_{k+i} ≠ x_k, for i = 1, 2, 3;

• if l_{k+i} is the line determined by the points z_k and z_{k+i}, i = 1, 2, 3, then l_{k+i} ≠ l_{k+j}, for i, j = 1, 2, 3, i ≠ j.

Then for every function f with the given information f(z_k), f^(1,0)(z_k), f^(0,1)(z_k), f(z_{k+1}), f(z_{k+2}), f(z_{k+3}) there exists a unique polynomial P_2 of second degree such that

P_2^(i,j)(z_k) = f^(i,j)(z_k), (i, j) ∈ {(0, 0), (1, 0), (0, 1)},
P_2(z_{k+i}) = f(z_{k+i}), i = 1, 2, 3.   (2.3.45)
Proof. Let P2 be an arbitrary polynomial of second degree:
P_2(x, y) = Ax² + Bxy + Cy² + Dx + Ey + F.   (2.3.46)
From (2.3.45) one obtains a 6 × 6 linear algebraic system. Let M be its matrix. Then, after some permissible transformations, one obtains

det M = − Π_{i=1}^{3} (x_{k+i} − x_k)² · det [ 1  α_{k+1}  α²_{k+1} ; 1  α_{k+2}  α²_{k+2} ; 1  α_{k+3}  α²_{k+3} ],

where

α_{k+i} = (y_{k+i} − y_k) / (x_{k+i} − x_k), i = 1, 2, 3.   (2.3.47)

The hypothesis of Lemma 2.3.32 implies that det M ≠ 0, i.e., the system has a unique solution.
Theorem 2.3.33. If Λ_{H,ν_k}, k = 1, ..., N, given by (2.3.44), is an admissible sequence, then there exists the Shepard-Hermite operator S_2^H, defined by

(S_2^H f)(x, y) = Σ_{k=1}^{N} A_k(x, y) (H_k^2 f)(x, y),

where H_k^2 is the interpolation operator corresponding to the set Λ_{H,ν_k}.
Proof. The proof follows from Lemma 2.3.32. In order to give the function S_2^H f effectively, we have to determine the Hermite polynomials H_k^2 f, k = 1, ..., N. To this end, we consider H_k^2 f in the form

H_k^2 f = h^{00}_{k,k} f(x_k, y_k) + h^{10}_{k,k} f^(1,0)(x_k, y_k) + h^{01}_{k,k} f^(0,1)(x_k, y_k)
  + h^{00}_{k,k+1} f(x_{k+1}, y_{k+1}) + h^{00}_{k,k+2} f(x_{k+2}, y_{k+2}) + h^{00}_{k,k+3} f(x_{k+3}, y_{k+3}),

where h^{p,q}_{i,j} are the corresponding fundamental interpolation polynomials. Hence, each polynomial h^{p,q}_{i,j} is a bivariate polynomial of second degree, as in (2.3.46), that satisfies the cardinal interpolation conditions. For example,

h^{00}_{k,k}(x_k, y_k) = 1,
(h^{00}_{k,k})^(1,0)(x_k, y_k) = 0,
(h^{00}_{k,k})^(0,1)(x_k, y_k) = 0,
h^{00}_{k,k}(x_{k+1}, y_{k+1}) = 0,
h^{00}_{k,k}(x_{k+2}, y_{k+2}) = 0,
h^{00}_{k,k}(x_{k+3}, y_{k+3}) = 0,

which is a 6 × 6 linear algebraic system.
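The 6 × 6 system can be set up and solved numerically; the following sketch (with our own names) computes the cardinal polynomial h^{00}_{k,k} for nodes satisfying the hypotheses of Lemma 2.3.32:

```python
import numpy as np

def p2_row_value(x, y):
    # coefficients of (A, B, C, D, E, F) in the condition P2(x, y) = rhs
    return [x * x, x * y, y * y, x, y, 1.0]

def p2_row_dx(x, y):
    # coefficients in the condition P2^(1,0)(x, y) = rhs
    return [2 * x, y, 0.0, 1.0, 0.0, 0.0]

def p2_row_dy(x, y):
    # coefficients in the condition P2^(0,1)(x, y) = rhs
    return [0.0, x, 2 * y, 0.0, 1.0, 0.0]

def hermite_p2(zk, z1, z2, z3, rhs):
    """Solve (2.3.45): value and first derivatives at zk, values at z1, z2, z3.
    rhs lists the six right-hand sides; returns (A, B, C, D, E, F) of (2.3.46)."""
    M = np.array([p2_row_value(*zk), p2_row_dx(*zk), p2_row_dy(*zk),
                  p2_row_value(*z1), p2_row_value(*z2), p2_row_value(*z3)])
    return np.linalg.solve(M, np.asarray(rhs, float))

# cardinal polynomial h^{00}_{k,k}: right-hand side (1, 0, 0, 0, 0, 0);
# the nodes below satisfy Lemma 2.3.32 (x_{k+i} != x_k, three distinct lines)
A, B, C, D, E, F = hermite_p2((0, 0), (1, 0), (1, 1), (1, -1), [1, 0, 0, 0, 0, 0])
# for these nodes the solution is h^{00}_{k,k}(x, y) = 1 - x^2
assert abs(A + 1) < 1e-12 and abs(F - 1) < 1e-12
```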
Example 2.3.34. Let

Λ_H(f) = {λ_k^{0,0}, λ_k^{1,0}, λ_k^{0,1}, λ_k^{1,1} | k = 1, ..., N}.   (2.3.48)

One considers the sequence of subsets of Λ_H:

Λ_{H,ν_k}(f) = {λ_k^{0,0}, λ_k^{1,0}, λ_k^{0,1}, λ_k^{1,1}, λ_{k+1}^{0,0}, λ_{k+2}^{0,0}}, ν_k = 3, k = 1, ..., N,   (2.3.49)

with

λ_{N+1} = λ_1, λ_{N+2} = λ_2.
Lemma 2.3.35. If

• x_{k+1} ≠ x_k and x_{k+2} ≠ x_k,

• α_{k+1} ≠ α_{k+2} and α_{k+1} ≠ −α_{k+2}, for α_{k+1} and α_{k+2} given in (2.3.47),

then (2.3.49) is an admissible sequence.
Proof. For P_2 as in (2.3.46) we have to study the linear system:

P_2^(i,j)(x_k, y_k) = f^(i,j)(x_k, y_k),
P_2(x_{k+1}, y_{k+1}) = f(x_{k+1}, y_{k+1}),
P_2(x_{k+2}, y_{k+2}) = f(x_{k+2}, y_{k+2}),

with (i, j) ∈ {(0, 0), (0, 1), (1, 0), (1, 1)}. The determinant of this system is

D = (x_{k+1} − x_k)² (x_{k+2} − x_k)² (α²_{k+2} − α²_{k+1}),

and the proof follows.
Remark 2.3.36. From Lemma 2.3.35 it follows that the Shepard-Hermite operator corresponding to the set of functionals (2.3.48) is given by

S_2^H f = Σ_{k=1}^{N} A_k H_k^2 f,

where H_k^2 is the Hermite interpolation operator regarding Λ_{H,ν_k}. This operator exists and is unique.
Example 2.3.37. For f the same as in Example 2.3.13 and the same nodes, in Figure 2.4 we plot S_1^T f for µ = 1, 2.
[Figure 2.4 shows: (a) the interpolant S_1^T f for µ = 1; (b) the interpolant S_1^T f for µ = 2.]
Figure 2.4: Bivariate Shepard-Taylor interpolants.
Example 2.3.38. Let g : [−2, 2] × [−2, 2] → R,

g(x, y) = x e^{−(x² + y²)},

and consider 10 random nodes, uniformly distributed on the domain of g. In Figure 2.5 we plot S_2^H g for µ = 1, 2.
[Figure 2.5 shows: (a) the function g; (b) S_2^H g for µ = 1; (c) S_2^H g for µ = 2.]
Figure 2.5: Bivariate Shepard-Hermite interpolants.
2.3.3.3 The bivariate Shepard-Birkhoff operator
Let X ⊂ R² be an arbitrary domain, f : X → R and Z = {z_i | z_i = (x_i, y_i) ∈ X, i = 1, ..., N} be the set of interpolation nodes. Consider the set of Birkhoff-type functionals

Λ_B(f) = {λ_k^{p,q} | λ_k^{p,q} f = f^(p,q)(z_k), (p, q) ∈ I_k ⊂ N², k = 1, ..., N}

and the sequence Λ_{B,ν_k}, k = 1, ..., N, of subsets of Λ_B:

Λ_{B,ν_k}(f) = {λ_{k+j}^{p,q} ∈ Λ_B(f) | (p, q) ∈ I_{k+j}, j = 0, 1, ..., ν_k − 1},   (2.3.50)
with ν_k < N, k = 1, ..., N, and z_{N+i} = z_i, i ≥ 1.

Denote by B_k^{n_k} the interpolation operator of total degree n_k corresponding to the subset Λ_{B,ν_k}, i.e.,

λ_{k+j}^{p,q}(B_k^{n_k} f) = λ_{k+j}^{p,q}(f), (p, q) ∈ I_{k+j}, j = 0, 1, ..., ν_k − 1.   (2.3.51)

Taking into account that

(B_k^{n_k} f)(x, y) = Σ_{i+j≤n_k} a_ij x^i y^j,

the interpolation conditions (2.3.51) give rise to a linear algebraic system in the unknowns a_ij, i, j ∈ N, i + j ≤ n_k. Let M_k := M(Λ_{B,ν_k}) be the matrix of this system. If |Λ_{B,ν_k}| = m_k := (n_k + 1)(n_k + 2)/2 then M_k is a square matrix. Hence, for det M_k ≠ 0 the system has a unique solution. It follows that, if det M_k ≠ 0 for all k = 1, ..., N, then (2.3.50) is an admissible sequence.
Definition 2.3.39. Suppose that det M_k ≠ 0, k = 1, ..., N. The operator S^B_{n_1,...,n_N} defined by

(S^B_{n_1,...,n_N} f)(x, y) = Σ_{k=1}^{N} A_k(x, y) (B_k^{n_k} f)(x, y)   (2.3.52)

is called the bivariate Shepard-Birkhoff operator.
Theorem 2.3.40. If Λ_{B,ν_k}, k = 1, ..., N, is an admissible sequence then, for µ > M := max_{1≤k≤N} {µ_k | µ_k = max{p + q | (p, q) ∈ I_k}}, we have

(S^B_{n_1,...,n_N} f)^(p,q)(x_k, y_k) = f^(p,q)(x_k, y_k), (p, q) ∈ I_k, k = 1, ..., N,   (2.3.53)

and

dex(S^B_{n_1,...,n_N}) = m := min{n_k | k = 1, ..., N}.   (2.3.54)
Proof. For µ > M, relation (2.3.39) implies

(S^B_{n_1,...,n_N} f)^(p,q)(x_i, y_i) = Σ_{k=1}^{N} (A_k B_k^{n_k} f)^(p,q)(x_i, y_i) = Σ_{k=1}^{N} A_k(x_i, y_i) (B_k^{n_k} f)^(p,q)(x_i, y_i).

Now, from (2.3.23) and (2.3.51) the interpolation property (2.3.53) follows. Relation (2.3.54) follows by Remark 2.2.43.
Remark 2.3.41. A particular case of the operator S^B_{n_1,...,n_N} is obtained for n_k = n, k = 1, ..., N, i.e., when all the polynomials B_k^{n_k} f have the same degree n. One obtains

(S_n^B f)(x, y) = Σ_{k=1}^{N} A_k(x, y) (B_k^n f)(x, y).
Remark 2.3.42. The Shepard operator of Birkhoff type S_n^B is an extension of all the previous Shepard operators of Lagrange, Taylor and Hermite type.

The difficult problem which appears in the Hermite and Birkhoff cases is to prove the admissibility of the considered sequence of subsets of Λ_B. Theorem 2.3.15 gives a sufficient condition for the admissibility of such sequences for the Shepard operators of Lagrange type. Also, the problem of sequence admissibility is completely solved for the Shepard operators of Taylor type.
Example 2.3.43. For g the same as in Example 2.3.38 and 10 random nodes, uniformly distributed on the domain of g, in Figure 2.6 we plot S_2^B g for µ = 2, 20.
[Figure 2.6 shows: (a) the interpolant S_2^B g for µ = 2; (b) the interpolant S_2^B g for µ = 20.]
Figure 2.6: Bivariate Shepard-Birkhoff interpolants.
2.4 Uniform approximation
Let f : [a, b] → R be an integrable function, λ_k(f), k = 0, 1, ..., m, some information about f, w : [a, b] → R_+ a weight function, also integrable on [a, b], and

I(f) = ∫_a^b w(x) f(x) dx.
2.4.1 The Weierstrass theorems
Let B be a given class of real-valued functions defined on an interval [a, b] ⊂ R and A ⊂ B. One considers the following problem: for given f ∈ B and ε ∈ R_+, find a function g ∈ A such that

|f(x) − g(x)| < ε, ∀x ∈ [a, b].

The solution of this problem is based on the Weierstrass theorems, usually called the "First Weierstrass Theorem" and the "Second Weierstrass Theorem", respectively.
Theorem 2.4.1. (First Weierstrass Theorem) For any f ∈ C[a, b] and ε > 0there exists an algebraic polynomial P (P ∈ P) such that
|f(x)− P (x)| < ε, ∀x ∈ [a, b].
In other words, the set P (P ⊂ C[a, b]) is dense in the Banach space C[a, b].
Next we give some other equivalent formulations of this theorem.
Theorem 2.4.2. Any function f ∈ C[a, b] is the limit of a sequence of algebraic polynomials, uniformly convergent to f on [a, b].

Theorem 2.4.3. For any function f ∈ C[a, b], a series of algebraic polynomials, absolutely and uniformly convergent to f on [a, b], can be found.
Theorem 2.4.4. (Second Weierstrass Theorem) The set T of all trigonomet-ric polynomials is dense in C(T ), where C(T ) denotes the class of continuousfunctions of period 2π.
Remark 2.4.5. As Theorem 2.4.4 can be obtained as a consequence of Theorem 2.4.1, we shall focus on the First Weierstrass Theorem and present some of the many existing proofs.
A proof of the Weierstrass theorem based on probability theory. In what follows, we present the well-known proof given by S. Bernstein in 1912, essentially in the way it was given. Because he used probabilistic tools, the interval I = [0, 1] is considered. Under these circumstances, we give another statement of the Weierstrass theorem:
Theorem 2.4.6. If F(x) is a continuous function everywhere in the interval [0, 1], it is always possible to determine a polynomial of degree n,

E_n(x) = a_0 x^n + a_1 x^{n−1} + · · · + a_n,

with n large enough, such that

|F(x) − E_n(x)| < ε

at every point of [0, 1], for any ε, no matter how small it is.
Proof. We consider an event A of probability x. Suppose that we make n experiments and we pay to a gambler the amount of money F(m/n) if the event A appears m times. Under these circumstances, the mean value E_n of the gambler's gain will have the following expression:

E_n = Σ_{m=0}^{n} F(m/n) C_n^m x^m (1 − x)^{n−m}.   (2.4.1)
From the continuity of the function F(x), it is possible to fix a number δ such that the inequality

|x − x_0| ≤ δ

implies

|F(x) − F(x_0)| < ε/2.

Denoting by F̄(x) the maximum and by F̱(x) the minimum of F(x) in (x − δ, x + δ), we get

F̄(x) − F(x) < ε/2, F(x) − F̱(x) < ε/2.   (2.4.2)
Let η be the probability of the inequality |x − m/n| > δ and L the maximum of |F(x)| in [0, 1]. Then we have:

F̱(x)(1 − η) − Lη < E_n < F̄(x)(1 − η) + Lη.   (2.4.3)

But, according to Bernoulli's theorem, we may take n so big that

η < ε/(4L).   (2.4.4)
Then, the inequality (2.4.3) can be written as follows:

F(x) + (F̱(x) − F(x)) − η(L + F̱(x)) < E_n < F(x) + (F̄(x) − F(x)) + η(L − F̄(x)),

and then

F(x) − ε/2 − (2L/(4L))ε < E_n < F(x) + ε/2 + (2L/(4L))ε,

and so

|F(x) − E_n| < ε.   (2.4.5)

Now, E_n is a polynomial of degree n. So, the theorem is proved.

Remark on the Bernstein polynomial. The Bernstein polynomials
can be written in the more general form

b(x) = Σ_{k=0}^{n} β_k C_n^k x^k (1 − x)^{n−k},   (2.4.6)

where β_0, ..., β_n are given coefficients. The case β_k = 1, k = 0, ..., n, generates the relation

Σ_{k=0}^{n} C_n^k x^k (1 − x)^{n−k} = (x + (1 − x))^n = 1,

a fact which is already well known, being the sum of the probabilities in the distribution of the random variable introduced by Bernstein (even if not very clearly emphasized!). What is more interesting is the fact that, if b = 0 identically, then all the coefficients β_k are zero.
By induction: the base case n = 0 is obvious. When n > 0, we see that the derivative of a Bernstein polynomial can be written as another Bernstein polynomial:

b′(x) = n Σ_{k=0}^{n−1} (β_{k+1} − β_k) C_{n−1}^k x^k (1 − x)^{n−1−k}.   (2.4.7)

In particular, if b = 0, it follows by the induction hypothesis that all β_k are equal, and then they are all zero, by the identity above.

In other words, the polynomials x^k(1 − x)^{n−k}, k = 0, ..., n, are linearly independent, and hence they span the (n + 1)-dimensional space of polynomials of degree ≤ n. Thus, every polynomial can be written as a Bernstein polynomial.
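Formula (2.4.1) can be evaluated directly, and the uniform convergence can be observed numerically (a minimal sketch; the names are ours):

```python
from math import comb

def bernstein(F, n, x):
    """Bernstein polynomial E_n of (2.4.1) for F on [0, 1]."""
    return sum(F(m / n) * comb(n, m) * x ** m * (1 - x) ** (n - m)
               for m in range(n + 1))

# partition of unity: the binomial probabilities sum to 1
assert abs(bernstein(lambda t: 1.0, 10, 0.3) - 1.0) < 1e-12

# the uniform error decreases as n grows, even for a non-smooth F
F = lambda x: abs(x - 0.5)
err = lambda n: max(abs(F(x) - bernstein(F, n, x)) for x in
                    [k / 200 for k in range(201)])
assert err(256) < err(16)
```

For this continuous but non-smooth F the error decays slowly (on the order of 1/√n near the kink), which illustrates why the Bernstein construction is valued for its simplicity rather than its speed of convergence.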
A proof of the Weierstrass theorem using beta functions. Let us define, for 0 < α < 1,

U_α(x) = x^α (1 − x)^{1−α}.
These functions are related to the well-known beta functions. They are increasing on the interval [0, α] and decreasing on the interval [α, 1].
Lemma 2.4.7. Let 0 ≤ a < b ≤ α or α ≤ b < a ≤ 1. Then

U_α(a)/U_α(b) ≤ e^{−2(a−b)²}.
Proof. Let us assume that 0 ≤ a < b ≤ α. It suffices to prove that the function

H(x) = ln(U_α(x)) − ln(U_α(b)) + 2(x − b)²

is negative for 0 ≤ x < b. A simple computation of the derivative of H gives us

H′(x) = (α − x)/(x(1 − x)) + 4(x − b) ≥ 4(α − x) + 4(x − b) ≥ 0,

because x(1 − x) ≤ 1/4. Hence H is increasing on [0, b), and this ends the proof, since H(b) = 0.

The case α ≤ b < a ≤ 1 can be proved in a similar way.
Lemma 2.4.8. Let ε > 0, µ > 0 and 0 ≤ p ≤ n. There exists N such that for n > N,

Σ_{|k−p|>2µn} C_n^k x^k (1 − x)^{n−k} < ε,

for

x ∈ (p/n − µ, p/n + µ).
Proof. Let A be the set of all 0 ≤ k ≤ n such that |k − p| ≥ 2µn. Let B be the part of A of elements less than p, and let C be the part of A of elements greater than p. Define

s = (p − µn)/n, r = (p − 2µn)/n.

Notice that

x^k (1 − x)^{n−k} = (U_α(x))^n, where α = k/n.

For k ∈ B and x ∈ (p/n − µ, p/n + µ) we have α ≤ r ≤ r + µ = s < x, so that Lemma 2.4.7 can be applied:

x^k (1 − x)^{n−k} = (U_α(x))^n ≤ (U_α(s))^n ≤ (e^{−2µ²} U_α(r))^n = e^{−2µ²n} r^k (1 − r)^{n−k} ≤ (ε/2) r^k (1 − r)^{n−k}
if

n > log(2/ε)/(2µ²).

Finally,

Σ_{k∈B} C_n^k x^k (1 − x)^{n−k} ≤ Σ_{k∈B} C_n^k (ε/2) r^k (1 − r)^{n−k} ≤ (ε/2) Σ_{k=0}^{n} C_n^k r^k (1 − r)^{n−k} = ε/2.

A similar inequality can be derived with the set C instead of B. Adding both inequalities, we obtain the assertion of Lemma 2.4.8.
Remark 2.4.9. If the summation in the statement of Lemma 2.4.8 extends over all 0 ≤ k ≤ n, then the result of the summation is 1.
A proof of the Weierstrass theorem using the concept of convolution.

Definition 2.4.10. If f and g are suitable functions on R, then the convolution f ∗ g is the function defined by

(f ∗ g)(x) = ∫_{−∞}^{∞} f(x − t) g(t) dt,

where "suitable" means that the integral exists.
Remark 2.4.11. In what follows, we need the Dirac delta function δ, which has the properties:

δ(x) = 0 if x ≠ 0 and ∫_{−∞}^{∞} δ(t) dt = 1.

We should think of it as "the density function of a unit mass at the origin". For example,

(f ∗ δ)(x) = ∫_{−∞}^{∞} f(x − t) δ(t) dt = f(x),

since δ(t) = 0 except at t = 0.
Remark 2.4.12. Having in mind the previous remark, what we have to do now is to find a sequence of functions (K_n) which approximates the δ-function. The sequence (K_n ∗ f) will then approximate δ ∗ f = f.

Definition 2.4.13. The n-th Landau kernel function is

K_n(x) = c_n (1 − x²)^n for x ∈ [−1, 1], and 0 otherwise,

where c_n is chosen so that

∫_{−∞}^{∞} K_n = 1.
Lemma 2.4.14. If f is a continuous function on the interval [−1, 1], then K_n ∗ f is a polynomial.

Proof. We have

(K_n ∗ f)(x) = ∫_{−1}^{1} K_n(x − t) f(t) dt,

and K_n is a polynomial, so K_n(x − t) can be expanded as

g_0(t) + g_1(t)x + · · · + g_{2n}(t)x^{2n},

and hence the integral is a polynomial in x.
Lemma 2.4.15. The sequence

(K_n ∗ f) → f for n → ∞.

Proof. We need to show that K_n has "most of its area" concentrated near x = 0. First we estimate how big c_n is:

∫_{−1}^{1} (1 − t²)^n dt = 2 ∫_{0}^{1} (1 − t)^n (1 + t)^n dt ≥ 2 ∫_{0}^{1} (1 − t)^n dt = 2/(n + 1).

Since ∫_{−∞}^{∞} K_n = 1, we must have c_n ≤ (n + 1)/2. (In fact, c_n grows like a multiple of √n; for large n, c_n is approximately 0.565√n.)
Concerning the area under K_n which is not near 0, we have

∫_{δ}^{1} K_n(t) dt = ∫_{δ}^{1} c_n (1 − t²)^n dt ≤ [(n + 1)/2] ∫_{δ}^{1} (1 − δ²)^n dt = [(n + 1)/2] (1 − δ²)^n (1 − δ),

since K_n is decreasing on [δ, 1].
If r = 1 − δ², then (n + 1) r^n → 0 as n → ∞.

The function f is continuous and bounded, say by M. If x ∈ [0, 1], then given ε > 0 we may find δ > 0 such that if |t| < δ then |f(x − t) − f(x)| < ε. So now look at the convolution K_n ∗ f:

|f(x) − (K_n ∗ f)(x)| ≤ ∫_{−1}^{1} |f(x) − f(x − t)| K_n(t) dt = ∫_{−1}^{−δ} A + ∫_{−δ}^{δ} A + ∫_{δ}^{1} A,

where A denotes the integrand |f(x) − f(x − t)| K_n(t).

On [−1, −δ] and on [δ, 1], K_n(t) is small when n is large. In fact, we can choose n so that K_n(t) < ε/(2M) there, and then the first and the third integrals are each less than ε.

For the middle integral,

∫_{−δ}^{δ} |f(x) − f(x − t)| K_n(t) dt ≤ ∫_{−δ}^{δ} ε K_n(t) dt < ε,

since

∫_{−1}^{1} K_n(t) dt = 1.

Thus, |f(x) − (K_n ∗ f)(x)| is small when n is large, and we have our convergence. This completes the proof of the Weierstrass approximation theorem.
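The convergence can be observed numerically by discretizing the convolution (a rough sketch with hypothetical names; the integrals are approximated by Riemann sums):

```python
import numpy as np

def landau_convolution(f, n, x, grid=2001):
    """Numerical approximation of (K_n * f)(x), with the Landau kernel
    K_n(t) = c_n (1 - t^2)^n on [-1, 1] normalized to integrate to 1."""
    t = np.linspace(-1.0, 1.0, grid)
    dt = t[1] - t[0]
    kn = (1.0 - t ** 2) ** n
    kn /= kn.sum() * dt                  # fixes c_n so that K_n integrates to 1
    return (kn * f(x - t)).sum() * dt    # discretized (K_n * f)(x)

f = lambda s: np.abs(s)                  # continuous, not smooth at 0
# the error at an interior point decreases as n grows
e = lambda n: abs(landau_convolution(f, n, 0.25) - f(0.25))
assert e(200) < e(10)
```

As n grows the kernel concentrates near t = 0 (its effective width shrinks roughly like 1/√n), which is exactly the "most of its area near 0" property used in the proof.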
2.4.2 The Stone-Weierstrass theorem
The generalization of the Weierstrass approximation theorem is known as the Stone-Weierstrass theorem.
Theorem 2.4.16. Let X be a compact metric space and let C0(X, R) be the algebra of continuous real functions defined on X. Let A be a subalgebra of C0(X, R) for which the following conditions hold:

(1) ∀x, y ∈ X, x ≠ y, ∃f ∈ A : f(x) ≠ f(y);

(2) 1 ∈ A.

Then A is dense in C0(X, R).
Proof. Let Ā denote the closure of A in C0(X, R), with respect to the uniform convergence topology. We want to show that, if conditions (1) and (2) are satisfied, then Ā = C0(X, R).

1. Firstly, we show that if f ∈ Ā, then |f| ∈ Ā. Since f is a continuous function on a compact space, f must be bounded: there exist constants a and b such that a ≤ f ≤ b. By the Weierstrass approximation theorem, for every ε > 0 there exists a polynomial P such that

|P(x) − |x|| < ε for x ∈ [a, b].

Define g : X → R by g(x) = P(f(x)). Since Ā is an algebra, g ∈ Ā. For all x ∈ X, |g(x) − |f(x)|| < ε. Since Ā is closed under the uniform convergence topology, this implies that |f| ∈ Ā.

A corollary of the fact just proven is that if f, g ∈ Ā, then max(f, g) ∈ Ā and min(f, g) ∈ Ā, because one can write

max(a, b) = (1/2)((a + b) + |a − b|),

min(a, b) = (1/2)((a + b) − |a − b|).
2. Secondly, we shall show that, for every f ∈ C0(X, R), x ∈ X and ε > 0, there exists g_x ∈ Ā such that g_x < f + ε and g_x(x) > f(x). By condition (1), if y ≠ x, there exists a function h_{xy} ∈ A such that h_{xy}(x) ≠ h_{xy}(y). Define h̄_{xy} by

h̄_{xy}(z) = p h_{xy}(z) + q,

where the constants p and q are chosen so that

h̄_{xy}(x) = f(x) + ε/2,
h̄_{xy}(y) = f(y) − ε/2.

By condition (2), h̄_{xy} ∈ A. For every y ≠ x, define the set

U_{xy} = {z ∈ X | h̄_{xy}(z) < f(z) + ε}.
Since f and h̄_{xy} are continuous, U_{xy} is an open set. Because x ∈ U_{xy} and y ∈ U_{xy}, the family {U_{xy} | y ∈ X \ {x}} is an open cover of X. By the definition of a compact space, there must exist a finite subcover. In other words, there exists a finite subset {y_1, y_2, ..., y_n} ⊂ X such that

X = ∪_{m=1}^{n} U_{x y_m}.
Define g_x = min{h̄_{x y_1}, ..., h̄_{x y_n}}. By the corollary of the first part of the proof, g_x ∈ Ā. By construction,

g_x(x) = f(x) + ε/2  and  g_x < f + ε.
3. Third, we shall show that, for every f ∈ C⁰(X, R) and every ε > 0, there exists a function g ∈ Ā such that f ≤ g < f + ε. This will complete the proof, because it implies that Ā = C⁰(X, R). For every x ∈ X, define the set V_x as

V_x = {z ∈ X | g_x(z) > f(z)},

where g_x is defined as before. Since f and g_x are continuous, V_x is an open set. Because g_x(x) = f(x) + ε/2 > f(x), we have x ∈ V_x. Hence {V_x | x ∈ X} is an open cover of X. By the definition of a compact space, there must exist a finite subcover. In other words, there exists a finite subset {x_1, ..., x_n} ⊂ X such that

X = ∪_{m=1}^{n} V_{x_m}.

Define g as

g(z) = max{g_{x_1}(z), ..., g_{x_n}(z)}.

By the corollary of the first part of the proof, g ∈ Ā. By construction, g > f. Since g_x < f + ε for every x ∈ X, we have g < f + ε.
So, the Stone-Weierstrass theorem is proved.
2.4.3 Positive linear operators
Another method that can be used to prove density is based on what is called the Popoviciu-Bohman-Korovkin Theorem. A primitive form of this theorem was proved by Bohman in 1952. His proof, and the main idea of his approach, was a generalization of Bernstein's proof of the Weierstrass theorem.
One year later, Korovkin proved the same theorem for integral-type operators. Korovkin's original proof is in fact based on positive singular integrals, and there are very obvious links to Lebesgue's work on singular operators which, in turn, was motivated by various proofs of the Weierstrass theorem. In 1950, Popoviciu also gave a proof of Weierstrass' theorem using interpolation polynomials.
2.4.3.1 Popoviciu-Bohman-Korovkin Theorem
Let (L_n) be a sequence of positive linear operators mapping C[a, b] into itself. Assume that

lim_{n→∞} L_n(x^i) = x^i,  i = 0, 1, 2,

and the convergence is uniform on [a, b]. Then

lim_{n→∞} (L_n f)(x) = f(x),

uniformly on [a, b], for every f ∈ C[a, b].
Remark 2.4.17. A similar result holds in the periodic case C[0, 2π], where the "test functions" are 1, sin x and cos x.
How can the Popoviciu-Bohman-Korovkin Theorem be applied to obtain density results? In theory it can be easily applied. If U_n = span{u_1, ..., u_n}, n = 1, 2, ..., is a nested sequence of finite-dimensional subspaces of C[a, b], and L_n is a positive linear operator mapping C[a, b] into U_n that satisfies the condition of the above theorem, then the (u_k)_{k=1}^{∞} span a dense subset of C[a, b]. In practice it is all too rarely applied in this manner.
Remark 2.4.18. The importance of the Popoviciu-Bohman-Korovkin Theorem lies primarily in that it presents conditions implying convergence, and also in that it provides calculable error bounds on the rate of approximation.
Remark 2.4.19. One immediate application of the theorem is a proof of the convergence of the Bernstein polynomials B_n(f) to f for each f ∈ C[0, 1]. We may consider (B_n) as a sequence of positive linear operators mapping C[0, 1] into P_n, the space of algebraic polynomials of degree at most n. It is verified that

B_n(1; x) = 1,
B_n(x; x) = x,
B_n(x²; x) = x² + x(1 − x)/n,

for all n ≥ 1. Thus, by the Popoviciu-Bohman-Korovkin Theorem, it follows that B_n f converges uniformly to f on [0, 1].
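These identities, and the resulting convergence, are easy to check numerically. The sketch below (the helper name `bernstein` is our own, not from the text) evaluates B_n(f; x) directly from the definition and verifies the three test functions:

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate the Bernstein polynomial B_n(f; x) on [0, 1]."""
    return sum(comb(n, k) * x**k * (1 - x)**(n - k) * f(k / n)
               for k in range(n + 1))

# The three Korovkin test functions at a sample point.
n, x = 10, 0.3
assert abs(bernstein(lambda t: 1.0, n, x) - 1.0) < 1e-12
assert abs(bernstein(lambda t: t, n, x) - x) < 1e-12
assert abs(bernstein(lambda t: t * t, n, x) - (x * x + x * (1 - x) / n)) < 1e-12
```

The same three checks at any other point x ∈ [0, 1] succeed, which is exactly the hypothesis of the theorem for the Bernstein sequence.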
2.4.3.2 Modulus of continuity
The modulus of continuity is one of the basic characteristics of continuous functions. For a continuous function f on a closed interval, it is defined as

ω(f, δ) = max_{|h|≤δ} max_x |f(x + h) − f(x)|,  for δ ≥ 0.
This definition was introduced by H. Lebesgue in 1910, although in essence the concept was known earlier. If the modulus of continuity of a function f satisfies the condition

ω(f, δ) ≤ M δ^α,

where 0 < α < 1, then f is said to satisfy a Lipschitz condition of order α.
Remark 2.4.20. The modulus of continuity can also be defined in the following manner:

ω(f, δ) = sup{|f(u) − f(v)| : u, v ∈ I, |u − v| ≤ δ},

for f ∈ C(I), where I is an interval.
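As an illustration only (the grid-based helper below is our own construction, and it only approximates the supremum), ω(f, δ) can be estimated by sampling:

```python
def modulus_of_continuity(f, a, b, delta, n=2000):
    """Grid approximation of ω(f, δ) = sup{|f(u) − f(v)| : u, v in [a, b], |u − v| ≤ δ}."""
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    ys = [f(x) for x in xs]
    steps = int(delta / h)          # number of grid steps that fit inside δ
    return max(abs(ys[i + s] - ys[i])
               for i in range(n + 1)
               for s in range(1, steps + 1)
               if i + s <= n)

# For f(x) = x² on [0, 1], ω(f, δ) = 2δ − δ² (attained near x = 1).
w = modulus_of_continuity(lambda x: x * x, 0.0, 1.0, 0.1)
assert abs(w - (2 * 0.1 - 0.1**2)) < 1e-2
```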
Remark 2.4.21. In what follows, we show some applications of the modulus of continuity to the study of the rapidity of convergence of a sequence of linear operators L_n to a function f, in connection with the Popoviciu-Bohman-Korovkin Theorem.
2.4.3.3 Popoviciu-Bohman-Korovkin Theorem in a quantitative form
We shall estimate the rapidity of convergence of L_n(f) to f in terms of the rapidity of convergence of L_n(1) to 1, L_n(x) to x and L_n(x²) to x².
Theorem 2.4.22. Let −∞ < a < b < ∞, and let L_1, L_2, ... be positive linear operators, all having the same domain D, which contains the restrictions of 1, t, t² to [a, b]. For n = 1, 2, ... suppose L_n(1) is bounded. Let f ∈ D be continuous in [a, b], with modulus of continuity ω. Then, for n = 1, 2, ...,

‖f − L_n(f)‖ ≤ ‖f‖ · ‖L_n(1) − 1‖ + ‖L_n(1) + 1‖ ω(µ_n),   (2.4.8)

where

µ_n = ‖(L_n([t − x]²))(x)‖^{1/2},   (2.4.9)

and ‖·‖ stands for the sup norm over [a, b]. In particular, if L_n(1) = 1, as is often the case, (2.4.8) reduces to

‖f − L_n(f)‖ ≤ 2ω(µ_n).   (2.4.10)
Remark 2.4.23. In forming L_n([t − x]²), x is held fixed, while t forms the functions t, t² on which L_n operates.
Remark 2.4.24. Observe that (2.4.9) implies, for n = 1, 2, ...,

µ_n² ≤ ‖L_n(t²)(x) − x²‖ + 2c ‖L_n(t)(x) − x‖ + c² ‖L_n(1) − 1‖,

where c = max(|a|, |b|). Hence, if L_n(t^k)(x) converges uniformly to x^k in [a, b], for k = 0, 1, 2, then µ_n → 0, and we have a simple estimate of µ_n in terms of ‖L_n(t^k)(x) − x^k‖, k = 0, 1, 2.
Proof. Let x ∈ [a, b] and let δ be a positive number. Let t ∈ [a, b]. If |t − x| > δ, then

|f(t) − f(x)| ≤ ω(|t − x|) = ω(|t − x| δ⁻¹ δ) ≤ (1 + |t − x| δ⁻¹) ω(δ) ≤ [1 + (t − x)² δ⁻²] ω(δ).

The inequality

|f(t) − f(x)| ≤ [1 + (t − x)² δ⁻²] ω(δ)

also holds if |t − x| ≤ δ. Let n be a positive integer. Then

|(L_n f − f(x) L_n(1))(x)| ≤ ω(δ) [(L_n(1) + δ⁻² L_n([t − x]²))(x)] ≤ ω(δ) [L_n(1)(x) + (µ_n/δ)²].
If µ_n > 0, take δ = µ_n. Then

|[L_n(f) − f(x) L_n(1)](x)| ≤ ω(µ_n) ‖L_n(1) + 1‖,   (2.4.11)
|−f(x) + f(x) L_n(1)(x)| ≤ ‖f‖ · ‖L_n(1) − 1‖.

Adding, we obtain (2.4.8). If µ_n = 0, we have, for every positive δ,

|(L_n(f) − f(x) L_n(1))(x)| ≤ ω(δ) L_n(1)(x).

Letting δ → 0⁺, we obtain (L_n f)(x) = f(x) L_n(1)(x). Then

|[f − L_n(f)](x)| ≤ ‖f‖ · ‖L_n(1) − 1‖,

which implies (2.4.8).
Example 2.4.25. Let D be the set of all real functions with domain [0, 1]. For n = 1, 2, ... let L_n be the positive linear operator defined by

(L_n φ)(x) = Σ_{k=0}^{n} C(n, k) x^k (1 − x)^{n−k} φ(k/n),

with C(n, k) denoting the binomial coefficient. Let f be a real function with domain [0, 1], continuous there, with modulus of continuity ω. Let n be a positive integer. Then

L_n(1) = 1,  [L_n(t)](x) = x,  [L_n(t²)](x) = (n − 1)x²/n + x/n,

(L_n([t − x]²))(x) = (x − x²)/n,

and by Theorem 2.4.22,

max_{0≤x≤1} |f(x) − (L_n f)(x)| ≤ 2ω(1/(2√n)) ≤ 2ω(1/√n).   (2.4.12)
We have thus obtained the known result for Bernstein polynomials. For some universal constant C and for n = 1, 2, ..., the left-hand side of (2.4.12) is bounded above by C ω(1/√n).

Remark 2.4.26. For more details about this constant C, see the bibliography.
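As a numerical illustration of (2.4.12), one can take the 1-Lipschitz function f(x) = |x − 1/2| (our choice), for which ω(f, δ) = δ, and check that the actual uniform error of the Bernstein polynomials stays below 2ω(1/(2√n)) = 1/√n:

```python
from math import comb, sqrt

def bernstein(f, n, x):
    """Bernstein polynomial B_n(f; x) on [0, 1]."""
    return sum(comb(n, k) * x**k * (1 - x)**(n - k) * f(k / n)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)          # 1-Lipschitz, so ω(f, δ) = δ
for n in (4, 16, 64):
    xs = [i / 200 for i in range(201)]
    err = max(abs(f(x) - bernstein(f, n, x)) for x in xs)
    bound = 2 * (1 / (2 * sqrt(n)))  # 2 ω(1/(2√n)) from (2.4.12)
    assert err <= bound
```

In practice the bound is generous: the observed error decays like 1/√n, with a noticeably smaller constant.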
Theorem 2.4.27. Let −∞ < a < b < ∞ and let L_1, L_2, ... be positive linear operators, all having the same domain D, which contains the bounded functions f_0, f_1, f_2 with domain [a, b]. We assume that the restriction of 1 to [a, b] belongs to D, and that, for n = 1, 2, ..., L_n(1) is bounded. Let a_0(x), a_1(x), a_2(x) be real functions, defined and bounded in [a, b], and assume that for every t, x ∈ [a, b],

F(t, x) = Σ_{k=0}^{2} a_k(x) f_k(t) ≥ K(t − x)²,  F(x, x) = 0,   (2.4.13)

where K is a positive constant depending on neither x nor t. Let f ∈ D be continuous in [a, b], with modulus of continuity ω. Then, for n = 1, 2, ... we have again

‖f − L_n f‖ ≤ ‖f‖ · ‖L_n(1) − 1‖ + ‖L_n(1) + 1‖ ω(µ_n),   (2.4.14)

where

µ_n = (‖L_n F(t, x)(x)‖ / K)^{1/2},

and ‖·‖ stands for the sup norm over [a, b]. In particular, if L_n(1) = 1, as is often the case, (2.4.14) reduces to

‖f − L_n f‖ ≤ 2ω(µ_n).
Remark 2.4.28. Since

‖(L_n F(t, x))(x)‖ ≤ Σ_{k=0}^{2} ‖a_k(x)‖ · ‖f_k − L_n f_k‖  (n = 1, 2, ...),

if L_n f_k converges uniformly to f_k in [a, b], for k = 0, 1, 2, then µ_n → 0, and one can estimate the rapidity of convergence of µ_n in terms of those of L_n f_0, L_n f_1 and L_n f_2.
Proof. Let x ∈ [a, b] and let δ > 0. Let t ∈ [a, b]. If |t − x| > δ, then

|f(t) − f(x)| ≤ ω(|t − x|) = ω(|t − x| δ⁻¹ δ) ≤ (1 + |t − x| δ⁻¹) ω(δ)
             ≤ [1 + (t − x)² δ⁻²] ω(δ) ≤ [1 + K⁻¹ F(t, x) δ⁻²] ω(δ).

The inequality

|f(t) − f(x)| ≤ [1 + K⁻¹ F(t, x) δ⁻²] ω(δ)

obviously also holds if |t − x| ≤ δ. Let n be a positive integer. Then

|(L_n f − f(x) L_n(1))(x)| ≤ ω(δ) [(L_n(1) + δ⁻² K⁻¹ L_n F(t, x))(x)] ≤ ω(δ) [(L_n(1))(x) + (µ_n/δ)²].

We then proceed as in the proof of Theorem 2.4.22.
Remark 2.4.29. For f_0(t) = 1, f_1(t) = t and f_2(t) = t², we observe that Theorem 2.4.22 is a special case of Theorem 2.4.27.
Chapter 3
Numerical integration
Definition 3.0.30. The formula

I(f) = Q(f) + R(f),   (3.0.1)

where

Q(f) = Σ_{k=0}^{m} A_k λ_k(f),

is called a numerical integration formula (for the function f), or a quadrature formula; A_k, k = 0, 1, ..., m, are called the coefficients of the quadrature formula and R(f) is the remainder term.
Definition 3.0.31. The natural number r such that R(f) = 0 for all f ∈ P_r, and for which there exists g ∈ P_{r+1} such that R(g) ≠ 0, is called the degree of exactness of the quadrature Q, i.e.,

dex(Q) = r.

Remark 3.0.32. The linearity of the remainder operator R implies that dex(Q) = r if and only if

R(e_i) = 0,  i = 0, 1, ..., r,

and

R(e_{r+1}) ≠ 0,

with e_i(x) = x^i.
Remark 3.0.33. Usually, the information λ_k(f), k = 0, 1, ..., m, are the values of f or of certain of its derivatives at some points x_i ∈ [a, b], i = 0, 1, ..., n, called the quadrature nodes.

So, let f ∈ H^{r,2}[a, b], and let

Λ_B = {λ_{kj} : H^{r,2}[a, b] → R | λ_{kj}(f) = f^{(j)}(x_k), k = 0, 1, ..., m; j ∈ I_k},

with I_k ⊆ {0, 1, ..., r_k}, r_k ∈ N, r_k < r, k = 0, 1, ..., m, be the set of Birkhoff-type functionals (the most general pointwise evaluations of f and some of its derivatives), i.e.,

Λ_B(f) = {λ_{kj}(f) | k = 0, 1, ..., m; j ∈ I_k}.

In this case the quadrature formula is:

I(f) = Q_n(f) + R_n(f),   (3.0.2)

where

Q_n(f) = Σ_{k=0}^{m} Σ_{j∈I_k} A_{kj} f^{(j)}(x_k),

with n = |I_0| + ... + |I_m| − 1. The problem that arises is to find the coefficients A_{kj} and the nodes x_k, k = 0, 1, ..., m; j ∈ I_k, and to study the remainder term (the approximation error) for the obtained values of the coefficients and nodes. With respect to the conditions used to determine the quadrature parameters A_{kj} and x_k, k = 0, 1, ..., m; j ∈ I_k, quadrature formulas can be classified as follows:
- quadrature formulas of interpolatory type, obtained by integrating an interpolation formula;
- quadrature formulas of Gauss type, i.e., with the maximum degree of exactness;
- quadrature formulas of Chebyshev type, i.e., with equal coefficients and the maximum degree of exactness;
- optimal quadrature formulas.

Next, we deal with the last class.
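As a sketch of Definition 3.0.31, the degree of exactness of a Lagrange-type quadrature Q(f) = Σ A_k f(x_k) can be estimated by testing the monomials e_i; the helper below and the Simpson-rule test case are our own illustrative choices, not taken from the text:

```python
def dex(weights, nodes, a, b, max_degree=10, tol=1e-10):
    """Degree of exactness: largest r with R(e_i) = 0 for i = 0..r, e_i(x) = x^i."""
    r = -1
    for i in range(max_degree + 1):
        exact = (b**(i + 1) - a**(i + 1)) / (i + 1)      # ∫_a^b x^i dx
        quad = sum(A * x**i for A, x in zip(weights, nodes))
        if abs(quad - exact) > tol:
            break
        r = i
    return r

# Simpson's rule on [0, 1]: coefficients (1/6, 4/6, 1/6), nodes (0, 1/2, 1).
assert dex([1/6, 4/6, 1/6], [0.0, 0.5, 1.0], 0.0, 1.0) == 3
```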
3.1 Optimal quadrature formulas
Definition 3.1.1. For a given f ∈ H^{r,2}[a, b], the quadrature formula (3.0.2) for which the error |R_n(f)| takes its minimum value with respect to the parameters A_{kj} and x_k, k = 0, 1, ..., m; j ∈ I_k, is called the locally optimal quadrature formula with respect to the error.
Definition 3.1.2. A quadrature formula for which

sup_{f∈H^{r,2}[a,b]} |R_n(f)|

takes its minimum value with respect to its parameters A_{kj}, x_k, k = 0, 1, ..., m; j ∈ I_k, is called a globally optimal formula with respect to the error.

Remark 3.1.3. The parameters A*_{kj} and x*_k, k = 0, 1, ..., m, of an optimal quadrature formula are called the (locally or globally) optimal coefficients, respectively the optimal nodes.
Remark 3.1.4. Some of the parameters of a quadrature formula can be fixed from the beginning. Such formulas are the quadrature formulas with equal coefficients or with fixed (usually equally spaced) nodes.

Very often, quadrature formulas with a given degree of exactness, say r, are studied. In such a case the parameters of the quadrature formula must satisfy the following relations:
Σ_{k=0}^{m} Σ_{j∈I_k} A_{kj} x_k^i = ∫_a^b w(x) x^i dx,  i = 0, 1, ..., r.   (3.1.1)

If r = n + 1 then (3.1.1) becomes an algebraic (n+1) × (n+1) system. If this system has a solution then all the parameters are determined (no free parameters remain), and the optimality problem is superfluous. Such examples are the quadrature formulas of Gauss type or of Chebyshev type.
3.1.1 Optimality in the sense of Sard
This problem will be treated first for some classes of linear functionals, including numerical integration of functions.

Let λ : H^{m,2}[a, b] → R be a linear functional that commutes with the definite integral, i.e.,

λ ∫_a^b = ∫_a^b λ,

and let

Λ = {λ_i | λ_i : H^{m,2}[a, b] → R, i = 1, ..., n}

be a set of given linear functionals such that λ, λ_1, ..., λ_n are linearly independent. For f ∈ H^{m,2}[a, b], one considers the approximation formula:
λ(f) = Σ_{i=1}^{n} A_i λ_i(f) + R(f).   (3.1.2)

Definition 3.1.5. Formula (3.1.2) is called optimal in the sense of Sard if:

1. R(e_j) = 0,  j = 0, 1, ..., m−1,
2. ∫_a^b K²(t) dt is minimal,
(3.1.3)

where e_j(x) = x^j and K is the Peano kernel of the remainder functional R, i.e.,

K(t) = R[(· − t)_+^{m−1} / (m−1)!].

The coefficients of the optimal formula, denoted A*_i, i = 1, ..., n, are called optimal coefficients.
Remark 3.1.6. From condition 1. of (3.1.3) it follows that formula (3.1.2) has degree of exactness at least m − 1.

Remark 3.1.7. If n = m, the parameters A_i, i = 1, ..., n, can be determined from conditions 1. of (3.1.3) alone, which become an m × m linear algebraic system; condition 2. is then superfluous.
First, one supposes that Λ is a set of Lagrange-type functionals:

Λ := Λ_L = {λ_i | λ_i(f) = f(x_i), i = 1, ..., n},

with x_i ∈ [a, b], x_i ≠ x_j for i ≠ j, i, j = 1, ..., n. Formula (3.1.2) becomes

λ(f) = Σ_{i=1}^{n} A_i f(x_i) + R(f),   (3.1.4)

and we have

K(t) = λ_x[(x − t)_+^{m−1}/(m−1)!] − Σ_{i=1}^{n} A_i (x_i − t)_+^{m−1}/(m−1)!.   (3.1.5)
Theorem 3.1.8. If n > m, then the formula of the form (3.1.4) that is optimal in the sense of Sard exists and is unique.
Proof. Using the Lagrange multiplier method, we must find the minimum of the functional

F(A, γ) = (1/2) ∫_a^b K²(t) dt + Σ_{j=0}^{m−1} γ_j R(e_j),

where A = (A_1, ..., A_n) ∈ R^n and γ = (γ_0, ..., γ_{m−1}) ∈ R^m. So, we have to study the linear algebraic system:

∂F(A, γ)/∂A_p = 0,  p = 1, ..., n,
∂F(A, γ)/∂γ_q = 0,  q = 0, 1, ..., m−1,

or

∫_a^b { −λ_x[(x − t)_+^{m−1}/(m−1)!] + Σ_{i=1}^{n} A_i (x_i − t)_+^{m−1}/(m−1)! } (x_p − t)_+^{m−1}/(m−1)! dt + Σ_{j=0}^{m−1} γ_j x_p^j = 0,  p = 1, ..., n,
R(e_q) = 0,  q = 0, 1, ..., m−1.
Using the identities

(x − t)_+^{m−1}/(m−1)! = (x − t)^{m−1}/(m−1)! + (−1)^m (t − x)_+^{m−1}/(m−1)!,
(x_i − t)_+^{m−1}/(m−1)! = (x_i − t)^{m−1}/(m−1)! + (−1)^m (t − x_i)_+^{m−1}/(m−1)!,

and the relation

−λ_x[(x − t)^{m−1}/(m−1)!] + Σ_{i=1}^{n} A_i (x_i − t)^{m−1}/(m−1)! = 0,
based on Remark 3.1.6, one obtains

(−1)^m ∫_a^b { −λ_x[(t − x)_+^{m−1}/(m−1)!] + Σ_{i=1}^{n} A_i (t − x_i)_+^{m−1}/(m−1)! } (x_p − t)_+^{m−1}/(m−1)! dt + Σ_{j=0}^{m−1} γ_j x_p^j = 0,  p = 1, ..., n,
Σ_{i=1}^{n} A_i x_i^q = λ(e_q),  q = 0, 1, ..., m−1.
Taking into account the commuting property of λ and ∫_a^b, the above system can be written in the form:

Σ_{j=0}^{m−1} B_j x_p^j + Σ_{i=1}^{n} A_i (x_p − x_i)_+^{2m−1}/(2m−1)! = λ_x[(x_p − x)_+^{2m−1}/(2m−1)!],  p = 1, ..., n,
Σ_{i=1}^{n} A_i x_i^q = λ(e_q),  q = 0, 1, ..., m−1,
(3.1.6)
where

B_j = (−1)^{m+1} γ_j,  j = 0, 1, ..., m−1,

and

(x_p − x_i)_+^{2m−1}/(2m−1)! = ∫_a^b (t − x_i)_+^{m−1}/(m−1)! · (x_p − t)_+^{m−1}/(m−1)! dt,
(x_p − x)_+^{2m−1}/(2m−1)! = ∫_a^b (t − x)_+^{m−1}/(m−1)! · (x_p − t)_+^{m−1}/(m−1)! dt.
On the other hand, for f ∈ H^{m,2}[a, b] the coefficients of the Lagrange-type interpolation spline function,

(Sf)(x) = Σ_{j=0}^{m−1} a_j x^j + Σ_{i=1}^{n} b_i (x − x_i)_+^{2m−1}/(2m−1)!,

are obtained as a solution of the system

(Sf)(x_p) := Σ_{j=0}^{m−1} a_j x_p^j + Σ_{i=1}^{n} b_i (x_p − x_i)_+^{2m−1}/(2m−1)! = f(x_p),  p = 1, ..., n,
(Sf)^{(q)}(α) := Σ_{i=1}^{n} b_i (α − x_i)_+^{2m−q−1}/(2m−q−1)! = 0,  q = m, ..., 2m−1;  α > x_n.
(3.1.7)

As the last m equations of (3.1.7) can be written in the equivalent form

Σ_{i=1}^{n} b_i x_i^q = 0,  q = 0, 1, ..., m−1,

the two systems, (3.1.6) and (3.1.7), are equivalent. Hence the uniqueness of the spline function Sf implies that the system (3.1.6) has a unique solution, say (A*, γ*), i.e., A* = (A*_1, ..., A*_n) is the unique minimum point of the integral in (3.1.3).
Theorem 3.1.9. For f ∈ H^{m,2}[a, b], if

f = Sf + Rf

is the spline interpolation formula corresponding to Λ := Λ_L, with n > m, then the approximation formula of the form (3.1.4) that is optimal in the sense of Sard is

λ(f) = λ(Sf) + λ(Rf).   (3.1.8)

Proof. As n := |Λ_L| > m, formula (3.1.8) exists and is unique, and we have

f(x) = Σ_{i=1}^{n} s_i(x) f(x_i) + ∫_a^b φ(x, t) f^{(m)}(t) dt,

where

φ(x, t) = (x − t)_+^{m−1}/(m−1)! − Σ_{i=1}^{n} s_i(x) (x_i − t)_+^{m−1}/(m−1)!.
It follows that

λ(f) = Σ_{i=1}^{n} A*_i f(x_i) + R*(f),   (3.1.9)

with

A*_i = λ(s_i),  i = 1, ..., n,

and

R*(f) = ∫_a^b K*(t) f^{(m)}(t) dt,   (3.1.10)

where

K*(t) = λ_x[(x − t)_+^{m−1}/(m−1)!] − Σ_{i=1}^{n} A*_i (x_i − t)_+^{m−1}/(m−1)!.

Formula (3.1.9) is optimal in the sense of Sard if the conditions (3.1.3) are satisfied. The first condition follows directly from (3.1.10), i.e.,

R*(e_j) = 0,  j = 0, 1, ..., m−1.
To prove the second condition, let us suppose that formula (3.1.9) is not optimal. Let

λ(f) = Σ_{i=1}^{n} Ā_i f(x_i) + R̄(f)   (3.1.11)

be the optimal formula. For the remainder term we have (condition 1. from (3.1.3)):

R̄(f) = ∫_a^b K̄(t) f^{(m)}(t) dt,

with

K̄(t) = λ_x[(x − t)_+^{m−1}/(m−1)!] − Σ_{i=1}^{n} Ā_i (x_i − t)_+^{m−1}/(m−1)!.

One considers

g = K̄ − K*,

i.e.,

g(t) = Σ_{i=1}^{n} [Ā_i − A*_i] (x_i − t)_+^{m−1}/(m−1)!.   (3.1.12)
We have

g(t) = 0,  for t ∈ [a, x_1) ∪ (x_n, b].   (3.1.13)

Indeed, for t < x_1 we have

g(t) = Σ_{i=1}^{n} (Ā_i − A*_i) (x_i − t)^{m−1}/(m−1)!.

As the degree of exactness of both formulas (3.1.9) and (3.1.11) is m − 1, it follows that g(t) = 0 for t < x_1. For t > x_n we have (x_i − t)_+ = 0, i = 1, ..., n, and we also get g(t) = 0.

Now, let us define a function s such that s^{(m)} = g. From (3.1.12) and (3.1.13) it follows that

s|_{(x_i, x_{i+1})} ∈ P_{2m−1},  i = 1, ..., n−1,
s|_{[a, x_1) ∪ (x_n, b]} ∈ P_{m−1},

and

s ∈ C^{2m−2}[a, b].

Whence, we have s ∈ S(Λ). Since

Sf = f,  f ∈ S(Λ),

we have

R*(s) := ∫_a^b K*(t) s^{(m)}(t) dt = 0,
i.e.,

∫_a^b K*(t) g(t) dt = 0,

or

∫_a^b K*(t) [K̄(t) − K*(t)] dt = 0.

It follows that

∫_a^b (K̄(t))² dt = ∫_a^b [K̄(t) − K*(t) + K*(t)]² dt
                = ∫_a^b (K*(t))² dt + ∫_a^b [K̄(t) − K*(t)]² dt,

whence

∫_a^b (K̄(t))² dt ≥ ∫_a^b (K*(t))² dt.

As K̄ is the kernel of the optimal formula, it must be that K̄ = K*, i.e., g = 0, or Ā_i = A*_i, i = 1, ..., n.
Remark 3.1.10. Analogous results can be proved when Λ is a set of Hermite- or Birkhoff-type functionals for which the corresponding spline interpolation formula exists.

Suppose now that Λ is a Hermite-type set of functionals on H^{m,2}[a, b],

Λ := Λ_H = {λ_{ij} | λ_{ij}(f) = f^{(j)}(x_i), i = 1, ..., r; j = 0, ..., ν_i},

with 0 ≤ ν_i < m and n = r + ν_1 + ... + ν_r.
Remark 3.1.11. For f ∈ H^{m,2}[a, b], formula (3.1.2) becomes

λ(f) = Σ_{i=1}^{r} Σ_{j=0}^{ν_i} A_{ij} f^{(j)}(x_i) + R(f),   (3.1.14)

where

R(f) = ∫_a^b K(t) f^{(m)}(t) dt,

with

K(t) = λ_x[(x − t)_+^{m−1}/(m−1)!] − Σ_{i=1}^{r} Σ_{j=0}^{ν_i} A_{ij} (x_i − t)_+^{m−j−1}/(m−j−1)!.
Theorem 3.1.12. If n > m, formula (3.1.14), optimal in the sense of Sard, exists and is unique.

Proof. As in the previous case, one considers the functional F_H(A, γ) given by

F_H(A, γ) = (1/2) ∫_a^b K²(t) dt + Σ_{k=0}^{m−1} γ_k R(e_k/k!),

where A = (A_{10}, ..., A_{1ν_1}, ..., A_{r0}, ..., A_{rν_r}) and γ = (γ_0, ..., γ_{m−1}). The parameters A_{ij}, i = 1, ..., r; j = 0, 1, ..., ν_i, and γ_k, k = 0, 1, ..., m−1, are determined as a solution of the linear algebraic system:

∂F_H(A, γ)/∂A_{pq} = 0,  p = 1, ..., r; q = 0, 1, ..., ν_p,
∂F_H(A, γ)/∂γ_µ = 0,  µ = 0, 1, ..., m−1.
(3.1.15)

Let S_H be the spline interpolation operator corresponding to the set Λ_H, i.e.,

(S_H f)(x) = Σ_{k=0}^{m−1} a_k x^k + Σ_{i=1}^{r} Σ_{j=0}^{ν_i} b_{ij} (x − x_i)_+^{2m−j−1}/(2m−j−1)!,

where the coefficients a_k, k = 0, 1, ..., m−1, and b_{ij}, i = 1, ..., r; j = 0, 1, ..., ν_i, are obtained as a solution of the following linear algebraic system:

(S_H f)^{(j)}(x_k) = f^{(j)}(x_k),  k = 1, ..., r; j = 0, 1, ..., ν_k,
(S_H f)^{(p)}(α) = 0,  p = m, ..., 2m−1,  α > x_r.
(3.1.16)

After some transformations (as in the previous case) we can see that the systems (3.1.15) and (3.1.16) are equivalent. But for n > m the spline function S_H f exists and is unique, hence the system (3.1.16) has a unique solution. It follows that the system (3.1.15) also has a unique solution, and the proof is complete.
Remark 3.1.13. A theorem analogous to Theorem 3.1.12 can also be proved when Λ is a set of Birkhoff-type functionals, under the additional condition that Λ contains a subset of at least m Hermite-type functionals (the condition under which the corresponding spline interpolation function exists and is unique).

In particular, let λ be the definite integral functional:

λ(f) := I(f) = ∫_a^b f(x) dx,  for f ∈ H^{m,2}[a, b].
For

Λ = {λ_i | λ_i : H^{m,2}[a, b] → R, i = 1, ..., n},

one considers the quadrature formula

I(f) = Q_n(f) + R_n(f),   (3.1.17)

where

Q_n(f) = Σ_{i=1}^{n} A_i λ_i(f)

and R_n(f) is the remainder term. The quadrature formula (3.1.17) is optimal in the sense of Sard if

R_n(e_j) = 0,  j = 0, 1, ..., m−1,

and

∫_a^b K²(t) dt is minimal,

with

K(t) := R_n[(· − t)_+^{m−1}/(m−1)!].
Now, if Λ is a Birkhoff-type set of functionals that contains a subset of at least m functionals of Hermite type, then the corresponding spline interpolation formula,

f = Sf + Rf,

exists and is unique. By Theorem 3.1.9, it follows that the formula

∫_a^b f(x) dx = ∫_a^b (Sf)(x) dx + ∫_a^b (Rf)(x) dx

is the optimal quadrature formula in the sense of Sard corresponding to the set Λ.

Next, we give some examples of such optimal quadrature formulas for Lagrange-type, Hermite-type and Birkhoff-type sets of functionals.
Example 3.1.14. Let f ∈ C[0, 1] and

Λ_L(f) = {f(i/n) | i = 0, ..., n}.

We construct the optimal quadrature formula in the sense of Sard, using linear splines (m = 1). We have

f = S_1 f + R_1 f,

where

(S_1 f)(x) = Σ_{i=0}^{n} s_i(x) f(i/n)

and

(R_1 f)(x) = ∫_0^1 φ_1(x, t) f′(t) dt,

with

φ_1(x, t) = (x − t)_+^0 − Σ_{i=0}^{n} s_i(x) (i/n − t)_+^0.
Next, we determine the functions s_i, i = 0, 1, ..., n. We have

s_i(x) = a_0^i + Σ_{j=0}^{n} b_j^i (x − j/n)_+,  i = 0, 1, ..., n,

with the coefficients a_0^i, b_0^i, ..., b_n^i determined, for each i = 0, 1, ..., n, from the conditions:

s_i(k/n) = δ_{ik},  k = 0, 1, ..., n,
s_i′(α) = 0,  α > 1 (we take α = 2).
One obtains

s_0(x) = 1 − nx + n(x − 1/n)_+,
s_1(x) = nx − 2n(x − 1/n)_+ + n(x − 2/n)_+,
s_2(x) = n(x − 1/n)_+ − 2n(x − 2/n)_+ + n(x − 3/n)_+,
. . .
s_{n−1}(x) = n(x − (n−2)/n)_+ − 2n(x − (n−1)/n)_+ + n(x − 1)_+,
s_n(x) = n(x − (n−1)/n)_+ − n(x − 1)_+.
It follows that

∫_0^1 f(x) dx = Σ_{i=0}^{n} A*_i f(i/n) + R*(f),

where

A*_i = ∫_0^1 s_i(x) dx,  i = 0, ..., n,

i.e.,

A*_0 = 1/(2n),
A*_1 = ... = A*_{n−1} = 1/n,
A*_n = 1/(2n),

and

R*(f) = ∫_0^1 K_1(t) f′(t) dt,

with

K_1(t) = 1 − t − Σ_{i=0}^{n} A*_i (i/n − t)_+^0.
Hence, the optimal quadrature formula is

∫_0^1 f(x) dx = (1/n) [ (1/2) f(0) + f(1/n) + ... + f((n−1)/n) + (1/2) f(1) ] + R*(f).

For the remainder term we have

|R*(f)| ≤ ‖f′‖_2 ( ∫_0^1 K_1²(t) dt )^{1/2} = ‖f′‖_2 ( Σ_{i=1}^{n} ∫_{(i−1)/n}^{i/n} K_{1,i}²(t) dt )^{1/2} ≤ 1/(2n√3) ‖f′‖_2,

where K_{1,i} = K_1|_{[(i−1)/n, i/n]}.
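The optimal formula obtained here is the composite trapezoid rule, and both the rule and the bound 1/(2n√3) · ‖f′‖₂ can be checked on a concrete function; the test function sin(πx) is our own choice (with ∫₀¹ f = 2/π and ‖f′‖₂ = π/√2):

```python
from math import sqrt, sin, pi

def trapezoid(f, n):
    """Sard-optimal quadrature of Example 3.1.14: composite trapezoid on [0, 1]."""
    return (0.5 * f(0) + sum(f(i / n) for i in range(1, n)) + 0.5 * f(1)) / n

f = lambda x: sin(pi * x)
n = 50
err = abs(trapezoid(f, n) - 2 / pi)
bound = (1 / (2 * n * sqrt(3))) * (pi / sqrt(2))   # 1/(2n√3) · ‖f′‖₂
assert err <= bound
```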
Example 3.1.15. Let f ∈ C²[0, 1] and let

Λ_H(f) = {f(0), f′(0), f(1/2), f(1), f′(1)}

be given. We construct the optimal quadrature formula (in the sense of Sard). To this end, one considers the cubic spline interpolation formula:

f = S_3 f + R_3 f,

where

(S_3 f)(x) = s_{00}(x) f(0) + s_{01}(x) f′(0) + s_{10}(x) f(1/2) + s_{20}(x) f(1) + s_{21}(x) f′(1)

and the fundamental interpolation splines have the expressions

s_{kj}(x) = a_0^{kj} + a_1^{kj} x + b_{00}^{kj} x³ + b_{01}^{kj} x² + b_{10}^{kj} (x − 1/2)³_+ + b_{20}^{kj} (x − 1)³_+ + b_{21}^{kj} (x − 1)²_+,
for (k, j) ∈ {(0,0), (0,1), (1,0), (2,0), (2,1)}. For (k, j) = (0, 0) we have the system (dropping the superscript 00 on the coefficients):

s_{00}(0) := a_0 = 1,
s_{00}′(0) := a_1 = 0,
s_{00}(1/2) := a_0 + (1/2)a_1 + (1/8)b_{00} + (1/4)b_{01} = 0,
s_{00}(1) := a_0 + a_1 + b_{00} + b_{01} + (1/8)b_{10} = 0,
s_{00}′(1) := a_1 + 3b_{00} + 2b_{01} + (3/4)b_{10} = 0,
s_{00}″(2) := 12b_{00} + 2b_{01} + 9b_{10} + 6b_{20} + 2b_{21} = 0,
s_{00}‴(2) := b_{00} + b_{10} + b_{20} = 0.

Solving this system, one obtains

s_{00}(x) = 1 + 10x³ − 9x² − 16(x − 1/2)³_+.
The systems that generate the coefficients of the other fundamental interpolation spline functions have the same matrix, the free terms being changed successively to (0,1,0,0,0,0,0), (0,0,1,0,0,0,0), (0,0,0,1,0,0,0), (0,0,0,0,1,0,0).
So, we have

s_{01}(x) = x + 3x³ − (7/2)x² − 4(x − 1/2)³_+,
s_{10}(x) = −16x³ + 12x² + 32(x − 1/2)³_+,
s_{20}(x) = 6x³ − 3x² − 16(x − 1/2)³_+,
s_{21}(x) = −x³ + (1/2)x² + 4(x − 1/2)³_+.
For the remainder term, we have:

(R_3 f)(x) = ∫_0^1 K_2(x, t) f″(t) dt,

where

K_2(x, t) = (x − t)_+ − s_{10}(x)(1/2 − t)_+ − s_{20}(x)(1 − t) − s_{21}(x).

It follows that

∫_0^1 f(x) dx = A*_{00} f(0) + A*_{01} f′(0) + A*_{10} f(1/2) + A*_{20} f(1) + A*_{21} f′(1) + R*(f),

with

A*_{00} = ∫_0^1 s_{00}(x) dx = 1/4,
A*_{01} = ∫_0^1 s_{01}(x) dx = 1/48,
A*_{10} = ∫_0^1 s_{10}(x) dx = 1/2,
A*_{20} = ∫_0^1 s_{20}(x) dx = 1/4,
A*_{21} = ∫_0^1 s_{21}(x) dx = −1/48,

and

R*(f) = ∫_0^1 K_2(t) f″(t) dt,

where

K_2(t) = (1 − t)²/2 − A*_{10}(1/2 − t)_+ − A*_{20}(1 − t) − A*_{21}.
For the remainder we have

|R*(f)| ≤ ( ∫_0^1 K_2²(t) dt )^{1/2} ‖f″‖_2,

where

∫_0^1 K_2²(t) dt = ∫_0^{1/2} K_2²(t) dt + ∫_{1/2}^1 K_2²(t) dt.
We obtain

∫_0^1 f(x) dx = (1/4) f(0) + (1/48) f′(0) + (1/2) f(1/2) + (1/4) f(1) − (1/48) f′(1) + R*(f),

with

|R*(f)| ≤ 1/(48√5) ‖f″‖_2.
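A quick numerical check of this formula (a sketch; the test function f(x) = eˣ is our own choice, with ‖f″‖₂ = √((e² − 1)/2)) confirms that it integrates all cubics exactly and that the error bound 1/(48√5) · ‖f″‖₂ holds:

```python
from math import sqrt, exp

def q_hermite(f, df):
    """Sard-optimal quadrature of Example 3.1.15 on [0, 1],
    using f(0), f'(0), f(1/2), f(1), f'(1)."""
    return f(0) / 4 + df(0) / 48 + f(0.5) / 2 + f(1) / 4 - df(1) / 48

# Exact on all cubics (check the monomials x^i, i = 0..3).
for i in range(4):
    q = q_hermite(lambda x: x**i, lambda x: i * x**(i - 1) if i else 0.0)
    assert abs(q - 1 / (i + 1)) < 1e-12

# Error bound |R*(f)| ≤ ‖f''‖₂ / (48√5) for f(x) = e^x.
err = abs(q_hermite(exp, exp) - (exp(1) - 1))
assert err <= sqrt((exp(1)**2 - 1) / 2) / (48 * sqrt(5))
```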
Example 3.1.16. Let f ∈ C²[0, 1] and let

Λ_B(f) = {f′(0), f(1/2), f′(1)}

be given. We need to construct the optimal quadrature formula (in the sense of Sard). Similarly to the Hermite case, we consider the cubic spline formula

(S_3 f)(x) = s_{01}(x) f′(0) + s_{10}(x) f(1/2) + s_{21}(x) f′(1),

and we determine the functions s_{01}, s_{10} and s_{21}. The coefficients of these functions are solutions of a 5 × 5 system. For s_{01} we have

s_{01}′(0) := a_1 = 1,
s_{01}(1/2) := a_0 + (1/2)a_1 + (1/4)b_{01} = 0,
s_{01}′(1) := a_1 + 2b_{01} + (3/4)b_{10} = 0,
s_{01}″(2) := 2b_{01} + 9b_{10} + 2b_{21} = 0,
s_{01}‴(2) := 6b_{10} = 0.

It follows that

s_{01}(x) = −3/8 + x − (1/2)x² + (1/2)(x − 1)²_+.
In the same way we obtain the other functions:

s_{10}(x) = 1,
s_{21}(x) = −1/8 + (1/2)x² − (1/2)(x − 1)²_+.

For the remainder term, we have:

(R_3 f)(x) = ∫_0^1 K_2(x, t) f″(t) dt,

where

K_2(x, t) = (x − t)_+ − (1/2 − t)_+ − s_{21}(x).
One obtains the optimal formula:

∫_0^1 f(x) dx = A*_{01} f′(0) + A*_{10} f(1/2) + A*_{21} f′(1) + R*(f),

where

A*_{01} = ∫_0^1 s_{01}(x) dx = −1/24,
A*_{10} = ∫_0^1 s_{10}(x) dx = 1,
A*_{21} = ∫_0^1 s_{21}(x) dx = 1/24,

and

R*(f) = ∫_0^1 K*_2(t) f″(t) dt,

with

K*_2(t) = ∫_0^1 K_2(x, t) dx = (1 − t)²/2 − (1/2 − t)_+ − 1/24.

Finally, we get

∫_0^1 f(x) dx = −(1/24) f′(0) + f(1/2) + (1/24) f′(1) + R*(f),

with

|R*(f)| ≤ 1/(12√5) ‖f″‖_2.
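The same kind of check works for this Birkhoff formula (again with the illustrative choice f(x) = eˣ, ours rather than the text's): it is exact on cubics and obeys the bound 1/(12√5) · ‖f″‖₂:

```python
from math import sqrt, exp

def q_birkhoff(f, df):
    """Sard-optimal quadrature of Example 3.1.16 on [0, 1],
    using only f'(0), f(1/2), f'(1)."""
    return -df(0) / 24 + f(0.5) + df(1) / 24

# Exact on all cubics.
for i in range(4):
    q = q_birkhoff(lambda x: x**i, lambda x: i * x**(i - 1) if i else 0.0)
    assert abs(q - 1 / (i + 1)) < 1e-12

# Error bound |R*(f)| ≤ ‖f''‖₂ / (12√5) for f(x) = e^x.
err = abs(q_birkhoff(exp, exp) - (exp(1) - 1))
assert err <= sqrt((exp(1)**2 - 1) / 2) / (12 * sqrt(5))
```

It is notable that using only three pieces of information, one of them a midpoint value and two of them derivatives, still yields exactness on cubics.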
3.1.2 Optimality in the sense of Nikolski
Consider a quadrature formula of the form given in (3.0.2). One supposes that all the parameters of the quadrature formula are free. So, we have to find the coefficients and the nodes of the quadrature formula such that |R_n(f)| takes its minimum value. Sometimes it can be assumed that the quadrature formula has a given degree of exactness, i.e., some relations as in (3.1.1) are satisfied. Two ways of solving such an optimality problem are used here.

A first way is based on the relationship between optimality in the sense of Sard and spline interpolation. To begin with, suppose that the quadrature nodes are fixed, and find the coefficients such that |R_n(f)| is minimum. But this is a problem of optimality in the sense of Sard, and it can be solved using the suitable spline interpolation formula. So, if
f = S_r f + R_r f   (3.1.18)

is the spline interpolation formula, with

(S_r f)(x) = Σ_{k=0}^{m} Σ_{j∈I_k} s_{kj}(x) f^{(j)}(x_k)

and

(R_r f)(x) = ∫_a^b φ_r(x, t) f^{(r)}(t) dt,

then it follows that

∫_a^b f(x) dx = Σ_{k=0}^{m} Σ_{j∈I_k} A_{kj} f^{(j)}(x_k) + R_n(f),   (3.1.19)

with

A_{kj} = ∫_a^b s_{kj}(x) dx

and

R_n(f) = ∫_a^b K_r(t) f^{(r)}(t) dt,   (3.1.20)

where

K_r(t) = ∫_a^b φ_r(x, t) dx = (b − t)^r/r! − Σ_{k=0}^{m} Σ_{j∈I_k} A_{kj} (x_k − t)_+^{r−j−1}/(r−j−1)!.

Formula (3.1.19) is the quadrature formula optimal in the sense of Sard. As the A_{kj} are functions of the variables x_k, k = 0, 1, ..., m, i.e.,

A_{kj} = F_{kj}(x_0, ..., x_m),   (3.1.21)

it follows that R_n(f) is a function of the nodes,

R_n(f) = F(x_0, ..., x_m).   (3.1.22)
Now we have to minimize the function F with respect to the variables x_0, ..., x_m. Let (x*_0, ..., x*_m) be the minimum point of F. It follows that x*_0, ..., x*_m are the optimal nodes in the sense of Nikolski.
Then, from (3.1.21) and (3.1.22), one obtains the optimal coefficients:

A*_{kj} = F_{kj}(x*_0, ..., x*_m),  k = 0, 1, ..., m; j ∈ I_k,   (3.1.23)

respectively the optimal remainder term:

R*_n(f) = F(x*_0, ..., x*_m).   (3.1.24)

Thus, the quadrature formula

∫_a^b f(x) dx = Σ_{k=0}^{m} Σ_{j∈I_k} A*_{kj} f^{(j)}(x*_k) + R*_n(f)

is optimal in the sense of Nikolski.
Conclusion 3.1.17. The optimal quadrature in the sense of Nikolski can be obtained in two steps:

- construct the optimal quadrature in the sense of Sard, (3.1.19), for fixed nodes, using for example the relationship with spline interpolation;
- minimize the error functional (3.1.20), which depends only on the nodes, with respect to these parameters. This gives the optimal nodes x*_k, k = 0, 1, ..., m. Then (3.1.23) and (3.1.24) give the optimal coefficients A*_{kj}, k = 0, 1, ..., m; j ∈ I_k, respectively the optimal remainder term R*_n(f).
A second way is given by the φ-function method, which is based on the minimal-norm properties of orthogonal polynomials.

Let f ∈ C^r[a, b] and a = x_0 < x_1 < ... < x_m = b. The φ-function method consists in associating to each interval [x_{k−1}, x_k] a function φ_k with

φ_k^{(r)} = 1,  k = 1, ..., m.
Then,

∫_a^b f(x) dx = Σ_{k=1}^{m} ∫_{x_{k−1}}^{x_k} f(x) dx = Σ_{k=1}^{m} ∫_{x_{k−1}}^{x_k} φ_k^{(r)}(x) f(x) dx,

and applying the integration by parts formula r times to the last integrals, one obtains:

∫_a^b f(x) dx = Σ_{k=1}^{m} [ φ_k^{(r−1)}(x) f(x) − φ_k^{(r−2)}(x) f′(x) + ... + (−1)^{r−1} φ_k(x) f^{(r−1)}(x) ]|_{x_{k−1}}^{x_k} + (−1)^r Σ_{k=1}^{m} ∫_{x_{k−1}}^{x_k} φ_k(x) f^{(r)}(x) dx,
and further,

∫_a^b f(x) dx = Σ_{j=0}^{r−1} (−1)^{j+1} φ_1^{(r−j−1)}(x_0) f^{(j)}(x_0)
             + Σ_{k=1}^{m−1} Σ_{j=0}^{r−1} (−1)^j (φ_k − φ_{k+1})^{(r−j−1)}(x_k) f^{(j)}(x_k)
             + Σ_{j=0}^{r−1} (−1)^j φ_m^{(r−j−1)}(x_m) f^{(j)}(x_m)
             + (−1)^r Σ_{k=1}^{m} ∫_{x_{k−1}}^{x_k} φ_k(x) f^{(r)}(x) dx.
Therefore, considering φ|_{[x_{k−1}, x_k]} = φ_k, k = 1, ..., m, we have

∫_a^b f(x) dx = Σ_{k=0}^{m} Σ_{j=0}^{r−1} A_{kj} f^{(j)}(x_k) + R_n(f),   (3.1.25)

where

A_{0j} = (−1)^{j+1} φ_1^{(r−j−1)}(x_0),  j = 0, 1, ..., r−1,
A_{kj} = (−1)^j (φ_k − φ_{k+1})^{(r−j−1)}(x_k),  k = 1, ..., m−1; j = 0, 1, ..., r−1,
A_{mj} = (−1)^j φ_m^{(r−j−1)}(x_m),  j = 0, 1, ..., r−1,
(3.1.26)

and

R_n(f) = (−1)^r ∫_a^b φ(x) f^{(r)}(x) dx,   (3.1.27)

with

φ(x) = (x_m − x)^r/r! + (−1)^{r+1} Σ_{k=0}^{m} Σ_{j=0}^{r−1} A_{kj} (x_k − x)_+^{r−j−1}/(r−j−1)!.
If f ∈ H^{r,2}[a, b], then

|R_n(f)| ≤ ‖f^{(r)}‖_2 ( ∫_a^b φ²(x) dx )^{1/2}.
In this way, the optimal quadrature formula in the sense of Nikolski is determined by the coefficients A := (A_{kj}), k = 0, ..., m; j = 0, ..., r−1, and the nodes X := (x_k), k = 0, ..., m, for which the functional

F(A, X) := ∫_a^b φ²(x) dx

takes its minimum value.
Remark 3.1.18. From (3.1.27) it follows that the quadrature formula (3.1.25) has degree of exactness at least r − 1.
Next, we must find the coefficients A and the nodes X that minimize the functional F(A, X). We have

F(A, X) = Σ_{k=1}^{m} ∫_{x_{k−1}}^{x_k} φ_k²(x) dx,

where φ_k = φ|_{[x_{k−1}, x_k]} is a polynomial of degree r (with leading coefficient 1/r!, since φ_k^{(r)} = 1). So we have to find the polynomial φ_k with minimum L²_w[x_{k−1}, x_k] norm (w = 1). This is the Legendre polynomial

l_{r,k}(x) = (r!/(2r)!) d^r/dx^r [(x − x_{k−1})^r (x − x_k)^r],

i.e., the coefficients A_{kj} of φ_k, say Ā_{kj}, are obtained from the identity

φ_k = l_{r,k},  for all k = 1, ..., m.
Evidently, $A_{kj} = A_{kj}(x_1,\dots,x_{m-1})$. One obtains
$$F(A,X) = \sum_{k=1}^{m}\int_{x_{k-1}}^{x_k} l_{r,k}^2(x)\,dx,$$
which must be minimized with respect to the nodes $x_k$, $k = 1,\dots,m-1$ ($x_0 = a$, $x_m = b$). Let $X^* = (a, x_1^*,\dots,x_{m-1}^*, b)$ be the minimum point. It follows that $x_1^*,\dots,x_{m-1}^*$ are the optimal nodes and, respectively,
$$A_{kj}^* = A_{kj}(a, x_1^*,\dots,x_{m-1}^*, b), \quad k = 0,1,\dots,m;\ j = 0,1,\dots,r-1,$$
are the optimal coefficients. For the remainder term, one obtains
$$|R_m^*(f)| \le [F(A^*,X^*)]^{1/2}\,\big\|f^{(r)}\big\|_2.$$
Example 3.1.19. Let $f \in H^{1,2}[0,1]$, let $0 = x_0 < x_1 < \dots < x_m = 1$ be a partition of the interval $[0,1]$, and
$$I(f) := \int_0^1 f(x)\,dx = \sum_{k=0}^{m} A_k f(x_k) + R_m(f).$$
Applying the $\varphi$-function method ($\varphi_k' = 1$), one obtains
$$\int_0^1 f(x)\,dx = -\varphi_1(0)f(0) + \sum_{k=1}^{m-1}(\varphi_k-\varphi_{k+1})(x_k)f(x_k) + \varphi_m(1)f(1) - \int_0^1 \varphi(x)f'(x)\,dx,$$
with
$$\varphi\big|_{[x_{k-1},x_k]} = \varphi_k(x) = x - \sum_{i=0}^{k-1} A_i.$$
We have
$$A_0 = -\varphi_1(0),$$
$$A_k = (\varphi_k-\varphi_{k+1})(x_k), \quad k = 1,\dots,m-1,$$
$$A_m = \varphi_m(1),$$
and
$$R_m(f) = -\int_0^1 \varphi(x)f'(x)\,dx.$$
Here
$$F(A,X) := \int_0^1 \varphi^2(x)\,dx = \sum_{k=1}^{m}\int_{x_{k-1}}^{x_k} \varphi_k^2(x)\,dx,$$
with $\varphi_k$ a polynomial of the first degree for all $k = 1,\dots,m$. But it is known that $\|\varphi_k\|_2$ takes its minimum value when $\varphi_k \equiv l_{1,k}$, the Legendre polynomial of first degree. It follows that
$$\sum_{i=0}^{k-1} A_i = \frac{x_{k-1}+x_k}{2} \qquad (3.1.28)$$
and
$$F(A,X) = \frac{1}{12}\sum_{k=1}^{m}(x_k - x_{k-1})^3.$$
From the system of equations
$$\frac{\partial F(A,X)}{\partial x_k} = \frac{1}{4}\big[(x_k - x_{k-1})^2 - (x_{k+1} - x_k)^2\big] = 0, \quad k = 1,\dots,m-1,$$
it follows that
$$x_k^* - x_{k-1}^* = x_{k+1}^* - x_k^*, \quad k = 1,\dots,m-1,$$
respectively,
$$x_k^* = \frac{k}{m}, \quad k = 0,1,\dots,m.$$
The relations (3.1.28) become
$$\sum_{i=0}^{k-1} A_i^* = \frac{x_{k-1}^* + x_k^*}{2} = \frac{2k-1}{2m}, \quad k = 1,\dots,m,$$
whence
$$A_0^* = \frac{1}{2m}, \qquad A_1^* = \dots = A_{m-1}^* = \frac{1}{m}, \qquad A_m^* = \frac{1}{2m}.$$
We also have
$$F(A^*,X^*) = \frac{1}{12m^2},$$
respectively,
$$|R_m^*(f)| \le \frac{1}{2m\sqrt{3}}\,\|f'\|_2.$$
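The optimal rule of this example is exactly the composite trapezoidal rule on equidistant nodes. The following sketch (our own illustration; the function name and the test integrand $f(x)=e^x$ are assumptions, not from the book) checks the error bound $|R_m^*(f)| \le \frac{1}{2m\sqrt{3}}\|f'\|_2$ numerically:

```python
import math

def optimal_quadrature(f, m):
    # Optimal weights of Example 3.1.19: A*_0 = A*_m = 1/(2m), A*_k = 1/m,
    # at the equidistant nodes x*_k = k/m (the composite trapezoidal rule).
    weights = [1 / (2 * m)] + [1 / m] * (m - 1) + [1 / (2 * m)]
    return sum(w * f(k / m) for k, w in enumerate(weights))

m = 10
approx = optimal_quadrature(math.exp, m)
err = abs(approx - (math.e - 1.0))               # exact integral is e - 1
norm_f_prime = math.sqrt((math.e ** 2 - 1) / 2)  # ||f'||_2 on [0, 1]
bound = norm_f_prime / (2 * m * math.sqrt(3))
```

For $m = 10$ the actual error sits well below the worst-case bound, as expected: the bound is sharp only over the whole class $H^{1,2}[0,1]$, not for one smooth integrand.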
3.2 Some optimality aspects in numerical integration of bivariate functions
Let $D$ be a given domain in $\mathbb{R}^2$, $f : D \to \mathbb{R}$ an integrable function on $D$ and $\Lambda(f) = \{\lambda_1(f),\dots,\lambda_N(f)\}$ a set of information on $f$. Next, one supposes that $\lambda_i(f)$, $i = 1,\dots,N$, are the values of $f$, or of certain of its derivatives, at some points from $D$, called the cubature nodes.
One considers the cubature formula
$$I_w^{xy}f := \iint_D w(x,y)f(x,y)\,dx\,dy = \sum_{i=1}^{N} C_i\lambda_i(f) + R_N(f),$$
where $w$ is a given weight function (often $w(x,y) = 1$ on $D$), $C_i$, $i = 1,\dots,N$, are its coefficients and $R_N(f)$ is the remainder term.
The problem of constructing such a cubature formula consists in determining the coefficients $C_i$, $i = 1,\dots,N$, and its nodes, under some given conditions, and in evaluating the corresponding remainder term.
For some particular cases, a cubature formula can be constructed using the product or the boolean sum of two quadrature rules.
Most results have been obtained when $D$ is a regular domain in $\mathbb{R}^2$ (rectangle, triangle, etc.) and the cubature nodes are regularly spaced. First, one supposes that $D$ is a rectangle, $D = [a,b]\times[c,d]$, and $w(x,y) = 1$, $(x,y) \in D$. If
$$\Lambda^x(f) = \{\lambda_i^x f \mid i = 0,1,\dots,m\}, \qquad \Lambda^y(f) = \{\lambda_j^y f \mid j = 0,1,\dots,n\}$$
are sets of partial information on $f$, with respect to $x$, respectively to $y$, one considers the quadrature formulas:
$$I^x f := \int_a^b f(x,y)\,dx = (Q_1^x f)(\cdot,y) + (R_1^x f)(\cdot,y)$$
and
$$I^y f := \int_c^d f(x,y)\,dy = (Q_1^y f)(x,\cdot) + (R_1^y f)(x,\cdot),$$
where the quadrature rules $Q_1^x$ and $Q_1^y$ are given by
$$(Q_1^x f)(\cdot,y) = \sum_{i=0}^{m} A_i(\lambda_i^x f)(\cdot,y),$$
respectively,
$$(Q_1^y f)(x,\cdot) = \sum_{j=0}^{n} B_j(\lambda_j^y f)(x,\cdot),$$
with $R_1^x$ and $R_1^y$ the corresponding remainder operators,
$$R_1^x = I^x - Q_1^x, \qquad R_1^y = I^y - Q_1^y.$$
It is easy to check the following decompositions of the double integral operator $I^{xy}$:
$$I^{xy} = Q_1^x Q_1^y + (I^y R_1^x + I^x R_1^y - R_1^x R_1^y), \qquad (3.2.1)$$
and
$$I^{xy} = (Q_1^x I^y + Q_1^y I^x - Q_1^x Q_1^y) + R_1^x R_1^y. \qquad (3.2.2)$$
The identities (3.2.1) and (3.2.2) generate the product cubature formula
$$I^{xy}f := Q^P f + R^P f = Q_1^x Q_1^y f + (R_1^x I^y + I^x R_1^y - R_1^x R_1^y)f, \qquad (3.2.3)$$
respectively, the boolean sum cubature formula
$$I^{xy}f := Q^S f + R^S f = (Q_1^x I^y + Q_1^y I^x - Q_1^x Q_1^y)f + R_1^x R_1^y f. \qquad (3.2.4)$$
For example, if $\lambda_i^x f = f(x_i,y)$, $\lambda_j^y f = f(x,y_j)$, $i = 0,1,\dots,m$; $j = 0,1,\dots,n$, and $Q_1^x$, respectively $Q_1^y$, are the trapezoidal rules, i.e.,
$$(Q_1^x f)(\cdot,y) = \frac{b-a}{2}\big[f(a,y) + f(b,y)\big],$$
$$(Q_1^y f)(x,\cdot) = \frac{d-c}{2}\big[f(x,c) + f(x,d)\big],$$
then the formulas (3.2.3) and (3.2.4) become
$$\int_a^b\!\!\int_c^d f(x,y)\,dx\,dy = \frac{(b-a)(d-c)}{4}\big[f(a,c) + f(a,d) + f(b,c) + f(b,d)\big] + R^P(f), \qquad (3.2.5)$$
with
$$R^P(f) = -\frac{(b-a)^3}{12}\int_c^d f^{(2,0)}(\xi_1,y)\,dy - \frac{(d-c)^3}{12}\int_a^b f^{(0,2)}(x,\eta_1)\,dx - \frac{(b-a)^3}{12}\cdot\frac{(d-c)^3}{12}\,f^{(2,2)}(\xi_2,\eta_2),$$
respectively,
$$\int_a^b\!\!\int_c^d f(x,y)\,dx\,dy = \frac{b-a}{2}\int_c^d \big[f(a,y)+f(b,y)\big]\,dy + \frac{d-c}{2}\int_a^b \big[f(x,c)+f(x,d)\big]\,dx \qquad (3.2.6)$$
$$- \frac{(b-a)(d-c)}{4}\big[f(a,c)+f(a,d)+f(b,c)+f(b,d)\big] + R^S(f),$$
with
$$R^S(f) = -\frac{(b-a)^3}{12}\cdot\frac{(d-c)^3}{12}\,f^{(2,2)}(\xi,\eta),$$
where $\xi,\xi_1,\xi_2 \in [a,b]$ and $\eta,\eta_1,\eta_2 \in [c,d]$.
This way, there are obtained the trapezoidal product cubature formula (3.2.5) and the trapezoidal boolean-sum cubature formula (3.2.6).
From (3.2.3) and (3.2.4), it follows that
$$\mathrm{ord}(Q^P) = \min\{\mathrm{ord}(Q_1^x), \mathrm{ord}(Q_1^y)\}$$
and
$$\mathrm{ord}(Q^S) = \mathrm{ord}(Q_1^x) + \mathrm{ord}(Q_1^y).$$
Remark 3.2.1. The boolean-sum cubature operator has the remarkable property of high approximation order, which can be seen as an optimality property. However, the boolean-sum formula contains the simple integrals $I^x f$ and $I^y f$. These simple integrals can be approximated, in a second level of approximation, using new quadrature operators, say $Q_2^x$ and $Q_2^y$.
From (3.2.4) one obtains
$$I^{xy}f = Qf + R^Q f, \qquad (3.2.7)$$
with
$$Q = Q_1^x Q_2^y + Q_2^x Q_1^y - Q_1^x Q_1^y \qquad (3.2.8)$$
and
$$R^Q = Q_1^x R_2^y + Q_1^y R_2^x + R_1^x R_1^y, \qquad (3.2.9)$$
where
$$R_2^x = I^x - Q_2^x, \qquad R_2^y = I^y - Q_2^y.$$
From (3.2.9) it follows that
$$\mathrm{ord}(Q) = \min\{\mathrm{ord}(Q_2^y) + 1,\ \mathrm{ord}(Q_2^x) + 1,\ \mathrm{ord}(Q^S)\}.$$
The quadrature operators $Q_2^x$ and $Q_2^y$ can be chosen in many ways. First of all, the choice depends on the given information on the function $f$.
Remark 3.2.2. A natural way to select them is such that the approximation order of the initial boolean-sum operator is preserved. It is obvious that its approximation order cannot be increased.
Definition 3.2.3. A cubature formula of the form (3.2.7), derived from the boolean-sum formula (3.2.4), which preserves the approximation order of $Q^S$ ($\mathrm{ord}(Q_2^x) + 1 \ge \mathrm{ord}(Q^S)$ and $\mathrm{ord}(Q_2^y) + 1 \ge \mathrm{ord}(Q^S)$), is called a consistent cubature formula.
As the approximation order of the boolean-sum cubature operator $Q^S$ cannot be increased, it is preferable to choose the quadrature operators $Q_2^x$ and $Q_2^y$ such that each term of the remainder of (3.2.7) has the same approximation order.
Definition 3.2.4. A cubature formula of the form (3.2.7), for which
$$\mathrm{ord}(Q_2^x) = \mathrm{ord}(Q_2^y) = \mathrm{ord}(Q^S) - 1,$$
is called a homogeneous cubature formula.
Example 3.2.5. Let $D_h = [0,h]\times[0,h]$ and $f \in B_{2,2}(0,0)$. From the trapezoidal boolean-sum cubature formula (3.2.6) a homogeneous cubature formula can be obtained if, in the second level of approximation, the Simpson quadrature formula is used:
$$\int_0^h g(t)\,dt = \frac{h}{6}\Big[g(0) + 4g\Big(\frac{h}{2}\Big) + g(h)\Big] - \frac{h^5}{2880}\,g^{(4)}(\xi), \quad \xi \in [0,h],$$
where $g \in C^4[0,h]$. One obtains the following trapezoidal-Simpson homogeneous formula:
$$\iint_{D_h} f(x,y)\,dx\,dy = \frac{h^2}{4}\Big\{-\frac{1}{3}\big[f(0,0) + f(0,h) + f(h,0) + f(h,h)\big]$$
$$+ \frac{4}{3}\Big[f\Big(0,\frac{h}{2}\Big) + f\Big(h,\frac{h}{2}\Big) + f\Big(\frac{h}{2},0\Big) + f\Big(\frac{h}{2},h\Big)\Big]\Big\} + R(f),$$
where
$$R(f) = -\frac{h^6}{144}\Big[\frac{1}{20}f^{(4,0)}(\xi_1,\eta_1) + \frac{1}{20}f^{(0,4)}(\xi_2,\eta_2) + f^{(2,2)}(\xi_3,\eta_3)\Big],$$
with $(\xi_i,\eta_i) \in D_h$, $i = 1,2,3$.
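A short numerical check of the trapezoidal-Simpson homogeneous formula (our own sketch, not from the book): the rule is exact for $f(x,y)=x^3$, and its single-cell error decays like $h^6$, which we verify on $f(x,y)=x^2y^2$:

```python
def trap_simpson(f, h):
    # Trapezoidal-Simpson homogeneous cubature on [0,h] x [0,h] (Example 3.2.5).
    corners = f(0, 0) + f(0, h) + f(h, 0) + f(h, h)
    mids = f(0, h / 2) + f(h, h / 2) + f(h / 2, 0) + f(h / 2, h)
    return h * h / 4 * (-corners / 3 + 4 * mids / 3)

# Exactness on x^3 over [0,1]^2 (the exact integral is 1/4):
q_cubic = trap_simpson(lambda x, y: x ** 3, 1.0)

# Error decay on f = x^2 y^2: the exact integral over [0,h]^2 is h^6 / 9.
err = lambda h: abs(trap_simpson(lambda x, y: (x * y) ** 2, h) - h ** 6 / 9)
ratio = err(1.0) / err(0.5)          # halving h divides the error by 2^6 = 64
```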
Example 3.2.6. Also from (3.2.6), using in the second level of approximation the following quadrature formula:
$$\int_0^h g(t)\,dt = \frac{h}{2}\big[g(0) + g(h)\big] + \frac{h^2}{12}\big[g'(0) - g'(h)\big] + R(g),$$
with
$$R(g) = \frac{h^5}{720}\,g^{(4)}(\xi), \quad \xi \in [0,h],\ g \in C^4[0,h],$$
one obtains:
$$\iint_{D_h} f(x,y)\,dx\,dy = \frac{h^2}{4}\big[f(0,0) + f(0,h) + f(h,0) + f(h,h)\big] \qquad (3.2.10)$$
$$+ \frac{h^3}{24}\big[(f^{(1,0)} + f^{(0,1)})(0,0) + (f^{(1,0)} - f^{(0,1)})(0,h) + (f^{(0,1)} - f^{(1,0)})(h,0) - (f^{(1,0)} + f^{(0,1)})(h,h)\big] + R(f),$$
with
$$R(f) = \frac{h^6}{144}\Big[\frac{1}{5}f^{(4,0)}(\xi_1,\eta_1) + \frac{1}{5}f^{(0,4)}(\xi_2,\eta_2) - f^{(2,2)}(\xi_3,\eta_3)\Big],$$
for $(\xi_i,\eta_i) \in D_h$, $i = 1,2,3$. This is a homogeneous cubature formula.
Remark 3.2.7. Formula (3.2.10) contains the same nodes as the initial formula (3.2.6), while in the trapezoidal-Simpson formula of Example 3.2.5 some new nodes appear.
Example 3.2.8. Let
$$I^{xy}f = \int_0^h\!\!\int_0^h f(x,y)\,dx\,dy,$$
and let
$$(Q_1^x f)(\cdot,y) = hf\Big(\frac{h}{2},y\Big), \qquad (Q_1^y f)(x,\cdot) = hf\Big(x,\frac{h}{2}\Big)$$
be the Gaussian quadrature rules. The corresponding boolean-sum cubature formula is
$$I^{xy}f = h\int_0^h f\Big(\frac{h}{2},y\Big)dy + h\int_0^h f\Big(x,\frac{h}{2}\Big)dx - h^2 f\Big(\frac{h}{2},\frac{h}{2}\Big) + R^S(f), \qquad (3.2.11)$$
where
$$R^S(f) = \frac{h^6}{576}\,f^{(2,2)}(\xi,\eta).$$
In order to get a homogeneous cubature formula, in the second level of approximation we must use quadrature rules $Q_2^x$, $Q_2^y$ with $\mathrm{ord}(Q_2^x) = \mathrm{ord}(Q_2^y) = 5$. Such quadrature rules can be
$$(Q_2^x f)(\cdot,y) = \frac{h}{2}\big[f(0,y) + f(h,y)\big] + \frac{h^2}{12}\big[f^{(1,0)}(0,y) - f^{(1,0)}(h,y)\big],$$
with
$$(R_2^x f)(\cdot,y) = \frac{h^5}{720}\,f^{(4,0)}(\xi,y),$$
respectively,
$$(Q_2^y f)(x,\cdot) = \frac{h}{2}\big[f(x,0) + f(x,h)\big] + \frac{h^2}{12}\big[f^{(0,1)}(x,0) - f^{(0,1)}(x,h)\big],$$
with
$$(R_2^y f)(x,\cdot) = \frac{h^5}{720}\,f^{(0,4)}(x,\eta).$$
Theorem 3.2.9. If $f \in B_{2,2}(0,0)$ then we have the following homogeneous cubature formula, derived from (3.2.11):
$$\iint_{D_h} f(x,y)\,dx\,dy = \frac{h^2}{2}\Big[f\Big(\frac{h}{2},0\Big) + f\Big(\frac{h}{2},h\Big) + f\Big(0,\frac{h}{2}\Big) + f\Big(h,\frac{h}{2}\Big) - 2f\Big(\frac{h}{2},\frac{h}{2}\Big)\Big]$$
$$+ \frac{h^3}{12}\Big[f^{(1,0)}\Big(0,\frac{h}{2}\Big) - f^{(1,0)}\Big(h,\frac{h}{2}\Big) + f^{(0,1)}\Big(\frac{h}{2},0\Big) - f^{(0,1)}\Big(\frac{h}{2},h\Big)\Big] + R(f),$$
where
$$R(f) = \frac{h^6}{144}\Big[\frac{1}{5}f^{(4,0)}\Big(\xi,\frac{h}{2}\Big) + \frac{1}{5}f^{(0,4)}\Big(\frac{h}{2},\eta\Big) + \frac{1}{4}f^{(2,2)}(\xi_1,\eta_1)\Big].$$
Another optimality criterion for cubature formulas concerns the number of nodes. The optimality problem is to find the cubature formula with the minimum number of nodes in a class of cubature formulas with the same degree of exactness.
The product and boolean-sum cubature formulas can be constructed by applying the integral operator $I^{xy}$ to both sides of the product, respectively boolean-sum, interpolation formulas corresponding to the function $f$ and the data $\lambda_i^x f$, $i = 0,1,\dots,m$, and $\lambda_j^y f$, $j = 0,1,\dots,n$. Moreover, if the operator $I^{xy}$ is applied to a homogeneous interpolation formula, a homogeneous cubature formula is obtained. Next, we give some examples for a triangular domain. Let us consider the standard triangle
$$T_h = \{(x,y) \in \mathbb{R}^2 \mid x \ge 0,\ y \ge 0,\ x + y \le h\}, \quad \text{for } h > 0.$$
Example 3.2.10. The operator
$$P = P_1 P_2 P_3,$$
with
$$(P_1 f)(x,y) = \frac{h-x-y}{h-y}\,f(0,y) + \frac{x}{h-y}\,f(h-y,y),$$
$$(P_2 f)(x,y) = \frac{h-x-y}{h-x}\,f(x,0) + \frac{y}{h-x}\,f(x,h-x),$$
$$(P_3 f)(x,y) = \frac{x}{x+y}\,f(x+y,0) + \frac{y}{x+y}\,f(0,x+y),$$
generates the homogeneous interpolation formula
$$f = Pf + Rf,$$
with
$$(Pf)(x,y) = \frac{h-x-y}{h}\,f(0,0) + \frac{x}{h}\,f(h,0) + \frac{y}{h}\,f(0,h)$$
and
$$(Rf)(x,y) = \int_0^h \varphi_{20}(x,y;s)f^{(2,0)}(s,0)\,ds + \int_0^h \varphi_{02}(x,y;t)f^{(0,2)}(0,t)\,dt + \iint_{T_h} \varphi_{11}(x,y;s,t)f^{(1,1)}(s,t)\,ds\,dt,$$
for $f \in B_{1,1}(0,0)$, where $\varphi_{ij}$ are the Peano kernels. Then
$$\iint_{T_h} f(x,y)\,dx\,dy = \iint_{T_h} (Pf)(x,y)\,dx\,dy + \iint_{T_h} (Rf)(x,y)\,dx\,dy$$
is a homogeneous cubature formula. Indeed, we have
$$\iint_{T_h} f(x,y)\,dx\,dy = \frac{h^2}{6}\big[f(0,0) + f(h,0) + f(0,h)\big] + R(f),$$
where
$$R(f) = \frac{h^4}{24}\big[f^{(2,0)}(\xi,0) + f^{(0,2)}(0,\eta) - f^{(1,1)}(\xi_1,\eta_1)\big],$$
for $\xi,\eta \in [0,h]$ and $(\xi_1,\eta_1) \in T_h$.
Chapter 4
Numerical linear systems
Many problems of applied mathematics reduce to a set of linear equations, or a linear system
A · x = b,
with the matrix $A$ and the vector $b$ given and the vector $x$ to be determined; thus, solving a linear system is one of the principal problems of numerical analysis. An extensive set of algorithms has been developed for this purpose, several of which are presented in the sequel.
4.1 Triangular systems of equations
Let us consider the system of equations
A · x = b, (4.1.1)
where $A \in M_{n\times n}(\mathbb{R})$ is a lower triangular matrix of the form
$$A = \begin{pmatrix} a_{11} & 0 & \dots & 0\\ a_{21} & a_{22} & \dots & 0\\ \vdots & & \ddots & \vdots\\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix},$$
and $x, b \in M_{n\times 1}(\mathbb{R})$, with $\prod_{i=1}^n a_{ii} \neq 0$.
Then the solution of (4.1.1) is obtained by forward substitution, according to the relations
$$x_i = \Big(b_i - \sum_{k=1}^{i-1} a_{ik}x_k\Big)\Big/a_{ii}, \quad i = 1,\dots,n. \qquad (4.1.2)$$
Let us consider the system
$$A\cdot x = b, \qquad (4.1.3)$$
with $x, b \in M_{n\times 1}(\mathbb{R})$, and $A \in M_{n\times n}(\mathbb{R})$ an upper triangular matrix of the form
$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n}\\ 0 & a_{22} & \dots & a_{2n}\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \dots & a_{nn} \end{pmatrix},$$
with $\prod_{i=1}^n a_{ii} \neq 0$. Then the solution of (4.1.3) is obtained by backward substitution, according to the relations
$$x_i = \Big(b_i - \sum_{k=i+1}^{n} a_{ik}x_k\Big)\Big/a_{ii}, \quad i = n, n-1, \dots, 1.$$
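The two substitution procedures can be sketched as follows (our own Python illustration of relations (4.1.2) and their backward analogue; matrices are represented as lists of rows):

```python
def forward_substitution(A, b):
    # Lower triangular system: x_i = (b_i - sum_{k<i} a_ik x_k) / a_ii.
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        x[i] = (b[i] - sum(A[i][k] * x[k] for k in range(i))) / A[i][i]
    return x

def backward_substitution(A, b):
    # Upper triangular system: the last unknown is found first.
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][k] * x[k] for k in range(i + 1, n))) / A[i][i]
    return x

x_low = forward_substitution([[2.0, 0.0], [1.0, 3.0]], [4.0, 11.0])   # [2, 3]
x_up = backward_substitution([[2.0, 1.0], [0.0, 3.0]], [7.0, 9.0])    # [2, 3]
```

Both procedures cost $O(n^2)$ operations, which is why factorization methods reduce a full system to two triangular ones.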
4.1.1 The block version
If we consider the system (4.1.1) and $x_1$ has been found, then, after substituting $x_1$ into the second through the $n$-th equations, we obtain a new $(n-1)\times(n-1)$ lower triangular system
$$\begin{pmatrix} a_{22} & 0 & \dots & 0\\ a_{32} & a_{33} & \dots & 0\\ \vdots & & \ddots & \vdots\\ a_{n2} & a_{n3} & \dots & a_{nn} \end{pmatrix} \begin{pmatrix} x_2\\ x_3\\ \vdots\\ x_n \end{pmatrix} = \begin{pmatrix} b_2 - a_{21}x_1\\ b_3 - a_{31}x_1\\ \vdots\\ b_n - a_{n1}x_1 \end{pmatrix}.$$
Continuing in this way, we obtain the solution of system (4.1.1).
Note 1. In the same way, we may consider the column version. In that situation, matrix $A$ is considered upper triangular, and by substituting
$x_n$ into the first through the $(n-1)$-th equations, we obtain a new $(n-1)\times(n-1)$ upper triangular system.
Note 2. The block version of the triangular systems is better suited to parallel computation.
4.2 Direct methods for solving linear systems of equations
Let us consider a system of linear equations
$$A\cdot x = b, \qquad (4.2.1)$$
where $A \in M_{n\times n}(\mathbb{R})$, $x, b \in M_{n\times 1}(\mathbb{R})$. Here $A$ is any matrix of the form
$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n}\\ a_{21} & a_{22} & \dots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix}.$$
The direct methods for solving such systems give the exact solution, denoted $x^* \in M_{n\times 1}(\mathbb{R})$.
In the literature, many direct methods for solving systems of type (4.2.1) are known. The Cramer method is one of them; from a computational point of view, it is practical only for systems of small dimension. For larger systems there are other, more appropriate methods, which generate efficient algorithms both for serial and for parallel computers. Generally, all these methods are based on some factorization techniques, as can be seen in what follows.
4.2.1 Factorization techniques
These methods are based on the idea of converting $A$ into products of the form $LU$ or $LDU$, where $L$ is zero above the main diagonal, $U$ is zero below it, and $D$ has nonzero elements only on the diagonal. If $L$ or $U$ has all diagonal elements equal to 1, it is called unit triangular. Several methods, such as Gauss, Gauss-Jordan, Doolittle, Crout, Cholesky, etc., produce such a factorization.
4.2.1.1 The Gauss method
The Gauss method, known also as the method of partial elimination, is one of the oldest algorithms and still one of the most popular. Many authors present it. We recall here the main characteristics of this method, principally due to the fact that it generates a factorization of the matrix $A$.
Let us write the system (4.2.1) in the detailed form:
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1\\ a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2\\ &\ \vdots\\ a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n &= b_n. \end{aligned} \qquad (4.2.2)$$
By the Gauss method, the system (4.2.2) can be solved in two stages.
The first stage. Replacing the equations by combinations of equations (by using elementary algebraic transformations on the extended matrix), the following triangular system is obtained:
$$\begin{aligned} a_{11}^{(1)}x_1 + a_{12}^{(1)}x_2 + \dots + a_{1n}^{(1)}x_n &= b_1^{(1)}\\ a_{22}^{(2)}x_2 + \dots + a_{2n}^{(2)}x_n &= b_2^{(2)}\\ &\ \vdots\\ a_{nn}^{(n)}x_n &= b_n^{(n)} \end{aligned} \qquad (4.2.3)$$
The elements of the new matrix are given by the relations
$$a_{ij}^{(1)} = a_{ij}, \quad b_i^{(1)} = b_i, \quad i,j = 1,\dots,n,$$
and
$$a_{ij}^{(p+1)} = a_{ij}^{(p)} - a_{pj}^{(p)}\,\frac{a_{ip}^{(p)}}{a_{pp}^{(p)}}, \qquad b_i^{(p+1)} = b_i^{(p)} - b_p^{(p)}\,\frac{a_{ip}^{(p)}}{a_{pp}^{(p)}},$$
$$p = 1,\dots,n-1, \quad i,j = p+1,\dots,n,$$
for $a_{pp}^{(p)} \neq 0$, $p = 1,\dots,n$.
The second stage. The components of the vector $x$ are easily found, one after another, by a process called back-substitution. So, the last equation determines $x_n$, which is then substituted into the next-to-last equation to get $x_{n-1}$, and so on.
The relations are:
$$x_n = \frac{b_n^{(n)}}{a_{nn}^{(n)}}, \qquad x_p = \frac{1}{a_{pp}^{(p)}}\Big(b_p^{(p)} - \sum_{j=p+1}^{n} a_{pj}^{(p)}x_j\Big). \qquad (4.2.4)$$
The process of elimination can be described more accurately by means of a sequence of matrices. If we denote $b_p^{(p)}$ by $a_{p,n+1}^{(p)}$, for $p = 1,\dots,n$, and use the extended matrix of the system (4.2.3),
$$\bar{A} = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} & a_{1,n+1}\\ 0 & a_{22} & \dots & a_{2n} & a_{2,n+1}\\ \vdots & & \ddots & \vdots & \vdots\\ 0 & 0 & \dots & a_{nn} & a_{n,n+1} \end{pmatrix}, \qquad (4.2.5)$$
we get the following sequence of matrices: $\bar{A}^{(1)}, \bar{A}^{(2)}, \dots, \bar{A}^{(n)}$, where $\bar{A}^{(1)}$ is the extended matrix of the original system, $\bar{A}^{(n)}$ is the triangular matrix (4.2.5), and $\bar{A}^{(p)}$, for every $p = 2,\dots,n$, contains the elements
$$a_{ij}^{(p)} = \begin{cases} a_{ij}^{(p-1)}, & i = 1,\dots,p-1,\ j = 1,\dots,n+1,\\[2pt] 0, & i = p,\dots,n,\ j = 1,\dots,p-1,\\[2pt] a_{ij}^{(p-1)} - \dfrac{a_{i,p-1}^{(p-1)}}{a_{p-1,p-1}^{(p-1)}}\,a_{p-1,j}^{(p-1)}, & i = p,\dots,n,\ j = p,\dots,n+1. \end{cases}$$
Remark 4.2.1. The Gauss method yields a factorization of matrix $A$ in the form
$$A = L\cdot U,$$
where $U$ is the upper triangular matrix shown in (4.2.3) and $L$ is a lower triangular one with 1 on the diagonal.
Remark 4.2.2. If $A$ is nonsingular, then the condition $a_{pp}^{(p)} \neq 0$ can always be fulfilled. So, by using the technique of pivoting, even if some $a_{pp}^{(p)} = 0$, we look for
$$a_{i_p j_p}^{(p)} = \max\{|a_{ij}^{(p)}| \mid i,j = p,\dots,n\}.$$
Then, interchanging the rows $p$ and $i_p$, respectively the columns $p$ and $j_p$, the element $a_{i_p j_p}^{(p)}$ will take the place of $a_{pp}^{(p)}$. Obviously, $a_{i_p j_p}^{(p)} \neq 0$.
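The two stages, together with pivoting, can be sketched as follows (our own illustration; for simplicity it interchanges only rows, i.e. partial pivoting, rather than the full pivoting of Remark 4.2.2):

```python
def gauss_solve(A, b):
    # Gaussian elimination with partial (row) pivoting, then back-substitution.
    n = len(b)
    A = [row[:] for row in A]      # work on copies
    b = b[:]
    for p in range(n - 1):
        # pivoting: bring the largest |a_ip| (i >= p) into position (p, p)
        ip = max(range(p, n), key=lambda i: abs(A[i][p]))
        A[p], A[ip] = A[ip], A[p]
        b[p], b[ip] = b[ip], b[p]
        for i in range(p + 1, n):
            m = A[i][p] / A[p][p]
            for j in range(p, n):
                A[i][j] -= m * A[p][j]
            b[i] -= m * b[p]
    x = [0.0] * n
    for p in range(n - 1, -1, -1):
        s = sum(A[p][j] * x[j] for j in range(p + 1, n))
        x[p] = (b[p] - s) / A[p][p]
    return x

x = gauss_solve([[2.0, 1.0], [8.0, 7.0]], [3.0, 15.0])   # x = [1.0, 1.0]
```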
Remark 4.2.3. Conversely, if $A$ is singular, then $Ax = b$ has either no solution at all or infinitely many solutions. In fact, the Gauss method can be used to decide whether or not a solution exists.
4.2.1.2 The Gauss-Jordan method
The Gauss-Jordan method, known also as "the method of complete elimination", is based on the same idea of computation as the Gauss method, with the only difference that, in the process of elimination, the system matrix becomes a diagonal matrix, so the system (4.2.1) will finally be of the form:
$$\begin{aligned} a_{11}^{(1)}x_1 &= b_1^{(1)}\\ a_{22}^{(2)}x_2 &= b_2^{(2)}\\ &\ \vdots\\ a_{nn}^{(n)}x_n &= b_n^{(n)}. \end{aligned}$$
Then the components of the solution $x^*$ will be
$$x_i = \frac{b_i^{(i)}}{a_{ii}^{(i)}}, \quad i = 1,\dots,n.$$
Obviously, all $a_{ii}^{(i)} \neq 0$.
4.2.1.3 The LU methods
Under certain conditions, the system matrix $A$ of (4.2.1) can be expressed as a product of a lower triangular matrix $L$ and an upper triangular matrix $U$, and, as a result, we have to solve two systems with triangular matrices.
So, if
$$A = LU,$$
then the system (4.2.1) becomes
$$LUx = b, \qquad (4.2.6)$$
which can be solved in two steps:
Step 1. Solve $L\cdot y = b$.
Step 2. Solve $U\cdot x = y$.
The matrix $L$ will have the form
$$L = \begin{pmatrix} l_{11} & 0 & \dots & 0\\ l_{21} & l_{22} & \dots & 0\\ \vdots & & \ddots & \vdots\\ l_{n1} & l_{n2} & \dots & l_{nn} \end{pmatrix}$$
and the matrix $U$ will have the form
$$U = \begin{pmatrix} u_{11} & u_{12} & \dots & u_{1n}\\ 0 & u_{22} & \dots & u_{2n}\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \dots & u_{nn} \end{pmatrix}.$$
The LU decompositions are basically modified forms of Gaussian elimination. The way in which the entries are computed is given in what follows, known as the Doolittle algorithm.
4.2.1.4 The Doolittle algorithm
This algorithm does the elimination column by column, starting from the left, by multiplying $A$ on the left by lower triangular matrices. It results in a unit lower triangular matrix and an upper triangular matrix. More precisely, for system (4.2.1), with $a_{kk} \neq 0$, $k = 1,\dots,n$, we denote
$$l_{i,k} := \frac{a_{i,k}^{(k-1)}}{a_{k,k}^{(k-1)}}, \quad i = k+1,\dots,n,$$
$$t^{(k)} = [\underbrace{0 \dots 0}_{k \text{ zeros}}\ l_{k+1,k} \dots l_{n,k}]^T$$
and
$$M_k = I - t^{(k)}e_k^T, \qquad (4.2.7)$$
where $e_k = (0,\dots,0,1,0,\dots,0)^T$ is the $k$-th unit vector. Applied to a vector $x$ (with the multipliers formed as $l_{i,k} = x_i/x_k$), the transformation $M_k$ annihilates the components below the $k$-th:
$$M_k x = \begin{pmatrix} 1 & \dots & 0 & 0 & \dots & 0\\ \vdots & \ddots & \vdots & \vdots & & \vdots\\ 0 & \dots & 1 & 0 & \dots & 0\\ 0 & \dots & -l_{k+1,k} & 1 & \dots & 0\\ \vdots & & \vdots & \vdots & \ddots & \vdots\\ 0 & \dots & -l_{n,k} & 0 & \dots & 1 \end{pmatrix} \begin{pmatrix} x_1\\ \vdots\\ x_k\\ x_{k+1}\\ \vdots\\ x_n \end{pmatrix} = \begin{pmatrix} x_1\\ \vdots\\ x_k\\ 0\\ \vdots\\ 0 \end{pmatrix}.$$
Definition 4.2.4. A matrix $M_k$ of the form (4.2.7) is called a Gauss matrix, the components $l_{i,k}$, $i = k+1,\dots,n$, are called Gauss multipliers and the vector $t^{(k)}$ is called the Gauss vector. The transformation defined by the Gauss matrix $M_k$ is called the Gauss transformation.
Note 1. If $A \in M_{n\times n}(\mathbb{R})$, then Gauss matrices $M_1,\dots,M_{n-1}$ can be found such that
$$M_{n-1}\cdot M_{n-2}\cdots M_2\cdot M_1\cdot A = U$$
is upper triangular. Moreover, if we choose
$$L = M_1^{-1}\cdot M_2^{-1}\cdots M_{n-1}^{-1},$$
then
$$A = L\cdot U,$$
and so we obtain our factorization.
Example 4.2.5. Find the LU factorization of the matrix
$$A = \begin{pmatrix} 2 & 1\\ 8 & 7 \end{pmatrix}$$
using the Doolittle algorithm.
Solution. Find the Gauss matrix $M_1$ for $A$:
$$M_1 = I - t^{(1)}e_1^T = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} - \begin{pmatrix} 0\\ \frac{8}{2} \end{pmatrix}\cdot\begin{pmatrix} 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} - \begin{pmatrix} 0 & 0\\ 4 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0\\ -4 & 1 \end{pmatrix}.$$
Thus
$$M_1\cdot A = \begin{pmatrix} 1 & 0\\ -4 & 1 \end{pmatrix}\cdot\begin{pmatrix} 2 & 1\\ 8 & 7 \end{pmatrix} = \begin{pmatrix} 2 & 1\\ 0 & 3 \end{pmatrix} = U, \qquad M_1^{-1} = \begin{pmatrix} 1 & 0\\ 4 & 1 \end{pmatrix} = L.$$
So
$$A = \begin{pmatrix} 2 & 1\\ 8 & 7 \end{pmatrix} = L\cdot U = \begin{pmatrix} 1 & 0\\ 4 & 1 \end{pmatrix}\cdot\begin{pmatrix} 2 & 1\\ 0 & 3 \end{pmatrix}.$$
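A compact Doolittle sketch (our own Python illustration; it computes the rows of $U$ and the columns of the unit lower triangular $L$ directly, which is equivalent to accumulating the Gauss matrices above):

```python
def doolittle_lu(A):
    # Doolittle factorization: unit lower triangular L, upper triangular U.
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for j in range(k, n):                     # k-th row of U
            U[k][j] = A[k][j] - sum(L[k][s] * U[s][j] for s in range(k))
        for i in range(k + 1, n):                 # k-th column of L
            L[i][k] = (A[i][k] - sum(L[i][s] * U[s][k] for s in range(k))) / U[k][k]
    return L, U

L, U = doolittle_lu([[2.0, 1.0], [8.0, 7.0]])
# Reproduces Example 4.2.5: L = [[1, 0], [4, 1]], U = [[2, 1], [0, 3]].
```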
Note 2. There is another algorithm, called the Crout algorithm, which is slightly different. It constructs a lower triangular matrix and a unit upper triangular matrix.
4.2.1.5 Block LU decomposition
The block LU decomposition generates a lower block triangular matrix $L$ and an upper block triangular matrix $U$. This decomposition is used in order to reduce the computational complexity and is appropriate for parallel computation.
Consider a block matrix:
$$\begin{pmatrix} A & B\\ C & D \end{pmatrix} = \begin{pmatrix} I\\ CA^{-1} \end{pmatrix}\cdot A\cdot\begin{pmatrix} I & A^{-1}B \end{pmatrix} + \begin{pmatrix} 0 & 0\\ 0 & D - CA^{-1}B \end{pmatrix},$$
where the matrix $A$ is assumed to be nonsingular, $I$ is an identity matrix of proper dimension, and $0$ is a null matrix.
We can also rewrite the above equation using the half matrices:
$$\begin{pmatrix} A & B\\ C & D \end{pmatrix} = \begin{pmatrix} A^{\frac12}\\ CA^{-\frac12} \end{pmatrix}\begin{pmatrix} A^{\frac{*}{2}} & A^{-\frac12}B \end{pmatrix} + \begin{pmatrix} 0 & 0\\ 0 & Q^{\frac12} \end{pmatrix}\begin{pmatrix} 0 & 0\\ 0 & Q^{\frac{*}{2}} \end{pmatrix},$$
where $Q = D - CA^{-1}B$ is called the Schur complement of $A$, and the half matrices can be calculated by means of the Cholesky decomposition (see Section 4.2.1.6).
The half matrices satisfy
$$A^{\frac12}\cdot A^{\frac{*}{2}} = A, \qquad A^{\frac12}\cdot A^{-\frac12} = I, \qquad A^{-\frac{*}{2}}\cdot A^{\frac{*}{2}} = I, \qquad Q^{\frac12}Q^{\frac{*}{2}} = Q.$$
Thus, we have
$$\begin{pmatrix} A & B\\ C & D \end{pmatrix} = LU,$$
where
$$LU = \begin{pmatrix} A^{\frac12} & 0\\ CA^{-\frac{*}{2}} & 0 \end{pmatrix}\begin{pmatrix} A^{\frac{*}{2}} & A^{-\frac12}B\\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0\\ 0 & Q^{\frac12} \end{pmatrix}\begin{pmatrix} 0 & 0\\ 0 & Q^{\frac{*}{2}} \end{pmatrix}.$$
The product $LU$ can be decomposed in an algebraic manner into
$$L = \begin{pmatrix} A^{\frac12} & 0\\ CA^{-\frac{*}{2}} & Q^{\frac12} \end{pmatrix} \quad \text{and} \quad U = \begin{pmatrix} A^{\frac{*}{2}} & A^{-\frac12}B\\ 0 & Q^{\frac{*}{2}} \end{pmatrix}.$$
4.2.1.6 The Cholesky decomposition
The Cholesky decomposition is a matrix decomposition of a symmetric positive definite matrix into a lower triangular matrix and the transpose of that lower triangular matrix. The lower triangular matrix is the Cholesky triangle of the original, positive definite, matrix.
As we saw in the previous section, a square matrix $A$ can (under suitable conditions) be written as the product of a lower triangular matrix $L$ and an upper triangular matrix $U$. However, if $A$ is symmetric and positive definite, we can choose the factors such that $U$ is the transpose of $L$.
Remark 4.2.6. When it is applicable, the Cholesky decomposition is twice as efficient as the LU decomposition.
Let us consider the system
$$A\cdot x = b, \qquad (4.2.8)$$
where $A \in M_{n\times n}(\mathbb{R})$ is symmetric ($A = A^T$) and positive definite ($x^T A x > 0$ for all nonzero $x \in M_{n\times 1}(\mathbb{R})$), and $b \in M_{n\times 1}(\mathbb{R})$.
In order to solve (4.2.8), first we compute the Cholesky decomposition
$$A = L\cdot L^T,$$
then we solve
$$L\cdot y = b$$
to get $y \in M_{n\times 1}(\mathbb{R})$, and
$$L^T\cdot x = y$$
to get $x$, the solution of the initial system.
The Cholesky algorithm. It is a modified version of the recursive Gauss algorithm.
Step 1. Set $A^{(1)} := A$. For $i := 1$ to $n$ do Step $i$:
$$A^{(i)} = \begin{pmatrix} I_{i-1} & 0 & 0\\ 0 & a_{ii} & b_i^T\\ 0 & b_i & B^{(i)} \end{pmatrix},$$
where $I_{i-1}$ denotes the identity matrix of dimension $i-1$, and
$$L_i := \begin{pmatrix} I_{i-1} & 0 & 0\\ 0 & \sqrt{a_{ii}} & 0\\ 0 & \frac{1}{\sqrt{a_{ii}}}\,b_i & I_{n-i} \end{pmatrix}.$$
So,
$$A^{(i)} = L_i\cdot A^{(i+1)}\cdot L_i^T,$$
where
$$A^{(i+1)} = \begin{pmatrix} I_{i-1} & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & B^{(i)} - \frac{1}{a_{ii}}\,b_i b_i^T \end{pmatrix}.$$
After $n$ steps, we get $A^{(n+1)} = I$. Hence, the lower triangular matrix $L$ we are looking for is calculated as
$$L := L_1 L_2 \dots L_n.$$
Remark 4.2.7. The following formulas for the entries of $L$ hold:
$$l_{11} = \sqrt{a_{11}},$$
$$l_{i1} = a_{i1}/l_{11}, \quad i = 2,\dots,n,$$
$$l_{ii} = \Big[a_{ii} - \sum_{k=1}^{i-1} l_{ik}^2\Big]^{1/2}, \quad i = 2,\dots,n,$$
$$l_{ij} = \frac{1}{l_{jj}}\Big[a_{ij} - \sum_{k=1}^{j-1} l_{ik}l_{jk}\Big], \quad i = 3,\dots,n,\ j = 2,\dots,i-1.$$
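The entry formulas of Remark 4.2.7 translate directly into code (our own sketch; the general below-diagonal formula also covers the listed special cases $l_{11}$, $l_{i1}$, $l_{ii}$):

```python
import math

def cholesky(A):
    # Builds L with A = L * L^T using the entry formulas of Remark 4.2.7.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)     # diagonal entries
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]    # below-diagonal entries
    return L

L = cholesky([[4.0, 2.0], [2.0, 3.0]])   # L = [[2, 0], [1, sqrt(2)]]
```

By Remark 4.2.8 the argument of the square root stays positive for a real positive definite matrix, so the recursion never breaks down.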
Remark 4.2.8. The expression under the square root is always positive asA is real and positive definite.
4.2.1.7 The QR decomposition
The QR decomposition of a matrix is a method which expresses a matrix $A$ as the product of an orthogonal matrix and a triangular one.
So, let us consider the system of linear equations
$$A\cdot x = b, \qquad (4.2.9)$$
where $A \in M_{n\times n}(\mathbb{R})$, $x, b \in M_{n\times 1}(\mathbb{R})$.
According to the QR decomposition, matrix $A$ can be written
$$A = Q\cdot R,$$
where $Q$ is an orthogonal matrix (meaning that $Q^T\cdot Q = I$) and $R$ is an upper triangular matrix.
Remark 4.2.9. This factorization is unique if we require the diagonal elements of $R$ to be positive.
There are several methods for computing the QR decomposition, such as: by means of Givens rotations, Householder transformations, or the Gram-Schmidt process.
Computing QR by means of the Gram-Schmidt process. According to the Gram-Schmidt method, the matrix $A$ is considered by columns:
$$A = \begin{pmatrix} a_1 & \dots & a_n \end{pmatrix}.$$
Then, denoting by $\langle\cdot,\cdot\rangle$ the inner product of vectors, we have
$$u_1 = a_1, \qquad e_1 = \frac{u_1}{\|u_1\|},$$
$$u_2 = a_2 - \langle e_1,a_2\rangle e_1, \qquad e_2 = \frac{u_2}{\|u_2\|},$$
$$\vdots$$
$$u_k = a_k - \sum_{j=1}^{k-1}\langle e_j,a_k\rangle e_j, \qquad e_k = \frac{u_k}{\|u_k\|},$$
or, rearranging the terms,
$$a_1 = e_1\|u_1\|,$$
$$a_2 = \langle e_1,a_2\rangle e_1 + e_2\|u_2\|,$$
$$a_3 = \langle e_1,a_3\rangle e_1 + \langle e_2,a_3\rangle e_2 + e_3\|u_3\|,$$
$$\vdots$$
$$a_k = \sum_{j=1}^{k-1}\langle e_j,a_k\rangle e_j + e_k\|u_k\|.$$
In matrix form, these equations can be written:
$$A = \begin{pmatrix} e_1 & \dots & e_n \end{pmatrix} \begin{pmatrix} \|u_1\| & \langle e_1,a_2\rangle & \langle e_1,a_3\rangle & \dots\\ 0 & \|u_2\| & \langle e_2,a_3\rangle & \dots\\ 0 & 0 & \|u_3\| & \dots\\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$
The product of each row and column of the matrices above gives us the respective column of $A$ that we started with. So, the matrix $Q$ is the matrix of the $e_k$'s:
$$Q = \begin{pmatrix} e_1 & \dots & e_n \end{pmatrix}.$$
Then,
$$R = Q^T\cdot A = \begin{pmatrix} \langle e_1,a_1\rangle & \langle e_1,a_2\rangle & \langle e_1,a_3\rangle & \dots\\ 0 & \langle e_2,a_2\rangle & \langle e_2,a_3\rangle & \dots\\ 0 & 0 & \langle e_3,a_3\rangle & \dots\\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$
Note that $\langle e_j,a_j\rangle = \|u_j\|$, $\langle e_j,a_k\rangle = 0$ for $j > k$, and $QQ^T = I$, so $Q^T = Q^{-1}$.
Example 4.2.10. Let us consider the decomposition of
$$A = \begin{pmatrix} 12 & -51 & 4\\ 6 & 167 & -68\\ -4 & 24 & -41 \end{pmatrix}.$$
We compute the matrix $Q$ by means of the Gram-Schmidt method, as follows:
$$U = \begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix} = \begin{pmatrix} 12 & -69 & -58\\ 6 & 158 & 6\\ -4 & 30 & -165 \end{pmatrix},$$
$$Q = \begin{pmatrix} \frac{u_1}{\|u_1\|} & \frac{u_2}{\|u_2\|} & \frac{u_3}{\|u_3\|} \end{pmatrix} = \begin{pmatrix} 6/7 & -69/175 & -58/175\\ 3/7 & 158/175 & 6/175\\ -2/7 & 6/35 & -33/35 \end{pmatrix}.$$
Thus, we have
$$A = \underbrace{Q\cdot Q^T}_{I}\cdot A = Q\cdot R,$$
so
$$R = Q^T\cdot A = \begin{pmatrix} 14 & 21 & -14\\ 0 & 175 & -70\\ 0 & 0 & 35 \end{pmatrix}.$$
Computing QR by means of Householder reflections. A Householder reflection (or Householder transformation) is a transformation that takes a vector and reflects it about some plane. We can use this property to calculate the QR factorization of a matrix.
$Q$ can be used to reflect a vector in such a way that all coordinates but one disappear.
Let $x$ be an arbitrary $m$-dimensional column vector such that $\|x\| = |\alpha|$ for a scalar $\alpha$. Then, for $e_1 = (1,0,\dots,0)^T$ and $\|\cdot\|$ the Euclidean norm, set
$$u = x - \alpha e_1, \qquad v = \frac{u}{\|u\|}, \qquad Q = I - 2v\cdot v^T.$$
$Q$ is a Householder matrix and
$$Qx = (\alpha,0,\dots,0)^T.$$
This can be used to gradually transform an $m$-by-$n$ matrix $A$ to upper triangular form. First, we multiply $A$ by the Householder matrix $Q_1$ that we obtain when we choose the first matrix column for $x$. This results in a matrix $Q_1A$ with zeros in the left column (except for the first row):
$$Q_1A = \begin{pmatrix} \alpha_1 & * & \dots & *\\ 0 & & &\\ \vdots & & A' &\\ 0 & & & \end{pmatrix}.$$
This can be repeated for $A'$ (obtained from $Q_1A$ by deleting the first row and first column), resulting in a Householder matrix $Q_2'$. Note that $Q_2'$ is smaller than $Q_1$. Since we want it to operate on $Q_1A$ instead of $A'$, we need to expand it to the upper left, filling in a 1, or, in general,
$$Q_k = \begin{pmatrix} I_{k-1} & 0\\ 0 & Q_k' \end{pmatrix}.$$
After $t$ iterations of the process, $t = \min(m-1,n)$,
$$R = Q_t \dots Q_2 Q_1 A$$
is an upper triangular matrix. So, with
$$Q = Q_1 Q_2 \dots Q_t,$$
we have $A = QR$, the needed QR decomposition of $A$.
Example 4.2.11. Let us consider the matrix $A$ as in Example 4.2.10:
$$A = \begin{pmatrix} 12 & -51 & 4\\ 6 & 167 & -68\\ -4 & 24 & -41 \end{pmatrix}.$$
We decompose $A = Q\cdot R$ by means of the Householder transformation.
First, we need to find a reflection that transforms the first column of matrix $A$, the vector $a_1 = (12\ \ 6\ -4)^T$, into $\|a_1\|e_1 = (14\ \ 0\ \ 0)^T$.
Now, for $\alpha = 14$ and $x = a_1 = (12\ \ 6\ -4)^T$, we compute
$$u = x - \alpha e_1 = (-2\ \ 6\ -4)^T$$
and
$$v = \frac{u}{\|u\|} = \frac{1}{\sqrt{14}}(-1\ \ 3\ -2)^T.$$
Then,
$$Q_1 = I - \frac{2}{\sqrt{14}\cdot\sqrt{14}}\begin{pmatrix} -1\\ 3\\ -2 \end{pmatrix}\begin{pmatrix} -1 & 3 & -2 \end{pmatrix} = I - \frac{1}{7}\begin{pmatrix} 1 & -3 & 2\\ -3 & 9 & -6\\ 2 & -6 & 4 \end{pmatrix} = \begin{pmatrix} 6/7 & 3/7 & -2/7\\ 3/7 & -2/7 & 6/7\\ -2/7 & 6/7 & 3/7 \end{pmatrix}.$$
Now observe:
$$Q_1\cdot A = \begin{pmatrix} 14 & 21 & -14\\ 0 & -49 & -14\\ 0 & 168 & -77 \end{pmatrix}.$$
Further, we need to zero the $(3,2)$ entry. We apply the same process to
$$A' = \begin{pmatrix} -49 & -14\\ 168 & -77 \end{pmatrix}.$$
Finally, we get the matrix of the Householder transformation:
$$Q_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & -7/25 & 24/25\\ 0 & 24/25 & 7/25 \end{pmatrix}.$$
We find
$$Q = Q_1 Q_2 = \begin{pmatrix} 6/7 & -69/175 & 58/175\\ 3/7 & 158/175 & -6/175\\ -2/7 & 6/35 & 33/35 \end{pmatrix}, \qquad R = Q^T\cdot A = \begin{pmatrix} 14 & 21 & -14\\ 0 & 175 & -70\\ 0 & 0 & -35 \end{pmatrix}.$$
The matrix $Q$ is orthogonal and $R$ is upper triangular, so $A = QR$ is the required QR decomposition.
Computing QR by means of Givens rotations. The QR decomposition can also be computed with a series of Givens rotations, where a Givens rotation is defined by
$$R(\theta) := \begin{pmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{pmatrix}$$
and rotates the vector $x \in \mathbb{R}^2$ by an angle $\theta$. Applying such a rotation to the matrix $A$ of the system zeroes one element in the subdiagonal of the matrix. The concatenation of all the Givens rotations forms the orthogonal matrix $Q$.
Example 4.2.12. Let us consider the same matrix as in the previous examples:
$$A = \begin{pmatrix} 12 & -51 & 4\\ 6 & 167 & -68\\ -4 & 24 & -41 \end{pmatrix}.$$
First, we need to form a rotation matrix that will zero the element $a_{31} = -4$. We get this matrix using the Givens rotation method, and call the matrix $G_1$. We rotate the vector $(6,-4)$, using the angle $\theta = \arctan\big(-\frac{4}{6}\big)$. Then,
$$G_1 = \begin{pmatrix} 1 & 0 & 0\\ 0 & \cos\theta & \sin\theta\\ 0 & -\sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0\\ 0 & 0.83 & -0.55\\ 0 & 0.55 & 0.83 \end{pmatrix}$$
and
$$G_1A = \begin{pmatrix} 12 & -51 & 4\\ 7.21 & 125.6 & -33.83\\ 0 & 112.6 & -71.83 \end{pmatrix}.$$
The matrices $G_2$ and $G_3$ will zero the subdiagonal elements $a_{21}$ and $a_{32}$, forming the triangular matrix $R$. The orthogonal matrix $Q^T$ is formed from the concatenation of all the Givens matrices:
$$Q^T = G_3G_2G_1.$$
Thus, we have $G_3G_2G_1A = Q^TA = R$, and the QR decomposition is
$$A = Q\cdot R.$$
Remark 4.2.13. The Givens method for the QR decomposition is more amenable to parallelization than the previous methods.
4.2.1.8 The WZ decomposition
Let us consider the system of linear equations
A · x = b
where A ∈Mn×n(R), x, b ∈Mn×1(R).
The WZ decomposition factors the matrix $A$, dense and nonsingular, into a product of matrices $W$ and $Z$ whose columns $w_i$ and rows $z_i$ have the form:
$$w_i = (0 \dots 0\ \underset{i}{1}\ w_{i+1,i} \dots w_{n-i,i}\ 0 \dots 0)^T, \quad i = 1,\dots,m,$$
$$w_i = (0 \dots 0\ \underset{i}{1}\ 0 \dots 0)^T, \quad i = p,\dots,q,$$
$$w_i = (\underbrace{0 \dots 0}_{n-i+1}\ w_{n-i+2,i} \dots w_{i-1,i}\ 1\ 0 \dots 0)^T, \quad i = q+1,\dots,n,$$
$$z_i = (\underbrace{0 \dots 0}_{i-1}\ z_{ii} \dots z_{i,n-i+1}\ 0 \dots 0), \quad i = 1,\dots,p,$$
$$z_i = (0 \dots 0\ z_{i,n-i+1} \dots z_{ii}\ 0 \dots 0), \quad i = p+1,\dots,n,$$
where
$$m = [(n-1)/2], \qquad p = [(n+1)/2], \qquad q = [(n+1)/2].$$
For example, for $n = 5$, we have:
$$W = \begin{pmatrix} 1 & 0 & 0 & 0 & 0\\ w_{21} & 1 & 0 & 0 & w_{25}\\ w_{31} & w_{32} & 1 & w_{34} & w_{35}\\ w_{41} & 0 & 0 & 1 & w_{45}\\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \qquad Z = \begin{pmatrix} z_{11} & z_{12} & z_{13} & z_{14} & z_{15}\\ 0 & z_{22} & z_{23} & z_{24} & 0\\ 0 & 0 & z_{33} & 0 & 0\\ 0 & z_{42} & z_{43} & z_{44} & 0\\ z_{51} & z_{52} & z_{53} & z_{54} & z_{55} \end{pmatrix}.$$
After the factorization, we can solve the two linear systems
$$Wc = b$$
and
$$Zx = c$$
instead of one.
Computing the WZ decomposition. The WZ algorithm consists of two parts: reduction of the matrix $A$ (and the vector $b$) to the matrix $Z$ (and the vector $c$), and next, solving the equation $Z\cdot x = c$.
Reduction of matrix $A$.
Step 1. We zero the elements from the 2nd to the $(n-1)$-st rows in the 1st and $n$-th columns, as follows.
Step 1.1. We compute $w_{i1}$ and $w_{in}$ from the linear system
$$a_{11}w_{i1} + a_{n1}w_{in} = -a_{i1},$$
$$a_{1n}w_{i1} + a_{nn}w_{in} = -a_{in}, \quad \text{for } i = 2,\dots,n-1,$$
and we get the matrix
$$W^{(1)} = \begin{pmatrix} 1 & & & & 0\\ w_{21} & 1 & & & w_{2n}\\ \vdots & & \ddots & & \vdots\\ w_{n-1,1} & & & 1 & w_{n-1,n}\\ 0 & & & & 1 \end{pmatrix}.$$
Step 1.2. Compute
$$A^{(1)} = W^{(1)}A, \qquad b^{(1)} = W^{(1)}b.$$
So, we get the linear system $A^{(1)}x = b^{(1)}$, where
$$A^{(1)} = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1,n-1} & a_{1n}\\ 0 & a_{22}^{(1)} & \dots & a_{2,n-1}^{(1)} & 0\\ \vdots & \vdots & & \vdots & \vdots\\ 0 & a_{n-1,2}^{(1)} & \dots & a_{n-1,n-1}^{(1)} & 0\\ a_{n1} & a_{n2} & \dots & a_{n,n-1} & a_{nn} \end{pmatrix}, \qquad b^{(1)} = \begin{pmatrix} b_1\\ b_2^{(1)}\\ \vdots\\ b_{n-1}^{(1)}\\ b_n \end{pmatrix},$$
and
$$a_{ij}^{(1)} = a_{ij} + w_{i1}a_{1j} + w_{in}a_{nj}, \quad i = 2,\dots,n-1,\ j = 2,\dots,n-1,$$
$$b_i^{(1)} = b_i + w_{i1}b_1 + w_{in}b_n, \quad i = 2,\dots,n-1.$$
Similarly, we carry out the second step (and the next steps), which consist in the same operations as before, but only on the submatrices of $A^{(1)}$ obtained by deleting the first and the last rows and columns of the matrix $A^{(1)}$.
After $m$ such steps we get the matrix $Z = A^{(m)}$ (of the form defined above) and the vector $c = b^{(m)}$. We have
$$W^{(m)}\dots W^{(1)}A = Z,$$
so
$$A = (W^{(1)})^{-1}\dots(W^{(m)})^{-1}Z = WZ.$$
Step 2. The second part of the method is to solve the linear system $Zx = c$, which consists in solving a linear system with the two unknowns $x_p$ and $x_q$ and next updating the vector $c$. Formally, we have:
Step 2.1. We find $x_p$ and $x_q$ from the system
$$z_{pp}x_p + z_{pq}x_q = c_p,$$
$$z_{qp}x_p + z_{qq}x_q = c_q. \qquad (4.2.10)$$
Step 2.2. Compute
$$c_i^{(1)} = c_i - z_{ip}x_p - z_{iq}x_q, \quad i = 1,\dots,p-1,\ q+1,\dots,n.$$
For odd $n$, system (4.2.10) consists (in the last step) of one equation. Similarly, we carry out the next steps for the inner two-equation systems. There are $m+1$ such steps.
Remark 4.2.14. The WZ-decomposition can be very well parallelized, aswe shall see in Chapter 7.
4.3 Iterative methods
The iterative methods for solving a linear system of equations of the form
$$A\cdot x = b, \qquad (4.3.1)$$
with $A \in M_{n\times n}(\mathbb{R})$, $x, b \in M_{n\times 1}(\mathbb{R})$, generate an approximation of the solution vector $x$ by means of a sequence of successive approximations, provided the method converges. So, in order to solve a system of equations iteratively, the following steps have to be performed:
Step 1. Determine the convergence of the iterative method.
In order to do this, we decompose the matrix A of system (4.3.1) into a sum of three matrices:
A = L+D + U, (4.3.2)
where
L = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ a_{21} & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 0 \end{pmatrix}, \quad U = \begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ 0 & 0 & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}
and
D = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}.
So, all iterative methods are of the type
Mx(k+1) = Nx(k) + b, (4.3.3)
where A = M - N and x^{(k)}, x^{(k+1)} denote the approximations of the solution x at steps k and k+1, respectively, based on an initial approximation x^{(0)}.
Recalling the notion of the spectral radius of a matrix G, defined as the quantity
ρ(G) = \max\{ |λ| : λ \ \text{eigenvalue of} \ G \},
the following result holds.
Theorem 4.3.1. Let A = M - N be a decomposition of the regular matrix A ∈ M_{n×n}(R) and b ∈ M_{n×1}(R). If the matrix M is regular and ρ(M^{-1}N) < 1, then the iterative method is convergent.
Remark 4.3.2. If the iterative method is convergent, then it makes sense tocontinue with Step 2, in order to get the approximate solution. Otherwise,the method cannot be applied.
Step 2. Starting with an initial value x(0), the sequence of successiveapproximations
x(1), x(2), . . . , x(k), . . . , x(n), . . . (4.3.4)
is generated, according to the iterative method used. Because the method is convergent, the sequence (4.3.4) converges to the exact solution x of system (4.3.1).
Step 3. The stopping condition is ‖x^{(n)} - x^{(n-1)}‖ ≤ ε, for a given ε, where ‖·‖ is the Euclidean norm. Then x is approximated by x^{(n)} or x^{(n-1)}.
Remark 4.3.3. The initial approximation x^{(0)} does not influence the convergence of the method, only its speed.
In what follows, we present different ways of generating the sequence (4.3.4), according to the different iterative methods.
4.3.1 The Jacobi method
Let us write the system (4.3.1) in the following form, assuming a_{ii} ≠ 0, i = 1, ..., n:
x_i = \Big( b_i - \sum_{j=1, j≠i}^{n} a_{ij} x_j \Big) / a_{ii}, \quad i = 1, ..., n.   (4.3.5)
Starting with an initial approximation x_i^{(0)}, i = 1, ..., n, Jacobi's iterative process is defined by the algorithm
x_i^{(k+1)} = \Big( b_i - \sum_{j=1, j≠i}^{n} a_{ij} x_j^{(k)} \Big) / a_{ii}, \quad i = 1, ..., n,   (4.3.6)
or, in matrix form, based on the decomposition (4.3.2) of A,
Dx(k+1) = −(L+ U)x(k) + b. (4.3.7)
So, M_J = D and N_J = -(L + U), and to establish convergence we have to verify the property
ρ(M_J^{-1} N_J) < 1.
4.3.2 The Gauss-Seidel method
Considering again the system (4.3.1) in the form (4.3.5), and starting with the initial approximation x_i^{(0)}, i = 1, ..., n, the Gauss-Seidel iterative process is defined by the algorithm
x_i^{(k+1)} = \Big( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)} \Big) / a_{ii}, \quad i = 1, ..., n,   (4.3.8)
or, in matrix form, based on (4.3.2),
(D + L)x(k+1) = −Ux(k) + b,
so M_{GS} = D + L and N_{GS} = -U. To see if the method is convergent, we have to verify
ρ(M_{GS}^{-1} N_{GS}) < 1.
Remark 4.3.4. If the matrix A ∈ M_{n×n}(R) has a strictly dominant diagonal, that is,
|a_{ii}| > \sum_{j=1, j≠i}^{n} |a_{ij}|, \quad i = 1, ..., n,
then the spectral radius ρ(M_J^{-1} N_J) < 1.
Also, if A ∈ Mn×n(R) is a symmetric positive definite matrix, then theGauss-Seidel process is convergent.
Example 4.3.5. Solve, with an error of 10−1, the following system:
A · x = b,
where
A = \begin{pmatrix} 1 & 1 & -1 \\ -1 & 3 & 0 \\ 1 & 0 & -2 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 0 \\ 2 \\ -3 \end{pmatrix},
using the Jacobi method.
Let us represent the matrix A in the form
A = L + D + U = \begin{pmatrix} 0 & 0 & 0 \\ -1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -2 \end{pmatrix} + \begin{pmatrix} 0 & 1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
Then, we form the matrices M_J and N_J:
M_J = D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -2 \end{pmatrix},
N_J = -(L + U) = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}.
In order to determine the convergence of the method, we have to compute ρ(M_J^{-1} N_J), where M_J^{-1} N_J is:
M_J^{-1} N_J = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1/3 & 0 \\ 0 & 0 & -1/2 \end{pmatrix} \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 1 \\ 1/3 & 0 & 0 \\ 1/2 & 0 & 0 \end{pmatrix}.
The eigenvalues of M_J^{-1} N_J are
λ(M_J^{-1} N_J) = \{ 0, -1/\sqrt{6}, 1/\sqrt{6} \}
and
ρ(M_J^{-1} N_J) = 1/\sqrt{6} < 1,
so the Jacobi method converges. It makes sense to continue by generating the successive approximation sequence, with
M_J^{-1} b = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1/3 & 0 \\ 0 & 0 & -1/2 \end{pmatrix} \begin{pmatrix} 0 \\ 2 \\ -3 \end{pmatrix} = \begin{pmatrix} 0 \\ 2/3 \\ 3/2 \end{pmatrix}
and x^{(0)} = (0 \ 0 \ 0)^T, according to the relation
x^{(k+1)} = M_J^{-1} N_J x^{(k)} + M_J^{-1} b.
We get
x^{(1)} = \begin{pmatrix} 0 & -1 & 1 \\ 1/3 & 0 & 0 \\ 1/2 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 2/3 \\ 3/2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0.66667 \\ 1.5 \end{pmatrix},
x^{(2)} = \begin{pmatrix} 0 & -1 & 1 \\ 1/3 & 0 & 0 \\ 1/2 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0.66667 \\ 1.5 \end{pmatrix} + \begin{pmatrix} 0 \\ 2/3 \\ 3/2 \end{pmatrix} = \begin{pmatrix} 0.83333 \\ 0.66667 \\ 1.5 \end{pmatrix},
and so on. After 5 steps, we obtain
x^{(5)} = (0.97222 \ \ 0.99074 \ \ 1.98611)^T,
with the required error, 10^{-1}, taking into account that the exact solution of the system is x = (1 \ 1 \ 2)^T.
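The computation above can be replayed numerically. The following sketch (illustrative, not from the text) iterates x^{(k+1)} = M_J^{-1}N_J x^{(k)} + M_J^{-1}b five times:

```python
# iteration matrix M_J^{-1} N_J and vector M_J^{-1} b of Example 4.3.5
g = [[0.0, -1.0, 1.0],
     [1.0 / 3.0, 0.0, 0.0],
     [0.5, 0.0, 0.0]]
c = [0.0, 2.0 / 3.0, 1.5]

x = [0.0, 0.0, 0.0]          # x^(0)
for _ in range(5):
    x = [sum(g[i][j] * x[j] for j in range(3)) + c[i] for i in range(3)]
# x is now approximately (0.97222, 0.99074, 1.98611), close to (1, 1, 2)
```

The exact values after five steps are the fractions 35/36, 107/108 and 143/72.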
Remark 4.3.6. Following the same scheme, we can solve the system withthe Gauss-Seidel method.
4.3.3 The method of relaxation
Normally, the Gauss-Seidel method is twice as fast as the Jacobi method but, if the quantity ρ(M_{GS}^{-1} N_{GS}) is smaller than one but close to one, then even the Gauss-Seidel speed of convergence decreases. A problem arises: how to accelerate the convergence of the sequence of approximations? The method which solves this problem is known as the method of relaxation (or, sometimes, "successive overrelaxation", SOR).
It is based on the algorithm:
x_i^{(k+1)} = ω \Big( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)} \Big) / a_{ii} + (1 - ω) x_i^{(k)}, \quad i = 1, ..., n,
which can be written in matrix form:
(D + ωL) x^{(k+1)} = ((1 - ω)D - ωU) x^{(k)} + ωb,
where M_{SOR} = D + ωL and N_{SOR} = (1 - ω)D - ωU. The problem lies in finding the parameter ω for which ρ(M_{SOR}^{-1} N_{SOR}) is smallest. According to Kahan's theorem, the method can converge only for 0 < ω < 2.
Remark 4.3.7. For ω = 1, the SOR method is the Gauss-Seidel method.
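The relaxation step can be sketched as follows (illustrative, not from the text; the stopping test and the sample system are assumptions):

```python
def sor(a, b, x0, omega, tol=1e-10, max_iter=500):
    """Illustrative sketch of relaxation (SOR): a Gauss-Seidel update
    blended with the previous iterate through the parameter omega."""
    n = len(b)
    x = list(x0)
    for _ in range(max_iter):
        diff = 0.0
        for i in range(n):
            s1 = sum(a[i][j] * x[j] for j in range(i))
            s2 = sum(a[i][j] * x[j] for j in range(i + 1, n))
            xi = omega * (b[i] - s1 - s2) / a[i][i] + (1.0 - omega) * x[i]
            diff = max(diff, abs(xi - x[i]))
            x[i] = xi
        if diff <= tol:
            break
    return x

# symmetric positive-definite system with solution (1, 1); omega in (0, 2)
a = [[4.0, 1.0], [1.0, 3.0]]
b = [5.0, 4.0]
x = sor(a, b, [0.0, 0.0], omega=1.25)
```

Setting omega=1.0 reproduces the Gauss-Seidel iteration, in line with Remark 4.3.7.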
4.3.4 The Chebyshev method
In order to accelerate the convergence of an iterative process, the following Chebyshev method can be used.
Let us suppose that, using an iterative method, we have found the approximations x^{(1)}, ..., x^{(k)} of the solution x of system (4.3.1).
Let
y^{(k)} = \sum_{j=0}^{k} ν_j^{(k)} x^{(j)}.   (4.3.9)
The problem lies in finding the coefficients ν_j^{(k)} in formula (4.3.9) so that the error vector y^{(k)} - x of the approximation y^{(k)} is smaller than the error vector x^{(k)} - x. In the case
x^{(j)} = x, \quad j = 1, ..., k,
it is natural to demand y^{(k)} = x. This is exactly the case when
\sum_{j=0}^{k} ν_j^{(k)} = 1.   (4.3.10)
In order to determine the coefficients needed, we can write
x^{(k)} - x = (M^{-1}N)^k e^{(0)},
then
y^{(k)} - x = \sum_{j=0}^{k} ν_j^{(k)} (x^{(j)} - x) = \sum_{j=0}^{k} ν_j^{(k)} (M^{-1}N)^j e^{(0)} = \sum_{j=0}^{k} ν_j^{(k)} G^j e^{(0)} = p_k(G) e^{(0)},
where G = M^{-1}N and
p_k(z) = \sum_{j=0}^{k} ν_j^{(k)} z^j.   (4.3.11)
From (4.3.10) it follows that
p_k(1) = 1.   (4.3.12)
In addition,
‖y^{(k)} - x‖_2 ≤ ‖p_k(G)‖_2 ‖e^{(0)}‖_2.   (4.3.13)
In the case of a symmetric matrix G, its eigenvalues satisfy the inequalities
-1 < α ≤ λ_n ≤ \cdots ≤ λ_1 ≤ β < 1.
If λ is the eigenvalue corresponding to the eigenvector x, then
p_k(G) x = \sum_{j=0}^{k} ν_j^{(k)} G^j x = \sum_{j=0}^{k} ν_j^{(k)} λ^j x,
i.e., the vector x is also an eigenvector of the matrix p_k(G), corresponding to the eigenvalue
\sum_{j=0}^{k} ν_j^{(k)} λ^j = p_k(λ).
In the case of G symmetric, the matrix p_k(G) is also symmetric. Hence,
‖p_k(G)‖_2 = \max_{λ_i ∈ λ(G)} \Big| \sum_{j=0}^{k} ν_j^{(k)} λ_i^j \Big| ≤ \max_{λ ∈ [α, β]} |p_k(λ)|.   (4.3.14)
To decrease the quantity ‖p_k(G)‖_2 one must find a polynomial p_k(z) that has small values on the segment [α, β] and satisfies condition (4.3.12). The polynomials with these properties are the Chebyshev polynomials. They are defined on the interval [-1, 1] by the recurrence relation
c_j(z) = 2z c_{j-1}(z) - c_{j-2}(z), \quad j = 2, 3, ...,
with c_0(z) = 1 and c_1(z) = z.
These polynomials satisfy the inequality
|c_j(z)| ≤ 1, \quad z ∈ [-1, 1],
and c_j(1) = 1, and the values |c_j(z)| grow quickly outside the segment [-1, 1].
Further, the polynomial
p_k(z) = \frac{c_k(-1 + 2(z - α)/(β - α))}{c_k(µ)},
where
µ = -1 + 2 \frac{1 - α}{β - α} = 1 + 2 \frac{1 - β}{β - α} > 1,
satisfies the condition (4.3.12) and
|p_k(z)| ≤ \frac{1}{|c_k(µ)|} \quad \text{for} \ z ∈ [α, β].
Taking into account relations (4.3.13) and (4.3.14), we find
‖y^{(k)} - x‖_2 ≤ \frac{‖e^{(0)}‖_2}{|c_k(µ)|}.
Therefore, the greater µ is, the greater |c_k(µ)| is and, consequently, the faster the convergence of the method.
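The defining recurrence, the bound |c_j(z)| ≤ 1 on [-1, 1], and the rapid growth of |c_j| outside that segment can all be checked directly. A small illustrative sketch (not from the text):

```python
def cheb(j, z):
    """Evaluate the Chebyshev polynomial c_j(z) via the recurrence
    c_j = 2 z c_{j-1} - c_{j-2}, with c_0(z) = 1 and c_1(z) = z."""
    c_prev, c = 1.0, z
    if j == 0:
        return c_prev
    for _ in range(j - 1):
        c_prev, c = c, 2.0 * z * c - c_prev
    return c

# |c_5| stays bounded by 1 on a sample of [-1, 1] ...
inside = max(abs(cheb(5, -1.0 + 0.1 * k)) for k in range(21))
# ... but grows quickly outside the segment
outside = cheb(5, 1.5)
```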
4.3.5 The method of Steepest Descent
This method is important in its own right, and it also contains the main ideas of another well-known iterative method, the Conjugate Gradient (CG) method.
For didactic reasons, we choose to present in this section only the Steepest Descent method, because it is easy to understand and perhaps not as well known as CG, although it has comparable performance.
Let us consider the linear system of equations
A · x = b, (4.3.15)
where A ∈ M_{n×n}(R) is a symmetric, positive-definite matrix and x, b ∈ M_{n×1}(R).
With the information given by (4.3.15) we generate the following quadratic form:
f(x) = \frac{1}{2} x^T A x - b^T x + c,   (4.3.16)
where c is a scalar constant, which will have the gradient
f'(x) = \frac{1}{2} A^T x + \frac{1}{2} A x - b.   (4.3.17)
If A is symmetric, (4.3.17) reduces to
f ′(x) = Ax− b. (4.3.18)
Remark 4.3.8. We recall that the gradient of a quadratic form is definedto be
f'(x) = \begin{pmatrix} \frac{\partial}{\partial x_1} f(x) \\ \frac{\partial}{\partial x_2} f(x) \\ \vdots \\ \frac{\partial}{\partial x_n} f(x) \end{pmatrix}.
Setting the gradient f'(x) to zero in (4.3.18), we get exactly the system (4.3.15) we have to solve. Therefore, the solution of Ax = b is a critical point of f(x). Under the assumptions made on A, this solution is a minimum of f(x), so Ax = b can be solved by finding an x that minimizes f(x).
Computing the Steepest Descent method. We start with an initial approximation x^{(0)} and generate the sequence of successive approximations x^{(1)}, x^{(2)}, ..., choosing at each step the direction in which f decreases most quickly, which is the direction opposite to f'(x^{(i)}). According to equation (4.3.18), this direction is
-f'(x^{(i)}) = b - A x^{(i)}.   (4.3.19)
Recalling the notion of the residual, denoted by
ri = b− Ax(i), (4.3.20)
which measures how far we are from the correct value of b, and recalling the notion of the error,
ei = x(i) − x, (4.3.21)
it is easy to see that
ri = −Aei (4.3.22)
or, more important,
ri = −f ′(x(i)). (4.3.23)
So, the residual can be considered the direction of the steepest descendent.
Under these considerations, with the initial approximation x^{(0)} given, we shall determine
x^{(1)} = x^{(0)} + α r_0,   (4.3.24)
with α a real value chosen to minimize f along this line.
From basic calculus, α minimizes f when the directional derivative \frac{d}{dα} f(x^{(1)}) is equal to zero:
\frac{d}{dα} f(x^{(1)}) = f'(x^{(1)})^T \frac{d}{dα} x^{(1)} = f'(x^{(1)})^T r_0 = 0.   (4.3.25)
Relation (4.3.25) indicates that α minimizes f if f'(x^{(1)}) and r_0 are orthogonal.
To determine α explicitly, note that f'(x^{(1)}) = -r_1, and then we have:
r_1^T r_0 = 0
(b - A x^{(1)})^T r_0 = 0
(b - A(x^{(0)} + α r_0))^T r_0 = 0
(b - A x^{(0)})^T r_0 - α (A r_0)^T r_0 = 0
(b - A x^{(0)})^T r_0 = α (A r_0)^T r_0
r_0^T r_0 = α r_0^T (A r_0)
α = \frac{r_0^T r_0}{r_0^T A r_0}.
Putting it all together, the method of Steepest Descent is:
r_i = b - A x^{(i)},
α_i = \frac{r_i^T r_i}{r_i^T A r_i},
x^{(i+1)} = x^{(i)} + α_i r_i.
Remark 4.3.9. It can be proved that the method of Steepest Descent isconvergent.
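The three formulas above translate almost line by line into code. A minimal sketch, assuming a symmetric positive-definite matrix; the residual-based stopping test is an assumption, not from the text:

```python
def steepest_descent(a, b, x0, tol=1e-10, max_iter=1000):
    """Illustrative sketch: r_i = b - A x_i, alpha_i = r'r / r'Ar,
    x_{i+1} = x_i + alpha_i r_i, for symmetric positive-definite a."""
    n = len(b)
    x = list(x0)
    for _ in range(max_iter):
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        rr = sum(ri * ri for ri in r)
        if rr ** 0.5 <= tol:       # residual small enough: stop
            break
        ar = [sum(a[i][j] * r[j] for j in range(n)) for i in range(n)]
        alpha = rr / sum(r[i] * ar[i] for i in range(n))
        x = [x[i] + alpha * r[i] for i in range(n)]
    return x

# symmetric positive-definite system with solution (1, 1)
a = [[4.0, 1.0], [1.0, 3.0]]
b = [5.0, 4.0]
x = steepest_descent(a, b, [0.0, 0.0])
```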
4.3.6 The multigrid method
Originally introduced as a way to numerically solve elliptic boundary-value problems, multigrid methods and their various multiscale descendants have been developed and applied to many problems, in many disciplines. They are among the fastest solution techniques known today.
Let us consider the linear system of equations
A · x = b, (4.3.26)
with A ∈ M_{n×n}(R), x, b ∈ M_{n×1}(R), which models our problem. So x will be the exact solution. If it is difficult to determine analytically, we try to find an approximation x_h ∈ M_{n×1}(R) which approximates x with the smallest possible error. In order to do this, let us consider the discretized problem attached to (4.3.26), denoted
A · xh = b, (4.3.27)
obtained by means of any discretization method (finite difference, finite ele-ments, etc.).
The continuous definition domain will be covered with a mesh of N points (for a 1D problem), N^2 points (for a 2D problem), etc., with the mesh size
h = \frac{1}{N}, \quad \text{or} \quad h = \frac{1}{N^2}, \ \text{etc.}
Remark 4.3.10. As in the literature, we use the name ”grid” for this mesh.
Computing the Multigrid method. In this case, the algebraic erroris
e = x− xh (4.3.28)
and the residual is
r = b−A · xh. (4.3.29)
Also, the residual equation, which may be deduced from (4.3.28) and (4.3.29), is
A · e = r,   (4.3.30)
which means that the algebraic error satisfies the system (4.3.26) with the residual in the right-hand side.
Having all this in mind, we may form a first, very intuitive, image of the multigrid method:
1. Determine an approximation x_h for x using a classical iteration (Jacobi, Gauss-Seidel, SOR, etc.).
2. Compute the residual, according to (4.3.29).
3. Solve the residual equation (4.3.30).
4. Improve the approximation x_h:
x = x_h + e.
Having this scheme in mind, we need to understand why the method is called multigrid. The answer is found in the way in which the previous steps affect the approximation error.
Using Fourier analysis, one notices that the error vector is composed of two types of waves: more oscillatory ones and smoother ones. In order to reduce the error, we want to smooth out as many components of the error as possible, which means, in fact, to reduce it.
So, the multigrid method consists of two parts: applying the smoothing property and the coarse-grid correction.
The smoothing property. It has been proved that the classical iterations, such as Jacobi, Gauss-Seidel, etc., have this property: they remove the oscillatory components of the error, but leave the smoother components unchanged. That is why we start solving the system (4.3.26) with a number of steps of some classical iterative method, in order to obtain a first approximation x_h of the solution.
The coarse-grid correction. In order to remove the smoother components too, the information is transferred from the fine grid to a coarser one, with half the number of points, because there the components of the error look less smooth.
Remark 4.3.11. The transfer of information is made by means of two operators: one for prolongation (I_{2h}^h) and one for restriction (I_h^{2h}).
The prolongation operator makes the transfer from the coarse grid to the fine grid, according to the formulas:
I_{2h}^h x^{2h} = x^h,
where
x^h_{2j} = x^{2h}_j,
x^h_{2j+1} = \frac{1}{2}(x^{2h}_j + x^{2h}_{j+1}), \quad 0 ≤ j ≤ \frac{N}{2} - 1.   (4.3.31)
The restriction operator makes the transfer from the fine grid to the coarse grid, according to the formula:
I_h^{2h} · x^h = x^{2h},
where
x^{2h}_j = \frac{1}{4}(x^h_{2j-1} + 2 x^h_{2j} + x^h_{2j+1}), \quad 1 ≤ j ≤ \frac{N}{2} - 1.   (4.3.32)
Remark 4.3.12. Formulas (4.3.31) and (4.3.32) show that
I_h^{2h} = c (I_{2h}^h)^T, \quad \text{with} \ c ∈ R.
This fact is very important from a computational point of view.
We may write, in short, as follows.
The Two-Grid Algorithm:
Relax A^h · x = b^h on the fine grid Ω^h and get an initial approximation x^h.
Compute r^{2h} = I_h^{2h}(b^h - A^h x^h).
Solve A^{2h} · e^{2h} = r^{2h} on the coarse grid Ω^{2h}.
Correct x^h ← x^h + I_{2h}^h · e^{2h}.
The Multigrid Algorithm is a recursive one, and implies this exchange of information among several grids, the coarsest one consisting of only one point, where the solution of the system can be determined exactly.
So, the Multigrid Algorithm (MV) is the following:
1. Relax A^h x = b^h and get x^h.
2. If Ω^h is the coarsest grid, then go to 3.
Else
b^{2h} ← I_h^{2h}(b^h - A^h x^h)
x^{2h} ← 0
x^{2h} ← MV(x^{2h}, b^{2h})
3. Correct x^h ← x^h + I_{2h}^h x^{2h}.
Remark 4.3.13. The relaxation procedure may also appear at the end of step 3. The previous Multigrid Algorithm is called the V-cycle algorithm. There exist also W-cycle and combined multigrid algorithms.
Chapter 5
Non-linear equations on R
Let Ω ⊂ R and f : Ω→ R. Consider the equation
f(x) = 0, x ∈ Ω, (5.0.1)
and attach a mapping
F : D → D, D ⊂ Ωn.
Let (x_0, ..., x_{n-1}) ∈ D. Using the mapping F and the numbers x_0, ..., x_{n-1} we construct iteratively the sequence
x_0, x_1, ..., x_{n-1}, x_n, ...,   (5.0.2)
where
x_i = F(x_{i-n}, ..., x_{i-1}), \quad i = n, n+1, ....   (5.0.3)
The problem is to choose F and the numbers x0, ..., xn−1 ∈ D such that thesequence (5.0.2) converges to a solution of equation (5.0.1).
Definition 5.0.14. The method of approximating a solution of equation (5.0.1) by the elements of the sequence (5.0.2), computed as in (5.0.3), is called the F-method attached to the equation (5.0.1) and to the values x_0, ..., x_{n-1}. The numbers x_0, ..., x_{n-1} are called starting values, and the pth element of the sequence (5.0.2) is called the pth-order approximation of the solution.
If the set of the starting values consists of a single element, then the corresponding F-method is called a one-step method; otherwise it is called a multistep method.
209
Definition 5.0.15. If the sequence (5.0.2) converges to a solution of theequation (5.0.1), then the F -method is said to be convergent, otherwise it isdivergent.
Definition 5.0.16. Let x* ∈ Ω be a solution of the equation (5.0.1) and let x_0, ..., x_n, ... be a sequence generated by a given F-method. The number p having the property that
\lim_{x_i → x*} \frac{x* - F(x_{i-n+1}, ..., x_i)}{(x* - x_i)^p} = C ≠ 0
is called the order of the F-method, i.e., ord(F) = p, and the constant C is the asymptotic error.
In the sequel we describe some general procedures of constructing classesof one-step and multistep F -methods, and we discuss also some simple par-ticular cases.
5.1 One-step methods
Let F be a one-step method, i.e., for a given x_i we have
xi+1 = F (xi). (5.1.1)
Theorem 5.1.1. If x* is a solution of the equation (5.0.1), V(x*) is a neighborhood of x* and F ∈ C^p[V(x*)], p ∈ N*, then
ord(F) = p
if and only if
F(x*) = x*, \quad F'(x*) = \cdots = F^{(p-1)}(x*) = 0, \quad F^{(p)}(x*) ≠ 0.   (5.1.2)
Proof. Since F ∈ Cp[V (x∗)], Taylor’s formula gives rise to
F(x_i) = F(x*) + \sum_{k=1}^{p-1} \frac{1}{k!} (x_i - x*)^k F^{(k)}(x*) + \frac{1}{p!} (x_i - x*)^p F^{(p)}(ξ_i),   (5.1.3)
where xi ∈ V (x∗) and ξi belongs to the interval determined by xi and x∗.
Assume that conditions (5.1.2) are verified. Then (5.1.3) implies
F(x_i) = x* + \frac{1}{p!} (x_i - x*)^p F^{(p)}(ξ_i),   (5.1.4)
and it follows that
\lim_{x_i → x*} \frac{F(x_i) - x*}{(x_i - x*)^p} = \frac{1}{p!} F^{(p)}(x*) ≠ 0.
Thus,
ord(F) = p.
Let us now consider p = ord(F). We have to prove that conditions (5.1.2) are satisfied. Assume either that there exists a number r < p - 1 such that F^{(r)}(x*) ≠ 0, or that F^{(p)}(x*) = 0. In both cases, from (5.1.3) and (5.1.4), respectively, it follows that
ord(F) ≠ p.
Theorem 5.1.2. Let x* be a solution of the equation (5.0.1), x_0 ∈ Ω,
I = \{ x ∈ Ω : |x - x*| ≤ |x_0 - x*| \},
F ∈ C^p(I), p the order of F, and
\frac{1}{p!} |F^{(p)}(x)| ≤ M, \quad x ∈ I.
If
M |x0 − x∗|p−1 < 1 (5.1.5)
then
1) x_i ∈ I, i = 0, 1, ...,
2) \lim_{i → ∞} x_i = x*,
where
x_{i+1} = F(x_i), \quad i = 0, 1, ....
Proof. 1) Obviously, x0 ∈ I. Assume that xi ∈ I. Since,
ord(F ) = p
it follows that (5.1.4) holds, and we can write it as
x_{i+1} - x* = M_i (x_i - x*)^p, \quad M_i = \frac{1}{p!} F^{(p)}(ξ_i).   (5.1.6)
Using (5.1.6), the definition of I and the inequality M_i ≤ M on the interval I, it yields
|x_{i+1} - x*| ≤ M |x_i - x*|^p ≤ M |x_0 - x*|^p = M |x_0 - x*|^{p-1} |x_0 - x*| < |x_0 - x*|,
which implies x_{i+1} ∈ I and, by complete induction, it follows that
x_i ∈ I, \quad i = 0, 1, ....
2) One considers again the inequality
|x_{i+1} - x*| ≤ M |x_i - x*|^p, \quad i = 0, 1, ...,
which implies successively
|x_i - x*| ≤ M |x_{i-1} - x*|^p ≤ M M^p |x_{i-2} - x*|^{p^2} ≤ \cdots ≤ M^{1+p+\cdots+p^{i-1}} |x_0 - x*|^{p^i}.
Hence,
|x_i - x*| ≤ M^{-1/(p-1)} (M |x_0 - x*|^{p-1})^{p^i/(p-1)}, \quad \text{for} \ p > 1,
|x_i - x*| ≤ M^i |x_0 - x*|, \quad \text{for} \ p = 1,   (5.1.7)
and, taking into account (5.1.5), it follows that the sequence (x_i)_{i∈N} converges to x*.
Remark 5.1.3. Inequalities (5.1.7) also provide upper bounds for the absolute error in approximating x* by x_i.
Remark 5.1.4. If p = 1, the convergence condition is M < 1, i.e., |F'(x)| < 1, x ∈ I.
Remark 5.1.5. If p > 1 there always exists a neighborhood of x* where the F-method converges.
Next we present some construction procedures for F -methods.
5.1.1 Successive approximations method
One of the easiest ways to construct an F-method consists in writing the equation (5.0.1) in the equivalent form
x = F(x), \quad \text{with} \ F(x) = x + f(x),
and applying the successive approximations method for approximating the fixed points of F, namely,
x_{n+1} = F(x_n), \quad n = 0, 1, ...,
where x_0 is given. From (5.0.1) it follows that f(x*) = 0 if and only if F(x*) = x*, whence any root of f is a fixed point of F, and vice versa. This method is based upon the following result.
Theorem 5.1.6. (Picard-Banach). Let (X, d) be a complete metric spaceand let F : X → X be a contraction (i.e., it satisfies Lipschitz condition witha constant α < 1). Then, the following statements hold:
i) The mapping F has a unique fixed point x∗.
ii) If x0 ∈ X, then the sequence xn+1 = F (xn), n = 0, 1, ... converges tox∗.
iii) One has
d(x_n, x*) ≤ \frac{α^n}{1-α} d(x_0, x_1).
Proof. From
d(x_n, x_{n+1}) = d(F(x_{n-1}), F(x_n)) ≤ α d(x_{n-1}, x_n) ≤ \cdots ≤ α^n d(x_0, x_1)
it follows that
d(x_n, x_{n+p}) ≤ d(x_n, x_{n+1}) + \cdots + d(x_{n+p-1}, x_{n+p})   (5.1.8)
≤ (α^n + \cdots + α^{n+p-1}) d(x_0, x_1)
≤ (α^n + \cdots + α^{n+p-1} + \cdots) d(x_0, x_1)
= \frac{α^n}{1-α} d(x_0, x_1).
Hence, the successive approximations sequence is a Cauchy sequence. Since X is a complete metric space, it follows that this sequence is convergent. Denote its limit by x*. The mapping F satisfies a Lipschitz condition, so it is continuous. Passing to the limit in the equality
x_{n+1} = F(x_n),
it yields
x* = F(x*),
thus x* is a fixed point of F.
We prove now the uniqueness of the fixed point. Assume the contrary, namely that x*_1 and x*_2 are two distinct fixed points of F. One has
0 < d(x*_1, x*_2) = d(F(x*_1), F(x*_2)) ≤ α d(x*_1, x*_2),
which yields the contradiction α ≥ 1.
Finally, passing to the limit in (5.1.8) with respect to p, we obtain
d(x_n, x*) ≤ \frac{α^n}{1-α} d(x_0, x_1).
Remark 5.1.7. The Picard-Banach theorem provides, besides the existence and uniqueness conditions, an effective method for constructing the successive approximations sequence, as well as an upper bound for the error in approximating the fixed point by the approximation of order n. Since the condition
M := \sup_{x ∈ X} |F'(x)| < 1
implies α < 1, one can use in practical applications the inequality M < 1 as a convergence test.
A more practical upper bound for the approximation error is given in thenext result.
Lemma 5.1.8. Let x∗ be a fixed point of F and let (xn)n∈N be the successiveapproximations sequence generated by F, for a given x0, which converges tox∗. If
|F'(x)| < α, \quad x ∈ V(x*),
then the following inequality holds:
|x_n - x*| ≤ \frac{α}{1-α} |x_n - x_{n-1}|.   (5.1.9)
Proof. Denoting f(x) = x− F (x), one has
f ′(x) = 1− F ′(x).
Conditions in the hypothesis imply
|f ′(x)| = |1− F ′(x)| ≥ 1− α, x ∈ V (x∗).
Taking into account that f(x*) = 0, we obtain
|x_n - F(x_n)| = |f(x_n) - f(x*)| = |f'(ξ)| |x_n - x*| ≥ (1 - α) |x_n - x*|,
where ξ belongs to the interval determined by x_n and x*. We have
|x_n - x*| ≤ \frac{1}{1-α} |x_n - F(x_n)| = \frac{1}{1-α} |x_{n+1} - x_n|.
Using again the Lagrange formula, this yields
|x_{n+1} - x_n| = |F(x_n) - F(x_{n-1})| ≤ α |x_n - x_{n-1}|,
whence
|x_n - x*| ≤ \frac{α}{1-α} |x_n - x_{n-1}|.
Remark 5.1.9. Denoting by ε the maximal admissible absolute error, i.e., |x - x*| ≤ ε, from (5.1.9) we obtain
|x_n - x_{n-1}| ≤ \frac{1-α}{α} ε,
which can be used as a stopping criterion for the computation.
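A minimal sketch of the successive approximations method with the stopping test of Remark 5.1.9; the names and the sample mapping F(x) = cos x are illustrative assumptions, not from the text:

```python
import math

def fixed_point(F, x0, alpha, eps=1e-10, max_iter=1000):
    """Iterate x_{n+1} = F(x_n); stop when |x_n - x_{n-1}| <= (1-alpha)/alpha * eps,
    which by (5.1.9) guarantees |x_n - x*| <= eps."""
    x = x0
    for _ in range(max_iter):
        x_new = F(x)
        if abs(x_new - x) <= (1.0 - alpha) / alpha * eps:
            return x_new
        x = x_new
    return x

# F(x) = cos x is a contraction near its fixed point: |F'(x)| = |sin x| < 0.85 there
root = fixed_point(math.cos, 1.0, alpha=0.85)
```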
5.1.2 Inverse interpolation method
Let x* ∈ Ω be a solution of equation (5.0.1) and V(x*) a neighborhood of x*. Assume that f has an inverse on V(x*) and denote g = f^{-1}. Since f(x*) = 0, it follows that x* = g(0).
The inverse interpolation method for approximating a solution x∗ of equa-tion (5.0.1) consists in approximating the inverse g by means of a certain in-terpolation method, and x∗ by the value of the interpolating element at point
zero. This approach generates a large number of approximation methods forthe solution of an equation (thus, for the zeros of a function), according tothe employed interpolation method.
In order to obtain a one-step F-method, all information on f must be given at a single point, namely the starting value. Hence, we are led to Taylor interpolation.
Theorem 5.1.10. Let x* be a solution of equation (5.0.1), V(x*) a neighborhood of x*, f ∈ C^m[V(x*)], such that
f'(x) ≠ 0, \quad \text{for} \ x ∈ V(x*),
and consider x_i ∈ V(x*). Then we have the following method for approximating x*, denoted by F^T_m:
F^T_m(x_i) = x_i + \sum_{k=1}^{m-1} \frac{(-1)^k}{k!} f_i^k g^{(k)}(f_i),   (5.1.10)
where g is the inverse of f and f_i = f(x_i).
Proof. From the hypothesis of this theorem it follows that there exists g = f^{-1} and that g ∈ C^m[V(0)], where V(0) = f(V(x*)).
Let y_i = f(x_i) and consider the Taylor interpolation formula
g(y) = (T_{m-1}g)(y) + (R_{m-1}g)(y),   (5.1.11)
relative to the function g and the node y_i, where T_{m-1}g is the (m-1)th degree Taylor polynomial, namely,
(T_{m-1}g)(y) = \sum_{k=0}^{m-1} \frac{1}{k!} (y - y_i)^k g^{(k)}(y_i),
and R_{m-1}g is the corresponding remainder. Since x* = g(0) and g ≈ T_{m-1}g, it follows that
x* ≈ (T_{m-1}g)(0) = x_i + \sum_{k=1}^{m-1} \frac{(-1)^k}{k!} y_i^k g^{(k)}(y_i),
whence
x_{i+1} := F^T_m(x_i) = x_i + \sum_{k=1}^{m-1} \frac{(-1)^k}{k!} f_i^k g^{(k)}(f_i)
is an approximation of x*, and F^T_m is an approximation method for the solution x*.
Concerning the order of the method F^T_m we state the following result.
Theorem 5.1.11. Let x* be a solution of equation (5.0.1), f ∈ C^m[V(x*)] and f'(x*) ≠ 0. If g = f^{-1} satisfies the condition
g^{(m)}(0) ≠ 0,
then the method F^T_m is of order m.
Proof. Consider the remainder of the Taylor interpolation formula written in Lagrange form, namely,
(R_{m-1}g)(y) = \frac{1}{m!} (y - y_i)^m g^{(m)}(y_i + θ(y - y_i)), \quad 0 < θ < 1.
Since (R_{m-1}g)(0) is the error of approximating x* by (T_{m-1}g)(0), one has
x* - F^T_m(x_i) = \frac{(-1)^m}{m!} f_i^m g^{(m)}(y_i(1 - θ)).   (5.1.12)
Taking into account that f(x*) = 0, it follows that
f(x_i) = f(x_i) - f(x*) = (x_i - x*) f'(ξ_i),   (5.1.13)
where ξ_i belongs to the interval determined by x_i and x*. Then (5.1.13) implies
x* - F^T_m(x_i) = \frac{(-1)^m}{m!} (x_i - x*)^m [f'(ξ_i)]^m g^{(m)}(y_i(1 - θ)),
and thus,
\frac{x* - F^T_m(x_i)}{(x* - x_i)^m} = \frac{1}{m!} [f'(ξ_i)]^m g^{(m)}(y_i(1 - θ)).
Since g^{(m)}(0) ≠ 0 and f'(x*) ≠ 0, we obtain
\lim_{x_i → x*} \frac{x* - F^T_m(x_i)}{(x* - x_i)^m} = \frac{1}{m!} [f'(x*)]^m g^{(m)}(0) ≠ 0,
whence
ord(F^T_m) = m.
Remark 5.1.12. From (5.1.12) we get
|x* - F^T_m(x_i)| ≤ \frac{1}{m!} |f_i|^m M_m g, \quad \text{with} \ M_m = \sup_{y ∈ V(0)} |g^{(m)}(y)|,
which provides an upper bound for the absolute error in approximating x* by x_{i+1}.
Next we present two particular cases.
1) The Newton-Raphson method. This method is obtained for m = 2, namely
F^T_2(x_i) = x_i - \frac{f(x_i)}{f'(x_i)}.
Thus, the Newton-Raphson method (also called Newton's method or the tangent method) is of order p = 2. According to Remark 5.1.5, there always exists a neighborhood of x* where F^T_2 is convergent. Choosing the starting value x_0 in such a neighborhood allows approximating the solution x* by terms of the sequence generated by F^T_2, namely
x_{i+1} = F^T_2(x_i), \quad i = 0, 1, ...,   (5.1.14)
with a prescribed error. As far as the approximation error is concerned, Remark 5.1.12 gives
|x* - F^T_2(x_n)| ≤ \frac{1}{2} [f(x_n)]^2 M_2 g.
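The iteration (5.1.14) can be sketched as follows; the function names, tolerance, and the sample equation x^2 - 2 = 0 are illustrative assumptions, not from the text:

```python
def newton(f, fprime, x0, eps=1e-12, max_iter=100):
    """Sketch of the Newton-Raphson iteration x_{i+1} = x_i - f(x_i)/f'(x_i)."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) <= eps:
            return x_new
        x = x_new
    return x

# approximate sqrt(2) as the positive root of f(x) = x^2 - 2
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=2.0)
```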
Lemma 5.1.13. Let x* ∈ (a, b) be a solution of equation (5.0.1) and let x_n = F^T_2(x_{n-1}). Then
|x* - x_n| ≤ \frac{1}{m_1} |f(x_n)|, \quad \text{with} \ 0 < m_1 ≤ \inf_{a ≤ x ≤ b} |f'(x)|.
Proof. We use the mean value formula
f(x*) - f(x_n) = f'(ξ)(x* - x_n),
with ξ belonging to the interval determined by x* and x_n. From
f(x*) = 0
and |f'(x)| ≥ m_1, for x ∈ (a, b), it follows that
|f(x_n)| ≥ m_1 |x* - x_n|,
i.e.,
|x* - x_n| ≤ \frac{1}{m_1} |f(x_n)|.
In practical applications the next estimate is more useful.
Lemma 5.1.14. Let x* ∈ (a, b) be a solution of equation (5.0.1) and let x_n = F^T_2(x_{n-1}) be the nth order approximation generated by F^T_2. If f ∈ C^2[a, b] and F^T_2 is convergent, then there exists n_0 ∈ N such that
|x_n - x*| ≤ |x_n - x_{n-1}|, \quad n > n_0.
Proof. We start with Taylor's formula:
f(x_n) = f(x_{n-1}) + (x_n - x_{n-1}) f'(x_{n-1}) + \frac{1}{2} (x_n - x_{n-1})^2 f''(ξ),
where ξ belongs to the interval determined by x_{n-1} and x_n. Since
x_n = F^T_2(x_{n-1}),
it follows that
f(x_{n-1}) + (x_n - x_{n-1}) f'(x_{n-1}) = 0.
Thus, we obtain
f(x_n) = \frac{1}{2} (x_n - x_{n-1})^2 f''(ξ).
Consequently,
|f(x_n)| ≤ \frac{1}{2} M_{2f} (x_n - x_{n-1})^2,
and using Lemma 5.1.13 yields
|x* - x_n| ≤ \frac{1}{2 m_1} (x_n - x_{n-1})^2 M_{2f}.
Since F^T_2 is convergent, there exists n_0 ∈ N such that
\frac{1}{2 m_1} |x_n - x_{n-1}| M_{2f} < 1, \quad n > n_0,
whence
|x* - x_n| ≤ |x_n - x_{n-1}|, \quad n > n_0.
In general, choosing the starting value is a non-trivial problem. Often this value is chosen randomly, though taking into account the available information about the function f. If after a fixed number of iterations the required precision has not been achieved, i.e., the condition |x_n - x_{n-1}| ≤ ε does not hold for a prescribed positive ε, the computation has to be started over with a new starting value.
For certain classes of functions the problem of choosing the starting value can be significantly simplified. For example, if the solution x* is isolated in the interval (a, b) and f''(x) ≠ 0, x ∈ (a, b), then one can choose as starting value in the F^T_2-method one of the endpoints a or b. More precisely,
x_0 = a, \quad \text{if} \ f(a) f''(c) > 0 \ \text{or} \ f(b) f''(c) < 0,
x_0 = b, \quad \text{if} \ f(a) f''(c) < 0 \ \text{or} \ f(b) f''(c) > 0,
where c ∈ (a, b) (for example, c = (a + b)/2).
This statement can be intuitively justified using the geometric interpretation of the method. Consider, for example, the case f(a) < 0, f(b) > 0 and f''(x) < 0, for x ∈ (a, b). It follows immediately that x_0 = a. Indeed,
f(a) f''\big(\frac{a+b}{2}\big) > 0.
Next we consider the second particular case.
2) Case m = 3. Using (5.1.10), we obtain
F^T_3(x_i) = x_i - \frac{f(x_i)}{f'(x_i)} - \frac{1}{2} \Big[ \frac{f(x_i)}{f'(x_i)} \Big]^2 \frac{f''(x_i)}{f'(x_i)},   (5.1.15)
which is a method of order p = 3 and thus converges faster than F^T_2.
Remark 5.1.15. The higher the order of a method, the faster it converges. Still, this does not mean that a higher order approximation method is more efficient (taking into account the computational requirements) than a lower order one. On the contrary, the results in the literature show that methods of relatively low order are the most efficient ones, due to their simplicity. Within the class of one-step methods, the most efficient methods (in the above sense) are considered to be F^T_2 and F^T_3.
5.2 Multistep methods
Recall that if the set of the starting values consists of more than one element,the corresponding F -method is called multistep method.
In the sequel we will also use the inverse interpolation method to generate multistep methods. Obviously, in this case an interpolation method with more than one interpolation node has to be used.
Let x∗ ∈ Ω be a solution of equation (5.0.1), let (a, b) ⊂ Ω be a neighbor-hood of x∗ that isolates this solution and x0, ..., xn ∈ (a, b) some given values.Denote by g the inverse function of f, supposing it exists. Because x∗ = g(0),the problem reduces to approximating the inverse function g by means of aninterpolation method with n > 1 nodes, for example Lagrange, Hermite,Birkhoff, spline, etc. We discuss first the use of Lagrange interpolation. Let
yk = f(xk), k = 0, ..., n,
whence x_k = g(y_k).
To the data yk and g(yk), k = 0, ..., n, we attach the Lagrange interpolationformula:
g = Lng +Rng, (5.2.1)
where
(L_n g)(y) = \sum_{k=0}^{n} \frac{(y - y_0) \cdots (y - y_{k-1})(y - y_{k+1}) \cdots (y - y_n)}{(y_k - y_0) \cdots (y_k - y_{k-1})(y_k - y_{k+1}) \cdots (y_k - y_n)} g(y_k).
If g ∈ C^{n+1}[c, d], where [c, d] is the image of the interval [a, b] through the function f, then
(R_n g)(y) = \frac{u(y)}{(n+1)!} g^{(n+1)}(η), \quad c < η < d,   (5.2.2)
where u(y) = (y - y_0) \cdots (y - y_n).
Taking
F^L_n(x_0, ..., x_n) = (L_n g)(0),
it follows that F^L_n is an (n+1)-step method defined by
F^L_n(x_0, ..., x_n) = \sum_{k=0}^{n} \frac{f_0 \cdots f_{k-1} f_{k+1} \cdots f_n}{(f_0 - f_k) \cdots (f_{k-1} - f_k)(f_{k+1} - f_k) \cdots (f_n - f_k)} x_k,   (5.2.3)
with f_k = f(x_k).
Concerning the convergence of this method we state the following result.
Theorem 5.2.1. If x* ∈ (a, b) is a solution of the equation (5.0.1), f' is bounded on (a, b), and the starting values satisfy
|x* - x_k| < \frac{1}{c}, \quad k = 0, ..., n,
with the constant c given in the proof, then the sequence
x_{i+1} = F^L_n(x_{i-n}, ..., x_i), \quad i = n, n+1, ...
converges to x*.
Proof. From formulas (5.2.1)-(5.2.2) we get
x* - x_{n+1} = (R_n g)(0)
and
x* - x_{n+1} = (-1)^{n+1} c_g \prod_{k=0}^{n} f_k,
where
c_g = \frac{1}{(n+1)!} g^{(n+1)}(η).
Using the mean value formula, we have
f(x_k) = f(x_k) - f(x*) = (x_k - x*) f'(ξ_k),
whence
x* - x_{n+1} = c_g \prod_{k=0}^{n} (x* - x_k) f'(ξ_k).
Since f' is bounded, it follows that there exists a constant c such that
c |x* - x_{n+1}| ≤ \prod_{k=0}^{n} c |x* - x_k|.   (5.2.4)
Taking into account that
|x* - x_k| < \frac{1}{c},
we obtain
c |x* - x_{n+1}| < 1.
Denoting M = max_{0≤k≤n} (c |x∗ − x_k|), it follows that there exist numbers a_k > 0 such that

c |x∗ − x_k| = M^{a_k},

for k = 0, ..., n. Then, from (5.2.4) we get

M^{a_{n+1}} ≤ M^{a_0 + ... + a_n},

thus, since 0 < M < 1,

a_{n+1} ≥ a_0 + ... + a_n.
Considering as starting values x_{i−n}, ..., x_i and applying the above arguments for i = n + 1, n + 2, ..., we obtain

a_{n+i} ≥ a_{i−1} + ... + a_{n+i−1},  i = 1, 2, . . . . (5.2.5)

Since a_i > 0, it follows that the sequence a_0, ..., a_n, ... is monotonically increasing and

lim_{i→∞} a_i = ∞.

Indeed, if

lim_{i→∞} a_i = α < ∞,

inequalities (5.2.5) lead to the contradiction α ≥ (n + 1)α, for n ≥ 1. Consequently, condition 0 < M < 1 implies

lim_{i→∞} c |x∗ − x_i| = lim_{i→∞} M^{a_i} = 0,

thus,

lim_{i→∞} x_i = x∗, (c ≠ 0).
Remark 5.2.2. For n = 1, we get

F_1^L(x_0, x_1) = x_1 − (x_1 − x_0) f(x_1) / [f(x_1) − f(x_0)],

which is called the secant method. Thus,

x_{k+1} := F_1^L(x_{k−1}, x_k) = x_k − (x_k − x_{k−1}) f(x_k) / [f(x_k) − f(x_{k−1})],  k = 1, 2, ...

is the new approximation obtained using the previous approximations x_{k−1}, x_k.
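As a quick illustration, the secant iteration of Remark 5.2.2 can be sketched as follows (the test function and starting values are illustrative choices, not taken from the text):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Iterate x_{k+1} = x_k - (x_k - x_{k-1}) f(x_k) / (f(x_k) - f(x_{k-1}))."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:            # the division in F_1^L would fail; stop here
            break
        x0, x1 = x1, x1 - (x1 - x0) * f1 / (f1 - f0)
        if abs(x1 - x0) < tol:
            break
    return x1

# Example: the positive root of x^2 - 2, i.e. sqrt(2)
root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
```

Each step uses only the two most recent approximations, in line with the two-step character of F_1^L.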
Practical F-methods for approximating the zeros of a function can be obtained using inverse Hermite or Birkhoff interpolation. Assume x∗ is a solution of equation (5.0.1), V(x∗) is a neighborhood of x∗ such that g = f^{−1} exists on V(x∗), and x_0, x_1 ∈ V(x∗).

Consider the Birkhoff-type interpolation formula

g(y) = (B_1 g)(y) + (R_1 g)(y),

where

(B_1 g)(y) = (y − y_1) g′(y_0) + g(y_1),

with y_0 = f(x_0), y_1 = f(x_1), and for y_0 < y_1

(R_1 g)(y) = (1/2)(y − y_1)(y + y_1 − 2y_0) g″(η), (5.2.6)

with η ∈ [y_0, y_1], for every y ∈ [y_0, y_1]. Taking F_1^B(x_0, x_1) = (B_1 g)(0), we obtain the F_1^B-method, defined by

F_1^B(x_0, x_1) = x_1 − f(x_1)/f′(x_0).
Therefore we can state the following result.
Theorem 5.2.3. If the following conditions hold:

i) f(x_0) < 0 < f(x_1);

ii) f′(x) exists, is finite, and f′(x) > 0, for x ∈ [x_0, x_1];

iii) f″(x) exists, is finite, and f″(x) ≤ 0, for x ∈ (x_0, x_1);

then

1) equation (5.0.1) has a unique solution x∗ and x_0 < x∗ < x_1;

2) the sequence (x_n)_{n∈N}, defined by the F_1^B method, namely,

x_{n+1} = F_1^B(x_0, x_n),  n = 1, 2, ... (5.2.7)

converges to x∗. Moreover, we have

x∗ ≤ x_{n+1} ≤ x_n,  n = 1, 2, ... (5.2.8)
Proof. Existence and uniqueness follow immediately from conditions i) and ii). Using complete induction we first prove the inequalities (5.2.8). Indeed, we have x∗ < x_1. Assume x∗ < x_n. Using the remainder expression (5.2.6), we obtain

x∗ − x_{n+1} = (R_1 g)(0) = (1/2) f(x_n)[f(x_n) − 2f(x_0)] · f″(ξ_n) / [f′(ξ_n)]^3.
From the inequalities
f(xn) ≥ 0, (xn ≥ x∗), f(x0) < 0, f ′(x) > 0, f ′′(x) ≤ 0,
for x ∈ (x0, x1), it follows
x∗ − xn+1 ≤ 0,
so
x∗ ≤ xn+1,
and thus the first inequality in (5.2.8) is proved. From (5.2.7), we get
x_{n+1} − x_n = −f(x_n)/f′(x_0) ≤ 0,
whence x_{n+1} ≤ x_n, and (5.2.8) is completely verified. Consequently, (x_n)_{n∈N} is a decreasing and bounded sequence, thus it is convergent. We denote by x̄ its limit. Then, (5.2.7) implies

lim_{n→∞} f(x_n)/f′(x_0) = lim_{n→∞} (x_{n+1} − x_n) = 0,

so

lim_{n→∞} f(x_n) = f(x̄) = 0,

and

x_0 < x̄ < x_1.

Since x∗ is unique, it follows x̄ = x∗, thus, the theorem is completely proved.
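A minimal sketch of the F_1^B iteration (5.2.7), with the derivative frozen at x_0. The test function f(x) = x − e^{−x} is an illustrative choice (not from the text) that satisfies conditions i)–iii) on [0, 1]: f(0) < 0 < f(1), f′ > 0, f″ ≤ 0.

```python
import math

def fb1(f, dfx0, x1, tol=1e-12, max_iter=200):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_0); dfx0 = f'(x_0) is fixed."""
    xn = x1
    for _ in range(max_iter):
        step = f(xn) / dfx0
        xn -= step
        if abs(step) < tol:
            break
    return xn

f = lambda x: x - math.exp(-x)     # root near 0.567143
dfx0 = 1 + math.exp(-0.0)          # f'(x0) with x0 = 0
root = fb1(f, dfx0, 1.0)
```

Consistent with (5.2.8), the iterates decrease monotonically toward the root from x_1.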
In the sequel we present the F_2^H-method obtained by using inverse Hermite interpolation relative to the function g = f^{−1}, the simple node y_0 = f(x_0) and the double node y_1 = f(x_1), namely,

g(y) = (H_2 g)(y) + (R_2 g)(y),
where

(H_2 g)(y) = [(y − y_1)^2 / (y_0 − y_1)^2] g(y_0) + [(y − y_0)(2y_1 − y_0 − y) / (y_0 − y_1)^2] g(y_1) − [(y − y_0)(y − y_1) / (y_0 − y_1)] g′(y_1)

and

(R_2 g)(y) = [(y − y_0)(y − y_1)^2 / 6] g′′′(η),  y_0 ≤ η ≤ y_1.

We obtain

F_2^H(x_0, x_1) = x_1 − [f(x_1) / (f(x_0) − f(x_1))]^2 (x_1 − x_0) − [f(x_1) / (f(x_0) − f(x_1))] · f(x_0)/f′(x_1) (5.2.9)

and

x∗ − F_2^H(x_0, x_1) = −[f(x_0) f^2(x_1) / 6] g′′′(η_0). (5.2.10)
Theorem 5.2.4. If f satisfies the requirements of Theorem 5.2.3, g′′′(y) ≥ 0, for y ∈ [y_0, y_1], and (x_n)_{n∈N} is the sequence generated by F_2^H, i.e.,

x_{n+1} = F_2^H(x_{n−1}, x_n),  n = 1, 2, ...

then

(x∗ − x_n)(x∗ − x_{n+1}) ≤ 0,  n = 1, 2, ...

and

lim_{n→∞} x_n = x∗.
Proof. From equalities (5.2.9), (5.2.10) and the hypothesis of the theorem it follows x_2 < x_1 and x_2 ≤ x∗. Since f(x_2) < 0 (f(x_2) = 0 implies x_2 = x∗) and f(x_1) > 0, we also get x_3 > x_2 and x∗ ≤ x_3. Assume now that x_{n−1} ≤ x∗ < x_n. Then f(x_{n−1}) ≤ 0 and f(x_n) ≥ 0, thus x_{n+1} ≤ x_n and x_{n+1} ≤ x∗, so the first claim is fulfilled.
In order to prove that lim_{n→∞} x_n = x∗, notice first from the construction of F_2^H that x_n ∈ (x_0, x_1), n = 2, ..., i.e., (x_n)_{n∈N} is a bounded sequence, hence it contains a convergent subsequence (x_{n_k})_{k∈N}. Let x̄ = lim_{k→∞} x_{n_k}. Then

lim_{k→∞} (x_{n_k+1} − x_{n_k}) = 0,

and (5.2.9) implies

lim_{k→∞} f(x_{n_k}) = 0 = f(x̄),

where x̄ ∈ (x_0, x_1). Finally, from the uniqueness of the solution x∗ we get x̄ = x∗.
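The F_2^H step (5.2.9) can be sketched as below; the test function is the same illustrative f(x) = x − e^{−x} used before (an assumption, not an example from the text), with f′ supplied analytically:

```python
import math

def fh2_step(f, df, x0, x1):
    """One step of formula (5.2.9)."""
    f0, f1 = f(x0), f(x1)
    if f0 == f1:                     # guard: the quotient is undefined
        return x1
    a = f1 / (f0 - f1)
    return x1 - a * a * (x1 - x0) - a * f0 / df(x1)

def fh2(f, df, x0, x1, tol=1e-12, max_iter=50):
    """Iterate x_{n+1} = F_2^H(x_{n-1}, x_n)."""
    for _ in range(max_iter):
        x0, x1 = x1, fh2_step(f, df, x0, x1)
        if abs(x1 - x0) < tol:
            break
    return x1

f = lambda x: x - math.exp(-x)
df = lambda x: 1 + math.exp(-x)
root = fh2(f, df, 0.0, 1.0)
```

As Theorem 5.2.4 suggests, consecutive iterates straddle the root, closing in from both sides.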
5.3 Combined methods
In applications one needs to approximate the solution of an equation with a prescribed error. Usually, this error is evaluated by means of the distance between two consecutive approximations, say |x_n − x_{n−1}|. Since the inequality |x_n − x_{n−1}| ≤ ε does not always imply |x_n − x∗| ≤ ε, where ε ∈ R_+ is given, the so-called combined methods turn out to be very useful: the solution x∗ is approximated both from below and from above, so it always lies between two consecutive approximations, namely

x_n^s ≤ x∗ ≤ x_n^d,  n = 1, 2, ...

In such cases, the error in approximating x∗ by (x_n^s + x_n^d)/2 is not bigger than (x_n^d − x_n^s)/2. In other words, the approximation error can be strictly controlled. In the sequel we describe some simple combined methods.
1) The tangent-secant method

If the function f is convex or concave on an interval (a, b) that isolates the solution x∗ of equation (5.0.1), i.e., f″(x) ≠ 0, x ∈ (a, b), then both successive approximation sequences generated by the tangent method and the secant method are monotone. Namely, if the first sequence is increasing, the second one is decreasing, and vice versa. The endpoints a and b of the interval can be respectively chosen as starting values. Thus, in order to approximate x∗ ∈ (a, b) we can apply the tangent and the secant methods simultaneously. Denoting by x_k^t and x_k^s, respectively, the corresponding approximations of order k, we obtain:
(x_0^t, x_0^s) ⊃ (x_1^t, x_1^s) ⊃ ... ⊃ (x_n^t, x_n^s) ⊃ ..., (5.3.1)

or

(x_0^s, x_0^t) ⊃ (x_1^s, x_1^t) ⊃ ... ⊃ (x_n^s, x_n^t) ⊃ ..., (5.3.2)

where

x_{k+1}^t = x_k^t − f(x_k^t)/f′(x_k^t),  k = 0, 1, ...,

x_{k+1}^s = x_k^s − (x_k^s − x_{k+1}^t) f(x_k^s) / [f(x_k^s) − f(x_{k+1}^t)],  k = 0, 1, ...,

depending on whether f(a) f″((a+b)/2) > 0 or f(a) f″((a+b)/2) < 0. In the first case we have x_0^t = a, x_0^s = b, and x∗ ∈ (x_n^t, x_n^s), while in the second one x_0^s = a, x_0^t = b and x∗ ∈ (x_n^s, x_n^t). Successive approximations will be computed until

|x_n^t − x_n^s| ≤ 2ε,

where ε is a prescribed positive number. When the above inequality is satisfied, we take

x∗ ≈ x̄ = (x_n^t + x_n^s)/2,

and so,

|x∗ − x̄| ≤ ε.
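A sketch of the combined sweep: one tangent (Newton) step and one secant step per pass, with the convention (an assumption matching the cases above) that xt starts at the endpoint where f · f″ > 0. The test problem x² − 2 = 0 on (1, 2) is illustrative; there that endpoint is b = 2.

```python
def tangent_secant(f, df, xt, xs, eps=1e-10, max_iter=100):
    """One tangent step and one secant step per sweep; the root stays
    bracketed between xt and xs, so (xt + xs)/2 has error at most eps."""
    for _ in range(max_iter):
        xt = xt - f(xt) / df(xt)                         # tangent iterate
        xs = xs - (xs - xt) * f(xs) / (f(xs) - f(xt))    # secant toward new xt
        if abs(xt - xs) <= 2 * eps:
            break
    return (xt + xs) / 2

approx = tangent_secant(lambda x: x * x - 2, lambda x: 2 * x, 2.0, 1.0)
```

Note that the secant step uses the freshly computed x_{k+1}^t, exactly as in the displayed formula.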
2) The tangent−F_1^B method
This method is based on the following result.
Lemma 5.3.1. If the requirements of Theorem 5.2.3 are fulfilled, namely, if:

i) f(x_0) < 0 < f(x_1);

ii) f′(x) exists, is finite, and f′(x) > 0, for x ∈ [x_0, x_1];

iii) f″(x) exists, is finite, and f″(x) ≤ 0, for x ∈ (x_0, x_1);

then the sequence

x_0, x_1, ..., x_n, ...

generated by the iterations

x_{2n} = x_{2n−2} − f(x_{2n−2})/f′(x_{2n−2}) (5.3.3)

and

x_{2n+1} = x_{2n−1} − f(x_{2n−1})/f′(x_{2n−2}), (5.3.4)

n = 1, 2, ..., converges to x∗. Moreover, x∗ ∈ [x_{2n}, x_{2n+1}], for all n ∈ N.
Proof. Notice that the iteration in (5.3.3) is the one of Newton's method (i.e., the tangent method), and the inequalities

x_{2n} ≤ x_{2n+2} ≤ x∗,  n = 0, 1, ...,

are satisfied under the given hypotheses.
Furthermore, from Theorem 5.2.3 we get

x∗ ≤ x_{2n+1} ≤ x_{2n−1},  n = 1, 2, . . . .

This method has the advantage that both components use the value of the derivative f′ at the same point.
Remark 5.3.2. A much simpler combined method can be obtained using (5.3.3) and (5.3.4), for a fixed x_0, namely

x_{2n} = x_{2n−2} − f(x_{2n−2})/f′(x_0),

x_{2n+1} = x_{2n−1} − f(x_{2n−1})/f′(x_0),  n = 1, 2, ...,

where only the value of f′ at x_0 is needed. This method is recommended whenever computing a value of the derivative involves a lot of work. Other combined methods can be obtained using F_1^B, F_2^H, F_3^T, the tangent and the secant methods.
Chapter 6
Non-linear equations on Rn
Let D ⊆ R^n, f_i : D → R, i = 1, ..., n, and consider the system:

f_i(x_1, ..., x_n) = 0,  i = 1, ..., n;  (x_1, ..., x_n) ∈ D. (6.0.1)

Taking the map f : D → R^n such that

(x_1, ..., x_n) ↦ (f_1(x_1, ..., x_n), ..., f_n(x_1, ..., x_n)),

system (6.0.1) can also be written as the vector equation

f(x) = 0,  x ∈ D, (6.0.2)

where 0 is the null element of the space R^n. In the sequel we describe two methods for approximating the solutions of a non-linear system on R^n.
6.1 Successive approximation method
Consider first the system (6.0.1) written in the form
xi = ϕi(x1, ..., xn), i = 1, ..., n; (x1, ..., xn) ∈ D, (6.1.1)
where the functions ϕi : D → R are continuous on D and such that for all(x1, ..., xn) ∈ D one has
(ϕ1(x1, ..., xn), ..., ϕn(x1, ..., xn)) ∈ D.
Let x = (x_1, ..., x_n) and ϕ = (ϕ_1, ..., ϕ_n); then the system can be written as

x = ϕ(x),  x ∈ D. (6.1.2)

As in the real case, for a given starting value x^(0), one generates the sequence

x^(0), x^(1), ..., x^(m), ..., (6.1.3)

defined by

x^(m+1) = ϕ(x^(m)),  m = 0, 1, . . . . (6.1.4)

If the sequence (6.1.3) is convergent (the ϕ method is convergent), and x∗ ∈ D denotes its limit, then x∗ is a solution of equation (6.1.2). Indeed, passing to the limit in (6.1.4) and taking into account that ϕ is a continuous function, we obtain

lim_{m→∞} x^(m+1) = ϕ(lim_{m→∞} x^(m)),

that is

x∗ = ϕ(x∗).
The convergence of the ϕ method remains to be studied. To that end we can use the Picard-Banach theorem: if ϕ : R^n → R^n satisfies the contraction condition

‖ϕ(x) − ϕ(y)‖ ≤ α ‖x − y‖,  x, y ∈ R^n,

with 0 < α < 1, then there exists a unique element x∗ ∈ R^n which verifies equation (6.1.2) and it is the limit of the sequence (6.1.3). The error in approximating x∗ by x^(n) is evaluated by

‖x∗ − x^(n)‖ ≤ [α^n / (1 − α)] ‖x^(1) − x^(0)‖.
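A sketch of the successive approximation method on R^2. The contraction ϕ below is an illustrative choice, not an example from the text; its Jacobian norm is at most 1/2, so α ≤ 1/2.

```python
import numpy as np

def successive_approx(phi, x0, tol=1e-12, max_iter=1000):
    """Generate x^(m+1) = phi(x^(m)) until consecutive iterates agree."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = phi(x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Illustrative contraction on R^2 with alpha <= 1/2
phi = lambda v: np.array([0.5 * np.cos(v[1]), 0.5 * np.sin(v[0])])
xstar = successive_approx(phi, [0.0, 0.0])
```

With α = 1/2, the a-priori bound above guarantees the error shrinks at least like (1/2)^n.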
Remark 6.1.1. As in the case of equations on R, a sufficient condition for a continuous function ϕ, with first order partial derivatives continuous on a domain D ⊂ R^n, to be a contraction is that

‖ϕ′‖ ≤ q < 1, on D,

where the norm of ϕ′ is the norm of the Jacobian matrix

ϕ′ = (∂ϕ_i/∂x_j),  i, j = 1, ..., n.

For example,

‖ϕ′‖ = max_{1≤i≤n} Σ_{j=1}^{n} max_{x∈D} |∂ϕ_i(x)/∂x_j|.
Remark 6.1.2. If ϕ is defined on a subdomain of R^n, for example, on the sphere

S := S(x^(0), r) = {x : ‖x − x^(0)‖ < r},  r ∈ R, x^(0) ∈ R^n,

then the supplementary condition

‖x^(0) − ϕ(x^(0))‖ ≤ (1 − α) r

has to be fulfilled in order to guarantee the convergence of the corresponding iterative process. This condition makes sure that ϕ(x) ∈ S, provided x ∈ S.
The successive approximation method can also be applied to equations of the form (6.0.2), where f is a function defined and continuous in a neighborhood of the solution vector x∗. We first write equation (6.0.2) in the form
x = x+Mf(x),
where M is a non-singular matrix. Denoting
x+Mf(x) = ϕ(x), (6.1.5)
equation (6.0.2) becomes x = ϕ(x), so the successive approximations methodcan now be applied. Next, we determine the matrix M such that the methodϕ = I +Mf is convergent. For this, we use condition
‖ϕ′‖ ≤ q < 1
and the obvious observation is that the smaller ‖ϕ′‖ is, the faster the iterativeprocess converges. From (6.1.5) we get
ϕ′(x) = I +Mf ′(x).
We determine the matrix M such that for the starting value x^(0) one has

ϕ′(x^(0)) = 0.

It follows

I + M f′(x^(0)) = 0,

thus, if f′(x^(0)) is non-singular we obtain

M = −(f′(x^(0)))^{−1}.

If f′(x^(0)) is singular, another starting value x^(0) has to be chosen.
6.2 Newton’s method
Consider again equation (6.0.2). Let x∗ ∈ D be a solution of this equation and let x^(p) be an approximation of x∗. Denoting by ε^(p) the error produced in approximating x∗ by x^(p), we have

x∗ = x^(p) + ε^(p).

Hence,

f(x^(p) + ε^(p)) = 0. (6.2.1)
Assuming that f is differentiable in a convex domain that contains x∗ and x^(p), the left-hand side of (6.2.1) can be approximated by the first two terms of the Taylor expansion of f in the neighborhood of x^(p), namely,
f(x^(p) + ε^(p)) ≈ f(x^(p)) + f′(x^(p)) ε^(p).

Thus, (6.2.1) can be approximated by

f(x^(p)) + f′(x^(p)) ε^(p) = 0, (6.2.2)

which leads to the following linear algebraic system of equations with the unknowns ε_i^(p), i = 1, ..., n:

(∂f_i/∂x_1)(x_1^(p), ..., x_n^(p)) ε_1^(p) + ... + (∂f_i/∂x_n)(x_1^(p), ..., x_n^(p)) ε_n^(p) = −f_i(x_1^(p), ..., x_n^(p)),  i = 1, ..., n. (6.2.3)
Denoting

w(x^(p)) = f′(x^(p)) = (∂f_i/∂x_j (x^(p)))_{i,j=1,...,n}

and in the hypothesis that w(x^(p)) is a non-singular matrix, we obtain

ε^(p) = −w^{−1}(x^(p)) f(x^(p)),

where w^{−1}(x^(p)) is the inverse of the matrix w(x^(p)). Thus, we get a value for the correction vector ε^(p).
If we denote by x(p+1) the approximation obtained by applying to x(p) thecorrection ε(p), we reach the iterative method:
x(p+1) = x(p) − w−1(x(p))f(x(p)), p = 0, 1, ..., (6.2.4)
which is called Newton's method. Passing to the limit in (6.2.4), it immediately follows that if (x^(p))_{p∈N} is a convergent sequence with limit denoted by x∗, then x∗ is the solution of the discussed equation.
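A minimal sketch of iteration (6.2.4). Instead of forming w^{−1}(x^(p)) explicitly, each step solves the linear system (6.2.3) for the correction ε^(p). The 2×2 test system is an illustrative choice, not from the text.

```python
import numpy as np

def newton_system(f, jac, x0, tol=1e-12, max_iter=50):
    """x^(p+1) = x^(p) + eps^(p), where w(x^(p)) eps^(p) = -f(x^(p))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        eps = np.linalg.solve(jac(x), -f(x))   # correction vector
        x = x + eps
        if np.linalg.norm(eps) < tol:
            break
    return x

# Illustrative system: x^2 + y^2 = 1, x - y = 0  =>  x = y = 1/sqrt(2)
f = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
jac = lambda v: np.array([[2 * v[0], 2 * v[1]], [1.0, -1.0]])
sol = newton_system(f, jac, [1.0, 0.5])
```

Solving w ε = −f is cheaper and numerically safer than inverting w at every step.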
Concerning the convergence of the sequence we can state the followingresult.
Theorem 6.2.1. Let f ∈ C^2(D), x^(0) ∈ D and r ∈ R such that

S_0(x^(0), r) = {x : ‖x − x^(0)‖ ≤ r} ⊂ D.
If the following conditions are verified:
i) there exists the matrix M0 = w−1(x(0)) and ‖M0‖ ≤ A0,
ii) ‖M_0 f(x^(0))‖ ≤ B_0 ≤ r/2,
iii) ‖f ′′(x)‖ ≤ C, x ∈ S0,
iv) η_0 := 2nA_0B_0C ≤ 1,
then, for the starting value x^(0), the iterative process (6.2.4) is convergent, the vector

x∗ = lim_{p→∞} x^(p)

is the solution of the given equation and one has
‖x∗ − x^(k)‖ ≤ (1/2^{k−1}) η_0^{2^k − 1} B_0.
Proof. We start by showing that all the conditions i)−iv), valid for x^(0), also hold for the approximation x^(1), given by (6.2.4). Indeed, one has

‖x^(1) − x^(0)‖ = ‖w^{−1}(x^(0)) f(x^(0))‖ = ‖M_0 f(x^(0))‖ ≤ B_0 ≤ r/2,

whence x^(1) ∈ S_1(x^(0), r/2) ⊂ S_0.

In order to prove the existence of M_1 = w^{−1}(x^(1)), we use the property (A · B)^{−1} = B^{−1} · A^{−1}, and, thus, we obtain

M_1 = w^{−1}(x^(1)) = (w(x^(0)) M_0 w(x^(1)))^{−1} = (M_0 w(x^(1)))^{−1} M_0.
Using condition i), we can write

‖I − M_0 w(x^(1))‖ = ‖M_0 (w(x^(0)) − w(x^(1)))‖ (6.2.5)
≤ ‖M_0‖ ‖w(x^(0)) − w(x^(1))‖
≤ A_0 ‖w(x^(1)) − w(x^(0))‖.

Using the inequality

‖f′(x + Δx) − f′(x)‖ ≤ n ‖Δx‖ ‖f″(ξ)‖,  ξ = x + αΔx, 0 < α < 1,

and conditions iii), iv), relation (6.2.5) becomes

‖I − M_0 w(x^(1))‖ ≤ nA_0B_0C = η_0/2 ≤ 1/2. (6.2.6)

The latter inequality assures the existence of the inverse matrix

[M_0 w(x^(1))]^{−1} = [I − (I − M_0 w(x^(1)))]^{−1}

and one has

‖[M_0 w(x^(1))]^{−1}‖ ≤ 1/(1 − η_0/2) ≤ 2.

Therefore, there exists the matrix

M_1 = [M_0 w(x^(1))]^{−1} M_0

and for its norm we get

‖M_1‖ = ‖[M_0 w(x^(1))]^{−1} M_0‖ ≤ ‖[M_0 w(x^(1))]^{−1}‖ ‖M_0‖ ≤ 2A_0 = A_1. (6.2.7)
Thus condition i) is also valid for the approximation x^(1). Using formula (6.2.4) yields

‖f(x^(1))‖ = ‖f(x^(1)) − f(x^(0)) − f′(x^(0))(x^(1) − x^(0))‖ (6.2.8)
≤ (1/2) n ‖x^(1) − x^(0)‖^2 ‖f″(ξ)‖ ≤ (1/2) n B_0^2 C,

where ξ = x^(0) + θ(x^(1) − x^(0)), 0 < θ < 1. We mention that in order to obtain the first inequality above we used the relation

‖f(x + Δx) − f(x) − f′(x)Δx‖ ≤ (1/2) n ‖Δx‖^2 ‖f″(ξ)‖,
with Δx = x^(1) − x^(0). From (6.2.7) and (6.2.8) it follows

‖M_1 f(x^(1))‖ ≤ ‖M_1‖ ‖f(x^(1))‖ ≤ 2A_0 · (1/2) n B_0^2 C = nA_0B_0C · B_0 = (1/2) η_0 B_0 = B_1,

so condition ii) also holds for x^(1).

Since condition iv) is verified for x^(0), it remains to show η_1 = 2nA_1B_1C ≤ 1. We have

η_1 = 2n · 2A_0 · (1/2) η_0 B_0 · C = η_0 · 2nA_0B_0C = η_0^2 ≤ 1.
Using the same argument, we deduce that the successive approximations x^(p), p = 1, 2, ..., generated by (6.2.4) can be defined, and one has x^(p) ∈ S_p(x^(p−1), r/2^{p−1}), with

S_0(x^(0), r) ⊃ S_1(x^(1), r/2) ⊃ ... ⊃ S_p(x^(p), r/2^p) ⊃ ...

Furthermore, we have

‖M_p‖ = ‖w^{−1}(x^(p))‖ ≤ A_p, (6.2.9)
‖M_p f(x^(p))‖ = ‖x^(p+1) − x^(p)‖ ≤ B_p,  p = 1, 2, ...,

where the constants A_p, B_p verify the relations

A_p = 2A_{p−1}, (6.2.10)
B_p = (1/2) η_{p−1} B_{p−1},

with η_p = 2nA_pB_pC, p = 1, 2, ...

Now, we show that the sequence generated by (6.2.4) is fundamental (Cauchy). Indeed, one has

x^(p+q) ∈ S_p(x^(p), r/2^p),  q ∈ N∗,

that is

‖x^(p+q) − x^(p)‖ ≤ r/2^p.

In other words, for every ε > 0, there exists n_0 ∈ N such that

‖x^(p+q) − x^(p)‖ < ε,  p > n_0 and q ∈ N∗,

so (x^(p))_{p∈N} is a fundamental sequence. Since x^(p) ∈ S_0(x^(0), r), p = 0, 1, ..., and S_0(x^(0), r) is a closed set, it follows that the above sequence is convergent. Let

lim_{p→∞} x^(p) = x∗ ∈ S_0(x^(0), r).
In order to prove that x∗ is the solution of equation (6.0.2), we write equality (6.2.4) in the form

w(x^(p))(x^(p+1) − x^(p)) = −f(x^(p)).

Passing to the limit and taking into account the continuity of f and f′, it follows f(x∗) = 0.

Using (6.2.9) and (6.2.10), we obtain

‖x^(p+1) − x^(p)‖ ≤ B_p = (1/2^p) η_0^{2^p − 1} B_0,
which implies

‖x^(k+p) − x^(k)‖ ≤ ‖x^(k+p) − x^(k+p−1)‖ + ... + ‖x^(k+1) − x^(k)‖
≤ B_0 [ (1/2^{k+p−1}) η_0^{2^{k+p−1} − 1} + ... + (1/2^k) η_0^{2^k − 1} ]
= (B_0/2^k) η_0^{2^k − 1} [ 1 + (1/2) η_0^{2^k} + ... + (1/2^{p−1}) η_0^{2^k(2^{p−1} − 1)} ]
≤ (B_0/2^k) η_0^{2^k − 1} [ 1 + 1/2 + ... + 1/2^{p−1} ].
Passing to the limit with respect to p yields

‖x∗ − x^(k)‖ ≤ (B_0/2^{k−1}) η_0^{2^k − 1},

so the theorem is completely proved.
Chapter 7
Parallel calculus
7.1 Introduction
Parallel calculus, or parallel computation, is the process of solving problems on parallel computers. The 1980s will go down in history as the decade in which parallel computing first had a significant impact on the scientific and commercial worlds. The problems have arisen from real-world demands for computers that are many orders of magnitude faster than the fastest conventional computers. Parallelism represents the most feasible avenue to achieve this kind of performance. Modern improvements in hardware have brought tremendous advances, e.g., by means of very large-scale integrated circuits (VLSI). Thus the multiprocessor was born, and today there exist many supercomputers which work with thousands and hundreds of thousands of processors, so parallel algorithms have strong technical support.
From the software point of view, many programs that run well on sequential (serial) computers cannot easily be transformed into programs that can be performed efficiently on parallel computers. Conversely, algorithms that are less efficient in a sequential context often reveal an inherent parallelism that makes them attractive for parallel programs.
Anyway, parallel computation does not replace all serial programs (of course, the best and most efficient of them remain), but it is a very attractive alternative in order to improve the speed-up of execution. In parallel calculus, the problem has to be rethought in order to be performed with more than one processor. This is the fascinating part of parallel calculus.
In the following section we try to indicate some parallel ways of solving
several of the numerical methods presented in previous chapters.
7.2 Parallel methods to solve triangular systems of linear equations
7.2.1 The column-sweep algorithm
Let us consider the system

A · x = b,

with A ∈ M_{n×n}(R) a triangular matrix and x, b ∈ M_{n×1}(R).

We begin by defining a linear recurrence system R<n, m> of order m for n equations by

R<n, m>:  x_k = 0, if k ≤ 0;  x_k = b′_k + Σ_{j=k−m}^{k−1} a′_{kj} x_j, if 1 ≤ k ≤ n, (7.2.1)

where m ≤ n − 1. If we write A = [a_{ik}], with a_{ik} = 0 for i ≤ k or i − k > m, x = [x_1, ..., x_n]^T and b = [b_1, ..., b_n]^T, then (7.2.1) can be written as:

x = A′ · x + b′. (7.2.2)

Note: A′ and b′ can easily be obtained from A and b by division by the nonzero diagonal elements.
Relation (7.2.2) is, e.g., R<4, 3>:

x_1 = b′_1 (7.2.3)
x_2 = b′_2 + a′_{21} x_1
x_3 = b′_3 + a′_{31} x_1 + a′_{32} x_2
x_4 = b′_4 + a′_{41} x_1 + a′_{42} x_2 + a′_{43} x_3.
In order to illustrate the column-sweep algorithm, we take a 4 × 4 system,

A · x = b,

with

A =
[   1                      ]
[ −a_{21}    1              ]
[ −a_{31}  −a_{32}    1      ]
[ −a_{41}  −a_{42}  −a_{43}  1 ],  (7.2.4)

x = [x_1, x_2, x_3, x_4]^T,  b = [b_1, b_2, b_3, b_4]^T.
Using (7.2.2), we get

x_1 = b_1 (7.2.5)
x_2 = b_2 + a_{21} x_1
x_3 = b_3 + a_{31} x_1 + a_{32} x_2
x_4 = b_4 + a_{41} x_1 + a_{42} x_2 + a_{43} x_3,

from which we can evaluate x_1, x_2, x_3 and x_4 in parallel as follows:

Step 1. Evaluate, in parallel, expressions of the form

b_i^(1) = b_i + a_{i1} x_1,  for i = 2, ..., n,

where x_1 = b_1 is known. Now there remain n − 2 equations to solve.

Step 2. Evaluate, in parallel, expressions of the form

b_i^(2) = b_i^(1) + a_{i2} x_2,  i = 3, ..., n,

where x_1 and x_2 are known. Now there remain n − 3 equations to be solved.

· · ·

Step k. Evaluate, in parallel, expressions of the form

b_i^(k) = b_i^(k−1) + a_{ik} x_k,  i = k + 1, ..., n,

where x_1, ..., x_k are known.
Remark 7.2.1. Using a network with n processors, the execution time for the column-sweep algorithm is reduced by a factor of about n compared with the execution time involved in solving a triangular system on a serial computer.
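The column-sweep idea can be sketched in a few lines; the numpy vector update below stands in for the n processors performing Step k simultaneously (here the division by the pivot is done at each step rather than in advance, a harmless variation of the scheme above):

```python
import numpy as np

def column_sweep(A, b):
    """Solve the lower triangular system A x = b by column sweeps."""
    n = len(b)
    b = np.array(b, dtype=float)   # work on a copy
    x = np.zeros(n)
    for k in range(n):
        x[k] = b[k] / A[k, k]
        b[k+1:] -= A[k+1:, k] * x[k]   # Step k: one simultaneous column update
    return x

A = np.array([[2.0, 0, 0], [1, 3, 0], [4, 5, 6]])
b = np.array([2.0, 7, 32])
x = column_sweep(A, b)
```

On a serial machine the column update costs n − k operations per step; with n processors it is a single parallel step, which is where the factor-n speed-up of Remark 7.2.1 comes from.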
7.2.2 The recurrent-product algorithm
Recalling the method presented in the previous section, we can express the solution x via a product of elementary matrices that can easily be computed using the recursive-doubling technique. Thinking in this way, Sameh and Brent gave the following method for solving a triangular system of equations, known also as the recurrent-product algorithm.

According to it, relation (7.2.2) can be written
x = (I − A′)^{−1} · b′, (7.2.6)

where the matrix A′ is strictly lower triangular and (I − A′)^{−1} can be expressed in the product form:

(I − A′)^{−1} = ∏_{i=1}^{n−1} M_{n−i}, (7.2.7)

where M_i is the identity matrix with column i altered: below the unit diagonal entry it contains the elements a′_{i+1,i}, ..., a′_{n,i}.

The product ∏_{i=1}^{n−1} M_{n−i} can be carried out using the recursive-doubling technique, on a binary tree connectivity among the processors.
Example 7.2.2. Suppose we take a banded lower triangular system R<5, 2>. Then, relation (7.2.6) is:

x_1 = b′_1 (7.2.8)
x_2 = b′_2 + a′_{21} x_1
x_3 = b′_3 + a′_{31} x_1 + a′_{32} x_2
x_4 = b′_4 + a′_{42} x_2 + a′_{43} x_3
x_5 = b′_5 + a′_{53} x_3 + a′_{54} x_4.

Hence,

x = M_4 · M_3 · M_2 · M_1 · b′, (7.2.9)
where each M_i is the 5 × 5 identity matrix with the band entries of column i inserted below the diagonal:

M_4: entry (5, 4) = a′_{54};
M_3: entries (4, 3) = a′_{43}, (5, 3) = a′_{53};
M_2: entries (3, 2) = a′_{32}, (4, 2) = a′_{42};
M_1: entries (2, 1) = a′_{21}, (3, 1) = a′_{31}.

The matrix product in (7.2.9) can be evaluated using recursive doubling, as follows: ((M_4 M_3)(M_2 M_1)) · b′, which means that we compute all the inner brackets at step 1, then all the "next" brackets at step 2, and so on. The computations of the inner brackets, etc., are made simultaneously, by two processors. The whole computation needs three parallel steps.
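The recursive-doubling pattern used above can be sketched generically: neighbouring factors are paired, each pass halves the number of factors, and all products within one pass are independent, hence parallelizable. The helper below is an illustrative sketch, not code from the text.

```python
import numpy as np

def recursive_doubling_product(mats):
    """Compute mats[0] @ mats[1] @ ... @ mats[-1] in about log2(len) passes."""
    mats = list(mats)
    while len(mats) > 1:
        # each pair product below is independent of the others
        paired = [mats[i] @ mats[i + 1] for i in range(0, len(mats) - 1, 2)]
        if len(mats) % 2:            # odd count: last factor carries over
            paired.append(mats[-1])
        mats = paired
    return mats[0]
```

For (7.2.9) one would call it with [M_4, M_3, M_2, M_1]; since matrix multiplication is associative, the pairing does not change the result, only the order of evaluation.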
Remark 7.2.3. The problem of inverting a lower triangular matrix can be solved, with parallel calculus, in several other ways. For example, Borodin and Munro proposed a method that uses data partitioning.
7.3 Parallel approaches of direct methods for solving systems of linear equations
In this section we discuss several techniques for solving linear algebraic systems which are particularly suited to parallel computation. It should be noted that the choice of method depends on the type of parallel environment used.
7.3.1 The parallel Gauss method
Let us consider the system
A · x = b, (7.3.1)
with A ∈ M_{n×n}(R), x, b ∈ M_{n×1}(R), or, in the explicit form:

a_{11} x_1 + a_{12} x_2 + ... + a_{1n} x_n = b_1
a_{21} x_1 + a_{22} x_2 + ... + a_{2n} x_n = b_2
...
a_{n1} x_1 + a_{n2} x_2 + ... + a_{nn} x_n = b_n. (7.3.2)
A possible parallel execution with n processors is the following:

Stage 1: all n processors work together

Step 1: Considering a_{11} ≠ 0,
for j := 2, ..., n in parallel do
a_{1j}^(1) := a_{1j}/a_{11};  b_1^(1) := b_1/a_{11}.

Step 2: for i, j := 2, ..., n in parallel do
a_{ij}^(1) := a_{ij} − a_{1j}^(1) · a_{i1};  b_i^(1) := b_i − b_1^(1) · a_{i1}.

After Step 1 and Step 2, the following system is obtained:

x_1 + a_{12}^(1) x_2 + ... + a_{1n}^(1) x_n = b_1^(1)
a_{22}^(1) x_2 + ... + a_{2n}^(1) x_n = b_2^(1)
...
a_{n2}^(1) x_2 + ... + a_{nn}^(1) x_n = b_n^(1). (7.3.3)
Stage 2: n − 1 processors work together

Repeat computations of the type from Step 1 and Step 2, with a_{22}^(1) ≠ 0.

...

Stage n − 1: 2 processors work together

Repeat computations of the type from Step 1 and Step 2, with a_{n−1,n−1}^(n−2) ≠ 0.
Finally, we get the system:

x_1 + a_{12}^(1) x_2 + ... + a_{1,n−1}^(1) x_{n−1} + a_{1n}^(1) x_n = b_1^(1)
x_2 + ... + a_{2,n−1}^(2) x_{n−1} + a_{2n}^(2) x_n = b_2^(2)
. . .
x_{n−1} + a_{n−1,n}^(n−1) x_n = b_{n−1}^(n−1)
a_{nn}^(n−1) x_n = b_n^(n−1). (7.3.4)
So, the problem has been reduced to solving the triangular system (7.3.4). One way to do it in parallel is by means of the techniques presented in section 7.2. Another possibility is to reactivate the processors, one by one, and to get the solution starting with the last equation.
Remark 7.3.1. The Gauss method can be parallelized, as we saw, but the final algorithm is not very efficient, because not all n processors are used at every step. From this point of view, the Gauss-Jordan method is more appropriate for parallel calculus.
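The two steps of each stage can be sketched with numpy, where array broadcasting plays the role of the processors working simultaneously on all rows; back-substitution on system (7.3.4) then recovers x. This is a sketch of the scheme above, not the book's code.

```python
import numpy as np

def gauss_parallel_style(A, b):
    """Forward elimination stage by stage, then back-substitution."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    for k in range(n - 1):
        piv = A[k, k]                          # assumed nonzero
        A[k, k:] /= piv                        # Step 1: normalize pivot row
        b[k] /= piv
        factors = A[k+1:, k].copy()            # Step 2: all rows at once
        A[k+1:, k:] -= np.outer(factors, A[k, k:])
        b[k+1:] -= factors * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):             # solve triangular system (7.3.4)
        x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

A = np.array([[2.0, 1, 1], [4, 3, 3], [8, 7, 9]])
b = A @ np.array([1.0, 2, 3])
x = gauss_parallel_style(A, b)
```

Note the shrinking update block A[k+1:, k:] at stage k, which mirrors the shrinking number of busy processors criticized in Remark 7.3.1.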
7.3.2 The parallel Gauss-Jordan method
There are several possibilities to perform the Gauss-Jordan method using more than one processor. Considering the system

A · x = b, (7.3.5)

with A ∈ M_{n×n}(R), x, b ∈ M_{n×1}(R), a_{ii} ≠ 0, i = 1, ..., n, and n processors on a SIMD-type machine, the parallel method is the following:
Step 1. The system (7.3.5) is transformed into:

x_1 + a_{12}^(1) x_2 + ... + a_{1n}^(1) x_n = b_1^(1)
a_{22}^(1) x_2 + ... + a_{2n}^(1) x_n = b_2^(1)
...
a_{n2}^(1) x_2 + ... + a_{nn}^(1) x_n = b_n^(1), (7.3.6)

where

a_{1j}^(1) = a_{1j}/a_{11}, j = 2, ..., n,  b_1^(1) = b_1/a_{11},
a_{ij}^(1) = a_{ij} − a_{1j}^(1) · a_{i1},  b_i^(1) = b_i − b_1^(1) · a_{i1},  i, j = 2, ..., n.
The n processors work as follows: one processor computes a_{12}^(1), a_{13}^(1), ..., a_{1n}^(1), b_1^(1), one after another. The moment a_{12}^(1) is computed, the other n − 1 processors compute the elements of the second column, a_{i2}^(1), i = 2, ..., n, then the elements of the third column, a_{i3}^(1), etc., until they also compute b_i^(1), i = 2, ..., n.

Step 2. Meanwhile, the first processor, which has already finished the computation of b_1^(1), continues with a_{23}^(2) = a_{23}^(1)/a_{22}^(1). The rest of the n − 1 processors will start the computation which eliminates x_2, that is, the values

a_{13}^(2) = a_{13}^(1) − a_{12}^(1) · a_{23}^(2),  b_1^(2) = b_1^(1) − a_{12}^(1) · b_2^(2),
a_{i3}^(2) = a_{i3}^(1) − a_{i2}^(1) · a_{23}^(2),  b_i^(2) = b_i^(1) − a_{i2}^(1) · b_2^(2),  i = 3, ..., n.

And so on. After n steps, we get the solution:

x_i = b_i^(n),  i = 1, ..., n. (7.3.7)
Remark 7.3.2. Another parallel approach can be taken. Instead of dividing the amount of work between one processor (which performs only division operations all the time) and the other n − 1 processors (which perform a multiplication and an addition all the time), Modi suggests a parallel execution in which all n processors work together, on the same column, in order to eliminate the elements above and below the element a_{ii}, i = 1, ..., n; so, after n steps, we get the final form (7.3.7) of the system.
7.3.3 The parallel LU method
To get a parallel LU method, the idea of the parallel Gauss method (see section 7.3.1) can be used to obtain the matrices L and U of the decomposition

A = L · U.

Another possibility is to use the recursive-doubling technique to get the matrix products which form the matrix U, and then the matrix L, as presented in section 4.2.1.4 in the case of the Doolittle algorithm. Also, as we already stated, the block LU decomposition, presented in section 4.2.1.5, is more appropriate for parallel calculus, due to the fact that the operations on the block matrices can be performed simultaneously.
7.3.4 The parallel QR method
Recalling the QR method, presented in section 4.2.1.7, a parallel approach can be obtained by using Givens rotations, because several rotations can be performed simultaneously. If A ∈ M_{m×n}(R), denote by (p_k, q_k), k = 1, ..., mn − n(n+1)/2, the fact that a Givens rotation G(p_k, q_k) is performed between the rows p_k and p_k − 1, in order to annihilate the (p_k, q_k) element of

G(p_{k−1}, q_{k−1}) ... G(p_1, q_1) A.

According to Gentleman, the set S of all these rotations is partitioned into subsets S_r, r = 1, ..., m + n − 2, so that

S_r = {(p, q) | 1 ≤ q < p ≤ m, q ≤ n, m + 2q = p + r + 1}.

The rotations in S_r are disjoint, and therefore they can be performed simultaneously. With the corresponding product matrices

Q^(r) = ∏ {G(p, q) | (p, q) ∈ S_r},  Q^T = Q^(m+n−2) ... Q^(1),  Q^T · A = R,

hence we get

A = Q · R.
7.3.5 The parallel WZ method
Recalling the WZ method (see section 4.2.1.8), we notice that the matrix A can be factorized as

A = W Z,

where W has unit diagonal and, in row i, possible nonzero entries w_{ik} only in the columns k < min(i, n − i + 1) and k > max(i, n − i + 1) (a "butterfly" pattern, e.g. w_{21} and w_{2n} in row 2, w_{n−1,1} and w_{n−1,n} in row n − 1), while Z has nonzero entries z_{ij} in row i only for the columns between min(i, n − i + 1) and max(i, n − i + 1) (a "Z" pattern: full first and last rows, narrowing towards the middle rows),
with elements w_{ij} and z_{ij}, i, j = 1, ..., n, computed according to the formulas presented in section 4.2.1.8. The system A · x = b, that is W Z · x = b, can be solved in two steps:

W · p = b, (7.3.8)

and we get the n-element vector p, then

Z · x = p, (7.3.9)

which generates the solution x. From (7.3.8), we have
W · [p_1, p_2, ..., p_{n−1}, p_n]^T = [b_1, b_2, ..., b_{n−1}, b_n]^T, (7.3.10)
hence, first we compute p_1 and p_n, then p_2 and p_{n−1}, and so on (each time working from the top and the bottom). At the i-th step we have

p_i = b_i^(i),  p_{n−i+1} = b_{n−i+1}^(i),

where

b_j^(1) = b_j

and

b_j^(i+1) = b_j^(i) − w_{j,i} · p_i − w_{j,n−i+1} · p_{n−i+1}  (j = i + 1, ..., n − i). (7.3.11)
Relations (7.3.11) can be evaluated in parallel, using the recursive-doubling technique. Then, in order to solve (7.3.9), the computation of x is similar, so it can also be made in parallel.
7.4 Parallel iterative methods
As we saw in section 4.3, the iterative methods for solving a linear system of equations are useful when the dimension of the problem to be solved increases, so that the direct methods become almost impractical. From a parallel execution point of view, the iterative methods are more attractive, in spite of the fact that they do not guarantee a solution for every system of equations, due to the convergence requirements of the method. In what follows, we give parallel approaches for some iterative methods.
7.4.1 Parallel Jacobi method
Recalling the expressions for computing the components of the vector x at step k, starting with an initial value x^(0) (see section 4.3.1),

x_i^(k) = (b_i − Σ_{j=1, j≠i}^{n} a_{ij} · x_j^(k−1)) / a_{ii},  i = 1, ..., n, with a_{ii} ≠ 0,
there exist several ways to perform them using more than one processor. If we work with n processors, every processor i can compute a value x_i at a given step k. The only problem here is that the processors have to communicate among themselves, because, at every step, every processor needs to know all the other values computed by all the other processors.
Another parallel approach for the Jacobi method was given by Wong. In order to present it, we rewrite the method in matrix form (see also formula (4.3.7)):

A · x = b
r = b − A · x
A = L + D + U
(L + D + U) x = b
D x^(k+1) = −(L + U) x^(k) + b
x^(k+1) = −D^{−1}(L + U) x^(k) + D^{−1} b
x^(k+1) = −D^{−1}(L + D + U) x^(k) + D^{−1} b + x^(k)
x^(k+1) = x^(k) + D^{−1} r^(k).
Remark 7.4.1. The method is presented by means of the residual error.
Algorithmically, the Jacobi method of this form is the following:

Remark 7.4.2.

Step 1. k = 0
Step 2. Initialize x^(0).
Step 3. r^(0) = b − A · x^(0)
Step 4. While ‖r^(k)‖ > ε do
Step 5.   k = k + 1
Step 6.   for i = 1 to n do
Step 7.     x_i^(k) = r_i^(k−1)/a_{ii} + x_i^(k−1)
Step 8.   end for
Step 9.   r^(k) = b − A · x^(k)
Step 10.  ‖r^(k)‖ = (Σ_{i=1}^{n} (r_i^(k))^2)^{1/2}
Step 11. end while
Step 12. x = x^(k).
Remark 7.4.3. We have effective computation in Steps 7, 9 and 10.
The parallel implementation of this algorithm, using n processors, is the following:

Step 0: Initialize k and x^(0).
Steps 1 and 2: Assign the value x^(0) to the n processors.
Step 3: r = b − A · x, a matrix-vector multiplication; the computation is made by all processors, without any communication among them.
Step 3+: Update the vector r. Communication is required.
Steps 4 and 10: Calculate the residue ‖r‖. Communication is required.
Step 7: Update the value of the vector x. Communication is required.
Step 9: Calculate r = b − A · x, a matrix-vector multiplication.
Step 9+: Update the vector r; communication is required.
Remark 7.4.4. For parallel matrix multiplication, see the bibliography.
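Wong's residual form of the Jacobi iteration is easy to sketch: the x-update (Step 7) is componentwise independent, which is exactly what makes it parallel (here one numpy vector operation stands in for the n processors). The diagonally dominant test system is an illustrative choice.

```python
import numpy as np

def jacobi_residual(A, b, eps=1e-10, max_iter=500):
    """x^(k+1) = x^(k) + r^(k)/D, with r^(k) = b - A x^(k)."""
    x = np.zeros(len(b))
    d = np.diag(A)                 # D as a vector of diagonal entries
    r = b - A @ x                  # Step 3
    for _ in range(max_iter):
        if np.linalg.norm(r) <= eps:   # Steps 4 and 10
            break
        x = x + r / d              # Step 7: all components simultaneously
        r = b - A @ x              # Step 9
    return x

A = np.array([[4.0, 1, 0], [1, 5, 2], [0, 2, 6]])
b = A @ np.array([1.0, 1, 1])      # exact solution (1, 1, 1)
x = jacobi_residual(A, b)
```

Strict diagonal dominance of A guarantees convergence of this iteration.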
7.4.2 Parallel Gauss-Seidel algorithm
As for the Jacobi method, there are several parallel approaches for the Gauss-Seidel method.
For instance, the following idea is used: having n processors, and due to the fact that the computation of a component x_i^(k), i = 1, …, n, involves some values already computed at the same iteration k, according to the formula

x_i^(k) = (1/a_ii) ( b_i − Σ_{j=1}^{i−1} a_ij x_j^(k) − Σ_{j=i+1}^{n} a_ij x_j^(k−1) ),  i = 1, …, n,

all the processors have to work together in order to determine the value of one x_i^(k).
P. Wong presented another possibility, by means of the residual value.The Gauss-Seidel iteration is given as follows:
A x = b (7.4.1)

b − A x = r,  A = L + D + U

A x = (L + D + U) x = b

(D + L) x = −U x + b

(D + L) x^(k+1) = −U x^(k) + b

x^(k+1) = −(D + L)^{−1} U x^(k) + (D + L)^{−1} b

x^(k+1) = −(D + L)^{−1} (U + D + L) x^(k) + x^(k) + (D + L)^{−1} b

x^(k+1) = x^(k) + (D + L)^{−1} r^(k).
In algorithmic form, the relations (7.4.1) read as follows:
Step 1: k = 0.
Step 2: Initialize x^(0).
Step 3: r^(0) = b − A x^(0).
Step 4: While ‖r^(k)‖ > ε do
Step 5: k = k + 1.
Step 6: For i = 1 to n do
Step 7: x_i^(k) = x_i^(k−1) + [(L + D)^{−1} r^(k−1)]_i.
Step 8: End for.
Step 9: r^(k) = b − A x^(k).
Step 10: ‖r^(k)‖ = ( Σ_{i=1}^{n} (r_i^(k))² )^{1/2}.
Step 11: End while.
Step 12: x = x^(k).
Remark 7.4.5. The Gauss-Seidel algorithm appears to be identical with Jacobi's; however, here Step 7 is completely sequential, because x_i cannot be computed until x_{i−1} has been computed. In fact, Step 7 can be rewritten as:
x_i^(k) = ( b_i − Σ_{j=1}^{i−1} a_ij x_j^(k) − Σ_{j=i+1}^{n} a_ij x_j^(k−1) ) / a_ii
        = ( b − U x^(k−1) − L x^(k) )_i / a_ii.
Hence, only a portion of the residual r calculation can be reused. These calculations can be done in parallel.
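A minimal serial sketch of this residual form, x^(k+1) = x^(k) + (D + L)^{−1} r^(k): the residual components can be formed independently (and hence in parallel), while the forward substitution with D + L is the inherently sequential part noted in Remark 7.4.5. The test system is an illustrative assumption.

```python
# Sketch of Gauss-Seidel in residual form: x <- x + (D+L)^{-1} r.
# The 3x3 test system is an illustrative assumption, not from the text.

def gauss_seidel_residual(A, b, eps=1e-10, max_iter=500):
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        # residual: each component is independent, hence parallelizable
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 <= eps:
            break
        # forward substitution (D + L) z = r: sequential, z_i needs z_0..z_{i-1}
        z = [0.0] * n
        for i in range(n):
            z[i] = (r[i] - sum(A[i][j] * z[j] for j in range(i))) / A[i][i]
        x = [x[i] + z[i] for i in range(n)]
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]                 # exact solution (1, 1, 1)
x = gauss_seidel_residual(A, b)
```

Multiplying the update by D + L shows it is the classical Gauss-Seidel step: (D + L) x^(k+1) = b − U x^(k).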
Remark 7.4.6. The Gauss-Seidel method used to solve discretized problems attached to PDEs generates different types of parallel execution, e.g. the checker-board (or red-black) execution. The type used takes into account the numbering of the nodes in the discretization network and the order in which they are updated. These ideas can also be found in the next section.
7.4.3 The parallel SOR method
As we stated in the previous section, the different types of parallel execution can also be illustrated by means of the modified Gauss-Seidel method, which is the SOR method.
As we have seen in section 4.3.3, using a parameter ω (0 < ω < 2) toimprove the convergence speed, the values of x are updated as follows:
x_i^(k) = (1 − ω) x_i^(k−1) + ω ( b_i − Σ_{j=1}^{i−1} a_ij x_j^(k) − Σ_{j=i+1}^{n} a_ij x_j^(k−1) ) / a_ii.    (7.4.2)
In order to obtain a good parallelization of the computations (7.4.2), we ”color” the nodes corresponding to the matrix A in such a way that no two neighboring nodes share a color; the updates within one color are then independent, and they can be performed simultaneously.
Example 7.4.7. Consider a 1D discretized problem, with information in thepoints:
Figure 7.1: The set of points.
Then, the computation of the values given by (7.4.2) can be carried out in two stages, according to the red-black type of execution:

Example 7.4.8. Stage 1: Work on the even-numbered nodes (so only nodes 2, 4, 6, 8 are updated):
for i = 1 to n in parallel do
x_i^(k) = ( b_i − Σ_{j=1}^{i−1} a_ij x_j^(k) − Σ_{j=i+1}^{n} a_ij x_j^(k−1) ) / a_ii
end for.
Figure 7.2: The set of even points.
Stage 2: Work on the odd-numbered nodes (so only nodes 1, 3, 5, 7, 9 are updated):

Figure 7.3: The set of odd points.

for i = 1 to n in parallel do
x_i^(k) = ( b_i − Σ_{j=1}^{i−1} a_ij x_j^(k) − Σ_{j=i+1}^{n} a_ij x_j^(k−1) ) / a_ii
end for.
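The two stages of Example 7.4.8 can be sketched for a 1D tridiagonal system (so each node couples only with its neighbors, as in Figure 7.1); with the text's 1-based numbering, the first stage updates the even nodes and the second the odd ones, and each inner loop is parallelizable. The 5-point test system and ω = 1.2 are illustrative assumptions.

```python
# Red-black SOR sweep (7.4.2) for a tridiagonal system: within one
# colour the updates are independent, so each stage could run in
# parallel.  Test system and omega are illustrative assumptions.

def sor_red_black(A, b, omega=1.2, sweeps=200):
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        # 0-based indices 1, 3, ... are the even nodes 2, 4, ... of the text
        for start in (1, 0):                 # stage 1: even nodes; stage 2: odd nodes
            for i in range(start, n, 2):     # independent updates within a colour
                s = sum(A[i][j] * x[j] for j in range(n) if j != i)
                x[i] = (1.0 - omega) * x[i] + omega * (b[i] - s) / A[i][i]
    return x

# 1D Poisson-type matrix tridiag(-1, 2, -1), n = 5
n = 5
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 2.0
    if i > 0:
        A[i][i - 1] = A[i - 1][i] = -1.0
b = [0.0, 0.0, 2.0, 0.0, 0.0]                # exact solution (1, 2, 3, 2, 1)
x = sor_red_black(A, b)
```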
7.5 The parallelization of the multigrid method
C.C. Douglas considered that two major classes of methods are used for the parallelization of the multigrid method: telescoping and nontelescoping.
7.5.1 Telescoping parallelizations
As we have seen in section 4.3.6, the multigrid method implies the passingof information among several grids. At every step, some values in the gridnodes have to be computed.
One way of doing it in parallel is by means of domain decomposition techniques. Then, each processor computes on the block of data per grid that is assigned to it. Of course, with or without overlaps, the processors have to communicate at the block borders, especially if the smoothing procedure is of Gauss-Seidel type.
Another approach to telescoping multigrid involves massively parallel computers (that is, at least as many processors as there are unknowns on all of the grids).
Then the following method, known as ”the concurrent multigrid” method, is proposed: the concept is that all operations should be performed simultaneously, on all unknowns, on all grids. An initial approximation of zero is assumed for the solution. Two sets of vectors, q_j and d_j, are used to hold information about the right-hand sides and the data on the spectrum of levels, in the sense that

b_k ≈ Σ_{j=1}^{k} (q_j + d_j).
A third set of vectors, x_j, contains the approximations of the solutions of each problem on each level. This information percolates to the finest grid, k, to finally provide the approximate solution of the real problem. Recalling the serial multigrid algorithm (MV) given in section 4.3.6, in what follows we denote the restriction operator I_h^{2h} by R and the prolongation operator I_{2h}^{h} by P. Also, the levels h and 2h will be denoted by j and j − 1, respectively.
So, the parallel algorithm is the following:
Step 1: Initialize in parallel
x_j = q_j = 0, 1 ≤ j < k;  q_k = b_k, x_k = 0.
Step 2: Repeat (i = 1, ..., µ).

Step 2a: Smoothing in parallel:

x_j ← PreSolve(x_j, q_j), 1 ≤ j ≤ k.

Step 2b: Compute data corresponding to x_j in parallel:

d_j ← A_j x_j, 1 ≤ j ≤ k.

Step 2c: Compute residuals in parallel:

q_j ← q_j − d_j, 1 ≤ j ≤ k.
Step 2d: Project q onto coarser grids in parallel:

q_1 ← q_1 + R_2 q_2;

q_k ← (I − P_{k−1} R_k) q_k;

q_j ← (I − P_{j−1} R_j) q_j + R_j q_{j+1}, 1 < j < k.
Step 2e: Inject x into finer grids in parallel:
x_1 ← 0;  x_k ← x_k + P_{k−1} x_{k−1};

x_j ← P_{j−1} x_{j−1}, 1 < j < k.
Step 2f: Inject d into finer grids in parallel:
d_1 ← 0;  d_k ← d_k + P_{k−1} d_{k−1};

d_j ← P_{j−1} d_{j−1}, 1 < j < k.
Step 2g: Put all the data back into q in parallel:

q_j ← q_j + d_j, 1 ≤ j ≤ k.
Step 3: Return xk.
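The grid-transfer operators R and P appearing above can be illustrated on a small serial two-grid correction cycle (not the concurrent algorithm itself) for the 1D Poisson matrix; full weighting, linear interpolation, and damped Jacobi as the PreSolve smoother are common textbook choices assumed here, not prescribed by the text.

```python
# Serial two-grid correction cycle for A = (1/h^2) tridiag(-1, 2, -1),
# illustrating the restriction R and prolongation P used above.
# Operator choices are common defaults assumed for the sketch.

def tridiag(n):
    # 1D Poisson matrix on n interior points of [0, 1], h = 1/(n+1)
    s = float((n + 1) ** 2)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0 * s
        if i:
            A[i][i - 1] = A[i - 1][i] = -1.0 * s
    return A

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def residual(A, x, b):
    return [bi - yi for bi, yi in zip(b, matvec(A, x))]

def smooth(A, x, b, sweeps=3, omega=2.0 / 3.0):
    for _ in range(sweeps):                   # damped Jacobi: the PreSolve step
        r = residual(A, x, b)
        x = [x[i] + omega * r[i] / A[i][i] for i in range(len(x))]
    return x

def R(r):                                     # restriction: full weighting
    return [(r[2 * i] + 2.0 * r[2 * i + 1] + r[2 * i + 2]) / 4.0
            for i in range((len(r) - 1) // 2)]

def P(e, n):                                  # prolongation: linear interpolation
    x = [0.0] * n
    for i, ei in enumerate(e):
        x[2 * i + 1] = ei
    for i in range(0, n, 2):
        x[i] = 0.5 * ((x[i - 1] if i else 0.0) + (x[i + 1] if i + 1 < n else 0.0))
    return x

def solve(A, b):                              # direct coarse solve (Gaussian elimination)
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            M[i] = [mij - f * mkj for mij, mkj in zip(M[i], M[k])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

n = 7                                         # fine level j; coarse level j-1 has 3 points
A, Ac = tridiag(n), tridiag(3)
b = [1.0] * n
x = smooth(A, [0.0] * n, b)                   # pre-smoothing on the fine grid
ec = solve(Ac, R(residual(A, x, b)))          # restricted residual -> coarse correction
x = smooth(A, [xi + pi for xi, pi in zip(x, P(ec, n))], b)  # prolong, correct, post-smooth
```

One such cycle already reduces the residual substantially: the smoother damps the oscillatory error components, and the coarse-grid correction removes the smooth ones.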
Finally, there is another approach to telescoping parallelization: apply a domain decomposition method to the finest grid and then use a serial multigrid method on each of the subdomains.
7.5.2 Nontelescoping parallelizations
Nontelescoping methods use multiple coarse subspaces in which the sum ofthe unknowns on any level equals the sum on any other level.
For a given problem k, it either has a set C_k of coarse space correction problems or it has none at all (i.e., C_k = ∅). When C_k ≠ ∅, there are restriction and prolongation operators for each coarse space problem ℓ ∈ C_k such that
Rℓ : Mk →Mℓ and Pℓ : Mℓ → Mk,
where by M_ℓ and M_k we denote the solution spaces.
A multiple coarse space correction multigrid scheme is defined by the following algorithm, named MCM:

Algorithm MCM(j, µ_j, C_j, x_j, b_j)
Step 1: If C_j = ∅, then solve A_j x_j = b_j.

Step 2: If C_j ≠ ∅, then repeat (i = 1, ..., µ_j).
Step 2a: Smoothing: xj ← PreSolve(xj, bj).
Step 2b: Residual correction:

x_j ← x_j + Σ_{ℓ∈C_j} P_ℓ · MCM(ℓ, µ_ℓ, C_ℓ, 0, R_ℓ(b_j − A_j x_j)).
Step 2c: Smoothing: xj ← PreSolve(xj, bj).
Step 3: Return xj .
Remark 7.5.1. In the parallel implementation of the MCM algorithm, each processor executes the computation on one of the coarse spaces in C_j.
This algorithm was introduced by Ta'asan, when standard multigrid failed to converge for a class of problems with highly oscillatory solutions. He used standard interpolation and projection methods for the operators (as presented in section 4.3.6) and a relaxation method for the smoothing procedure.
Using a different type of interpolation and projection methods, Hackbuschdeveloped another method, known as robust multigrid.
Frederickson and McBryan elaborated the parallel superconvergent multigrid method, using a SIMD machine, standard interpolation and projection operators, but an elaborate smoother on every level.
Remark 7.5.2. All these methods reduce the execution time, but they are space-wasteful.

In order to correct this inconvenience, Douglas proposed the domain reduction method, in which the approximate solve step is eliminated. A side benefit of this theory is that the fine grid problem and most of the coarse grid matrices do not need to be generated, so the method can use substantially less memory than a standard iterative algorithm.
Bibliography
[1] O. Agratini, I. Chiorean, Gh. Coman, R.T. Trımbitas, Analiza Numerica si
Teoria Aproximarii, vol. III, Presa Universitara Clujeana, 2002.
[2] O. Agratini, P. Blaga, Gh. Coman, Lectures on Wavelets, Numerical Meth-
ods and Statistics, Casa Cartii de Stiinta, Cluj-Napoca, 2005.
[3] R.E. Barnhill, Representation and Approximation of Surfaces, Mathematical
Software III, (Ed. J.R. Rice), Academic Press, 1977, pp. 69–120.
[4] R.E. Barnhill, A survey of the representation and design of surfaces, IEEE
Computer Graphics and Applications, 3 (1983), no. 7, pp. 9–16.
[5] R.E. Barnhill, Surfaces in Computer Aided Geometric Design: A survey
with new results, Computer Aided Geometric Design, 2 (1985), pp. 1–17.
[6] R.E. Barnhill, G. Birkhoff, W.J. Gordon, Smooth interpolation in triangle,
J. Approx. Theory 8 (1973), pp. 114–128.
[7] R.E. Barnhill, R.P. Dube, F.F Little, Properties of Shepard’s surfaces, Rocky
Mountains J. Math., 13 (1983), no. 2, pp. 365–382.
[8] S. Bernstein, Demonstration du theoreme de Weierstrass fondee sur le calcul
des probabilites, Comm. Soc. Math., Kharkov, 13 (1912), pp. 1-2.
[9] J. Bernoulli, Ars conjectandi, Werke, 3, Birkhaser, 1975, pp. 107-286.
[10] P. Blaga, Gh. Coman, On some bivariate spline operators, Rev. Anal.
Numer. Theor. Approx., 7 (1979), pp. 143–153.
[11] P. Blaga, Gh. Coman, Multivariate interpolation formulae of Birkhoff type,
Studia Univ. Babes-Bolyai, Mathematica, 26 (1981), no. 2, pp. 14–22.
[12] P. Blaga, Gh. Coman, S. Pop, R. Trımbitas, D. Vasaru, Analiza numerica-
Lucrari de laborator, Univ. Babes-Bolyai, Cluj, 1994.
[13] H. Bohman, On approximation of continuous and analytic functions, Ark.
Mat. 2, 1952, pp. 43-56.
[14] K. Bohmer, Spline functionen eine einfuhrung in theorie und anwendungen,
Teubner Valag, Stuttgart, 1974.
[15] K. Bohmer, Gh. Coman, Smooth interpolation schemes in triagles with error
bounds, Mathematica (Cluj), 18 (41) (1976), no. 1, pp. 15–27.
[16] K. Bohmer, Gh. Coman, On some approximation schemes on triangles,
Mathematica (Cluj), 24 (45) (1980), pp. 231–235.
[17] A. Borodin, I. Munro, The computational complexity of algebraic and nu-
merical problems, American Elsevier, 1975.
[18] B. Bylina, Solving linear systems with vectorized WZ factorization, Annales
UMCS Informatica AI, 1 (2003), pp. 5-13.
[19] T. Catinas (Gulea), On trivariate approximation, Proceedings of the In-
ternational Symposium on Numerical Analysis and Approximation Theory,
Cluj-Napoca, May 9–11, 2002, pp. 207–230.
[20] T. Catinas, Trivariate approximation operators on cube by parametric exten-
sions, Acta Universitatis Apulensis, Mathematics-Informatics, no. 4 (2002),
pp. 29–36.
[21] T. Catinas, Interpolating on some nodes of a given triangle, Studia Ma-
thematica ”Babes-Bolyai”, 48 (2003), no. 4, pp. 3–8.
[22] T. Catinas, The combined Shepard-Abel-Goncharov univariate operator,
Rev. Anal. Numer. Theor. Approx., 32 (2003), no. 1, pp. 11–20.
[23] T. Catinas, The combined Shepard-Lidstone univariate operator, ”Tiberiu
Popoviciu” Itinerant Seminar of Functional Equations, Approximation and
Convexity, Cluj-Napoca, May 21–25, 2003, pp. 3–15.
[24] T. Catinas, On the generalized Newton algorithm for some nodes given on
tetrahedron, Seminar on Numerical and Statistical Calculus, ”Babes-Bolyai”
University, Cluj-Napoca, 2004, pp. 65–76.
[25] T. Catinas, The combined Shepard-Lidstone bivariate operator, Trends and
Applications in Constructive Approximation, (Eds. M.G. de Bruin, D.H.
Mache, J. Szabados), International Series of Numerical Mathematics, Vol.
151, 2005, Springer Group-Birkhauser Verlag, pp. 77-89.
[26] T. Catinas, Bounds for the remainder in the bivariate Shepard interpolation
of Lidstone type, Rev. Anal. Numer. Theor. Approx., 34 (2005), no. 1, pp.
47-53.
[27] T. Catinas, Bivariate interpolation by combined Shepard operators, Proceed-
ing of 17thIMACS World Congress, Scientific Computation, Applied Math-
ematics and Simulation (Eds. P. Borne, M. Benrejeb, N. Dangoumau, L.
Lorimier), Paris, July 11-15, 2005, ISBN 2-915913-02-1, pp.1-7.
[28] T. Catinas, Three ways of defining the bivariate Shepard operator of Lidstone
type, Studia Univ. ”Babes-Bolyai”, Mathematica, 50 (2005), no. 3, pp. 57–
63.
[29] T. Catinas, The Lidstone interpolation on tetrahedron, J. Appl. Funct. Anal.,
2006, no. 4, pp. 425-439.
[30] T. Catinas, A combined method for interpolation of scattered data based on
triangulation and Lagrange interpolation, Studia Univ. Babes-Bolyai, Math-
ematica, 51 (2006), no. 4, pp. 55-63.
[31] T. Catinas, Interpolation of scattered data, ”Casa Cartii de Stiinta”, 2007.
[32] T. Catinas, The bivariate Shepard operator of Bernoulli type, Calcolo, 2007,
to appear.
[33] T. Catinas, A cubature formula based on Bernoulli interpolation for the
rectangle, to appear.
[34] T. Catinas, Gh. Coman, Optimal quadrature formulas based on ϕ-function
method , Studia Univ. Babes-Bolyai, Mathematica, 51 (2006), no. 1, pp.
49-64.
[35] E.W. Cheney, Multivariate Approximation Theory, Selected Topics,
CBMS51, SIAM, Philadelphia, Pennsylvania, 1986.
[36] W. Cheney, W. Light, A Course in Approximation Theory, Brooks/Cole
Publishing Company, Pacific Grove, 2000.
[37] I. Chiorean, Parallel Numerical Methods for Solving Partial Differential
Equations, Matarom Rev., Nr. 3, Paris, 1993, pp. 19-38.
[38] I. Chiorean, On the complexity of multigrid method in solving a system of
PDE, Research seminars, Seminar on Numerical and Statistical Calculus,
Preprint nr. 1, 1994, pp. 27-32.
[39] I. Chiorean, Calcul Paralel, Editura Microinformatica, 1995.
[40] I. Chiorean, On the Convergence of the Numerical Solution for a Problem of
Convection in Porous Medium, Research seminars, Seminar on Numerical
and Statistical Calculus, Preprint nr. 1, 1996, pp. 23-27.
[41] I. Chiorean, Parallel algorithm for solving a problem of convection in porous
medium, Advances in Engineering Software, Elsevier Ltd., 28 (1997), no. 8,
Great Britain, pp. 463-467.
[42] I. Chiorean, Free convection in an inclined square enclosure filled with a
heat-generating porous-medium, Studia Universitatis ”Babes-Bolyai” Math-
ematica, Cluj-Napoca, no. 4, 1997, pp. 35-43.
[43] I. Chiorean, Convection between the Number of Multigrid Iterations and the
Discretized Error in Solving a Problem of Convection in Porous Medium,
Studia Universitatis ”Babes-Bolyai” Mathematica, Cluj-Napoca, no. 1,
1997, pp. 59-65.
[44] I. Chiorean, Parallel Numerical methods for solving nonlinear equations,
Studia Universitatis ”Babes-Bolyai”, Cluj-Napoca, Mathematica, 46 (2001),
no. 4, pp. 53-61.
[45] I. Chiorean, On the complexity of some parallel relaxation schemes, Bul.
Stiintific Univ. Baia Mare, Ser. B, Matematica-Informatica, 43 (2002), no.
2, pp. 171-176.
[46] I. Chiorean, On some 2D Parallel Relaxation Schemes, Proceedings of
ROGER, Sibiu, 12-14 iunie 2002, Mathematical Analysis and Approxima-
tion Theory, Burg Verlag, 2002, pp. 77-85.
[47] I. Chiorean, Contributii la studiul unei probleme de convectie in mediu poros,
Presa Universitara Clujeana, 2002.
[48] I. Chiorean, Parallel numerical methods for PDE, Kragujevac J. Math. 25
(2003), pp. 5-18.
[49] I. Chiorean, On some Numerical Methods for Solving the Black-Scholes For-
mula, Creative Mathematics Journal, 13 (2004), Pub. by Dep. of Math. and
Comp. Science, North Univ. Baia-Mare, pp. 31-36 (conf. ICAM4, Suior,
Baia-Mare).
[50] I. Chiorean, Remarks on the restriction and prolongation operators from
the multigrid method, Mathematical Analysis and Approximation Theory,
Mediamira Science Publisher, 2005, pp. 43-49.
[51] I. Chiorean, Parallel Algorithm for Solving the Black-Scholes Equation,
Kragujevac J. Math, 27 (2005), pp. 39-48.
[52] I. Chiorean, Parallel LU Method for solving the Black-Scholes Equation,
Proceedings of NAAT, Cluj, 2006, pp. 155-161.
[53] P.L. Clark, The Stone-Weierstrass Approximation Theory,
www.math.uga.edu.
[54] Gh. Coman, Two-dimensional monosplines and optimal cubature formulae,
Studia Univ. ”Babes-Bolyai”, Ser. Math.-Mech. 1973, no. 1, pp. 41–53.
[55] Gh. Coman, Multivariate approximation schemes and the approximation of
linear functionals, Mathematica, 16 (39) (1974), pp. 229–249.
[56] Gh. Coman, The approximation of multivariate functions, Technical Sum-
mary Report 1254, Mathematics Research Center, University of Wisconsin,
Madison, Wisconsin, 1974.
[57] Gh. Coman,The complexity of the quadrature formulas, Mathematica (Cluj),
23 (46) (1981), pp. 183–192.
[58] Gh. Coman, Homogeneous multivariate approximation formulas with appli-
cations in numerical integration, Research Seminaries, ”Babes-Bolyai” Uni-
versity, Faculty of Mathematics, Preprint No. 4, 1985, pp. 46–63.
[59] Gh. Coman, The remainder of certain Shepard type interpolation formulas,
Studia Univ. ”Babes-Bolyai”, Mathematica, 32 (1987), no. 4, pp. 24–32.
[60] Gh. Coman, Shepard-Taylor interpolation, Itinerant Seminar on Functional
Equations, Approximation and Convexity, Cluj-Napoca, 1988, pp. 5–14.
[61] Gh. Coman, Homogeneous cubature formulas, Studia Univ. ”Babes-Bolyai”,
Mathematica, 38 (1993), no. 2, pp. 91–101.
[62] Gh. Coman, Analiza Numerica, Ed. Libris, Cluj-Napoca, 1995.
[63] Gh. Coman, Shepard operators of Hermite-type, Rev. Anal. Numer. Theor.
Approx., 26 (1997), no. 1–2, pp. 33–38.
[64] Gh. Coman, Shepard operators of Birkhoff type, Calcolo, 35 (1998), pp.
197–203.
[65] Gh. Coman, Numerical multivariate approximation operator, ”Tiberiu
Popoviciu” Itinerant Seminar of Functional Equations, Approximation and
Convexity, Cluj-Napoca, 2001, May 22–26.
[66] Gh. Coman, M. Birou, Bivariate spline-polynomial interpolation, Studia
Mathematica ”Babes-Bolyai”, 48 (2003), no. 4, pp. 17–25.
[67] Gh. Coman, K. Bohmer, Blending interpolation schemes on triangle with
error bounds, Lecture Notes in Mathematics, Springer-Verlag, Berlin, 571
(1977), pp. 14–37.
[68] Gh. Coman, T. Catinas, M. Birou, A. Oprisan, C. Osan, I. Pop, I. Somogyi,
I. Todea, Interpolation operators, Ed. ”Casa Cartii de Stiinta”, Cluj-Napoca,
2004.
[69] Gh. Coman, I. Gansca, Blending approximation with applications in con-
structions, Buletinul Stiintific al Institutului Politehnic Cluj-Napoca, 24
(1981), pp. 35–40.
[70] Gh. Coman, I. Gansca, Some practical application of blending approximation
II, Itinerant Seminar on Functional Equations, Approximation and Convexi-
ty, Cluj-Napoca, 1986.
[71] Gh. Coman, I. Gansca, L. Tambulea, Some practical application of blending
interpolation, Proceedings of the Colloquium on Approximation and Opti-
mization, Cluj-Napoca, October 25–27, 1984.
[72] Gh. Coman, I. Gansca, L. Tambulea, Some new roof-surfaces generated by
blending interpolation technique, Studia Univ. ”Babes-Bolyai”, Mathema-
tica, 36 (1991), no. 1, pp. 119–130.
[73] Gh. Coman, I. Gansca, L. Tambulea, Multivariate approximation, Seminar
on Numerical and Statistical Calculus, Preprint no. 1, 1996, pp. 29–60.
[74] Gh. Coman, D. Johnson, Complexitatea algoritmilor, Lito. Univ. ”Babes-
Bolyai”, Cluj-Napoca, 1987.
[75] Gh. Coman, I. Pop, Some interpolation schemes on triangle, Studia Univ.
”Babes-Bolyai”, Mathematica, 48 (2003), 3, pp. 57–62.
[76] Gh. Coman, I. Pop, R. Trımbitas, An adaptive cubature on triangle, Studia
Univ. ”Babes-Bolyai”, Mathematica, 47 (2002), 4, pp. 27–35.
[77] Gh. Coman, I. Purdea Pop, On the remainder term in multivariate approx-
imation, Studia Univ. ”Babes-Bolyai”, Mathematica, 43 (1998), no. 1, pp.
7–14.
[78] Gh. Coman, I. Somogyi, Homogeneous formulas for approximation of mul-
tiple integrales, Studia Univ. ”Babes-Bolyai”, Mathematica, 45 (2000), pp.
11–18.
[79] Gh. Coman, R. Trımbitas, Lagrange-type Shepard operators, Studia Univ.
”Babes-Bolyai”, Mathematica, 42 (1997), pp. 75–83.
[80] Gh. Coman, R. Trımbitas, Combined Shepard univariate operators, East
Jurnal on Approximations, 7 (2001), 4, pp. 471–483.
[81] Gh. Coman, R. Trımbitas, Univariate Shepard-Birkhoff interpolation, Rev.
Anal. Numer. Theor. Approx., 30 (2001), no. 1, pp. 15–24.
[82] Gh. Coman, L. Tambulea, A Shepard-Taylor approximation formula, Studia
Univ. ”Babes-Bolyai”, Mathematica, 33 (1988), no. 3, pp. 65–73.
[83] Gh. Coman, L. Tambulea, On some interpolation of scattered data, Studia
Univ. ”Babes-Bolyai”, 35 (1990), no. 2, pp. 90–98.
[84] Gh. Coman, L. Tambulea, Bivariate Birkhoff interpolation of scattered data,
Studia Univ. ”Babes-Bolyai”, 36 (1991), no. 2, pp. 77–86.
[85] Gh. Coman, L. Tambulea, I. Gansca, Multivariate approximation, Seminar
on Numerical and Statistical Calculus, Preprint no. 1, 1996, pp. 29–60.
[86] Ph. J. Davis, Interpolation and Approximation, Blaisdell Publishing Com-
pany, New York, 1963.
[87] B. Della Vecchia, G. Mastroianni, P. Vertesi,Direct and converse theorems
for Shepard rational approximation, Numer. Funct. Anal. Optim., 17 (1996),
nos. 5–6, pp. 537–561.
[88] F.J. Delvos, D-variate boolean interpolation, J. Approx. Theory, 34 (1982),
pp. 99–44.
[89] F.J. Delvos, Boolean methods for double integration, Math. Comp., 55
(1990), pp. 683–692.
[90] F.J. Delvos, W. Schempp, Boolean methods in interpolation and approxi-
mation, Longman Scientific & Technical, England, copublished with John
Wiley & Sons, New York, 1989.
[91] J.W. Demmel, Solving the Discrete Poisson Equation using Multigrid,
CS267, Notes for Lectures, www.cs.berkeley.edu.
[92] C.C. Douglas, A Review of Numerous Parallel Multigrid Methods, SIAM
News, V. 25, No. 3, 2000.
[93] A.V. Efimov, Modulus of continuity, Enciclopedia of Mathematics, 2001.
[94] R. Farwig, Rate of convergence of Shepard’s global interpolation formula,
Math. Comp., 46 (1986), pp. 577–590.
[95] R. Franke, Scattered data interpolation: tests of some methods, Math.
Comp., 38 (1982), pp. 181–200.
[96] R. Franke, G.M. Nielson, Smooth interpolation of large sets of scattered data,
International Journal for Numerical Methods in Engineering, 15 (1980), pp.
1691–1704.
[97] P.O. Frederickson, O.A. McBryan, Parallel superconvergent multigrid, Lec-
ture Notes in Pure and App. Math., 1988.
[98] M. Gasca, T. Sauer, Polynomial interpolation in several variables. Multi-
variate polynomial interpolation, Adv. Comput. Math., 12 (2000), no. 4,
pp. 377–410.
[99] M. Gasca, T. Sauer, On the history of multivariate polynomial interpolation,
J. Comput. Appl. Math., 122 (2000), pp. 23–35.
[100] W.B. Gearhart, M. Qian, Peano kernels and the Euler-MacLaurin formula,
Far East J. Math. Sci., 3 (2001), no. 2, pp. 247–271.
[101] W.M. Gentelman, Least squares computations by Givens transformations
without square roots, Journal of the IMA, 12, 1973.
[102] A.C. Gilbert, The SOR Method, www.maths.lancs.ac.uk.
[103] W.J. Gordon, Distributive lattices and approximation of multivariate func-
tions, Proc. Symp. Approximation with Special Emphasis on Spline Func-
tions (Madison, Wisc.), (Ed. I.J. Schoenberg), 1969, pp. 223–277.
[104] W.J. Gordon, Blending-function methods of bivariate and multivariate inter-
polation and approximation, SIAM J. Numer. Anal., 8 (1971), pp. 158–177.
[105] W. Hackbusch, A new approach to robust multigrid solvers, Proceeding of
ICIAM’87.
[106] P.P. Korovkin, On convergence of linear positive operators in the space of
continuous functions, Dokl. Akad. Nauk SSSR 90, 1960, pp. 961-964.
[107] D.V. Ionescu, Numerical Quadratures, Bucharest, 1957.
[108] D.V. Ionescu, Sur une classe de formules de cubature, C.R. Acad. Sc. Paris,
266 (1968), pp. 1155–1158.
[109] D.V. Ionescu, Extension de la formule de quadrature de Gauss a une classe
de formules de cubature, C.R. Acad. Sc. Paris, 269 (1969), pp. 655–657.
[110] D.V. Ionescu, L’extension d’une formule de cubature, Acad. Royale de Bel-
gique, Bull. de la Classe des Science, 56 (1970), pp. 661–690.
[111] W. Light, W. Cheney, Approximation Theory in Tensor Product Spaces, Lec-
ture Notes in Mathematics 1169, (Ed. A. Dold and B. Eckmann), Springer-
Verlag, Berlin, 1985.
[112] L.E. Mansfield, On the optimal approximation of linear functionals in spaces
of bivariate functions, SIAM J. Numer. Anal., 8 (1971), pp. 115–126.
[113] C.A. Micchelli, Interpolation of scattered data: distance matrices and con-
ditionally positive definite functions, Constr. Approx., 2 (1986), no. 1, pp.
11–22.
[114] J.J. Modi, Parallel Algorithm and Matrix Computation, Clarendon Press,
Oxford, 1988.
[115] G.M. Nielson, R. Franke, A method for construction of surfaces under ten-
sion, Rocky Mtn. J. Math., 14 (1984), no. 1, pp. 203–221.
[116] H.H. Olsen, The Weierstrass density theorem, TMA4230, Functional Anal-
ysis, 2005.
[117] A. Oprisan, About convergence order of the iterative methods generated by
inverse interpolation, Seminar on Numerical and Statistical Calculus, 2004,
pp. 97–109.
[118] A. Pinkus, Density Methods and Results in Approximation Theory, Orliez
Centenary volume, Banck Center Publication, vol.64, Polish Acad. of Sci-
ence, Warszawa, 2004.
[119] T. Popoviciu, On the proof of Weierstrass theorem using interpolation poly-
nomials, Lucr. Ses. Gen. Stiintific, 2:12 (1950), pp. 1664-1667.
[120] H.J. Quinn, Designing Efficient Algorithms for Parallel Computers, McGraw-
Hill Int., 1988.
[121] A.H. Sameh, R.P. Brent, Solving triangular systems on a parallel computer,
SIAM Journal on Numerical Analysis, 14 (6), 1977.
[122] A. Sard, Linear Approximation, AMS, Providence, RI, 1963.
[123] I.J. Schoenberg, On monosplines of least deviation and best quadrature for-
mulae, J. SIAM Numer. Anal., Ser. B, 2 (1965) no. 1, pp. 145–170.
[124] I.J. Schoenberg, On monosplines of least deviation and best quadrature for-
mulae II, J. SIAM Numer. Anal., Ser. B, 3 (1966), no. 2, pp. 321–328.
[125] L.L. Schumaker, Fitting surfaces to scattered data, Approximation Theory
II, (Eds. G. G. Lorentz, C. K. Chui, L. L. Schumaker), Academic Press,
1976, pp. 203–268.
[126] L.L. Schumaker, Spline functions: basic theory, Pure and Applied Mathe-
matics. John Wiley & Sons, Inc., New York, 1981.
[127] B. Sendov, A. Andreev, Approximation and Interpolation Theory, Hand-
book of Numerical Analysis, vol. III, ed. P.G. Ciarlet and J.L. Lions, North
Holland, Amsterdam, 1994.
[128] W. Shayne, Refinements of the Peano kernel theorem, Numer. Funct. Anal.
Optimiz., 20 (1999) nos. 1&2, pp. 147–161.
[129] D. Shepard, A two-dimensional interpolation function for irregularly spaced
points, Proc. 23rd Nat. Conf. ACM (1968), pp. 517–523.
[130] S.A. Smolyak, Quadrature and interpolation formulas for tensor products of
certain classes of functions, Soviet Math. Dokl., 4, 1963, pp. 240–243.
[131] I. Somogyi, About some cubature formulas, Proceedings of the Interna-
tional Symposium on Numerical Analysis and Approximation Theory, Cluj-
Napoca, May 9–11, 2002, pp. 430–436.
[132] I. Somogyi, Almost optimal numerical methods, Studia Univ. ”Babes-
Bolyai”, Mathematica, 1999, no. 1, pp. 85–93.
[133] D.D. Stancu, A study of the polynomial interpolation of functions of several
variables, with applications to the numerical differentiation and integration;
methods for evaluating the remainders, Doctoral Dissertation, 1956, Univer-
sity of Cluj.
[134] D.D. Stancu, The generalization of certain interpolation formulae for the
functions of many variables, Buletinul Institutului Politehnic din Iasi, nos.
1–2, 1957, pp. 31–38.
[135] D.D. Stancu, On the Hermite interpolation formula and on some of its ap-
plications, Acad. R. P. Romıne. Fil. Cluj. Stud. Cerc. Mat., 8 (1957), pp.
339–355.
[136] D.D. Stancu, Generalization of some interpolation formulas for multiva-
riate functions and some contributions on the Gauss formula for numerical
integration, Buletin St. Acad. Romane, 9, 1957, pp. 287–313.
[137] D.D. Stancu, On some Taylor expansions for functions of several variables,
Rev. Roumaine Math. Pures Appl., 4 (1959), pp. 249–265 (in Russian).
[138] D.D. Stancu, The expression of the remainder in some numerical partial
differentiation formulas, Acad. R. P. Romıne Fil. Cluj Stud. Cerc. Mat., 11
(1960), pp. 371–380.
[139] D.D. Stancu, On the integral representation of the remainder in Taylor’s
formula in two variables, Stud. Cerc. Mat. (Cluj), 13 (1962), no. 1, pp.
175–182.
[140] D.D. Stancu, The remainder of certain linear approximation formulas in
two variables, SIAM J. Numer. Anal. Ser. B, 1 (1964), pp. 137–163.
[141] D.D. Stancu, On Hermite’s osculatory interpolation formula and on some
generalizations of it, Mathematica (Cluj), 8 (31) (1966), pp. 373–391.
[142] D.D. Stancu, A generalization of the Schoenberg approximating spline ope-
rator, Studia Univ. ”Babes-Bolyai”, Math., 26 (1981), no. 2, pp. 37–42.
[143] D.D. Stancu, Gh. Coman, O. Agratini, R. Trımbitas, Analiza Numerica si
Teoria Aproximarii, vol. I, Presa Universitara Clujeana, 2001.
[144] D.D. Stancu, Gh. Coman, P. Blaga, Analiza Numerica si Teoria
Aproximarii, vol. II, Presa Universitara Clujeana, 2002.
[145] A.H. Stroud, Optimal quadrature formulas, Approximation in Theory and
Praxis, Bibliographices Institut, Mannheim, 1979.
[146] A.H. Stroud, Approximate calculation of multiple integrals, Prentice-Hall,
Inc., Englewood Cliffs, New Jersey, 1971.
[147] J.R. Shewchuk, An Introduction to the Conjugate Gradient Method with-
out Agonizing Pain, School of Computer Science, Carnegie Mellon Univ.,
Pittsburg, 1994.
[148] O. Shisha, B. Mond, The degree of convergence of sequences of linear positive
operators, Aerospace Research Lab., Ohio, 1968, pp.11-96.
[149] J. Szabados, Direct and converse approximation theorems for the Shepard
operator, Approx. Theory Appl., 7 (1991), no. 3, pp. 63–76.
[150] S. Ta’asan, Multigrid Methods for Highly Oscillatory Problems, PhD thesis,
Weizmann Inst. of Science, Rehovot, Israel, 1984.
[151] R. Toomas, Transformation and LU-Factorization, www.cs.ut.ee.
[152] J.F. Traub, Iterative Methods for the Solution of Equations, Prentice-Hall,
Inc, Englewood Cliffs, 1964.
[153] R.T. Trımbitas, Univariate Shepard-Lagrange interpolation, Kragujevac J.
Math., 24 (2002), pp. 85–94.
[154] R. Trımbitas, Numerical Analysis, Presa Universitara Clujeana, 2006.
[155] P. Wong, Iterative solvers for System of Linear Equations, www.jics.utk.edu,
2002.
[156] ***www.Wikipedia.org, Weierstrass Approximation Theorem,
www.literka.addr.com.