
AMS SHORT COURSE LECTURE NOTES Introductory Survey Lectures

A Subseries of Proceedings of Symposia in Applied Mathematics

Volume 36 APPROXIMATION THEORY Edited by Carl de Boor (New Orleans, Louisiana, January 1986)

Volume 35 ACTUARIAL MATHEMATICS Edited by Harry H. Panjer (Laramie, Wyoming, August 1985)

Volume 34 MATHEMATICS OF INFORMATION PROCESSING Edited by Michael Anshel and William Gewirtz (Louisville, Kentucky, January 1984)

Volume 33 FAIR ALLOCATION Edited by H. Peyton Young (Anaheim, California, January 1985)

Volume 32 ENVIRONMENTAL AND NATURAL RESOURCE MATHEMATICS Edited by R. W. McKelvey (Eugene, Oregon, August 1984)

Volume 31 COMPUTER COMMUNICATIONS Edited by B. Gopinath (Denver, Colorado, January 1983)

Volume 30 POPULATION BIOLOGY Edited by Simon A. Levin (Albany, New York, August 1983)

Volume 29 APPLIED CRYPTOLOGY, CRYPTOGRAPHIC PROTOCOLS, AND COMPUTER SECURITY MODELS By R. A. DeMillo, G. I. Davida, D. P. Dobkin, M. A. Harrison, and R. J. Lipton

(San Francisco, California, January 1981)

Volume 28 STATISTICAL DATA ANALYSIS Edited by R. Gnanadesikan (Toronto, Ontario, August 1982)

Volume 27 COMPUTED TOMOGRAPHY Edited by L. A. Shepp (Cincinnati, Ohio, January 1982)

Volume 26 THE MATHEMATICS OF NETWORKS Edited by S. A. Burr (Pittsburgh, Pennsylvania, August 1981)

Volume 25 OPERATIONS RESEARCH: MATHEMATICS AND MODELS Edited by S. I. Gass (Duluth, Minnesota, August 1979)

Volume 24 GAME THEORY AND ITS APPLICATIONS Edited by W. F. Lucas (Biloxi, Mississippi, January 1979)

Volume 23 MODERN STATISTICS: METHODS AND APPLICATIONS Edited by R. V. Hogg (San Antonio, Texas, January 1980)

Volume 22 NUMERICAL ANALYSIS Edited by G. H. Golub and J. Oliger (Atlanta, Georgia, January 1978)

Volume 21 MATHEMATICAL ASPECTS OF PRODUCTION AND DISTRIBUTION OF ENERGY Edited by P. D. Lax (San Antonio, Texas, January 1976)


PROCEEDINGS OF SYMPOSIA IN APPLIED MATHEMATICS

Volume 20 THE INFLUENCE OF COMPUTING ON MATHEMATICAL RESEARCH AND EDUCATION Edited by J. P. LaSalle (University of Montana, August 1973)

Volume 19 MATHEMATICAL ASPECTS OF COMPUTER SCIENCE Edited by J. T. Schwartz (New York City, April 1966)

Volume 18 MAGNETO-FLUID AND PLASMA DYNAMICS Edited by H. Grad (New York City, April 1965)

Volume 17 APPLICATIONS OF NONLINEAR PARTIAL DIFFERENTIAL EQUATIONS IN MATHEMATICAL PHYSICS Edited by R. Finn (New York City, April 1964)

Volume 16 STOCHASTIC PROCESSES IN MATHEMATICAL PHYSICS AND ENGINEERING Edited by R. Bellman (New York City, April 1963)

Volume 15 EXPERIMENTAL ARITHMETIC, HIGH SPEED COMPUTING, AND MATHEMATICS Edited by N. C. Metropolis, A. H. Taub, J. Todd, and C. B. Tompkins (Atlantic City and Chicago,

April 1962)

Volume 14 MATHEMATICAL PROBLEMS IN THE BIOLOGICAL SCIENCES Edited by R. Bellman (New York City, April 1961)

Volume 13 HYDRODYNAMIC INSTABILITY Edited by R. Bellman, G. Birkhoff and C. C. Lin (New York City, April 1960)

Volume 12 STRUCTURE OF LANGUAGE AND ITS MATHEMATICAL ASPECTS Edited by R. Jakobson (New York City, April 1960)

Volume 11 NUCLEAR REACTOR THEORY Edited by G. Birkhoff and E. P. Wigner (New York City, April 1959)

Volume 10 COMBINATORIAL ANALYSIS Edited by R. Bellman and M. Hall, Jr. (New York University, April 1957)

Volume 9 ORBIT THEORY Edited by G. Birkhoff and R. E. Langer (Columbia University, April 1958)

Volume 8 CALCULUS OF VARIATIONS AND ITS APPLICATIONS Edited by L. M. Graves (University of Chicago, April 1956)

Volume 7 APPLIED PROBABILITY Edited by L. A. MacColl (Polytechnic Institute of Brooklyn, April 1955)

Volume 6 NUMERICAL ANALYSIS Edited by J. H. Curtiss (Santa Monica City College, August 1953)

Volume 5 WAVE MOTION AND VIBRATION THEORY Edited by A. E. Heins (Carnegie Institute of Technology, June 1952)

Volume 4 FLUID DYNAMICS Edited by M. H. Martin (University of Maryland, June 1951)

Volume 3 ELASTICITY Edited by R. V. Churchill (University of Michigan, June 1949)

Volume 2 ELECTROMAGNETIC THEORY Edited by A. H. Taub (Massachusetts Institute of Technology, July 1948)

Volume 1 NON-LINEAR PROBLEMS IN MECHANICS OF CONTINUA Edited by E. Reissner (Brown University, August 1947)

AMS SHORT COURSE LECTURE NOTES Introductory Survey Lectures

published as a subseries of Proceedings of Symposia in Applied Mathematics


CONTRIBUTORS

E. W. CHENEY, Department of Mathematics, University of Texas at Austin, Austin, Texas

RONALD A. DEVORE, Department of Mathematics and Statistics, University of South Carolina, Columbia, South Carolina

KLAUS HÖLLIG, Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin

CHARLES A. MICCHELLI, IBM T. J. Watson Research Center, Yorktown Heights, New York

A. PINKUS, Department of Mathematics, Technion, Haifa, Israel

E. B. SAFF, Institute for Constructive Mathematics, Department of Mathematics, University of South Florida, Tampa, Florida


PROCEEDINGS OF SYMPOSIA IN APPLIED MATHEMATICS

Volume 36

Approximation Theory
Carl de Boor, Editor

American Mathematical Society Providence, Rhode Island

LECTURE NOTES PREPARED FOR THE AMERICAN MATHEMATICAL SOCIETY SHORT COURSE

APPROXIMATION THEORY HELD IN NEW ORLEANS, LOUISIANA

JANUARY 5-6, 1986

The AMS Short Course Series is sponsored by the Society's Committee on Employment and Education Policy (CEEP). The series is under the direction of the Short Course Advisory Subcommittee of CEEP.

Library of Congress Cataloging-in-Publication Data Approximation theory.

(Proceedings of symposia in applied mathematics, ISSN 0160-7634; v. 36) (Proceedings of symposia in applied mathematics; v. 36. AMS short course lecture notes)

Includes bibliographies and index. 1. Approximation theory—Congresses. I. De Boor, Carl, 1937— . II. American Mathematical Society. III. Series. IV. Series: Proceedings of symposia in applied mathematics. AMS short course lecture notes. QA221.A653 1986 511'.4 86-10846 ISBN 0-8218-0098-1 (pbk.: alk. paper)

COPYING AND REPRINTING. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy an article for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given.

Republication, systematic copying, or multiple reproduction of any material in this publication (including abstracts) is permitted only under license from the American Mathematical Society. Requests for such permission should be addressed to the Executive Director, American Mathematical Society, P.O. Box 6248, Providence, Rhode Island 02940.

The appearance of the code on the first page of an article in this book indicates the copyright owner's consent for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law, provided that the fee of $1.00 plus $.25 per page for each copy be paid directly to the Copyright Clearance Center, Inc., 21 Congress Street, Salem, Massachusetts 01970. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale.

1980 Mathematics Subject Classification (1985 Revision). Primary 41-01; Secondary 41-02, 30E10, 65Dxx.

Copyright ©1986 by the American Mathematical Society. All rights reserved. Printed in the United States of America.

This volume was printed directly from copy prepared by the authors. The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability.

CONTENTS

Preface

Approximation of Functions RONALD A. DEVORE

Polynomial and Rational Approximation in the Complex Domain E. B. SAFF

N-Widths and Optimal Recovery A. PINKUS

Algorithms for Approximation E. W. CHENEY

Algebraic Aspects of Interpolation CHARLES A. MICCHELLI

Multivariate Splines KLAUS HÖLLIG

Index



PREFACE

This book is the result of the 1986 American Mathematical Society Short Course entitled Approximation Theory, given at the annual meeting in New Orleans on January 5-6, 1986.

Approximation Theory is properly a subfield of Analysis, but derives much of its impetus from applications such as data fitting, the representation of curves and surfaces for design and display, the reconstruction of functions from partial information, the numerical solution of functional equations and the like. For this reason, Approximation Theory offers ready-made applications of the basic ideas of Analysis.

The first lecture describes and illustrates the basic concerns of Approximation Theory. The other lectures are intended to provide a quick introduction to some of the areas of current research interest. Topics highlighted are: approximation in the complex domain, n-widths, optimal recovery, interpolation, algorithms for approximation, and splines, with strong emphasis on a multivariate setting in the last three topics.

I thank the authors very much for the considerable and selfless effort they have put into the preparation of the lectures and these notes.

Carl de Boor Madison, Wisconsin March, 1986



Proceedings of Symposia in Applied Mathematics Volume 36, 1986

APPROXIMATION OF FUNCTIONS

Ronald A. DeVore

Approximation Theory began at the end of the last century with the study of the approximation of functions by polynomials and rational functions. It is a broad subject which interacts with various aspects of real, complex and functional analysis. Some of its recent popularity comes from its importance in the development of numerical algorithms and the solution of problems of optimization.

One hundred years ago, Weierstrass proved his famous theorem on the approximation of continuous functions by algebraic polynomials. Undoubtedly, every one of you has seen this theorem. But, in order to guarantee that we are all starting at the same point, let's begin with a formulation of this theorem which uses the notation of Approximation Theory.

We want to approximate functions f which are continuous on an interval I := [a,b]. We let C(I) denote the set of all such functions and let

‖f‖ := sup_{x∈I} |f(x)|

be its norm. We are interested in approximating f by algebraic polynomials P(x) = a_0 + a_1 x + ⋯ + a_n x^n of degree at most n. If Π_n denotes the set of all such polynomials, we let E_n(f) be the error of approximation to f from Π_n:

(0.1) E_n(f) := inf_{P∈Π_n} ‖f − P‖.

With this, we have

THEOREM 0.1 (Weierstrass [W]). If f ∈ C(I), then E_n(f) → 0 as n → ∞.

In other words, each continuous function can be approximated arbitrarily well in the uniform norm by polynomials. There are many wonderful proofs of Weierstrass' theorem. We shall give one of these a little later (§7).

Important theorems often open more doors than they close. This is certainly the case with the Weierstrass theorem. Now that we know polynomial approximation is possible, we are confronted with questions like:

Are there polynomials P* ∈ Π_n which attain the infimum in (0.1)?

If such a P* exists, is it unique?

How can we calculate P*?

Can we say anything about how fast E_n(f) tends to zero?

These fundamental questions about polynomial approximation were studied around the turn of the century, and their solution forms the foundation of Approximation Theory. It is appropriate therefore in a course on approximation, whether it be short or long, that we begin with a look at the solution to these questions.

§1. Best Approximation. The polynomials P* are called polynomials of best approximation to f of degree n. Let us begin with the questions of existence and uniqueness of the P*. These can be discussed in the following more general setting. We have a normed linear space X, ‖·‖, and one of its finite dimensional subspaces Y. We are interested in approximating x ∈ X by the elements of Y. For this, we introduce the distance dist(x,Y) of x to Y:

(1.1) dist(x,Y) := inf_{y∈Y} ‖x − y‖.

If y* ∈ Y assumes the infimum in (1.1), then we say that y* is a best approximation to x from Y, and we let B(x) denote the collection of all such best approximants. A simple but useful remark is that B(αx) = αB(x) for any scalar α. It is rather easy to see that best approximants always exist.

THEOREM 1.1. For each x ∈ X, B(x) ≠ ∅.

Proof. The infimum in (1.1) can be restricted to those y ∈ Y with ‖y‖ ≤ 2‖x‖. Indeed, any other y gives ‖x − y‖ ≥ ‖y‖ − ‖x‖ > ‖x‖, so that 0 is a better approximation than y. Now the closed ball ‖y‖ ≤ 2‖x‖ is a compact set (because Y is finite dimensional) [L, p.16], hence the continuous function ‖x − y‖ attains its infimum on this set. In this way, we see that best approximants exist.

A more subtle question is to decide when best approximation is unique, that is, when does B(x) have just one element? This depends very much on the geometry in the space X, namely, on the shape of the unit ball U := {x : ‖x‖ ≤ 1}. This ball is always convex: if x, x′ ∈ U, then the line segment [x,x′] := {z ∈ X : z = αx + (1−α)x′, 0 ≤ α ≤ 1} is contained in U. We say U is strictly convex if

(1.2) ‖αx + (1−α)x′‖ < 1, for all x, x′ ∈ U, x ≠ x′, and all 0 < α < 1.

Strict convexity means that the interior of the line segment [x,x′] is contained in the interior of U, for all x, x′ ∈ U.

THEOREM 1.2. If the unit ball U of X is strictly convex, then best approximation from a finite dimensional space Y is unique, that is, B(x) is a singleton for all x ∈ X.

Proof. Suppose y, y′ ∈ B(x):

‖x − y‖ = ‖x − y′‖ = dist(x,Y).

If dist(x,Y) = 0, then x = y = y′ as desired. Otherwise, we can rescale (that is, multiply x, y, y′ by the same constant) so that dist(x,Y) = 1. Then ½(y + y′) is in Y. If y ≠ y′, then from (1.2),

‖x − ½(y + y′)‖ < dist(x,Y),

which is an obvious contradiction. Hence y = y′ and best approximation is unique.

The most important normed spaces X with strictly convex unit balls are the L_p(I) spaces. This space consists of all Lebesgue integrable functions on I for which the following norm is finite:

(1.3) ‖f‖_p(I) := { ∫_I |f(x)|^p dx }^{1/p}, 1 ≤ p < ∞.

When p = ∞, the right side of (1.3) is replaced by the essential supremum of |f|. The spaces L_p are strictly convex when 1 < p < ∞. This is proved by examining when equality can hold in the triangle inequality for the L_p norm.

From the strict convexity of the L_p spaces, 1 < p < ∞, it follows that any function f ∈ L_p(I) has a unique best approximation P* from Π_n.

Analogous to the L_p are the ℓ_p^m spaces, which consist of all sequences x = (x_i)_{i=1}^m for which the norm

‖x‖_p := ( Σ_{i=1}^m |x_i|^p )^{1/p}, 1 ≤ p < ∞; ‖x‖_∞ := max_{1≤i≤m} |x_i|,

is finite. These spaces also have strictly convex unit balls if 1 < p < ∞, but not when p = 1 or ∞. This is easily seen when m = 2, where the boundaries of the unit balls are depicted in Figure 1 for the values p = 1, 2, ∞.

[Figure 1. Unit balls of ℓ_p^2 for p = 1, 2, ∞.]

Unfortunately, the space C(I), of most immediate interest to us, does not have a strictly convex unit ball. For example, the functions φ(x) = x and ψ(x) = x² each have norm one in C[0,1], but ½(φ + ψ) also has norm one. Even more to the point, it is easy to construct finite dimensional subspaces Y of C[0,1] from which best approximation is not unique. Consider, for example, the space Y := span{φ, ψ}. Any non-negative function in Y of norm at most two is a best approximation to f(x) := 1.

Nevertheless, the situation is not as bad as it seems. It was shown by P.L. Chebyshev that each f ∈ C(I) has a unique best approximation from Π_n. The special properties of Π_n that make this true are the next subject for discussion.

2. Chebyshev's Theorem. This theorem gives that the best approximation P* from Π_n to a continuous function f is unique. To prove this theorem, P.L. Chebyshev analyzed the behavior of the error function E(x) := f(x) − P*(x). He showed that there are many points where E alternately takes on the values ±‖E‖.

THEOREM 2.1 (Chebyshev). If f ∈ C(I) and P* is its best approximation from Π_n, then there are points x_0 < ⋯ < x_{n+1} and a value λ = ±1 such that

(2.1) E(x_i) = (−1)^i λ ‖E‖, i = 0, ..., n+1.

Hence, this theorem shows that there are at least n+2 points where the error E alternately takes on its maximum (‖E‖) and its minimum (−‖E‖).

Proof of Theorem 2.1. Let x_0 be the first point x (from the left) on I where |E(x)| = ‖E‖; such a point exists because E is continuous. Then, let x_1 be the first point x > x_0 where E(x) = −E(x_0). Continuing in this way, we create a sequence of points x_0 < ⋯ < x_m where E alternately takes on the values ±‖E‖. We claim that m > n, as desired. Indeed, if m ≤ n, then because of the continuity of E, we can find points ξ_1, ..., ξ_m with x_0 < ξ_1 < x_1 < ⋯ < ξ_m < x_m such that [ξ_i, ξ_{i+1}] contains no points x with E(x) = −E(x_i). Now, for a proper choice of γ, the polynomial P(x) := γ(x − ξ_1)⋯(x − ξ_m) is of degree ≤ n and agrees in sign with E at each of the points x_0, ..., x_m. If E(x_i) > 0, then P(x) > 0 on (ξ_i, ξ_{i+1}) and also E(x) > −E(x_i) on [ξ_i, ξ_{i+1}]. Hence, for η > 0 sufficiently small, |E(x) − ηP(x)| < ‖E‖ for x ∈ [ξ_i, ξ_{i+1}]. The same is true when E(x_i) < 0, and also on the end intervals. Since there are only a finite number of intervals, we can choose one η > 0 and obtain ‖E − ηP‖ < ‖E‖. But then this means that P* + ηP is a better approximation to f than P*. This contradiction means that m > n and proves Chebyshev's theorem.

From the Chebyshev alternation theorem, it is easy to prove the uniqueness of best polynomial approximants.

THEOREM 2.2. If f is in C(I), then f has a unique best approximation P* ∈ Π_n.

Proof. If P_1 and P_2 are two best approximants to f from Π_n, then so is P := ½(P_1 + P_2). Let x_0, ..., x_{n+1} be alternation points for f − P. Then,

(2.2) ½(f(x_i) − P_1(x_i)) + ½(f(x_i) − P_2(x_i)) = f(x_i) − P(x_i) = ±‖E‖.

Since ‖f − P_1‖ ≤ ‖E‖, and likewise for f − P_2, the only way that (2.2) can hold is if f(x_i) − P_1(x_i) = f(x_i) − P_2(x_i). That is, P_1(x_i) = P_2(x_i), i = 0, ..., n+1. This means that the polynomial P_1 − P_2, which is of degree ≤ n, has n+2 zeros and hence must be the zero polynomial. Therefore, P_1 = P_2.

Actually, the proof of Theorem 2.2 shows more. Namely, we have the following Chebyshev characterization of when a polynomial P is the best approximation to f.

THEOREM 2.3. If f ∈ C(I) and P ∈ Π_n are such that f − P alternately takes on the values ±M at least n+2 times, with M := ‖f − P‖, then P = P* is the best approximation to f and M = E_n(f).

Proof. Let x_i, i = 0, ..., n+1, be the alternation points of f − P. If Q is any other polynomial with ‖f − Q‖ < M, then Q − P = (f − P) − (f − Q) has the same sign as f − P at each of the x_i. Hence Q − P has at least n+1 zeros and Q = P. This is the desired contradiction.

In view of this theorem, the search for the best approximation is reduced to finding a polynomial P such that f − P has sufficiently many alternations.
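This characterization is also the engine of the standard numerical method for computing P*, the Remez exchange algorithm, which iterates toward an equioscillating error. The Python sketch below is our illustration, not part of these notes; it uses a simplified single-point exchange, whereas production implementations exchange a whole reference set at each step.

```python
import numpy as np

def remez(f, n, a=-1.0, b=1.0, iters=30):
    """Sketch of the Remez exchange algorithm: seek P* in Pi_n with
    f - P* equioscillating at n+2 points (the Theorem 2.3 criterion)."""
    # initial reference: n+2 Chebyshev-like points in [a, b]
    x = np.sort((a + b)/2 + (b - a)/2 * np.cos(np.pi*np.arange(n + 2)/(n + 1)))
    for _ in range(iters):
        # solve sum_k c_k x_i^k + (-1)^i h = f(x_i) for c_0..c_n and level h
        A = np.hstack([np.vander(x, n + 1, increasing=True),
                       ((-1.0) ** np.arange(n + 2))[:, None]])
        sol = np.linalg.solve(A, f(x))
        p = np.polynomial.Polynomial(sol[:-1])
        # exchange: swap the reference point nearest the worst error point
        grid = np.linspace(a, b, 4000)
        err = f(grid) - p(grid)
        worst = grid[np.argmax(np.abs(err))]
        x[np.argmin(np.abs(x - worst))] = worst
        x = np.sort(x)
    return p, abs(sol[-1])

# example: degree-5 best approximation to |x| on [-1, 1]
p, E = remez(np.abs, 5)
```

On convergence, |h| approaches E_n(f) and f − p equioscillates; a robust code would also verify that the signs at the exchanged reference points still alternate.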


What are the essential properties of polynomials which were used in the above proof of uniqueness? Well, in the Chebyshev alternation theorem, we constructed a polynomial which changed sign precisely at the points ξ_1, ..., ξ_m, and in the proof of uniqueness we used the fact that any non-trivial polynomial of degree n has at most n zeros. There are other (n+1)-dimensional subspaces X_{n+1} of C(I) which have these properties. They are called Haar spaces, and any basis φ_0, ..., φ_n for X_{n+1} is called a Haar system (sometimes called a Chebyshev system).

DEFINITION 2.4. An n-dimensional subspace X_n of C(I) is called a Haar subspace if each function φ ∈ X_n has at most n−1 zeros on I unless φ is identically zero.

Of course, the Π_n are the most important Haar spaces. Some other interesting examples are the span of the exponentials e^{α_i x}, i = 1, ..., n, or the span of the power functions x^{α_i}, i = 1, ..., n. Here, α_1, ..., α_n can be any distinct non-negative real numbers.

Haar spaces are much like polynomial spaces. For example, we have:

THEOREM 2.5. If X_n is a Haar space and f ∈ C(I), then f has a unique best approximation from X_n.

The proof is essentially the same as that given above for polynomials, except that now one has to work much harder to show that there is a function φ which changes sign at any prescribed points ξ_1, ..., ξ_m, m ≤ n, from the interior of I.

Remarkably, the notion of Haar space actually characterizes the Chebyshev spaces of C(I) (i.e., the spaces from which best approximation is unique). Indeed, we have the following theorem of Haar.

THEOREM 2.6. If every f ∈ C(I) has a unique best approximation from the n-dimensional subspace X_n, then X_n is a Haar space.

A proof of this theorem can be found in the book of Lorentz [L]. Haar systems are important in many fields other than approximation. The interested reader should consult the seminal paper of Krein [Kr] or the book of Karlin and Studden [K-S].

3. Trigonometric polynomial approximation. It is not necessary for the interval I to be closed in the definition of a Haar system. In fact, one of the most important Haar systems is the space T_n of trigonometric polynomials of degree ≤ n. A trigonometric polynomial of degree n is an expression of the form

T(x) = a_0 + Σ_{k=1}^n (a_k cos kx + b_k sin kx)

with a_n² + b_n² > 0. Any trigonometric polynomial T of degree n has at most 2n zeros on [0,2π). Hence T_n is a Haar system on this interval.

Approximation by trigonometric polynomials is quite similar to algebraic polynomial approximation, except that now we approximate functions f which are 2π-periodic. We let C(T) denote the space of all such continuous functions, and let ‖·‖ be the supremum norm on (−∞,∞) or, equivalently, on any interval of length 2π. If f ∈ C(T), then f has a unique best approximation T* ∈ T_n:

‖f − T*‖ = inf_{T∈T_n} ‖f − T‖.

The error of approximation in this case is denoted by E_n*(f) := ‖f − T*‖.

There is a very useful and important connection between trigonometric and algebraic polynomial approximation which is obtained by using the transformation x = cos θ to identify points on [−1,1] with points on [0,π]. If f ∈ C(I), I := [−1,1], then the function g(θ) := f(cos θ) is an even 2π-periodic continuous function in C(T). Similarly, if P is an algebraic polynomial of degree n, then T(θ) := P(cos θ) is an even trigonometric polynomial of degree n.

We can go the other way as well. Namely, for any even trigonometric polynomial T of degree n, the function P(x) := T(arccos x) is an algebraic polynomial of degree at most n. In fact, T(θ) = Σ_{k=0}^n a_k cos kθ, and so P(x) is a linear combination of the functions C_k(x) := cos(k arccos x), k = 0, 1, ..., n. The C_k are algebraic polynomials of degree k (see §5).

It follows from the uniqueness of best approximations that the best approximation T* to the even function g is an even trigonometric polynomial. Hence, the above one-to-one correspondence between algebraic polynomials P and even trigonometric polynomials T gives that P* is the best approximation to f if and only if T* is the best trigonometric approximation to g. We also have

(3.1) E_n(f) = E_n*(g).

This simple remark allows us to prove results about algebraic approximation by considering their analogue in trigonometric approximation.

4. Computing best approximants. It is generally difficult to compute best approximations. An exception is when X has an inner product (·,·) and its induced norm ‖f‖² := (f,f). For example, L₂(I) has the inner product

(4.1) (f,g) := ∫_I f(x)g(x) dx.

Now suppose that X_n is an n-dimensional subspace of X and we wish to compute the best approximation to f ∈ X from X_n. We take a basis φ_1, ..., φ_n for X_n which satisfies the orthonormality conditions

(4.2) (φ_i, φ_j) = δ_ij, i, j = 1, ..., n,

with δ_ij the usual Kronecker δ notation. The best approximation φ* from X_n to f is then given by

(4.3) φ* = Σ_{k=1}^n (f, φ_k) φ_k.

In fact, since f − φ* is orthogonal to each φ_k, k = 1, ..., n, it is orthogonal to every φ ∈ X_n. Therefore, we have

‖f − φ* − φ‖² = (f − φ* − φ, f − φ* − φ) = (f − φ*, f − φ*) + (φ, φ) ≥ ‖f − φ*‖²

for all φ ∈ X_n, which clearly says that φ* is the best approximation to f from X_n.
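As a concrete illustration (ours, not from the text), formula (4.3) can be carried out numerically with any orthonormal basis. The Python sketch below uses normalized Legendre polynomials, which are orthonormal on I = [−1,1] for the inner product (4.1); the quadrature size is an arbitrary choice.

```python
import numpy as np
from numpy.polynomial import legendre

def best_l2_approx(f, n):
    """Best L2(-1,1) approximation from Pi_n via (4.3):
    phi* = sum_k (f, phi_k) phi_k with orthonormal Legendre phi_k."""
    # Gauss-Legendre quadrature to evaluate the inner products (4.1)
    x, w = legendre.leggauss(max(64, 2 * n))
    fx = f(x)
    # normalized Legendre polynomials: ||P_k||_2^2 = 2/(2k+1)
    phi = [legendre.Legendre.basis(k) * np.sqrt((2*k + 1)/2) for k in range(n + 1)]
    coeffs = [np.sum(w * fx * p(x)) for p in phi]      # (f, phi_k)
    return lambda t: sum(c * p(t) for c, p in zip(coeffs, phi))

# example: quadratic least-squares fit to e^x on [-1, 1]
phi_star = best_l2_approx(np.exp, 2)
```

Here Legendre.basis(k) scaled by sqrt((2k+1)/2) plays the role of φ_k, and the returned function evaluates φ* of (4.3).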

For example, when X = L₂(T), the space of 2π-periodic square integrable functions, the best approximation to f ∈ L₂(T) by trigonometric polynomials of degree at most n is S_n(f), the n-th partial sum of the Fourier series of f:

(4.4) S_n(f,x) := a_0/2 + Σ_{k=1}^n (a_k cos kx + b_k sin kx),

a_k := (1/π) ∫_{−π}^{π} f(x) cos kx dx, b_k := (1/π) ∫_{−π}^{π} f(x) sin kx dx.

In this case, the error of approximation is simply

[ π Σ_{k=n+1}^∞ (a_k² + b_k²) ]^{1/2}.

For approximation in the space C(I), there are only a few special cases where best approximants can be computed exactly. The simplest of these is for approximation by constants. For any f ∈ C(I), its best approximation from Π_0 is a := ½(m + M), with m the minimum of f on I and M the maximum of f on I. Indeed, since f takes on both its maximum and minimum on I, f − a has two alternations, and the Chebyshev criterion of Theorem 2.3 shows that a is the best approximation.

A similar result holds for the approximation of a convex (or concave) function f by linear polynomials. If Q is the linear polynomial which interpolates f at the end points of I, and M := ‖f − Q‖, then P* := Q − M/2 is the best approximation to f from Π_1.

Another very important example is the approximation of x^n by polynomials of degree < n. This problem was solved by Chebyshev and gave rise to a very important sequence of polynomials which bear his name.

5. Chebyshev Polynomials. We take I := [−1,1]. To find the best approximation to x^n from Π_{n−1}, we need only find a polynomial Q(x) = x^n + a_{n−1}x^{n−1} + ⋯ + a_0 such that Q alternately takes on the values ±‖Q‖ at least n+1 times on I. From Theorem 2.3, x^n − Q is then the best approximation to x^n from Π_{n−1}. Now, the trigonometric polynomial cos nθ has such alternation properties. Recalling our discussion in §3 of the transformation x = cos θ, we see that C_n(x) := cos(n arccos x) is an algebraic polynomial of degree n which has norm one and has the required n+1 alternations; namely, C_n(x_k) = (−1)^{n−k} for x_k := cos((n−k)π/n), k = 0, ..., n.

Now the polynomial C_n is not quite the Q we are looking for, since it does not have leading coefficient one. But it is easy to compute the leading coefficient of C_n. For this we use the recurrence relation

(5.1) C_n(x) = 2x C_{n−1}(x) − C_{n−2}(x),

which follows from the corresponding trigonometric identity. Since C_0(x) ≡ 1 and C_1(x) ≡ x, it follows by induction from (5.1) that

(5.2) C_n(x) = 2^{n−1} x^n + lower order terms.

Hence Q(x) := 2^{−n+1} C_n(x) is our sought-after polynomial, and P*(x) := x^n − Q(x) is the best approximation to x^n from Π_{n−1}. This also gives the error of approximation E_{n−1}(x^n) = 2^{−n+1}.
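The recurrence (5.1) is easy to run numerically. The small Python check below (our illustration, not from the text) generates the coefficients of C_n and confirms the leading coefficient 2^{n−1} asserted in (5.2).

```python
import numpy as np

def chebyshev_coeffs(n):
    """Coefficient arrays of C_0..C_n (lowest degree first) via (5.1)."""
    C = [np.array([1.0]), np.array([0.0, 1.0])]   # C_0 = 1, C_1 = x
    for k in range(2, n + 1):
        Ck = np.zeros(k + 1)
        Ck[1:] += 2 * C[k - 1]                    # 2x * C_{k-1}
        Ck[:k - 1] -= C[k - 2]                    # - C_{k-2}
        C.append(Ck)
    return C

C = chebyshev_coeffs(8)
for n in range(1, 9):
    assert C[n][-1] == 2 ** (n - 1)               # leading coefficient, (5.2)
```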

We do not have time to go into all the wonderful properties of Chebyshev polynomials, but we should mention one of their other applications to estimating the size of E_n(f). Let

(5.3) x_k := cos((2k−1)π/2n), k = 1, ..., n,

be the zeros of C_n. If f ∈ C[−1,1], we let P(x) := P(f,x) be the polynomial of degree n−1 which interpolates f at the points x_k, that is, P(x_k) = f(x_k), k = 1, ..., n. The existence and uniqueness of P is well known and equivalent to the non-vanishing of the Vandermonde determinant. Also, one can represent the error of interpolation (see [B, p.9]) by

(5.4) E(x) := f(x) − P(f,x) = (1/n!) f^{(n)}(ξ_x) (x − x_1)⋯(x − x_n).

We recognize that (x − x_1)⋯(x − x_n) = 2^{−n+1} C_n(x), so that the right side of (5.4) does not exceed ‖f^{(n)}‖ 2^{−n+1}/n!. This gives

THEOREM 5.1 (Bernstein). If f has n continuous derivatives, then

E_{n−1}(f) ≤ ‖f^{(n)}‖ 2^{−n+1}/n!.

6. Interpolation. Usually, we cannot determine the best polynomial approximants for a given f ∈ C(I). Instead, we look for polynomials which are "good" rather than best approximations. The typical way of constructing such polynomials is to find linear operators L_n which map C(I) onto Π_n and have good approximation properties. One possibility (others are considered in the next section) is for L_n to satisfy

(6.1) ‖f − L_n(f)‖ ≤ c_n E_n(f),

with c_n a constant which may depend on n. Then, except for the constant c_n, the polynomial L_n(f) is just as good an approximation to f as is the best approximation. Of course, the smaller the constant c_n, the better the operator L_n, and therefore we would like to find L_n which will make c_n as small as possible. It turns out, as we will explain in a little more detail shortly, that the best constants c_n behave like const. log n; in particular, they tend to infinity with n. Thus, unfortunately, the c_n in (6.1) cannot be replaced by a constant c which is independent of n.

Finding operators L_n which satisfy (6.1) is intimately connected with the construction of projectors onto the space Π_n. In fact, L_n satisfies (6.1) if and only if it is such a projector, that is, if and only if L_n(P) = P for all P ∈ Π_n. Indeed, if (6.1) holds and f is in Π_n, then E_n(f) = 0 and hence L_n(f) = f. On the other hand, if L_n is such a projector, then for any f ∈ C(I) and P ∈ Π_n, we have

(6.2) ‖f − L_n(f)‖ = ‖L_n(f − P) − (f − P)‖ ≤ (‖L_n‖ + 1) ‖f − P‖,

where ‖L_n‖ := sup_{f∈C(I)} ‖L_n(f)‖/‖f‖ is the norm of L_n on C(I).

Taking an infimum over all P in (6.2) shows that (6.1) holds with c_n = ‖L_n‖ + 1. The smallest constant c_n which can be used in (6.1) is, roughly speaking, ‖L_n‖. We have seen that we can always take c_n ≤ ‖L_n‖ + 1. On the other hand, for some appropriate f, and with I the identity operator, we have

‖f − L_n(f)‖ ≈ ‖I − L_n‖ ‖f‖ ≥ (‖L_n‖ − 1) ‖f‖ ≥ (‖L_n‖ − 1) E_n(f).

Hence, whenever (6.1) holds, we must have c_n ≥ ‖L_n‖ − 1.

This means that to make c_n small, we had better make ‖L_n‖ small. We are therefore led to the problem of constructing L_n with the smallest possible norm. This turns out to be a very difficult problem, which has been solved only in the special cases n = 0, 1. Nevertheless, it is possible to construct operators L_n which have close to the smallest possible norm. One of the simplest and most important methods of doing this is to use polynomial interpolation.

If X: x_0, ..., x_n are n+1 points from the interval I and f ∈ C(I), then there is a unique polynomial P_n(f) := P_n(f,X) which interpolates f at the points in X. In fact, we have the Lagrange representation for P_n:

(6.3) P_n(f,x) = Σ_{k=0}^n f(x_k) l_k(x); l_k(x) := Π_{j≠k} (x − x_j)/(x_k − x_j).

Then P_n is a linear operator which is a projector from C(I) onto Π_n. It is simple to compute the norm of P_n:

(6.4) ‖P_n‖ = max_{x∈I} Λ(x),

where

(6.5) Λ(x) := Λ(X,x) := Σ_{k=0}^n |l_k(x)|

is called the Lebesgue function of P_n.

There is no simple description of interpolation points X which will minimize ‖P_n‖; however, the work of Kilgore [Ki] and de Boor-Pinkus [B-P] gives their uniqueness and some of their properties. The most obvious choice of interpolation points is to space the x_k equally in the interval I. But, disappointingly, the norms of the resulting projectors are then very large; in fact, they grow exponentially with n. A much better choice for the interpolation points X is the zeros of the Chebyshev polynomial C_n given in (5.3). In fact, with this choice, ‖P_n‖ ≤ (2/π) log n + 1, n = 1, 2, .... Hence, this projector has, within constants, the smallest possible norm. With this, we have

THEOREM 6.1. If P_n is the projector corresponding to interpolation at the zeros of the Chebyshev polynomial C_n, we have ‖P_n‖ ≤ (2/π) log n + 1. For any f ∈ C(I),

(6.6) ‖f − P_n(f)‖ ≤ [(2/π) log n + 2] E_{n−1}(f), n = 1, 2, ....

For a proof of this theorem, we refer the reader to the book of Rivlin [R, p.18] on Chebyshev polynomials.
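The contrast between equally spaced and Chebyshev nodes described above is easy to observe numerically. The Python sketch below (our illustration) evaluates the Lebesgue function (6.5) on a fine grid and compares max Λ for the two node choices; the grid size is an arbitrary choice.

```python
import numpy as np

def lebesgue_constant(nodes, grid=np.linspace(-1, 1, 5001)):
    """Approximate ||P_n|| = max Lambda(x) from (6.4)-(6.5)."""
    L = np.zeros_like(grid)
    for k, xk in enumerate(nodes):
        others = np.delete(nodes, k)
        # |l_k(x)| = prod_j |x - x_j| / |x_k - x_j|, j != k
        L += np.abs(np.prod((grid[:, None] - others) / (xk - others), axis=1))
    return L.max()

for n in (5, 10, 20):
    equi = np.linspace(-1, 1, n)                             # equally spaced
    cheb = np.cos((2*np.arange(1, n + 1) - 1)*np.pi/(2*n))   # zeros of C_n, (5.3)
    print(n, lebesgue_constant(equi), lebesgue_constant(cheb),
          (2/np.pi)*np.log(n) + 1)                           # Theorem 6.1 bound
```

The equally spaced column grows rapidly with n, while the Chebyshev column stays below the (2/π) log n + 1 bound.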

For n small, log n is not too large, so the approximation P_n(f) is comparable with the best approximation. On the other hand, there are functions f for which the right-hand side of (6.6) does not tend to 0 and, even more to the point, for which P_n(f) does not converge to f. So, in spite of the attractiveness of polynomial interpolation, this type of approximation cannot even give a proof of the Weierstrass theorem.

7. Degree of approximation. We have yet to discuss the behavior of E_n(f). We expect that the nicer the function f, the faster E_n(f) converges to zero. One result in this direction is the following:

THEOREM 7.1. If f is r times continuously differentiable on I = [−1,1], then

(7.1) E_n(f) ≤ C_r ‖f^{(r)}‖ n^{−r}, n = 1, 2, ....

Thus, for example, we know that E_n(f) tends to zero at least as fast as 1/n whenever f is differentiable, 1/n² when it is twice differentiable, and so on.

Estimates of the type given in Theorem 7.1 have a rich history. The first results of this type were given at the beginning of this century by Bernstein [Be1]. Later, Favard [F] found the best constant C_r. Jackson [J] then refined (7.1) by using subtler measures of the smoothness of a function f, such as its modulus of continuity ω(f,t), defined for f ∈ C(I) by

ω(f,t) := sup_{|x−y|≤t, x,y∈I} |f(x) − f(y)|.

THEOREM 7.2 (Jackson). Let r = 0, 1, 2, .... If f is r times continuously differentiable, then

(7.2) E_n(f) ≤ C_r n^{−r} ω(f^{(r)}, n^{−1}), n = 1, 2, ....

The continuity of f ensures that ω(f,t) → 0 as t → 0, and therefore (7.2) with r = 0 shows that E_n(f) → 0 as n → ∞. Hence (7.2) contains the Weierstrass theorem as well.

There are now several different techniques for proving Jackson's theorem. One of the most important is to use the transformation x = cos θ as described in §3. If f ∈ C(I), then g(θ) := f(cos θ) satisfies ω(g,t) ≤ ω(f,t), and if f is r times continuously differentiable, so is g. Using these ideas, Theorems 7.1 and 7.2 follow from their counterparts for trigonometric approximation.

To approximate a function g ∈ C(T), we can use convolution operators. Namely, if K_n is a trigonometric polynomial of degree n, then

(7.3) L_n(g,θ) := (g ∗ K_n)(θ) := (1/2π) ∫_{−π}^{π} g(θ − t) K_n(t) dt

is likewise a trigonometric polynomial of degree n. In order that L_n preserve constant functions, we shall require that

(7.4) ∫_{−π}^{π} K_n(t) dt = 2π.

It is also convenient to take K_n non-negative and even.

If we want L_n(f) to provide a good approximation to f, the kernel K_n should concentrate its mass near the origin (similar to the delta function). In fact, in order to prove Theorem 7.2 for r = 0 or (7.1) for r = 1 by using L_n, it is enough to have

(7.5) ∫_{−π}^{π} sin²(t/2) K_n(t) dt ≤ const. n^{−2}.

Let us indicate how (7.5) gives a proof of these results. From (7.5) and the Cauchy-Schwarz inequality for positive functionals, we find

(7.6) ∫_{−π}^{π} |t| K_n(t) dt ≤ π ∫_{−π}^{π} |sin(t/2)| K_n(t) dt ≤ π [ ∫_{−π}^{π} sin²(t/2) K_n(t) dt ]^{1/2} [ ∫_{−π}^{π} K_n(t) dt ]^{1/2} ≤ const. n^{−1}.

Now if f is continuously differentiable and M := ‖f′‖, then |f(θ − t) − f(θ)| ≤ M|t|. Since L_n(f(θ),θ) = f(θ) (because L_n preserves constants), we have

(7.7) |L_n(f,θ) − f(θ)| ≤ (1/2π) ∫_{−π}^{π} |f(θ − t) − f(θ)| K_n(t) dt ≤ (M/2π) ∫_{−π}^{π} |t| K_n(t) dt ≤ CM/n,

which is (7.1) when r = 1.

In this same way, we can also prove Theorem 7.2. This requires the inequality ω(f,t) ≤ (nt + 1) ω(f, 1/n), t > 0, which follows from the subadditivity of ω: ω(f, t_1 + t_2) ≤ ω(f, t_1) + ω(f, t_2). Using this and the inequality |f(θ − t) − f(θ)| ≤ ω(f, |t|) as in (7.7) gives Theorem 7.2, because of (7.4) and (7.6).

There are many choices of kernel K_n which satisfy (7.5). Indeed, since sin²(t/2) = ½(1 − cos t), (7.5) can be restated as a condition on the first Fourier coefficient K̂_n(1) := (1/2π) ∫_{−π}^{π} e^{−it} K_n(t) dt of K_n, namely:

(7.8) 1 − K̂_n(1) ≤ C n^{−2}.

Thus any non-negative even trigonometric kernel K_n which satisfies (7.4) and (7.8) has the desired properties.

One of the simplest examples of such a kernel was given by Jackson:

K_n(t) = λ_n ( sin(mt/2) / sin(t/2) )⁴, m := ⌊n/2⌋ + 1, n = 1, 2, ...,

with λ_n chosen so that K_n satisfies (7.4). One easily shows that λ_n ≍ n^{−3}, and then deduces (7.5) (see [L, p.55]).
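As a numerical sanity check (ours, not from the text), one can build the Jackson kernel, normalize it so that (7.4) holds, and watch the second-moment condition (7.5); the Python sketch below assumes the m = ⌊n/2⌋ + 1 choice above and uses simple trapezoidal quadrature.

```python
import numpy as np

def jackson_kernel(n, t):
    """Jackson kernel lambda_n * (sin(mt/2)/sin(t/2))^4, normalized so that
    int_{-pi}^{pi} K_n dt = 2*pi, i.e. condition (7.4)."""
    m = n // 2 + 1
    with np.errstate(divide='ignore', invalid='ignore'):
        K = (np.sin(m * t / 2) / np.sin(t / 2)) ** 4
    K[~np.isfinite(K)] = float(m) ** 4        # limiting value at t = 0
    return 2 * np.pi * K / np.trapz(K, t)     # fixes lambda_n numerically

t = np.linspace(-np.pi, np.pi, 20001)
for n in (8, 16, 32, 64):
    K = jackson_kernel(n, t)
    moment = np.trapz(np.sin(t / 2) ** 2 * K, t)
    print(n, moment * n ** 2)                 # (7.5): stays bounded in n
```

The printed products remain bounded as n grows, which is exactly the concentration property (7.5) that drives the Jackson estimates.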

8. Piecewise polynomial approximation. Polynomials are not usually the best choice for applied problems or numerical computation. For one thing, it is not easy to evaluate a polynomial of high degree. Piecewise polynomial functions of low degree are much more desirable in computation. Indeed, it is the case that most numerical algorithms are based, in one sense or another, on some form of piecewise polynomial approximation.

We consider once again the interval I := [−1,1] and let T: −1 = t_0 < t_1 < ⋯ < t_n = 1 be an increasing sequence of points from I. Then T partitions I into n intervals I_j := [t_{j−1}, t_j), j = 1, ..., n−1, and I_n := [t_{n−1}, t_n]. By S_r(T) we denote the piecewise polynomial functions of degree r−1 on T. That is, S ∈ S_r(T) means that S(x) = P_j(x), x ∈ I_j, with P_j a polynomial of degree < r, for j = 1, ..., n. Sometimes we are only interested in functions from S_r(T) which have some prescribed continuity at the points t_j. These are called spline functions. For example, the continuously differentiable functions in S_4(T) are called C¹ cubic splines; those that are twice differentiable are called C² cubic splines, etc. The simplest spline functions are the truncated power functions (x − c)_+^k, k = 0, 1, ..., with

x_+^k := 0 for x < 0; x_+^k := x^k for x ≥ 0.

To describe the error in approximation by piecewise polynomials, it is enough to consider how the error in polynomial approximation on a subinterval J := [a,b] of I depends on the length of J. For this, we let

(8.1) E_r(f,J) := inf_{P∈Π_{r−1}} ‖f − P‖(J),

with the norm being the sup norm (on the interval J), as usual.

Now if f has r continuous derivatives, we can form the Taylor polynomial T_a of f at a:

T_a(x) := f(a) + f′(a)(x − a) + ⋯ + f^{(r−1)}(a)(x − a)^{r−1}/(r−1)!.

We have the well-known error formula

(8.2) f(x) − T_a(x) = (1/(r−1)!) ∫_a^x f^{(r)}(t) (x − t)^{r−1} dt.

If, in the integral on the right side of (8.2), we replace f^{(r)} by ‖f^{(r)}‖ and then integrate, we see that

(8.3) E_r(f,J) ≤ (1/r!) ‖f^{(r)}‖(J) |J|^r,

with |J| the length of J. Thus, as the length of J tends to zero, the error of approximation E_r(f,J) goes to zero like |J|^r.

The simple inequality (8.3) already tells us a lot about approximation by the elements of S_r(T). Namely, if

δ_T := max_{1≤j≤n} |I_j|,

then we have

THEOREM 8.1. If f has r continuous derivatives on I, there is a piecewise polynomial S ∈ S_r(T) such that

(8.4) ‖f − S‖(I) ≤ (1/r!) ‖f^{(r)}‖(I) δ_T^r.

Indeed, we can define S on the interval I_j to be the Taylor polynomial of f at the left end point of I_j, so that (8.4) follows from (8.3).
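The proof of Theorem 8.1 is entirely constructive, so (8.4) can be watched numerically. The Python sketch below (our illustration, not from the text) builds the piecewise Taylor approximant for r = 2 on a uniform partition and checks the δ_T² rate.

```python
import numpy as np

def piecewise_taylor(f, df, knots, x):
    """S of Theorem 8.1 with r = 2: on each I_j, the Taylor polynomial of f
    at the left endpoint t_{j-1}; by (8.4), error <= ||f''|| delta_T^2 / 2."""
    j = np.clip(np.searchsorted(knots, x, side='right') - 1, 0, len(knots) - 2)
    a = knots[j]
    return f(a) + df(a) * (x - a)

f, df = np.sin, np.cos
x = np.linspace(-1, 1, 10001)
for n in (4, 8, 16, 32):
    T = np.linspace(-1, 1, n + 1)        # uniform partition, delta_T = 2/n
    err = np.max(np.abs(f(x) - piecewise_taylor(f, df, T, x)))
    print(n, err, err * n ** 2)          # err * n^2 levels off, matching (8.4)
```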

Sometimes it is useful not to assume that f^{(r)} is continuous, but only that f^{(r−1)} is absolutely continuous and f^{(r)} is in L_p for some 1 ≤ p < ∞. In this case, if we apply Hölder's inequality to the integral in (8.2), we find

(8.5) E_r(f,J) ≤ C_r ‖f^{(r)}‖_p(J) |J|^{r−1/p}.

Hence, there is a spline S ∈ S_r(T) which satisfies

(8.6) ‖f − S‖(I) ≤ C_r ‖f^{(r)}‖_p(I) δ_T^{r−1/p}.

There are a variety of other estimates (see [S]) for the error in spline approximation. For example, E_r(f,J) can be estimated by const. ω(f,δ_T) or by const. ‖f^{(k)}‖ δ_T^k, for any 0 ≤ k ≤ r. Remarkably, it is also possible to prove these same estimates for approximation by spline functions which have smoothness. For example, we have

THEOREM 8.2. If f has r continuous derivatives on I, there is a spline function S ∈ S_r(T) which has r−2 continuous derivatives and satisfies

(8.7) ‖f − S‖(I) ≤ C ‖f^{(r)}‖(I) δ_T^r,

with C depending only on r.

There are several techniques for proving estimates like (8.7). When r = 1, (8.7) follows from (8.5), since there is no continuity prescribed. When r = 2, the case of approximation by piecewise linear functions, we can take S as the continuous piecewise linear function in S_2(T) which interpolates f at each of the t_i, i = 0, ..., n. Interpolating splines can also be used for other small values of r, but the question of where to place the interpolation points gets more and more sticky as r increases and is still not solved for general r.

A more successful method to prove (8.7) was introduced by de Boor and Fix [B-F]. It uses certain linear operators L_T called quasi-interpolants. L_T is a projection from C(I) onto S_r(T) ∩ C^{(r−2)}. While L_T(f) does not interpolate f in the usual sense, it uses only a finite number of values of f (hence the name quasi-interpolant). Quite surprisingly, the norms of the projectors L_T are bounded independent of T. This is in stark contrast to polynomials where, as explained in §6, the norms of projectors onto Π_n must tend to infinity with increasing n. Using the fact that the L_T are bounded, one proves (8.7) (see [B-F]).

Unfortunately, we do not have time to describe these powerful approximation methods in more detail, but certainly they will be brought up in other lectures. The reader should also consult the books of de Boor [B] and Schumaker [S].

9. Non-linear approximation. Up to this point, we have only discussed approximation by elements from a linear space (polynomials or trigonometric polynomials). But many other families of functions used in approximation are not linear spaces. For example, we have the set R_n of rational functions of degree ≤ n, or the set Σ_{n,k} of all piecewise polynomials of degree k which have n pieces.

Approximation by such non-linear families can sometimes give dramatic improvement in the error of approximation. For example, approximation to the function f(x) := |x| was extensively studied by S. Bernstein [Be2] because it is prototypical for polynomial approximation to one time differentiable functions. Bernstein showed that, for the error E_n(f) of polynomial approximation, lim_{n→∞} n E_n(f) exists. Hence E_n(f) behaves like const./n as n → ∞. It was therefore a great surprise when D.J. Newman [N] showed that rational functions of degree n approximate |x| with an error on the order of e^{−c√n}. [...]

[...] favor in so-called adaptive methods, which find a partition of I into intervals I_j by subdividing. For example, the adaptive analogue of (9.1)-(9.2) would proceed as follows. We would choose some tolerance ε > 0, which is the error we are willing to accept in the approximation. We call an interval J "good" if Int(f′,J) ≤ ε; otherwise, J is "bad". We are looking to generate a set G_ε of disjoint good intervals which are a partition of I. If I itself is good, we can simply take G_ε = {I}. On the other hand, if I is bad, we divide I in half, producing therefore two new intervals. Whichever of these two intervals is good, we put into our set G_ε. The bad intervals are further subdivided. We continue in this way, and the whole process stops when there are no more bad intervals.
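The subdivision loop just described is short to code. In the Python sketch below (ours, not from the text) we take Int(f′,J) to be ∫_J |f′(t)| dt; this is an assumption on our part, since the quantity Int is defined on a page of the notes not reproduced above.

```python
import numpy as np

def adaptive_partition(df, a, b, eps, npts=512):
    """Generate the set G_eps of 'good' intervals by repeated bisection.
    Criterion (our assumption): J is good when int_J |f'(t)| dt <= eps."""
    def intg(lo, hi):
        t = np.linspace(lo, hi, npts)
        return np.trapz(np.abs(df(t)), t)
    good, bad = [], [(a, b)]
    while bad:
        lo, hi = bad.pop()
        if intg(lo, hi) <= eps:
            good.append((lo, hi))          # J is "good": keep it
        else:
            mid = (lo + hi) / 2            # J is "bad": bisect and recheck
            bad += [(lo, mid), (mid, hi)]
    return sorted(good)

# f(x) = sqrt(x): f' blows up at 0, so the good intervals crowd near 0
G = adaptive_partition(lambda x: 0.5 / np.sqrt(x), 1e-12, 1.0, 0.05)
print(len(G), G[:3])
```

The output shows the expected behavior: interval lengths shrink geometrically toward the singularity, while a few long intervals suffice away from it.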

Of interest to us is how many good intervals will appear in the set G_ε. Birman and Solomjak [B-S] have shown that when f′ is in L_p for some p > 1, then G_ε contains no more than C_p/ε intervals, with C_p depending only on p. Thus, for example, if we take ε = C_p/n, then G_ε will contain at most n intervals I_1, ..., I_n. If we then take S as in (9.2), (9.3) will again be satisfied. This means that, by assuming slightly more about the function f, we can approximate f by the above adaptive scheme to the same accuracy as with the optimal knot approximation. There are a variety of other results on adaptive approximation which show that functions with singularities can be approximated better in this way than by linear methods of approximation (see, for example, [B-R]).

The piecewise polynomials used in the above approximation are not smooth. While it is possible to modify these methods so that the resulting piecewise polynomial has smoothness C^{(r−2)} (in the case the piecewise polynomials have degree r), it is sometimes of interest to approximate f by smoother functions. It turns out that the same accuracy of approximation is attainable with rational functions. For example, we have the following result of Popov [P].

THEOREM 9.1. If f′ ∈ L_p(I), p > 1, there is a rational function R of degree at most n which satisfies

(9.4) ‖f − R‖ ≤ c ‖f′‖_p n^{−1}.

There is a simple technique [D2] for deriving (9.4) from (9.3). For this, we can assume that ‖f′‖_p = 1. Then, as in the derivation of (9.3), we choose n intervals I_j such that

(9.5) ∫_{I_j} |f′|^p ≤ 1/n.

By refining these intervals if necessary, we can further require that |I_j| ≤ 1/n, and still there are at most 2n of these intervals. We let ξ_j be a point in I_j and define

φ̃_j(x) := |I_j|² / ((x − ξ_j)² + |I_j|²).

If Ψ := Σ_j φ̃_j, then the functions

φ_j := φ̃_j/Ψ, j = 1, ..., n

are a partition of unity:

Σ_j φ_j(x) = 1, x ∈ I.

It can be shown [D2] that for suitably chosen ξ_j, the rational function

R := Σ_j f(ξ_j) φ_j

has degree at most 4n and satisfies (9.4).
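The construction above is concrete enough to code directly. The Python sketch below (our illustration) uses equal-length intervals and midpoints as the ξ_j, rather than the "suitably chosen" points of [D2], and reports the uniform error for f(x) = |x|.

```python
import numpy as np

def rational_blend(f, edges, x):
    """R = sum_j f(xi_j) phi_j, with phi_j = phitilde_j / Psi and
    phitilde_j(x) = |I_j|^2 / ((x - xi_j)^2 + |I_j|^2) as in the text."""
    xi = (edges[:-1] + edges[1:]) / 2              # midpoints (our choice)
    lengths = np.diff(edges)
    phit = lengths**2 / ((x[:, None] - xi)**2 + lengths**2)
    Psi = phit.sum(axis=1)                         # Psi = sum_j phitilde_j
    return (phit * f(xi)).sum(axis=1) / Psi

x = np.linspace(-1, 1, 5001)
for n in (10, 40, 160):
    edges = np.linspace(-1, 1, n + 1)
    err = np.max(np.abs(np.abs(x) - rational_blend(np.abs, edges, x)))
    print(n, err, err * n)        # err * n roughly levels off, cf. (9.4)
```

With midpoint ξ_j the observed rate is already close to the O(1/n) of (9.4); the careful choice of ξ_j in [D2] is what makes the proof work for general f.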

BIBLIOGRAPHY

[Be1] S. Bernstein, Sur l'ordre de la meilleure approximation des fonctions continues par des polynômes de degré donné, Mémoires publiés par la Classe des Sci. Acad. de Belgique (2) 4 (1912), 1-103.

[Be2] S. Bernstein, Sur la meilleure approximation de |x| par des polynômes de degrés donnés, Acta Math. 37 (1914), 1-57.

[B-S] M. S. Birman and M. Solomjak, Piecewise polynomial approximation of functions of the classes W_p^α, Math. USSR Sbornik 2 (1967), no. 3, 295-317.

[B] C. de Boor, A Practical Guide to Splines, Applied Mathematical Sciences, vol. 27, Springer-Verlag, New York, 1978.

[B-F] C. de Boor and G. Fix, Spline approximation by quasi-interpolants, J. Approx. Theory 8 (1973), 19-45.

[B-P] C. de Boor and A. Pinkus, Proof of the conjectures of Bernstein and Erdős concerning the optimal nodes for polynomial interpolation, J. Approx. Theory 24 (1978), 289-303.

[B-R] C. de Boor and J. Rice, An adaptive algorithm for multivariate approximation giving optimal convergence rates, J. Approx. Theory 25 (1979), 337-359.

[D1] R. DeVore, Degree of approximation, in: Approximation Theory II, Academic Press, New York, 1976, pp. 117-161.

[D2] R. DeVore, Maximal functions and their application to rational approximation, in: Approximation Theory, CMS Conference Proceedings, vol. 3, Amer. Math. Soc., 1983, pp. 143-155.

[F] J. Favard, Sur les meilleurs procédés d'approximation de certaines classes de fonctions par des polynômes trigonométriques, Bull. Sci. Math. 61 (1937), 209-224, 243-256.

[J] D. Jackson, On approximation by trigonometric sums and polynomials, Trans. Amer. Math. Soc. 13 (1912), 491-515.

[K-S] S. Karlin and W. Studden, Tchebycheff Systems: with Applications in Analysis and Statistics, Interscience, Wiley, New York, 1966.

[Kr] M. G. Krein, The ideas of P. L. Chebyshev and A. A. Markov in the theory of limiting values of integrals and their further development, Amer. Math. Soc. Translations, Ser. 2, 12, 1-122.

[Ki] T. A. Kilgore, A characterization of the Lagrange interpolating projection with minimal Tchebycheff norm, J. Approx. Theory 24 (1978), 273-288.

[L] G. G. Lorentz, Approximation of Functions, Holt, Rinehart and Winston, New York, 1966.

[N] D. J. Newman, Rational approximation to |x|, Michigan Math. J. 11 (1964), 11-14.

[P] V. Popov, Uniform rational approximation of the class V_r and its applications, Acta Math. Acad. Sci. Hungar. 29 (1977), 119-129.

[R] T. J. Rivlin, The Chebyshev Polynomials, Interscience, Wiley, New York, 1974.

[S] L. Schumaker, Spline Functions: Basic Theory, Wiley-Interscience, New York, 1981.

[W] K. Weierstrass, Über die analytische Darstellbarkeit sogenannter willkürlicher Functionen reeller Argumente, Sitzungsberichte der Akad. Berlin (1885), 633-639, 789-805.

Proceedings of Symposia in Applied Mathematics Volume 36, 1986

POLYNOMIAL AND RATIONAL APPROXIMATION IN THE COMPLEX DOMAIN

E. B. SAFF¹

¹Supported by the National Science Foundation. 1980 Mathematics Subject Classification 41A10, 41A20, 41A21.

ABSTRACT. Approximation theory in the complex variable setting has its roots in classical function theory, but is rich in modern applications. Moreover, it is a subject that lends much insight into real approximation problems. Starting with the example of Taylor series, we describe methods (such as Faber series and interpolation) for generating good polynomial approximants to a function analytic on a compact set in the plane. We also discuss characterizations for polynomials of best uniform approximation and the "near circularity property." An introduction is given to the theory of Padé approximants, which are rational function analogues of the Taylor sections. We conclude by discussing some contrasts between the theories of polynomial and rational approximation.

1. TAYLOR SECTIONS.

The properties of the Taylor sections for an analytic function are a convenient starting point for approximation and interpolation in the complex z-plane (denoted by ℂ). This is because Taylor sections are least squares polynomial approximants as well as interpolating polynomials. Indeed, if f is analytic at z = 0, then the Taylor sections

(1.1) s_n(z) = s_n(f;z) := Σ_{k=0}^n (f^{(k)}(0)/k!) z^k

satisfy the interpolation conditions

(1.2) s_n^{(j)}(0) = f^{(j)}(0), j = 0, 1, ..., n.

Moreover, the polynomials 1, z, z², ... are orthogonal with respect to the inner product

(1.3) (g,h) := (1/2πr) ∫_{C_r} g(z) h̄(z) |dz|, C_r : |z| = r,

and, if f is analytic on |z| ≤ r,


(1.4) (f, z^k) = (1/2πr) ∫_{C_r} f(z) z̄^k |dz| = r^{2k} (1/2πi) ∫_{C_r} f(z)/z^{k+1} dz = r^{2k} f^{(k)}(0)/k!.

Thus, the least squares (best L₂) polynomial approximation to f out of Π_n on the circle C_r is

(1.5) Σ_{k=0}^n [(f, z^k)/(z^k, z^k)] z^k = Σ_{k=0}^n (f^{(k)}(0)/k!) z^k = s_n(f;z).

Here and below, Π_n denotes the collection of all algebraic polynomials (with complex coefficients) of degree at most n.

Another significant property of Taylor sections is that they provide minimal projections onto Π_n with respect to the sup norm.

Definition 1.1. Let A(Δ) denote the collection of functions f that are analytic in the open disk |z| < 1 and continuous on the closed disk Δ : |z| ≤ 1. A projection P : A(Δ) → Π_n is a bounded linear operator such that P² = P and P = I on Π_n.

Endowing A(Δ) and Π_n with the sup norm

(1.6) ‖f‖ := sup {|f(z)| : z ∈ Δ},

we expect to find "near best" polynomial approximants to f on Δ by utilizing a projection with smallest possible norm. It was shown by Geddes and Mason [21] that this minimal projection is the Taylor projection:

(1.7) (S_n f)(z) := s_n(f;z) = Σ_{k=0}^n (f^{(k)}(0)/k!) z^k.

Namely, they proved

Theorem 1.2. Let P be any projection of the space A(Δ) onto the subspace Π_n. Then, for the operator norm induced by the sup norm over Δ, we have

(1.8) ‖S_n‖ ≤ ‖P‖,

where S_n is the Taylor projection of (1.7).

The proof of this theorem follows from the clever observation that, for any projection P,

(1.9) (S_n f)(z) = (1/2πi) ∫_{|t|=1} (A_{1/t} P A_t f)(z) dt/t,

where A_t is the shift operator defined by (A_t f)(z) := f(tz).

What can be said about the rate of convergence of the Taylor sections? The answer is intimately related to the familiar Cauchy-Hadamard formula for the radius of convergence ρ of a power series Σ_{k=0}^∞ c_k z^k. That is,

(1.10) 1/ρ = lim sup_{k→∞} |c_k|^{1/k}.

The basic convergence result is the following.

Theorem 1.3. Let f be analytic in an open set that contains the closed unit disk Δ. Then for the sup norm (1.6), the Taylor sections s_n satisfy

(1.11) lim sup_{n→∞} ‖f − s_n‖^{1/n} = 1/ρ < 1,

where ρ is the radius of the largest open disk centered at the origin throughout which f has a single-valued analytic continuation. Moreover, the sequence s_n converges to f for |z| < ρ.

The above theorem, which provides a model for more general results to be mentioned later, nicely illustrates the relationship between the degree of convergence and the maximal circular region of analyticity for f; that is, the larger this circular region, the faster the convergence. In particular, for entire functions f,

(1.12) lim_{n→∞} ‖f − s_n‖^{1/n} = 0.
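Theorem 1.3 is easy to observe numerically: for f(z) = 1/(a − z) with a > 1, the pole at z = a gives ρ = a, so ‖f − s_n‖^{1/n} on the unit circle should approach 1/a. The Python check below is our illustration, not part of the text.

```python
import numpy as np

a = 2.0                                    # pole of f(z) = 1/(a - z): rho = 2
z = np.exp(1j * np.linspace(0, 2*np.pi, 2000))    # the unit circle |z| = 1
f = 1.0 / (a - z)
for n in (5, 10, 20, 40):
    k = np.arange(n + 1)
    # Taylor section of 1/(a - z) = sum_k z^k / a^(k+1)
    s = (z[:, None] ** k / a ** (k + 1)).sum(axis=1)
    print(n, np.max(np.abs(f - s)) ** (1.0 / n))  # tends to 1/a = 1/rho
```

By the maximum principle, the maximum over the circle equals the sup norm over Δ, so the printed n-th roots approach 1/ρ = 0.5, as (1.11) predicts.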

While the proof of Theorem 1.3 can be deduced via (1.10), it is more instructive to give an argument based on the interpolation property (1.2) of Taylor sections. For this purpose we appeal to the Hermite representation (cf. Walsh [62, §3.1]) for interpolating polynomials.

Lemma 1.4. Suppose f is analytic inside and on the simple closed contour Γ that surrounds the n+1 points z_0, z_1, ..., z_n. If p is the unique polynomial in Π_n that interpolates f in these points, then

(1.13) f(z) − p(z) = (1/2πi) ∫_Γ [w(z)/w(t)] [f(t)/(t − z)] dt, z inside Γ,

where w(z) := Π_{k=0}^n (z − z_k).

Proof. Replacing f(z) by its Cauchy integral representation

f(z) = (1/2πi) ∫_Γ f(t)/(t − z) dt, z inside Γ,

equation (1.13) becomes

(1.14) p(z) = (1/2πi) ∫_Γ [(w(t) − w(z))/w(t)] [f(t)/(t − z)] dt, z inside Γ.

From (1.14) we see that p is indeed a polynomial in Π_n, and from (1.13) that it interpolates f in the points z_k (the zeros of w(z)).

It is important to keep in mind that (1.13) is valid even when the points z_k are not distinct; in such a case interpolation is meant in the Hermite sense. That is, if z_k is repeated ℓ times, then p^{(j)}(z_k) = f^{(j)}(z_k) for j = 0, 1, ..., ℓ−1. In particular, since the Taylor section s_n interpolates in the origin with multiplicity n+1, equation (1.13) gives

(1.15) f(z) − s_n(z) = (1/2πi) ∫_{|t|=r} [z^{n+1}/t^{n+1}] [f(t)/(t − z)] dt, |z| < r,

for any r such that f is analytic on |z| ≤ r. With the assumptions of Theorem 1.3, we deduce from (1.15) that

(1.16) lim sup_{n→∞} ‖f − s_n‖^{1/n} ≤ 1/ρ,

and that the sequence s_n converges to f in |z| < ρ. If strict inequality holds in (1.16), then

lim sup_{n→∞} |f^{(n)}(0)/n!|^{1/n} = lim sup_{n→∞} ‖s_n − s_{n−1}‖^{1/n} < 1/ρ,

which implies that the sequence s_n (the Taylor expansion for f) converges to an analytic function in some disk |z| < R, with R > ρ. As this violates the definition of ρ, the equality of (1.11) follows.

As is often the case where n-th root asymptotics are concerned, results that hold for best L₂ polynomial approximants (such as the Taylor sections) are also valid for best L_p, 1 ≤ p ≤ ∞, approximants. For example, Theorem 1.3 holds if the sections s_n are replaced by the polynomials p_n* of best uniform approximation to f on Δ.

Many of the elegant properties of Taylor sections can be found in the book of Dienes [14]. We mention only one more fact concerning the behavior of the zeros of Taylor sections for the case when the radius of convergence ρ is finite and positive. Namely, Jentzsch proved [14, p.352] that every point of the circle of convergence |z| = ρ is a limit point of the set of zeros of the sequence {s_n}_1^∞. The zeros of the partial sums s_n(z) = (z^{n+1} − 1)/(z − 1) of f(z) = 1/(1 − z) provide a simple illustration of this theorem.
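Jentzsch's theorem can be checked directly for this example: the zeros of s_n(z) = (z^{n+1} − 1)/(z − 1) are the (n+1)-st roots of unity other than 1, so they all lie on, and fill out, the circle of convergence |z| = 1. The short Python computation below (ours) verifies this from the coefficients.

```python
import numpy as np

for n in (5, 20, 80):
    # partial sum s_n(z) = 1 + z + ... + z^n of f(z) = 1/(1 - z)
    zeros = np.roots(np.ones(n + 1))     # roots of z^n + ... + z + 1
    print(n, np.max(np.abs(np.abs(zeros) - 1.0)))   # all moduli equal 1
```

As n grows, the n zeros spread densely over |z| = 1, exactly the limit-point behavior that Jentzsch's theorem guarantees in general.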

2. POLYNOMIAL APPROXIMATIONS FOR FUNCTIONS ANALYTIC ON E.

Given a compact set E in the z-plane and a function f analytic on E (i.e., f is analytic on an open set G ⊃ E), how do we generate good polynomial approximations to f on E? When E is a closed disk, we can use Taylor sections, which are "good" in the sense of Theorem 1.3. For general sets E we need a procedure that likewise reflects the geometry of E.

First, we insist that E does not separate the plane; that is, ℂ\E is connected. This assumption is necessary if we expect to get uniform convergence (of polynomials) to an arbitrary function analytic on E. For example, the function f(z) = 1/z is analytic on the circle E : |z| = 1, but is not the uniform limit on E of any sequence of polynomials, because (by the maximum principle) uniform convergence on |z| = 1 implies convergence to an analytic function throughout |z| ≤ 1.

The connectedness of ℂ\E is also a sufficient condition for polynomial approximation to functions analytic on E, as is stated in the following version of the classical Runge's theorem (cf. [62, §1.10]).

Theorem 2.1. If f is analytic on a compact set E that does not separate the plane, then there exists a sequence of polynomials that converges uniformly to f on E.

(The question of polynomial approximation to functions not analytic on E is much more delicate and will be addressed in the next section.)

To prove Theorem 2.1, Runge's approach was to first form Riemann sum approximations to the Cauchy integral representation for f. These Riemann sums are rational functions whose poles lie outside E. Through a process of "pole moving," the rational approximants are converted to polynomial approximants.

For reasonable sets E, we can generate polynomial approximants

more directly by constructing an analogue of Taylor series. This was the

fruitful approach taken by Faber [15]. To simplify the description of Faber's

method we assume that E is a compact set (not a single point) whose complement

C*\E with respect to the extended plane is simply connected. The Riemann

mapping theorem asserts that there exists a conformal mapping $w = \phi(z)$ of $\mathbb{C}^*\setminus E$ onto the exterior of the unit circle in the $w$-plane (see Figure 2.1). We can insist that $\phi(\infty) = \infty$ and $\phi'(\infty) > 0$ so that, in a neighborhood of infinity,

(2.1)  $\phi(z) = \dfrac{z}{c} + b_0 + \dfrac{b_1}{z} + \dfrac{b_2}{z^2} + \cdots, \qquad c > 0.$


Figure 2.1. The conformal map $w = \phi(z)$ of $\mathbb{C}^*\setminus E$ onto the exterior of the unit circle ($z$-plane on the left, $w$-plane on the right).

The polynomial basis $\{w^n\}_0^\infty$ for Taylor expansions in the $w$-plane now corresponds to the functions $\{\phi(z)^n\}_0^\infty$ in the $z$-plane. The obvious fly in the ointment is that the latter functions are not (in general) polynomials. However, $\phi(z)^n$ does have a polynomial part that will serve our purpose. Indeed, from (2.1), we get

(2.2)  $\phi(z)^n = \left(\dfrac{z^n}{c^n} + \cdots\right) + M_n(z) = F_n(z) + M_n(z),$

where $F_n(z) = z^n/c^n + \cdots \in \Pi_n$ and $M_n(z)$ is analytic at infinity. We call $F_n$ the $n$-th degree Faber polynomial for the set $E$, but caution the reader that many authors reserve this terminology for its monic brother $c^n F_n(z)$.

For a function f analytic on E, our goal is to obtain an expansion of the form

(2.3)  $f(z) = a_0 F_0(z) + a_1 F_1(z) + a_2 F_2(z) + \cdots.$

For this purpose, it is convenient to introduce the inverse mapping of $\phi$, denoted by $z = \psi(w)$, and the level curves

(2.4)  $\Gamma_r : |\phi(z)| = r \qquad (r > 1),$

which are the images under $\psi$ of the circles $C_r : |w| = r$ (see Figure 2.1). Since $F_n$ is the principal part of the Laurent series (2.2) for $\phi(z)^n$, we can write, for $z$ interior to $\Gamma_r$,

$F_n(z) = \dfrac{1}{2\pi i} \displaystyle\int_{\Gamma_r} \dfrac{\phi(t)^n}{t - z}\, dt,$

and transforming to the $w$-plane we obtain

(2.5)  $F_n(z) = \dfrac{1}{2\pi i} \displaystyle\int_{C_r} \dfrac{s^n\, \psi'(s)}{\psi(s) - z}\, ds.$

To derive the expansion (2.3) we begin with the Cauchy integral representation for $f(z)$:

(2.6)  $f(z) = \dfrac{1}{2\pi i} \displaystyle\int_{\Gamma_r} \dfrac{f(t)}{t - z}\, dt = \dfrac{1}{2\pi i} \displaystyle\int_{C_r} \dfrac{f(\psi(s))\, \psi'(s)}{\psi(s) - z}\, ds.$

Since $f(\psi(s))$ is analytic in an annulus of the form $1 < |s| < R$, we can expand this function in a Laurent series:

$f(\psi(s)) = \sum_{n=-\infty}^{\infty} a_n s^n.$

Substituting this series into (2.6) and recalling (2.5) we get

(2.7)  $f(z) = \sum_{n=-\infty}^{\infty} \dfrac{a_n}{2\pi i} \displaystyle\int_{C_r} \dfrac{s^n \psi'(s)}{\psi(s) - z}\, ds = \sum_{n=0}^{\infty} \dfrac{a_n}{2\pi i} \displaystyle\int_{C_r} \dfrac{s^n \psi'(s)}{\psi(s) - z}\, ds = \sum_{n=0}^{\infty} a_n F_n(z)$

(the integrals with negative $n$ vanish because the integrand is $O(1/s^2)$ near $\infty$).

To summarize, we obtain the Faber expansion for $f$ by forming the Taylor series for the Cauchy integral of the composition $f \circ \psi$ and substituting $F_n$ for $w^n$. The process is diagrammed below.

$f(z) \;\longrightarrow\; (f\circ\psi)(w) \;\longrightarrow\; \dfrac{1}{2\pi i}\displaystyle\int_{C_r} \dfrac{(f\circ\psi)(s)\,\psi'(s)}{\psi(s)-z}\, ds = \sum_{n=0}^{\infty} a_n F_n(z)$

Exploiting the relationship between Taylor and Faber series leads to

the following analogue of Theorem 1.3.

Theorem 2.2. Let f be analytic on E and let $\rho\ (>1)$ be the largest index $r$ such that $f$ has a single-valued analytic continuation throughout the interior of the level curve $\Gamma_r$. Then the partial sums of the Faber series for $f$ satisfy

(2.8)  $\limsup_{n\to\infty} \Big\| f - \textstyle\sum_{k=0}^{n} a_k F_k \Big\|_E^{1/n} = 1/\rho < 1,$

where $\|\cdot\|_E$ denotes the sup norm on E. Moreover, the Faber series converges to $f$ throughout the interior of $\Gamma_\rho$.

What does Theorem 2.2 say to realists who do approximation on an

interval? If $E = [-1,1]$, then $\phi(z) = z + \sqrt{z^2 - 1}$ is just the Joukowski transformation with inverse

(2.9)  $\psi(w) = \tfrac{1}{2}(w + w^{-1}).$

For $n \ge 1$, the polynomial part of $\phi(z)^n$ is the same as the polynomial part of

$\phi(z)^n + \phi(z)^{-n} = w^n + w^{-n},$


which reduces to $2\cos n\theta$ when $w = e^{i\theta}$. Thus the Faber polynomials are (apart from a multiplicative constant) the same as the classical Chebyshev polynomials $T_n$, and the Faber series reduces to the (orthogonal) Chebyshev expansion! For the Joukowski transformation, the level curve $\Gamma_r$ is an ellipse with foci $\pm 1$ and semi-major axis of length $(r + r^{-1})/2$. Hence Theorem 2.2 asserts that the Chebyshev expansion for a function $f$ analytic on $[-1,1]$ will converge to $f$ throughout the largest ellipse with foci $\pm 1$ in which $f$ is analytic.
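A minimal numerical sketch of this assertion (assuming NumPy), with the hypothetical test function $f(x) = 1/(1+x^2)$: its poles $\pm i$ lie on the ellipse with parameter $\rho = |\phi(i)| = 1 + \sqrt{2}$, so the sup-norm errors $E_n$ of degree-$n$ Chebyshev approximations on $[-1,1]$ should satisfy $E_n^{1/n} \to 1/\rho \approx 0.414$. Chebyshev interpolants are used below as a convenient stand-in for the truncated Chebyshev expansion; they exhibit the same $n$-th root behavior.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

f = lambda x: 1.0 / (1.0 + x**2)     # poles at +-i
x = np.linspace(-1, 1, 5001)         # dense grid for the sup norm
rho = 1 + np.sqrt(2)                 # |phi(i)| with phi(z) = z + sqrt(z^2 - 1)

for n in [4, 8, 16, 32]:
    c = C.chebinterpolate(f, n)                       # degree-n Chebyshev interpolant
    err = np.max(np.abs(f(x) - C.chebval(x, c)))      # sup-norm error on [-1,1]
    print(n, err ** (1.0 / n), 1.0 / rho)             # n-th roots approach 1/rho
```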

A more in-depth discussion of Faber series and Faber transforms is given in [13], [17], [22], [49], and [2]. The reader will find the subject rich in applications to geometric function theory.

Polynomial approximants can also be constructed via interpolation. As we observed, the Taylor section $s_n(f,z)$ of (1.1) interpolates $f$ in the origin or, more precisely, in the zeros of the polynomial $w_n(z) = z^{n+1}$. Since the mapping function $\phi$ for the disk $\Delta_c : |z| \le c$ is just $\phi(z) = z/c$, the $w_n(z)$ trivially satisfy

(2.10)  $\lim_{n\to\infty} |w_n(z)|^{1/n} = c\,|\phi(z)|.$

In fact, this asymptotic relation, when used to estimate the integral in the

Hermite interpolation formula (1.15), is all that is needed to prove the

convergence assertions of Theorem 1.3. For more general compact sets E (with $\mathbb{C}^*\setminus E$ simply connected) this suggests that we determine a triangular scheme of points for E:

(2.11)
$\beta_0^{(0)}$
$\beta_0^{(1)},\ \beta_1^{(1)}$
$\cdots$
$\beta_0^{(n)},\ \beta_1^{(n)},\ \ldots,\ \beta_n^{(n)}$
$\cdots$

such that $w_n(z) := \prod_{k=0}^{n} \big(z - \beta_k^{(n)}\big)$ satisfies (2.10) uniformly on compact subsets of $\mathbb{C}\setminus E$, where $\phi(z) = z/c + \cdots$ is now the mapping function of (2.1). Coupled with the Hermite interpolation formula this leads to the following result (cf. [62, §7.2]).

Theorem 2.3. If the scheme of points (2.11) of E satisfies (2.10) uniformly on compact subsets of $\mathbb{C}\setminus E$, then the assertions of Theorem 2.2 remain valid when the Faber sections are replaced by the sequence of polynomials $p_n$ that interpolate $f$ in the successive rows of (2.11).

For the unit disk $\Delta : |z| \le 1$, the zeros of $w_n(z) = z^{n+1} - 1$ (the $(n+1)$-st roots of unity) provide "good points" of interpolation in the sense of (2.10).
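A minimal sketch of such interpolation (assuming NumPy): the coefficients of the interpolating polynomial in the $(n+1)$-st roots of unity are a discrete Fourier transform of the data. For the hypothetical example $f(z) = 1/(z-2)$, analytic in $|z| < 2$, Theorem 2.3 predicts sup-norm errors on the disk decaying like $(1/2)^n$.

```python
import numpy as np

f = lambda z: 1.0 / (z - 2.0)                         # analytic in |z| < 2, so rho = 2
zc = np.exp(2j * np.pi * np.linspace(0, 1, 2000))     # test points on |z| = 1

for n in [4, 8, 16, 32]:
    w = np.exp(2j * np.pi * np.arange(n + 1) / (n + 1))   # (n+1)-st roots of unity
    coeffs = np.fft.fft(f(w)) / (n + 1)               # c_k of the interpolant sum c_k z^k
    p = np.polyval(coeffs[::-1], zc)                  # polyval wants highest degree first
    print(n, np.max(np.abs(f(zc) - p)) ** (1.0 / n))  # -> 1/rho = 0.5
```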


When E is bounded by a smooth Jordan arc or curve, we obtain good points by taking the images under $z = \psi(w)$ of equally spaced points on $|w| = 1$. For example, if $E = [-1,1]$, the images of the roots of $w^{2n} + 1 = 0$ under the transformation (2.9) yield the zeros of the Chebyshev polynomial $T_n$.

There are good points of interpolation that can be determined without knowledge of the mapping function. These are the Fekete points (cf. [62,§7.8]).

Definition 2.4. Let $V_n(z_0, z_1, \ldots, z_n) := \prod_{i<j} (z_j - z_i)$ denote the Vandermonde determinant of order $n+1$. The points $\beta_k = \beta_k^{(n)} \in E$ for which the maximum

$\max\,\{|V_n(z_0, z_1, \ldots, z_n)| : z_k \in E,\ k = 0, \ldots, n\}$

is attained are called Fekete points for E.
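A minimal numerical sketch (assuming NumPy and SciPy) for $E = [-1,1]$: maximize $\log|V_n|$ directly by local optimization, starting from the Chebyshev extrema. The helper name neg_log_vandermonde is ours, and the result is the output of a local optimizer rather than a certified global maximum, but on an interval it reproduces the expected Fekete configuration, which includes the endpoints $\pm 1$.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_vandermonde(x):
    """-log|V_n(x_0,...,x_n)| = -sum over i<j of log|x_j - x_i|."""
    d = x[:, None] - x[None, :]
    i, j = np.triu_indices(len(x), k=1)
    return -np.sum(np.log(np.abs(d[i, j])))

n = 6                                            # n+1 = 7 points
x0 = np.cos(np.pi * np.arange(n + 1) / n)        # Chebyshev extrema as starting guess
res = minimize(neg_log_vandermonde, x0, bounds=[(-1.0, 1.0)] * (n + 1))
print(np.sort(res.x))                            # candidate Fekete points; includes +-1
```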

The positive constant c that appears in the expansion (2.1) for the mapping function has great importance; it is called the transfinite diameter or logarithmic capacity of E and is denoted by cap(E). Such terminology arises from an electrostatics problem that we now describe.

For a compact set E (with $\mathbb{C}^*\setminus E$ simply connected) we distribute a unit charge over its boundary $\partial E$ so that equilibrium is reached in the sense that the energy with respect to the logarithmic potential is minimized. This corresponds to the problem of finding the minimum of the energy integral

(2.12)  $I[\mu] := \displaystyle\int_{\partial E} \int_{\partial E} \log|s - t|^{-1}\, d\mu(s)\, d\mu(t)$

over all positive unit measures $\mu$ supported on $\partial E$. The unique measure $\mu_E$ that minimizes $I[\mu]$ gives the equilibrium charge distribution, with potential

(2.13)  $U_E(z) := \displaystyle\int_{\partial E} \log|z - t|^{-1}\, d\mu_E(t).$

Apart from a small exceptional set, this potential has the constant value $I[\mu_E]$ on the boundary of E. The capacity of E is defined as

(2.14)  $\mathrm{cap}(E) := \exp(-I[\mu_E]).$

In this context, the essential criterion for (2.11) to be "good points" of interpolation is that the discrete measures

(2.15)  $\mu_n := \dfrac{1}{n+1} \sum_{k=0}^{n} \delta\big(\beta_k^{(n)}\big),$

where $\delta\big(\beta_k^{(n)}\big)$ denotes the unit measure supported at $\beta_k^{(n)}$, converge to the equilibrium measure $\mu_E$ (in the weak-star topology). Such convergence implies


that for $z \in \mathbb{C}\setminus E$,

(2.16)  $\displaystyle\int \log|z - t|^{-1}\, d\mu_n(t) \longrightarrow U_E(z) \quad \text{as } n \to \infty,$

which is equivalent to property (2.10). In this light, the fact that the Fekete points are good interpolation points seems reasonable, since they are defined by minimizing the energy $\log\big(|V_n|^{-1}\big)$ for $n+1$ distinct point charges.

The above discussion applies to more general sets E and these

aspects of potential theory can be found in Hille [27, §16.4], Tsuji [57], and

Landkof [28]. We mention one further characterization of good interpolation

points (cf. [62, §7.4]).

Theorem 2.5. The points $\beta_k^{(n)}$ of E satisfy (2.10) if and only if

$\lim_{n\to\infty} \|w_n\|_E^{1/n} = \mathrm{cap}(E).$
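A minimal numerical check (assuming NumPy): for $E = [-1,1]$ one has $\mathrm{cap}(E) = 1/2$, and taking the interpolation points to be the zeros of the Chebyshev polynomial $T_n$ gives $w(z) = 2^{1-n} T_n(z)$, whose sup norm is $2^{1-n}$; the $n$-th roots indeed approach $1/2$.

```python
import numpy as np

x = np.linspace(-1, 1, 20001)                    # dense grid for the sup norm on E
for n in [5, 10, 20, 40]:
    beta = np.cos((2 * np.arange(1, n + 1) - 1) * np.pi / (2 * n))   # zeros of T_n
    w = np.prod(x[:, None] - beta[None, :], axis=1)                  # monic node polynomial
    print(n, np.max(np.abs(w)) ** (1.0 / n))     # -> cap([-1,1]) = 1/2
```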

3. POLYNOMIALS OF BEST UNIFORM APPROXIMATION.

Let E be a compact set in the $z$-plane and $f$ a function continuous on E. Since $\Pi_n$ is finite dimensional, there exists a polynomial $p_n^* \in \Pi_n$ of best uniform approximation to $f$ on $E$ in the sense that

(3.1)  $\|f - p_n^*\|_E = \inf\{\|f - q\|_E : q \in \Pi_n\},$

where $\|\cdot\|_E$ is the sup norm over E. Moreover, if E contains at least $n+1$ points, then $\Pi_n$ is a Chebyshev subspace and hence $p_n^*$ is unique (see §2 of DeVore's notes). A fundamental characterization of best approximation in the complex variable setting is the Kolmogoroff criterion:

Theorem 3.1. A polynomial $p \in \Pi_n$ is a best uniform approximation to $f$ on E if and only if

(3.2)  $\min_{z \in M} \mathrm{Re}\{(f(z) - p(z))\,\overline{q(z)}\} \le 0$

holds for every $q \in \Pi_n$, where $M$ is the set of extremal points for $f(z) - p(z)$; that is,

(3.3)  $M := \{z \in E : |f(z) - p(z)| = \|f - p\|_E\}.$

A proof of Theorem 3.1 is given in Meinardus [34, p.15].

For real functions, condition (3.2) asserts that there is no polynomial in $\Pi_n$ that has the same sign as the error $f - p_n^*$ on its extremal point


set. This is the essential fact that is used to prove the Chebyshev Equioscil-

lation theorem. For complex functions, an analogue of the alternating-sign

patterns was developed by Rivlin and Shapiro (cf. [48, §2.6]) and is called the

extremal signature.

Let's turn to the geometric aspects of best approximation. We let

A(E) denote the collection of functions f that are analytic in the interior

of E and continuous on E. If f € A(E) and E is bounded by a Jordan

curve $\Gamma$, then best polynomial approximation to $f$ on E reduces to best approximation on $\Gamma$; that is, by the maximum principle,

$\|f - p\|_\Gamma = \|f - p\|_E, \qquad \text{all } p \in \Pi_n.$

The image of $\Gamma$ under $f - p$ is a curve in the $w$-plane which we denote by $(f - p)(\Gamma)$ and call an error curve. In this context, the problem of best

uniform approximation to f is equivalent to finding an error curve that is

contained in a disk of minimal radius about w = 0.

It had been observed by some authors and crystallized by Trefethen

[53] that the minimal error curve $(f - p_n^*)(\Gamma)$ often has a near-circularity

property in the sense that it winds around the origin n + 1 times and is close

to being a perfect circle. Before proceeding with a discussion of this phenome­

non we give a consequence of perfect circularity.

Lemma 3.2. Suppose E is bounded by a Jordan curve $\Gamma$, $f \in A(E)$, and $p \in \Pi_n$. If the error curve $(f - p)(\Gamma)$ is a perfect circle with center at the origin and winding number $\ge n + 1$, then $p$ is the polynomial of best uniform approximation to $f$ on E out of $\Pi_n$.

Proof. If, to the contrary, there exists $q \in \Pi_n$ such that $\|f - q\|_E < \|f - p\|_E$, then

$|(f - p)(z) - (q - p)(z)| = |(f - q)(z)| < \|f - p\|_E = |(f - p)(z)|$

for all $z$ on $\Gamma$. By Rouché's theorem, this means that $q - p$ and $f - p$ have the same number of zeros interior to $\Gamma$. But since this number is at least $n+1$ and $q - p \in \Pi_n$, we arrive at a contradiction. □

As a simple application of Lemma 3.2, consider the problem of finding the polynomial in $\Pi_{n-1}$ that is of best uniform approximation to $f(z) = z^n$ on $\Delta : |z| \le 1$. Since $f$ itself has the perfect circularity property, $p^* = 0$. In other words, the Chebyshev polynomials for the disk $\Delta$ are just the powers of $z$.

Using finite Blaschke products we can produce other examples of perfectly circular error curves, but only for certain rational functions $f$. (The reader is invited to determine the polynomials of best approximation on $\Delta$ to $f(z) = 1/(z - a)$, $|a| > 1$.)

While near circularity is a property that can be made precise in an

asymptotic sense (cf. Trefethen [53]), its practical importance is in the con­

struction of very accurate polynomial approximations. The starting point for

this algorithm is an elegant theorem due to Caratheodory and Fejer

(cf. [22, p. 497]).

Theorem 3.3. Given a polynomial $p(z) = \sum_{k=0}^{\nu} c_k z^k$, there exists a unique power series extension $B(z) = p(z) + \sum_{k=\nu+1}^{\infty} c_k z^k$ analytic in the unit disk $\Delta$ that minimizes $\|B\|_\Delta$ among all such extensions. Moreover, $B(z)$ is a finite Blaschke product with at most $\nu$ zeros in the disk.

The solution $B(z)$ to the minimal extension problem of Theorem 3.3 can be computed quite easily. We know that it has the form

(3.4)  $B(z) = \lambda\, \dfrac{\bar b_\nu + \bar b_{\nu-1} z + \cdots + \bar b_0 z^\nu}{b_0 + b_1 z + \cdots + b_\nu z^\nu}, \qquad \lambda > 0,$

and that it extends $p(z)$ in the sense that $B^{(k)}(0)/k! = c_k$ for $k = 0, \ldots, \nu$. When the $c_k$'s are real, this system reduces to an eigenvalue problem for a $(\nu+1) \times (\nu+1)$ Hankel matrix formed from the $c_k$'s. It turns out that the constant $\lambda$ (which equals $\|B\|_\Delta$) in (3.4) is the largest of the absolute values of the eigenvalues of this matrix and that the coefficients $b_k$ are determined by a corresponding eigenvector. For complex coefficients $c_k$, the procedure is modified by working instead with the largest singular value of the same Hankel matrix.
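A minimal numerical sketch of the real case (assuming NumPy). The indexing of the Hankel matrix is our assumption, namely $H = [c_{j+k}]_{j,k=0}^{\nu}$ with $c_m := 0$ for $m > \nu$; two sanity checks that follow from the Schwarz lemma agree with it.

```python
import numpy as np

def cf_norm(c):
    """Largest |eigenvalue| of the Hankel matrix [c_{j+k}] (assumed indexing;
    c_m = 0 for m > nu).  By Theorem 3.3 this should equal ||B||, the minimal
    sup norm over all analytic extensions of p(z) = sum c_k z^k (real c_k)."""
    nu = len(c) - 1
    H = np.zeros((nu + 1, nu + 1))
    for j in range(nu + 1):
        for k in range(nu + 1):
            if j + k <= nu:
                H[j, k] = c[j + k]
    return np.max(np.abs(np.linalg.eigvalsh(H)))

print(cf_norm([0.0, 1.0]))   # B(0)=0, B'(0)=1: Schwarz forces ||B|| = 1 (B(z) = z)
print(cf_norm([1.0, 0.0]))   # B(0)=1, B'(0)=0: ||B|| = 1 (B = 1)
print(cf_norm([1.0, 1.0]))   # (1+sqrt(5))/2, the Schur-Pick bound for c_0 = c_1 = 1
```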

How does the CF extremal problem of Theorem 3.3 relate to the problem

of best polynomial approximation on $\Delta$? Finding the minimal error $f - p_n^*$ for $f(z) = \sum_{k=0}^{\infty} a_k z^k$ is equivalent to minimizing

(3.5)  $\Big\| \sum_{k=0}^{n} c_k z^k + \sum_{k=n+1}^{\infty} a_k z^k \Big\|_C, \qquad C : |z| = 1,$

over all $(n+1)$-tuples $(c_0, \ldots, c_n)$, which is the converse of the CF problem. Nonetheless, we can utilize Theorem 3.3 by performing a truncation and an inversion $z \to 1/z$. Following Trefethen [53], we truncate the given series for $f$ at $k = N$ so that $\sum_{k=N+1}^{\infty} a_k z^k$ is negligible; that is, we work with $\sum_{k=n+1}^{N} a_k z^k =: z^{n+1} q(z)$ instead of $\sum_{k=n+1}^{\infty} a_k z^k$ in (3.5).


Next we solve the CF problem for the inverse polynomial

(3.6)  $p(z) := z^{N-n-1} q(1/z) \in \Pi_{N-n-1},$

to obtain the minimal extension Blaschke product

$B(z) = p(z) + \sum_{k=N-n}^{\infty} c_k^* z^k.$

Since

(3.7)  $\|B\|_C = \|z^N B(1/z)\|_C = \Big\| z^{n+1} q(z) + \sum_{k=0}^{n} c^*_{N-k} z^k + \sum_{k=N+1}^{\infty} c^*_k z^{N-k} \Big\|_C,$

then discarding the terms involving negative powers of $z$ (which have small coefficients), we see that the choice

$c_k = c^*_{N-k}, \qquad k = 0, 1, \ldots, n,$

in (3.5) gives an error curve with a near-circularity property.

The polynomial approximants obtained via this CF method are often much

better in the sup norm sense than the Taylor sections. Moreover the technique

can be extended to find near best rational approximants (cf. [54], [56]). The

theoretical underpinnings of the CF method are contained in a paper of Adamjan,

Arov, and Krein [1], who generalized the results of Caratheodory, Fejer, Schur,

and Takagi.

Let's now turn to the question of convergence of approximating poly­

nomials. We naturally ask, what is the extension of the Weierstrass theorem to

the complex setting? Runge's theorem (Theorem 2.1) is not a true generalization

because it assumes far more than continuity - it requires f to be analytic in

an open set containing E. Only in 1951 did the Russian mathematician Mergelyan

confirm the suspicions of many who had worked on the problem by proving that the

assumption on f in Runge's theorem could be weakened.

Theorem 3.4 (Mergelyan [35]). Let E be a compact set that does not separate

the plane. If $f \in A(E)$ (that is, f is analytic in the interior of E and

continuous on E), then there exists a sequence of polynomials that converges

uniformly to f on E.

The proof of Mergelyan's theorem (cf. [17], [41]) is a tour de force that utilizes the Tietze Extension theorem as well as Koebe's 1/4-theorem.

Observe that the Weierstrass theorem is a special case of Theorem 3.4 because an

interval has an empty interior and so A(E) reduces to the collection of

functions continuous on E.

As an application of Theorem 3.4 we mention the following


generalization of the Cauchy integral formula: If $\Gamma$ is a rectifiable Jordan curve and $f$ is analytic in the interior $G$ of $\Gamma$ and continuous on $G \cup \Gamma$, then

(3.8)  $f(z) = \dfrac{1}{2\pi i} \displaystyle\int_\Gamma \dfrac{f(t)}{t - z}\, dt, \qquad z \in G.$

To prove (3.8) we take a sequence of polynomials $p_n$ that converges uniformly to $f$ on $G \cup \Gamma$ (the special case of Mergelyan's theorem used here was proved in 1926 by Walsh [61]). Since the Cauchy integral representation holds for polynomials, we have for $z \in G$

$f(z) = \lim_{n\to\infty} p_n(z) = \lim_{n\to\infty} \dfrac{1}{2\pi i} \displaystyle\int_\Gamma \dfrac{p_n(t)}{t - z}\, dt = \dfrac{1}{2\pi i} \displaystyle\int_\Gamma \dfrac{f(t)}{t - z}\, dt,$

as claimed in (3.8).

Results on the rate of polynomial convergence require special

assumptions on the smoothness of the boundary of E as well as on the modulus of

continuity of f. For some extensions of the Jackson type theorems, see Sewell

[47].

As the reader might suspect from the results of §2, geometric rates of

convergence characterize the functions that are analytic on E. Before making

this precise we present a useful lemma dealing with the growth of polynomials.

Lemma 3.5 (Bernstein–Walsh [62, §4.6]). Suppose that E is a compact set (not a single point) whose complement $\mathbb{C}^*\setminus E$ is simply connected. If $p \in \Pi_n$ satisfies $|p(z)| \le M$ for $z$ on E, then

(3.9)  $|p(z)| \le M r^n, \qquad z \text{ on } \Gamma_r \quad (r > 1),$

where $\Gamma_r$ is the level curve defined in (2.4).

Proof. We apply the maximum principle to $g(z) := p(z)/\phi(z)^n$, where $\phi(z)$ is the mapping function of (2.1). Observe that since $p \in \Pi_n$ and $\phi$ has a simple pole at $\infty$, the function $g(z)$ is analytic exterior to E, even at $\infty$. As $z$ approaches the boundary of E from the outside, $|g(z)| \le M$; hence $|g(z)| \le M$ for all $z$ outside E. For $z \in \Gamma_r$, where $|\phi(z)| = r$, the last inequality gives (3.9). □

We can now prove

Theorem 3.6 (Walsh [62, §4.7]). Let E be as in Lemma 3.5, $f$ a function continuous on E, and set

(3.10)  $E_n(f) := \|f - p_n^*\|_E,$


where $p_n^*$ is the polynomial in $\Pi_n$ of best uniform approximation to $f$ on E. Then $f$ is analytic on E if and only if

(3.11)  $\limsup_{n\to\infty} E_n(f)^{1/n} < 1.$

Proof. In one direction the proof is trivial. Namely, if $f$ is analytic on E, then Theorem 2.2 asserts that the Faber sections and, a fortiori, the polynomials of best approximation converge geometrically.

On the other hand, if (3.11) holds, then

(3.12)  $\limsup_{n\to\infty} \|p_{n+1}^* - p_n^*\|_E^{1/n} < 1.$

Appealing to Lemma 3.5, we deduce that, for some $r > 1$,

$\limsup_{n\to\infty} \|p_{n+1}^* - p_n^*\|_{\Gamma_r}^{1/n} < 1.$

But this means that the sequence $(p_n^*)_0^\infty$ converges in the interior of $\Gamma_r$, necessarily to an analytic extension of $f$. □

As with several of the theorems presented, Theorem 3.6 is not stated

in its full generality - the assumption on E can be considerably weakened.

4. PADE APPROXIMANTS.

Polynomials have the advantage of being easy to evaluate. But the

same is true of rational functions. Moreover, rational functions have poles

which can imitate the singularities of a function to be approximated. In this

section we introduce a class of interpolating rational functions called Pade

approximants. These rationals provide a natural extension of the Taylor

sections. (Standard references are [39], [4], [5,6]; for a historical treat­

ment, see Brezinski [11].)

Given a formal power series

(4.1)  $f(z) = \sum_{k=0}^{\infty} a_k z^k,$

we wish to construct a rational function of a certain type whose Taylor co­

efficients match those of f as far as possible. To be precise, let

(4.2)  $\Pi_{m,n} := \{R(z) = P(z)/Q(z) : P \in \Pi_m,\ Q \in \Pi_n,\ Q \not\equiv 0\}.$

Then the matching condition can be stated as follows: For a fixed pair (m,n),

find an $R \in \Pi_{m,n}$ such that

(4.3)  $(f - R)(z) = O(z^\ell),$


where $\ell$ is as large as possible. (Here and below, $O(z^\ell)$ denotes a power series whose lowest order term is $z^\ell$.) What is a realistic value for $\ell$? Since there are $m+1$ free parameters in the choice for the numerator $P$, and $n+1$ in the choice for the denominator $Q$, there are $m+n+1$ parameters available in the ratio $P/Q$ (one parameter is lost in the division process). Thus we expect to have $\ell \ge m+n+1$ or, equivalently, to match the first $m+n+1$ terms of (4.1).

Unfortunately this is not always possible (try m=0, n=1, and f(z) = z). To

circumvent this difficulty we work, instead, with the following linearized

version of (4.3).

Given $(m,n)$, select $P_{mn} \in \Pi_m$ and $Q_{mn} \in \Pi_n$, $Q_{mn} \not\equiv 0$, so that

(4.4)  $(Q_{mn} f - P_{mn})(z) = O(z^{m+n+1}).$

If $f$ is $(m+n)$-times differentiable at $z = 0$, then (4.4) is equivalent to

$(Q_{mn} f - P_{mn})^{(k)}(0) = 0, \qquad k = 0, 1, \ldots, m+n.$

Notice that (4.4) represents a homogeneous linear system of $m+n+1$ equations in $m+n+2$ unknowns (the coefficients of $P_{mn}$ and $Q_{mn}$). Hence this system has a nontrivial solution, necessarily with $Q_{mn} \not\equiv 0$ (if $Q_{mn} \equiv 0$, then (4.4) would force $P_{mn} \equiv 0$ as well, since $\deg P_{mn} \le m$). With this observation we give

Definition 4.1. The Padé approximant (PA) of type $(m,n)$ to $f$ is the rational function

(4.5)  $[m/n](z) := P_{mn}(z)/Q_{mn}(z),$

where $P_{mn} \in \Pi_m$ and $Q_{mn} \in \Pi_n$ ($Q_{mn} \not\equiv 0$) satisfy (4.4).

Notice that for $n = 0$ the PA reduces to a Taylor section of (4.1):

(4.6)  $[m/0](z) = \sum_{k=0}^{m} a_k z^k.$
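A minimal computational sketch (assuming SciPy, whose scipy.interpolate.pade solves the linearized system (4.4) from the Taylor coefficients): the [2/2] approximant to $e^z$ built from the same five coefficients as the Taylor section [4/0], compared at $z = -2$.

```python
import numpy as np
from scipy.interpolate import pade

a = [1.0, 1.0, 1.0 / 2, 1.0 / 6, 1.0 / 24]   # Taylor coefficients of e^z, k = 0..4
p, q = pade(a, 2)                            # [2/2], returned as a pair of np.poly1d

x = -2.0
print(abs(np.exp(x) - p(x) / q(x)))              # [2/2] error: ~ 7.5e-3
print(abs(np.exp(x) - np.polyval(a[::-1], x)))   # [4/0] error: ~ 0.198
```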

Tacit in Definition 4.1 is the fact that a PA is unique. To prove

this, suppose that

(4.7)  $(Q_1 f - P_1)(z) = O(z^{m+n+1}) \quad\text{and}\quad (Q_2 f - P_2)(z) = O(z^{m+n+1}),$

where $P_1, P_2 \in \Pi_m$ and $Q_1, Q_2 \in \Pi_n$. On multiplying the first equation in (4.7) by $Q_2$ and the second by $Q_1$, we deduce on subtracting that

(4.8)  $Q_1 P_2 - Q_2 P_1 = O(z^{m+n+1}).$

But the left-hand side of (4.8) is a polynomial of degree $\le m+n$. Hence $Q_1 P_2 - Q_2 P_1 \equiv 0$, or $P_1/Q_1 = P_2/Q_2$.


The Pade numerators and denominators are rich in algebraic properties

such as the 3-term recurrence relations found by Frobenius (see [4], [24], [39] for

a detailed discussion of these properties). Here we pause only to mention a

representation for $Q_{mn}$ that illustrates the important role played by the Toeplitz determinants

(4.9)  $D(m/n) := \det\begin{pmatrix} a_{m-n+1} & a_{m-n+2} & \cdots & a_m \\ a_{m-n+2} & a_{m-n+3} & \cdots & a_{m+1} \\ \vdots & & & \vdots \\ a_m & a_{m+1} & \cdots & a_{m+n-1} \end{pmatrix} \qquad (a_k := 0 \text{ if } k < 0)$

formed from the coefficients of $f$.

Theorem 4.2 (Jacobi). If $D(m/n) \ne 0$, then

(4.10)  $f(z) - [m/n](z) = O(z^{m+n+1})$

and the Padé denominator $Q_{mn}$ normalized by $Q_{mn}(0) = 1$ is

(4.11)  $Q_{mn}(z) = \dfrac{1}{D(m/n)} \det\begin{pmatrix} a_{m-n+1} & a_{m-n+2} & \cdots & a_{m+1} \\ a_{m-n+2} & a_{m-n+3} & \cdots & a_{m+2} \\ \vdots & & & \vdots \\ a_m & a_{m+1} & \cdots & a_{m+n} \\ z^n & z^{n-1} & \cdots & 1 \end{pmatrix}.$

A fast numerical method (based on the Euclidean algorithm) for solving

Toeplitz systems and computing PAs is described in [10].

The PAs for (4.1) are typically displayed in a doubly infinite array known as the Padé table:

[0/0]   [1/0]   [2/0]   ⋯
[0/1]   [1/1]   [2/1]   ⋯
[0/2]   [1/2]   [2/2]   ⋯
  ⋮       ⋮       ⋮


Here the first row lists Taylor sections; the 2nd row consists of PAs with at most one pole; the 3rd row consists of PAs with at most two poles; etc. The structure of this table was the subject of the 1892 thesis of H. Padé. He showed that the table breaks up into square blocks of identical entries, with the common entry not appearing elsewhere in the table. When all blocks are of size one, i.e., no entry is repeated, the table is said to be normal. Normal tables arise when all Toeplitz determinants D(m/n) are nonzero. It is possible for a Padé table to contain an infinite block of identical entries, but (as shown by Kronecker) such a table arises only for the power series of a rational function.

Of special interest in the Pade table are the diagonal entries, for these represent continued fraction expansions. Indeed, if

(4.12)  $f(z) = \sum_{k=0}^{\infty} a_k z^k = d_0 + \cfrac{d_1 z}{1 + \cfrac{d_2 z}{1 + \cdots}}\,,$

then an inductive argument shows that the successive truncations $d_0$, $d_0 + d_1 z$, $d_0 + d_1 z/(1 + d_2 z)$, etc. are rational functions that have maximal contact with $f$ at the origin. In other words, these truncations give the PAs

[0/0], [1/0], [1/1], ..., [n/n], [(n+1)/n], ...,

which form a staircase of entries in the Padé table (the main diagonal and first superdiagonal). For many of the classical special functions (such as $e^z$), the [n/n] approximant in continued fraction form provides an accurate, computationally stable approximation that is considerably better than using the 2n-th degree Taylor section. Of course, continued fraction expansions of real numbers have played an important role in number theory and, in this respect, PAs provide their function theoretic analogues. (For further discussion of the continued fraction aspects of PAs, see [39], [59].)

The PAs for a function f have poles that can be used to predict the positions of the poles as well as other singularities of f. For example, the qd-algorithm (cf. [26, §7.6]) for computing the zeros of a given polynomial p is based on the fact that the poles in certain rows of the Pade table for f = 1/p tend to poles of f (zeros of p ). The basic row convergence theorem involved is the following.

Theorem 4.3 (de Montessus de Ballore [4, p.139]). Let f be analytic in the disk $D : |z| < R$ ($0 < R \le \infty$) except for poles of total multiplicity $\nu$, none of which occurs at $z = 0$. Then, as $m \to \infty$, the sequence of Padé approximants $[m/\nu](z)$ converges to $f(z)$ uniformly on every compact subset of


$D\setminus\{\text{poles of } f\}$. Furthermore, as $m \to \infty$, the poles of $[m/\nu](z)$ tend, respectively, to the $\nu$ poles of $f$ in $D$.

For example, suppose that $f$ is a meromorphic function in the plane whose poles are simple and occur at the points $\zeta_k$, where

$0 < |\zeta_1| < |\zeta_2| < \cdots.$

Then Theorem 4.3 asserts that the pole of $[m/1](z)$ tends to $\zeta_1$; the two poles of $[m/2](z)$ tend to $\zeta_1, \zeta_2$; etc.
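A minimal numerical sketch of the row convergence (assuming NumPy and SciPy), with the hypothetical example $f(z) = e^z/(1-z)$, a function with the single pole $\zeta_1 = 1$ and Taylor coefficients $a_k = \sum_{j \le k} 1/j!$: the pole of $[m/1]$, the root of its linear denominator, approaches 1 as $m$ grows.

```python
import numpy as np
from math import factorial
from scipy.interpolate import pade

# Taylor coefficients of f(z) = e^z / (1 - z)
a = np.cumsum([1.0 / factorial(j) for j in range(20)])

for m in [2, 4, 8, 16]:
    p, q = pade(a[: m + 2], 1)        # [m/1] uses coefficients a_0, ..., a_{m+1}
    print(m, np.roots(q.coeffs))      # the single pole; tends to zeta_1 = 1
```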

The proof of Theorem 4.3 is based on the following simple observation (cf. [46]). Since

$(Q_{m\nu} f - P_{m\nu})(z) = O(z^{m+\nu+1}),$

then for any $Q \in \Pi_\nu$, the product $Q P_{m\nu} \in \Pi_{m+\nu}$ satisfies

$(Q_{m\nu}\, Q f - Q P_{m\nu})(z) = O(z^{m+\nu+1}),$

and so $Q P_{m\nu}$ is the $(m+\nu)$-th Taylor section of $Q_{m\nu}\, Q f$. Consequently, we can use the Hermite formula (1.15) to write

(4.13)  $(Q_{m\nu}\, Q f - Q P_{m\nu})(z) = \dfrac{1}{2\pi i} \displaystyle\int_{|t|=r} \dfrac{z^{m+\nu+1}}{t^{m+\nu+1}}\, \dfrac{(Q_{m\nu}\, Q f)(t)}{t - z}\, dt, \qquad |z| < r,$

provided $Q_{m\nu}\, Q f$ is analytic on $|t| \le r$. If $Q$ is chosen to be the monic polynomial whose zeros are the poles of $f$, then $r$ can be taken arbitrarily close to $R$. On suitably normalizing the Padé denominators $Q_{m\nu}$ we find that the right-hand side of (4.13) tends to zero in $D$. In particular, at a zero $\zeta$ of $Q$, we have $(Q_{m\nu}\, Q f)(\zeta) \to 0$ and so $Q_{m\nu}(\zeta) \to 0$ because $(Qf)(\zeta) \ne 0$. This means that every limit polynomial of the $Q_{m\nu}$'s has zeros at the poles of $f$ (the zeros of $Q$), which establishes the last assertion of Theorem 4.3. (This same argument can be applied to rational functions that interpolate in the "good points" discussed in §2; see [43].)

In proving convergence theorems for PAs, the essential question is: Where (asymptotically) are the poles of the PAs? In Theorem 4.3, the $\nu$ poles of $f$ serve as "attractors" for all the available poles of the $[m/\nu]$ approximants. However, if $f$ has fewer than $\nu$ poles, then only a subset of the poles of $[m/\nu](z)$ "know where to go," and the remaining poles may wander aimlessly, destroying convergence. The following simple example illustrates this point.

Consider a sequence of nonzero coefficients $a_m$ for which there is a large discrepancy between the root test and the ratio test:


(4.14)  $\lim_{m\to\infty} |a_m|^{1/m} = 0 \quad\text{and}\quad \limsup_{m\to\infty} |a_{m+1}/a_m| = \infty.$

As shown by Perron [39, §78], it is possible to construct such $a_m$'s so that the sequence $\{a_m/a_{m+1}\}_0^\infty$ has limit points that are dense in the plane. But, from (4.11), we see that $a_m/a_{m+1}$ is the zero of the Padé denominator $Q_{m1}(z)$ for $f(z) = \sum_0^\infty a_k z^k$ (which is an entire function). Hence the 2nd row of the Padé table for this $f$ has poles everywhere dense in the plane.

Even more startling is the following result due to Wallin [60] concerning the diagonal of the Padé table.

Theorem 4.4. There exists an entire function f such that the sequence of diagonal PAs $\{[n/n](z)\}_0^\infty$ for f is unbounded at every point in the plane except $z = 0$.

In light of these anomalies, results on the convergence of PAs usually

pursue one of three directions:

(i) Proving uniform convergence for special classes of functions;

(ii) Replacing uniform convergence by a weaker condition, such as

convergence in measure or in capacity;

(iii) Extracting subsequences of PAs that do have the desired uniform

convergence properties.

An early step in the first direction was taken by Padé, who studied the table for the exponential function. He showed that whenever $m+n \to \infty$, the approximants $[m/n](z)$ for $e^z$ converge to $e^z$ uniformly on compact subsets

of the plane. Precise asymptotic results for the location of the zeros and

poles of these PAs were obtained by Saff and Varga [45]. The approximants

for the exponential have several important applications. For example, proving

the stability of certain numerical schemes for solving differential equations

boils down to showing that certain of these approximants are bounded by 1 in

the left-half plane.

A substantial extension of Pade's results for the exponential function

was obtained by Arms and Edrei [3]. They proved convergence of the approximants for the class of functions generated by totally positive sequences (also called Pólya frequency series).

The PAs for the class of Stieltjes functions have particularly elegant

properties. Here we discuss Stieltjes functions that can be written in the form

(4.15)  $f(z) = \displaystyle\int_0^b \dfrac{d\mu(t)}{1 + zt}\,,$

where $\mu$ is a finite positive measure on $[0,b]$, with $0 < b < \infty$. Such a function is analytic in the cut plane $\mathbb{C}^*\setminus(-\infty, -1/b]$ and has the power series


expansion $f(z) = \sum_{k=0}^{\infty} (-1)^k c_k z^k$, where the $c_k$'s are the moments

(4.16)  $c_k := \displaystyle\int_0^b t^k\, d\mu(t), \qquad k = 0, 1, \ldots.$

As we now show, the Padé denominators $Q_{n-1,n}$ for $f$ are related to the polynomials that are orthogonal with respect to $d\mu$. Starting with the defining property

$(Q_{n-1,n}\, f - P_{n-1,n})(z) = O(z^{2n}),$

we replace $z$ by $-1/z$ and multiply by $z^n$ to obtain

(4.17)  $q_n(z) \displaystyle\int_0^b \dfrac{z\, d\mu(t)}{z - t} - z\, p_{n-1}(z) = O(1/z^n),$

where $q_n(z) := z^n Q_{n-1,n}(-1/z) \in \Pi_n$ and $p_{n-1}(z) := z^{n-1} P_{n-1,n}(-1/z) \in \Pi_{n-1}$. Then for $j = 0, 1, \ldots$, we have

(4.18)  $q_n(z) \displaystyle\int_0^b \dfrac{t^j\, d\mu(t)}{z - t} - \sigma_j(z) = O(z^j/z^{n+1}),$

where $\sigma_j$ is a polynomial.

Next, we integrate with respect to $z$ around a simple closed contour containing $[0,b]$ in its interior. Using the Cauchy formula, we find that

$\displaystyle\int_0^b q_n(t)\, t^j\, d\mu(t) = 0, \qquad \text{for } j = 0, 1, \ldots, n-1;$

that is,

(4.19)  $q_n(z) = z^n Q_{n-1,n}(-1/z)$

is the $n$-th degree orthogonal polynomial for $d\mu$. One consequence of this relation is that the zeros of $Q_{n-1,n}(z)$ must be simple and lie on the cut $(-\infty, -1/b)$. On writing the approximant $[(n-1)/n]$ in the form

(4.20)  $[(n-1)/n](z) = \dfrac{P_{n-1,n}(z)}{Q_{n-1,n}(z)} = \sum_{j=1}^{n} \dfrac{A_{nj}}{1 + z t_{nj}}\,,$

where the $t_{nj}$'s are the zeros of $q_n(t)$, we deduce in a similar manner from (4.17) that

(4.21)  $\displaystyle\int_0^b P(t)\, d\mu(t) = \sum_{j=1}^{n} A_{nj}\, P(t_{nj})$

for any polynomial $P \in \Pi_{2n-1}$. Hence, the constants $A_{nj}$ are the Christoffel


numbers for Gaussian quadrature (cf. [52]). Since Christoffel numbers are positive, we see from (4.20) that the approximant $[(n-1)/n](z)$ is itself a Stieltjes function of the form (4.15) corresponding to a discrete measure $d\mu_n$. In particular, the zeros and poles of this approximant are interlaced along the cut $(-\infty, -1/b)$. With all these facts in hand, a simple normal families argument can be used to prove that $[(n-1)/n](z) \to f(z)$ in $\mathbb{C}^*\setminus(-\infty, -1/b]$.

This is a classical result due to Markoff [33], which has been further extended

by Stahl [50].
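A minimal numerical sketch of Markoff's theorem (assuming NumPy and SciPy) for the hypothetical Stieltjes example $d\mu = dt$ on $[0,1]$, so that $f(z) = \int_0^1 dt/(1+zt) = \log(1+z)/z$. At $z = 10$, far outside the disk of convergence $|z| < 1$ of the Taylor series, the approximants $[(n-1)/n]$ still converge.

```python
import numpy as np
from scipy.interpolate import pade

z = 10.0
exact = np.log(1.0 + z) / z                  # f(10), about 0.2398

for n in [2, 4, 6, 8]:
    a = [(-1.0) ** k / (k + 1) for k in range(2 * n)]   # a_k = (-1)^k c_k, c_k = 1/(k+1)
    p, q = pade(a, n)                        # [(n-1)/n]: numerator degree n-1 by default
    print(n, abs(exact - p(z) / q(z)))       # errors decay geometrically
```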

The example of Stieltjes functions shows that the Pade theory pro­

vides a natural setting for generalizing the classical theory of orthogonal

polynomials. In this regard, convergence results for PAs to functions of the

form (4.15) with $\mu$ a complex measure, were obtained by Magnus [32], Nuttall

and Wherry [38], and Stahl [51].

Many commonly occurring functions have "smooth" Taylor coefficients in the sense that $a_{k-1} a_{k+1} / a_k^2$ has a limit as $k \to \infty$. Convergence properties of the PAs for such functions were investigated by Lubinsky [29], [30].

Space limitations preclude a discussion of results concerning the

convergence of subsequences of the rows, columns, or diagonals of the Pade

table. We also leave it for the reader to delve into results (such as the

Nuttall-Pommerenke theorem [5, §6.5]) that deal with the convergence in capacity

of near diagonal PAs.

We do wish to emphasize that various generalizations of PAs exist

that are quite useful; e.g. multipoint Pade approximants (rational functions

found by interpolation in distinct points), Faber-Pade approximants (rational

functions whose Faber series matches the Faber expansion of f as far as pos­

sible), multivariate Pade approximants; etc. (A dictionary [31] of these

generalizations is available, upon request, from this author.)

5. RATIONAL VERSUS POLYNOMIAL APPROXIMATION (WHAT A DIFFERENCE A DIVISION

MAKES!)

We now discuss some essential differences between polynomial and

rational approximation in the complex variable setting. Some of the contrasts

are rooted in function theoretic properties, while others are more typical of

linear vs. nonlinear approximation theory (cf. the forthcoming book of Braess [9]).

Possibility of Convergence. For rational approximation, Runge's classical

theorem asserts that if f is analytic on a compact set E, then f is the

uniform limit on E of a sequence of rational functions. Unlike its polynomial

version (Theorem 2.1), the hypothesis that $\mathbb{C}\setminus E$ be connected is not needed; it

is compensated for by choosing rational approximants that have poles in the

components of $\mathbb{C}\setminus E$. For example, a function analytic on the annulus


$E : r_1 \le |z| \le r_2$ is the uniform limit of rational functions that have poles at $z = 0$ and $z = \infty$ (think of its Laurent series!).

To describe the more delicate problem of approximating functions in $A(E)$, we let $\Pi(E)$ denote the uniform limits on E of polynomials, and $R(E)$ denote the uniform limits on E of rational functions whose poles lie outside E. Then the theorem of Mergelyan (Theorem 3.4) states that $A(E) = \Pi(E)$ if and only if $\mathbb{C}\setminus E$ is connected. In contrast, the compact sets E for which $A(E) = R(E)$ cannot be characterized topologically; that is, this property is not invariant under a homeomorphism of the plane (cf. [20]). The most popular (and most tasteful) example of a compact set E for which $A(E) \ne R(E)$ is the Swiss cheese of A. Roth (cf. [17]), which she manufactured by removing a countable number of disjoint open disks from the closed unit disk. For further discussion of the possibility of rational approximation see Gamelin [18].

Existence of Best Approximants. For an arbitrary compact set E, the existence of best polynomial approximants from $\Pi_m$ follows from a simple compactness argument. However, for best rational approximants from $\Pi_{m,n}$ ($n > 0$), this argument must be modified to handle the possibility of poles tending to the boundary of E. Using normal families, Walsh [62, §12.2] proved that best rational approximants exist provided E contains no isolated points.

Uniqueness of Best Approximants. If $f \in C[a,b]$ is real-valued, then Chebyshev showed that the best uniform approximation to $f$ on $[a,b]$ out of

(5.1)  $\Pi_{m,n}^{\mathbb{R}} := \{R \in \Pi_{m,n} : R \text{ has real coefficients}\}$

is unique (cf. [34, §9.2]). Surprisingly, this is no longer true if approximation to a real-valued $f$ is done from $\Pi_{m,n}$; that is, if we allow rational approximants with complex coefficients. Indeed, as was shown by Saff and Varga [44], the function $f(x) = x^2$ has no unique best uniform approximation on $[-1,1]$ out of $\Pi_{1,1}$ (any such best rational $r_{11}$ has complex coefficients, so that $\overline{r_{11}(\bar z)}$ is also best). Further examples of this type, as well as non-uniqueness results for approximation on a disk, can be found in [25], [42].

Given $f \in A(E)$ we can nonetheless construct a table of best uniform rational approximants to $f$ on E by making a specific choice for each pair $(m,n)$. This analogue of the Padé table is called the Walsh array. The convergence theory for this array closely parallels the theory for the Padé table (e.g. Walsh [64] proved an analogue of Theorem 4.3). Moreover, the Padé table can be viewed as a limiting version of Walsh arrays where best approximation is done on disks $E_\varepsilon : |z| \le \varepsilon$ with $\varepsilon \to 0$ (cf. [55], [63]).

Degree of Convergence of Best Approximants. For $f \in A(E)$, we set


$E_n(f) := \inf\{\|f - p\|_E : p \in \Pi_n\}, \qquad e_n(f) := \inf\{\|f - R\|_E : R \in \Pi_{n,n}\}.$

Clearly, $e_n(f) \le E_n(f)$ for all $n$, and so the essential question is: Can $e_n(f)$ tend to zero substantially faster than $E_n(f)$? (Let's rule out the trivial situation where $f$ itself is rational.) The now famous example of Newman [36] answered this affirmatively for $f(x) = |x|$ on $E : [-1,1]$, where $E_n(f) \approx 1/n$, while $e_n(f) \approx e^{-\pi\sqrt{n}}$ (cf. [12], [58]). Another example of the contrast is readily accessible to the reader. Using a simple calculus argument, one shows that for the partial sums $s_n(x) := \sum_{k=0}^{n} x^k/k!$ of $e^x$, there holds for the sup norm on $[0,\infty)$,

$\limsup_{n\to\infty} \|e^{-x} - 1/s_n(x)\|_{[0,\infty)}^{1/n} \le 1/2.$

Replacing $x$ by $(1+x)/(1-x)$ we see that for $f(x) := \exp[-(1+x)/(1-x)]$ and $E : [-1,1]$,

$\limsup_{n\to\infty} e_n(f)^{1/n} \le 1/2.$

On the other hand, Theorem 3.6 asserts that

$\limsup_{n\to\infty} E_n(f)^{1/n} = 1$

because $f$ is not analytic at $x = 1$.
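A minimal numerical sketch of Newman's construction (assuming NumPy): with $\xi = e^{-1/\sqrt{n}}$ and $p(x) = \prod_{k=0}^{n-1}(x + \xi^k)$, the rational function $r_n(x) = x\,(p(x)-p(-x))/(p(x)+p(-x))$ satisfies $\|\,|x| - r_n\,\|_{[-1,1]} \le 3e^{-\sqrt{n}}$, in stark contrast to the $1/n$ polynomial rate. (Moderate $n$ only: the raw products underflow in double precision for large $n$.)

```python
import numpy as np

def newman(x, n):
    """Newman's rational approximant of degree n to |x| on [-1, 1]."""
    xi = np.exp(-1.0 / np.sqrt(n))
    k = np.arange(n)
    p = lambda t: np.prod(t[:, None] + xi ** k, axis=1)
    return x * (p(x) - p(-x)) / (p(x) + p(-x))

x = np.linspace(-1, 1, 20000)     # even count keeps x = 0 (a 0/0 point) off the grid
for n in [4, 16, 64]:
    err = np.max(np.abs(np.abs(x) - newman(x, n)))
    print(n, err, 3 * np.exp(-np.sqrt(n)))    # error obeys the bound 3 e^{-sqrt(n)}
```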

At present, there is no simple characterization of the functions $f$ for which $e_n(f) \ll E_n(f)$. However, some special classes of functions have

been investigated in this direction. For example, Goncar [23] has obtained the

precise geometric rate of convergence for rational approximation on an interval to

Stieltjes functions of the form (4.15). Several important results for classes

of real functions were obtained by Popov [40], Freud [16], and others. See also

the survey articles of Ganelius [19] and Newman [37] on the subject.

Analytic Continuation. A simple but important observation concerning the

class $\Pi_{n,n}$ of rational functions is that it is invariant under a bilinear trans­

formation. Thus, unlike polynomials, rational functions can provide analytic

continuations of functions to unbounded regions of the plane. The convergence

properties of the diagonal Padé approximants to Stieltjes functions illustrate this point. Another example of the contrast is for Newman's example. It can be shown that the sequence of polynomials $\{p_n^*\}_0^\infty$ of best uniform approximation to $f(x) = |x|$ on $[-1,1]$ diverges on every continuum in $\mathbb{C}\setminus[-1,1]$; moreover (analogous to the Jentzsch theorem of §1), every point of $[-1,1]$ is a limit point of zeros of the $p_n^*$ (cf. [7]). On the other hand, the best (real) rational approximants


$R_n^*$ to $|x|$ out of $\Pi_{n,n}$ have all their zeros and poles on the imaginary axis and satisfy (cf. [8])

$\lim_{n\to\infty} R_n^*(z) = \begin{cases} z & \text{for } \mathrm{Re}\, z > 0, \\ -z & \text{for } \mathrm{Re}\, z < 0. \end{cases}$

REFERENCES

1. V. M. Adamjan, D. Z. Arov, and M. G. Krein, "Analytic properties of Schmidt pairs for a Hankel operator and the generalized Schur-Takagi problem", Math. USSR Sbornik, 15 (1971), 31-73.

2. J. M. Anderson, "The Faber operator", In: Rational Approximation and Interpolation (P.R. Graves-Morris, E. B. Saff, and R. S. Varga, eds.), Lecture Notes in Math., Vol. 1105, Springer-Verlag, Berlin (1984), 1-10.

3. R. J. Arms and A. Edrei, "The Pade tables and continued fractions generated by totally positive sequences", In: Mathematical Essays Dedicated to A. J. Macintyre, Ohio University Press, Athens, Ohio (1970), 1-21.

4. G. A. Baker, Jr., Essentials of Padé Approximants, Academic Press, New York (1975).

5. G. A. Baker, Jr. and P. R. Graves-Morris, Pade Approximants Part I: Basic Theory, Encyl. of Math., Vol. 13, Cambridge Univ. Press, Cambridge (1981).

6. G. A. Baker, Jr. and P. R. Graves-Morris, Pade Approximants Part II: Extensions and Applications, Encyl. of Math., Vol. 14, Cambridge Univ. Press, Cambridge (1981).

7. H.-P. Blatt and E. B. Saff, "Behavior of zeros of polynomials of near best approximation", J. Approx. Theory, 46 (1986).

8. H.-P. Blatt, A. Iserles, and E. B. Saff, "Remarks on the behavior of zeros of best approximating polynomials and rational functions", (to appear).

9. D. Braess, Nonlinear Approximation, Springer-Verlag, Berlin, (to appear).

10. R. P. Brent, F. G. Gustavson, and D. Y. Yun, "Fast solution of Toeplitz systems of equations and computation of Pade approximants", Journal of Algorithms, 1 (1980), 259-295.

11. C. Brezinski, "The long history of continued fractions and Pade approximants", In: Pade Approximations and Applications (M. G. de Bruin, H. van Rossum, eds.), Lecture Notes in Math., Vol. 888, Springer-Verlag, Berlin (1981), 1-27.

12. A. P. Bulanov, "The asymptotics of the maximum deviation of |x| from rational functions", Mat. Sb. 76 (1968), 288-303.

13. J. H. Curtiss, "Faber polynomials and the Faber series", Amer. Math. Monthly, 78 (1971), 577-596.


14. P. Dienes, The Taylor Series, Dover, New York (1957).

15. G. Faber, "Über polynomische Entwicklungen", Math. Ann. 57 (1903), 398-408.

16. G. Freud, "Über die Approximation reeller Funktionen durch rationale gebrochene Funktionen", Acta Math. Acad. Sci. Hungar, 17 (1966), 313-324.

17. D. Gaier, Vorlesungen über Approximation im Komplexen, Birkhäuser Verlag, Basel (1980).

18. T. W. Gamelin, Uniform Algebras, Prentice-Hall, Englewood Cliffs, N.J. (1969).

19. T. Ganelius, W. K. Hayman and D. J. Newman, Lectures on Approximation and Value Distribution, Séminaire de Mathématiques Supérieures, Les Presses de l'Université de Montréal, Montréal, Canada (1982).

20. P. M. Gauthier, "On the possibility of rational approximation", In: Pade and Rational Approximation (E. B. Saff and R. S. Varga, eds.), Academic Press, New York (1977), 261-264.

21. K. 0. Geddes and J. C. Mason, "Polynomial approximation by projections on the unit circle", SIAM J. Numer. Anal., 12 (1975), 111-120.

22. G. M. Golusin, Geometric Theory of Functions of a Complex Variable, Amer. Math. Soc., Vol. 26, Providence, R.I. (1969).

23. A. A. Goncar, "On the speed of rational approximation of some analytic functions", Math. USSR Sbornik, 34 (1978), 131-145.

24. W. B. Gragg, "The Padé table and its relation to certain algorithms of numerical analysis", SIAM Rev., 14 (1972), 1-62.

25. M. H. Gutknecht and L. N. Trefethen, "Nonuniqueness of best rational Chebyshev approximations on the unit disk", J. Approx. Theory, 39 (1983), 275-288.

26. P. Henrici, Applied and Computational Complex Analysis, Vol. I, John Wiley & Sons, New York (1974).

27. E. Hille, Analytic Function Theory (Introduction to Higher Mathematics, vol. II), Ginn and Co., Boston (1962).

28. N. S. Landkof, Foundations of Modern Potential Theory, Springer-Verlag, Berlin (1972).

29. D. S. Lubinsky, "Pade tables of entire functions of very slow and smooth growth", Constr. Approx. 1 (1985), 349-358.

30. D. S. Lubinsky, "Uniform convergence of rows of the Pade table for functions with smooth Maclaurin series coefficients", (to appear).


31. D. S. Lubinsky and E. B. Saff, "A dictionary of generalized Pade approximants", Institute for Constr. Math. Technical Report (1986), Univ. of South Fla.

32. A. P. Magnus, "Another theorem of convergence of complex weight Pade approximants", Pade Meeting, Luminy, France, 14-18 Oct. 1985.

33. A. Markoff, "Deux démonstrations de la convergence de certaines fractions continues", Acta Math., 19 (1895), 93-104.

34. G. Meinardus, Approximation of Functions: Theory and Numerical Methods, Springer-Verlag, Berlin (1967).

35. S. N. Mergelyan, "On the representation of functions by series of polynomials on closed sets", (Russian) Dokl. Akad. Nauk SSSR, 78, 405-408; Translations Amer. Math. Soc. No. 85 (1953).

36. D. J. Newman, "Rational approximation to |x| ", Michigan Math. J., 11 (1964), 11-14.

37. D. J. Newman, Approximation with Rational Functions, Regional Conference Series in Math, Vol. 41, Amer. Math. Soc, Providence, R.I. (1979).

38. J. Nuttall and C. J. Wherry, "Gaussian integration for complex weight Pade functions", J. Inst. Math. Appl., 21 (1978), 165-170.

39. O. Perron, Die Lehre von den Kettenbrüchen, Chelsea Pub. Co., New York (1929).

40. V. A. Popov, "Uniform rational approximation of the class V and its applications", Acta Math. Acad. Sci. Hungar, 29 (1977), 119-129.

41. W. Rudin, Real and Complex Analysis, McGraw-Hill, New York (1974).

42. A. Ruttan, "On the cardinality of a set of best complex rational approximations to a real function", In: Pade and Rational Approximations (E. B. Saff and R. S. Varga, eds.), Academic Press, New York (1977), 303-319.

43. E. B. Saff, "An extension of Montessus de Ballore's theorem on the convergence of interpolating rational functions", Journ. Approx. Theory, 6 (1972), 63-67.

44. E. B. Saff and R. S. Varga, "Nonuniqueness of best complex rational approximation to real functions on real intervals", J. Approx. Theory, 23 (1978), 78-85.

45. E. B. Saff and R. S. Varga, "On the zeros and poles of Padé approximants to $e^z$, III", Numer. Math., 30 (1978), 241-266.

46. E. B. Saff, "An introduction to the convergence theory of Pade approximants", In: Aspects of Contemporary Complex Analysis (D. A. Brannan, J. G. Clunie, eds.), Academic Press, New York (1980), 493-502.


47. W. E. Sewell, Degree of Approximation by Polynomials in the Complex Domain, Ann. of Math. Studies No. 9, Princeton Univ. Press, Princeton, N.J. (1942).

48. H. S. Shapiro, Topics in Approximation Theory, Lecture Notes in Math., Vol. 187, Springer-Verlag, Berlin (1971).

49. V. I. Smirnov and N. A. Lebedev, Functions of a Complex Variable: Constructive Theory, Iliffe Books Ltd., London (1968).

50. H. Stahl, Beiträge zum Problem der Konvergenz von Padé-Approximierenden, Dissertation, Technische Universität Berlin (1976).

51. H. Stahl, "Orthogonal polynomials with complex valued weight function", I & II, Constr. Approx. (to appear).

52. G. Szegő, Orthogonal Polynomials, 3rd ed., Amer. Math. Soc. Colloq. Pub., Vol. 23, Amer. Math. Soc., Providence, R.I. (1967).

53. L. N. Trefethen, "Near-circularity of the error curve in complex Chebyshev approximation", J. Approx. Theory, 31 (1981), 344-367.

54. L. N. Trefethen, "Rational Chebyshev approximation on the unit disk", Numer. Math., 37 (1981), 297-320.

55. L. N. Trefethen and M. H. Gutknecht, "On convergence and degeneracy in rational Pade and Chebyshev approximation," SIAM J. Math. Anal., 16 (1985), 198-210.

56. L. N. Trefethen and M. H. Gutknecht, "The Caratheodory-Fejer method for real rational approximation", SIAM J. Numer. Anal., 20 (1983), 420-436.

57. M. Tsuji, Potential Theory in Modern Function Theory, Dover, New York (1959).

58. N. S. Vjaceslavov, "On uniform approximation of |x| by rational functions", Soviet Math. Dokl.,16 (1975), 100-104.

59. H. S. Wall, Analytic Theory of Continued Fractions, Van Nostrand, Princeton, N.J. (1948).

60. H. Wallin, "On the convergence theory of Padé approximants", In: Linear Operators and Approximation, ISNM Vol. 20, Birkhäuser, Basel (1972), 461-469.

61. J. L. Walsh, "Über die Entwicklung einer analytischen Funktion nach Polynomen", Math. Ann., 96 (1926), 430-436.

62. J. L. Walsh, Interpolation and Approximation by Rational Functions in the Complex Domain, 3rd ed., Amer. Math. Soc. Colloq. Publ., Vol. 20, Amer. Math. Soc., Providence, R.I. (1960).

63. J. L. Walsh, "Pade approximants as limits of rational functions of best approximation", J. Math. Mech., 13 (1964), 305-312.


64. J. L. Walsh, "The convergence of sequences of rational functions of best approximation with some free poles", In: Proc. Sympos. Approximation of Functions (General Motors Res. Lab., 1964), Elsevier, Amsterdam (1965), 1-16.

E. B. Saff Institute for Constructive Mathematics Department of Mathematics University of South Florida Tampa, Florida 33620


Proceedings of Symposia in Applied Mathematics Volume 36, 1986

N-WIDTHS AND OPTIMAL RECOVERY

A. PINKUS

ABSTRACT. These lecture notes are intended as a short introduction to the theory of n-widths, and to the theory of optimal recovery. Some simple examples are studied in detail in an attempt to explain and motivate the main ideas.

1. GENERAL INTRODUCTION. In this lecture I hope to whet the reader's interest in both the theory of n-widths and the theory of optimal recovery. As such, these notes are intended as an introduction to the subject matter,

and not as an overview or survey.

The twinning of the two topics of n-widths and optimal recovery in one

lecture is somewhat artificial. Nonetheless a relationship does exist in the

type of problems considered. Both subjects differ from the more classical

problems of approximation theory in that they are concerned with determining

optimal subspaces, operators, algorithms, or whatever, with which to

approximate elements of an a priori given set. However because we are dealing

with two topics we have divided these notes into two distinct parts. In

Section A we discuss n-widths, and in Section B optimal recovery. In each of

the sections we present a simple example and try, using the example, to

motivate some of the main concepts and ideas of the theory. Readers interested

in more comprehensive surveys are urged to consult references [3], [4], [6],

[7] and [9].

A. N-WIDTHS

2. INTRODUCTION. Perhaps "the" classic problem of approximation theory is the

following. Given an element x of a normed linear space X, and an n-dimensional

subspace Xn of X, find a best approximation to x from X , and determine the

value of the error, i.e. the measure of the distance of x from a best

approximant. Thus, for example, if $X = H$ is a Hilbert space over $\mathbb{C}$ with inner product $(\cdot,\cdot)$, and $X_n$ is spanned by $v_1, \ldots, v_n$ which, for convenience, we

1980 Mathematics Subject Classification. 41A46, 41A65. © 1986 American Mathematical Society


assume to be an orthonormal basis for $X_n$, then $\sum_{i=1}^{n} (x, v_i)\, v_i$ is the unique best approximant to $x$ from $X_n$. The "error", generally denoted as $E(x; X_n)$, may be expressed by

$E(x; X_n) = \Big[\, \|x\|^2 - \sum_{i=1}^{n} |(x, v_i)|^2 \,\Big]^{1/2}.$

Very often one is not so much interested in approximating a given element of X, but in approximating a subset A of X. By this we mean determining

$E(A; X_n) = \sup_{x \in A} \inf_{y \in X_n} \|x - y\|.$

There are many reasons for considering such a quantity. One is often not

interested in a best approximant or in the specific error obtained, but in

measuring the error in terms of some other criteria such as, for example,

smoothness. We consider an example of such a problem.

We denote by $W_2^{(r)}[0, 2\pi]$ the (Sobolev) space of real-valued $2\pi$-periodic functions on $\mathbb{R}$ for which $f^{(r-1)}$ is absolutely continuous and whose $r$-th derivative on $[0, 2\pi]$ exists as a function of $L_2[0, 2\pi]$. Set $X = L_2[0, 2\pi]$, and

$A\ \big(= B_2^{(r)}\big) = \{f : f \in W_2^{(r)}[0, 2\pi],\ \|f^{(r)}\|_2 \le 1\}.$

Let $T_n$ denote the $(2n+1)$-dimensional subspace of trigonometric polynomials of degree $n$, i.e.

$T_n = \mathrm{span}\{1, \sin x, \cos x, \ldots, \sin nx, \cos nx\}.$

It is not at all difficult to prove that $E(A; T_n) = (n+1)^{-r}$ for all non-negative integers $n$ and $r$.

A more common, but totally equivalent way of stating this result is the following. For every $f \in W_2^{(r)}[0, 2\pi]$,

$E(f; T_n) \le (n+1)^{-r}\, \|f^{(r)}\|_2,$

and $(n+1)^{-r}$ is the best constant in the above inequality. It is often to obtain inequalities of this form, with best constants, that we study the quantities $E(A; X_n)$.
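A minimal numerical check of sharpness for $r = 1$ (assuming NumPy): the function $f(x) = \sin((n+1)x)/(n+1)$ is orthogonal to $T_n$, so its best $L_2$ approximation from $T_n$ is zero and $E(f; T_n) = \|f\|_2 = (n+1)^{-1}\|f'\|_2$.

```python
import numpy as np

n, M = 3, 4096
x = np.linspace(0, 2 * np.pi, M, endpoint=False)
f = np.sin((n + 1) * x) / (n + 1)
fp = np.cos((n + 1) * x)                       # f'

F = np.fft.rfft(f)
F[: n + 1] = 0.0                               # delete the frequencies 0..n spanning T_n
residual = np.fft.irfft(F, M)                  # f minus its best L2 approximant from T_n

dx = 2 * np.pi / M
E = np.sqrt(np.sum(residual ** 2) * dx)        # E(f; T_n)
bound = np.sqrt(np.sum(fp ** 2) * dx) / (n + 1)
print(E, bound)                                # equal: the constant (n+1)^{-1} is attained
```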

As above, we assume that X is a normed linear space and A a subset of X.

For each $n$-dimensional subspace $X_n$, we have the associated quantity $E(A; X_n)$. In

1936, Kolmogorov [1] proposed the following idea. Instead of considering E(A;Xn)

for different but specific Xn, let us vary E(A;Xn) over all n-dimensional

subspaces $X_n$ of X. We then search for n-dimensional subspaces (if they exist)

which best approximate A, and also for the associated minimum value of E(A;Xn).

To state this in more precise mathematical terms, we have

DEFINITION 1. X is a normed linear space and A a subset of X. The n-width

of Kolmogorov of A in X is given by


$d_n(A; X) = \inf_{X_n} E(A; X_n) = \inf_{X_n} \sup_{x \in A} \inf_{y \in X_n} \|x - y\|,$

where the left-most infimum is taken over all $n$-dimensional subspaces $X_n$ of $X$.

We would like, if possible, to identify $n$-dimensional subspaces $X_n$ of $X$ for which $d_n(A; X) = E(A; X_n)$. Such subspaces are quite naturally said to be optimal for $d_n(A; X)$.

The quantity dn(A;X) measures the extent to which A may be approximated by

n-dimensional subspaces of X. It is not only that dn(A;X) is an interesting

theoretical quantity (and it is), but also that knowledge of it can help us in

other problems. For example, suppose that while we may or may not know $d_n(A; X)$ precisely, we do know something about its asymptotic behaviour as $n \to \infty$. $d_n(A; X)$ is a lower bound on the extent to which A is approximable by $n$-dimensional subspaces. As such, if we have a given sequence $\{X_n\}$ of $n$-dimensional subspaces and estimates for $E(A; X_n)$, then it is possible to judge whether it is

worthwhile spending energy, time, and money, in using better but more

complicated subspaces in our approximation process.

In the above general framework, very little of interest can be said. But

let us consider a specific example in detail. Before doing so we remark that

many other n-width concepts now abound in the literature. We will touch upon

some of these in the next few pages.

3. EXAMPLE. Set $X = L^\infty[0,1]$. The (Sobolev) space $W_\infty^{(1)}[0,1]$ is the set of absolutely continuous real-valued functions defined on $[0,1]$ for which $f'$ exists a.e. as an element of $L^\infty[0,1]$. We define

$(A =)\ B_\infty^{(1)} = \{f : f \in W_\infty^{(1)}[0,1],\ \|f'\|_\infty \le 1\},$

and we are interested in the quantity $d_n(B_\infty^{(1)}; L^\infty)$.

A natural approximating subspace to consider is $X_n = \pi_{n-1}$, where $\pi_{n-1}$ is the set of algebraic polynomials of degree $n-1$ (dimension $n$). The quantity $E(B_\infty^{(1)}; \pi_{n-1})$ has been much studied and although an exact formula is not known to the best of my knowledge, there does exist from Jackson's Theorem (see e.g. [8, p.22]) the upper bound $E(B_\infty^{(1)}; \pi_{n-1}) \le 3/(n-1)$.

We now consider an even more elementary subspace. Let $S_n$ denote the subspace (of dimension $n$) of left-continuous step functions with jumps at $i/n$, $i = 1, \ldots, n-1$. Thus $s \in S_n$ if $s(x) = c_i$ on $((i-1)/n, i/n]$, $i = 1, \ldots, n$, for some choice of real constants $\{c_i\}_1^n$. (Modify the first interval to include the point zero.) We claim

PROPOSITION 1. $E(B_\infty^{(1)}; S_n) = 1/(2n)$, $n = 1, 2, \ldots$.


PROOF. For $f \in B_\infty^{(1)}$, let $s_f \in S_n$ be uniquely defined by $s_f((2i-1)/2n) = f((2i-1)/2n)$, $i = 1, \ldots, n$. We claim that $\|f - s_f\|_\infty \le 1/2n$. For $x \in [0,1]$, we have $x \in ((j-1)/n, j/n]$ for some $j = 1, \ldots, n$ (recall that $x = 0$ is in the first interval). Thus $f(x) - s_f(x) = f(x) - f((2j-1)/2n)$. Since $\|f'\|_\infty \le 1$, it follows that

$|f(x) - f((2j-1)/2n)| \le |x - (2j-1)/2n| \le 1/2n.$

Thus $\|f - s_f\|_\infty \le 1/2n$. To prove the desired equality, it remains to find an $f^* \in B_\infty^{(1)}$ for which $E(f^*; S_n) = 1/2n$. Set $f^*(x) = x$. Then it is easily seen that $E(f^*; S_n) = 1/2n$, and $s_{f^*}$ is in fact the unique best approximation to $f^*$ from $S_n$. This proves the proposition. □
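A minimal numerical sketch of this construction (assuming NumPy): interpolate at the midpoints $(2i-1)/2n$ and measure the sup error for a few test functions with $\|f'\|_\infty \le 1$; the error stays below $1/2n$ and attains it for $f^*(x) = x$.

```python
import numpy as np

def step_error(f, n, M=100001):
    """Sup error of the step function in S_n interpolating f at interval midpoints."""
    x = np.linspace(0, 1, M)
    i = np.clip(np.ceil(x * n).astype(int), 1, n)     # interval index: x in ((i-1)/n, i/n]
    sf = f((2 * i - 1) / (2.0 * n))                   # midpoint value on each interval
    return np.max(np.abs(f(x) - sf))

n = 10
for f in [lambda t: t, np.sin, lambda t: np.cos(t) / 2]:   # all satisfy |f'| <= 1 on [0,1]
    print(step_error(f, n), 1.0 / (2 * n))
```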

We claim that $d_n(B_\infty^{(1)}; L^\infty) = 1/2n$, for $n = 1, 2, \ldots$ ($d_0(B_\infty^{(1)}; L^\infty) = \infty$ since every constant function is in $B_\infty^{(1)}$). From Proposition 1 we have $d_n(B_\infty^{(1)}; L^\infty) \le 1/2n$. It is therefore necessary to prove the lower bound, i.e. $E(B_\infty^{(1)}; X_n) \ge 1/2n$ for every $n$-dimensional subspace $X_n$ of $L^\infty[0,1]$. This problem is non-linear in nature and is generally the more difficult. In the proof of this result we use the following general theorem which we will not prove.

THEOREM 2 [2]. Let X be a normed linear space. Let $X_{n+1}$ and $X_n$ be $(n+1)$- and $n$-dimensional subspaces of X, respectively. There then exists an $x \in X_{n+1} \setminus \{0\}$ for which

$E(x; X_n) = \|x\|,$

i.e. the zero element is a best approximation to $x$ from $X_n$.

This theorem is often used in obtaining lower bounds for n-widths. Let us see how.

Let $L_{n+1}$ denote the $(n+1)$-dimensional subspace of continuous functions on $[0,1]$ which are linear on each $[(i-1)/n, i/n]$, $i = 1, \ldots, n$. The key to the lower bound is Theorem 2 and this next result.

PROPOSITION 3. If $f \in L_{n+1}$ and $\|f\|_\infty \le 1/2n$, then $f \in B_\infty^{(1)}$.

PROOF. Obviously $L_{n+1} \subset W_\infty^{(1)}$. We must therefore prove that if $f \in L_{n+1}$ and $\|f\|_\infty \le 1/2n$, then $\|f'\|_\infty \le 1$. Assume that $f \in L_{n+1}$ and $\|f\|_\infty \le 1/2n$. Then $|f(i/n)| \le 1/2n$, $i = 0, 1, \ldots, n$. Since $f$ is linear on $[(i-1)/n, i/n]$, $i = 1, \ldots, n$, then for $x \in ((i-1)/n, i/n)$,

$|f'(x)| = n\,|f(i/n) - f((i-1)/n)| \le 1. \quad \square$

PROPOSITION 4. If $X_n$ is any $n$-dimensional subspace of $L^\infty[0,1]$, then

$E(B_\infty^{(1)}; X_n) \ge 1/2n.$

PROOF. Let $L_{n+1}$ be as above, and $X_n$ be any $n$-dimensional subspace of $L^\infty[0,1]$. Then from Theorem 2 there exists a non-zero $f \in L_{n+1}$, which we normalize


so that $\|f\|_\infty = 1/2n$, satisfying

$E(f; X_n) = \|f\|_\infty = 1/2n.$

From Proposition 3, $f \in B_\infty^{(1)}$. Thus

$E(B_\infty^{(1)}; X_n) \ge 1/2n. \quad \square$

To summarize, we have proved the following result.

THEOREM 5. Let $B_\infty^{(1)}$ be as previously defined. Then $d_n(B_\infty^{(1)}; L^\infty) = 1/2n$, $n = 1, 2, \ldots$. Furthermore $S_n$, the $n$-dimensional subspace of step functions with jumps at $i/n$, $i = 1, \ldots, n-1$, is an optimal subspace for $d_n(B_\infty^{(1)}; L^\infty)$.

REMARK. Before proving Theorem 5 we noted that $E(B^{(1)}_\infty; \pi_{n-1}) \le 3/(n-1)$. While we do not know if algebraic polynomials of degree $n-1$ are optimal for $d_n(B^{(1)}_\infty; L^\infty)$, it does follow from the above inequality that they are at least asymptotically optimal, in the sense that both quantities decrease to zero at the same rate.

4. OTHER N-WIDTHS. Theorem 5 is a special case of a more general result. We

defer the statement of the general result to the next section. We will now use

the above example and its proof to motivate additional problems and definitions.

4.A. LINEAR N-WIDTH. In the proof of Proposition 1 we obtained the upper bound $1/2n$ by the simple linear process of interpolating from $S_n$ to each $f \in B^{(1)}_\infty$ at $(2i-1)/2n$, $i = 1,\dots,n$. That is, we did not calculate the best approximation to each $f \in B^{(1)}_\infty$ from $S_n$; we calculated instead a linear approximation. This suffices because the quantity $E(A;X_n)$ is a "worst case" measure. It is not necessary when calculating $E(A;X_n)$ that we actually determine $E(x;X_n)$ for each $x \in A$.

In a Hilbert space setting the best approximation is an orthogonal projection and is therefore a linear approximation. This is no longer the case in a non-Hilbert space setting, for there the best approximation operator is a nonlinear operator which is generally exceedingly difficult to determine exactly. Linear approximations are easier to calculate, and are of interest in and of themselves. Let us therefore consider linear approximations rather than best approximations. In other words, we replace $E(A;X_n)$ by

$$E(A; P_n; X_n) = \sup_{x \in A} \|x - P_n x\|,$$

where $P_n$ is a continuous linear operator from $X$ to $X_n$. ($P_n$ is said to be of rank $n$ if its range space is of dimension $n$.)

Analogous to the Kolmogorov n-width we now define what is termed the

linear n-width of A in X.


DEFINITION 2. $X$ is a normed linear space and $A$ a subset of $X$. The linear n-width of $A$ in $X$ is defined by

$$\delta_n(A;X) = \inf_{P_n} \sup_{x \in A} \|x - P_n x\|,$$

where the infimum is taken over all continuous linear operators $P_n$ of rank at most $n$.

If $\delta_n(A;X) = E(A; P_n^*; X_n^*)$ where $P_n^*$ is a continuous linear operator of rank $\le n$, then $P_n^*$ is said to be optimal for $\delta_n$. From our definitions, it follows that $d_n \le \delta_n$. If $X$ is a Hilbert space, then $d_n = \delta_n$. In general $\delta_n$ is an easier quantity to determine. When $d_n$ and $\delta_n$ are unequal, serious problems generally arise in the computation of the former. Both $d_n$ and $\delta_n$ depend on "worst case" situations. As such they may be equal even in a non-Hilbert space setting. This is true of the example of the previous section.

THEOREM 6. Let $B^{(1)}_\infty$ and $S_n$ be as previously defined. Then $\delta_n(B^{(1)}_\infty; L^\infty) = 1/2n$, $n = 1,2,\dots$. Furthermore $P_n^*$, the rank $n$ linear operator defined by interpolating from $S_n$ to each $f \in B^{(1)}_\infty$ at $(2i-1)/2n$, $i = 1,\dots,n$, is optimal for $\delta_n(B^{(1)}_\infty; L^\infty)$.

4.B. BERNSTEIN N-WIDTH. Let us examine, in our example, the proof of the lower bound using Theorem 2. The following idea was used. Assume $X_{n+1}$ is an $(n+1)$-dimensional subspace of $X$, and let $S(X_{n+1})$ denote the unit ball of $X_{n+1}$. If $\lambda S(X_{n+1}) \subseteq A$, then from Theorem 2 (see Proposition 4), $d_n(A;X) \ge \lambda$. This technique for obtaining lower bounds for $d_n$ has been used so often that it has been codified.

DEFINITION 3. Let $X$ be a normed linear space and $A$ a closed, convex, centrally symmetric ($x \in A$ implies $-x \in A$) subset of $X$. The Bernstein n-width of $A$ in $X$ is defined by

$$b_n(A;X) = \sup_{X_{n+1}} \sup\{ \lambda : \lambda S(X_{n+1}) \subseteq A \},$$

where the $X_{n+1}$ range over all subspaces of $X$ of dimension $n+1$.

Thus $d_n(A;X) \ge b_n(A;X)$. The quantity $b_n$ is often of interest, other than simply as a lower bound for $d_n$. In our example we proved that $b_n(B^{(1)}_\infty; L^\infty) = 1/2n$. Restating this we may write

$$\sup_{f \in X_{n+1}} \|f'\|_\infty / \|f\|_\infty \ge 2n, \qquad n = 1,2,\dots,$$

for any $(n+1)$-dimensional subspace $X_{n+1}$ of $W^{(1)}_\infty$. From Proposition 3, equality holds with $X_{n+1} = L_{n+1}$.

This constant $2n$ is, in some sense, a smallest "smoothness constant" relating the size of $f'$ with that of $f$. This fact should be considered in the light of Markov's inequality. The well-known Markov inequality for algebraic polynomials of degree $n$ states that if $p \in \pi_n$, then on $[0,1]$

$$\|p'\|_\infty \le 2n^2 \|p\|_\infty,$$

and equality is attained (by the Chebyshev polynomial of the first kind). Equivalently we may write

$$\sup_{p \in \pi_n} \|p'\|_\infty / \|p\|_\infty = 2n^2.$$

Algebraic polynomials are thus far from being optimal in the above sense.

4.C. GEL'FAND N-WIDTH. There is one other n-width concept which we wish to introduce: the Gel'fand n-width of $A$ in $X$. It is related to the Kolmogorov n-width via a duality relationship which we will not discuss. It is considered again in Section 6 and will reappear when dealing with optimal recovery problems. The Gel'fand n-width is defined as follows.

DEFINITION 4. $X$ is a normed linear space and $A$ is a closed, convex, centrally symmetric subset of $X$. The Gel'fand n-width of $A$ in $X$ is given by

$$d^n(A;X) = \inf_{L^n} \sup_{x \in A \cap L^n} \|x\|,$$

where $L^n$ varies over all subspaces of $X$ of codimension $n$.

A subspace $L^n$ is said to be of codimension $n$ if there exist $n$ linearly independent linear functionals $x_i^* \in X^*$ (the continuous dual of $X$), $i = 1,\dots,n$, such that $L^n = \{ x : x_i^*(x) = 0,\ i = 1,\dots,n \}$.

There is no a priori relationship between $d^n(A;X)$ and $d_n(A;X)$. Either may be larger. Perhaps surprisingly, they are often equal. There is a simple reason for this, and it is that the inequalities $\delta_n(A;X) \ge d^n(A;X) \ge b_n(A;X)$ always hold. The inequality $\delta_n(A;X) \ge d^n(A;X)$ follows from the fact that every rank $n$ continuous linear operator $P_n$ may be written in the form

$$P_n x = \sum_{i=1}^n x_i^*(x)\, x_i$$

where $x_i^* \in X^*$, $i = 1,\dots,n$, and the $\{x_i\}_1^n$ span the range of $P_n$. Thus


$$\sup_{x \in A} \|x - P_n x\| \ge \sup_{\substack{x \in A \\ x_i^*(x) = 0,\ i = 1,\dots,n}} \|x\|,$$

from which follows the desired inequality. The inequality $d^n(A;X) \ge b_n(A;X)$ is

even simpler to prove. For every $(n+1)$-dimensional subspace $X_{n+1}$ of $X$, and for every subspace $L^n$ of codimension $n$ of $X$, $X_{n+1} \cap L^n \ne \{0\}$. Thus if $\lambda S(X_{n+1}) \subseteq A$, then there exists an $x \in A \cap L^n$ for which $\|x\| = \lambda$. Hence $d^n(A;X) \ge \lambda$, and the inequality follows. Thus in our previously considered example we necessarily have $d^n(B^{(1)}_\infty; L^\infty) = 1/2n$.

5. N-WIDTHS OF SOBOLEV SPACES. As stated earlier, the results of the previous

two sections may be considered as particular cases of a more general theorem.

We present this generalization and the remarks thereafter in an attempt to give

the reader a taste for the type of research done in this subject.

The Sobolev space $W^{(r)}_p[0,1]$, for $p \in [1,\infty]$ and $r$ a positive integer, is defined as follows:

$$W^{(r)}_p[0,1] = \{ f : f^{(r-1)} \text{ abs. cont.},\ f^{(r)} \in L^p[0,1] \}.$$

Let $B^{(r)}_p = \{ f : f \in W^{(r)}_p[0,1],\ \|f^{(r)}\|_p \le 1 \}$, and for a nonnegative integer $m$ set

$$x_+^m = \begin{cases} 0, & x < 0 \\ x^m, & x \ge 0. \end{cases}$$

THEOREM 7 [7]. Fix $p \in [1,\infty]$ and $r$ a positive integer. Then for $n \ge r$,

$$d_n(B^{(r)}_p; L^p) = d^n(B^{(r)}_p; L^p) = \delta_n(B^{(r)}_p; L^p).$$

Furthermore,

1) $X_n^* = \operatorname{span}\{ 1, x, \dots, x^{r-1}, (x-\xi_1)_+^{r-1}, \dots, (x-\xi_{n-r})_+^{r-1} \}$ is an optimal subspace for $d_n(B^{(r)}_p; L^p)$, for some choice of $0 < \xi_1 < \dots < \xi_{n-r} < 1$.

2) $L^n = \{ f : f(\eta_i) = 0,\ i = 1,\dots,n \}$ is an optimal subspace for $d^n(B^{(r)}_p; L^p)$, for some choice of $0 < \eta_1 < \dots < \eta_n < 1$.

3) The rank $n$ linear operator $P_n$ defined by interpolation from $X_n^*$ to $f \in B^{(r)}_p$ at the $\{\eta_i\}_1^n$ is optimal for $\delta_n(B^{(r)}_p; L^p)$.

It is also possible to identify an optimal subspace for $b_n(B^{(r)}_p; L^p)$.

REMARK. The proof of Theorem 7 is far beyond the scope of this lecture. However, to give an inkling as to how $X_n^*$ arises, recall Taylor's formula with remainder in integral form. $B^{(r)}_p$ is the set of functions

$$f(x) = \sum_{i=0}^{r-1} a_i x^i + \frac{1}{(r-1)!} \int_0^1 (x-y)_+^{r-1}\, h(y)\, dy,$$

where $\|h\|_p \le 1$ ($h = f^{(r)}$ and $a_i = f^{(i)}(0)/i!$, $i = 0,1,\dots,r-1$). The subspace $X_n^*$ is the span of the first $r$ monomial terms (which must appear since there is no restriction on their coefficients) and the kernel $(x-y)_+^{r-1}$ evaluated at $n-r$ distinct points.

REMARK. Theorem 7 is also valid in certain mixed-norm cases. Consider the n-widths of $B^{(r)}_p$ in $L^q$, where $p$ and $q$ are arbitrary numbers in $[1,\infty]$. If $p = \infty$ or $q = 1$, then Theorem 7 holds (except that $b_n(B^{(r)}_p; L^q)$ is unknown). It is conjectured that Theorem 7 is valid (except for $b_n$) for all $p \ge q$. For $p < q$ the situation is considerably more involved. No exact results are known, and it was only some years ago that the asymptotic behaviour of each of the n-widths was determined. They do not all behave asymptotically in the same manner.

6. N-WIDTHS AS GENERALIZATIONS OF S-NUMBERS. Let $T$ be a compact linear operator mapping $X$ into itself. Set

$$A = \{ Tx : \|x\| \le 1 \}.$$

The choice of $A$ as the image of the unit ball under a linear map is a common choice in the theory of n-widths. $B^{(r)}_p$ is of this form, aside from the free $r$-dimensional polynomial subspace.

Assume for the moment that $X = H$ is a Hilbert space with inner product $(\cdot,\cdot)$. Associated with $T$ is its adjoint $T^*$. The compact maps $T^*T$ and $TT^*$ are self-adjoint and non-negative. They possess the same eigenvalues $\{\lambda_n(T)\}$, $n = 0,1,\dots$ (given in non-increasing order of magnitude), which are all non-negative numbers. The values $s_n(T) = [\lambda_n(T)]^{1/2}$ are called the s-numbers, or singular values, of $T$. Functional analysts who study n-widths generally regard them as generalizations of s-numbers; see e.g. Pietsch [5]. Let us explain why.

There exist many well-known characterizations of the s-numbers of $T$. Thus the "max-min" characterization is given by

$$s_n(T) = \sup_{X_{n+1}} \inf_{x \in X_{n+1}} \left[ \frac{(T^*Tx, x)}{(x, x)} \right]^{1/2} = \sup_{X_{n+1}} \inf_{x \in X_{n+1}} \frac{\|Tx\|}{\|x\|},$$

where $X_{n+1}$ varies over all subspaces of $H$ of dimension $n+1$. But this last quantity is simply a restatement of the definition of the Bernstein n-width $b_n(A;H)$ for this choice of $A$. Thus $b_n(A;H) = s_n(T)$. In a totally analogous manner, it may be seen that the classical "min-max" characterization of $s_n(T)$, given by

manner, it may be seen that the classical "min-max" characterization of s (T)

given by


$$s_n(T) = \inf_{L^n} \sup_{x \in L^n} \left[ \frac{(T^*Tx, x)}{(x, x)} \right]^{1/2},$$

where the $L^n$ vary over all subspaces of $H$ of codimension $n$, is essentially the Gel'fand n-width $d^n(A;H)$. In this same vein $\delta_n(A;H) = s_n(T)$, since the definition of $\delta_n(A;H)$ corresponds to the classical singular value decomposition of $T$ (and $d_n(A;H) = \delta_n(A;H)$ since we are in a Hilbert space). Thus

$$d_n(A;H) = d^n(A;H) = \delta_n(A;H) = b_n(A;H) = s_n(T).$$
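A small numerical illustration of this chain of equalities (our addition, not part of the lecture): take $H = \mathbb{R}^m$ and $T$ a matrix, so that $A$ is an ellipsoid; the common value of the n-widths is then the singular value $s_n(T)$, which numpy computes directly.

```python
import numpy as np

# In H = R^m, let A = {Tx : ||x||_2 <= 1} for a matrix T.  Then
# d_n(A;H) = d^n(A;H) = delta_n(A;H) = b_n(A;H) = s_n(T), n = 0,1,...
rng = np.random.default_rng(0)
T = rng.standard_normal((5, 5))

s = np.linalg.svd(T, compute_uv=False)    # s_0 >= s_1 >= ... (singular values)
lam = np.linalg.eigvalsh(T.T @ T)[::-1]   # eigenvalues of T*T, non-increasing
print(np.allclose(s, np.sqrt(lam)))       # s_n(T) = [lambda_n(T*T)]^{1/2}
```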

B. OPTIMAL RECOVERY

7. INTRODUCTION. Let X be a normed linear space and A a subset of X. In a

very general sense optimal recovery is concerned with the problem of estimating,

in as efficient a manner as possible, some specific information about elements

of A based on a number of given pieces of information.

Many problems fall into this wide setting. Before presenting a general

framework, let us consider some specific examples.

8. EXAMPLES.

8.A. RECOVERY OF A FUNCTIONAL. Let $X = L^\infty[0,1]$ and $A = B^{(1)}_\infty$ (see Section 3). Assume that for each $f \in B^{(1)}_\infty$ we are given the values $f(x_i)$, $i = 1,\dots,n$, for some fixed $0 \le x_1 < \dots < x_n \le 1$. For convenience we set $I(f) = (f(x_1),\dots,f(x_n)) \in \mathbb{R}^n$, and call $I(f)$ the information vector. Let $y \in [0,1]$ be fixed. The problem we consider is that of optimally reconstructing $f(y)$, for $f \in B^{(1)}_\infty$, based only on the data $I(f)$.

Any function $T$ which maps $I(f)$ to $\mathbb{R}$ is called an algorithm. The error of the algorithm $T$ is defined by

$$E(T) = \sup\{ |f(y) - T(I(f))| : f \in B^{(1)}_\infty \}.$$

The value

$$E^* = \inf\{ E(T) : T \}$$

is the intrinsic error in our problem. If $E^* = E(T^*)$ for some algorithm $T^*$, then we say that $T^*$ is an optimal algorithm or provides for an optimal recovery of $f(y)$. The problem is to find $E^*$ and an optimal algorithm $T^*$.

An important tool in the solution of this problem is the following simple lower bound for $E^*$.

PROPOSITION 8. $E^* \ge \sup\{ |f(y)| : f \in B^{(1)}_\infty,\ I(f) = \underline{0} \}$.

PROOF. Let $f \in B^{(1)}_\infty$ with $I(f) = \underline{0}$. Since $-f \in B^{(1)}_\infty$ and $I(-f) = \underline{0}$, it follows that for every algorithm $T$,

$$E(T) \ge \max\{ |f(y) - T(\underline{0})|,\ |-f(y) - T(\underline{0})| \} \ge \big( |f(y) - T(\underline{0})| + |f(y) + T(\underline{0})| \big)/2 \ge |f(y)|.$$

The claim now follows. □

We will prove that equality holds, calculate $E^*$, and identify an optimal algorithm.

PROPOSITION 9. $E^* = \min\{ |y - x_i| : i = 1,\dots,n \}$. Furthermore, if $|y - x_j| = \min\{ |y - x_i| : i = 1,\dots,n \}$, then the algorithm $T_y$ defined by $T_y(I(f)) = f(x_j)$ is optimal.

PROOF. The function $f^*(x) = \min\{ |x - x_i| : i = 1,\dots,n \}$ is in $B^{(1)}_\infty$ and satisfies $f^*(x_i) = 0$, $i = 1,\dots,n$. Thus from Proposition 8, $E^* \ge f^*(y) = \min\{ |y - x_i| : i = 1,\dots,n \}$. For $T_y$ as above,

$$E^* \le E(T_y) = \sup\{ |f(y) - f(x_j)| : f \in B^{(1)}_\infty \} = |y - x_j| \le E^*. \qquad \square$$
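Proposition 9 translates into a one-line procedure. The following sketch is our illustration (the names are hypothetical): given exact samples, it returns the value at the node nearest $y$, together with the intrinsic error $\min_i |y - x_i|$.

```python
import numpy as np

def recover_value(nodes, values, y):
    """Optimal recovery of f(y) for f with ||f'||_inf <= 1, given exact
    samples values[i] = f(nodes[i]): return the value at the node
    nearest to y.  The intrinsic error is min_i |y - nodes[i]|."""
    nodes = np.asarray(nodes, dtype=float)
    j = int(np.argmin(np.abs(nodes - y)))
    return values[j], abs(nodes[j] - y)

nodes = [0.1, 0.5, 0.9]
f = lambda t: abs(t - 0.3)                # |f'| = 1 a.e., so f is admissible
est, err = recover_value(nodes, [f(t) for t in nodes], 0.62)
print(est, err)                           # estimate 0.2, intrinsic error 0.12
```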

8.B. RECOVERY OF A FUNCTION. Within this same framework, we change our example somewhat. Assume that, based on the same information vector $I(f)$, we are now interested in recovering not $f(y)$, but the full function $f$ on $[0,1]$. Our algorithms $T$ are therefore functions from $\mathbb{R}^n$ to $L^\infty[0,1]$. As previously,

$$E(T) = \sup\{ \|f - T(I(f))\|_\infty : f \in B^{(1)}_\infty \},$$

and $E^* = \inf\{ E(T) : T \}$.

Totally analogous to Proposition 8, we have

PROPOSITION 10. $E^* \ge \sup\{ \|f\|_\infty : f \in B^{(1)}_\infty,\ I(f) = \underline{0} \}$.

Thus, in particular, $E^* \ge \|f^*\|_\infty$, where $f^*$ is as defined in the proof of Proposition 9. We will prove equality. To this end set $z_i = (x_i + x_{i+1})/2$, $i = 1,\dots,n-1$ ($z_0 = 0$, $z_n = 1$). Let $S$ denote the space of step functions with jumps at $\{z_i\}_1^{n-1}$. For each $f \in B^{(1)}_\infty$, define $s_f \in S$ by $s_f(x_i) = f(x_i)$, $i = 1,\dots,n$.

PROPOSITION 11. $E^* = \|f^*\|_\infty$. Furthermore, if $T^*$ is the algorithm given by $T^*(I(f)) = s_f$, then $T^*$ is an optimal algorithm.

PROOF. For $T^*$ as above,

$$E^* \le E(T^*) = \sup\{ \|f - s_f\|_\infty : f \in B^{(1)}_\infty \}.$$

For $x \in (z_{i-1}, z_i]$, $|f(x) - s_f(x)| = |f(x) - f(x_i)| \le |x - x_i| = f^*(x)$. Thus

$$E^* \le E(T^*) \le \|f^*\|_\infty \le E^*. \qquad \square$$

8.C. OPTIMAL INFORMATION. We again alter our problem. We assume that we wish, as in (8.B), to recover $f \in B^{(1)}_\infty$ based on $I(f)$. However, now we may choose, a priori, the $n$ information functionals which constitute $I(f)$. For ease of discussion, let us assume that we may choose the $n$ points $x_1,\dots,x_n$ in $[0,1]$, which appear in $I(f)$, at which to sample $f$.

We exhibit this dependence on $I$ by letting $E(T;I)$ and $E^*(I)$ denote the $E(T)$ and $E^*$ of (8.B). We are therefore concerned with the problem of evaluating

$$\mathcal{E} = \inf_I E^*(I),$$

where $I$ ranges over all information vectors of the form $I(f) = (f(x_1),\dots,f(x_n))$ with $0 \le x_1 < \dots < x_n \le 1$. Set $\underline{x} = (x_1,\dots,x_n)$, and let $f^*(x;\underline{x})$ denote the $f^*(x)$ of (8.B). It is now a simple matter to prove

PROPOSITION 12. $\mathcal{E} = \inf\{ \|f^*(\cdot;\underline{x})\|_\infty : \underline{x} \} = 1/2n$. Furthermore, an optimal $\underline{x}$ is given by $\underline{x}^* = (1/2n, 3/2n, \dots, (2n-1)/2n)$.

Before continuing, note that the value obtained is exactly the value of the n-widths of $B^{(1)}_\infty$ in $L^\infty$. This is not a coincidence. We will discuss the connection in the next section.

8.D. OPTIMAL RECOVERY WITH ERROR. We now return to the problem, considered in (8.A), of recovering $f(y)$, for $f \in B^{(1)}_\infty$, based on $I(f) = (f(x_1),\dots,f(x_n))$ for fixed $0 \le x_1 < \dots < x_n \le 1$. However, let us assume that we do not know $I(f)$ exactly. Errors may occur in our calculation, and rather than being given $I(f)$, we are given $w = (w_1,\dots,w_n)$ where $|w_i - f(x_i)| \le \varepsilon_i$, $i = 1,\dots,n$. The error bounds $\varepsilon_i \ge 0$ are given fixed values. We therefore define, for an algorithm $T$ mapping $\mathbb{R}^n$ to $\mathbb{R}$,

$$E(T;\underline{\varepsilon}) = \sup\{ |f(y) - T(w)| : f \in B^{(1)}_\infty,\ |w_i - f(x_i)| \le \varepsilon_i,\ i = 1,\dots,n \},$$

and $E^*(\underline{\varepsilon}) = \inf\{ E(T;\underline{\varepsilon}) : T \}$.

This next result is totally analogous to Proposition 8.

PROPOSITION 13. $E^*(\underline{\varepsilon}) \ge \sup\{ |f(y)| : f \in B^{(1)}_\infty,\ |f(x_i)| \le \varepsilon_i,\ i = 1,\dots,n \}$.

Set $f^*(x;\underline{\varepsilon}) = \min\{ \varepsilon_i + |x - x_i| : i = 1,\dots,n \}$. Then $f^*(\cdot;\underline{\varepsilon}) \in B^{(1)}_\infty$ and $0 \le f^*(x_i;\underline{\varepsilon}) \le \varepsilon_i$, $i = 1,\dots,n$. Let $\varepsilon_j + |y - x_j| = \min\{ \varepsilon_i + |y - x_i| : i = 1,\dots,n \}$.

PROPOSITION 14. $E^*(\underline{\varepsilon}) = f^*(y;\underline{\varepsilon})$, and the algorithm defined by $T_y(w) = w_j$ is optimal.

PROOF. From Proposition 13, $E^*(\underline{\varepsilon}) \ge f^*(y;\underline{\varepsilon})$. Let $f \in B^{(1)}_\infty$ with $|w_i - f(x_i)| \le \varepsilon_i$, $i = 1,\dots,n$. Then

$$|f(y) - T_y(w)| = |f(y) - w_j| \le |f(y) - f(x_j)| + |f(x_j) - w_j| \le |y - x_j| + \varepsilon_j = f^*(y;\underline{\varepsilon}).$$

Thus $E^*(\underline{\varepsilon}) \le E(T_y;\underline{\varepsilon}) \le f^*(y;\underline{\varepsilon})$. □
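Proposition 14 is equally easy to realize. In this sketch (again our illustration, not from the text), the algorithm returns the sample $w_j$ whose cost $\varepsilon_j + |y - x_j|$ is smallest; that minimal cost is the intrinsic error $f^*(y;\underline{\varepsilon})$.

```python
import numpy as np

def recover_value_noisy(nodes, eps, w, y):
    """Optimal recovery of f(y) from noisy data |w_i - f(x_i)| <= eps_i,
    for f with ||f'||_inf <= 1: return w_j minimizing eps_i + |y - x_i|."""
    costs = np.asarray(eps, dtype=float) + np.abs(y - np.asarray(nodes))
    j = int(np.argmin(costs))
    return w[j], costs[j]                 # estimate and intrinsic error

est, err = recover_value_noisy([0.1, 0.5, 0.9], [0.05, 0.2, 0.0],
                               [0.21, 0.18, 0.62], 0.6)
print(est, err)    # picks the node with smallest eps_i + |y - x_i|
```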

Analogues, with error, of examples (8B) and (8C) are similarly constructed.

9. GENERAL THEORY. The above simple examples are prototypes of some of the

problems considered in the theory of optimal recovery. In our general discussion

we will somewhat restrict ourselves. Thus, for example, all our operators will

be linear, and we will not touch upon problems of recovery with error as

exemplified by example (8D).

Let $X$, $Y$ and $Z$ be normed linear spaces. $A$ is a subset of $X$ which we assume to be closed, convex, and centrally symmetric. By $U$ we denote a linear operator from $X$ to $Z$. $U(x)$, for $x \in A$, will be the element which we wish to recover, and $U$ is therefore termed the object operator. $I$ is a linear operator from $X$ to $Y$, called the information operator. Any function $T$ from $I(A)$ to $Z$ is said to be an algorithm. Each algorithm gives rise to a recovery scheme with error

$$E(T) = \sup\{ \|U(x) - T(I(x))\| : x \in A \}.$$

The value $E^* = \inf\{ E(T) : T \}$, where $T$ ranges over all possible algorithms, is called the intrinsic error of the process. If $E^* = E(T^*)$ for a specific algorithm $T^*$, then $T^*$ is called an optimal algorithm and we have found an optimal recovery for $U$ on $A$.

There is no reason to suppose that optimal algorithms exist, or if they

do, are linear. To obtain such results, additional assumptions are needed.

However, certain very general properties do hold. As an analogue of Proposition

8 we have the following.

PROPOSITION 15. $E^* \ge \sup\{ \|U(x)\| : x \in A,\ I(x) = 0 \}$.

PROOF. Let $x \in A$ and $I(x) = 0$. By assumption $-x \in A$ and $I(-x) = 0$. Thus for every algorithm $T$,

$$\|U(x) - T(0)\|,\ \|U(-x) - T(0)\| \le E(T).$$

Since $U$ is linear, it follows that $\|U(x)\| \le E(T)$, proving the proposition. □

For ease of exposition, set

$$e^* = \sup\{ \|U(x)\| : x \in A,\ I(x) = 0 \}.$$


In all our examples we had $E^* = e^*$. However, this is not generally valid, as the following simple example shows.

EXAMPLE. Let $X$ be $\mathbb{R}^3$ endowed with the Euclidean norm $\|x\|_2 = (x_1^2 + x_2^2 + x_3^2)^{1/2}$. Set $Z = X$, $Y = \mathbb{R}$, $U(x) = x$,

$$A = \{ x : \|x\|_1 = |x_1| + |x_2| + |x_3| \le 1 \},$$

and $I(x) = x_1 + x_2 + x_3$. A simple calculation shows that $e^* = 1/\sqrt{2}$. Now

$$E^* = \inf_T \sup\{ \|x - T(I(x))\|_2 : \|x\|_1 \le 1 \} \ge \inf_T \max\{ \|e^i - T(1)\|_2 : i = 1,2,3 \},$$

where $e^i$ is the $i$th unit vector, $i = 1,2,3$. $T(1)$ is a vector in $\mathbb{R}^3$. No vector in $\mathbb{R}^3$ is of distance less than $\sqrt{2/3}$ from each of the $e^i$, $i = 1,2,3$. Thus $E^* \ge \sqrt{2/3}$. (Equality in fact holds for the algorithm $T(a) = (a/3, a/3, a/3)$.)

An opposite inequality to $E^* \ge e^*$ is the following.

THEOREM 16 [3, p. 3]. $E^* \le 2e^*$.

PROOF. For each $y \in I(A)$, choose an $x'(y) \in A$ satisfying $I(x'(y)) = y$. We define an algorithm $T'$ by

$$T'(I(x)) = U(x'(I(x))).$$

Then

$$E(T') = \sup\{ \|U(x) - T'(I(x))\| : x \in A \} = \sup\{ \|U(x - x'(I(x)))\| : x \in A \}.$$

Set $w = x - x'(I(x))$. Then $I(w) = I(x) - I(x) = 0$, and since $A$ is convex and centrally symmetric, $w/2 \in A$. Thus

$$E^* \le E(T') \le 2\sup\{ \|U(w)\| : w \in A,\ I(w) = 0 \} = 2e^*. \qquad \square$$

REMARK. The $T'$ constructed above is, in general, neither continuous nor linear. If we demand a linear, continuous algorithm, then no inequality of the above form is valid.

One set of assumptions which implies the equality $E^* = e^*$ is the following. (Note that our examples do not quite satisfy these assumptions.)

THEOREM 17 [3, p. 5]. Assume that there exists a function $S$ from $I(A)$ to $X$ for which $x - S(I(x)) \in A$ and $I(x - S(I(x))) = 0$ for every $x \in A$. Then $E^* = e^*$ and $T^* = US$ is an optimal algorithm.

The proof of this theorem is an immediate consequence of the definitions of $T^*$ and $e^*$.

A totally different set of restrictions gives us similar results.


Assume that $X$ is a normed linear space over the reals and $Z = \mathbb{R}$, i.e. $U$ is a linear functional. In addition to the previous assumptions, we also suppose that $I(A)$ is absorbing in $Y$, i.e. for any $y \in Y$ there exists $\lambda > 0$ such that $\lambda y \in I(A)$. Let $Y^*$ denote the continuous dual of $Y$. Then

THEOREM 18 [3, p. 16]. Under the above assumptions,

$$e^* = E^* = \inf_{T \in Y^*} \sup\{ |U(x) - T(I(x))| : x \in A \}.$$

Furthermore, if $I(A)$ is a neighborhood of the origin in $Y$, and if there exists a $T \in Y^*$ for which

$$\sup\{ |U(x) - T(I(x))| : x \in A \} < \infty,$$

then there exists an optimal algorithm in $Y^*$, i.e. one which is linear and continuous.

We close these notes by examining a connection between certain n-widths and problems of optimal recovery with optimal information. For convenience, we now assume that $X = Z$, $Y = \mathbb{R}^n$, $U(x) = x$, and $I(x) = (x_1^*(x),\dots,x_n^*(x))$, where $x_i^* \in X^*$ (the continuous dual of $X$), $i = 1,\dots,n$. Thus for each algorithm $T$,

$$E(T) \ge E^* \ge e^* = \sup\{ \|x\| : x \in A,\ I(x) = \underline{0} \}.$$

Set $L^n = \{ x : I(x) = \underline{0} \}$. Then $L^n$ is a subspace of codimension at most $n$, and we may write

$$e^* = \sup\{ \|x\| : x \in A \cap L^n \}.$$

From the definition of the Gel'fand n-width $d^n(A;X)$, it follows that $e^* \ge d^n(A;X)$ for any choice of $n$ continuous linear information functionals. In particular

$$\inf_{L^n} E^* \ge d^n(A;X).$$

Let us also recall the definition of the linear n-width $\delta_n(A;X)$:

$$\delta_n(A;X) = \inf_{P_n} \sup_{x \in A} \|x - P_n x\|,$$

where $P_n$ is an operator of the form $P_n x = \sum_{i=1}^n x_i^*(x)\, x_i$. Taking the infimum over $P_n$ is equivalent to taking the infimum over $\{x_i^*\}_1^n$ and $\{x_i\}_1^n$. If we fix the $\{x_i\}_1^n$ and take the infimum over the $\{x_i^*\}_1^n$, then we are searching for a best continuous linear approximation to $A$ from $\operatorname{span}\{x_1,\dots,x_n\}$. On reversing the process, fixing the $\{x_i^*\}_1^n$ and taking the infimum over the $\{x_i\}_1^n$, we are searching for the optimal continuous linear algorithm based on the information $\{x_i^*\}_1^n$. As such,

$$\delta_n(A;X) \ge \inf\{ E^* : L^n \}.$$


Thus the two n-widths $\delta_n$ and $d^n$ are upper and lower bounds, respectively, in the problem of optimal recovery of $A$ with $n$ optimal linear continuous pieces of information. If $\delta_n(A;X) = d^n(A;X)$, and $P_n$ is optimal for $\delta_n(A;X)$, then $P_n$ gives rise to a continuous linear optimal algorithm for this problem (see e.g. Theorem 7).

BIBLIOGRAPHY

1. Kolmogoroff, A., "Über die beste Annäherung von Funktionen einer gegebenen Funktionenklasse", Annals of Math. 37 (1936), 107-110.

2. Krein, M.G., Krasnosel'ski, M.A., Milman, D.P., "On deficiency numbers of linear operators in Banach spaces and on some geometric problems", Sb. Trudov Inst. Mat. Akad. Nauk SSSR, 11 (1948), 97-112.

3. Micchelli, C.A., Rivlin, T.J., "A survey of optimal recovery" in Optimal Estimation in Approximation Theory, eds. C.A. Micchelli, T.J. Rivlin, Plenum Press, New York, 1977, 1-54.

4. Micchelli, C.A., Rivlin, T.J., "Lectures on optimal recovery", preprint.

5. Pietsch, A., Nuclear Locally Convex Spaces, Springer-Verlag, Berlin, 1972.

6. Pinkus, A., n-Widths in Approximation Theory, Springer-Verlag, Berlin, 1985.

7. Pinkus, A., "n-Widths of Sobolev spaces in $L^p$", Constr. Approx. 1 (1985), 15-62.

8. Rivlin, T.J., An Introduction to the Approximation of Functions, Blaisdell, Waltham, Mass., 1969.

9. Traub, J.F., Wozniakowski, H., A General Theory of Optimal Algorithms, Academic Press, New York, 1980.

DEPARTMENT OF MATHEMATICS TECHNION HAIFA, ISRAEL

Proceedings of Symposia in Applied Mathematics Volume 36, 1986

Algorithms For Approximation

E. W. CHENEY

ABSTRACT. The solution of almost any concrete problem of approximation will require an algorithm ("recipe") for producing a solution. The advent of high-speed computing made it possible to calculate best approximations by means of iterative methods. Several procedures for this type of problem are outlined here. They illustrate some of the techniques used in the construction of algorithms and some of the criteria by which algorithms are judged.

1. Best Approximation from a Finite-Dimensional Subspace. A general problem in approximation can be stated thus: a normed linear space $X$ and a finite-dimensional subspace $Y$ in $X$ are prescribed. For $x \in X$ we define the distance from $x$ to $Y$ by means of the equation

$$\operatorname{dist}(x, Y) = \inf\{ \|x - y\| : y \in Y \}.$$

We then seek to determine a "best approximation" of $x$ in $Y$. That term describes any element $y$ in $Y$ such that

$$\|x - y\| = \operatorname{dist}(x, Y).$$

A straightforward attack on this problem involves first the selection of a basis for $Y$, say $\{b_1,\dots,b_n\}$. Then one attempts to locate a minimizing point for the functional $\Lambda : \mathbb{R}^n \to \mathbb{R}$ defined by

$$\Lambda(\lambda_1,\dots,\lambda_n) = \Big\| x - \sum_{i=1}^n \lambda_i b_i \Big\|.$$

The functional $\Lambda$ has some endearing properties: it is continuous, nonnegative, and convex. On the other hand, it may be nondifferentiable, and it may be expensive to compute. The task of determining one or more minimizing points

1980 Mathematics Subject Classification. 41A45, 41A20, 41A50, 65D15

Key words and phrases. Algorithms, best approximation.


for $\Lambda$ can be turned over to a general-purpose computer program designed for minimizing "arbitrary" real-valued functions of $n$ real variables. Alternatively, one can employ a program that takes advantage of the convexity of $\Lambda$. Further degrees of specialization are possible in the codes used. Some procedures exploit the fact that $\Lambda$ arises from a norm. Finally, we can use algorithms which are tailor-made for the particular norm and the particular subspace in the problem at hand.

A general-purpose algorithm which applies to any normed linear space $X$ and to any finite-dimensional subspace $Y$ will now be described. Its roots lie in the work of E. Ya. Remes, and it is sometimes referred to as the First Algorithm of Remes. See [Remes, 1934].

We begin by selecting a "norm-determining" subset $\Phi$ in the conjugate space $X^*$. This means that for each $x \in X$,

$$\|x\| = \max\{ |\phi(x)| : \phi \in \Phi \}.$$

For example, $\Phi$ can be the entire unit ball in $X^*$, or the surface of the unit ball, or the set of extreme points of the unit ball, or a set of "half of the extreme points". (Clearly, if $\phi \in \Phi$, it is not necessary that $-\phi \in \Phi$.)

The problem to be solved is that of determining, for a given $x \in X$, one or more points $y \in Y$ for which

$$\|x - y\| = \operatorname{dist}(x, Y).$$

The algorithm breaks this problem into a sequence of simpler problems. The procedure is iterative, and produces a sequence $y_1, y_2, \dots$ in $Y$ with the property

$$\lim_{k \to \infty} \|x - y_k\| = \operatorname{dist}(x, Y).$$

At the $k$-th step of this algorithm, a finite subset $\Phi_k \subset \Phi$ is given. This finite set induces a semi-norm in $X$ via the equation

$$\|u\|_k = \max_{\phi \in \Phi_k} |\phi(u)| \qquad (u \in X).$$

One then computes an element $y_k \in Y$ to minimize the expression $\|x - y\|_k$. This is a more elementary problem than the original one because $\Phi_k$ is finite, and some standard techniques drawn from the subject of linear programming are applicable. Having found $y_k$, we choose an element $\phi_k$ in $\Phi$ so that

$$|\phi_k(x - y_k)| = \|x - y_k\|.$$

The $k$-th step terminates with the adjoining of $\phi_k$ to $\Phi_k$ to form $\Phi_{k+1}$.

The minimization problem that must be solved in step $k$ is not as formidable as it may seem, since the vector $y_{k-1}$ from the preceding step is a good starting point in searching for $y_k$. If $\Phi_k = \{\phi_1,\dots,\phi_m\}$ and if $\{b_1,\dots,b_n\}$ is a basis for $Y$, then $y_k$ will have the form $\sum_{j=1}^n \lambda_j b_j$, and the coefficients $\lambda_1,\dots,\lambda_n$ must be chosen to minimize the expression

$$\max_{1 \le i \le m} \Big| \sum_{j=1}^n \lambda_j \phi_i(b_j) - \phi_i(x) \Big|.$$

This is a standard problem in matrix analysis, and good software is available to solve it. See [Bartels and Golub, 1968], [Barrodale and Phillips, 1975], [Cline, 1976], and [Bartels, Conn and Charalambous, 1978]. The theoretical basis for these matrix algorithms is discussed in Chapter II of [Cheney, 1966].

The initial set $\Phi_1$ in this algorithm should be chosen so that $\|\cdot\|_1$ is a genuine norm on $Y$. This can always be achieved with a set of $n$ elements if $Y$ has dimension $n$. Having made this assumption, we can now prove that

$$\lim_{k \to \infty} \|x - y_k\| = \operatorname{dist}(x, Y) = d.$$

One starts with an obvious inequality, valid for $1 \le k \le i$ and $y \in Y$:

$$\|x - y\|_1 \le \|x - y\|_k \le \|x - y\|_i \le \|x - y\|.$$

From this it follows immediately that

$$\|x - y_k\|_1 \le \|x - y_k\|_k \le \|x - y_i\|_i \le d.$$

Obviously the sequence $[y_k]$ is bounded in the norm $\|\cdot\|_1$, and hence also in the norm $\|\cdot\|$, since $Y$ is finite-dimensional. The sequence possesses cluster points, and we let $y^*$ be any one of them. If $\epsilon > 0$, select $k$ so that $\|y_k - y^*\| < \epsilon$; then select $i > k$ so that $\|y_i - y^*\| < \epsilon$. It now follows that

$$d \le \|x - y^*\| \le \|x - y_k\| + \epsilon = |\phi_k(x) - \phi_k(y_k)| + \epsilon = \|x - y_k\|_i + \epsilon \le \|x - y_i\|_i + \|y_i - y^*\|_i + \|y^* - y_k\|_i + \epsilon \le d + 3\epsilon.$$

Since $\epsilon$ was arbitrary, $\|x - y^*\| = d$. This proves that each cluster point of the sequence $[y_k]$ is a best approximation of $x$ in $Y$.

In order to prove that $\|x - y_k\| \to d$, start with the observation that the sequence $\rho_k = \|x - y_k\|$ is bounded. Let $\rho^*$ be any cluster point of $[\rho_k]$, and let $\rho_{k_i} \to \rho^*$. Let $y^*$ be a cluster point of $[y_{k_i}]$. Then $\rho^* = \|x - y^*\| = d$ by the first half of our proof. Thus the bounded sequence $[\rho_k]$ has only one cluster point, and must therefore converge to it.

It is clear that in situations where $x$ has a unique best approximation in $Y$, the sequence of approximants $y_k$ in the algorithm will converge to the best approximation.


An important property of this algorithm is that at each stage of the iteration, an upper and a lower bound are available for the unknown number $d = \operatorname{dist}(x, Y)$. In fact, we have

$$\|x - y_k\|_k \le d \le \min_{1 \le i \le k} \|x - y_i\|.$$

The lower bound in this inequality converges monotonically upward to $d$, and the upper bound converges monotonically downward to $d$.

A further observation is that the algorithm is not limited to finite-dimensional subspaces, nor even to linear subspaces. The computations that must be carried out in each step of the algorithm may be more difficult for a more general set of approximants, but the basic strategy of the procedure can still be followed.

For approximation in a space $C(S)$ of continuous functions on a compact Hausdorff space $S$, with norm $\|x\| = \max_s |x(s)|$, the set $\Phi$ is taken to be the set of all point-evaluation functionals $\phi_s$:

$$\phi_s(x) = x(s), \qquad s \in S,\ x \in C(S).$$

In practical realizations of this algorithm, it is possible to keep the sets $\Phi_k$ from growing bigger by eliminating one "old" element from $\Phi_k$ at the same time that the "new" element $\phi_k$ is added. This must be done with some care. The resulting algorithm is sometimes called the "Exchange Method". See [Stiefel, 1960].

THEOREM. The sequence $[y_k]$ generated by the Remes First Algorithm has at least one cluster point. Each cluster point is a best approximation of $x$ in $Y$. Furthermore

$$\|x - y_k\|_k \le \operatorname{dist}(x, Y) \le \min_{1 \le i \le k} \|x - y_i\|,$$

and these bounds converge monotonically to $\operatorname{dist}(x, Y)$.
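To make the procedure tangible, here is a runnable sketch (our addition, with illustrative names, not part of Cheney's text) of the First Algorithm on a finite grid $S$, taking $\Phi$ to be the point evaluations at $S$; the discrete Chebyshev subproblem of each step is posed as a linear program and handed to scipy, in the spirit of the linear-programming remark above.

```python
import numpy as np
from scipy.optimize import linprog

def remes_first(x, basis, S, k_max=25):
    """First Algorithm of Remes on a finite grid S (sup norm).
    basis: callables spanning Y.  Phi_k = evaluations at 'active'."""
    n = len(basis)
    active = list(S[:n])                       # Phi_1: ||.||_1 a norm on Y
    for _ in range(k_max):
        # minimize t subject to |x(s) - sum_j c_j b_j(s)| <= t, s in active
        A = np.array([[b(s) for b in basis] for s in active])
        f = np.array([x(s) for s in active])
        obj = np.r_[np.zeros(n), 1.0]
        A_ub = np.block([[ A, -np.ones((len(active), 1))],
                         [-A, -np.ones((len(active), 1))]])
        res = linprog(obj, A_ub=A_ub, b_ub=np.r_[f, -f],
                      bounds=[(None, None)] * (n + 1))
        c = res.x[:n]
        # adjoin step: phi_k is evaluation at the worst point of S
        err = [abs(x(s) - sum(cj * b(s) for cj, b in zip(c, basis))) for s in S]
        s_new = S[int(np.argmax(err))]
        if s_new in active:                    # nothing new to adjoin: done
            return c
        active.append(s_new)
    return c

# Best sup-norm straight line for e^t on a grid of [0, 1].
S = list(np.linspace(0.0, 1.0, 201))
print(remes_first(np.exp, [lambda t: 1.0, lambda t: t], S))
```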

2. The Second Algorithm of Remes. This algorithm is designed solely for linear approximation in a space of continuous functions on a compact interval $[a,b]$. In the space $C[a,b]$, an $n$-dimensional subspace of approximants is prescribed, and it must have a special property called the Haar property. An $n$-dimensional subspace $Y$ in $C[a,b]$ is a Haar subspace if each nonzero element of $Y$ has at most $n-1$ zeros in $[a,b]$. This is an abstraction of the crucial property possessed by the polynomials of degree $< n$. As in Section 1, we wish to be able to compute the best approximation in $Y$ for an arbitrary element $x$ in $C[a,b]$. The point of discussing another algorithm is that this new one, although limited in applicability, will be much more efficient than the one in Section 1. In fact, under favorable circumstances the new one will converge quadratically. This means that the successive approximations $y_1, y_2, \dots$ generated by the algorithm will converge to a best approximation $y^*$ in accordance with an inequality of the form

$$\|y^* - y_{k+1}\| \le c\,\|y^* - y_k\|^2.$$


This second algorithm is also iterative. In the $k$-th step, a subset $S_k$ of $[a,b]$ is given. Each set $S_k$ will contain exactly $n+1$ points, $n$ being the dimension of $Y$. As in the first algorithm, it is convenient to use the semi-norm

$$\|u\|_k = \max\{ |u(s)| : s \in S_k \}.$$

Each of these is a genuine norm on $Y$ because of the Haar condition. The general theory tells us that there is a unique element $y_k \in Y$ for which $\|x - y_k\|_k$ is a minimum. This element $y_k$ is characterized by the fact that $x(s) - y_k(s)$ has the magnitude $\|x - y_k\|_k$ at each point of $S_k$, and exhibits alternating signs as $s$ runs over $S_k$ from left to right. A typical graph of $x - y_k$ in the case $n = 4$ is shown in the figure. A new set $S_{k+1}$ is constructed by taking the abscissae of $n+1$ local extremum points of $x - y_k$, care being exercised to ensure that $x - y_k$ alternates in sign on the points of $S_{k+1}$.

In the figure, the points marked "×" compose the set $S_k$. The abscissae of the points marked "○" compose the set $S_{k+1}$. The set $S_{k+1}$ is further required to have the property that $|x(s) - y_k(s)| \ge \|x - y_k\|_k$ for each point of $S_{k+1}$, and the property that $\|x - y_k\|_{k+1} = \|x - y_k\|$. These restrictions make the choice of $S_{k+1}$ a bit complicated.
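Before stating the convergence theorem, we give a compact sketch (ours; a deliberately simplified stand-in, whose re-referencing step is cruder than the careful choice of $S_{k+1}$ just described) of the second algorithm for polynomials on a grid.

```python
import numpy as np

def remes_second(x, a, b, n, iters=8):
    """Second Algorithm of Remes: best sup-norm approximation to x from
    polynomials of degree < n on [a, b] (grid version).  S holds the
    n+1 reference points; h is the levelled reference error."""
    S = np.linspace(a, b, n + 1)
    grid = np.linspace(a, b, 2001)
    for _ in range(iters):
        # Solve x(s_i) - p(s_i) = (-1)^i h for the n coefficients and h.
        A = np.hstack([np.vander(S, n, increasing=True),
                       ((-1.0) ** np.arange(n + 1))[:, None]])
        sol = np.linalg.solve(A, x(S))
        coef, h = sol[:n], sol[n]
        err = x(grid) - np.polyval(coef[::-1], grid)
        # Crude re-reference: worst point of |err| in each of n+1 panels
        # (a careful code tracks sign alternation, as the text warns).
        panels = np.array_split(np.arange(grid.size), n + 1)
        S = np.array([grid[p[np.argmax(np.abs(err[p]))]] for p in panels])
    return coef, abs(h)

coef, h = remes_second(np.exp, 0.0, 1.0, 3)   # quadratic fit to e^t
print(coef, h)                                # |h| -> dist(x, Y)
```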

THEOREM. The successive approximants $y_k$ converge to the best approximation $y^*$ of $x$ in $Y$. The errors converge to zero at least linearly: $\|y_k - y^*\| \le C\theta^k$, with $0 < \theta < 1$. If $x$ and the elements of $Y$ are continuously differentiable and if the endpoints of the interval are maximum points of $|x - y^*|$, then the algorithm is quadratically convergent: $\|y^* - y_{k+1}\| \le c\,\|y^* - y_k\|^2$.

The proof of quadratic convergence has been given in [Veidinger, 1960]. The linear convergence proof can be found in [Cheney, 1966]. The facts about Haar subspaces which were used in the preceding discussion can also be found there.

The order-structure of the real line is essential in the second Remes algorithm, and no satisfying generalization to arbitrary Banach spaces is known. The algorithm is therefore strictly limited in applicability, but the rapid convergence is a compensating factor which makes it very popular.

The second algorithm of Remes has been adapted for solving various nonlinear approximation problems in the supremum norm. In every case, it is necessary to have a characterization of best approximations in terms of the equi-oscillation of the error function $x - y^*$. We say that a function $u$ on $[a,b]$ equi-oscillates $n$ times if there exist $n+1$ points

$$s_0 < s_1 < \dots < s_n$$

in the interval such that $u(s_{i-1})\,u(s_i) = -\|u\|^2$ for $1 \le i \le n$. With this terminology, the Chebyshev Alternation Theorem can be stated thus: in order that an element $y$ in an $n$-dimensional Haar subspace $Y \subset C[a,b]$ be the best approximation to an element $x$ in $C[a,b]$, it is necessary and sufficient that the error function $x - y$ equi-oscillate $n$ times. The theorem is true also for pseudo-norms of the type $\|x\|_F = \sup_{s \in F} |x(s)|$, provided that $F$ is closed in $[a,b]$.

Similar theorems are available for approximation by quotients $y/z$, where $y \in Y$, $z \in Z$, $z > 0$, and $Y$ and $Z$ are finite-dimensional subspaces of $C[a,b]$. For purposes of illustration, we give the classical case. Let $\Pi_n$ denote the space of polynomials of degree $\le n$, regarded as a subspace of $C[a,b]$. For $u \in C[a,b]$ we write $u > 0$ if $u(s) > 0$ for all $s \in [a,b]$. A useful approximating class is defined by

$$R^n_m = \{ p/q : p \in \Pi_n,\ q \in \Pi_m,\ q > 0 \}.$$

Each element $x$ of $C[a,b]$ possesses a unique best approximation $y^*$ in the set $R^n_m$. It is characterized by the property that $x - y^*$ must equi-oscillate $n + m + 2 - i$ times, where $i$ is the largest integer for which $y^* \in R^{n-i}_{m-i}$. The "normal" case occurs when $y^* \notin R^{n-1}_{m-1}$; then the number of equi-oscillations is (at least) $n + m + 2$.

References to the Remes algorithm for rational approximation are [Werner, 1962], [Fraser and Hart, 1962], [Ralston, 1965], [Wetterling, 1963] and [Hart et al., 1968]. For general nonlinear approximation problems see [Novodvorskii and Pinsker, 1951], [Shenitzer, 1957] and [Braess, 1967].

3. The Differential Correction Algorithm. This algorithm is applicable to the rational approximation problem (discussed in the preceding section) and to a generalized version of that problem, which we now describe. In a space $C(S)$, two subspaces $Y$ and $Z$ are prescribed, and it is desired to approximate an element $x$ of $C(S)$ by a function $y/z$, where $y \in Y$, $z \in Z$, and $z > 0$. The restriction $z > 0$ means that $z(s) > 0$ for all $s \in S$. It entails no loss of generality in the classical case, when $S$ is an interval, $Y = \Pi_n$ and $Z = \Pi_m$.

The idea of the differential correction algorithm [Cheney and Loeb, 1961] is as follows. Let $y \in Y$, $z \in Z$ and $z > 0$. Put $M = \|x - y/z\|$. We wish to compute


small corrections $\delta y$ and $\delta z$ to $y$ and $z$ so that

$$\|x - (y + \delta y)/(z + \delta z)\| = M - \delta M$$

with $\delta M > 0$. Putting $\bar y = y + \delta y$ and $\bar z = z + \delta z$, we have the following pointwise inequalities:

$$|x - \bar y/\bar z| \le M - \delta M$$
$$|x\bar z - \bar y| \le \bar z M - \bar z\,\delta M = \bar z M - z\,\delta M - \delta z\,\delta M$$
$$|x\bar z - \bar y| - \bar z M \le -z\,\delta M - \delta z\,\delta M.$$

For small perturbations, we can ignore the second-order term $\delta z\,\delta M$. Since $\delta M$ is to be as large as possible, it seems reasonable to select $\bar y$ and $\bar z$ to minimize the expression

$$(1) \qquad \max_s \frac{|x(s)\bar z(s) - \bar y(s)| - \bar z(s) M}{z(s)}.$$

This minimization must be done with a suitable normalization for $\bar z$, such as $\|\bar z\| = 1$, because otherwise the expression above can be driven to $-\infty$ with certain choices of $\bar y$ and $\bar z$. In the differential correction algorithm, these corrections are made iteratively, producing thereby a sequence of approximants $r_k = y_k/z_k$ which, under certain conditions, will converge to a best approximation of $x$.
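One correction step on a finite set is a linear program, so the whole iteration is short. The sketch below is our illustration (names hypothetical); besides a normalization, it constrains the new denominator to satisfy $q(s) \ge \epsilon$ on the grid, anticipating the constrained family $R$ discussed below, so the next division is safe.

```python
import numpy as np
from scipy.optimize import linprog

def differential_correction(x, S, n, m, iters=12, eps=1e-6):
    """Differential correction for best sup-norm approximation of x on a
    finite set S by p/q, deg p <= n, deg q <= m.  Each step minimizes
    max_s (|x q - p| - M q) / q_old by linear programming, with the
    normalization |q-coefficients| <= 1 and the safeguard q >= eps on S."""
    P = np.vander(S, n + 1, increasing=True)   # numerator basis 1, s, ...
    Q = np.vander(S, m + 1, increasing=True)   # denominator basis
    p, q = np.zeros(n + 1), np.ones(m + 1) / (m + 1)
    for _ in range(iters):
        qs, xs = Q @ q, x(S)
        M = np.max(np.abs(xs - (P @ p) / qs))
        k = n + m + 2                          # unknowns: p, q, level t
        obj = np.r_[np.zeros(k), 1.0]          # minimize t
        rows = [np.hstack([-P, (xs - M)[:, None] * Q, -qs[:, None]]),
                np.hstack([ P, (-xs - M)[:, None] * Q, -qs[:, None]]),
                np.hstack([np.zeros_like(P), -Q, np.zeros((len(S), 1))])]
        b_ub = np.r_[np.zeros(2 * len(S)), -eps * np.ones(len(S))]
        bounds = ([(None, None)] * (n + 1) + [(-1, 1)] * (m + 1)
                  + [(None, None)])
        res = linprog(obj, A_ub=np.vstack(rows), b_ub=b_ub, bounds=bounds)
        p, q = res.x[:n + 1], res.x[n + 1:k]
    return p, q

S = np.linspace(0.0, 1.0, 101)
p, q = differential_correction(np.exp, S, 1, 1)   # linear/linear fit to e^t
```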

Investigations of this algorithm are still going on. The quadratic convergence was established in the discrete case in [Barrodale, Powell and Roberts, 1972]. Quadratic convergence for approximation on an interval was proved in [Dua and Loeb, 1973]. An adaptive version of the algorithm has appeared in [Kaufman, McCormick and Taylor, 1983]. Various further results have been given in [Kaufman and Taylor, 1981] and [Powell and Cheney, 1986]. See the important survey [Taylor, 1985].

In the "classical case", we take 5 to be a compact interval on the real line,

and we define

Ki = {vl* : V € n n , z e n m , z > 0 on S) .

The principal result, due to Dua and Loeb, is this:

THEOREM. Let $x$ be an element of $C(S)$ such that $\operatorname{dist}(x, R^n_m) < \operatorname{dist}(x, R^{n-1}_{m-1})$. If the differential correction algorithm is started with an approximation $r_0$ such that $\|x - r_0\| < \operatorname{dist}(x, R^{n-1}_{m-1})$, then the sequence $[r_k]$ converges quadratically to the best approximation of $x$ in $R^n_m$.

For "generalized" ra t ional approximation, in which S is arbi trary and arbi t rary

subspaces replace I I n and I I m , the si tuation is not completely unders tood. If an

e > 0 is fixed, and if the approximating family is defined to be

R = {y/z :yeY, zeZ , e < z(s) < 1 on S } ,


then a great simplification occurs in the analysis of the algorithm. In the first place, each $x \in C(S)$ will possess a best approximation in $R$, provided only that $R$ is nonempty and that the subspaces $Y$ and $Z$ are finite-dimensional. See [Kaufman and Taylor, 1981]. These authors also prove the quadratic convergence of the algorithm under certain conditions. The advantage of the differential correction algorithm over the Remes second algorithm lies principally in its wider applicability, particularly to problems involving functions of two or more variables.

The constrained minimization problem that occurs in each step of the differential correction algorithm is to find $\bar y \in Y$ and $\bar z \in Z$ to minimize the expression in (1), subject to a constraint such as $\|\bar z\| = 1$. This is usually done approximately by taking a finite subset of $S$ and applying a linear programming code.

4. The Diliberto-Straus Algorithm. This algorithm appeared in [Diliberto and Straus, 1951], and solves the following approximation problem. We are presented with a continuous function of two variables, i.e., an element $x$ of $C(S \times T)$, where $S$ and $T$ are compact Hausdorff spaces. It is desired to approximate $x$ as well as possible in the form

$$x(s,t) \approx u(s) + v(t),$$

where $u$ and $v$ are chosen freely in $C(S)$ and $C(T)$, respectively. Even in the simplest case, where $S$ and $T$ are finite sets, the algorithm is of practical importance because it solves a problem of optimal scaling (or "preconditioning") of matrices.

This algorithm shares with the algorithms discussed previously the property of being iterative. The formulas defining it are these:

$$x_0 = x, \qquad x_{2n+1} = x_{2n} - u_n, \qquad x_{2n+2} = x_{2n+1} - v_n,$$

$$u_n(s) = \tfrac{1}{2} \max_t x_{2n}(s,t) + \tfrac{1}{2} \min_t x_{2n}(s,t),$$

$$v_n(t) = \tfrac{1}{2} \max_s x_{2n+1}(s,t) + \tfrac{1}{2} \min_s x_{2n+1}(s,t).$$

A moment's reflection will convince one that $u_n$ is a best approximation of $x_{2n}$ by a function of $s$ alone. Indeed, if $s$ is momentarily held fixed, then the number $u_n(s)$ is the constant which best approximates $x_s$. Here $x_s$ denotes the $s$-section of $x$ defined by $x_s(t) = x(s,t)$. The $t$-sections of $x$ are defined by $x_t(s) = x(s,t)$. In the same way, we see that $v_n$ is a best approximation of $x_{2n+1}$ by a function of $t$ alone. The formulas defining $u_n(s)$ and $v_n(t)$ produce continuous functions, by elementary arguments.
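On a finite grid the $s$- and $t$-sections are the rows and columns of a matrix, and the best constant approximation of a bounded function is its midrange, so the formulas above become a few lines of numpy. This sketch is our illustration of the iteration in the matrix setting mentioned earlier.

```python
import numpy as np

def diliberto_straus(X, sweeps=50):
    """Diliberto-Straus iteration on a grid: X[i, j] ~ x(s_i, t_j).
    Alternately subtract the row-wise midrange (best function of s
    alone) and the column-wise midrange (best function of t alone)."""
    R = np.array(X, dtype=float)
    u, v = np.zeros(R.shape[0]), np.zeros(R.shape[1])
    for _ in range(sweeps):
        un = 0.5 * (R.max(axis=1) + R.min(axis=1))   # u_n(s): midrange over t
        R -= un[:, None]; u += un
        vn = 0.5 * (R.max(axis=0) + R.min(axis=0))   # v_n(t): midrange over s
        R -= vn[None, :]; v += vn
    return u, v, R                  # x ~ u(s) + v(t) + residual

X = np.random.default_rng(1).standard_normal((6, 8))
u, v, R = diliberto_straus(X)
print(np.abs(R).max())              # ||x_n|| decreases to dist(x, C(S)+C(T))
```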

A crucial property of the algorithm is that the iterates $x_1, x_2, \dots$ form an equicontinuous sequence in $C(S \times T)$. The sequence is also bounded and has the property $\|x_n\| \downarrow \operatorname{dist}(x, Y)$, where $Y$ now denotes the subspace $C(S) + C(T)$ in $C(S \times T)$. All of this was already established by Diliberto and Straus. A proof that the sequence $[x_n]$ converges in the space $C(S \times T)$, i.e., converges uniformly, was eventually given in [Aumann, 1959]. Since it is clear from the construction that $x - x_n \in Y$, we conclude that $\lim_{n\to\infty}(x - x_n)$ is a best approximation of $x$ in $Y$. (The subspace $Y$ is closed.)

Although it is possible to construct examples in which the convergence of this algorithm is slow, it works quite well in most practical problems. In [Bank, 1979], the Diliberto-Straus algorithm is used for preconditioning the matrices which occur in numerically solving a partial differential equation. Only a few steps of the algorithm are needed to accomplish the desired purpose. Some recent work on this algorithm is contained in [Light and Cheney, 1980], [von Golitschek and Cheney, 1979, 1983] and [Dyn, 1980].

The algorithm works in an arbitrary Banach space $X$ with two subspaces $U$ and $V$, provided that these subspaces have central proximity maps, and provided that $U + V$ is closed. A map $A : X \to U$ is called a proximity map if $\|x - Ax\| = \operatorname{dist}(x, U)$ for each $x \in X$. The proximity map $A$ is said to be central if

$$\|x - Ax + u\| = \|x - Ax - u\|$$

for all $x \in X$ and all $u \in U$. In [Golomb, 1959] it is proved that under the hypotheses given above, $\|x_n\| \downarrow \operatorname{dist}(x, Y)$, with $Y = U + V$. The exposition in [Light and Cheney, 1985] is recommended for this result. If $X$ is uniformly convex, then $\lim(x - x_n)$ exists and is the best approximation of $x$ in $Y$.

In terms of the proximity maps $A : X \to U$ and $B : X \to V$, the algorithm reads as follows:

$$x_0 = x, \qquad x_{2n+1} = x_{2n} - Ax_{2n}, \qquad x_{2n+2} = x_{2n+1} - Bx_{2n+1}.$$

Unfortunately, the hypothesis that both proximity maps be central is very restrictive. A notable case is Hilbert space, in which every orthogonal projection onto a closed subspace is a central proximity map. With this observation, we recover an older theorem of [von Neumann, 1950] which states that in Hilbert space, the orthogonal projection of $x$ onto $U + V$ is $\lim_n (x - x_n)$, where $x_n$ is as above. Thus it transpires that von Neumann's algorithm, which he stated for Hilbert space, is the same as the algorithm of Diliberto and Straus, which they stated for a space $C(S \times T)$. But von Neumann's algorithm (also called the "alternating algorithm") works for any pair of closed subspaces in Hilbert space, while the Diliberto-Straus algorithm works for $C(S) + C(T)$, as a subspace of $C(S \times T)$.

Generalizations of the algorithm for approximating functions in $L_1(S \times T)$ by functions in $L_1(S) + L_1(T)$ have been given in [Light, McCabe, Phillips and Cheney, 1982], [Light and Holland, 1984], [Light, 1983] and [Light, 1984]. Generalizations to smooth and uniformly convex spaces have recently been given in [Deutsch, 1979], [Franchetti and Light, 1984, 1985] and [Deutsch, 1984].

There is a natural extension of the algorithm to subspaces of $C(S \times T)$ having the form $G \otimes C(T) + C(S) \otimes H$. Here $G$ and $H$ are finite-dimensional subspaces in $C(S)$ and $C(T)$ respectively. If $G$ has a basis $\{g_1,\dots,g_n\}$ then $G \otimes C(T)$ is the linear space of all functions

$$(s,t) \mapsto \sum_{i=1}^n g_i(s)\, y_i(t),$$

where the coefficient functions $y_i$ run over $C(T)$. The subspace $C(S) \otimes H$ is

defined similarly. The functions being used for approximants here are therefore very general. Good approximations can be obtained by the so-called blending methods introduced in [Gordon, 1971]. But methods for producing best approximations are still lacking. That the natural extension of the Diliberto-Straus algorithm fails was proved in [Dyn, 1980]. See also [von Golitschek and Cheney, 1983]. For these more general subspaces, questions of existence of best approximations remain open.

5. Recent Work on Nomographic Functions. A continuous bivariate function $x$ is said to be nomographic if there exist continuous univariate functions $u$, $v$ and $f$ such that

$$x(s,t) = f(u(s) + v(t)).$$

In some intuitive sense, a nomographic function should have simpler structure than a completely general bivariate function. As an example, the function $\cos(st)$ is nomographic on the domain where $s > 0$ and $t > 0$, since

$$\cos(st) = \cos \circ \exp(\log s + \log t).$$

Nomographic functions derive much of their interest from the fact that they are "building blocks" from which all continuous functions on $\mathbb{R}^2$ can be constructed. In fact, the following remarkable theorem of Kolmogorov and Arnold is valid:

THEOREM. Every continuous bivariate function on the square $0 \le s \le 1$, $0 \le t \le 1$ is a sum of at most five nomographic functions.

Various refinements of the Kolmogorov-Arnold Theorem have been made, and the reader should consult [Lorentz, 1966] for these improvements.

An interesting open problem is to devise an algorithm for producing good approximations by nomographic functions. Some progress in this direction has recently been made by von Golitschek, whose algorithm will now be described. See [von Golitschek, 1984].

We are given $x \in C(S \times T)$, $f \in C(\mathbb{R})$, $g \in C(S)$, and $h \in C(T)$. We seek two functions $u \in C(S)$ and $v \in C(T)$ which yield a minimum deviation in the approximation

$$(2) \qquad x(s,t) \approx f(u(s)h(t) + v(t)g(s)).$$

Further assumptions that are made are that $f$ is strictly increasing, $h > 0$, and $g > 0$. Of course, if we take $h(t) = g(s) = 1$, then in (2) we have a nomographic approximation to the function $x$; notice, however, that $f$ has been


prescribed. Von Golitschek proves that the optimal $u$ and $v$ exist, and his proof is constructive (algorithmic). The sets $S$ and $T$ can be arbitrary compact Hausdorff spaces.

The algorithm proceeds as follows. First, a value of the parameter $\alpha$ is selected. Ideally we would use $\alpha = \rho$, where

$$\rho = \inf_{u,v} \|x - f \circ (uh + vg)\|,$$

but the value of $\rho$ is usually not known. Next we define two functions to facilitate notation:

$$K(s,t) = f^{-1}(x(s,t) - \alpha)/g(s)h(t)$$
$$L(s,t) = f^{-1}(x(s,t) + \alpha)/g(s)h(t).$$

The algorithm generates two sequences $[u_n]$ and $[v_n]$, starting with

$$u_0(s) = 0, \qquad v_0(t) = \inf_s L(s,t).$$

Subsequent functions are defined by the formulas

$$u_n(s) = u_{n-1}(s) \vee \sup_t\,[K(s,t) - v_{n-1}(t)]$$
$$v_n(t) = v_{n-1}(t) \wedge \inf_s\,[L(s,t) - u_n(s)].$$

The algorithm has two "stopping criteria". These are tests to be made in each step as follows: if $v_n = v_{n-1}$ or $v_n < -2\|L\| - \|K\|$, STOP. In these equations, $\vee$ and $\wedge$ are the pointwise maximum and minimum operations. The formulas in the algorithm permit us to make some immediate observations:

(i) $u_n$ and $v_n$ are continuous.

(ii) $0 \le u_0 \le u_1 \le \cdots$

(iii) $v_0 \ge v_1 \ge v_2 \ge \cdots$

(iv) $v_n \le L - u_n$, or $u_n + v_n \le L = f^{-1} \circ (x + \alpha)/gh$, or $f \circ (u_n gh + v_n hg) \le x + \alpha$, or $-\alpha \le x - f \circ (u_n gh + v_n hg)$.

(v) Similarly, $x - f \circ (u_n gh + v_{n-1} hg) \le \alpha$.

(vi) If $v_n = v_{n-1}$ then, by the preceding inequalities, $\|x - f \circ (u_n gh + v_n hg)\| \le \alpha$.

These elementary arguments establish the first half of the next theorem.

THEOREM. If the algorithm stops at the $n$-th step with $v_n = v_{n-1}$, then

$$\|x - f \circ (u_n gh + v_n hg)\| \le \alpha.$$

Hence $\alpha \ge \rho$. If $\alpha > \rho$, then for some $n$, $v_n = v_{n+1}$.

Stated informally, if $\alpha$ is chosen greater than the minimum deviation $\rho$, then the algorithm will produce in finitely many steps an approximation to $x$ giving precision $\alpha$.

The dual result is as follows.

THEOREM. The inequality $\alpha < \rho$ is true if and only if the inequality $v_n < -2\|L\| - \|K\|$ is true for some $n$.

The only case in which infinite sequences are generated by the algorithm is the case $\alpha = \rho$. In this case, the sequences $[u_n]$ and $[v_n]$ are equicontinuous in $C(S)$ and $C(T)$. Furthermore, they are bounded. By the monotonicity of these sequences, they converge to continuous functions, say $u$ and $v$. Then our previous inequalities show that

$$\|x - f \circ (ugh + vhg)\| \le \rho.$$

This provides the constructive proof of existence of an optimizing pair, which is $(ug, vh)$.
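The iteration discretizes readily. The following sketch (our addition; all names are illustrative, and the stop test $v_n = v_{n-1}$ is checked only up to roundoff) runs von Golitschek's algorithm on finite grids, given the inverse of the prescribed increasing $f$.

```python
import numpy as np

def von_golitschek(x, f_inv, g, h, alpha, max_iter=500):
    """von Golitschek's iteration on finite grids: x[i, j] ~ x(s_i, t_j),
    g, h positive arrays on the s- and t-grids, f_inv the inverse of the
    prescribed strictly increasing f.  Returns (u, v, ok): ok=True means
    the stop test v_n = v_{n-1} fired, certifying deviation <= alpha."""
    gh = np.outer(g, h)
    K = f_inv(x - alpha) / gh
    L = f_inv(x + alpha) / gh
    u = np.zeros(x.shape[0])
    v = L.min(axis=0)                          # v_0(t) = inf_s L(s, t)
    lower = -2.0 * np.abs(L).max() - np.abs(K).max()
    for _ in range(max_iter):
        v_prev = v
        u = np.maximum(u, (K - v[None, :]).max(axis=1))
        v = np.minimum(v, (L - u[:, None]).min(axis=0))
        if np.allclose(v, v_prev):             # v_n = v_{n-1} (up to roundoff)
            return u, v, True
        if v.min() < lower:                    # certificate that alpha < rho
            return u, v, False
    return u, v, False

# Toy check with an exactly representable x = f(u h + v g), f = exp:
s, t = np.linspace(1, 2, 10), np.linspace(1, 2, 12)
x = np.exp(np.add.outer(s, 2 * t))
u, v, ok = von_golitschek(x, np.log, np.ones_like(s), np.ones_like(t), 1e-8)
print(ok)
```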

B I B L I O G R A P H Y

1. G. Aumann, Über approximative Nomographie, I, II and III, Bayer. Akad. Wiss. Math.-Nat. Kl. S.B. (1958), 137-155; ibid. (1959), 103-109; ibid. (1960), 27-34. MR 22#1101, 22#6968, 24#B1289.

2. R. Bank, An automatic scaling procedure for a d'Yakanov-Gunn iteration scheme, Linear Algebra and its Applications 28 (1979), 17-33.

3. I. Barrodale and C. Phillips, Algorithm 495: Solution of an overdetermined system of linear equations in the Chebyshev norm, ACM Trans. Math. Software 1 (1975), 264-270.

4. I. Barrodale, M.J.D. Powell and F.D.K. Roberts, The differential correction algorithm for rational approximation, SIAM J. Numerical Analysis 9 (1972), 493-504.

5. R.H. Bartels, A.R. Conn and C. Charalambous, On Cline's direct method for solving overdetermined linear systems in the $L_\infty$ sense, SIAM J. Numerical Analysis 15 (1978), 255-270.

6. R.H. Bartels and G.H. Golub, Stable numerical methods for obtaining the Chebyshev solution to an overdetermined system of linear equations, Comm. Assoc. Comput. Mach. 11 (1968), 401-406, 428-430; ibid. 12 (1969), 326.

7. D. Braess, Approximation mit Exponentialsummen, Computing 2 (1967), 309-321.

8. E.W. Cheney, "Introduction to Approximation Theory", McGraw-Hill, New York, 1966. 2nd Edition, Chelsea Publ. Co., New York, 1982.

9. E.W. Cheney, Five lectures on the algorithmic aspects of approximation theory, in "Topics in Numerical Analysis", ed. by P.R. Turner, Lecture Notes in Math., Springer, New York, 1984.

10. E.W. Cheney and H.L. Loeb, Two new algorithms for rational approximation, Numerische Math. 3 (1961), 72-75. MR 22#21692.

11. A.K. Cline, A descent method for the uniform solution to overdetermined systems of equations, SIAM J. Numerical Analysis 13 (1976), 293-309.

12. A.R. Curtis and M.J.D. Powell, On the convergence of exchange algorithms for calculating minimax approximations, Computer J. 9 (1966), 78-80.

13. F. Deutsch, Von Neumann's alternating method: the rate of convergence, in "Approximation Theory IV", C. Chui, L.L. Schumaker and J.D. Ward, eds., Academic Press, New York, 1984, 427-434.

14. F. Deutsch, The alternating method of von Neumann, in "Multivariate Approximation Theory", W. Schempp and K. Zeller, eds., ISNM Vol. 51, Birkhäuser, Basel, 1979, 83-96.

15. S.P. Diliberto and E.G. Straus, On the approximation of a function of several variables by the sum of functions of fewer variables, Pacific J. Math. 1 (1951), 195-210. MR 13, p. 334.

16. S.N. Dua and H.L. Loeb, Further remarks on the differential correction algorithm, SIAM J. Numerical Analysis 10 (1973), 123-126.

17. C.B. Dunham, Chebyshev approximation by rationals with constrained denominators, J. Approximation Theory 37 (1983), 5-11.

18. C.B. Dunham, The weakened first algorithm of Remez, J. Approximation Theory 31 (1981), 97-98. MR 82m:41026.

19. N. Dyn, A straightforward generalization of Diliberto and Straus' algorithm does not work, J. Approximation Theory 30 (1980), 247-250.

20. C. Franchetti and W.A. Light, The alternating algorithm in uniformly convex spaces, J. London Math. Soc. (2) 29 (1984), 545-555.

21. C. Franchetti and W.A. Light, On the von Neumann alternating algorithm in Hilbert space, J. Math. Analysis and Applications (to appear).

22. W. Fraser and J.F. Hart, On the computation of rational approximations to continuous functions, Communications Association for Computing Machinery 5 (1962), 401-403, 414.

23. G.A. Gislason, An algorithm for constrained, nonlinear Tchebycheff approximation, in "Theory of Approximation", ed. by A.G. Law and B.N. Sahney, Academic Press, New York, 1976, 298-307.

24. A.A. Goldstein, On the stability of rational approximation, Numer. Math. 5 (1963), 431-438.

25. M. von Golitschek, Shortest path algorithms for the approximation by nomographic functions, in "Approximation Theory and Functional Analysis", P.L. Butzer, R.L. Stens and B. Sz.-Nagy, eds., Birkhäuser Verlag, Basel, ISNM Vol. 65, 1984.

26. M. von Golitschek and E.W. Cheney, Failure of the alternating algorithm for best approximation of multivariate functions, J. Approximation Theory 38 (1983), 139-143.

27. M. von Golitschek and E.W. Cheney, On the algorithm of Diliberto and Straus for approximating bivariate functions by univariate ones, Numer. Funct. Analysis and Optimization 1 (1979), 341-363. MR 80g:41023.

28. M. Golomb, Approximation by functions of fewer variables, in "On Numerical Approximation", R. Langer, ed., University of Wisconsin Press, Madison, Wisconsin, 1959, 275-327. MR 21#962.

29. W.J. Gordon, Blending-function methods of bivariate and multivariate interpolation and approximation, SIAM J. Numerical Analysis 8 (1971), 158-177. MR 43#8209.

30. J.F. Hart, et al., "Computer Approximations", John Wiley, New York, 1968.

31. E.H. Kaufman, Jr., S.F. McCormick and G.D. Taylor, An adaptive differential-correction algorithm, J. Approximation Theory 37 (1983), 197-211.

32. E.H. Kaufman, Jr. and G.D. Taylor, Uniform approximation by rational functions having restricted denominators, J. Approximation Theory 32 (1981), 9-26. MR 84b:41014.

33. W.A. Light, The Diliberto-Straus algorithm in $L_1(X \times Y)$, J. Approximation Theory 38 (1983), 1-8. MR 84h:41048.

34. W.A. Light, Convergence of the Diliberto-Straus algorithm in $L_1(X \times Y)$, J. Numer. Functional Analysis and Optimization 3 (1981), 137-146.

35. W.A. Light and E.W. Cheney, "Approximation Theory in Tensor Product Spaces", Lecture Notes in Mathematics, Springer-Verlag, New York, to appear, 1986.

36. W.A. Light and E.W. Cheney, On the approximation of a bivariate function by the sum of univariate functions, J. Approximation Theory 29 (1980), 305-322. MR 82d:41023.

37. W.A. Light and S.M. Holland, The $L_1$-version of the Diliberto-Straus algorithm in $C(S \times T)$, Proc. Edinburgh Math. Soc. 27 (1984), 31-45.

38. W.A. Light, J.H. McCabe, G.M. Phillips and E.W. Cheney, The approximation of bivariate functions by sums of univariate ones using the $L_1$-metric, Proc. Edinburgh Math. Soc. 25 (1982), 173-181.

39. G.G. Lorentz, "Approximation of Functions", Holt, Rinehart and Winston, New York, 1966. (To be reprinted by Chelsea Publishing Co., 15 E. 26th Street, New York, N.Y. 10010.)

40. F.D. Murnaghan and J.W. Wrench, The determination of the Chebyshev approximating polynomial for a differentiable function, Math. Tables and Other Aids to Computation 13 (1959), 185-193.

41. E.N. Novodvorskii and I. Sh. Pinsker, On a process of equalization of maxima, Uspehi Mat. Nauk 6 (1951), 174-181 (Russian).

42. M.J.D. Powell, "Approximation Theory and Methods", Cambridge University Press, 1981.

43. M.J.D. Powell and E.W. Cheney, The differential correction algorithm for generalized rational functions, CNA Report, University of Texas, 1984.

44. A. Ralston, Rational Chebyshev approximation by Remes algorithms, Numer. Math. 7 (1965), 322-330.

45. E. Ya. Remes, Sur le calcul effectif des polynomes d'approximation de Tchebichef, C.R. Acad. Sci. Paris 199 (1934), 337-340.

46. A. Shenitzer, Chebyshev approximation of a continuous function by a class of functions, J. Assoc. for Computing Machinery 4 (1957), 30-35.

47. E.L. Stiefel, Note on Jordan elimination, linear programming and Tchebycheff approximation, Numer. Math. 2 (1960), 1-17.

48. G.D. Taylor, The differential correction algorithm, in "Delay Equations, Approximation and Applications", ISNM Series, Birkhäuser Verlag, Basel, to appear.

49. L. Veidinger, On the numerical determination of the best approximations in the Chebyshev sense, Numerische Math. 2 (1960), 99-105.

50. J. von Neumann, "Functional Operators, Vol. II", Annals of Mathematics Studies 22, Princeton University Press, 1950.

51. H. Werner, Rationale Tschebyscheff-Approximation, Eigenwerttheorie, und Differenzenrechnung, Arch. Rational Mech. Analysis 11 (1962), 368-384.

52. W. Wetterling, Ein Interpolationsverfahren zur Lösung der linearen Gleichungssysteme, die bei der rationalen Tschebyscheff-Approximation auftreten, Arch. Rational Mech. Analysis 12 (1963), 403-408.

Proceedings of Symposia in Applied Mathematics Volume 36, 1986

Algebraic Aspects of Interpolation

Charles A. Micchelli IBM T. J. Watson Research Center

P.O. Box 218 Yorktown Heights, N.Y. 10598

Introduction. This lecture contains basic facts about interpolation. As the title suggests we only discuss constructive methods for interpolation and do not address questions of convergence.

Conceptually, interpolation is the simplest method of approximation. A particular function is selected from a class of functions by the requirement that it match given values at a finite set of points in its domain.

Applications of interpolation in science and engineering are manifold indeed. Interpolation is a basic part of many numerical methods and so a rudimentary understanding of the elements of interpolation is important.

Much of what we present here is standard material and most books on numerical analysis and approximation theory treat this topic, [ 1 , 3 9 ] . We will try to contrast the relative simplicity of univariate interpolation with the greater complexity and challenge encountered in the multivariate case.

The lecture is organized as follows:

Part 1. Univariate Interpolation:

1.1. Polynomial Interpolation . 1.2. Trigonometric Interpolation . 1.3. Chebyshev Systems. 1.4. Spline Interpolation .

Part 2. Multivariate Interpolation:

2.1 Interpolation on Special Configurations . 2.2 Optimal Interpolation . 2.3 Radon Transform and Interpolation .

AMS (MOS) Subject Classification 41A05

Key Words: interpolation, divided difference, conditionally positive definite, Radon transform.

© 1986 American Mathematical Society 0160-7634/86 $1.00 + $.25 per page

81

http://dx.doi.org/10.1090/psapm/036/864367

82 CHARLE S A . MICCHELL I

Univariate Interpolation.

1.1. Polynomial Interpolation.

We denote by irn the set of complex polynomials of degree < n,

Pto = a0 + axx + ••• + a„K .

Theorem 1 . Given any distinct points *Q, ... ,xn+1 (real or complex) and data>'0. ... ,yn+i (also real

or complex) there exists a unique polynomial p e irn such that

/>(*,•) =yit i = 0, 1, ... ,n.

Proof. This is immediate since any nonzero polynomial/; e irn has at most n zeros. Alternatively,

the determinant of the linear system which determines the coefficients of the polynomial has the

value

n (xt - x). 0<i<J<n J

The interpolation procedure above can be written in terms of Lagrange polynomials,

(X - XfjU {X-)

w(x) = (x-x0) ... (x-xn)

as

n

Theorem 1 extends to interpolation of consecutive derivatives, sometimes called Hermite in-k

terpolation. Thus whenever mlt ... , mk are positive integers with Im, = « + 1 there is a unique

p e irn with

pU\tj) « / % ) . i = 0, 1 mj- 1J= 1 k.

We denote this polynomial by H(f | r0 /„)(/) where each /; is repeated with its multiplicity.

The leading coefficient of H{f | f0 tn)(t) is defined to be the n-th divided difference of f,

t'O <nV

ALGEBRAIC ASPECTS OF INTERPOLATION 83

and H{f | tQ, ... , tn) has the Newton representation

(1) H{f | tQ, ... , *„)(/) = X «'o» - . '/W(' - 'o) - . 0 - '/-!>• y-0

It can be shown that

k mt "-1

and even more can be said when m,= 1, i = 1, ... , k:

/up (2) [<o '„]/=£

.to.n.C,-'.) J i*J

Another formula of totally different sort says

(3) fro. - . . tnV= f fWiZ °tid°l .» d°n J Sn i -0

where

i -0

is the regular n-simplex. This formula, called the Hermite-Genocchi formula, makes it apparent

that whenever f is a polynomial, [t0, ... , tn]f is a polynomial in t0t ... , /„. One would have a dif­

ficult time seeing this directly from (2).

Divided differences are quite useful. They are even helpful in a discussion of the plane wave

approach for the construction of the fundamental function of a hyperbolic equation. In this con­

text, it was shown in [ 26 ] that whenever f is a polynomial and zlt ... , zm are the zeros of some

homogeneous polynomial Q(z, x) in (z,x) where x e C for some s, then [zu ... ,zm]f is again a

polynomial in x.

The Neville-Aitken formula

W I 'o '*X0 = ' * - 'o

84 CHARLE S A . MICCHELL I

can be used to evaluate H(f)(t). Also, we mention that the Newton form for Hermite interpo­

lation , (1), can be evaluated by n nested multiplications

Pit) = ( . . . {dn{t - tn_x) + dn_x){t - tn_2) + ••• + dx){t - /0) + d0> dj=[tQ, ... , t)f.

1.2. Trigonometric Interpolation.

A trigonometric polynomial has the form

n

t{x) = OQ + S (ajcos Jx + bjsmjx). / - I

We use Tn to represent the class of all such functions.

Theorem 2. Given any N = 2n+1 distinct points XQ, ... , xN_t in some interval of length < 2tr and data

o» ••• »>V-i there is a unique t e Tn such that

(4) '(**)=*•, i = O fl AT- 1.

The Lagrange form for t(x) is

N-l

tix) = YJ M W

I -O

where N-l X — Xk N-l Xt — Xk

tAx) = n sin — / II sin — 1 Jfc-0 2 *-0 2

km km

Proof. The proof is as in the polynomial case. We may argue either that a trigonometric

polynomial in Tn has at most N-1 zeros in any interval of length < 2IT or that the determinant

of the linear system (4) has the value

N+l „ . xi~~xj 2 n sin ——A

0<i<j<N-l 2

When the points are equally spaced, xk = 2irk/N, k = 0, 1, ... , N — 1, an important case for ap­

plications, more information is available.

Theorem 3 . Let t(x) = X ae'JX and y-0 r

ALGEBRAIC ASPECTS OF INTERPOLATION 85

«,-i |V-*,-o,. »-. lvi_

03 = e N

then

Proof.

*<**)=.&. * = 0, 1, . . . ,N- 1.

N-l 1 N-l _Jg

tf-1 /V- l

E l V> ~jt Jk

yt-ji 2J<* a

In this form, the coefficients of t{x) can be computed in 0{N log N) multiplications by using

the fast Fourier transform, see Cooley, Tukey [ 7 ] .

1.3 Chebyshev Systems.

The previous results suggest the following

Definition 1. A linearly independent set of continuous functions {u0(x), ... ,u„(x)} is called a

Chebyshev system on [ a,b ] whenever for any a <XQ< ••• <xn<b andy0, ... ,y„ e IR there is a R

unique u(x) = £ auXx) satisfying y-0 J J

u(Xi) = yt, i = 0, 1, ... ,n.

Thus the determinant of the linear system

\x0,xlf...,xn/ :det !"/(*/) II/,y-«o,...,*

W

has a fixed sign for a < XQ < • • • < x„ < b.

Chebyshev systems are important in approximation theory. They have been extensively

studied by many including S. Karlin [ 28 ] , S. Karlin, W. J. Studden, [ 27 ] , and M.G.Krein [30].

86 CHARLE S A 0 MICCHELL I

Of course, irn is spanned by a Chebyshev system on ( — oo, oo) and Tn is a Chebyshev space on

any interval of length < 2IT. We list below several other examples.

Example 1. uXx) = — , j: = 0, 1, ... , n , | x \ < 1, \> distinct and | Xt | < 1. These J 1 + Kfc J J

functions form a Chebyshev system since

( 0, 1, . . . , / ! \ n ) = n a j

xQtxlt...txn/ y-0

n (a— ak)(xj-xk) 0<k<J<n J J -1

Il(oj + xk) J J

(cf.Achieser [1]) .

The fundamental functions for interpolation at x, = X, are

L{x) = -—. , B{x) = n -±-

Example 2. u,(x) = ex ', i = 0, 1, ... , «, A, distinct, x e ( — oo, oo). To see that these functions

form a Chebyshev system we suppose that

has n+1 zeros then v = u — A0w has at least n zeros. Since

induction on the number of functions can be used to show that they form a Chebyshev system.

Example 3. Given any positive continuous functions w0(x) > 0, ... , wn(x) > 0, x e [a, b] then

ux{x) = wx{x) \ w0{o0)do0 J a

unW = *„(*) J w„- i ( f f „ - l ) j wn_2(an_2) . . . ^ . j . . . fi?a0, J a J a

form a Chebyshev system, see Karlin, Studden [ 27 ] . More examples are given in [ 27,28 ] .

1.4. Spline Functions.

ALGEBRAIC ASPECTS OF INTERPOLATION 87

Definition 2. Let 0 = £o < ii < '" < Zm < £m+i = 1, Am = {£,}m, and set

^ ( A m ) = {S:S | U[tc+i) e *„, 5 £ C""1©, 1)}.

An element of ^„(Am) is called a spline function of degree n with knots at £lt ... , £m.

It is easy to see that dim ^7„(Am) = n + m + 1.

Spline functions are a frequent choice for interpolation of experimental data. Usually, in

practical application splines of degree five or less are used. The degrees of freedom intro­

duced by the knots are then used to interpolate data. Unlike polynomials, spline functions have a

local character. The spline curve can be altered in a part of its domain without dramatically af­

fecting it elsewhere. Spline functions are especially useful for constructing shape preserving

(monotone/convex) interpolation. As for the spline interpolation problem we next present a

fundamental result of Schoenberg and Whitney [ 38 ]

Theorem 4. Given any points 0 < x1 < ••• < x„+m+1 < 1, n > 2, and datay l ty2, ... . A+m+i there ex­

ists a unique spline S e ^?„(Am) such that

S(Xj) = yt, i = 1, 2, ... , n + m + 1

if and only if

Xi<Si<Xi+„+l, i = 1,2, . . . , m .

Proof. A proof of this result can be based on RoUe's theorem and induction on m and n. A proof

using determinants is given in [28 ] .

The recommended approach for computing a spline interpolant is to express it in terms of

B-splines. Thus we write

m

S(x) = 2<*«C*I€ , |,.+„+1) —n

where the additional knots are chosen arbitrarily andA/(x | £,-,..., I l+n+i) is the B-spline defined

by

(6) M{x \i, ii+n+1) = [«, ii+n+1]( • - x)l

88 CHARLES A0 MICCHELLI

(x* = ( max(x,0))rt). This leads to a banded linear system for the B-spline coefficients which can

be solved by Gaussian elimination.

For cubic spline interpolation a tridiagonal system results because each B-spline has three

knots in the interior of its support. Frequently, the knots are chosen to be the point of interpo­

lation. Thus additional conditions are required to specify the spline uniquely .These can be chosen

as boundary conditions at 0 and 1 (rather than additional interpolation conditions). For a dis­

cussion of various choices for these boundary conditions and numerical methods for the solution

of the corresponding tridiagonal systems see de Boor [ 3 ] .

The special structure of the matrices for periodic spline interpolation ad­

mits fast algorithms for their computation. Given any periodic data y0, ... ,yN_lf yl+N = yit we

pass an odd degree periodic spline with knots £, = i/N, i = 0, 1, ... , N through the data,that is,

5 U , ) = ^ J = 0 , 1 , . . . , A T - 1 , Se&^iAx)

SU)(0) = SU\l), i = 0, 1, ... , 2r - 2.

The existence of a unique periodic spline interpolant can be established by showing that any peri­

odic spline S(x) satisfies

f s (WW= o

for any periodic function g e C(r)[0,l] which vanishes at £,, i = 0, 1 AT— 1. If we expand S

in a Fourier series

$(*) = £ s/*iJx — 00

then s.•,= T/i; where af = — X yku J is the discrete Fourier transform of the data, see (5), and

J j J J N *-o

Tj are attenutation factors given by T0 = 1, Tk = 0, k = 0( mod JV) and otherwise by

/ sin irxk \ T * = \ vxk ) ?2r( c o s ™A:)> k^O(modN),

where qt are polynomials given recursively by

2

4 0 ( 0 = 1 , g/(0 = fr/.i(/)+ ° ^ /1

) q'l-iiO, / = l , 2 , . . . ,

ALGEBRAIC ASPECTS OF INTERPOLATION 89

[ 17 ] . Two useful applications of periodic spline interpolation are given in [ 22 ] .

Spline interpolation methods are known to have many optimality properties. Next we de­

scribe a particularly striking instance of optimal spline interpolation (see also Theorem 10). For

this purpose we need

Definition 3. A perfect spline P of degree n with knots £lt . . . , £ m such that — « =

£o < £i < *' * < £m < £m+i = °° i s a function with the following properties:

P e Cn~\ - oo, oo), P j ( y e vH, i = 0, 1, ... , m and P{n\x) \ {i.A. } = ( - l ) ' a for some a e R.

The following result is proved in [ 34 ]

Theorem 5. Given any points 0 < xt < • • • < xn+m < 1 there is a unique perfect spline P (up to a sign

) with knots £lt ... , £m such that P(x,) = 0, i = 1, . . . , « + m, and

1 P{n) || m = max | P{n\x) 1 = 1 . Moreover, there is a unique S e ^„_1(AW) such that 0<x<l

5(x,) = /(*,), i = 1, ... , n + m

and this interpolant satisfies

(7) I S{x) - / ( * ) | < | P{x) | | | / n ) IL, 0 < x < 1.

Furthermore, there is no function ;4:IRn+m-*IR such that A(J{x^), ... ,f(x„+m)) gives a better estimate

for f{x) , in the sense of (7), than S(x)

In the terminology of [ 33 ] , S(x) above is the optimal recovery of f(x), in ^ [ 0 , 1] from the

information / ( x j , ... ,/(xB+m). The interpolant described in Example 1, is the analog of this

interpolant for the Hardy space on the disk, see [ 15 ] . In this case we suppose that f(z) is ana­

lytic on the disk and the error in estimating f(x) using the information/(\0), ... , f{Xn) is bounded

by the norm —— / * | f(e,e) \ 2d6. See the lecture of A. Pinkus for more on optimal recovery.

Part 2. Multivariate Interpolation.

Chebyshev systems are helpful for studying univariate interpolation. Unfortunately, though,

as soon as we turn to multivariate interpolation we must leave them behind since there are no

Chebyshev systems on IR\ s > 2, [ 31 ] . To see this , suppose to the contrary that

UQ, ... ,uH,n> \, are continuous functions on 1R' with the property that

90 CHARLE S A . MICCHELL I

n det | ut(xJ) | 5* 0

for any distinct points x , ... ,xn e IR'. We join x0 ,*1 along nonintersecting paths

x\t),x\t), e R' /{x2 , ... ,xn} 0 < f < 1, so that (*°(0), x\0)) = (JC°,x1) and

(x°(l), JC^I)) = (x \*°) . Then the determinant above changes sign along these paths, which is a

contradiction.

Thus we see that there is no set of n universal functions which can be used for interpolation

at any n distinct points. We discuss below several ways to deal with this intrinsic difficulty of

multivariate interpolation. First we consider interpolation by polynomials.

Let 7r„(R') be the space of polynomials of total degree < n on IR" and /*n(R') the homogeneous

members of 17„(R5). with exact degree n .

Theorem 6. Given any distinct points x°, ... tx" e IR* there is a/? e ir^R') such that

/>(*') = # , i = 0, 1, ... ,n.

( n + s\ 1 > n + 1 for s > 2 and so the polynomial above is

not unique. Here is an easy way to construct such a polynomial: Choose any vector X e IR* such

that the points X • x, i = 0, ... , n are distinct.Then by Theorem 1 there is a polynomial

q e w„(JR}) satisfying

q(X*x) =yit i = 0, 1, ... ,n.

Thus p(x) = q(X • x) provides the required interpolant.

Hence we see that there is a universal set of interpolating functions provided we are willing

to use more functions than data ! This observation suggests several questions. The first is whether

or not there is a universal set of interpolating functions with dimension smaller than irn(R')?

Better yet, what is the minimal m such that there exists continuous functions M0, ... , um on RJ

with the property that for any distinct x°, ... , xn and dataj>0, ... ,yn the equations

YJ api(xJ) = yjf j = 0, 1, ... , n i-0

ALGEBRAIC ASPECTS OF INTERPOLATION 91

has a solution? We just showed that in the plane n < m < —(n + 2)(/i + 1). It is possible to do

better than this! In fact, if we view x°, ... ,xn e R 2 as complex number and use Theorem 1 we

see that the real and imaginary parts of zk, z = x + iy, k = 0, 1, ... , n can be used for interpo­

lation; Thus in the plane m <2n + 1. For more information about this problem, see [ 37 ] .

The proof of Theorem 6 also suggests using the functions {uQ(X • x), ... , un(X • x)} for

mulivariate interpolation where {u0, ... , u„] is a Chebyshev system. Even spline functions can be

used provided due consideration is given to Theorem 5. We call this ridge function interpolation.

The advantage of this method is clear: univariate interpolation methods can easily be used for

multivariate interpolation. However, in general this approach is bound to give disastrous results.

Ridge function interpolants only vary in the direction of X . Thus two values of X • xl can be

close, while corresponding x/s and y's are far apart. Nevertheless, there are data sets well fitted

by ridge functions. For really difficult terrain, one may try using several directions,

{WQCA1 • x)t ... , u0(Xk • x), ... , K^A1 • x), ... , u„(Xk • x)} and search for good choices for

X , ... , X - projectional pursuit. Statisticians have investigated similar questions, [ 11 ] .

2.1 Interpolation on Special Configurations .

In this section we distinguish between methods of interpolation which are applicable for any

distinct points x , ... ,xn, scattered data interpolation and those which apply for special choices of

x°, ... , xn. For simplicity of presentation we restrict ourselves to R2. All our discussion extends

to IRJ, the only difference being in notational complexity.

There are many methods for interpolation in the plane at special points. Probably, the simplest

is tensor product interpolation on a rectangular grid .

Theorem 7. Given Chebyshev systems {uQ(x), ... , «„(*)}, x e (a,b) and {v0(y), ... , vm(y)},y e (c^),

and any points a < XQ < ••• < x„ < b, c <y0 < ••• <ym<d. Then for any data n m

yu, i = 0, ... n,j = 0,1, ... m there is a unique function fixy) = 1 1 dtp£x)vf(y) such that / 1=0 y'=0 J J

Ax^y,) =ytJ, i = 0, 1, ... ,n y = 0, 1, ... ,m.

The proof is straightforward. However, the importance of this result should not be underesti­

mated. It provides a computational useful way to interpolate data on a rectangular grid and is

often used in applications (oil wells and mineral deposits are found with this method). Abstractly,

the interpolant is formed as a tensor product of univariate interpolation. Thus if (Ug)(x) is the

92 CHARLES A. MICCHELLI

unique interpolant to g at XQ, ... ,xn from {w0, ... ,un] and (Vg)(x) is similarly defined, then the

interpolant above is the tensor product of U and V. For example, tensor product polynomial in­

terpolation has the Lagrange representation

/=0y= 0

where

z^x) = -f , mAy) = -t

{x - xt)u (*,-) y {x - yj)v (y)

<*{x) = UQ(X - xj), vO) = nQ(y - yt)

Our next observation provides a set of points for which interpolation by u„(IR ) is solvable.

Theorem 8. Let i0, £u ... , /„ be distinct parallel lines and x'°, ... ,xlyl distinct points on /,-. Then

for any dataj>l0, ... ,yttl, i = 0, 1, ... , n there is a unique/? e TTW(IR ) such that

P(*J)=yij • y'= 0,1, . . . , / , i = 0 , l f . . . . it

Proof. Bezout's theorem says that if f , g are polynomials of degrees n,m and intersect only at

points, they have at most nm simultaneous solutions, [ 20 ] . Thus if p e TT„(IR ) vanishes at

x1,0, ... ,x1^ i = 1, ... , n it must be a multiple of £lt ... , £n. (Actually, in this case we can see di­

rectly that any polynomial of degree < n which vanishes at n + 1 points on a line must contain

that line as a factor.) Since it also vanishes at x ' it must be identically zero.

The Lagrange representation of p can be obtained inductively as follows: We let L be the

Lagrange interpolant to the data along £„. Then

p = L + £rtq

yiJ0 - L(x' ) where q is an interpolant to on £„ i = 0, ... , n — 1 and q e ir^OR ).

Another result of this type is the following interpolation scheme which came up in the study

of multivariate B-splines [ 10 ] . We will later give a "dual" version of this method.

ALGEBRAIC ASPECTS OF INTERPOLATION 93

Theorem 9. Let £0, ... , fB+1 be lines such that each pair of lines intersect at a point and every such

point lies on exactly two lines. If x1 xN are the points of intersection, iV = n(n + l ) / 2 then

given any data j ^ , ... ,yN there is a uniquep e ir^flR ) such thatp(x) = yit i = 1, ... , N.

The Lagrange polynomials for this interpolation method are easy to obtain. Suppose xl,j is the

intersection of £it ij then

II EAx)

"Vv

'A n ik{x)

In the remaining sections we consider methods for interpolation of scattered data.

2.2. Optimal Interpolation.

Here is a general procedure for interpolation of scattered data: We take a linear space X of

functions on some set DeR* and a semi-norm on X. As our interpolant, we choose/opt e A'such

that fopt(x) = yt, i = 0, 1, ... , n, [x°t ... , x"} £D, and

l/opt I < l / l

for all fe Xv/ithf(x) = # , i = 0, 1, ... , n.

For this to be a computationally viable method, computing /opt should not be prohibitively

expensive. This suggests (at the least) using Hilbert space semi-norms. Here is a well-known ex­

ample of this type called natural spline interpolation.

Theorem 10. Let XQ<X^ < ••• <xn, n > m — l , m > 2 , then there is a unique function /opt in

H^fo.xJ = { / : / , . . . , / m l ) , absolutely continuous on [*b,xj, fm) e L2[xo,x„]} such that

ZoptW = ft* i = 0, 1, ... , n, and

fXVoptW)2^< fXn(fim\x))2dx

for a l l / e WTxb, xn] with/(;*:,) = y„ i' = 0, 1, ... , n. Moreover, /opt is determined by the equations

94 CHARLE S A . MICCHELL I

/opt(*o) = /opt(**) = °> i = m, ... , 2m - 1

/optW = J*. '' = 0. 1. ••• . *

V I (xt,x.+1) e W 2m-L ' = 0, 1, . . . , rt - 1

fopt e C m " 2 ( - oc, oo).

Proof. Integration by parts can be used to show that

f*"/<ptWmW=o

whenever g e W1 and g(x,) = 0, i = 0, 1, ... , «. This equation leads both to the existence,

uniqueness and minimality of fopt.

The next result extends Theorem 10 to higher dimensions, [ 12 ] , see also [ 32 ] .

Theorem 11. Let W?(1R0, m > s/1 have norm

l/l!2= S f f m ) I J>°/W l 2 ^ | a | - m ^ V a /

j

a = (alf ... , a,), | a | = X | a, | . Assume that if r e w^GR') /•(*') = 0, i = 0, 1, ... , n it fol­

lows that r = 0. Then there is a unique/opt e W^OR') such that fopl(x') = y, / = 0, 1, ... , «, and

l/opt« < H/ll

for a l l / e W%(1R.') with/(jc') = y, i: = 0, 1, ... , n. Moreover, /opt is given by

/opt(*) = PW + X «i*(* - *')./> e Wjn-lOR') /=0

for some p € 7rm_i (MO where

-I 1 x 1 log I x ||, n even <*>(*)= i 2m-n

he , n odd ,

1 x 1 is the Euclidean norm of x and

£ afl{x) = 0, i -0

ALGEBRAIC ASPECTS OF INTERPOLATION 95

f o r a l l ^ ^ O R ' ) .

The proof of this result uses the fact that 4>(x) is the fundamental function for the 2m-th iter­

ate of the Laplace operator

2m A <f>(x) = c8(x), c#0.

The special case s = m = 2, is called thin plate spline interpolation and it has been used for

practical data fitting. Grimson, [ 2 1 ] gives compelling reasons for the use of this interpolant in the

computational theory of interpolating visual surface information. In this case, the semi-norm is

f JdL + 2& + &»ty-

The computational problem of determining/optis studied in [ 14 ] . It is suggested there that

the appropriate linear system be preconditioned and a Richardson type iteration used, see also

[21].

Optimal interpolation gives some justification for choosing one interpolant among all possible

interpolating functions. However, to make use of such methods we are led to difficult optimiza­

tion problems. Thus it would be worthwhile to have an alternative view of Theorem 11 which

suggests other methods of interpolation. To this end, let us recall that the thin plate spline

interpolant has the form

/opt(*) = ^oto + 2 °i A* - *' II log 1* - *' I i -0

for some linear function i0 which is determined by the equations

(8) 2 ati(x) = 0, t e W(R ), i-0

/oPt(*') =yv '" = 0 ' i. — 'w-

The existence and uniqueness of this function is established by using the Hilbert space struc­

ture of extremal problem described in Theorem 11. When the points x , ... ,xn are not collinear

then the existence of a unique solution of the above equation also follows just from the fact that

96 CHARLE S A . MICCHELL I

n n

£ 2 aflj\x-J\\og \\x-xJ\\ > 0 »-0 y-0

whenever (8) holds and a = (OQ, ... , an) & 0. Of course, this property of conditional positive

definiteness, see [ 18 ] , satisfied by <f>(x) is directly linked to the extremal problem. Nevertheless,

as it has an independent formulation it provides us with a matrix theoretic means to study other

methods for multivariate interpolation, [ 35 ] . In particular, multiquadric surfaces (MQS) [ 25 ]

described in the next theorem can also be treated from this point of view.

Theorem 12. Given any distinct points x , ... ,xn e RJ and data yu ... ,yn there exists a unique

function of the form

such that

fix) = yit i = 1, ... ,/!.

Proof. We will actually show more, namely, that the matrix

y f = ( \ / l + U ' - ^ | 2 ) W . 1 n

has n-1 negative eigenvalues and one positive eigenvalue. We prove this by first observing that

A has at least one positive eigenvalue since

n n

\n = m a j ^ x , x) > Y, £ Aij > °

i - l y - 1

where Xx < • • • < Xn are the eigenvalues of A. We show that \n_t < 0 by noticing

n n n

(9) £ £ At/Vj<0, if £ «/ = 0 i - l y - 1 i - l

since it then follows that

X l l_ 1<nM« i {Aa,a)<0, e = ( l , ... , 1).

(fl,e)-0

ALGEBRAIC ASPECTS OF INTERPOLATION 97

To prove (9) we use the formula

00

(10) Sx = 1 + -4=" f t~V2{\ - e~*)dtt x>0. 2/ i T •>()

Substituting 1 + I*' - xJ | 2 into (10) gives us

£ £ a + wx-xjwv%= « - l y - l

1 f " -3/2 - / A A - n u ' - ^ i i 2

when X a, = 0. Since (e~',x ~ ' ),,;=!,..,„ is a strictly positive definite matrix for / > 0, the result

follows. For numerical experiments with the MQS method see [ 16 ]

The proof used for Theorem 12 leads to other results on scattered data interpolation which

includes the optimal interpolation method described in Theorem 11 see [ 35 ] .This method also

yields the result that the matrix (fix1— xJ ||a), ;f«it..., „, has one positive eigenvalue and n-1 nega­

tive eigenvalues independent of * \ ... ,x" e JR.3 when 0 < a < 2. For s = l , the number of positive

and negative eigenvalues of this matrix are still independent of xlt ... ,xn (scalars in this case) for

all a > 0, [ 13 ] . However, even for a = 3 s = 2 , n = 4 , the number of positive and negative

eigenvalues does depend on the locations of JC1, ... , xn, [ 2 ] .

2.3. Radon Transform and Interpolation .

In this last section we depart from our discussion of methods of interpolation and treat proce­

dures which use more than function values. In particular, we point out ways to pin down those

extra degrees of freedom in scattered data interpolation by polynomials described in Theorem 6.

We state the following result from Kergin [ 29 ] .

Theorem 13. Given any integers st n > 0 and points x , ... , xn e IR5, not necessarily distinct. There

is a unique map •#f:C'l(IR,)-*Tn(IR') satisfying :

(i) Jf is linear.

98 CHARLES A. MICCHELLI

(ii) for every / e COR')* every (homogeneous) polynomial q e hk(JR.s), 0 < k < n and every

/ c { 0 , 1, ... , n] with | / | = k + 1 there exists x e [{xJ:j e J]] ( = convex hull of

{xJ:j e J}) such that

q{D){je{f)-f){x) = 0.

This theorem extends to IR'the following mean value property of Hermite interpolation: for

every j, H{j)(f)(x) agrees with f})(x) at some x e [XQ, JCJ, XQ < ••• <xn,

Notice that by choosing k=0 in Theorem 12 we get

#V)(x)=flx\ i = 0, 1 #t

and if x° is repeated i times then all derivatives of order t — 1 of f are matched at XQ by

&f(f). However, only in exceptional cases, s = l or x = ••• = xn, do these conditions determine

2fC{f). The remaining functionals which determine this polynomial are obtained as follows. We

define

then [{xJ:j e J]]q(D)f, q e h{Jl _t(IR0 are all invariant under Ctf{f). The number of linearly inde­

pendent such functionals was shown by Kergin to be dim wm(lR')f [ 29 ] . This was the way he

proved Theorem 13. In particular, when s=2 and *0, ... , xn are in general position, that is , any

three points are noncollinear, the linear functionals which determine 3€(f) are point evaluations,

f(xJ),j = 0 , 1, ... , n and line integrals between pairs of points, itj = [x, x*], i ^ j , of the derivative

of f in the direction normal to L,

J i . dntj'

Thus 36 picks out in this fashion a unique polynomial in 7r„(]R/) which interpolates f at

x , ... ,xn. Why is this a natural interpolation method and what is the algebraic foundation behind

its existence? The answer is that 3C{f \ x , ... ,xn) and H{f \ t0, ... , O.are closely connected.

One can actually define Jif by the property that

(11) &{fx I x°,...,xn)W = H{g | \.x°,...,\.xn)(\.x)

ALGEBRAIC ASPECTS OF INTERPOLATION 99

whenever X e R ' , / X t o = gft •* )>££ CO*1)- T h i s equation brings to mind the Radon transform,

[ 24 ] . Recall that the Radon transform (RJ)(0, IX | « 1, / > 0 of / e l?(&) is given by

(*x/)M = f /W<K.

Jmx is Lebesgue measure on X • x = /. A distributional definition takes the form

f , g{W)J){t)dt = f g(X . x)f(x)dxtg e CoiR1). JR JJRS

Thus, specializing equation ( l l ) t o j = « + l , x = 0, (*'), = 8^, i,y = 0, 1, ... , /i we get

^ ( / x I e°, ... , / ) ( 0 ) = H(g | X0, ... , X„)(0).

This means that the Radon transform of the distribution g-+H(g | X0, ... , X„)(0) is the distrib­

ution f-+£f?{f | e , ... , en)(0). The connection of Radon transform to multivariate interpolation

is developed further in [ 4-6 ] . We use the terminology of these papers and say £tf lifts H because

equation (11) holds. This equation suggests the following constructive approach to Theorem 13,

[36]. Observe that since

(k) v q{D)fx{x) = q{\)gV \ \ • *), q e hk(R )

we can verify that

(12) *if\x . . . . . *") (*) = ] £ [x ,...,x]Dx_xo...Dxx,-if. i=0

{Dyf = y • If) because the right hand side becomes the Newton form of

H(g | x • x°, ... , X . xn)(\ . JC) when f{x) = g(X . x).

As is well-known, certain compatibility conditions must be satisfied for a function to be in the

range of the Radon transform, [ 24 ] . In the present case, this is a property previously mentioned

about Hermite interpolation. Namely, whenever p is a polynomial, H{p | X0, ... ,XJ(0) is a

polynomial in X0, ... , X„. This is the algebraic explanation behind Theorem 13. As such, it is a

guide to other maps which can be lifted. For instance, it clearly holds for the family of mappings

Hfe I /0. ... , /„)(/) = H{n(g{-° | /0, ... , *„)(/), 0 < / < n.

Notice that when / = 1, /0 < • • • < / „ , # , is the unique polynomial of degree < n - 1 such that

100 CHARLE S A . MICCHELL I

Hl(g\t0,..., tn)(t)dt=J g{t)dt, i = 0 , l , . . . , n - l ,

that is, H is "area matching". It was shown in [ 6 ] , that Ht can be lifted and for any i in

[8,19,26 ] . The form of the lifted map depends on the dimension to which Ht is lifted . A partic­

ularly nice case occurs when s = £ + 1, which was introduced in [ 23 ] . Hakopian showed that

&! on IR/+1 is uniquely determined by matching the integrals

[{xJ:j el}]f, V | / | = / .

In particular, for s=2 and £ = 1 we see that «#V*e ^„_i(IR2) matches all line integrals of f

formed by pairs of distinct points chosen from x , ... , xn. This polynomial is in a sense dual to the

interpolant of Theorem 9. We also mention that the Lagrange representation for 3tt on IR'

when £ + 1 > s is identified in [5 ] . The Lagrange polynomials in this case are ridge functions.

It is interesting to note that certain multivariate splines come from lifting distributions. For

instance, the multivariate B- spline , M(x \ x°, ... , xn) is a distribution defined by

f M{x \x°, ... ,xn)f(x)dx= f / ( V opcSda^ ... ,dan. JtiC J Sn 0

As the name suggests, when A/( • | x , ... ,xn) is a function it can be shown to be a piecewise

polynomial which is the natural extension of the univariate B-spline , [ 9 ] , see also equation (6).

In general, for any polyhedral set CQlR." and JC1,.... , / e IR* the corresponding polyhedral spline

^c( • I * , . . . , * " ) is defined by

ff(x)Pc(x | x\ ... , A i x = J f(Z «V*W . - ,don. c i—1

When C = R" we obtain 1\x \ x1, ... fxn), the truncated power and C = [0, 1]" gives the box

spline. Detailed properties of these spline functions are given in [ 9 ] , see also the lecture of K.

Hollig.

BIBLIOGRAPHY

1. N. I. Achieser, Theory of Approximation , Frederick Ungar Publishing Co., New York, 1956.

ALGEBRAIC ASPECTS OF INTERPOLATION

2. L. P.Bos and K. Salkauskas , On the matrix [ I *,- - Xj | 3 ] and the cubic spline continuity equations , to appear JAT.

3. C. de Boor, A Practical Guide to Splines , Springer Verlag, Berlin- Heidelberg, 1978.

4. A. S. Cavaretta, C. A. Micchelli, and A. Sharma, Multivariate interpolation and the Radon transform, Math. Z., 174 (1980), 263-279.

5. A. S. Cavaretta, Jr., T. N. T. Goodman, C. A. Micchelli and A. Sharma, Multivariate in­terpolation and the Radon transform, Part III: Lagrange representation, in Canadian Math­ematical Society Conference Proceeding, 3 (1983) 37-50.

6. A. S. Cavaretta, C. A. Micchelli, and A. Sharma, Multivariate interpolation and the Radon transform, Part II: Some further examples in Quantative Approximation, eds. R. De Vore, K. Scherer, Academic Press, New York 1980, 49-62.

7. Cooley, J.W. and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series, Math. Comp. 19 (1965), 297-310.

8. W. Dahmen and C. A. Micchelli, On the linear independence of multivariate B-splines II: complete configuration, Math. Comp., 41 (1983) 143-163.

9. W. Dahmen and C. A. Micchelli, Recent progress in multivariate splines, in Approximation Theory IV, eds. C. K. Chui, L. L. Schumaker, J. W. Ward, Academic Press, New York, 1983, 27-121.

10. W. Dahmen and C. A. Micchelli, On the limits of multivariate B-splines, J. d'Analyse Math., 39(1981), 156-178.

11. D.Donoho and I. Johnstone, Projection- based smoothing and a duality with kernel methods, Department of Statistics, Report # 238. Stanford University, 1985.

12. Duchon, J., Splines minimizing rotation - invariant semi-norms in Sobolev spaces, Construc­tive Theory of Functions of Several Variables,, Lecture Notes in Mathematics 571 eds. W. Schempp and K. Zeller, Springer, Berlin-Heidelberg, 1977, 85-100.

13. N. Dyn, T. N. T. Goodman and C. A. Micchelli, Positive powers of certain conditionally negative definite matrices, IBM Research Report, #11202, 1985.

14. N. Dyn, D. Levin, and S. Rippa, Surface interpolation and smoothing by "Thin Plate Splines", Approximation Theory IV, eds. C. K. Chui, L. L. Schumaker, J. D. Ward, Academic Press, New York, 445-449

15. S. Fisher, C. A. Micchelli, Optimal sampling of holomorphic functions, Amer. J. Math, 106 (1984), 593-609.

16. Franke, R. Scattered data interpolation: tests of some methods, Math. Comp., 38(1982), 181-200.

17. Gautschi, W. Attenuation factors in practical Fourier analysis, Numer. Math. 18(1972), 373-400.

18. I. M. Gel'fand and M. I. Graev, N. Y. Vilenkin, Generalized Functions, Vol. 4 , Academic Press, New York, 1965.

19. T. N. T. Goodman, Interpolation in minimum semi-norm and multivariate B-splines, JAT, 37 (1983), 212-223.

20. Phillip Griffiths and Joseph Harris, Principles of Algebraic Geometry, John Wiley and Sons, New York, 1978.

21. W.E.L. Grimson, From Images to Surfaces, a Computational Study of the Human Early Visual System M.I.T. Press, Boston, 1981.

22. M. H. Gutknecht, Two applications of periodic splines, in Approximation Theory III, ed. E. W. Cheney, Academic Press, New York, 1980, 467-472.

23. H. Hakopian, Multivariate divided differences and multivariate interpolation of Lagrange and Hermite type, JAT,34 (1982), 286-305.

102 CHARLES A„ MICCHELLI

24. Helgason, S.,The Radon Transform, Birkhauser, Basel 1980.

25. R. L. Hardy, Multiquadratic equations of topography and other irregular surfaces, J. Geophys. Res. C. (1971).

26. K. Hollig and C. A. Micchelli, Divided differences, hyperbolic equations and lifting distrib­utions, IBM Research Report,#l 1133, 1985.

27. S. Karlin and W. J. Studden, Tchebycheff Systems: with Applications in Analysis and Statistics,

Interscience, New York, 1966.

28. S. Karlin, Total Positivity, Stanford University Press, Stanford, 1968.

29. P. Kergin, A natural interpolation of C* - functions, JAT,19 (1980), 278-293. 30. M. G. Krein, The ideas of P. L Chebyshev and A. A. Markoff in theory of limiting values of

integrals and their further developoments, AMS Transl. Ser. 2, 12 (1951) , 1-122.

31. Mairhuber, J., On Haar's theorem concerning Chebyshev approximation problems having a unique solution, PAMS 7 (1956), 609-615.

32. Meinguet, J., An intrinsic approach to multivariate spline intepolation at arbitrary points, Polynomial and Spline Approximation, ed. B. N. Sahney, D. Reidel, Dordrecht, 1979, 163-190.

33. C.A.Micchelli and TJ.Rivlin, Lectures in Optimal Recovery, Lectures Notes in Mathematics 1129, Springer- Verlag, Berlin - Heidelberg, 1985.

34. Micchelli, C. A., T. J. Rivlin and S. Winograd, The optimal recovery of smooth functions, Numer. Math., 26(1976), 191-200.

35. C. A. Micchelli, Interpolation of scattered data: distance matrices and conditionally positive definite functions, IBM Research Report, 1984, to appear in Constructive Approximation.

36. C. A. Micchelli, A constructive approach to Kergin interpolation, Rocky Mountain Journal of Mathematics, 10 (1980), 485-497.

37. Saskin, Ju. A., Interpolation families of functions and imbeddings of sets in Euclidean and projective spaces (Russian) Dokl. Akad. Math SSSSR 174 (1967), 1030-1032, Soviet Math. Dokl. 8 (1967) 722-725.

38. I. J. Schoenberg and A. Whitney, One Polya frequency functions, HI: The positivity of translation determinants with application to the interpolation problem by spline curves, TAMS, 74 (1953), 146-259.

39. J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag, Berlin-Heildeberg, 1980.

Proceedings of Symposia in Applied Mathematics Volume 36, 1986

MULTIVARIATE SPLINES

Klaus Hollig1

In this lecture the construction of multivariate splines on triangular meshes via multi­

variate B-splines is described. B-splines in several variables can be defined geometrically, as

volume densities of convex polyhedra. From this general definition smoothness properties and

recurrence relations are derived. The B-splines corresponding to simplices and parallelepipeds

give rise to natural generalizations of univariate splines. For both cases it is shown how linear

combinations of B-splines have to be selected to yield a smooth spline space which admits

a local representation of polynomials. This yields the standard approximation properties for

piecewise polynomials familiar from univariate theory.

For simplex splines, the underlying mesh can be chosen almost arbitrarily while maximal

smoothness is preserved. While this is a definite advantage over tensor products, new ideas

are still needed to overcome computational difficulties resulting from the fairly complicated

structure of the mesh. Box splines are defined on regular (triangular) meshes. Therefore, many

of the advantages of tensor products and Bezier representations are maintained. In particular,

efficient algorithms based on subdivision techniques have been developed and this has led to

application of box spline methods in computer aided design.

1980 Mathematics Subject Classification 41A15 1 Supported by International Business Machines Corporation and National Science Foundation

Grant No. DMS-8351187

Sponsored by the United States Army under contract DAAG29-8Q-C-0041

© 1986 American Mathematical Society 0160-7634/86 $1.00 + $.25 per page

103

http://dx.doi.org/10.1090/psapm/036/864368

104 KLAUS HOLLIG

Mul t iva r i a t e B-Splines

There are several equivalent ways of defining the univariate B-spline B(-\t0, • • • ,tn)- Per­

haps the least common approach would be to use a variant of the Hermite-Genocchi formula

[ B(x\t0,...,tn)<j>(x)dx = n\ f ^ ( ] ^A( i / ) t „ ) d\{l)...d\{n) (l)

or the geometric interpretation of the B-spline due to Curry and Schoenberg,

B{z\t0, • . •, tn) = voln_x(T n ({x} x JR^^J /vo ln fT) . (2)

Here, a(n) := {(A(l ) , . . . , A(n)) : \(v) > 0, J22=o^(u) = *} 1S t n e ^-simplex with vertices

e0 = (0 , . . . ,0), ei — ( 1 ,0 , . . . ,0), . . . , en = (0 , . . . ,0,1) and T is an n-simplex for which the

first component of each vertex coincides with one of the knots tv. Both of the above identities

admit a natural generalization to several variables.

Definition 1 [BH82]. For n > m denote by P : IRn —* IRm the canonical projection and

let Q C IRn be a convex polyhedron with affine dimension m -f k. The multivariate B-spline

B is the linear functional defined by

< £,<£> := [ (j>oP, 0 e C o ( I R m ) , (3) JQ

where the integral is taken with respect to ( m + A;)-dimensional measure. If volm(PQ) > 0, B

can be identified with the bounded function

B(x):=yo\k(Qnp-1x), (4)

i.e. B(x) is the fc-dimensional volume of the cross-section of Q which is projected onto x (cf.

Figure 1).

The equivalence of (3) and (4) follows from Fubini's Theorem since, if volm(PQ) > 0, the

right-hand side of (3) can be written as

/ dx(f <t>(x)dy) = I vo\k{Qop-1x)dx. JPQ JQnP-^x JPQ

Strictly speaking, the pointwise definition (3) is valid only for almost every x (in the sense

of Lebesgue measure). In the univariate case this difficulty is less apparent since a consistent

definition of B at discontinuities is possible, e.g. all B-splines are assumed to be continuous

from the right. In several variables there does not seem to exist a simple convention which

is compatible with the recurrence relations of Theorem 1 below. However, if B is continuous,

which is the case of practical interest, the problem does not arise.

MULTIVARIATE SPLINES 105

Rk

B(x)

< Figure 1 >

The geometric definition (4) is essentially due to de Boor [B76] who considered the special

case when Q is a simplex. The usefulness of the analytical definition (3) for analyzing simplex

splines was discovered by Micchelli [M80] which finally led to the general definition given in

[BH82].

It is obvious that B is nonnegative (as a functional on C0(IRm)) with support equal to

PQ. If volm(P<?) > 0, it follows from Theorem 1 below that B is a polynomial of degree < A:

on any subset of IRm which is not intersected by the projection of any (m - 1) dimensional

face of Q. Theorem 1 also implies that B is (r - 1) times continuously differentiate where r

is the smallest integer for which an (m -f k - 1 - r)-dimensional face of T is projected by P

into an (m - 1) dimensional set.

Denote by D^ the derivative in the direction f, i.e. (Dtf) := £ „ £{v)du<t> where dv is

the derivative with respect to the z/-th variable. Moreover, denote by Qi the (m + k - 1)-

dimensional faces making up the boundary of Q, by r)x the corresponding outward normals

and by JB; the B-splines corresponding to the polyhedra Qx (cf. Figure 2).

106 KLAU S HOLLI G

Theorem 1 [BH82]. Assume that voln(Q) > 0, i.e. that k = n — m.

(i) For any z e IRn,

Dp,B = -Y,(z'Vi)Bi. i

(ii) For all points x — Pz where B and B{ are continuous,

kB{x) = ^2{{bi-Z)-r,i)Bi(x) i

where bi is any point in the hyperplane containing Q%.

The assumption that the polyhedron Q is nondegenerate is not essential. If k < n — m,

the affine hull of Q can be identified with IRm + and the Theorem applies.

< Figure 2 >

A repeated application of Theorem 1 yields that, for £ £ IRm, (D^)rB is a linear combi­

nation of B-splines corresponding to (m + k — r)-dimensional faces Q\. For r > k the supports

PQ\ of of these B-splines (interpreted as linear functionals in the sense of definition (3)) are

contained in hyperplanes. Therefore, B is a polynomial of degree < k on any region which is

not intersected by any of the sets PQ% + 1 (since all (k + l)-th order derivatives of B vanish on

such a region).

If volm(P<5[) > 0, the B-spline corresponding to Q\ can be identified with a bounded

function. Therefore, if volm(P<2[) > 0 for all i, the derivatives of order r of B are bounded

which implies that B is (r - l) times continuously differentiate.

Proof of Theorem 1. The proof of (i) is immediate:

< DPzB,<t>>= - < B,DPz<t> >=- [ (DPz<j>)(Py)dy

= -J(Dz(<f>oP))(y)dy=-Y,J (z-m)HPy)dy

= - J2(z'^) < Bz,<p> .

MULTIVARIAT E SPLINE S 10 7

This uses the fact that, by definition, the derivative DtB is the linear functional given by

0 ,_» _ < B,Dt<f> > and that, by the chain rule,

Dz{4>oP) = (DPz<t>)oP.

Define (D<t>)(x) := (Dx(j>)(x). The recurrence relation (ii) is a consequence of the identity

DB = kB-^2{bi'fii)Bi. (5) i

With x = Pz it follows from (i) and (5) that

0=({D-DPz)B){x)

= kB[x) - J2ik • rn)Bi(x) + ] T ( z • m)Bi(x) i i

if B and B{ are continuous at x.

It remains to prove (5). By definition of D and the chain rule,

(D4>)(Py) = (DPy4>)(Py) = (Dy(<f>o P))(y) = (D(</>oP))(y). (6)

Denote by \u the i^-th coordinate function, i.e. Xv(x) = x(y). Then, integrating by parts and

using definition (3),

m

- <DB,4>>=- < J2x»dvB,4, >=< B,J2d4Xv<t>) >

= E [ (dAXu<f>)) o P = m [ ^ o P + r [ (Xvdu<P)°P v JQ JQ JQ

= m < £,<£> + / ( ^ ) o P , JQ

and similarly,

£ f du(xu{<t>oP)) = n<B,<p>+ f D(<f>oP). „=iJQ JQ

By (6), the last integral in the first identity equals the last integral in the second. Therefore,

n „

< DB,d>>= (n-m) < B,<f>> - JZ / du{Xu{<P° P)).

This proves (5) since, with r]{y) denoting the boundary normal of Q at t/,

E / #AxA<t>tP))= f (v(y) • y)4>(Py)dy v=lJQ JdQ

and, for y E Q%, v{y) = W a n d Vi • V is constant.

108 KLAU S HOLLI G

Multivariate splines are, by definition, linear combinations of B-splines. However, it is

not obvious how the B-splines should be selected to yield good approximation properties of

the resulting spline space. De Boor [B76] suggested the following geometric construction.

Definition 2. Let Q* C lRfc be a convex polyhedron and assume that the collection of

convex polyhedra {Q : Q e A} forms a partition of IRm x Q, and that \o\rn(PQ) > 0 for all

Q £ A. The spline functions corresponding to the partition A are defined by

5(A) := { £ aQBQ : aQ G IR} QeA

where BQ denotes the B-spline corresponding to the polyhedron Q.

(7)

< Figure 3 >

It is clear from (4) that the B-splines B Q , Q G A, form a partition of unity, i.e. that

£ > Q ( * ) = volfc(Q+) (8) Q

for all x where the B-splines are continuous. This implies that the spline spaces S(A) are

dense in continuous functions as the partition A is refined.

Proposition 1. Set h := max{diameter(<2) : Q G A} and choose XQ £ PQ. Then, for

any continous function / ,

Q

where || Ijoo denotes the L^ norm on IRm and uo is the modulus of continuity of / .

MULTIVARIATE SPLINES 109

Proof. By (8) and since the B-splines are nonnegative, we have for almost every x

\-f{x) ~ J2 / ( * Q ) * * ( * ) I = I E l / W - f(*Q))(BQ(x)/volk{Q.))\

BQ(x)^0

and for all Q for which BQ{X) is nonzero \x - XQ\ < h.

In this generality, little more can be said about the approximation properties of the

spline spaces 5(A) . However, a particularly rich theory results if Q is either a simplex or a

parallelepiped. This is due to the fact that in both cases the faces which make up the boundary

of Q are of the same type as Q itself.

Simplex Splines

Historically, the case when Q is a simplex has been considered first. Simplex splines were

defined by de Boor in [B76] generalizing the geometric interpretation of univariate B-splines

due to Curry and Schoenberg. Micchelli [M80] discovered the recurrence relations. Then,

the author [H82] and independently Dahmen and Micchelli [DM82] described an appropriate

choice for the space 5(A) which yields the approximation properties familiar from the uni­

variate theory. Subsequently many interesting results have been obtained and the reader is

referred to the survey article [DM84i].

Let U be a collection of not necessarily distinct vectors {u : u £ U} and denote by #£/

the number of vectors in U counting multiplicities and by \U] the convex hull of the vectors in

U.

Definition I S . Let [U] be a simplex in IRn (#U = n + 1) and denote by V := {v = Pu :

u E U} the projections of the vertices of \U\. The normalized simplex spline My is defined by

Mv := B{u]/yo\[U}. (9)

To justify this definition, one has to show that the right hand side of (9) does only depend

on the projections of the vertices v £ V. This follows from definition (3) by a change of

variables,

volft/j"1 / 4>(Py)dy= [ <t>{Y X(u)u)d\, (10)

where o(n) := {(A 0 , . . . , A„) : £ \{v) = 1, A(i>) > 0}.

110 KLAUS HOLLIG

T h e o r e m I S [M80]. Let V be a collection of n 4- 1 points in fltm which span a proper

convex set.

D^My = n Y^ \{v)MV\v

V

where V\v is obtained from V by decreasing the multiplicity of v by one (e.g. by deleting v if

this vector occurs only once in V).

(ii) If x — Ylvev ^(v)v with Ylvev ^(v) ~ 1 anc^ ^-y\v> v £ ^ ? a r e continuous at re, then

^W — L^n.W-

To derive this Theorem from Theorem 1, let [U] be a simplex in IRm with {v :— Pu : u E

U} = V. Set B := vol[C/] My and B^ := vol[l/\ti] My\v and denote the normal of the face

[U\u] by 7?u. Then, for any bu E [C7\t*],

(fc - «M -r, - i n vo\[U}/vo\[U\u}, if u = u'; l*« « j * u - j 0 > otherwise. (11)

< Figure 4 >

To prove (i), fix u' 6 1/ and set

* : = £ A ( I ; ) U = £ A ( « ) ( * - U ' )

using that the sum of the weights A(t>) is zero. By (11), for u ^ u',

-z-r}u = -\(v)(u -u')'fiu = n Hv) J ^ L y

MULTIVARIATE SPLINES 111

and similarly,

-z • r}u> = ~n ] T A(v)vol[l/]/voI[l/\ti'] = n \(v') vo\[U]

vol[l7\«']'

and (i) follows from the normalization of the simplex splines.

To prove (ii) we define z as before and note that

bu> - z — 2^, Hv)(bul - u)-

ueu

Again, by (11),

[bu, A(w')(6u' - u') • r)u> — n \(u')~ *1[U]

fwo\(U\u'Y

In view of the remarks following Theorem 1, the simplex spline is a piecewise polynomial

of degree < k = n — m which is (r — l) times continously differentiable where r is the smallest

integer for which (m + k — r) points from the "knot set" V lie in a hyperplane. Thus, if the

knots are in "general" position, My is (k - 1) times continuously differentiable.

Figure 5 below gives a few examples of knot sets and corresponding meshes for simplex

splines in two variables. While in some cases the structure of the mesh (i.e. the hyperplanes

where derivatives of My are discontinuous) is fairly complicated, this is no disadvantage in

itself since the explicit form of My on each of the subregions is not needed in computations.

C° - quadratic

C 1 - quadratic

C1 - cubic

< Figure 5 >

112 KLAU S HOLLI G

Example 1. Let \W] be a proper simplex in IRm with vertices {if : w G W} and denote

by {Qw(x)}wew the barycentric coordinates of x with respect to W, i.e.

X —

1 =

„(z)w

] T £«,(*)• uew

If the knot set V consists of the vertices of [W] with multiplicities a (if), if G W, then, up to a

normalizing factor, the simplex splines coincide on [W] with the polynomials in the Bernstein

form, i.e.

Mv(x) = —iH Qw(xyW/a(w)L (12) m!

wew

This is most easily seen by checking that the right side of (12) satisfies the recurrence relation

(ii) of Theorem IS.

In principle, simplex splines can be defined by (7) with Q* :— a(k) and BQ := vol [l/]My.

However, without further restrictions, the simplex splines My, \U] G A, need not be linearly

independent. Nevertheless, their linear span does contain all polynomials of degree < fc, and

this is the minimal requirement for good local approximation properties.

Theorem 2S [DM82, H82]. For f G ffi,m define the mapping

(x,y)i->Gz{x,y):={x,(l + t'x)y) : IRm x a(k) - • IRm x JRk.

If all simplex splines My are continous at rr, then

(i + e-^)fc= E cv(e)My(x) (is) \u)eA

where

c v ( 0 := (*!/n!) «tpn(t/) d e t | | G ^ | |

with det| |G^l/| | denoting the determinant of the ( n + l ) x ( n + l) matrix with columns

1

and sign{U) G { — 1,4-1} chosen so that cy(0) is positive.

Identity (13) is the multivariate analogue of Marsden's identity for univariate splines. As in

the univariate case, this identity is the basis for the construction of dual linear functionals and

local approximation schemes [DM82, H82]. In two variables the identity is due to Goodman

and Lee [GL81] who also obtained a more explicit formula for the simplex spline coefficients.

MULTIVARIATE SPLINES 113

Proof. For fixed x both sides of (13) are polynomials in £ and we may therefore assume

that ||f || is small. Small perturbations of the vertices do not change the combinatorial structure

of a triangulation. Moreover, Ge maps the hyperplanes which form the boundary of IRm x a

onto hyperplanes. Therefore, for fixed x and small £, the simplices

\GCU] := \{Gcu:ue U}}

form a partition of 0 := G^TR™ x <r(k)) in a neighborhood of x. This implies that (cf. Figure

6)

(x,(l + t-x)o(k)) = (x,m.k)n [) {G(U}. x€P\U\

Computing the volume on both sides of this identity it follows from (4) and (9) that

i ( l + e-*)*= £ voUflG^nP-1*) xeP[u]

£vol„[G{l/]Mv(x) which yields the Theorem.

x

< Figure 6 >

A drawback of definition (7) is that the spline space SM(A) is defined via a triangulation

in n dimensions while the simplex splines depend only on the knots in IRm. In [H82] a method

for constructing spline spaces from a triangulation of IRm was described. This construction is

a generalization of the process of "pulling apart" knots, i.e. of obtaining smooth splines as a

perturbation of piecewise polynomials without smoothness constraints.

Denote by

[Wi] := [w»(o),...,«>*(m)]> *€l,

the simplices of a triangulation A m of IRm with vertices W := {..., w_i , wo, i u i , . . . } . More­

over, assume that the vertices are consistently ordered, i.e. if

t'(iz) = t ' ( i / ) , i(/z) = i'{fir), with v < /x and i,i' G / ,

114 KLAUS HOLLIG

then

v' < / / .

with

Denote by T all "index" sets of the form

7=((a(0) , /9(0)) , . . . , (a (n) , /3(n)) )

ot(v) G {*(0),..., i{m)} for some i € I

0(i/) e {o , . . . , * } (14)

and

<*{v) < a ( i / + l ) , 0(i/) < 0 ( i / + l)

where one of the inequalities is strict.

As is indicated in Figure 7 below, the index sets 7 corresponding to a simplex [Wi] can

be identified with the ordered sequences of length n + l = m- f f c - f l from the set

{w»(o),...,ti;»(m)} x {0 , . . . , / c} .

i(0)

_ - 3 _ _ 1 <*>

id)

o

i(m)

< Figure 7 >

Definition 2S [H82]. Let F be a mapping from {..., - 1 , 0 , 1 , . . . } x {0 , . . . ,ife} to IRm

and denote by ^(7) the collection of vectors {F(e*(0),/3(0)),..., i r(a(n),/9(n))}. Assume that

the union of the sets [-^(7)] covers IRm, that the range of F has no limit point and that each

x G IRm in contained in at most finitely many of the sets [^(7)]- Then, the spline space

S(F,T) is defined as the linear span of the simplex splines MF^y 7 G T.

MULTIVARIATE SPLINES 115

Note, that the mapping F can be chosen almost arbitrarily, i.e. in analogy with the

univariate situation there is almost no restriction on the placement of the "knots" ^ (7 ) . How­

ever, there is no canonical choice for F which yields maximal smoothness or a well conditioned

simplex spline basis. This must still be viewed as one of the major drawbacks of these spline

spaces. On the other hand, for "almost all" choices of F , the space S(F, T) consists of piecewise

polynomials of smoothness k — 1 and degree k which is in general fairly difficult to achieve

with other constructions.

Example 2. Let W be a partition of IR, i.e. the "simplices" [Wi] are the intervals

[wijttft+i]. Define F by

F(a,P) := <a(*+i)_0

where {... , t_ i ,$o^i? • • •} is a n increasing sequence of knots. Since m — 1, the index sets 7

are of the simple form

7 - ( ( z , 0 ) , ( z , l ) , . . . , ( e , i ) , ( e + l , i ) , . . . , ( e - f 1,*))

where 0 < j < k and i is any integer. Thus F(i) consists of the k -f 2 consecutive knots

*i(A + l ) - j > * t ( * - H ) - i + l ? • • • > * ( i + l ) ( M - l ) - . 7

and therefore S(F,T) is the standard space of univariate splines. However, Definition 2S is

more general since the sequence of knots does not have to be monotone increasing.

Example 3. For the particular choice

$$F_*(\alpha,\beta) := w_\alpha,\qquad \gamma = (\alpha,\beta)\in\Gamma, \tag{15}$$

$S(F_*,\Gamma)$ consists of all piecewise polynomials of degree $\le k$ with respect to the triangulation $\Delta_m$. This can be seen as follows. For $F_*$ defined by (15), the simplex splines which correspond to different index sets $i$ via (14) have disjoint support. Therefore, restricted to a simplex $[w_{i(0)},\dots,w_{i(m)}]$ of $\Delta_m$, the spline space $S(F_*,\Gamma)$ reduces to the linear span of $M_{F(\gamma)}$ where

$$F(\gamma) = (w_{\alpha(0)},\dots,w_{\alpha(n)})\quad\text{with}\quad \alpha(\nu)\in\{i(0),\dots,i(m)\},$$

i.e. the linear span of simplex splines with multiple knots. From Figure 7 it is clear that all combinations of multiplicities occur, and by Example 1 the corresponding simplex splines coincide with the polynomials in Bernstein form.

A small perturbation of the mapping $F_*$ can be interpreted as "pulling apart" multiple knots, i.e. as deforming the space of (nonsmooth) piecewise polynomials into a space of smooth splines. However, Definition 2S allows arbitrary perturbations as long as the combinatorial relationship between the knot sets is preserved.


Theorem 3 [H82]. With

$$U := \bigl\{(F(\alpha(0),\beta(0)),e_{\beta(0)}),\dots,(F(\alpha(n),\beta(n)),e_{\beta(n)})\bigr\}$$

and $[U]\in\Delta$ replaced by $\gamma\in\Gamma$, Theorem 2S remains valid for the spline space $S(F,\Gamma)$.

The proof of this result is based on the fact that the Fourier transform of the right hand side of identity (13) is an entire function of the knots. Therefore, if the identity holds for small perturbations of the knots, it remains valid globally.

Under additional assumptions on $F$, the linear independence of the simplex splines $M_{F(\gamma)}$, $\gamma\in\Gamma$, can be established. Moreover, the standard error estimates are valid for simplex splines. The practical implementation of algorithms for computing with simplex splines still seems to be the major unsolved problem. However, one may hope that, as for box splines, new algorithms based on subdivision techniques can be developed.

Box Splines

The other natural choice for $Q$ in Definitions (3,4) is a parallelepiped, and this leads to the definition of box splines. These splines were introduced by de Boor and DeVore in [BD83] and their basic properties were studied in [BH82/3]. Box splines can be viewed as generalizations of univariate cardinal splines. A variety of results on interpolation operators [BHR85], combinatorial problems [DM85₁] and smooth piecewise polynomials on regular meshes [BH83₁, BH83₂] have been obtained. Moreover, efficient algorithms for manipulating box spline surfaces have been developed [Bö83, CLR84, DM84₂, P83/84], which is the basis for applying box spline techniques to computer aided design.

Definition 1B [BD83, BH82/3]. Denote by $[U]$ the parallelepiped in $\mathbb{R}^n$ which is spanned by the vectors $\{u : u\in U\}$, i.e.

$$[U] := \Bigl\{\sum_{u\in U}\lambda(u)u : 0\le\lambda(u)\le 1\Bigr\},$$

with $\#U = n$. The corresponding normalized box spline is defined as

$$N_V := B_{[U]}/\operatorname{vol}[U] \tag{16}$$

where $V := \{v := Pu : u\in U\}$.

As for the simplex spline, the right hand side of (16) depends only on the projections of the vectors in $U$, and

$$\langle N_V,\phi\rangle = \operatorname{vol}[U]^{-1}\int_{[U]}\phi(Py)\,dy = \int_{[0,1]^n}\phi\Bigl(\sum_{u\in U}\lambda(u)v\Bigr)\,d\lambda. \tag{17}$$


By the remarks following Theorem 1, $N_V$ is a piecewise polynomial of degree $k = n-m$ which is $(r-1)$ times continuously differentiable, where $r$ is the smallest integer for which $(m+k-r-1)$ of the vectors in $V$ do not span $\mathbb{R}^m$. In contrast to the simplex spline, the mesh for $N_V$ is quite regular. It consists of translates of hyperplanes which are spanned by $(m-1)$ linearly independent vectors in $V$. Figure 8 below shows a few examples of meshes for bivariate box splines.

< Figure 8: meshes of C⁰ linear, C¹ quadratic, and C² quartic bivariate box splines >
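Both the degree and the smoothness can be read off from $V$ combinatorially. The sketch below is an illustration only; it uses the equivalent standard formulation that $N_V\in C^{d-2}$, where $d$ is the least number of vectors whose removal from $V$ leaves a set that no longer spans $\mathbb{R}^m$, and reproduces the three cases of Figure 8:

```python
# Sketch: degree and smoothness of N_V read off from the direction set V.
# Assumption (standard reformulation of the statement above): N_V has
# degree n - m and lies in C^{d-2}, where d is the least number of vectors
# whose removal from V leaves a set that no longer spans IR^m.
from itertools import combinations
import numpy as np

def degree_and_smoothness(V):
    V = np.array(V, dtype=float)
    n, m = V.shape
    for d in range(1, n + 1):
        for W in combinations(range(n), d):
            rest = np.delete(V, W, axis=0)
            if np.linalg.matrix_rank(rest.reshape(-1, m)) < m:
                return n - m, d - 2      # (degree, C^smoothness)

# the three meshes of Figure 8
V1 = [(1, 0), (0, 1), (1, 1)]            # C^0 linear
V2 = V1 + [(1, -1)]                      # C^1 quadratic (Zwart-Powell)
V3 = V1 + V1                             # C^2 quartic
for V in (V1, V2, V3):
    print(V, degree_and_smoothness(V))
```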

Example 4. (i) If $m=1$ and $v=1$ for all $v\in V$, $N_V$ is the forward cardinal B-spline $B(\cdot|0,\dots,k+1)$. To see this, let $[U]$ be a parallelepiped with $v = Pu = 1$ for all $u\in U$ and consider the standard triangulation of $[U]$ into $n!$ simplices $[U_\nu]$ with equal volume. For all simplices $[U_\nu]$ the projections of the vertices are the integers $0,1,\dots,k+1$. Therefore,

$$B_{[U]} = \sum_\nu B_{[U_\nu]} = \Bigl(\sum_\nu \operatorname{vol}[U_\nu]\Bigr) M_{(0,1,\dots,k+1)} = \operatorname{vol}[U]\, B(\cdot|0,\dots,k+1).$$

(ii) If $V$ consists of the unit vectors $e_1,\dots,e_m$ with multiplicities $\alpha(1),\dots,\alpha(m)$ respectively, then $N_V$ coincides with the tensor product B-spline with equally spaced knots,

$$(x(1),\dots,x(m)) \mapsto \prod_{\nu=1}^m B\bigl(x(\nu)|0,\dots,\alpha(\nu)\bigr).$$

This could be verified directly from (17), but is more easily seen from formula (19) below for the Fourier transform of $N_V$.


(iii) For $m=2$ and $V = \{(1,0),(1,1),(0,1)\}$, $N_V$ is the standard linear finite element. Adding the vector $(1,-1)$ to $V$, one obtains the quadratic element which was derived independently by Zwart [Z73] and by Powell and Sabin [PS77]. Further examples can be found in the work of Frederickson [F71].

Theorem 1B [BH82/3]. Let $V$ be a collection of $n$ vectors which span $\mathbb{R}^m$.

(i) If $\xi = \sum_{v\in V}\lambda(v)v$, then

$$D_\xi N_V = \sum_v \lambda(v)\bigl(N_{V\setminus v} - N_{V\setminus v}(\cdot - v)\bigr).$$

(ii) If $x = \sum_{v\in V}\lambda(v)v$ and the box splines $N_{V\setminus v}$, $v\in V$, are continuous at $x$, then

$$N_V(x) = \frac{1}{n-m}\sum_v \bigl(\lambda(v)\,N_{V\setminus v}(x) + (1-\lambda(v))\,N_{V\setminus v}(x-v)\bigr).$$
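Recurrence (ii) yields a simple, if inefficient, evaluation procedure: pick any solution $\lambda$ of $x = \sum_v\lambda(v)v$ and recurse down to $\#V = m$, where $N_V$ is $|\det V|^{-1}$ times the indicator function of the parallelepiped $[V]$. A minimal sketch (an illustration, not an algorithm from [BH82/3]; it assumes every subset reached in the recursion still spans, and uses half-open boxes at the base case):

```python
# Sketch: pointwise evaluation of N_V via recurrence (ii) of Theorem 1B.
# Not efficient, and only valid at points x where the splines N_{V\v} are
# continuous; half-open boxes at the base case fix a value on mesh lines.
import numpy as np

def box_spline(V, x):
    V = [np.asarray(v, float) for v in V]
    x = np.asarray(x, float)
    n, m = len(V), len(x)
    A = np.array(V).T                            # m x n matrix of directions
    if n == m:                                   # base: normalized indicator of [V]
        lam = np.linalg.solve(A, x)
        inside = np.all((lam >= 0) & (lam < 1))
        return float(inside) / abs(np.linalg.det(A))
    lam = np.linalg.lstsq(A, x, rcond=None)[0]   # some solution of A @ lam = x
    s = 0.0
    for i, v in enumerate(V):
        W = V[:i] + V[i + 1:]
        s += lam[i] * box_spline(W, x) + (1 - lam[i]) * box_spline(W, x - v)
    return s / (n - m)

# linear element of Example 4(iii); exact values are 0.4 and 0.5
V = [(1, 0), (0, 1), (1, 1)]
print(box_spline(V, (0.7, 0.4)), box_spline(V, (1.2, 0.7)))
```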

The recurrence relation (i) has a particularly simple form if $\xi = v$ for some $v\in V$. Then,

$$D_v N_V = \nabla_v N_{V\setminus v},$$

where $(\nabla_v f)(x) := f(x) - f(x-v)$ is the backward difference operator. With $D_W := \prod_{w\in W} D_w$ and $\nabla_W := \prod_{w\in W}\nabla_w$, this yields

$$D_W N_V = \nabla_W N_{V\setminus W}.$$

In particular,

$$D_V N_V = \nabla_V\,\delta,$$

where $\delta$ denotes point evaluation at $0$, i.e. $\langle\delta,\phi\rangle := \phi(0)$. Therefore,

$$\int_{\mathbb{R}^m} N_V\, D_V\phi = (\Delta_V\phi)(0), \tag{18}$$

which gives an integral representation for the forward difference operator $\Delta_V$ in terms of the box spline $N_V$.
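For instance, in one variable with $V = \{1,1\}$, $N_V$ is the hat function $B(\cdot|0,1,2)$, $D_V = (d/dx)^2$, and (18) reduces to the Peano kernel identity for the second forward difference. A quick numerical check (an illustration only):

```python
# Sketch: identity (18) in one variable for V = {1, 1}: N_V is the hat
# B(.|0,1,2), D_V phi = phi'' and (Delta_V phi)(0) = phi(2) - 2 phi(1) + phi(0).
import numpy as np

phi, d2phi = np.cos, lambda x: -np.cos(x)     # smooth test function and phi''

x = np.linspace(0.0, 2.0, 200001)
hat = 1.0 - np.abs(x - 1.0)                   # N_V on its support [0, 2]
lhs = np.sum(hat * d2phi(x)) * (x[1] - x[0])  # integral of N_V * D_V phi
rhs = phi(2.0) - 2.0 * phi(1.0) + phi(0.0)    # (Delta_V phi)(0)
print(lhs, rhs)                               # agree up to quadrature error
```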


The derivation of the recurrence relations is almost identical with the proof of Theorem 1S. Let $[U]$ be a parallelepiped in $\mathbb{R}^n$ for which $V = \{v := Pu : u\in U\}$ and apply Theorem 1 with $Q := [U]$ and $B := \operatorname{vol}[U]\,N_V$. The boundary faces of $[U]$ consist of the parallelepipeds $[U\setminus u]$ and their translates $u + [U\setminus u]$ with normals $\eta_u$ and $-\eta_u$ respectively. The corresponding box splines are $B_{[U\setminus u]} = \operatorname{vol}[U\setminus u]\,N_{V\setminus v}$ and $B_{u+[U\setminus u]} = \operatorname{vol}[U\setminus u]\,N_{V\setminus v}(\cdot - v)$ (cf. Figure 9).

< Figure 9 >

To prove (i), set $z = \sum_{u\in U}\lambda(v)u$. Then, since the vectors $u$, $u\ne u'$, span the boundary face $[U\setminus u']$,

$$-z\cdot\eta_{u'} = -\sum_u\lambda(v)\,u\cdot\eta_{u'} = -\lambda(v')\,u'\cdot\eta_{u'} = \lambda(v')\,\frac{\operatorname{vol}[U]}{\operatorname{vol}[U\setminus u']},$$

and the assertion follows from the normalization of the box spline.

To prove (ii), define $z$ as before and choose the points $b_u$ in the boundary faces $[U\setminus u]$ and $u+[U\setminus u]$ as $0$ and $u$ respectively. Then,

$$(0-z)\cdot\eta_u = \lambda(v)\,\frac{\operatorname{vol}[U]}{\operatorname{vol}[U\setminus u]}$$

and

$$(u-z)\cdot(-\eta_u) = (1-\lambda(v))\,u\cdot(-\eta_u) = (1-\lambda(v))\,\frac{\operatorname{vol}[U]}{\operatorname{vol}[U\setminus u]}.$$

Setting $\phi(x) = \exp(-iy\cdot x)$ in (17), one sees that the Fourier transform of $N_V$ is

$$\widehat{N}_V(y) = \prod_{v\in V}\frac{1-\exp(-iy\cdot v)}{iy\cdot v}. \tag{19}$$

From this it follows that

$$N_{V\cup W} = N_V * N_W \tag{20}$$

where $f*g(x) := \int f(x-y)g(y)\,dy$ denotes the convolution of $f$ and $g$. In particular, if $W$ consists of a single vector $\xi$,

$$N_{V\cup\xi}(x) = \int_0^1 N_V(x-\lambda\xi)\,d\lambda. \tag{20'}$$


This identity provides an alternative definition for $N_V$ via repeated "averaging" in the direction of the vectors $v\in V$.
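Formula (19) can be checked numerically against the distributional definition (17): for $\phi(x) = \exp(-iy\cdot x)$ the right hand side of (17) is an integral over the unit cube $[0,1]^n$, which a Monte Carlo average approximates. A small sketch (an illustration only, here for the Zwart-Powell directions of Example 4(iii)):

```python
# Sketch: check the Fourier transform formula (19) against definition (17).
# (17) gives <N_V, e^{-iy.}> as an integral over [0,1]^n, estimated here
# by Monte Carlo; (19) is the closed-form product.
import numpy as np

rng = np.random.default_rng(0)
V = np.array([(1, 0), (0, 1), (1, 1), (1, -1)], dtype=float)
y = np.array([0.7, -1.3])

yv = V @ y                         # the inner products y.v, assumed nonzero
via_19 = np.prod((1 - np.exp(-1j * yv)) / (1j * yv))

lam = rng.random((200000, len(V)))                 # lambda in [0,1]^n
via_17 = np.mean(np.exp(-1j * (lam @ yv)))         # phi(sum lambda(v) v)

print(via_19, via_17)              # agree up to Monte Carlo error
```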

Definition 2B [BD83, BH82/3]. Assume that the vectors in $V$ have integer coordinates and that $V$ contains the unit vectors $e_1,\dots,e_m$. The space of cardinal splines corresponding to $V$ is defined as

$$S(V) := \Bigl\{\sum_{j\in\mathbb{Z}^m} a_j N_V(\cdot - j) : a_j\in\mathbb{R}\Bigr\} \tag{21}$$

where $\mathbb{Z}$ denotes the integers.

Definition 2B is a special case of Definition 1 with $Q_* := [0,1]^{n-m}$ and the partition $\Delta$ consisting of translates of the parallelepiped which is spanned by the vectors $(e_1,0),\dots,(e_m,0)$ and $(v_{m+1},e_{m+1}),\dots,(v_n,e_n)$, where $V = \{e_1,\dots,e_m,v_{m+1},\dots,v_n\}$. The assumption that $V$ contains the unit vectors is no loss of generality since this can always be achieved by a change of variables. However, for the proof of Theorem 2B below it is essential that all vectors $v$ are chosen from a common lattice which, again by a change of variables, can be assumed to be the lattice of vectors with integer coefficients.

Theorem 2B [BH82/3]. Denote by $\langle W\rangle$ the linear span of the vectors $\{w : w\in W\}$ and define

$$A := \{W\subset V : \langle V\setminus W\rangle \ne \mathbb{R}^m\}.$$

Then,

$$\pi\cap S(V) = \bigcap_{W\in A}\ker D_W, \tag{22}$$

where $\pi$ denotes the space of polynomials.

Example 5. (i) As was pointed out in Example 4(ii), the tensor product B-spline is obtained when $V$ consists of the unit vectors. Assume that each unit vector occurs with multiplicity $\alpha$; then $A$ contains the sets

$$W_\nu = \underbrace{\{e_\nu,\dots,e_\nu\}}_{\alpha\ \text{times}},\qquad \nu = 1,\dots,m,$$

and any other set in $A$ contains one of these sets as a subset. Therefore, by (22), a polynomial $p$ is in $S(V)$ if and only if it is annihilated by

$$D_{W_\nu} = \partial_\nu^{\,\alpha},\qquad \nu = 1,\dots,m.$$

(ii) If the vectors in $V$ are in "general" position, then all sets $W\in A$ satisfy $\#W > k$. Thus, by (22), all polynomials of degree $\le k$ are in $S(V)$. This is, e.g., the case for the quadratic box spline of Example 4(iii).
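The sets in $A$, and hence the degree of polynomial reproduction, are easily enumerated for a concrete $V$. The following sketch (an illustration only) confirms the claim of (ii) for the Zwart-Powell directions, where the minimal cardinality in $A$ is $3 > k = 2$:

```python
# Sketch: the collection A of Theorem 2B for the Zwart-Powell directions.
# W belongs to A iff V\W no longer spans IR^2; by (22), all polynomials of
# degree r lie in S(V) as long as each W in A has more than r elements.
from itertools import combinations
import numpy as np

V = np.array([(1, 0), (0, 1), (1, 1), (1, -1)], dtype=float)
n, m = V.shape

A = [W for r in range(1, n + 1)
       for W in combinations(range(n), r)
       if np.linalg.matrix_rank(np.delete(V, W, axis=0).reshape(-1, m)) < m]

print(min(len(W) for W in A))   # prints 3: hence pi_2 is contained in S(V)
```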


Proof of Theorem 2B. Let

$$p := \sum_{j\in\mathbb{Z}^m} a_j N_V(\cdot - j) \in \pi\cap S(V).$$

By the remark following Theorem 1B,

$$D_v p = \sum_j a_j\bigl(N_{V\setminus v}(\cdot - j) - N_{V\setminus v}(\cdot - j - v)\bigr) = \sum_j (a_j - a_{j-v})\,N_{V\setminus v}(\cdot - j),$$

where it was used that $v$ has integer coefficients. Repeating this argument,

$$D_W p = \sum_j (\nabla_W a)_j\, N_{V\setminus W}(\cdot - j). \tag{23}$$

For $W\in A$, the box splines $N_{V\setminus W}(\cdot - j)$ have support on a set of measure zero, which implies that the polynomial $D_W p$ vanishes identically, i.e. lies in the kernel of $D_W$.

For the converse statement we first prove that

$$L := \bigcap_{W\in A}\ker D_W \subset \pi. \tag{24}$$

Fix $\xi\in\mathbb{R}^m$. If $V'\notin A$, then $\xi$ can be written as a linear combination of the vectors in $V\setminus V'$, say $\xi = \sum_{w\in V\setminus V'} c(w)w$, so that $D_\xi = \sum_{w\in V\setminus V'} c(w)D_w$. Iterating this identity, replacing $(D_\xi)^r D_{V'}$ by a linear combination of the derivatives $(D_\xi)^{r-1}D_{V'\cup w}$, $w\in V\setminus V'$, one arrives at

$$(D_\xi)^r = \Bigl(\sum_{V'\in A,\ \#V'\le r} a(V')\,(D_\xi)^{r-\#V'}D_{V'}\Bigr) + \Bigl(\sum_{V'\subset V,\ V'\notin A,\ \#V'=r} a(V')\,D_{V'}\Bigr) \tag{25}$$

with certain coefficients $a(V')$.

This proves (24) since, for $r > \#V$, the second sum on the right hand side of (25) is empty and the derivatives $D_{V'}$ in the first sum vanish on functions in $L$.

To complete the proof of the Theorem we show by induction on $r$ that

$$\pi_r\cap L \subset S(V),$$

where $\pi_r$ denotes the space of polynomials of degree $\le r$. For the induction step, we prove that

$$p\in\pi_r\cap L \quad\text{implies}\quad q := p - \sum_j p(j)\,N_V(\cdot - j) \in \pi_{r-1}\cap L.$$


By (23),

$$D_W(p-q) = \sum_j (\nabla_W p)(j)\,N_{V\setminus W}(\cdot - j).$$

If $W\in A$, then $D_W p = 0$ and, by (18), also $(\nabla_W p)(j) = 0$, which shows that $p - q\in L$.

By (25) and since $q\in L$ (i.e. $D_{V'}q = 0$ for $V'\in A$),

$$(D_\xi)^r q = \sum_{V'\subset V,\ V'\notin A,\ \#V'=r} a(V')\Bigl(D_{V'}p - \sum_j(\nabla_{V'}p)(j)\,N_{V\setminus V'}(\cdot - j)\Bigr).$$

Since $p$ is a polynomial of degree $\le r$, $D_{V'}p = \nabla_{V'}p$, and since $\sum_j N_{V\setminus V'}(\cdot - j) = 1$ it follows that $(D_\xi)^r q = 0$.

From Theorem 2B one can derive error estimates for approximation by box splines. Moreover, the result is useful for studying approximation order for piecewise polynomials on regular triangulations. For this and further results, the reader is referred to the work by de Boor and the author [BH82/3, BH83₁, BH83₂] and Dahmen and Micchelli [DM84₁, DM85₁].

Surface Approximation

As pointed out in section 3, box splines are natural generalizations of tensor product

B-splines. The underlying triangular meshes yield more flexibility in the choice of degree

and smoothness while some of the attractive computational features of tensor products are

maintained. In this section a simple approximation scheme is described and shape preserving

properties of box spline expansions are discussed.

Denote by $N_V^h$ the (bivariate) box spline corresponding to a grid of meshsize $h$ and, slightly changing the notation of the previous section, assume that $N_V^h$ is centered at $0$, i.e.

$$N_V^h(x) := N_V(x/h - \xi_V) \tag{26}$$

where $\xi_V := \sum_{v\in V} v/2$ is the center of the box spline $N_V$ defined in (16). Moreover, denote by $N_*$ the piecewise linear box spline corresponding to the directions $V_* := \{(1,0),(0,1),(1,1)\}$. In the following it is always assumed that $V$ contains $V_*$. This excludes tensor product splines and certain degenerate cases where the translates of the box splines $N_V$ are not linearly independent, and is therefore no significant loss of generality.

Define the approximation scheme

$$f \approx S_V^h f := \sum_j f(jh)\,N_V^h(\cdot - jh), \tag{27}$$

which is a generalization of Schoenberg's univariate variation diminishing spline approximation. In particular, if $V = V_*$, then $S_V^h f$ is the piecewise linear interpolant to $f$ with respect to the triangulation of $\mathbb{R}^2$ which is generated by the three directions $(1,0)$, $(0,1)$ and $(1,1)$.

Proposition 2. The method (27) is second order accurate, i.e.

$$f(x) - (S_V^h f)(x) = O(h^2) \tag{28}$$

for any smooth function $f$.

Proof. The piecewise linear interpolant $S_{V_*}^h f$ is second order accurate. Therefore, arguing by induction, it is sufficient to show that the estimate (28) remains valid if a vector $w$ is added to the set of vectors $V$. From (20′) and (26) one sees that

$$N_{V\cup w}^h(x) = h^{-1}(N_w^h * N_V^h)(x) = \int_{-1/2}^{1/2} N_V^h(x - \lambda hw)\,d\lambda, \tag{29}$$

which implies

$$S_{V\cup w}^h f = h^{-1}N_w^h * S_V^h f. \tag{30}$$

Write the left hand side of (28) in the form

$$(f - h^{-1}N_w^h * f)(x) + \bigl(h^{-1}N_w^h * (f - S_V^h f)\bigr)(x).$$

The second term is of order $h^2$ since convolution by $h^{-1}N_w^h$ does not increase the maximum norm. The first term equals

$$\int_{-1/2}^{1/2}\bigl(f(x) - f(x-\lambda hw)\bigr)\,d\lambda.$$

Adding $0 = \int_{-1/2}^{1/2}(D_w f)(x)\,\lambda h\,d\lambda$ to this expression, it follows that this term is also of order $O(h^2)$.
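To observe (28) numerically one can take $V = V_*$, for which the centered box spline has the well-known closed form $N_*(x) = \max(0,\,1-\max(|x_1|,|x_2|,|x_1-x_2|))$, the Courant hat on the three-direction mesh. A minimal sketch (the test function, evaluation point and names are assumptions for illustration):

```python
# Sketch: the scheme (27) for V = V*, using the closed form of the centered
# linear box spline N_*(x) = max(0, 1 - max(|x1|, |x2|, |x1 - x2|)).
# The error at a fixed off-grid point should decay like h^2.
import numpy as np

def N_star(x1, x2):
    return max(0.0, 1.0 - max(abs(x1), abs(x2), abs(x1 - x2)))

def S_h(f, h, x1, x2):
    # only the j with |x/h - j| <= 1 in each coordinate contribute
    val = 0.0
    for j1 in range(int(x1 / h) - 2, int(x1 / h) + 3):
        for j2 in range(int(x2 / h) - 2, int(x2 / h) + 3):
            val += f(j1 * h, j2 * h) * N_star(x1 / h - j1, x2 / h - j2)
    return val

f = lambda x1, x2: np.sin(x1) * np.exp(x2 / 2)
x = (0.42, 0.31)
for h in (0.1, 0.05, 0.025):
    print(h, abs(f(*x) - S_h(f, h, *x)))   # roughly quarters as h halves
```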

Obviously, $S_V^h$ is a positive operator, i.e., if $f$ is nonnegative, then so is $S_V^h f$. Moreover, $S_V^h$ preserves monotonicity and convexity, which is made more precise below.

Proposition 3 [DM85₂, G85].

(i) If, for some $\xi\in\mathbb{R}^2$, $D_\xi S_{V_*}^h f$ is nonnegative, then so is $D_\xi S_V^h f$.

(ii) If $S_{V_*}^h f$ is convex, then so is $S_V^h f$.

The piecewise linear spline $S_{V_*}^h f$ is called the "control polygon" of the box spline surface $S_V^h f$. It interpolates the box spline coefficients at the points $jh$, $j\in\mathbb{Z}^2$. The Proposition states that the box spline surface has roughly the same "shape" as its control polygon, which is a desirable feature for design purposes.


The proof of Proposition 3 is quite simple: it follows from the identity (30) and the observation that convolution with a positive kernel preserves monotonicity and convexity. E.g., for the proof of (ii) assume by induction that $S_V^h f$ is convex. Then, for $x = \sum_\nu q(\nu)x_\nu$ with $\sum_\nu q(\nu) = 1$ and $q(\nu)\ge 0$,

$$(S_{V\cup w}^h f)(x) = \int_{-1/2}^{1/2}(S_V^h f)\Bigl(\Bigl(\sum_\nu q(\nu)x_\nu\Bigr) - \lambda hw\Bigr)\,d\lambda \le \sum_\nu q(\nu)\int_{-1/2}^{1/2}(S_V^h f)(x_\nu - \lambda hw)\,d\lambda = \sum_\nu q(\nu)\,(S_{V\cup w}^h f)(x_\nu).$$

In principle, box splines can be evaluated via the recurrence relation of Theorem 1B (ii). However, for approximate evaluation, as is required e.g. for rendering techniques, algorithms based on subdivision techniques are considerably faster. For box splines such algorithms have been developed by Böhm [Bö83], Cohen, Lyche and Riesenfeld [CLR84], Dahmen and Micchelli [DM84₂] and Prautzsch [P83/84]. The idea can be described as follows.

A box spline expansion $\sum_j a_V^h(j)\,N_V^h(\cdot - jh)$ can be rewritten as a linear combination of the box splines $N_V^{h/2}(\cdot - jh/2)$ corresponding to a refined grid, i.e.

$$\sum_j a_V^h(j)\,N_V^h(x - jh) = \sum_j a_V^{h/2}(j + \xi_V)\,N_V^{h/2}\bigl(x - (j+\xi_V)h/2\bigr), \tag{31}$$

where $\xi_V := \sum_{v\in V} v/2$. The shift by $\xi_V$ is necessary only if $\xi_V\notin\mathbb{Z}^2$ since then the mesh for $N_V^{h/2}$ is not a refinement of the mesh corresponding to $N_V^h$. The subdivision process can be repeated and, as has been shown in [D85], the sequence of control polygons converges to the box spline surface at a quadratic rate. The coefficients $a_V^{h/2}$ in (31) can be computed via the following

Algorithm.

(i) Define

$$a_{V_*}^{h/2}(i) := \sum_j a_V^h(j)\,N_{V_*}^h(ih/2 - jh),\qquad i\in\mathbb{Z}^2.$$

(ii) Set $V' := V_*$.

(iii) If $V' = V$, stop; else choose $w\in V\setminus V'$ and define

$$a_{V'\cup w}^{h/2}(j + \xi_{V'} + w/2) := \bigl(a_{V'}^{h/2}(j+\xi_{V'}) + a_{V'}^{h/2}(j+\xi_{V'}+w)\bigr)/2,\qquad j\in\mathbb{Z}^2.$$

(iv) Set $V' := V'\cup w$ and go to step (iii).

Example 6. As was first observed by Böhm [Bö83], the algorithm takes on a particularly simple form if

$$V = V_*\cup\dots\cup V_*,$$

i.e. if $V$ contains the vectors $(1,0)$, $(0,1)$ and $(1,1)$ with equal multiplicity $r$. In this case three applications of step (iii) of the Algorithm can be combined, which results in

$$a_{V'\cup V_*}^{h/2}(j) = \frac{1}{2}\begin{pmatrix} 0 & 1 & 1\\ 1 & 2 & 1\\ 1 & 1 & 0\end{pmatrix} * a_{V'}^{h/2}(j),\qquad j\in\mathbb{Z}^2,$$

where the weights in the square matrix are applied centered at the (double) index $j$.
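For $r=1$ this combined step is the familiar refinement of a piecewise linear control net: coarse values are kept at the even points of the fine grid and edge midpoints are averaged. A sketch of one refinement step, written as upsampling followed by convolution with the mask above (the array conventions, in particular the orientation of the diagonal, are assumptions of this illustration):

```python
# Sketch: one subdivision step for the three-direction linear box spline,
# as upsampling followed by convolution with the mask
# (1/2) [[0,1,1],[1,2,1],[1,1,0]] of Example 6.
import numpy as np
from scipy.signal import convolve2d

mask = 0.5 * np.array([[0, 1, 1],
                       [1, 2, 1],
                       [1, 1, 0]])

def subdivide(a):
    """Coarse control net a -> refined net on the half grid."""
    fine = np.zeros((2 * a.shape[0] - 1, 2 * a.shape[1] - 1))
    fine[::2, ::2] = a                 # coarse values at even fine indices
    return convolve2d(fine, mask, mode="same")

# a single coarse coefficient: the refined net is the mask itself
a = np.array([[0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])
print(subdivide(a))                    # even points kept, midpoints averaged
```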

Derivation of the Algorithm. For $V = V_*$, the new coefficients are obtained by linear interpolation since the control polygon interpolates the box spline coefficients. This explains step (i) of the algorithm. Now, one has to show that (31) remains valid if a vector $w$ is added to the set $V$ and the coefficients $a_{V\cup w}^{h/2}$ are computed via step (iii). Convolve both sides of (31) with $h^{-1}N_w^h$. Then, by (29), on the left hand side $N_V^h$ is replaced by $N_{V\cup w}^h$. For the right hand side one obtains

$$h^{-1}\bigl(N_w^h * N_V^{h/2}\bigr)(x) = \int_{-1/2}^{1/2} N_V^{h/2}(x-\lambda hw)\,d\lambda = \frac12\int_{-1}^{1} N_V^{h/2}\bigl(x - \lambda(h/2)w\bigr)\,d\lambda = \frac12\Bigl(\int_{-1}^{0}\dots + \int_{0}^{1}\dots\Bigr) = \frac12\Bigl(N_{V\cup w}^{h/2}(x - hw/4) + N_{V\cup w}^{h/2}(x + hw/4)\Bigr).$$

Therefore, using that $y := x - (j+\xi_V)(h/2) - (h/2)(w/2) = x - (j+\xi_{V\cup w})(h/2)$, the right hand side of (31) equals

$$\sum_j a_V^{h/2}(j+\xi_V)\,\frac12\Bigl(N_{V\cup w}^{h/2}\bigl(x - (j+\xi_{V\cup w})h/2\bigr) + N_{V\cup w}^{h/2}\bigl(x - (j+\xi_{V\cup w})h/2 + hw/2\bigr)\Bigr) = \sum_j \frac12\Bigl(a_V^{h/2}(j+\xi_V) + a_V^{h/2}(j+\xi_V+w)\Bigr)\,N_{V\cup w}^{h/2}\bigl(x - (j+\xi_{V\cup w})h/2\bigr),$$

which establishes the formula for the coefficients.

Bibliography

[Bö83] W. Böhm, Subdividing multivariate splines, Computer Aided Design 15 (1983), 345-352.

[B76] C. de Boor, Splines as linear combinations of B-splines, in Approximation Theory II, G. G. Lorentz, C. K. Chui and L. L. Schumaker, eds., Academic Press (1976), 1-47.

[BD83] C. de Boor and R. DeVore, Approximation by smooth multivariate splines, Trans. Amer. Math. Soc. 276 (1983), 775-788.

[BH82] C. de Boor and K. Höllig, Recurrence relations for multivariate B-splines, Proc. Amer. Math. Soc. 85 (1982), 397-400.

[BH82/3] C. de Boor and K. Höllig, B-splines from parallelepipeds, J. Analyse Math. 42 (1982/3), 99-115.

[BH83₁] C. de Boor and K. Höllig, Approximation order from bivariate C¹-cubics: A counterexample, Proc. Amer. Math. Soc. 87 (1983), 649-655.

[BH83₂] C. de Boor and K. Höllig, Bivariate box splines and smooth pp functions on a three-direction mesh, J. Comput. Appl. Math. 9 (1983), 13-28.

[BHR85] C. de Boor, K. Höllig and S. D. Riemenschneider, Convergence of cardinal series, Proc. Amer. Math. Soc., to appear.

[CLR84] E. Cohen, T. Lyche and R. Riesenfeld, Discrete box-splines and refinement algorithms, Computer Aided Geometric Design 1 (1984), 131-148.

[D85] W. Dahmen, Subdivision algorithms converge quadratically, Tech. Rep. 710 (1985), Sonderforschungsbereich, Universität Bonn.

[DM82] W. Dahmen and C. A. Micchelli, On the linear independence of multivariate B-splines, I. Triangulations of simploids, SIAM J. Numer. Anal. 19 (1982), 993-1012.

[DM84₁] W. Dahmen and C. A. Micchelli, Recent progress in multivariate splines, in Approximation Theory IV, C. K. Chui, L. L. Schumaker and J. Ward, eds., Academic Press, New York (1984), 27-121.

[DM84₂] W. Dahmen and C. A. Micchelli, Subdivision algorithms for the generation of box-spline surfaces, Computer Aided Geometric Design, to appear.

[DM85₁] W. Dahmen and C. A. Micchelli, Combinatorial aspects of multivariate splines, Tech. Rep. 722 (1985), Sonderforschungsbereich, Universität Bonn.

[DM85₂] W. Dahmen and C. A. Micchelli, Convexity of multivariate Bernstein polynomials and box spline surfaces, Tech. Rep. 735 (1985), Sonderforschungsbereich, Universität Bonn.

[F71] P. O. Frederickson, Generalized triangular splines, Tech. Rep. 7-71, Lakehead University, 1971.

[GL81] T. N. T. Goodman and S. L. Lee, Spline approximation operators of Bernstein-Schoenberg type in one and two variables, J. Approx. Theory 33 (1981), 248-263.

[G85] T. N. T. Goodman, private communication.

[H82] K. Höllig, Multivariate splines, SIAM J. Numer. Anal. 19 (1982), 1013-1031.

[M80] C. A. Micchelli, A constructive approach to Kergin interpolation in IR^k: Multivariate B-splines and Lagrange interpolation, Rocky Mountain J. Math. 10 (1980), 485-497.

[PS77] M. J. D. Powell and M. A. Sabin, Piecewise quadratic approximation on triangles, ACM Trans. Math. Software 3 (1977), 316-325.

[P83/84] H. Prautzsch, Unterteilungsalgorithmen für multivariate Splines, ein geometrischer Zugang [Subdivision algorithms for multivariate splines, a geometric approach], Dissertation, Technische Universität Braunschweig (1984).

[Z73] P. Zwart, Multivariate splines with nondegenerate partitions, SIAM J. Numer. Anal. 10 (1973), 665-673.

COMPUTER SCIENCES DEPARTMENT

UNIVERSITY OF WISCONSIN-MADISON

MADISON, WISCONSIN 53706


INDEX

adaptive, 18
alternating algorithm, 75
alternation, 4-5, 71-72
analytic continuation, 44
analytic function, 17, 22, 35, 89
approximation: adaptive, 17-18; best, 2, 67; L2, 22, 24; complex, 21; good, 10; linear, 55; near-best, 22; nonlinear, 16, 72; polynomial, see polynomial; rational, see rational
attenuation factors, 88
Bernstein, 10, 12, 16, 34, 56, 111, 115
Bézier representation of a pp function, 103
Bezout's theorem, 92
bivariate, 76-78
Blaschke product, 31
blending methods, 76
box spline, 100, 116-125
B-spline, 87, 92, 100, 103
calculation of b.a., 8
capacity, 29
Carathéodory-Fejér Theorem, 32
CF approximation, 32
characterization of b.a., 5, 30
Chebyshev: expansion or series, 28; polynomials, 9, 11, 28, 31, 57; space, 6; system, 6, 85-86, 89
Chebyshev's Theorem, 4
Christoffel numbers, 42
circularity (of the error), 31-32
computer-aided design, 116
continued fraction, 38
control polygon, 123
convexity, strict, 3
convolution, 13
degree of approximation, 12, 23, 34, 43-44, 121
differential correction (algorithm), 72
Diliberto-Straus (algorithm), 74
dist, 2, 67-68
divided difference, 82
duality, 57, 68, 92
eigenvalues, 32, 59, 96-97
electrostatics, 29
entire (functions), 23, 40
equi-oscillation (criterion), 71-72
exchange method, 70
existence, 2, 43
exponentials, 6
Faber polynomials, 26; Faber series, 26-28
Favard, 12
Fejér, 32
Fekete points, 29
finite element, 117
Fourier series, 8, 88
Fourier transform, 119; discrete, 88; fast, 85
Gauss quadrature, 42
Gel'fand n-width, 57, 65
von Golitschek (algorithm), 76
Haar space, 6, 30, 70, 72
Hankel matrix, 32
Hardy space, 89
Hermite, 23, 28, 82
Hermite-Genocchi formula, 83, 104
interpolation, 10, 56, 81-102; by polynomials, 9, 21, 23, 28, 82-84; good points for, 11, 28-29; quasi-, 16
intrinsic error, 60
Jackson kernel, 14
Jackson's Theorem, 12, 34, 53
Joukowski transformation, 27
Kergin interpolation, 97-98
Kolmogorov-Arnold Theorem, 76
Kolmogorov criterion, 30
Kolmogorov n-width, 52
Lagrange form, 11, 82, 84, 86, 92, 100
least-squares, 22
Lebesgue function, 11
lifting of a map, 99-100
linear n-width, 55-56
linear programming, 68, 74
Markov inequality, 57
Marsden's identity, 112
Mergelyan's theorem, 33
minimization, 67-68
modulus of continuity, 12, 108
multipoint, 42
multiquadric surface, 96
multivariate, 42, 76-78, 89-100, 103
near circularity, 31
von Neumann (algorithm), 75
Neville-Aitken formula, 83
Newman's approximation to the absolute value, 16-17, 44
Newton form, 83
nomographic (functions), 76
nonlinear, 16, 55, 72
n-width, 51-60; Bernstein, 56; Gel'fand, 57, 65; linear, 55
optimal: algorithm, 60; interpolation, 93-95; recovery, 60; spline interpolation, 89, 93; subspace, 53; subspace, asymptotically, 55
orthogonal, 8, 22; polynomial, 41; projector, 55
Padé approximant, 35; multipoint, 44; multivariate, 44; table, 37
partition of unity, 19
periodic (spline), 88
Perron, 40
piecewise polynomials, 14-15, 16; see also spline
Pólya frequency sequence, 40
polynomials: algebraic, 1, 5, 9, 11, 16-17, 21-22, 55; trigonometric, see trigonometric
positive definite, conditionally, 96
potential, 29
power function, 6; truncated, 14
projector or projection, 10, 22; minimal, 10, 22
proximity map, 75; central, 75
quasi-interpolant, 16
Radon transform, 99
rational (function)s, 16, 18, 32, 33, 35, 42-45, 72-74
realist, 27
recovery, optimal, 61-66
recurrence relations, 104-106, 109-110, 117-118
Remez algorithm, 68, 70
ridge function, 91
Riemann mapping theorem, 25
Rouché's theorem, 31
Runge's theorem, 25, 33
scattered data, 91
Schoenberg, 87, 104, 109, 122
shape preservation, 122
signature, extremal, 33
simplex spline, 109-116
singular value decomposition, 60
s-numbers, 59
Sobolev space, n-width of, 58-59
spline, 14-15, 54-55, 86-89; box, see box spline; B-, see B-spline; cardinal, 117; free knot, 17; "natural", 93; perfect, 89; periodic, 88; polyhedral, 100; simplex, see simplex spline; thin plate, 95
Stieltjes function, 40, 44
subdivision algorithm, 103, 123
Swiss cheese, 43
Taylor polynomial, 14, 21-22, 26, 33, 58
tensor product, 75, 91, 117, 120, 122
Toeplitz determinant, 37
transfinite diameter, 29
trigonometric polynomials, 6, 9, 13, 52, 84-85
truncated power, 100
uniqueness, 2, 4, 5, 43, 69
Vandermonde, 9, 29, 82
vive la différence!, 42
Walsh, 34; Walsh array, 43
Weierstrass Approximation Theorem, 1, 11, 33
winding number, 31
