AMS SHORT COURSE LECTURE NOTES Introductory Survey Lectures
A Subseries of Proceedings of Symposia in Applied Mathematics
Volume 36 APPROXIMATION THEORY Edited by Carl de Boor (New Orleans, Louisiana, January 1986)
Volume 35 ACTUARIAL MATHEMATICS Edited by Harry H. Panjer (Laramie, Wyoming, August 1985)
Volume 34 MATHEMATICS OF INFORMATION PROCESSING Edited by Michael Anshel and William Gewirtz (Louisville, Kentucky, January 1984)
Volume 33 FAIR ALLOCATION Edited by H. Peyton Young (Anaheim, California, January 1985)
Volume 32 ENVIRONMENTAL AND NATURAL RESOURCE MATHEMATICS Edited by R. W. McKelvey (Eugene, Oregon, August 1984)
Volume 31 COMPUTER COMMUNICATIONS Edited by B. Gopinath (Denver, Colorado, January 1983)
Volume 30 POPULATION BIOLOGY Edited by Simon A. Levin (Albany, New York, August 1983)
Volume 29 APPLIED CRYPTOLOGY, CRYPTOGRAPHIC PROTOCOLS, AND COMPUTER SECURITY MODELS By R. A. DeMillo, G. I. Davida, D. P. Dobkin, M. A. Harrison, and R. J. Lipton
(San Francisco, California, January 1981)
Volume 28 STATISTICAL DATA ANALYSIS Edited by R. Gnanadesikan (Toronto, Ontario, August 1982)
Volume 27 COMPUTED TOMOGRAPHY Edited by L. A. Shepp (Cincinnati, Ohio, January 1982)
Volume 26 THE MATHEMATICS OF NETWORKS Edited by S. A. Burr (Pittsburgh, Pennsylvania, August 1981)
Volume 25 OPERATIONS RESEARCH: MATHEMATICS AND MODELS Edited by S. I. Gass (Duluth, Minnesota, August 1979)
Volume 24 GAME THEORY AND ITS APPLICATIONS Edited by W. F. Lucas (Biloxi, Mississippi, January 1979)
Volume 23 MODERN STATISTICS: METHODS AND APPLICATIONS Edited by R. V. Hogg (San Antonio, Texas, January 1980)
Volume 22 NUMERICAL ANALYSIS Edited by G. H. Golub and J. Oliger (Atlanta, Georgia, January 1978)
Volume 21 MATHEMATICAL ASPECTS OF PRODUCTION AND DISTRIBUTION OF ENERGY Edited by P. D. Lax (San Antonio, Texas, January 1976)
http://dx.doi.org/10.1090/psapm/036
PROCEEDINGS OF SYMPOSIA IN APPLIED MATHEMATICS
Volume 20 THE INFLUENCE OF COMPUTING ON MATHEMATICAL RESEARCH AND EDUCATION Edited by J. P. LaSalle (University of Montana, August 1973)
Volume 19 MATHEMATICAL ASPECTS OF COMPUTER SCIENCE Edited by J. T. Schwartz (New York City, April 1966)
Volume 18 MAGNETO-FLUID AND PLASMA DYNAMICS Edited by H. Grad (New York City, April 1965)
Volume 17 APPLICATIONS OF NONLINEAR PARTIAL DIFFERENTIAL EQUATIONS IN MATHEMATICAL PHYSICS Edited by R. Finn (New York City, April 1964)
Volume 16 STOCHASTIC PROCESSES IN MATHEMATICAL PHYSICS AND ENGINEERING Edited by R. Bellman (New York City, April 1963)
Volume 15 EXPERIMENTAL ARITHMETIC, HIGH SPEED COMPUTING, AND MATHEMATICS Edited by N. C. Metropolis, A. H. Taub, J. Todd, and C. B. Tompkins (Atlantic City and Chicago, April 1962)
Volume 14 MATHEMATICAL PROBLEMS IN THE BIOLOGICAL SCIENCES Edited by R. Bellman (New York City, April 1961)
Volume 13 HYDRODYNAMIC INSTABILITY Edited by R. Bellman, G. Birkhoff and C. C. Lin (New York City, April 1960)
Volume 12 STRUCTURE OF LANGUAGE AND ITS MATHEMATICAL ASPECTS Edited by R. Jakobson (New York City, April 1960)
Volume 11 NUCLEAR REACTOR THEORY Edited by G. Birkhoff and E. P. Wigner (New York City, April 1959)
Volume 10 COMBINATORIAL ANALYSIS Edited by R. Bellman and M. Hall, Jr. (New York University, April 1957)
Volume 9 ORBIT THEORY Edited by G. Birkhoff and R. E. Langer (Columbia University, April 1958)
Volume 8 CALCULUS OF VARIATIONS AND ITS APPLICATIONS Edited by L. M. Graves (University of Chicago, April 1956)
Volume 7 APPLIED PROBABILITY Edited by L. A. MacColl (Polytechnic Institute of Brooklyn, April 1955)
Volume 6 NUMERICAL ANALYSIS Edited by J. H. Curtiss (Santa Monica City College, August 1953)
Volume 5 WAVE MOTION AND VIBRATION THEORY Edited by A. E. Heins (Carnegie Institute of Technology, June 1952)
Volume 4 FLUID DYNAMICS Edited by M. H. Martin (University of Maryland, June 1951)
Volume 3 ELASTICITY Edited by R. V. Churchill (University of Michigan, June 1949)
Volume 2 ELECTROMAGNETIC THEORY Edited by A. H. Taub (Massachusetts Institute of Technology, July 1948)
Volume 1 NON-LINEAR PROBLEMS IN MECHANICS OF CONTINUA Edited by E. Reissner (Brown University, August 1947)
AMS SHORT COURSE LECTURE NOTES Introductory Survey Lectures
published as a subseries of Proceedings of Symposia in Applied Mathematics
CONTRIBUTORS
E. W. CHENEY, Department of Mathematics, University of Texas at Austin, Austin, Texas
RONALD A. DEVORE, Department of Mathematics and Statistics, University of South Carolina, Columbia, South Carolina
KLAUS HOLLIG, Computer Sciences Department, University of Wisconsin-Madison, Madison, Wisconsin
CHARLES A. MICCHELLI, IBM T. J. Watson Research Center, Yorktown Heights, New York
A. PINKUS, Department of Mathematics, Technion, Haifa, Israel
E. B. SAFF, Institute for Constructive Mathematics, Department of Mathematics, University of South Florida, Tampa, Florida
PROCEEDINGS OF SYMPOSIA IN APPLIED MATHEMATICS
Volume 36
Approximation Theory Carl de Boor, Editor
American Mathematical Society Providence, Rhode Island
LECTURE NOTES PREPARED FOR THE AMERICAN MATHEMATICAL SOCIETY SHORT COURSE
APPROXIMATION THEORY HELD IN NEW ORLEANS, LOUISIANA
JANUARY 5-6, 1986
The AMS Short Course Series is sponsored by the Society's Committee on Employment and Education Policy (CEEP). The series is under the direction of the Short Course Advisory Subcommittee of CEEP.
Library of Congress Cataloging-in-Publication Data Approximation theory.
(Proceedings of symposia in applied mathematics, ISSN 0160-7634; v. 36) (Proceedings of symposia in applied mathematics; v. 36. AMS short course lecture notes)
Includes bibliographies and index. 1. Approximation theory—Congresses. I. De Boor, Carl, 1937—
II. American Mathematical Society. III. Series. IV. Series: Proceedings of symposia in applied mathematics. AMS short course lecture notes. QA221.A653 1986 511'.4 86-10846 ISBN 0-8218-0098-1 (pbk.: alk. paper)
COPYING AND REPRINTING. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy an article for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication (including abstracts) is permitted only under license from the American Mathematical Society. Requests for such permission should be addressed to the Executive Director, American Mathematical Society, P.O. Box 6248, Providence, Rhode Island 02940.
The appearance of the code on the first page of an article in this book indicates the copyright owner's consent for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law, provided that the fee of $1.00 plus $.25 per page for each copy be paid directly to the Copyright Clearance Center, Inc., 21 Congress Street, Salem, Massachusetts 01970. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale.
1980 Mathematics Subject Classification (1985 Revision). Primary 41-01; Secondary 41-02, 30E10, 65Dxx.
Copyright ©1986 by the American Mathematical Society. All rights reserved. Printed in the United States of America.
This volume was printed directly from copy prepared by the authors. The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
CONTENTS
Preface
Approximation of Functions RONALD A. DEVORE
Polynomial and Rational Approximation in the Complex Domain E. B. SAFF
N-Widths and Optimal Recovery A. PINKUS
Algorithms for Approximation E. W. CHENEY
Algebraic Aspects of Interpolation CHARLES A. MICCHELLI
Multivariate Splines KLAUS HOLLIG
Index
PREFACE
This book is the result of the 1986 American Mathematical Society Short Course entitled Approximation Theory given at the annual meeting at New Orleans, on January 5-6, 1986.
Approximation Theory is properly a subfield of Analysis, but derives much of its impetus from applications such as data fitting, the representation of curves and surfaces for design and display, the reconstruction of functions from partial information, the numerical solution of functional equations and the like. For this reason, Approximation Theory offers ready-made applications of the basic ideas of Analysis.
The first lecture describes and illustrates the basic concerns of Approximation Theory. The other lectures are intended to provide a quick introduction into some of the areas of current research interest. Topics highlighted are: Approximation in the complex domain, n-width, optimal recovery, interpolation, algorithms for approximation, and splines, with strong emphasis on a multivariate setting in the last three topics.
I thank the authors very much for the considerable and selfless effort they have put into the preparation of the lectures and these notes.
Carl de Boor Madison, Wisconsin March, 1986
Proceedings of Symposia in Applied Mathematics Volume 36, 1986
APPROXIMATION OF FUNCTIONS
Ronald A. DeVore
Approximation Theory began at the end of the last century with the study
of the approximation of functions by polynomials and rational functions. It is
a broad subject which interacts with various aspects of real, complex and
functional analysis. Some of its recent popularity comes from its importance in
the development of numerical algorithms and the solution of problems of
optimization.
One hundred years ago, Weierstrass proved his famous theorem on the
approximation of continuous functions by algebraic polynomials. Undoubtedly,
every one of you has seen this theorem. But, in order to guarantee that we are
all starting at the same point, let's begin with a formulation of this theorem
which uses the notation of Approximation Theory.
We want to approximate functions f which are continuous on an interval
I:=[a,b]. We let C(I) denote the set of all such functions and let
$\|f\| := \sup_{x \in I} |f(x)|$
be its norm. We are interested in approximating $f$ by algebraic polynomials
$P(x) = a_0 + a_1 x + \dots + a_n x^n$ of degree at most $n$. If $\Pi_n$ denotes the set of
all such polynomials, we let $E_n(f)$ be the error of approximation to $f$ from $\Pi_n$:
(0.1) $E_n(f) := \inf_{P \in \Pi_n} \|f - P\|.$
With this, we have
THEOREM 0.1. (Weierstrass [W]) If $f \in C(I)$, then $E_n(f) \to 0$ as $n \to \infty$.
In other words, each continuous function can be approximated arbitrarily well
in the uniform norm by polynomials. There are many wonderful proofs of
© 1986 American Mathematical Society 0160-7634/86 $1.00 + $.25 per page
http://dx.doi.org/10.1090/psapm/036/864363
Weierstrass' theorem. We shall give one of these a little later (§7).
Important theorems often open more doors than they close. This is
certainly the case with the Weierstrass theorem. Now that we know polynomial
approximation is possible, we are confronted with questions like:
Are there polynomials $P^* \in \Pi_n$ which attain the infimum in (0.1)?
If such a $P^*$ exists, is it unique?
How can we calculate $P^*$?
Can we say anything about how fast $E_n(f)$ tends to zero?
These fundamental questions about polynomial approximation were studied
around the turn of the century and their solution forms the foundation of
Approximation Theory. It is appropriate therefore in a course on
approximation, whether it be short or long, that we begin with a look at the
solution to these questions.
§1. Best Approximation. The polynomials P* are called polynomials of best
approximation to f of degree n. Let us begin with the questions of existence
and uniqueness of the $P^*$. These can be discussed in the following more general
setting. We have a normed linear space $(X, \|\cdot\|)$ and one of its finite
dimensional subspaces $Y$. We are interested in approximating $x \in X$ by the
elements of $Y$. For this, we introduce the distance $\operatorname{dist}(x,Y)$ of $x$ to $Y$:
(1.1) $\operatorname{dist}(x,Y) := \inf_{y \in Y} \|x - y\|.$
If $y^* \in Y$ assumes the infimum in (1.1), then we say that $y^*$ is a best
approximation to $x$ from $Y$ and we let $B(x)$ denote the collection of all such
best approximants. A simple but useful remark is that $B(\alpha x) = \alpha B(x)$ for any
scalar $\alpha$. It is rather easy to see that best approximants always exist.
THEOREM 1.1. For each $x \in X$, $B(x) \neq \emptyset$.
Proof. The infimum in (1.1) can be restricted to those $y \in Y$ with $\|y\| \le 2\|x\|$.
Indeed any other $y$ gives $\|x-y\| \ge \|y\| - \|x\| > \|x\|$, so that 0 is a better
approximation than $y$. Now the closed ball $\|y\| \le 2\|x\|$ is a compact set
(because $Y$ is finite dimensional) [L, p. 16], hence the continuous function
$\|x-y\|$ attains its infimum on this set. In this way, we see that best
approximants exist.
A more subtle question is to decide when best approximation is unique,
that is, when does B(x) have just one element? This depends very much on the
geometry in the space $X$, namely, on the shape of the unit ball $U := \{x : \|x\| \le 1\}$.
This ball is always convex: if $x, x' \in U$, then the line segment $[x,x'] :=
\{z \in X : z = \alpha x + (1-\alpha)x',\ 0 \le \alpha \le 1\}$ is contained in $U$. We say $U$ is strictly convex if
(1.2) $\|\alpha x + (1-\alpha)x'\| < 1$, for all $x, x' \in U$, $x \neq x'$, and all $0 < \alpha < 1$.
Strict convexity means that the interior of the line segment $[x,x']$ is
contained in the interior of $U$, for all $x, x' \in U$.
THEOREM 1.2. If the unit ball $U$ of $X$ is strictly convex, then best
approximation from a finite dimensional space $Y$ is unique, that is, $B(x)$ is a
singleton for all $x \in X$.
Proof. Suppose $y, y' \in B(x)$:
$\|x-y\| = \|x-y'\| = \operatorname{dist}(x,Y).$
If $\operatorname{dist}(x,Y) = 0$, then $x = y = y'$ as desired. Otherwise, we can rescale (that is,
multiply $x, y, y'$ by the same constant) so that $\operatorname{dist}(x,Y) = 1$. Then $x-y$ and
$x-y'$ lie in $U$, and $\tfrac12(y+y')$ is in $Y$. If $y \neq y'$, then from (1.2) with $\alpha = \tfrac12$,
$\|x - \tfrac12(y+y')\| = \|\tfrac12(x-y) + \tfrac12(x-y')\| < 1 = \operatorname{dist}(x,Y),$
which is an obvious contradiction. Hence $y = y'$ and best approximation is
unique.
The most important normed spaces $X$ with strictly convex unit balls are the
$L_p(I)$ spaces. This space consists of all Lebesgue integrable functions on $I$
for which the following norm is finite:
(1.3) $\|f\|_p(I) := \left\{ \int_I |f(x)|^p \, dx \right\}^{1/p}, \quad 1 \le p < \infty.$
When $p = \infty$, the right side of (1.3) is replaced by the essential supremum of $|f|$.
The spaces $L_p$ are strictly convex when $1 < p < \infty$. This is proved by examining when
equality can hold in the triangle inequality for the $L_p$ norm.
From the strict convexity of the $L_p$ spaces, $1 < p < \infty$, it follows that any
function $f \in L_p(I)$ has a unique best approximation $P^*$ from $\Pi_n$.
Analogous to the $L_p$ are the $\ell_p$ spaces which consist of all sequences
$x = (x_i)_1^m$ for which the norm
$\|x\|_p := \left( \sum_{i=1}^m |x_i|^p \right)^{1/p}$ for $1 \le p < \infty$, and $\|x\|_\infty := \max_{1 \le i \le m} |x_i|$ for $p = \infty$,
is finite. These spaces also have strictly convex unit ball if $1 < p < \infty$ but not
when $p = 1$ or $\infty$. This is easily seen when $m = 2$ where the boundaries of the unit
balls are depicted in Figure 1 for the values $p = 1, 2, \infty$.
[Figure 1. Unit balls of $\ell_p$ for $p = 1, 2, \infty$]
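The failure of strict convexity for $p = 1$ and $p = \infty$ can also be checked numerically in the plane. The sketch below is only an illustration: the helper `lp_norm` and the particular pairs of unit vectors are choices made here, not part of the lecture. It computes the norm of the midpoint of two distinct unit vectors.

```python
import numpy as np

def lp_norm(x, p):
    """l_p norm of a finite sequence; p = np.inf gives the max norm."""
    return float(np.linalg.norm(np.asarray(x, dtype=float), ord=p))

# Pairs of distinct unit vectors in the plane (m = 2), chosen for illustration.
pairs = {
    1: ([1.0, 0.0], [0.0, 1.0]),        # both lie on a flat edge of the l_1 ball
    2: ([1.0, 0.0], [0.0, 1.0]),
    np.inf: ([1.0, 1.0], [1.0, -1.0]),  # both lie on a flat edge of the l_inf ball
}

midpoint_norms = {}
for p, (x, y) in pairs.items():
    mid = 0.5 * (np.array(x) + np.array(y))
    midpoint_norms[p] = lp_norm(mid, p)
    print(f"p = {p}: norm of midpoint = {midpoint_norms[p]:.4f}")
```

For $p = 2$ the midpoint falls strictly inside the unit ball, while for $p = 1$ and $p = \infty$ it stays on the boundary, which is what the flat sides of those two balls suggest.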
Unfortunately, the space $C(I)$, of most immediate interest to us, does not
have a strictly convex unit ball. For example, the functions $\phi(x) = x$ and
$\psi(x) = x^2$ each have norm one in $C[0,1]$ but $\tfrac12(\phi+\psi)$ also has norm one. Even more
to the point, it is easy to construct finite dimensional subspaces $Y$ of $C[0,1]$
from which best approximation is not unique. Consider, for example, the space
$Y := \operatorname{span}\{\phi, \psi\}$. Any non-negative function in $Y$ of norm at most two is a best
approximation to $f(x) := 1$. Indeed, every $y \in Y$ vanishes at 0, so $\|f-y\| \ge 1$,
while $0 \le y \le 2$ gives $|1-y| \le 1$ on $[0,1]$.
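The non-uniqueness in this example is easy to observe on a grid. The following sketch makes its own assumptions: the sample coefficients in `candidates` are hypothetical choices of non-negative elements of $Y$ with norm at most two, and the sup norm is estimated on a finite grid.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 2001)        # grid on [0, 1]; includes x = 0
f = np.ones_like(x)                    # the target f(x) := 1

def sup_error(a, b):
    """Grid estimate of ||f - y|| in C[0, 1] for y(x) = a*x + b*x^2."""
    return float(np.max(np.abs(f - (a * x + b * x**2))))

# Hypothetical sample coefficients (a, b): each y is non-negative on [0, 1]
# and has sup norm at most two.
candidates = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (2.0, -1.0), (1.0, 1.0)]
errors = [sup_error(a, b) for a, b in candidates]

# No element of Y does better, because every y in Y vanishes at x = 0.
rng = np.random.default_rng(0)
random_errors = [sup_error(a, b) for a, b in rng.uniform(-3, 3, size=(1000, 2))]

print(errors)
print(min(random_errors))
```

All the listed candidates achieve the same error, and the random search never goes below it, so the set of best approximants here is far from a singleton.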
Nevertheless, the situation is not as bad as it seems. It was shown by
P.L. Chebyshev that each $f \in C(I)$ has a unique best approximation from $\Pi_n$. The
special properties of $\Pi_n$ that make this true are the next subject for
discussion.
§2. Chebyshev's Theorem. This theorem gives that the best approximation $P^*$ from $\Pi_n$
to a continuous function $f$ is unique. To prove this theorem, P.L. Chebyshev
analyzed the behavior of the error function $E(x) := f(x) - P^*(x)$. He showed that
there are many points where $E$ alternately takes on the values $\pm\|E\|$.
THEOREM 2.1 (Chebyshev). If $f \in C(I)$ and $P^*$ is its best approximation from $\Pi_n$,
then there are points $x_0 < \dots < x_{n+1}$ and a value $\eta = \pm 1$ such that
(2.1) $E(x_i) = (-1)^i \eta \|E\|, \quad i = 0, \dots, n+1.$
Hence, this theorem shows that there are at least $n+2$ points where the error $E$
alternately takes on its maximum ($\|E\|$) and its minimum ($-\|E\|$).
Proof of Theorem 2.1. Let $x_0$ be the first point $x$ (from the left) on $I$ where
$|E(x)| = \|E\|$; such a point exists because $E$ is continuous. Then, let $x_1$ be
the first point $x > x_0$ where $E(x) = -E(x_0)$. Continuing in this way, we create a
sequence of points $x_0 < \dots < x_m$ where $E$ alternately takes on the values $\pm\|E\|$.
We claim that $m > n$, as desired. Indeed, if $m \le n$, then because of the continuity
of $E$, we can find points $\xi_1, \dots, \xi_m$ with $x_0 < \xi_1 < x_1 < \dots < \xi_m < x_m$ such that
$[\xi_i, \xi_{i+1}]$ contains no points $x$ with $E(x) = -E(x_i)$. Now, for a proper choice
of $\gamma = \pm 1$, the polynomial $P(x) := \gamma (x-\xi_1) \cdots (x-\xi_m)$ is of degree $\le n$ and agrees in
sign with $E$ at each of the points $x_0, \dots, x_m$. If $E(x_i) > 0$, then $P(x) > 0$ on
$(\xi_i, \xi_{i+1})$ and also $E(x) > -E(x_i)$ on $[\xi_i, \xi_{i+1}]$. Hence, for $\eta > 0$ sufficiently
small, $|E(x) - \eta P(x)| < \|E\|$, for $x \in [\xi_i, \xi_{i+1}]$. The same is true when $E(x_i) < 0$
and also on the end intervals. Since there are only a finite number of
intervals we can choose one $\eta > 0$ and obtain $\|f - (P^* + \eta P)\| = \|E - \eta P\| < \|E\|$. But then, this
means that $P^* + \eta P$ is a better approximation to $f$ than $P^*$. This contradiction
means that $m > n$ and proves Chebyshev's theorem.
From the Chebyshev alternation theorem, it is easy to prove the uniqueness
of best polynomial approximants.
THEOREM 2.2. If $f$ is in $C(I)$, then $f$ has a unique best approximation $P^* \in \Pi_n$.
Proof. If $P_1$ and $P_2$ are two best approximants to $f$ from $\Pi_n$ then so is $P :=
\tfrac12(P_1 + P_2)$. Let $E := f - P$ and let $x_0, \dots, x_n$ be alternation points for $f - P$. Then,
(2.2) $\tfrac12(f(x_i) - P_1(x_i)) + \tfrac12(f(x_i) - P_2(x_i)) = f(x_i) - P(x_i) = \pm\|E\|.$
Since $\|f - P_1\| \le \|E\|$, and likewise for $f - P_2$, the only way that (2.2) can
hold is if $f(x_i) - P_1(x_i) = f(x_i) - P_2(x_i)$. That is, $P_1(x_i) = P_2(x_i)$, $i = 0, \dots, n$.
This means that the polynomial $P_1 - P_2$, which is of degree $\le n$, has $n+1$ zeros and
hence must be the zero polynomial. Therefore, $P_1 = P_2$.
Actually the proof of Theorem 2.2 shows more. Namely, we have the
following Chebyshev characterization of when a polynomial P is the best
approximation to f.
THEOREM 2.3. If $f \in C(I)$ and $P \in \Pi_n$ are such that $f - P$ alternately takes on the
values $\pm M$ at least $n+2$ times with $M := \|f - P\|$, then $P = P^*$ is the best
approximation to $f$ and $M = E_n(f)$.
Proof. Let $x_i$, $i = 0, \dots, n+1$, be the alternation points of $f - P$. If $Q$ is any other
polynomial with $\|f - Q\| < M$, then $Q - P = (f - P) - (f - Q)$ has the same sign as $f - P$
at each of the $x_i$. Hence $Q - P$ changes sign between consecutive $x_i$, so it has at
least $n+1$ zeros and $Q = P$. This is the desired contradiction.
In view of this theorem, the search for the best approximation is reduced
to finding a polynomial P such that f-P has sufficiently many alternations.
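The alternation criterion lends itself to a simple numerical check. The sketch below is a grid-based heuristic, not a rigorous test: the helper `count_alternations` and its tolerance are assumptions made for this illustration, and the example $f(x) = x^2$, $P(x) = 1/2$ on $[-1,1]$ is chosen here because its error is easy to verify by hand.

```python
import numpy as np

def count_alternations(err, rtol=1e-6):
    """Count sign-alternating grid points where |err| attains its maximum.

    Returns the number of alternations and the grid maximum M of |err|.
    """
    err = np.asarray(err, dtype=float)
    M = float(np.max(np.abs(err)))
    signs = np.sign(err[np.abs(err) >= (1.0 - rtol) * M])
    # Neighbouring extremal grid points of equal sign count as one extremum.
    return 1 + int(np.count_nonzero(np.diff(signs) != 0)), M

x = np.linspace(-1.0, 1.0, 4001)
# Illustrative choice: f(x) = x^2 and the constant P(x) = 1/2, viewed in Pi_1.
alternations, M = count_alternations(x**2 - 0.5)
print(alternations, M)
```

With $n = 1$ the criterion asks for $n + 2 = 3$ alternations, so a count of three is consistent with $P = 1/2$ being the best approximation to $x^2$ from $\Pi_1$ on this interval.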
What are the essential properties of polynomials which were used in the
above proof of uniqueness? Well, in the Chebyshev alternation theorem, we
constructed a polynomial which changed sign precisely at the points
$\xi_1, \dots, \xi_m$, and in the proof of uniqueness we used the fact that any
non-trivial polynomial of degree $n$ has at most $n$ zeros. There are other $n+1$
dimensional subspaces $X_{n+1}$ of $C(I)$ which have these properties. They are
called Haar spaces and any basis $\phi_0, \dots, \phi_n$ for $X_{n+1}$ is called a Haar system
(sometimes called a Chebyshev system).
DEFINITION 2.4. An $n$ dimensional subspace $X_n$ of $C(I)$ is called a Haar subspace
if each function $\phi \in X_n$ has at most $n-1$ zeros on $I$ unless $\phi$ is identically
zero.
Of course, the $\Pi_n$ are the most important Haar spaces. Some other
interesting examples are the span of the exponentials $e^{\alpha_i x}$, $i = 1, \dots, n$, or
the span of the power functions $x^{\alpha_i}$, $i = 1, \dots, n$. Here, $\alpha_1, \dots, \alpha_n$ can be any
distinct non-negative real numbers.
Haar spaces are much like polynomial spaces. For example, we have:
THEOREM 2.5. If $X_n$ is a Haar space and $f \in C(I)$, then $f$ has a unique best
approximation from $X_n$.
The proof is essentially the same as that given above for polynomials
except that now one has to work much harder to show that there is a function $\phi$
which changes sign at any prescribed points $\xi_1, \dots, \xi_m$, $m < n$, from the interior
of $I$.
Remarkably, the notion of Haar space actually characterizes the Chebyshev
spaces of $C(I)$ (i.e. the spaces from which best approximation is unique).
Indeed, we have the following theorem of Haar.
THEOREM 2.6. If every $f \in C(I)$ has a unique best approximation from the subspace
$X_n$, then $X_n$ is a Haar space.
A proof of this theorem can be found in the book of Lorentz [L]. Haar
systems are important in many fields other than approximation. The interested
reader should consult the seminal paper of Krein [Kr] or the book of Karlin and
Studden [K-S].
§3. Trigonometric polynomial approximation. It is not necessary for the
interval $I$ to be closed in the definition of a Haar system. In fact, one of
the most important Haar systems is the space $\mathcal{T}_n$ of trigonometric polynomials of
degree $\le n$. A trigonometric polynomial of degree $n$ is an expression of the
form
$T(x) = a_0 + \sum_{k=1}^n (a_k \cos kx + b_k \sin kx)$
with $a_n^2 + b_n^2 > 0$. Any trigonometric polynomial $T$ of degree $n$ has at most
$2n$ zeros on $[0, 2\pi)$. Hence $\mathcal{T}_n$ is a Haar system on this interval.
Approximation by trigonometric polynomials is quite similar to algebraic
polynomial approximation except that now we approximate functions $f$ which are
$2\pi$ periodic. We let $C(\mathbb{T})$ denote the space of all such continuous functions
and let $\|\cdot\|$ be the supremum norm on $(-\infty, \infty)$ or equivalently any interval of
length $2\pi$. If $f \in C(\mathbb{T})$, then $f$ has a unique best approximation $T^* \in \mathcal{T}_n$:
$\|f - T^*\| = \inf_{T \in \mathcal{T}_n} \|f - T\|.$
The error of approximation in this case is denoted by $E_n^*(f) = \|f - T^*\|$.
There is a very useful and important connection between trigonometric and
algebraic polynomial approximation which is obtained by using the
transformation $x = \cos\theta$ to identify points on $[-1,1]$ with points on $[0,\pi]$. If
$f \in C(I)$, $I := [-1,1]$, then the function $g(\theta) := f(\cos\theta)$ is an even $2\pi$ periodic
continuous function in $C(\mathbb{T})$. Similarly, if $P$ is an algebraic polynomial of
degree $n$ then $T(\theta) := P(\cos\theta)$ is an even trigonometric polynomial of degree
$n$.
We can go the other way as well. Namely, for any even trigonometric
polynomial $T \in \mathcal{T}_n$, the function $P(x) := T(\arccos x)$ is an algebraic polynomial
of degree at most $n$. In fact, $T(\theta) = \sum_{k=0}^n a_k \cos k\theta$ and so $P(x)$ is a linear
combination of the functions $C_k(x) := \cos k(\arccos x)$, $k = 0, 1, \dots, n$. The $C_k$
are algebraic polynomials of degree $k$ (see §5).
It follows from the uniqueness of best approximations that the best
approximation T* to the even function g is an even trigonometric polynomial.
Hence, the above one-to-one correspondence between algebraic polynomials P and
even trigonometric polynomials T gives that P* is the best approximation to f
if and only if T* is the best trigonometric approximation to g. We also have
(3.1) $E_n(f) = E_n^*(g).$
This simple remark allows us to prove results about algebraic approximation by
considering their analogue in trigonometric approximation.
§4. Computing best approximants. It is generally difficult to compute best
approximations. An exception is when $X$ has an inner product $(\cdot,\cdot)$ and its
induced norm: $\|f\|^2 := (f,f)$. For example, $L_2(I)$ has the inner product
(4.1) $(f,g) := \int_I f(x) g(x) \, dx.$
Now suppose that $X_n$ is an $n$-dimensional subspace of $X$ and we wish to
compute the best approximation to $f \in X$ from $X_n$. We take a basis $\phi_1, \dots, \phi_n$ for
$X_n$ which satisfies the orthonormality conditions
(4.2) $(\phi_i, \phi_j) = \delta_{ij}, \quad i, j = 1, \dots, n,$
with $\delta_{ij}$ the usual Kronecker $\delta$ notation. The best approximation $\phi^*$ from $X_n$ to
$f$ is then given by
(4.3) $\phi^* = \sum_{k=1}^n (f, \phi_k) \phi_k.$
In fact, since $f - \phi^*$ is orthogonal to each $\phi_k$, $k = 1, \dots, n$, it is orthogonal to
every $\phi \in X_n$. Therefore, we have
$\|f - \phi^* - \phi\|^2 = (f - \phi^* - \phi, f - \phi^* - \phi) = (f - \phi^*, f - \phi^*) + (\phi, \phi) \ge \|f - \phi^*\|^2$
for all $\phi \in X_n$, which clearly says that $\phi^*$ is the best approximation to $f$ from
$X_n$.
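The projection formula (4.3) can be tried out numerically. The sketch below rests on several assumptions made here for illustration: $I = [-1,1]$, the inner product (4.1) is approximated by Gauss-Legendre quadrature, the orthonormal basis consists of normalized Legendre polynomials, and the target is $f(x) = e^x$.

```python
import numpy as np
from numpy.polynomial import legendre as leg

# Gauss-Legendre nodes and weights approximate integrals over [-1, 1].
nodes, weights = leg.leggauss(64)

def inner(u, v):
    """Quadrature approximation of (u, v) = integral of u*v over [-1, 1]."""
    return float(np.sum(weights * u * v))

def orthonormal_basis(n):
    """Values at the nodes of n orthonormal polynomials (degrees 0..n-1)."""
    basis = []
    for k in range(n):
        coeffs = np.zeros(k + 1)
        coeffs[k] = 1.0
        # The Legendre polynomial P_k has squared L2 norm 2 / (2k + 1).
        basis.append(leg.legval(nodes, coeffs) * np.sqrt((2 * k + 1) / 2.0))
    return basis

f = np.exp(nodes)                                  # illustrative target f(x) = e^x
phis = orthonormal_basis(4)
best = sum(inner(f, phi) * phi for phi in phis)    # formula (4.3)
residual = f - best

print([inner(residual, phi) for phi in phis])      # all close to zero
print(np.sqrt(inner(residual, residual)))          # L2 error of the projection
```

The residual is orthogonal to every basis function up to rounding, which is exactly the property the argument above uses to conclude that no other element of the subspace can do better.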
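For example, when $X = L_2(\mathbb{T})$, the space of $2\pi$-periodic square integrable
functions, then the best approximation to $f \in L_2(\mathbb{T})$ by trigonometric polynomials
of degree at most $n$ is $S_n(f)$, the $n$-th partial sum of the Fourier series of $f$:
(4.4) $S_n(f,x) := a_0/2 + \sum_{k=1}^n (a_k \cos kx + b_k \sin kx),$
$a_k := \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx \, dx; \quad b_k := \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx \, dx.$
In this case, with the norm induced by the inner product (4.1) over one period,
the error of approximation is simply $\left[ \pi \sum_{k=n+1}^{\infty} (a_k^2 + b_k^2) \right]^{1/2}$.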
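The error formula for Fourier partial sums can be checked by Parseval's identity. In the sketch below the normalization is an assumption stated here: the inner product is the integral of $fg$ over $[-\pi, \pi]$, so the squared error equals $\pi \sum_{k>n} (a_k^2 + b_k^2)$. The test function $f(x) = x$ and the quadrature rule are illustrative choices.

```python
import numpy as np
from numpy.polynomial import legendre as leg

# Gauss-Legendre rule mapped from [-1, 1] to one period [-pi, pi].
t, w = leg.leggauss(400)
x, w = np.pi * t, np.pi * w

def integral(values):
    """Quadrature approximation of the integral over [-pi, pi]."""
    return float(np.sum(w * values))

f = x.copy()            # illustrative choice: f(x) = x on (-pi, pi)
n = 5

a = [integral(f * np.cos(k * x)) / np.pi for k in range(n + 1)]
b = [integral(f * np.sin(k * x)) / np.pi for k in range(n + 1)]

S_n = a[0] / 2 + sum(a[k] * np.cos(k * x) + b[k] * np.sin(k * x)
                     for k in range(1, n + 1))

# Squared L2 error computed directly ...
err_sq_direct = integral((f - S_n) ** 2)
# ... and via Parseval: ||f||^2 minus the energy captured by S_n.
err_sq_parseval = integral(f ** 2) - np.pi * (
    a[0] ** 2 / 2 + sum(a[k] ** 2 + b[k] ** 2 for k in range(1, n + 1)))

print(err_sq_direct, err_sq_parseval)
```

For this $f$ the coefficients are $a_k = 0$ and $b_k = 2(-1)^{k+1}/k$, so the two numbers should also agree with $4\pi \sum_{k>5} k^{-2}$.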
For approximation in the space C(I), there are only a few special cases
where best approximants can be computed exactly. The simplest of these is for
approximation by constants. For any $f \in C(I)$, its best approximation from $\Pi_0$ is
$\alpha := \tfrac12(m + M)$ with $m$ the minimum of $f$ on $I$ and $M$ the maximum of $f$ on $I$. Indeed
since $f$ takes on both its maximum and minimum on $I$, $f - \alpha$ has two alternations
and the Chebyshev criterion of Theorem 2.3 shows that $\alpha$ is the best
approximation.
A similar result holds for the approximation of a convex (or concave)
function $f$ by linear polynomials. If $Q$ is the linear polynomial which
interpolates $f$ at the end points of $I$, and $M := \|f - Q\|$, then for convex $f$,
$P^* := Q - M/2$ is the best approximation to $f$ from $\Pi_1$ (for concave $f$, take
$P^* := Q + M/2$).
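For a convex function the recipe above can be verified on a grid by checking that the error of $Q - M/2$ equioscillates at three points. The sketch assumes the illustrative choice $f(x) = e^x$ on $I = [0,1]$; the variable names are introduced here.

```python
import numpy as np

a, b = 0.0, 1.0
x = np.linspace(a, b, 100001)
f = np.exp(x)                                   # illustrative convex function

# Q interpolates f at the end points of I = [a, b].
Q = f[0] + (f[-1] - f[0]) * (x - a) / (b - a)
M = float(np.max(np.abs(f - Q)))                # grid estimate of ||f - Q||

P_star = Q - M / 2                              # candidate best line (f convex)
err = f - P_star
i_min = int(np.argmin(err))

print(err[0], err[i_min], err[-1])              # about +M/2, -M/2, +M/2
```

The error takes the value $+M/2$ at both end points and $-M/2$ at an interior point, giving the three alternations that Theorem 2.3 requires for $n = 1$.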
Another very important example is the approximation of $x^n$ by polynomials
of degree $< n$. This problem was solved by Chebyshev and gave rise to a very
important sequence of polynomials which bear his name.
§5. Chebyshev Polynomials. We take $I := [-1,1]$. To find the best approximation
to $x^n$ from $\Pi_{n-1}$, we need only find a polynomial $Q(x) = x^n + a_{n-1} x^{n-1} + \dots + a_0$
such that $Q$ alternately takes on the values $\pm\|Q\|$ at least $n+1$ times on $I$.
From Theorem 2.3, $x^n - Q$ is then the best approximation to $x^n$ from $\Pi_{n-1}$. Now, the
trigonometric polynomial $\cos n\theta$ has such alternation properties. Recalling our
discussion in §3 of the transformation $x = \cos\theta$, we see that $C_n(x) :=
\cos(n \arccos x)$ is an algebraic polynomial of degree $n$ which has norm one
and has the required $n+1$ alternations; namely, $C_n(x_k) = (-1)^{n-k}$, for
$x_k := \cos\bigl((n-k)\pi/n\bigr)$, $k = 0, \dots, n$.
Now the polynomial $C_n$ is not quite the $Q$ we are looking for since it does
not have leading coefficient one. But it is easy to compute the leading
coefficient of $C_n$. For this we use the recurrence relation
(5.1) $C_n(x) = 2x C_{n-1}(x) - C_{n-2}(x)$
which follows from the corresponding trigonometric identity.
Since $C_0(x) \equiv 1$ and $C_1(x) \equiv x$, it follows by induction from (5.1) that
(5.2) $C_n(x) = 2^{n-1} x^n + \text{lower order terms}.$
Hence $Q(x) := 2^{-n+1} C_n(x)$ is our sought after polynomial and $P^*(x) := x^n - Q(x)$
is the best approximation to $x^n$ from $\Pi_{n-1}$. This also gives the error of
approximation $E_{n-1}(x^n) = 2^{-n+1}$.
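The recurrence (5.1) is easy to run. The sketch below builds the coefficients of $C_n$ from $C_0 = 1$ and $C_1 = x$, then checks the leading coefficient, the alternation values, and the norm of the monic multiple. The function name `chebyshev_coeffs` and the choice $n = 6$ are assumptions made for this illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as poly

def chebyshev_coeffs(n):
    """Coefficients of C_n (lowest degree first) from recurrence (5.1)."""
    prev, cur = np.array([1.0]), np.array([0.0, 1.0])   # C_0 = 1, C_1 = x
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = np.zeros(len(cur) + 1)
        nxt[1:] += 2.0 * cur          # contribution of 2x * C_{k-1}
        nxt[:len(prev)] -= prev       # contribution of -C_{k-2}
        prev, cur = cur, nxt
    return cur

n = 6
c = chebyshev_coeffs(n)
x_alt = np.cos((n - np.arange(n + 1)) * np.pi / n)      # x_k, k = 0..n
values = poly.polyval(x_alt, c)                         # C_n(x_k)
grid = np.linspace(-1.0, 1.0, 2001)
monic_norm = 2.0 ** (1 - n) * float(np.max(np.abs(poly.polyval(grid, c))))

print(c[-1], values, monic_norm)
```

The leading coefficient comes out as $2^{n-1}$, the values at the $x_k$ alternate between $\pm 1$, and the monic multiple has sup norm $2^{-n+1}$ on the grid.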
We do not have time to go into all the wonderful properties of Chebyshev
polynomials but we should mention one of their other applications to estimating
the size of En(f). Let
(5.3) $x_k := \cos\bigl((2k-1)\pi/2n\bigr), \quad k = 1, \dots, n,$
be the zeros of $C_n$. If $f \in C[-1,1]$, we let $P(x) := P(f,x)$ be the polynomial of
degree $n-1$ which interpolates $f$ at the points $x_k$, that is, $P(x_k) = f(x_k)$,
$k = 1, \dots, n$. The existence and uniqueness of $P$ is well known and equivalent to
the non-vanishing of the Vandermonde determinant. Also, one can represent the
error of interpolation (see [B, p. 9]) by
(5.4) $E(x) := f(x) - P(f,x) = \frac{1}{n!} f^{(n)}(\xi_x) (x - x_1) \cdots (x - x_n)$
for some point $\xi_x$ in $I$. We recognize that $(x - x_1) \cdots (x - x_n) = 2^{-n+1} C_n(x)$,
so that the right side of (5.4) does not exceed $\|f^{(n)}\| \, 2^{-n+1} / n!$. This gives
THEOREM 5.1 (Bernstein). If $f$ has $n$ continuous derivatives, then
$E_{n-1}(f) \le \|f^{(n)}\| \, 2^{-n+1} / n!.$
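A quick experiment shows the size of the interpolation error at the zeros of $C_n$. The sketch uses the standard remainder estimate for $n$ nodes, namely $\|f - P\| \le \|f^{(n)}\| \, 2^{-n+1} / n!$, with the illustrative choice $f(x) = e^x$, for which $\|f^{(n)}\| = e$ on $[-1,1]$. The helper name and the use of `numpy.polynomial.Polynomial.fit` to build the interpolant are choices made here.

```python
import numpy as np
from math import factorial

def chebyshev_interpolant(f, n):
    """Polynomial of degree n-1 interpolating f at the n zeros of C_n."""
    k = np.arange(1, n + 1)
    nodes = np.cos((2 * k - 1) * np.pi / (2 * n))       # the points (5.3)
    # With n points and degree n-1 the least-squares fit interpolates.
    return np.polynomial.Polynomial.fit(nodes, f(nodes), deg=n - 1)

xs = np.linspace(-1.0, 1.0, 5001)
results = []
for n in range(2, 9):
    p = chebyshev_interpolant(np.exp, n)
    err = float(np.max(np.abs(np.exp(xs) - p(xs))))
    bound = np.e * 2.0 ** (1 - n) / factorial(n)        # ||f^(n)|| = e on [-1, 1]
    results.append((n, err, bound))
    print(n, err, bound)
```

Since the interpolant is one particular polynomial of degree $n-1$, its error is an upper bound for the error of best approximation from $\Pi_{n-1}$, so the table also bounds that quantity.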
§6. Interpolation. Usually, we cannot determine the best polynomial
approximants for a given $f \in C(I)$. Instead, we look for polynomials which are
"good" rather than best approximations. The typical way of constructing such
polynomials is to find linear operators $L_n$ which map $C(I)$ onto $\Pi_n$ and have good
approximation properties. One possibility (others are considered in the next
section) is for $L_n$ to satisfy
(6.1) $\|f - L_n(f)\| \le c_n E_n(f),$
with $c_n$ a constant which may depend on $n$. Then, except for the constant $c_n$,
the polynomial $L_n(f)$ is just as good an approximation to $f$ as is the best
approximation. Of course, the smaller the constant $c_n$, the better the operator
$L_n$ and therefore we would like to find $L_n$ which will make $c_n$ as small as
possible. It turns out, as we will explain in a little more detail shortly,
that the best constants $c_n$ behave like $\mathrm{const} \cdot \log n$; in particular, they tend
to infinity with $n$. Thus unfortunately, the $c_n$ in (6.1) cannot be replaced by
a constant $c$ which is independent of $n$.
Finding operators $L_n$ which satisfy (6.1) is intimately connected with the
construction of projectors onto the space $\Pi_n$. In fact $L_n$ satisfies (6.1) if
and only if it is such a projector, that is, if and only if $L_n(P) = P$ for all
$P \in \Pi_n$. Indeed, if (6.1) holds and $f$ is in $\Pi_n$, then $E_n(f) = 0$ and hence $L_n(f) = f$.
On the other hand if $L_n$ is such a projector then for any $f \in C(I)$ and $P \in \Pi_n$, we
have
(6.2) $\|f - L_n(f)\| = \|L_n(f - P) - (f - P)\| \le (\|L_n\| + 1) \|f - P\|,$
where $\|L_n\| := \sup_{f \in C(I)} \|L_n(f)\| / \|f\|$ is the norm of $L_n$ on $C(I)$.
Taking an infimum over all $P$ in (6.2) shows that (6.1) holds with $c_n =
\|L_n\| + 1$. The smallest constant $c_n$ which can be used in (6.1) is roughly speaking
$\|L_n\|$. We have seen that we can always take $c_n \le \|L_n\| + 1$. On the other
hand for some appropriate $f$, and with $I$ the identity operator, we have
$\|f - L_n(f)\| = \|I - L_n\| \, \|f\| \ge (\|L_n\| - 1) \|f\| \ge (\|L_n\| - 1) E_n(f).$
Hence, whenever (6.1) holds, we must have $c_n \ge \|L_n\| - 1$.
This means that to make $c_n$ small, we had better make $\|L_n\|$ small. We
are therefore led to the problem of constructing $L_n$ with the smallest possible
norm. This turns out to be a very difficult problem which is only solved in the
special cases $n = 0, 1$. Nevertheless, it is possible to construct operators $L_n$
which have close to the smallest possible norm. One of the simplest and most
important methods of doing this is to use polynomial interpolation.
If $X: x_0, \dots, x_n$ are $n+1$ points from the interval $I$ and $f \in C(I)$, then
there is a unique polynomial $P_n(f) := P_n(f,X)$ which interpolates $f$ at the points
in $X$. In fact, we have the Lagrange representation for $P_n$:
(6.3) $P_n(f,x) = \sum_{k=0}^n f(x_k) \, l_k(x); \quad l_k(x) := \frac{\prod_{j \neq k} (x - x_j)}{\prod_{j \neq k} (x_k - x_j)}.$
Then $P_n$ is a linear operator which is a projector from $C(I)$ onto $\Pi_n$. It is
simple to compute the norm of $P_n$:
(6.4) $\|P_n\| = \max_{x \in I} |\Lambda(x)|,$
where
(6.5) $\Lambda(x) := \Lambda(X,x) := \sum_{k=0}^n |l_k(x)|$
is called the Lebesgue function of $P_n$.
There is no simple description of interpolation points $X$ which will
minimize $\|P_n\|$; however, the work of Kilgore [K] and de Boor-Pinkus [B-P] gives
their uniqueness and some of their properties. The most obvious choice of
interpolation points is to space the $x_k$ equally in the interval $I$. But
disappointingly, the norms of the resulting projectors are then very large; in
fact they grow exponentially with $n$. A much better choice for interpolation
points $X$ is the zeros of the Chebyshev polynomial $C_n$ given in (5.3). In fact,
with this choice, $\|P_n\| \le (2/\pi) \log n + 1$, $n = 1, \dots$. Hence, this projector
has within constants the smallest possible norm. With this, we have
THEOREM 6.1. If $P_n$ is the projector corresponding to interpolation at the
zeros of the Chebyshev polynomial $C_n$, we have $\|P_n\| \le (2/\pi) \log n + 1$.
For any $f \in C(I)$, by (6.2),
(6.6) $\|f - P_n(f)\| \le [(2/\pi) \log n + 2] \, E_n(f), \quad n = 1, 2, \dots.$
For a proof of this theorem, we refer the reader to the book of Rivlin
[R, p.18] on Chebyshev polynomials.
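The contrast between equally spaced points and Chebyshev zeros shows up clearly when the Lebesgue function (6.5) is evaluated on a grid. The sketch below is an illustration under assumptions made here: $I = [-1,1]$, the maximum is estimated on a finite grid (so it can only underestimate the true norm), and the bound is stated in terms of the number $N$ of nodes as $(2/\pi) \log N + 1$.

```python
import numpy as np

def lebesgue_constant(nodes, num_eval=5001):
    """Grid estimate of the max over [-1, 1] of sum_k |l_k(x)|."""
    nodes = np.asarray(nodes, dtype=float)
    xs = np.linspace(-1.0, 1.0, num_eval)
    total = np.zeros_like(xs)
    for k, xk in enumerate(nodes):
        others = np.delete(nodes, k)
        # l_k(x) = prod_{j != k} (x - x_j) / (x_k - x_j), as in (6.3)
        lk = np.prod((xs[:, None] - others) / (xk - others), axis=1)
        total += np.abs(lk)
    return float(total.max())

def chebyshev_zeros(N):
    """The N zeros of C_N, as in (5.3)."""
    k = np.arange(1, N + 1)
    return np.cos((2 * k - 1) * np.pi / (2 * N))

N = 21
leb_equi = lebesgue_constant(np.linspace(-1.0, 1.0, N))
leb_cheb = lebesgue_constant(chebyshev_zeros(N))

print("equally spaced:", leb_equi)
print("Chebyshev zeros:", leb_cheb, "bound:", (2 / np.pi) * np.log(N) + 1)
```

With 21 nodes the equally spaced value is already in the thousands, while the Chebyshev value stays below three, in line with the logarithmic growth described above.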
For $n$ small, $\log n$ is not too large so that the approximation $P_n(f)$ is
comparable with the best approximation. On the other hand, there are functions
$f$ for which the right hand side does not tend to 0 and, even more to the point,
for which $P_n(f)$ does not converge to $f$. So in spite of the attractiveness of
polynomial interpolation, this type of approximation cannot even give a proof
of the Weierstrass theorem.
§7. Degree of approximation. We have yet to discuss the behavior of $E_n(f)$. We
expect that the nicer the function $f$, the faster $E_n(f)$ converges to zero.
One result in this direction is the following:
THEOREM 7.1. If $f$ is $r$ times continuously differentiable on $I = [-1,1]$, then
(7.1) $E_n(f) \le C_r \|f^{(r)}\| \, n^{-r}, \quad n = 1, 2, \dots.$
Thus for example, we know that $E_n(f)$ tends to zero at least as fast as $1/n$
whenever $f$ is continuously differentiable, $1/n^2$ when it is twice continuously
differentiable, and so on.
Estimates of the type given in Theorem 7.1 have a rich history. The first
results of this type were given at the beginning of this century by Bernstein
[Be1]. Later, Favard [F] found the best constant $C_r$. Jackson [J] then refined
(7.1) by using subtler measures of the smoothness of a function f such as its
modulus of continuity $\omega(f,t)$ defined for $f \in C(I)$ by
$\omega(f,t) := \sup_{|x-y| \le t;\ x,y \in I} |f(x)-f(y)|$.
THEOREM 7.2 (Jackson). Let r = 0,1,2,... . If f is r times continuously
differentiable, then
(7.2) $E_n(f) \le C_r\, n^{-r}\, \omega(f^{(r)}, n^{-1})$, n=1,2,... .
The continuity of f ensures that $\omega(f,t) \to 0$ as $t \to 0$, and therefore (7.2) with r=0
shows that $E_n(f) \to 0$ as $n \to \infty$. Hence (7.2) contains the Weierstrass theorem as well.
There are now several different techniques for proving Jackson's theorem.
One of the most important is to use the transformation $x = \cos\theta$ as described in
§3. If $f \in C(I)$, then $g(\theta) := f(\cos\theta)$ satisfies $\omega(g,t) \le \omega(f,t)$, and if f is r
times continuously differentiable, so is g. Using these ideas, Theorems 7.1 and
7.2 follow from their counterparts for trigonometric approximation.
To approximate a function $g \in C(T)$, we can use convolution operators.
Namely, if $K_n$ is a trigonometric polynomial of degree n, then
(7.3) $L_n(g,\theta) := g * K_n(\theta) := \frac{1}{2\pi} \int_{-\pi}^{\pi} g(\theta - t)\, K_n(t)\, dt$
is likewise a trigonometric polynomial of degree n. In order that $L_n$ preserve
constant functions, we shall require that
(7.4) $\frac{1}{2\pi} \int_{-\pi}^{\pi} K_n(t)\, dt = 1$.
It is also convenient to take $K_n$ non-negative and even.
If we want $L_n(f)$ to provide a good approximation to f, the kernel $K_n$
should concentrate its mass near the origin (similar to the delta function).
In fact, in order to prove Theorem 7.2 for r=0 or (7.1) for r=1 by using $L_n$,
it is enough to have
(7.5) $\int_{-\pi}^{\pi} \sin^2(t/2)\, K_n(t)\, dt \le \mathrm{const} \cdot n^{-2}$.
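Let us indicate how (7.5) gives a proof of these results. From (7.5) and
the Cauchy-Schwarz inequality for positive functionals, we find
(7.6) $\int_{-\pi}^{\pi} |t|\, K_n(t)\, dt \le \pi \int_{-\pi}^{\pi} |\sin(t/2)|\, K_n(t)\, dt \le \pi \sqrt{2\pi} \left[ \int_{-\pi}^{\pi} \sin^2(t/2)\, K_n(t)\, dt \right]^{1/2} \le C/n$,
where the first inequality uses $|t| \le \pi |\sin(t/2)|$ for $|t| \le \pi$.
Now if f is continuously differentiable and $M := \|f'\|$, then $|f(\theta-t) - f(\theta)| \le M|t|$. Since
$L_n(f(\theta),\theta) = f(\theta)$ (because $L_n$ preserves constants), we have
(7.7) $|L_n(f,\theta) - f(\theta)| \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(\theta-t) - f(\theta)|\, K_n(t)\, dt \le \frac{1}{2\pi} \int_{-\pi}^{\pi} M|t|\, K_n(t)\, dt \le CM/n$,
which is (7.1) when r=1.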
In this same way, we can also prove Theorem 7.2. This requires the
inequality $\omega(f,t) \le (nt+1)\,\omega(f,1/n)$, t>0, which follows from the
subadditivity of $\omega$: $\omega(f,t_1+t_2) \le \omega(f,t_1) + \omega(f,t_2)$. Using this and the
inequality $|f(\theta-t) - f(\theta)| \le \omega(f,|t|)$ as in (7.7) gives Theorem 7.2 because of
(7.4) and (7.6).
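The inequality $\omega(f,t) \le (nt+1)\,\omega(f,1/n)$ can be illustrated on a grid. The sketch below is an assumption-laden illustration rather than part of the notes: it replaces the supremum by a maximum over grid points $x_i = (2i-N)/N$ of [-1,1], and it assumes N is divisible by 2n so that 1/n is a whole number of grid steps. The names `grid_modulus` and `check_jackson_inequality` are hypothetical.

```python
import math

def grid_modulus(f, N):
    """Return w with w[k] = max |f(x_i) - f(x_j)| over grid points
    x_i = (2i - N)/N of [-1, 1] with |i - j| <= k, a discretized
    omega(f, t) at t = 2k/N."""
    values = [f((2 * i - N) / N) for i in range(N + 1)]
    w = [0.0] * (N + 1)
    for k in range(1, N + 1):
        lag_max = max(abs(values[i + k] - values[i]) for i in range(N + 1 - k))
        w[k] = max(w[k - 1], lag_max)  # omega is nondecreasing in t
    return w

def check_jackson_inequality(w, N, n):
    """Check w(t) <= (n t + 1) w(1/n) at every grid value t = 2k/N.
    Assumes N is divisible by 2n, so that 1/n equals m grid steps."""
    m = N // (2 * n)
    return all(w[k] <= (k / m + 1.0) * w[m] + 1e-12 for k in range(1, N + 1))

if __name__ == "__main__":
    w = grid_modulus(lambda x: math.sqrt(abs(x)), 400)
    print(w[20], check_jackson_inequality(w, 400, 10))
```

For $f(x) = \sqrt{|x|}$ the grid value at t = 0.1 is attained by the pair (0, 0.1), which is why the singular point was placed on the grid.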
There are many choices of kernel $K_n$ which satisfy (7.5). Indeed, since
$\sin^2(t/2) = \frac{1}{2}(1 - \cos t)$, (7.5) can be restated as a condition on the first
Fourier coefficient $\hat K_n(1) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-it} K_n(t)\, dt$ of $K_n$, namely:
(7.8) $1 - \hat K_n(1) \le C\, n^{-2}$.
Thus any positive trigonometric kernel $K_n$ which has integral one and
satisfies (7.8) has the desired properties.
One of the simplest examples of such a kernel was given by Jackson:
$K_n(t) = \lambda_n \left( \frac{\sin(mt/2)}{\sin(t/2)} \right)^4$, $m := [n/2]+1$, n=1,2,...,
with $\lambda_n$ chosen so that $K_n$ satisfies (7.4). One easily shows that $\lambda_n \sim n^{-3}$ and
then deduces (7.5) (see [L, p.55]).
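A short numerical sketch of the Jackson kernel follows. It is an illustration under stated assumptions, not the author's computation: it takes $m = [n/2]+1$ and the normalization $\frac{1}{2\pi}\int K_n = 1$, and it integrates with an equally spaced rule shifted off t = 0, which is exact for trigonometric polynomials of degree below the number of nodes. The function names are hypothetical.

```python
import math

def jackson_kernel(n):
    """Return (m, lam, nodes, K) for K(t) = lam * (sin(m t/2) / sin(t/2))**4,
    with m = n//2 + 1 and lam chosen so that (1/(2 pi)) * integral of K is 1."""
    m = n // 2 + 1
    N = 8 * m  # even, so no node falls on t = 0 where the quotient is 0/0
    nodes = [-math.pi + 2.0 * math.pi * (j + 0.5) / N for j in range(N)]

    def raw(t):
        return (math.sin(m * t / 2.0) / math.sin(t / 2.0)) ** 4

    mean_raw = sum(raw(t) for t in nodes) / N  # equals (1/(2 pi)) * integral of raw
    lam = 1.0 / mean_raw
    return m, lam, nodes, (lambda t: lam * raw(t))

def fourier_gap(n):
    """Compute 1 - K_hat(1) = (1/(2 pi)) * integral of (1 - cos t) K(t) dt."""
    m, lam, nodes, K = jackson_kernel(n)
    return sum((1.0 - math.cos(t)) * K(t) for t in nodes) / len(nodes)

if __name__ == "__main__":
    for n in (4, 10, 40, 160):
        print(n, n * n * fourier_gap(n))  # stays bounded, as (7.8) requires
```

Under these assumptions the gap works out to $3/(2m^2+1)$, so $n^2(1 - \hat K_n(1))$ stays below 6.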
8. Piecewise polynomial approximation. Polynomials are not usually the best
choice for applied problems or numerical computation. For one thing it is not
easy to evaluate a polynomial of high degree. Piecewise polynomial functions
of low degree are much more desirable in computation. Indeed, it is the case
that most numerical algorithms are based in one sense or another on some form
of piecewise polynomial approximation.
We consider once again the interval I:=[-1,1] and let
$T: -1 = t_0 < t_1 < \dots < t_n = 1$ be an increasing sequence of points from I. Then T
partitions I into n intervals $I_j := [t_{j-1}, t_j)$, j=1,...,n-1, $I_n := [t_{n-1}, t_n]$. By
$\mathcal{S}_r(T)$, we denote the piecewise polynomial functions of degree r-1 on T. That
is, $S \in \mathcal{S}_r(T)$ means that $S(x) = P_j(x)$, $x \in I_j$, with $P_j$ a polynomial of degree < r
for j=1,...,n. Sometimes we are only interested in functions from $\mathcal{S}_r(T)$ which
have some prescribed continuity at the points $t_j$. These are called spline
functions. For example, the continuously differentiable functions in $\mathcal{S}_4(T)$ are
called $C^1$ cubic splines; those that are twice continuously differentiable are called $C^2$
cubic splines, etc. The simplest spline functions are the truncated power
functions $(x-c)_+^k$, k=0,1,..., with
$x_+^k := 0$ for $x < 0$ and $x_+^k := x^k$ for $x \ge 0$.
To describe the error in approximation by piecewise polynomials it is
enough to consider how the error in polynomial approximation on a subinterval
J:=[a,b] of I depends on the length of J. For this, we let
(8.1) $E_r(f,J) := \inf_{P \in \Pi_{r-1}} \|f - P\|(J)$,
with the norm being the sup norm (on the interval J), as usual.
Now if f has r continuous derivatives we can form the Taylor polynomial
$T_a$ of f at a:
$T_a(x) := f(a) + f'(a)(x-a) + \dots + f^{(r-1)}(a)(x-a)^{r-1}/(r-1)!$.
We have the well known formula for the error:
(8.2) $f(x) - T_a(x) = \frac{1}{(r-1)!} \int_a^b f^{(r)}(t)\, (x-t)_+^{r-1}\, dt$.
If in the integral on the right side of (8.2), we replace $f^{(r)}$ by $\|f^{(r)}\|$ and then integrate, we see that
(8.3) $E_r(f,J) \le \frac{1}{r!}\, \|f^{(r)}\|(J)\, |J|^r$,
with |J| the length of J. Thus, as the length of J tends to zero the error of
approximation $E_r(f,J)$ goes to zero like $|J|^r$.
The simple inequality (8.3) already tells us a lot about approximation by
the elements of $\mathcal{S}_r(T)$. Namely, if
$\delta_T := \max_{1 \le j \le n} |I_j|$,
then we have
THEOREM 8.1. If f has r continuous derivatives on I, there is a piecewise
polynomial $S \in \mathcal{S}_r(T)$ such that
(8.4) $\|f - S\|(I) \le \frac{1}{r!}\, \|f^{(r)}\|(I)\, \delta_T^r$.
Indeed, we can define S on the interval $I_j$ to be the Taylor polynomial of f
for the left end point of $I_j$, so that (8.4) follows from (8.3).
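The construction in this proof is easy to try out. The following sketch is an illustration only: it uses f = sin (chosen here because all its derivatives are bounded by 1 and available in closed form), builds the piecewise Taylor polynomial at left end points, and compares the sampled error with the bound $\delta_T^r / r!$ from (8.4). The function names and the sampling grid are assumptions.

```python
import bisect
import math

def sin_derivative(k, x):
    """k-th derivative of sin at x (each derivative shifts the phase by pi/2)."""
    return math.sin(x + k * math.pi / 2.0)

def piecewise_taylor(knots, r):
    """Return S, where S(x) is the Taylor polynomial of sin of degree r-1
    expanded at the left end point of the interval of `knots` containing x."""
    def S(x):
        # index of the interval [knots[j], knots[j+1]); the last one is closed
        j = min(bisect.bisect_right(knots, x) - 1, len(knots) - 2)
        a = knots[j]
        return sum(sin_derivative(k, a) * (x - a) ** k / math.factorial(k)
                   for k in range(r))
    return S

def max_error(knots, r, samples=2001):
    """Sampled sup-norm error of the piecewise Taylor approximation on [-1, 1]."""
    S = piecewise_taylor(knots, r)
    xs = (-1.0 + 2.0 * i / (samples - 1) for i in range(samples))
    return max(abs(math.sin(x) - S(x)) for x in xs)

if __name__ == "__main__":
    r = 3
    for n in (4, 8, 16):
        knots = [-1.0 + 2.0 * j / n for j in range(n + 1)]
        delta = 2.0 / n
        print(n, max_error(knots, r), delta ** r / math.factorial(r))
```

Halving $\delta_T$ reduces the bound by the factor $2^r$, which is the rate stated in Theorem 8.1.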
Sometimes it is useful not to assume that $f^{(r)}$ is continuous but only that $f^{(r-1)}$ is absolutely continuous and $f^{(r)}$ is in $L_p$ for some $1 \le p \le \infty$. In
this case, if we apply Hölder's inequality to the integral in (8.2), we find
(8.5) $E_r(f,J) \le \frac{1}{(r-1)!}\, \|f^{(r)}\|_p(J)\, |J|^{r-1/p}$.
Hence, there is a spline $S \in \mathcal{S}_r(T)$ which satisfies
(8.6) $\|f - S\|(I) \le \frac{1}{(r-1)!}\, \|f^{(r)}\|_p(I)\, \delta_T^{r-1/p}$.
There are a variety of other estimates (see [S]) for the error in spline
approximation. For example, $E_r(f,J)$ can be estimated by const. $\omega(f,\delta_T)$ or
by const. $\|f^{(k)}\|\, \delta_T^k$, for any 0<k<r. Remarkably, it is also possible to
prove these same estimates for approximation by spline functions which have
smoothness. For example, we have
THEOREM 8.2. If f has r continuous derivatives on I, there is a spline
function $S \in \mathcal{S}_r(T)$ which has r-2 continuous derivatives and satisfies
(8.7) $\|f - S\|(I) \le C\, \|f^{(r)}\|(I)\, \delta_T^r$,
with C depending only on r.
There are several techniques for proving estimates like (8.7). When r=1,
(8.7) follows from (8.5) since there is no continuity prescribed. When r=2,
the case of approximation by piecewise linear functions, we can take S as the
continuous piecewise linear function in $\mathcal{S}_2(T)$ which interpolates f at each of
the $t_i$, i=0,...,n. Interpolating splines can also be used for other small
values of r but the question of where to place these points gets more and more
sticky as r increases and is still not solved for general r.
A more successful method to prove (8.7) was introduced by de Boor and Fix
[B-F]. It uses certain linear operators $L_T$ called quasi-interpolants. $L_T$ is
a projection from C(I) onto $\mathcal{S}_r(T) \cap C^{(r-2)}$. While $L_T(f)$ does not interpolate
f in the usual sense, it uses only a finite number of values of f (hence the
name quasi-interpolant). Quite surprisingly, the norms of the projectors $L_T$
are bounded independent of T. This is in stark contrast to polynomials where,
as explained in §6, the norms of projectors onto $\Pi_n$ must tend to infinity with
increasing n. Using the fact that the $L_T$ are bounded, one proves (8.7) (see
Unfortunately, we do not have time to describe these powerful
approximation methods in more detail but certainly they will be brought up in
other lectures. The reader should also consult the books of de Boor [B] and
Schumaker [S].
9. Non-linear approximation. Up to this point, we have only discussed
approximation by elements from a linear space (polynomials or trigonometric
polynomials). But many other families of functions used in approximation are
not linear spaces. For example, we have the set $R_n$ of rational functions R
of degree ≤ n or the set $\Sigma_n := \Sigma_{n,k}$ of all piecewise polynomials of degree k
which have n pieces.
Approximation by such non-linear families can sometimes give dramatic
improvement in the error of approximation. For example, approximation to the
function f(x)=|x| was extensively studied by S. Bernstein [Be2] because it is
prototypical for polynomial approximation to one time differentiable
functions. Bernstein showed that for the error $E_n(f)$ of polynomial
approximation, $\lim_{n \to \infty} n\, E_n(f)$ exists. Hence $E_n(f)$ behaves like
const./n as $n \to \infty$. It was therefore a great surprise when D.J. Newman [N]
18 RONALD A. DEVORE
favor in so-called adaptive methods which find a partition of I into intervals
$I_j$ by subdividing. For example, the adaptive analogue of (9.1-2) would proceed
as follows. We would choose some tolerance $\varepsilon > 0$ which is the error we are
willing to accept in the approximation. We call an interval J "good" if
$\mathrm{Int}(f',J) \le \varepsilon$. Otherwise, J is "bad". We are looking to generate a set $G_\varepsilon$ of
disjoint good intervals which are a partition of I. If I itself is good, we
can simply take $G_\varepsilon = \{I\}$. On the other hand, if I is bad, we divide I in half,
producing two new intervals. Whichever of these two intervals is
good, we put into our set $G_\varepsilon$. The bad intervals are further subdivided. We
continue in this way and the whole process stops when there are no more bad
intervals.
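The subdivision procedure just described can be sketched in a few lines. This is a hedged illustration: the quantity Int(f',J) is not defined in this excerpt, and the sketch assumes it stands for $\int_J |f'(t)|\,dt$. The test function $f(x) = \sqrt{x+1}$ is also an assumption, chosen because it is increasing (so the integral is just f(b) - f(a)) and has a singular derivative at the left end point.

```python
import math

def adaptive_partition(cost, a, b, eps):
    """Bisect [a, b] until every interval J satisfies cost(J) <= eps.
    Returns the good intervals sorted from left to right."""
    good, stack = [], [(a, b)]
    while stack:
        lo, hi = stack.pop()
        if cost(lo, hi) <= eps:
            good.append((lo, hi))      # good interval: keep it
        else:
            mid = 0.5 * (lo + hi)      # bad interval: divide in half
            stack.append((mid, hi))
            stack.append((lo, mid))
    return sorted(good)

def f(x):
    """Hypothetical test function with f' unbounded at x = -1."""
    return math.sqrt(x + 1.0)

def variation(lo, hi):
    """Assumed stand-in for Int(f', J): the integral of |f'| over [lo, hi].
    Since f is increasing this equals f(hi) - f(lo)."""
    return f(hi) - f(lo)

if __name__ == "__main__":
    for eps in (0.2, 0.1, 0.05):
        G = adaptive_partition(variation, -1.0, 1.0, eps)
        print(eps, len(G), min(hi - lo for lo, hi in G))
```

The intervals cluster near x = -1, and the piecewise constant function equal to f(lo) on each good interval is within eps of f there, since f is monotone.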
Of interest to us is how many good intervals will appear in the set $G_\varepsilon$.
Birman and Solomjak [B-S] have shown that when f' is in $L_p$ for some p>1 then
$G_\varepsilon$ contains no more than $C_0/\varepsilon$ intervals with $C_0$ depending only on p. Thus for
example, if we take $\varepsilon = C_0/n$, then $G_\varepsilon$ will contain at most n intervals $I_1,\dots,I_n$.
If we then take S as in (9.2), (9.3) will again be satisfied. This means
that by assuming slightly more about the function f, we can approximate f by
the above adaptive scheme to the same accuracy as with the optimal knot
approximation. There are a variety of other results on adaptive approximation
which show that functions with singularities can be approximated better in this
way than by linear methods of approximation (see for example [B-R]).
The piecewise polynomials used in the above approximation are not smooth.
While it is possible to modify these methods so that the resulting piecewise
polynomial has smoothness $C^{(r-1)}$ (in the case the piecewise polynomials have
degree r), it is sometimes of interest to approximate f by smoother functions.
It turns out that the same accuracy of approximation is attainable with
rational functions. For example, we have the following result of Popov [P]:
THEOREM 9.1. If $f' \in L_p(I)$, p>1, there is a rational function R of degree at
most n which satisfies
(9.4) $\|f - R\| \le C\, \|f'\|_p\, n^{-1}$.
There is a simple technique [D2] for deriving (9.4) from (9.3). For this,
we can assume that $\|f'\|_p = 1$. Then, as in the derivation of (9.3), we
choose n intervals $I_j$, such that
(9.5) $\int_{I_j} |f'|^p \le 1/n$.
By refining these intervals if necessary, we can further require that $|I_j| \le 1/n$
and still there are at most 2n of these intervals. We let $\xi_j$ be a point in
$I_j$ and define
$\psi_j(x) := \frac{|I_j|^2}{(x - \xi_j)^2 + |I_j|^2}$.
If $\Psi := \sum_j \psi_j$, then the functions
$\phi_j := \psi_j / \Psi$, j=1,...,n
are a partition of unity:
$\sum_j \phi_j(x) = 1$, $x \in I$.
It can be shown [D2] that for suitably chosen $\xi_j$, the rational function
$R := \sum_j f(\xi_j)\, \phi_j$
has degree at most 4n and satisfies (9.4).
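The rational partition of unity can be written down directly. The sketch below is illustrative only and rests on two assumptions: the bump functions are read as $\psi_j(x) = |I_j|^2 / ((x-\xi_j)^2 + |I_j|^2)$, and the points $\xi_j$ are taken to be interval midpoints, which is a convenience and not the "suitable choice" referred to in [D2]. It checks the partition-of-unity property, not the error bound (9.4).

```python
def rational_blend(intervals, f):
    """Build phi_j = psi_j / Psi and R = sum_j f(xi_j) phi_j for the given intervals.
    xi_j is taken to be the midpoint of I_j (an assumption made for illustration)."""
    centers = [0.5 * (a + b) for a, b in intervals]
    lengths = [b - a for a, b in intervals]

    def psi(j, x):
        return lengths[j] ** 2 / ((x - centers[j]) ** 2 + lengths[j] ** 2)

    def phi(j, x):
        total = sum(psi(i, x) for i in range(len(intervals)))
        return psi(j, x) / total

    def R(x):
        return sum(f(c) * phi(j, x) for j, c in enumerate(centers))

    return phi, R

if __name__ == "__main__":
    n = 8
    intervals = [(-1.0 + 2.0 * j / n, -1.0 + 2.0 * (j + 1) / n) for j in range(n)]
    phi, R = rational_blend(intervals, abs)
    for x in (-1.0, -0.3, 0.0, 0.45, 1.0):
        print(x, sum(phi(j, x) for j in range(n)), R(x))
```

Because the $\phi_j$ are non-negative and sum to one, R(x) is always a convex combination of the sampled values $f(\xi_j)$.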
BIBLIOGRAPHY
[Be1] S. Bernstein, Sur l'ordre de la meilleure approximation des fonctions
continues par des polynômes de degré donné, Mémoires publiés par la classe des
sci. Acad. de Belgique (2) 4 (1912), 1-103.
[Be2] S. Bernstein, Sur la meilleure approximation de |x| par des polynômes de
degrés donnés, Acta Math. 37 (1914), 1-57.
[B-S] M.S. Birman- M. Solomjak, Piecewise polynomial approximation of
functions of the class Wa, Math. USSR, Vol.2 No. 3(1967), 295-317.
[B] C. de Boor, A Practical Guide to Splines, Applied Math. Science,
Springer-Verlag vol. 27, New York, 1978
[B-F] C. de Boor- G. Fix, Spline approximation by quasi-interpolants, J.
Approx. Th. 8(1973), 19-45.
[B-P] C. de Boor - A. Pinkus, Proof of the conjecture of Bernstein and Erdős
concerning the optimal nodes for polynomial interpolation, J. Approx. Theory,
24(1978), 289-303.
[B-R] C. de Boor - J. Rice, An adaptive algorithm for multivariate
approximation giving optimal convergence rates, JAT 25(1979), 337-359.
[D1] R. DeVore, Degree of approximation, in: Approximation II, Academic
Press, New York, 1976, pp. 117-161.
[D2] R. DeVore, Maximal functions and their application to rational
approximation, in Approximation Theory, CMS Conference Proceedings Vol. 3,
1983, Amer. Math. Soc, p. 143-155.
[F] J. Favard, Sur les meilleurs procédés d'approximation de certaines
classes de fonctions par des polynômes trigonométriques, Bull. Sci. Math.
61 (1937), 209-224, 243-256.
[J] D. Jackson, On the approximation by trigonometric sums and polynomials,
TAMS, 13(1912), 491-515.
[K-S] S. Karlin - W. Studden, Tchebycheff Systems: With Applications in
Analysis and Statistics, Interscience, Wiley, New York, 1966.
[Kr] M.G. Krein, The ideas of P.L. Chebyshev and A.A. Markov in the theory of
limiting values of integrals and their further developments, Amer. Math Soc.
Translations, Ser. 2, 12, 1-122.
[Ki] T.A. Kilgore, A characterization of Lagrange interpolating projections
with minimal Tchebycheff norm, J. Approx. Theory 24(1978), 273-288.
[L] G.G Lorentz, Approximation of Functions, Holt, Rinehart and Winston, New
York, 1966.
[N] D.J. Newman, Rational approximation of |x|. Michigan Math. J. 11(1964),
11-14.
[P] V. Popov, Uniform rational approximation of the class $V_r$ and its
applications, Acta Math. Acad. Sci. Hung. 29 (1977), 119-129.
[R] T. J. Rivlin, The Chebyshev Polynomials, Interscience, Wiley, New York,
1974.
[S] L. Schumaker, Spline Functions: Basic Theory, Interscience, Wiley, New
York, 1981.
[W] K. Weierstrass, Über die analytische Darstellbarkeit sogenannter
willkürlicher Functionen reeller Argumente, Sitzungsberichte der Acad. Berlin
(1885), 633-639, 789-805.
Proceedings of Symposia in Applied Mathematics Volume 36, 1986
POLYNOMIAL AND RATIONAL APPROXIMATION IN THE COMPLEX DOMAIN
E. B. SAFF¹
ABSTRACT. Approximation theory in the complex variable setting has its roots in classical function theory, but is rich in modern applications. Moreover, it is a subject that lends much insight into real approximation problems. Starting with the example of Taylor series, we describe methods (such as Faber series and interpolation) for generating good polynomial approximants to a function analytic on a compact set in the plane. We also discuss characterizations for polynomials of best uniform approximation and the "near circularity property." An introduction is given to the theory of Padé approximants, which are rational function analogues of the Taylor sections. We conclude by discussing some contrasts between the theories of polynomial and rational approximation.
1. TAYLOR SECTIONS.
The properties of the Taylor sections for an analytic function are a
convenient starting point for approximation and interpolation in the complex
z-plane (denoted by $\mathbb{C}$). This is because Taylor sections are least squares
polynomial approximants as well as interpolating polynomials. Indeed, if
f is analytic at z = 0, then the Taylor sections
(1.1) $s_n(z) = s_n(f;z) := \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!}\, z^k$
satisfy the interpolation conditions
(1.2) $s_n^{(j)}(0) = f^{(j)}(0)$, j = 0,1,...,n.
Moreover, the polynomials $1, z, z^2, \dots$ are orthogonal with respect to the inner
product
(1.3) $(g,h) := \frac{1}{2\pi r} \int_{C_r} g(z)\, \overline{h(z)}\, |dz|$, $C_r : |z| = r$,
and, if f is analytic on $|z| \le r$,
1980 Mathematics Subject Classification 41A10, 41A20, 41A21. ¹Supported by the National Science Foundation.
© 1986 American Mathematical Society 0160-7634/86 $1.00 + $.25 per page
http://dx.doi.org/10.1090/psapm/036/864364
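(1.4) $(f, z^k) = \frac{1}{2\pi r} \int_{C_r} f(z)\, \overline{z^k}\, |dz| = \frac{1}{2\pi r} \int_{C_r} f(z)\, \frac{r^{2k}}{z^k}\, \frac{r\, dz}{i z} = r^{2k}\, \frac{1}{2\pi i} \int_{C_r} \frac{f(z)}{z^{k+1}}\, dz = r^{2k}\, \frac{f^{(k)}(0)}{k!}$.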
Thus, the least squares (best $L_2$) polynomial approximation to f out of $\Pi_n$ on
the circle $C_r$ is
(1.5) $\sum_{k=0}^{n} \frac{(f,z^k)}{(z^k,z^k)}\, z^k = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!}\, z^k = s_n(z)$.
Here and below, $\Pi_n$ denotes the collection of all algebraic polynomials (with
complex coefficients) of degree at most n.
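The identification of least squares coefficients with Taylor coefficients can be verified numerically. The sketch below is an illustration under assumptions: it takes the inner product to be normalized as $\frac{1}{2\pi r}\int_{C_r} g\,\overline{h}\,|dz|$, approximates it by an equally spaced rule on the circle, and uses f = exp with r = 1/2 as a hypothetical test case.

```python
import cmath
import math

def inner(g, h, r=0.5, N=64):
    """(g, h) = (1/(2 pi r)) * integral over |z| = r of g(z) conj(h(z)) |dz|.
    Since |dz| = r d(theta), this is the mean of g * conj(h) over the circle."""
    total = 0j
    for j in range(N):
        z = r * cmath.exp(2j * math.pi * j / N)
        total += g(z) * h(z).conjugate()
    return total / N

def monomial(k):
    """The function z -> z**k."""
    return lambda z: z ** k

def least_squares_coefficient(f, k, r=0.5, N=64):
    """Coefficient (f, z^k) / (z^k, z^k) of z^k in the least squares approximant."""
    zk = monomial(k)
    return inner(f, zk, r, N) / inner(zk, zk, r, N)

if __name__ == "__main__":
    for k in range(6):
        print(k, least_squares_coefficient(cmath.exp, k), 1.0 / math.factorial(k))
```

For the exponential function the computed coefficients agree with 1/k! to rounding error, independently of the radius r used.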
Another significant property of Taylor sections is that they provide
minimal projections onto $\Pi_n$ with respect to the sup norm.
Definition 1.1. Let $A(\Delta)$ denote the collection of functions f that are
analytic in the open disk |z| < 1 and continuous on the closed disk
$\Delta : |z| \le 1$. A projection $P : A(\Delta) \to \Pi_n$ is a bounded linear operator such
that $P^2 = P$ and $P = I$ on $\Pi_n$.
Endowing $A(\Delta)$ and $\Pi_n$ with the sup norm
(1.6) $\|f\| := \sup \{ |f(z)| : z \in \Delta \}$,
we expect to find "near best" polynomial approximants to f on $\Delta$ by utilizing
a projection with smallest possible norm. It was shown by Geddes and Mason [21]
that this minimal projection is the Taylor projection:
(1.7) $(S_n f)(z) := s_n(f;z) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!}\, z^k$.
Namely, they proved
Theorem 1.2. Let P be any projection of the space $A(\Delta)$ onto the subspace
$\Pi_n$. Then, for the operator norm induced by the sup norm over $\Delta$, we have
(1.8) $\|S_n\| \le \|P\|$,
where $S_n$ is the Taylor projection of (1.7).
The proof of this theorem follows from the clever observation that for
any projection P
(1.9) $(S_n f)(z) = \frac{1}{2\pi i} \int_{|t|=1} (A_{\bar t}\, P\, A_t f)(z)\, \frac{dt}{t}$,
where $A_t$ is the shift operator defined by $(A_t f)(z) := f(tz)$.
What can be said about the rate of convergence of the Taylor
sections? The answer is intimately related to the familiar Cauchy-Hadamard
formula for the radius of convergence $\rho$ of a power series $\sum_{k=0}^{\infty} c_k z^k$.
That is,
(1.10) $\frac{1}{\rho} = \limsup_{k \to \infty} |c_k|^{1/k}$.
The basic convergence result is the following.
Theorem 1.3. Let f be analytic in an open set that contains the closed unit
disk $\Delta$. Then for the sup norm (1.6), the Taylor sections $s_n$ satisfy
(1.11) $\limsup_{n \to \infty} \|f - s_n\|^{1/n} = 1/\rho < 1$,
where $\rho$ is the radius of the largest open disk centered at the origin
throughout which f has a single-valued analytic continuation. Moreover, the
sequence $s_n$ converges to f for $|z| < \rho$.
The above theorem, which provides a model for more general results to
be mentioned later, nicely illustrates the relationship between the degree of
convergence and the maximal circular region of analyticity for f; that is, the
larger this circular region, the faster the convergence. In particular, for
entire functions f
(1.12) $\lim_{n \to \infty} \|f - s_n\|^{1/n} = 0$.
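A small numerical illustration of the n-th root rate in Theorem 1.3 follows. It is a sketch under stated assumptions: the test function f(z) = 1/(2 - z) is chosen here (it is not in the notes) because its only singularity is at z = 2, so $\rho = 2$, and the sup norm over the closed unit disk is estimated by sampling the unit circle, which suffices by the maximum principle.

```python
import cmath
import math

def taylor_section(n, a=2.0):
    """s_n for f(z) = 1/(a - z), whose Taylor coefficients are 1/a**(k+1)."""
    def s(z):
        return sum(z ** k / a ** (k + 1) for k in range(n + 1))
    return s

def sup_error_on_unit_circle(n, a=2.0, samples=360):
    """Sampled sup norm of f - s_n on |z| = 1."""
    s = taylor_section(n, a)
    worst = 0.0
    for j in range(samples):
        z = cmath.exp(2j * math.pi * j / samples)
        worst = max(worst, abs(1.0 / (a - z) - s(z)))
    return worst

if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        err = sup_error_on_unit_circle(n)
        print(n, err, err ** (1.0 / n))  # the last column approaches 1/2
```

For this f the error is largest at z = 1, where it equals $2^{-(n+1)}$, so the n-th roots tend to 1/2 = 1/ρ.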
While the proof of Theorem 1.3 can be deduced via (1.10), it is more
instructive to give an argument based on the interpolation property (1.2) of
Taylor sections. For this purpose we appeal to the Hermite representation
(cf. Walsh [62, §3.1]) for interpolating polynomials.
Lemma 1.4. Suppose f is analytic inside and on the simple closed contour $\Gamma$
that surrounds the n + 1 points $z_0, z_1, \dots, z_n$. If p is the unique
polynomial in $\Pi_n$ that interpolates f in these points, then
(1.13) $f(z) - p(z) = \frac{1}{2\pi i} \int_\Gamma \frac{w(z)\, f(t)}{w(t)\,(t - z)}\, dt$, z inside $\Gamma$,
where $w(z) := \prod_{k=0}^{n} (z - z_k)$.
Proof. Replacing f(z) by its Cauchy integral representation
$f(z) = \frac{1}{2\pi i} \int_\Gamma \frac{f(t)}{t - z}\, dt$, z inside $\Gamma$,
equation (1.13) becomes
(1.14) $p(z) = \frac{1}{2\pi i} \int_\Gamma \frac{w(t) - w(z)}{t - z} \cdot \frac{f(t)}{w(t)}\, dt$.
From (1.14) we see that p is indeed a polynomial in $\Pi_n$ and from (1.13) that it interpolates f in the points $z_k$ (the zeros of w(z)). □
It is important to keep in mind that (1.13) is valid even when the points $z_k$ are not distinct; in such a case interpolation is meant in the Hermite sense. That is, if $z_k$ is repeated $\ell$ times, then
$p^{(j)}(z_k) = f^{(j)}(z_k)$ for j=0,1,...,$\ell$-1. In particular, since the Taylor section $s_n$ interpolates in the origin of multiplicity n + 1, equation (1.13) gives
(1.15) $f(z) - s_n(z) = \frac{1}{2\pi i} \int_{|t|=r} \frac{z^{n+1} f(t)}{t^{n+1}(t - z)}\, dt$, |z| < r,
for any r such that f is analytic on $|z| \le r$. With the assumptions of Theorem 1.3, we deduce from (1.15) that
(1.16) $\limsup_{n \to \infty} \|f - s_n\|^{1/n} \le 1/\rho$,
and that the sequence $s_n$ converges to f in $|z| < \rho$. If strict inequality holds in (1.16), then
$\limsup_{n \to \infty} |f^{(n)}(0)/n!|^{1/n} = \limsup_{n \to \infty} \|s_n - s_{n-1}\|^{1/n} < 1/\rho$,
which implies that the sequence $s_n$ (the Taylor expansion for f) converges to an analytic function in some disk |z| < R, with R > $\rho$. As this violates the definition of $\rho$, the equality of (1.11) follows.
As is often the case where n-th root asymptotics are concerned,
results that hold for best $L_2$ polynomial approximants (such as the Taylor sections) are also valid for best $L_p$, $1 \le p \le \infty$, approximants. For example, Theorem 1.3 holds if the sections $s_n$ are replaced by the polynomials $p_n^*$ of best uniform approximation to f on $\Delta$.
Many of the elegant properties of Taylor sections can be found in the book of Dienes [14]. We mention only one more fact concerning the behavior of the zeros of Taylor sections for the case when the radius of convergence $\rho$ is finite and positive. Namely, Jentzsch proved [14, p.352] that every point of the circle of convergence $|z| = \rho$ is a limit point of the set of zeros of the
sequence $\{s_n\}_1^\infty$. The zeros of the partial sums $s_n(z) = (z^{n+1} - 1)/(z - 1)$ of f(z) = 1/(1 - z) provide a simple illustration of this theorem.
2. POLYNOMIAL APPROXIMATIONS FOR FUNCTIONS ANALYTIC ON E.
Given a compact set E in the z-plane and a function f analytic on
E (i.e., f is analytic on an open set $G \supset E$), how do we generate good
polynomial approximations to f on E? When E is a closed disk, we can use
Taylor sections which are "good" in the sense of Theorem 1.3. For general sets
E we need a procedure that likewise reflects the geometry of E.
First, we insist that E does not separate the plane; that is, $\mathbb{C} \setminus E$
is connected. This assumption is necessary if we expect to get uniform
convergence (of polynomials) to an arbitrary function analytic on E. For
example, the function f(z) = 1/z is analytic on the circle E : |z| = 1, but
is not the uniform limit on E of any sequence of polynomials because (by the
maximum principle) uniform convergence on |z| = 1 implies convergence to an
analytic function throughout |z| < 1.
The connectedness of $\mathbb{C} \setminus E$ is also a sufficient condition for
polynomial approximation to functions analytic on E as is stated in the
following version of the classical Runge's theorem (cf. [62, §1.10]).
Theorem 2.1. If f is analytic on a compact set E that does not separate the
plane, then there exists a sequence of polynomials that converges uniformly to
f on E.
(The question of polynomial approximation to functions not analytic
on E is much more delicate and will be addressed in the next section.)
To prove Theorem 2.1, Runge's approach was to first form Riemann sum
approximations to the Cauchy integral representation for f. These Riemann sums
are rational functions whose poles lie outside E. Through a process of "pole
moving," the rational approximants are converted to polynomial approximants.
For reasonable sets E, we can generate polynomial approximants
more directly by constructing an analogue of Taylor series. This was the
fruitful approach taken by Faber [15]. To simplify the description of Faber's
method we assume that E is a compact set (not a single point) whose complement
$\mathbb{C}^* \setminus E$ with respect to the extended plane is simply connected. The Riemann
mapping theorem asserts that there exists a conformal mapping $w = \phi(z)$ of
$\mathbb{C}^* \setminus E$ onto the exterior of the unit circle in the w-plane (see Figure 2.1). We
can insist that $\phi(\infty) = \infty$ and $\phi'(\infty) > 0$ so that, in a neighborhood of
infinity,
(2.1) $\phi(z) = \frac{z}{c} + b_0 + \frac{b_1}{z} + \frac{b_2}{z^2} + \cdots$, c > 0.
[Figure 2.1: the z-plane and the w-plane]
The polynomial basis $\{w^n\}_0^\infty$ for Taylor expansions in the w-plane now corresponds to the functions $\{\phi(z)^n\}_0^\infty$ in the z-plane. The obvious fly in the ointment is that the latter functions are not (in general) polynomials. However, $\phi(z)^n$ does have a polynomial part that will serve our purpose. Indeed, from (2.1), we get
(2.2) $\phi(z)^n = \left( \frac{z^n}{c^n} + \cdots \right) + M_n(z) = F_n(z) + M_n(z)$,
where $F_n(z) = z^n/c^n + \cdots \in \Pi_n$ and $M_n(z)$ is analytic at infinity. We call $F_n$ the n-th degree Faber polynomial for the set E, but caution the reader that many authors reserve this terminology for its monic brother $c^n F_n(z)$.
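For a function f analytic on E, our goal is to obtain an expansion of the form
(2.3) $f(z) = a_0 F_0(z) + a_1 F_1(z) + a_2 F_2(z) + \cdots$.
For this purpose, it is convenient to introduce the inverse mapping of $\phi$, denoted by $z = \psi(w)$, and the level curves
(2.4) $\Gamma_r : |\phi(z)| = r$ (r > 1),
which are images under $\psi$ of the circles $C_r : |w| = r$ (see Figure 2.1). Since $F_n$ is the principal part of the Laurent series (2.2) for $\phi(z)^n$, we can write
$F_n(z) = \frac{1}{2\pi i} \int_{\Gamma_r} \frac{\phi(\zeta)^n}{\zeta - z}\, d\zeta$, z inside $\Gamma_r$,
and transforming to the w-plane we obtain
(2.5) $F_n(z) = \frac{1}{2\pi i} \int_{C_r} \frac{s^n\, \psi'(s)}{\psi(s) - z}\, ds$.
To derive the expansion (2.3) we begin with the Cauchy integral
representation for f(z):
(2.6) $f(z) = \frac{1}{2\pi i} \int_{\Gamma_r} \frac{f(\zeta)}{\zeta - z}\, d\zeta = \frac{1}{2\pi i} \int_{C_r} \frac{f(\psi(s))\, \psi'(s)}{\psi(s) - z}\, ds$.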
Since $f(\psi(s))$ is analytic in an annulus of the form 1 < |s| < R, we can
expand this function in a Laurent series:
$f(\psi(s)) = \sum_{n=-\infty}^{\infty} a_n s^n$.
Substituting this series into (2.6) and recalling (2.5) we get
(2.7) $f(z) = \sum_{n=-\infty}^{\infty} a_n\, \frac{1}{2\pi i} \int_{C_r} \frac{s^n\, \psi'(s)}{\psi(s) - z}\, ds = \sum_{n=0}^{\infty} a_n\, \frac{1}{2\pi i} \int_{C_r} \frac{s^n\, \psi'(s)}{\psi(s) - z}\, ds = \sum_{n=0}^{\infty} a_n F_n(z)$
(the integrals with negative n vanish because the integrand is $O(1/s^2)$
near $\infty$).
To summarize, we obtain the Faber expansion for f by forming the
Taylor series for the Cauchy integral of the composition $f \circ \psi$ and substituting
$F_n$ for $w^n$. The process is diagrammed below.
$f(z) \to (f \circ \psi)(w) \to \frac{1}{2\pi i} \int_{C_r} \frac{f(\psi(s))}{s - w}\, ds = \sum_{n=0}^{\infty} a_n w^n \to \sum_{n=0}^{\infty} a_n F_n(z)$
Exploiting the relationship between Taylor and Faber series leads to
the following analogue of Theorem 1.3.
Theorem 2.2. Let f be analytic on E and let $\rho$ (>1) be the largest index
such that f has a single-valued analytic continuation throughout the interior
of the level curve $\Gamma_\rho$. Then the partial sums of the Faber series for f
satisfy
(2.8) $\limsup_{n \to \infty} \left\| f - \sum_{k=0}^{n} a_k F_k \right\|_E^{1/n} = 1/\rho < 1$,
where $\|\cdot\|_E$ denotes the sup norm on E. Moreover, the Faber series converges
to f throughout the interior of $\Gamma_\rho$.
What does Theorem 2.2 say to realists who do approximation on an
interval? If E = [-1,1], then $\phi(z) = z + \sqrt{z^2 - 1}$ is just the Joukowski
transformation with inverse
(2.9) $\psi(w) = \frac{1}{2}(w + w^{-1})$.
For $n \ge 1$, the polynomial part of $\phi(z)^n$ is the same as the polynomial part of
$\phi(z)^n + \phi(z)^{-n} = w^n + w^{-n}$,
which reduces to $2\cos n\theta$ when $w = e^{i\theta}$. Thus the Faber polynomials are (apart from a multiplicative constant) the same as the classical Chebyshev polynomials $T_n$, and the Faber series reduces to the (orthogonal) Chebyshev expansion! For the Joukowski transformation, the level curve $\Gamma_r$ is an ellipse with foci ±1
and semi-major axis of length $(r + r^{-1})/2$. Hence Theorem 2.2 asserts that the
Chebyshev expansion for a function f analytic on [-1,1] will converge to f throughout the largest ellipse with foci ±1 in which f is analytic.
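Both facts used here, the identity $w^n + w^{-n} = 2T_n(\psi(w))$ and the ellipse description of the level curves, are easy to confirm numerically. The sketch below is an illustration with hypothetical function names; it evaluates $T_n$ by the standard three-term recurrence, which is valid for complex arguments.

```python
import cmath
import math

def joukowski(w):
    """psi(w) = (w + 1/w) / 2, the inverse mapping in (2.9)."""
    return 0.5 * (w + 1.0 / w)

def chebyshev_T(n, x):
    """T_n(x) by the recurrence T_{k+1} = 2 x T_k - T_{k-1}; x may be complex."""
    t_prev, t_curr = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

def focal_distance_sum(r, theta):
    """Sum of the distances from psi(r e^{i theta}) to the foci -1 and +1."""
    z = joukowski(r * cmath.exp(1j * theta))
    return abs(z - 1.0) + abs(z + 1.0)

if __name__ == "__main__":
    w = 1.3 * cmath.exp(0.7j)
    for n in range(1, 6):
        print(n, w ** n + w ** (-n), 2.0 * chebyshev_T(n, joukowski(w)))
    print(focal_distance_sum(1.5, 0.4), 1.5 + 1.0 / 1.5)
```

The second check uses the fact that a point lies on the ellipse with foci ±1 and semi-major axis a exactly when its focal distances add up to 2a.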
A more in-depth discussion of Faber series and Faber transforms is given in [13], [17], [22], [49], and [ 2 ] . The reader will find the subject rich in applications to geometric function theory.
Polynomial approximants can also be constructed via interpolation. As we observed, the Taylor section $s_n(f;z)$ of (1.1) interpolates f in the origin or, more precisely, in the zeros of the polynomial $w_n(z) = z^{n+1}$. Since the mapping function $\phi$ for the disk $\Delta_c : |z| \le c$ is just $\phi(z) = z/c$, the $w_n(z)$ trivially satisfy
(2.10) $\lim_{n \to \infty} |w_n(z)|^{1/n} = c\,|\phi(z)|$.
In fact, this asymptotic relation, when used to estimate the integral in the
Hermite interpolation formula (1.15), is all that is needed to prove the
convergence assertions of Theorem 1.3. For more general compact sets E (with
$\mathbb{C}^* \setminus E$ simply connected) this suggests that we determine a triangular scheme of
points for E
(2.11)
$\beta_0^{(0)}$
$\beta_0^{(1)},\ \beta_1^{(1)}$
$\cdots$
$\beta_0^{(n)},\ \beta_1^{(n)},\ \dots,\ \beta_n^{(n)}$
$\cdots$
such that $w_n(z) := \prod_{k=0}^{n} (z - \beta_k^{(n)})$ satisfies (2.10) uniformly on compact subsets of $\mathbb{C} \setminus E$, where $\phi(z) = z/c + \cdots$ is now the mapping function of (2.1). Coupled with the Hermite interpolation formula this leads to the following result (cf. [62, §7.2]).
Theorem 2.3. If the scheme of points (2.11) of E satisfies (2.10) uniformly on compact subsets of $\mathbb{C} \setminus E$, then the assertions of Theorem 2.2 remain valid when the Faber sections are replaced by the sequence of polynomials $p_n$ that interpolate f in the successive rows of (2.11).
For the unit disk $\Delta : |z| \le 1$, the zeros of $w_n(z) = z^{n+1} - 1$ (the
roots of unity) provide "good points" of interpolation in the sense of (2.10).
When E is bounded by a smooth Jordan arc or curve, we obtain good points by taking the images under $z = \psi(w)$ of such equally spaced points on |w| = 1. For example, if E = [-1,1], the images of the roots of $w^{2n+2} + 1 = 0$ under the transformation (2.9) yield the zeros of the Chebyshev polynomial $T_{n+1}$.
There are good points of interpolation that can be determined without knowledge of the mapping function. These are the Fekete points (cf. [62,§7.8]).
Definition 2.4. Let $V_n(z_0, z_1, \dots, z_n) := \prod_{i<j} (z_i - z_j)$ denote the Vandermonde
determinant of order n + 1. The points $\beta_k^{(n)} = z_k \in E$ for which the maximum
$\max \{ |V_n(z_0, z_1, \dots, z_n)| : z_k \in E,\ k=0,\dots,n \}$
is attained are called Fekete points for E.
The positive constant c that appears in the expansion (2.1) for the mapping function has great importance; it is called the transfinite diameter or logarithmic capacity of E and is denoted by cap(E). Such terminology arises from an electrostatics problem that we now describe.
For a compact set E (with $\mathbb{C}^* \setminus E$ simply connected) we distribute a unit charge over its boundary $\partial E$ so that equilibrium is reached in the sense that the energy with respect to the logarithmic potential is minimized. This corresponds to the problem of finding the minimum of the energy integral
(2.12) $I[\mu] := \int_{\partial E} \int_{\partial E} \log|z - t|^{-1}\, d\mu(z)\, d\mu(t)$
over all positive unit measures $\mu$ supported on $\partial E$. The unique measure $\mu_E$
that minimizes $I[\mu]$ gives the equilibrium charge distribution with potential
(2.13) $U_E(z) := \int_{\partial E} \log|z - t|^{-1}\, d\mu_E(t)$.
Apart from a small exceptional set, this potential has the constant value
$I[\mu_E]$ on the boundary of E. The capacity of E is defined as
(2.14) $\mathrm{cap}(E) := \exp(-I[\mu_E])$.
In this context, the essential criterion for (2.11) to be "good points" of interpolation is that the discrete measures
(2.15) $\mu_n := \frac{1}{n+1} \sum_{k=0}^{n} \delta(\beta_k^{(n)})$,
where $\delta(\beta_k^{(n)})$ denotes the unit measure supported at $\beta_k^{(n)}$, converge to the equilibrium measure $\mu_E$ (in the weak-star topology). Such convergence implies
that for $z \in \mathbb{C} \setminus E$,
(2.16) $\int \log|z - t|^{-1}\, d\mu_n(t) \to U_E(z)$ as $n \to \infty$,
which is equivalent to property (2.10). In this light, the fact that the Fekete
points are good interpolation points seems reasonable since they are defined by
minimizing the energy $\log(|V_n|^{-1})$ for n + 1 distinct point charges.
The above discussion applies to more general sets E and these
aspects of potential theory can be found in Hille [27, §16.4], Tsuji [57], and
Landkof [28]. We mention one further characterization of good interpolation
points (cf. [62, §7.4]).
Theorem 2.5. The points $\beta_k^{(n)}$ of E satisfy (2.10) if and only if
$\lim_{n \to \infty} \|w_n\|_E^{1/n} = \mathrm{cap}(E)$.
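The criterion in Theorem 2.5 can be tried on the two model sets discussed above. The following sketch is illustrative and makes its own choices: it uses m points and the m-th root (the limit is the same as with n + 1 points and the n-th root), takes Chebyshev zeros on [-1,1] and roots of unity on the unit disk, and compares the results with the known capacities 1/2 and 1. Sup norms are estimated by sampling.

```python
import cmath
import math

def root_norm_interval(m, samples=2001):
    """||w||^(1/m) on [-1, 1], where w is monic with zeros at the m Chebyshev zeros."""
    zeros = [math.cos((2 * k + 1) * math.pi / (2 * m)) for k in range(m)]
    worst = 0.0
    for i in range(samples):
        x = -1.0 + 2.0 * i / (samples - 1)
        worst = max(worst, abs(math.prod(x - z for z in zeros)))
    return worst ** (1.0 / m)

def root_norm_disk(m, samples=720):
    """||z^m - 1||^(1/m) on the closed unit disk, sampled on the unit circle."""
    worst = max(abs(cmath.exp(2j * math.pi * j / samples) ** m - 1.0)
                for j in range(samples))
    return worst ** (1.0 / m)

if __name__ == "__main__":
    for m in (10, 20, 40):
        print(m, root_norm_interval(m), root_norm_disk(m))
```

The first column of values drifts toward cap([-1,1]) = 1/2 and the second toward the capacity 1 of the unit disk.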
3. POLYNOMIALS OF BEST UNIFORM APPROXIMATION.
Let E be a compact set in the z-plane and f a function continuous
on E. Since $\Pi_n$ is finite dimensional, there exists a polynomial $p_n^* \in \Pi_n$ of best uniform approximation to f on E in the sense that
(3.1) $\|f - p_n^*\|_E = \inf \{ \|f - q\|_E : q \in \Pi_n \}$,
where $\|\cdot\|_E$ is the sup norm over E. Moreover, if E contains at least n + 1
points, then $\Pi_n$ is a Chebyshev subspace and hence $p_n^*$ is unique (see §2 of
DeVore's notes). A fundamental characterization of best approximation in the
complex variable setting is the Kolmogoroff criterion:
Theorem 3.1. A polynomial p ∈ Π_n is a best uniform approximation to f on E if and only if
(3.2) \min_{z \in M} \mathrm{Re}\{ (f(z) - p(z)) \overline{q(z)} \} \le 0
holds for every q ∈ Π_n, where M is the set of extremal points for f(z) - p(z); that is,
(3.3) M := \{ z \in E : |f(z) - p(z)| = \| f - p \|_E \}.
A proof of Theorem 3.1 is given in Meinardus [34, p.15].
For real functions, condition (3.2) asserts that there is no polynomial in Π_n that has the same sign as the error f - p_n* on its extremal point
set. This is the essential fact that is used to prove the Chebyshev Equioscillation theorem. For complex functions, an analogue of the alternating-sign patterns was developed by Rivlin and Shapiro (cf. [48, §2.6]) and is called the extremal signature.
Let's turn to the geometric aspects of best approximation. We let A(E) denote the collection of functions f that are analytic in the interior of E and continuous on E. If f ∈ A(E) and E is bounded by a Jordan curve Γ, then best polynomial approximation to f on E reduces to best approximation on Γ; that is, by the maximum principle,
\| f - p \|_\Gamma = \| f - p \|_E, all p ∈ Π_n.
The image of Γ under f - p is a curve in the w-plane which we denote by (f - p)(Γ) and call an error curve. In this context, the problem of best uniform approximation to f is equivalent to finding an error curve that is contained in a disk of minimal radius about w = 0.
It had been observed by some authors and crystallized by Trefethen [53] that the minimal error curve (f - p_n*)(Γ) often has a near circularity property in the sense that it winds around the origin n + 1 times and is close to being a perfect circle. Before proceeding with a discussion of this phenomenon we give a consequence of perfect circularity.
Lemma 3.2. Suppose E is bounded by a Jordan curve Γ, f ∈ A(E), and p ∈ Π_n. If the error curve (f - p)(Γ) is a perfect circle with center at the origin and winding number ≥ n + 1, then p is the polynomial of best uniform approximation to f on E out of Π_n.
Proof. If, to the contrary, there exists q ∈ Π_n such that ||f - q||_E < ||f - p||_E, then
|(f - p)(z) - (q - p)(z)| = |(f - q)(z)| < \| f - p \|_E = |(f - p)(z)|
for all z on Γ. By Rouche's theorem, this means that q - p and f - p have the same number of zeros interior to Γ. But since this number is at least n + 1 and q - p ∈ Π_n, we arrive at a contradiction. □
As a simple application of Lemma 3.2, consider the problem of finding the polynomial in Π_{n-1} that is of best uniform approximation to f(z) = z^n on Δ : |z| ≤ 1. Since f itself has the perfect circularity property (its error curve with p = 0 is the unit circle traversed n times), then p_{n-1}* = 0. In other words, the Chebyshev polynomials for the disk Δ are just the powers of z.
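The hypothesis of Lemma 3.2 can be checked numerically for this example. The sketch below samples the error curve of f(z) = z^4 with p = 0 on |z| = 1, estimates its winding number by summing argument increments, and compares the sup norm with that of one competing polynomial of lower degree. The sample count, the helper names, and the particular competitor 0.1 z^2 are illustrative assumptions.

```python
import cmath
import math


def unit_circle(samples=720):
    return [cmath.exp(2j * math.pi * k / samples) for k in range(samples)]


def error_curve(f, p, samples=720):
    """Sampled image of |z| = 1 under f - p."""
    return [f(z) - p(z) for z in unit_circle(samples)]


def winding_number(points):
    """Winding number about 0 of a closed sampled curve that avoids 0."""
    total = 0.0
    for i, w0 in enumerate(points):
        w1 = points[(i + 1) % len(points)]
        total += cmath.phase(w1 / w0)  # small argument increment per step
    return round(total / (2 * math.pi))


n = 4
f = lambda z: z ** n
zero = lambda z: 0.0
perturbed = lambda z: 0.1 * z ** 2  # a competitor of degree less than n

curve = error_curve(f, zero)
radii = [abs(w) for w in curve]
print(winding_number(curve), min(radii), max(radii))
print(max(abs(w) for w in error_curve(f, perturbed)))
```

The first line reports a winding number equal to n and a constant modulus, so the lemma applies with Π_{n-1}; the second line shows that this one competitor gives a larger sup norm, as the lemma predicts for every competitor.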
Using finite Blaschke products we can produce other examples of perfectly circular error curves, but only for certain rational functions f. (The reader is invited to determine the polynomials of best approximation on Δ to f(z) = 1/(z - a), |a| > 1.)
While near circularity is a property that can be made precise in an asymptotic sense (cf. Trefethen [53]), its practical importance is in the construction of very accurate polynomial approximations. The starting point for this algorithm is an elegant theorem due to Caratheodory and Fejer (cf. [22, p. 497]).
Theorem 3.3. Given a polynomial p(z) = \sum_{k=0}^{\nu} c_k z^k, there exists a unique power series extension B(z) = p(z) + \sum_{k=\nu+1}^{\infty} c_k^* z^k analytic in the unit disk Δ that minimizes the sup norm ||B|| over Δ among all such extensions. Moreover, B(z) is a finite Blaschke product with at most ν zeros in the disk.
The solution B(z) to the minimal extension problem of Theorem 3.3 can be computed quite easily. We know that it has the form
(3.4) B(z) = \lambda \frac{\bar{b}_\nu + \cdots + \bar{b}_0 z^\nu}{b_0 + \cdots + b_\nu z^\nu}, \lambda > 0,
and that it extends p(z) in the sense that B^{(k)}(0)/k! = c_k for k = 0,...,ν. When the c_k's are real, this system reduces to an eigenvalue problem for a (ν + 1) x (ν + 1) Hankel matrix formed from the c_k's. It turns out that the constant λ (which equals ||B||) in (3.4) is the largest of the absolute values of the eigenvalues of this matrix and that the coefficients b_k are determined by a corresponding eigenvector. For complex coefficients c_k, the procedure is modified by working instead with the largest singular value of the same Hankel matrix.
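The smallest nontrivial case ν = 1 with real c_0, c_1 can be worked by hand and checked numerically. Writing B(z) = λ(t + z)/(1 + tz), the conditions B(0) = c_0 and B'(0) = c_1 give λt = c_0 and λ(1 - t^2) = c_1, hence λ^2 - c_1 λ - c_0^2 = 0, which is the characteristic equation of the 2 x 2 Hankel matrix with rows (c_1, c_0) and (c_0, 0). This index arrangement is worked out here for the small case only and is an assumption about how the general matrix is laid out; the sketch also lets λ carry a sign rather than normalizing λ > 0.

```python
import cmath
import math


def cf_extension_degree_one(c0, c1):
    """Return (lam, t) with B(z) = lam * (t + z) / (1 + t * z).

    Real c0, c1, not both zero. lam is the root of largest modulus of
    lam^2 - c1*lam - c0^2 = 0, and t = c0 / lam then satisfies |t| <= 1.
    """
    if c0 == 0 and c1 == 0:
        raise ValueError("p must not be identically zero")
    disc = math.sqrt(c1 * c1 + 4.0 * c0 * c0)
    lam = max(((c1 + disc) / 2.0, (c1 - disc) / 2.0), key=abs)
    return lam, c0 / lam


def blaschke_extension(lam, t, z):
    return lam * (t + z) / (1 + t * z)


c0, c1 = 1.0, 1.0
lam, t = cf_extension_degree_one(c0, c1)
circle = [cmath.exp(2j * math.pi * k / 360) for k in range(360)]
print("lam =", lam, " t =", t)
print("B(0) =", lam * t, " B'(0) =", lam * (1 - t * t))
print("max |B| on circle =", max(abs(blaschke_extension(lam, t, z)) for z in circle))
print("max |p| on circle =", max(abs(c0 + c1 * z) for z in circle))
```

For p(z) = 1 + z the extension has constant modulus λ on |z| = 1, and λ is smaller than the sup norm of p itself, so the added higher-order terms do reduce the norm.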
How does the CF extremal problem of Theorem 3.3 relate to the problem of best polynomial approximation on Δ? Finding the minimal error f - p_n* for f(z) = \sum_{k=0}^{\infty} a_k z^k is equivalent to minimizing
(3.5) \| \sum_{k=0}^{n} c_k z^k + \sum_{k=n+1}^{\infty} a_k z^k \|_C, C : |z| = 1,
over all (n + 1)-tuples (c_0,...,c_n), which is the converse of the CF problem. Nonetheless, we can utilize Theorem 3.3 by performing a truncation and an inversion z → 1/z. Following Trefethen [53], we truncate the given series for f at k = N so that \sum_{k=N+1}^{\infty} a_k z^k is negligible; that is, we work with \sum_{k=n+1}^{N} a_k z^k =: z^{n+1} q(z) instead of \sum_{k=n+1}^{\infty} a_k z^k in (3.5).
Next we solve the CF problem for the inverse polynomial
(3.6) p(z) := z^{N-n-1} q(1/z) \in \Pi_{N-n-1},
to obtain the minimal extension Blaschke product
B(z) = p(z) + \sum_{k=N-n}^{\infty} c_k^* z^k.
Since
(3.7) \| B \|_C = \| z^N B(1/z) \|_C = \| z^{n+1} q(z) + \sum_{k=0}^{n} c_{N-k}^* z^k + \sum_{k=N+1}^{\infty} c_k^* z^{N-k} \|_C,
then discarding the terms involving negative powers of z (which have small coefficients), we see that the choice
c_k = c_{N-k}^*, k = 0,1,...,n,
in (3.5) gives an error curve with a near circularity property.
The polynomial approximants obtained via this CF method are often much
better in the sup norm sense than the Taylor sections. Moreover the technique
can be extended to find near best rational approximants (cf. [54], [56]). The
theoretical underpinnings of the CF method are contained in a paper of Adamjan,
Arov, and Krein [1] who generalized the results of Caratheodory, Fejer, Schur,
and Takagi.
Let's now turn to the question of convergence of approximating polynomials. We naturally ask, what is the extension of the Weierstrass theorem to
the complex setting? Runge's theorem (Theorem 2.1) is not a true generalization
because it assumes far more than continuity - it requires f to be analytic in
an open set containing E. Only in 1951 did the Russian mathematician Mergelyan
confirm the suspicions of many who had worked on the problem by proving that the
assumption on f in Runge's theorem could be weakened.
Theorem 3.4 (Mergelyan [35]). Let E be a compact set that does not separate the plane. If f ∈ A(E) (that is, f is analytic in the interior of E and continuous on E), then there exists a sequence of polynomials that converges uniformly to f on E.
The proof of Mergelyan's theorem (cf. [17], [41]) is a tour de force that utilizes the Tietze Extension theorem as well as Koebe's 1/4-theorem. Observe that the Weierstrass theorem is a special case of Theorem 3.4 because an interval has an empty interior and so A(E) reduces to the collection of functions continuous on E.
As an application of Theorem 3.4 we mention the following
generalization of the Cauchy integral formula: If Γ is a rectifiable Jordan curve and f is analytic in the interior G of Γ and continuous on G ∪ Γ, then
(3.8) f(z) = \frac{1}{2\pi i} \int_\Gamma \frac{f(t)}{t - z} \, dt, z ∈ G.
To prove (3.8) we take a sequence of polynomials p_n that converges uniformly to f on G ∪ Γ (the special case of Mergelyan's theorem used here was proved in 1926 by Walsh [61]). Since the Cauchy integral representation holds for polynomials, we have for z ∈ G
f(z) = \lim_{n\to\infty} p_n(z) = \lim_{n\to\infty} \frac{1}{2\pi i} \int_\Gamma \frac{p_n(t)}{t - z} \, dt = \frac{1}{2\pi i} \int_\Gamma \frac{f(t)}{t - z} \, dt,
as claimed in (3.8).
Results on the rate of polynomial convergence require special assumptions on the smoothness of the boundary of E as well as on the modulus of continuity of f. For some extensions of the Jackson type theorems, see Sewell [47].
As the reader might suspect from the results of §2, geometric rates of convergence characterize the functions that are analytic on E. Before making this precise we present a useful lemma dealing with the growth of polynomials.
Lemma 3.5 (Bernstein-Walsh [62, §4.6]). Suppose that E is a compact set (not a single point), whose complement C*\E is simply connected. If p ∈ Π_n satisfies |p(z)| ≤ M for z on E, then
(3.9) |p(z)| \le M r^n, z on Γ_r (r > 1),
where Γ_r is the level curve defined in (2.4).
Proof. We apply the maximum principle to g(z) := p(z)/φ(z)^n, where φ(z) is the mapping function of (2.1). Observe that since p ∈ Π_n and φ has a simple pole at ∞, then g(z) is analytic exterior to E, even at ∞. As z approaches the boundary of E from the outside, |φ(z)| → 1 and so lim sup |g(z)| ≤ M; hence |g(z)| ≤ M for all z outside E. For z ∈ Γ_r, where |φ(z)| = r, the last inequality gives (3.9). □
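Lemma 3.5 can be tested on E = [-1,1], assuming (as is standard for the interval) that the mapping function is φ(z) = z + sqrt(z^2 - 1), so that the level curve Γ_r is the ellipse z = (w + 1/w)/2 with |w| = r. The sketch evaluates the Chebyshev polynomial T_n, which satisfies |T_n| ≤ 1 on E, along Γ_r and compares its maximum with r^n. The parametrization and helper names are assumptions made for this illustration.

```python
import cmath
import math


def chebyshev_T(n, z):
    """T_n(z) by the three-term recurrence, valid for complex z."""
    t_prev, t_curr = 1.0 + 0j, z
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * z * t_curr - t_prev
    return t_curr


def level_curve_point(r, theta):
    """Point of the ellipse Gamma_r for E = [-1, 1] (image of |w| = r)."""
    w = r * cmath.exp(1j * theta)
    return (w + 1 / w) / 2


def max_on_level_curve(n, r, samples=720):
    return max(
        abs(chebyshev_T(n, level_curve_point(r, 2 * math.pi * k / samples)))
        for k in range(samples)
    )


r = 1.5
for n in (2, 4, 8):
    # Here M = 1, so (3.9) predicts the first number is at most the second.
    print(n, max_on_level_curve(n, r), r ** n)
```

For this polynomial the maximum on Γ_r is (r^n + r^{-n})/2, so the bound M r^n is attained only up to a factor approaching 1/2; the geometric rate r^n itself cannot be improved.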
We can now prove
Theorem 3.6 (Walsh [62, §4.7]). Let E be as in Lemma 3.5, f a function continuous on E, and set
(3.10) E_n(f) := \| f - p_n^* \|_E,
where p_n* is the polynomial in Π_n of best uniform approximation to f on E. Then f is analytic on E if and only if
(3.11) \limsup_{n\to\infty} E_n(f)^{1/n} < 1.
Proof. In one direction the proof is trivial. Namely, if f is analytic on E, then Theorem 2.2 asserts that the Faber sections and, a fortiori, the polynomials of best approximation converge geometrically.
On the other hand, if (3.11) holds, then, since ||p_{n+1}* - p_n*||_E ≤ E_{n+1}(f) + E_n(f),
(3.12) \limsup_{n\to\infty} \| p_{n+1}^* - p_n^* \|_E^{1/n} < 1.
Appealing to Lemma 3.5, we deduce that, for some r > 1,
\limsup_{n\to\infty} \| p_{n+1}^* - p_n^* \|_{\Gamma_r}^{1/n} < 1.
But this means that the sequence {p_n*}_{n=0}^∞ converges in the interior of Γ_r, necessarily to an analytic extension of f. □
As with several of the theorems presented, Theorem 3.6 is not stated
in its full generality - the assumption on E can be considerably weakened.
4. PADE APPROXIMANTS.
Polynomials have the advantage of being easy to evaluate. But the
same is true of rational functions. Moreover, rational functions have poles
which can imitate the singularities of a function to be approximated. In this
section we introduce a class of interpolating rational functions called Pade
approximants. These rationals provide a natural extension of the Taylor
sections. (Standard references are [39], [4], [5,6]; for a historical treatment, see Brezinski [11].)
Given a formal power series
(4.1) f(z) = \sum_{k=0}^{\infty} a_k z^k,
we wish to construct a rational function of a certain type whose Taylor coefficients match those of f as far as possible. To be precise, let
(4.2) \Pi_{m,n} := \{ R(z) = P(z)/Q(z) : P \in \Pi_m, Q \in \Pi_n, Q \not\equiv 0 \}.
Then the matching condition can be stated as follows: For a fixed pair (m,n), find an R ∈ Π_{m,n} such that
(4.3) (f - R)(z) = O(z^\ell),
where ℓ is as large as possible. (Here and below, O(z^ℓ) denotes a power series with lowest order term z^ℓ.) What is a realistic value for ℓ? Since there are m+1 free parameters in the choice for the numerator P, and n+1 in the choice for the denominator Q, there are m+n+1 parameters available in the ratio P/Q (one parameter is lost in the division process). Thus we expect to have ℓ ≥ m+n+1 or, equivalently, to match the first m+n+1 terms of (4.1). Unfortunately this is not always possible (try m=0, n=1, and f(z) = z). To circumvent this difficulty we work, instead, with the following linearized version of (4.3).
Given (m,n), select P_mn ∈ Π_m and Q_mn (≢ 0) ∈ Π_n so that
(4.4) (Q_{mn} f - P_{mn})(z) = O(z^{m+n+1}).
If f is (m+n)-times differentiable at z=0, then (4.4) is equivalent to
(Q_{mn} f - P_{mn})^{(k)}(0) = 0, k=0,1,...,m+n.
Notice that (4.4) represents a homogeneous system of m+n+1 equations in m+n+2 unknowns (the coefficients of P_mn and Q_mn). Hence this system has a nontrivial solution, necessarily with Q_mn ≢ 0. With this observation we give
Definition 4.1. The Pade approximant (PA) of type (m,n) to f is the rational
(4.5) [m/n](z) := P_{mn}(z) / Q_{mn}(z),
where P_mn ∈ Π_m and Q_mn (≢ 0) ∈ Π_n satisfy (4.4).
Notice that for n=0, the PA reduces to a Taylor section of (4.1):
(4.6) [m/0](z) = \sum_{k=0}^{m} a_k z^k.
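The linearized condition (4.4) translates directly into a small linear-algebra computation. The following sketch, using exact rational arithmetic, finds a nonzero denominator from the n homogeneous equations that force the coefficients of z^{m+1},...,z^{m+n} in Q f to vanish, and then reads off the numerator. The function names `null_vector`, `pade`, and `exp_coefficients` are hypothetical, and the elimination routine is a plain textbook one rather than a method taken from the text.

```python
from fractions import Fraction


def null_vector(rows, width):
    """One nonzero solution x of rows * x = 0 (fewer rows than columns)."""
    rows = [list(r) for r in rows]
    pivots = []  # (row index, column) of each pivot after full reduction
    r = 0
    for col in range(width):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [v / rows[r][col] for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append((r, col))
        r += 1
    pivot_cols = {c for _, c in pivots}
    free = next(c for c in range(width) if c not in pivot_cols)
    x = [Fraction(0)] * width
    x[free] = Fraction(1)
    for row_idx, col in pivots:
        x[col] = -rows[row_idx][free]
    return x


def pade(a, m, n):
    """Coefficient lists (P, Q), lowest degree first, satisfying (4.4)."""
    coef = lambda k: Fraction(a[k]) if k >= 0 else Fraction(0)
    rows = [[coef(m + i - j) for j in range(n + 1)] for i in range(1, n + 1)]
    q = null_vector(rows, n + 1)
    if q[0] != 0:  # normalize Q(0) = 1 when that is possible
        q = [v / q[0] for v in q]
    p = [sum(q[j] * coef(i - j) for j in range(min(i, n) + 1)) for i in range(m + 1)]
    return p, q


def exp_coefficients(count):
    coeffs, term = [], Fraction(1)
    for k in range(count):
        coeffs.append(term)
        term /= (k + 1)
    return coeffs


print(pade(exp_coefficients(3), 1, 1))          # type (1,1) for e^z
print(pade([Fraction(0), Fraction(1)], 0, 1))   # f(z) = z, m = 0, n = 1
```

For e^z the type (1,1) result is (1 + z/2)/(1 - z/2). For f(z) = z with m = 0, n = 1 the system forces Q(0) = 0 and P = 0, so the approximant is identically zero and matches f only to order z, not z^2, which is the failure of (4.3) mentioned above.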
Tacit in Definition 4.1 is the fact that a PA is unique. To prove this, suppose that
(4.7) (Q_1 f - P_1)(z) = O(z^{m+n+1}) and (Q_2 f - P_2)(z) = O(z^{m+n+1}),
where P_1, P_2 ∈ Π_m and Q_1, Q_2 ∈ Π_n. On multiplying the first equation in (4.7) by Q_2 and the second by Q_1, we deduce on subtracting that
(4.8) Q_1 P_2 - Q_2 P_1 = O(z^{m+n+1}).
But the left-hand side of (4.8) is a polynomial of degree ≤ m+n. Hence
Q_1 P_2 - Q_2 P_1 \equiv 0 or P_1/Q_1 \equiv P_2/Q_2.
The Pade numerators and denominators are rich in algebraic properties such as the 3-term recurrence relations found by Frobenius (see [4, 24, 39] for a detailed discussion of these properties). Here we pause only to mention a representation for Q_mn that illustrates the important role played by the Toeplitz determinants
(4.9) D(m/n) := \det \begin{pmatrix} a_m & a_{m-1} & \cdots & a_{m-n+1} \\ a_{m+1} & a_m & \cdots & a_{m-n+2} \\ \vdots & \vdots & & \vdots \\ a_{m+n-1} & a_{m+n-2} & \cdots & a_m \end{pmatrix} (a_k := 0 if k < 0)
formed from the coefficients of f.
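Theorem 4.2. (Jacobi). If D(m/n) ≠ 0, then
(4.10) f(z) - [m/n](z) = O(z^{m+n+1})
and the Pade denominator Q_mn normalized by Q_mn(0) = 1 is
(4.11) Q_{mn}(z) = \frac{1}{D(m/n)} \det \begin{pmatrix} a_m & a_{m-1} & \cdots & a_{m-n+1} & z^n \\ a_{m+1} & a_m & \cdots & a_{m-n+2} & z^{n-1} \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m+n} & a_{m+n-1} & \cdots & a_{m+1} & 1 \end{pmatrix}.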
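The determinant representation can be evaluated exactly for small m and n. The sketch below assumes the layout in which row i of the matrix (i = 0,...,n) is (a_{m+i}, a_{m+i-1}, ..., a_{m+i-n+1}, z^{n-i}) and expands along the last column; this layout is a reconstruction, and the helper names are hypothetical. It uses plain Gaussian elimination, not the fast Toeplitz method of [10].

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    size = len(rows)
    result = Fraction(1)
    for col in range(size):
        pivot = next((i for i in range(col, size) if rows[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            result = -result
        result *= rows[col][col]
        for i in range(col + 1, size):
            factor = rows[i][col] / rows[col][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[col])]
    return result


def coeff(a, k):
    return Fraction(a[k]) if k >= 0 else Fraction(0)


def toeplitz_det(a, m, n):
    """D(m/n): the n x n determinant with (i, j) entry a_{m+i-j}."""
    return det([[coeff(a, m + i - j) for j in range(n)] for i in range(n)])


def pade_denominator(a, m, n):
    """Coefficients of Q_mn (lowest degree first) with Q_mn(0) = 1."""
    d = toeplitz_det(a, m, n)
    if d == 0:
        raise ValueError("D(m/n) = 0: the determinant formula does not apply")
    # Rows i = 0..n of the (n+1) x (n+1) matrix, last column omitted.
    body = [[coeff(a, m + i - k) for k in range(n)] for i in range(n + 1)]
    q = [Fraction(0)] * (n + 1)
    for i in range(n + 1):
        minor = body[:i] + body[i + 1:]
        sign = -1 if (i + n) % 2 else 1
        q[n - i] = sign * det(minor) / d  # cofactor of the entry z^(n-i)
    return q


def exp_coefficients(count):
    coeffs, term = [], Fraction(1)
    for k in range(count):
        coeffs.append(term)
        term /= (k + 1)
    return coeffs


a = exp_coefficients(5)
print(toeplitz_det(a, 2, 2), pade_denominator(a, 2, 2))
```

For e^z with m = n = 2 this gives D(2/2) = 1/12 and the denominator 1 - z/2 + z^2/12, in agreement with the familiar [2/2] approximant of the exponential.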
A fast numerical method (based on the Euclidean algorithm) for solving
Toeplitz systems and computing PAs is described in [10].
The PAs for (4.1) are typically displayed in a doubly infinite array known as the Pade table:
[0/0]  [1/0]  [2/0]  ...
[0/1]  [1/1]  [2/1]  ...
[0/2]  [1/2]  [2/2]  ...
  .      .      .
  .      .      .
Here the first row lists Taylor sections; the 2nd row consists of PAs with at most one pole; the 3rd row consists of PAs with at most two poles; etc. The structure of this table was the subject of the 1892 thesis of E. Pade. He showed that the table breaks up into square blocks of identical entries, with the common entry not appearing elsewhere in the table. When all blocks are of size one, i.e., no entry is repeated, the table is said to be normal. Normal tables arise when all Toeplitz determinants D(m/n) are nonzero. It is possible for a Pade table to contain an infinite block of identical entries, but (as shown by Kronecker) such a table arises only for the power series of a rational function.
Of special interest in the Pade table are the diagonal entries, for these represent continued fraction expansions. Indeed, if
(4.12) f(z) = \sum_{k=0}^{\infty} a_k z^k = d_0 + \cfrac{d_1 z}{1 + \cfrac{d_2 z}{1 + \cdots}},
then an inductive argument shows that the successive truncations d_0, d_0 + d_1 z, d_0 + d_1 z/(1 + d_2 z), etc. are rational functions that have maximal contact with f at the origin. In other words, these truncations give the PAs
[0/0] , [1/0] , [1/1] ,..., [n/n] , [(n+l)/n] ,...,
which form a staircase of entries in the Pade table (the main diagonal and first superdiagonal). For many of the classical special functions (such as e^z), the [n/n] approximant in continued fraction form provides an accurate, computationally stable approximation that is considerably better than using the 2n-th degree Taylor section. Of course, continued fraction expansions of real numbers have played an important role in number theory and, in this respect, PAs provide their function theoretic analogues. (For further discussion of the continued fraction aspects of PAs, see [39], [59].)
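The comparison with Taylor sections is easy to try for e^z with n = 2. The sketch below uses the well-known closed form of the [2/2] approximant, (1 + z/2 + z^2/12)/(1 - z/2 + z^2/12), written as a ratio rather than as a continued fraction, and compares it with the Taylor section of degree 4. The sample points are arbitrary.

```python
import math


def pade_2_2_exp(z):
    """The [2/2] Pade approximant of e^z, written as a ratio of quadratics."""
    return (1 + z / 2 + z * z / 12) / (1 - z / 2 + z * z / 12)


def taylor_exp(z, degree):
    """Taylor section of e^z of the given degree."""
    return sum(z ** k / math.factorial(k) for k in range(degree + 1))


# Columns: x, error of [2/2], error of the degree-4 Taylor section.
for x in (-1.0, 0.5, 1.0):
    exact = math.exp(x)
    print(x, abs(pade_2_2_exp(x) - exact), abs(taylor_exp(x, 4) - exact))
```

Both approximations use five coefficients of the series, yet at these points the rational one has the smaller error.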
The PAs for a function f have poles that can be used to predict the positions of the poles as well as other singularities of f. For example, the qd-algorithm (cf. [26, §7.6]) for computing the zeros of a given polynomial p is based on the fact that the poles in certain rows of the Pade table for f = 1/p tend to poles of f (zeros of p ). The basic row convergence theorem involved is the following.
Theorem 4.3. (de Montessus de Ballore [4, p.139]). Let f be analytic in the disk D : |z| < R (0 < R ≤ ∞) except for poles of total multiplicity ν, none of which occurs at z=0. Then, as m → ∞, the sequence of Pade approximants [m/ν](z) converges to f(z) uniformly on every compact subset of D\{poles of f}. Furthermore, as m → ∞, the poles of [m/ν](z) tend, respectively, to the ν poles of f in D.
For example, suppose that f is a meromorphic function in the plane whose poles are simple and occur at the points ζ_k, where
0 < |\zeta_1| < |\zeta_2| < \cdots.
Then Theorem 4.3 asserts that the pole of [m/1](z) tends to ζ_1; the two poles of [m/2](z) tend to ζ_1, ζ_2; etc.
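The case ν = 1 can be followed by hand, since for n = 1 the normalized denominator is Q_{m1}(z) = 1 - (a_{m+1}/a_m) z, with zero a_m/a_{m+1}. The sketch below uses the illustrative function f(z) = 1/(1 - z) + 1/(1 - z/3), chosen here as an assumption, which has simple poles at 1 and 3 and Taylor coefficients a_k = 1 + 3^{-k}.

```python
from fractions import Fraction


def coefficient(k):
    """Taylor coefficient a_k of f(z) = 1/(1 - z) + 1/(1 - z/3)."""
    return 1 + Fraction(1, 3 ** k)


def pole_of_row_one(m):
    """Zero of Q_{m1}(z) = 1 - (a_{m+1}/a_m) z, i.e. the pole of [m/1]."""
    return coefficient(m) / coefficient(m + 1)


for m in (1, 5, 10, 20):
    print(m, float(pole_of_row_one(m)))
```

The printed values approach 1, the pole of f nearest the origin, and the distance shrinks roughly like 3^{-m}, the ratio of the two pole moduli raised to the power m.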
The proof of Theorem 4.3 is based on the following simple observation (cf. [46]). Since
(Q_{m\nu} f - P_{m\nu})(z) = O(z^{m+\nu+1}),
then for any Q ∈ Π_ν, the product Q P_mν ∈ Π_{m+ν} satisfies
(Q_{m\nu} Q f - Q P_{m\nu})(z) = O(z^{m+\nu+1}),
and so Q P_mν is the (m+ν)-th Taylor section of Q_mν Q f. Consequently, we can use the Hermite formula (1.15) to write
(4.13) (Q_{m\nu} Q f - Q P_{m\nu})(z) = \frac{1}{2\pi i} \int_{|t|=r} \frac{z^{m+\nu+1} (Q_{m\nu} Q f)(t)}{t^{m+\nu+1} (t - z)} \, dt, |z| < r,
provided Q_mν Q f is analytic on |t| ≤ r. If Q is chosen to be the monic polynomial whose zeros are the poles of f, then r can be taken arbitrarily close to R. On suitably normalizing the Pade denominators Q_mν we find that the right-hand side of (4.13) tends to zero in D. In particular, at a zero ζ of Q, we have (Q_mν Q f)(ζ) → 0 and so Q_mν(ζ) → 0 because (Qf)(ζ) ≠ 0. This means that every limit polynomial of the Q_mν's has zeros at the poles of f (the zeros of Q), which establishes the last assertion of Theorem 4.3. (This same argument can be applied to rational functions that interpolate in the "good points" discussed in §2; see [43].)
In proving convergence theorems for PAs, the essential question is: Where (asymptotically) are the poles of the PAs? In Theorem 4.3, the ν poles of f serve as "attractors" for all the available poles of the [m/ν] approximants. However, if f has fewer than ν poles, then only a subset of the poles of [m/ν](z) "know where to go," and the remaining poles may wander aimlessly, destroying convergence. The following simple example illustrates this point.
Consider a sequence of nonzero coefficients a_m for which there is a large discrepancy between the root test and the ratio test:
(4.14) \lim_{m\to\infty} |a_m|^{1/m} = 0 and \limsup_{m\to\infty} |a_{m+1}/a_m| = \infty.
As shown by Perron [39, §78], it is possible to construct such a_m's so that the sequence {a_m/a_{m+1}}_{m=0}^∞ has limit points that are dense in the plane. But, from (4.11), we see that a_m/a_{m+1} is the zero of the Pade denominator Q_{m1}(z) for f(z) = \sum_{k=0}^{\infty} a_k z^k (which is an entire function). Hence the 2nd row of the Pade table for this f has poles everywhere dense in the plane.
Even more startling is the following result due to Wall in [60] concerning the diagonal of the Pade table.
Theorem 4.4. There exists an entire function f such that the sequence of diagonal PAs {[n/n](z)} for f is unbounded at every point in the plane except z = 0.
In light of these anomalies, results on the convergence of PAs usually
pursue one of three directions:
(i) Proving uniform convergence for special classes of functions;
(ii) Replacing uniform convergence by a weaker condition, such as
convergence in measure or in capacity;
(iii) Extracting subsequences of PAs that do have the desired uniform
convergence properties.
An early step in the first direction was taken by Pade, who studied
the table for the exponential function. He showed that whenever m+n → ∞, the
approximants [m/n](z) for e^z converge to e^z uniformly on compact subsets
of the plane. Precise asymptotic results for the location of the zeros and
poles of these PAs were obtained by Saff and Varga [45]. The approximants
for the exponential have several important applications. For example, proving
the stability of certain numerical schemes for solving differential equations
boils down to showing that certain of these approximants are bounded by 1 in
the left-half plane.
A substantial extension of Pade's results for the exponential function was obtained by Arms and Edrei [3]. They proved convergence of the approximants for the class of functions generated by totally positive sequences (also called Polya frequency series).
The PAs for the class of Stieltjes functions have particularly elegant properties. Here we discuss Stieltjes functions that can be written in the form
(4.15) f(z) = \int_0^b \frac{d\mu(t)}{1 + zt},
where μ is a finite positive measure on [0,b], with 0 < b < ∞. Such a function is analytic in the cut plane C*\(-∞, -1/b] and has the power series
expansion f(z) = \sum_{k=0}^{\infty} (-1)^k c_k z^k, where the c_k's are the moments
(4.16) c_k := \int_0^b t^k \, d\mu(t), k=0,1,...
As we now show, the Pade denominators Q_{n-1,n} for f are related to the polynomials that are orthogonal with respect to dμ. Starting with the defining property
(Q_{n-1,n} f - P_{n-1,n})(z) = O(z^{2n}),
we replace z by -1/z and multiply by z^n to obtain
(4.17) q_n(z) \int_0^b \frac{z \, d\mu(t)}{z - t} - z p_{n-1}(z) = O(1/z^n),
where q_n(z) := z^n Q_{n-1,n}(-1/z) ∈ Π_n and p_{n-1}(z) := z^{n-1} P_{n-1,n}(-1/z) ∈ Π_{n-1}. Then for j=0,1,..., we have
(4.18) q_n(z) \int_0^b \frac{z^j \, d\mu(t)}{z - t} - z^j p_{n-1}(z) = O(z^j / z^{n+1}).
Next, we integrate with respect to z around a simple closed contour containing [0,b] in its interior. Using the Cauchy formula, we find that
\int_0^b q_n(t) t^j \, d\mu(t) = 0, for j=0,1,...,n-1;
that is,
(4.19) q_n(z) = z^n Q_{n-1,n}(-1/z)
is the n-th degree orthogonal polynomial for dμ. One consequence of this relation is that the zeros of Q_{n-1,n}(z) must be simple and lie on the cut (-∞, -1/b). On writing the approximant [(n-1)/n] in the form
(4.20) [(n-1)/n](z) = \frac{P_{n-1,n}(z)}{Q_{n-1,n}(z)} = \sum_{j=1}^{n} \frac{\lambda_{nj}}{1 + z t_{nj}},
where the t_nj's are zeros of q_n(t), we deduce in a similar manner from (4.17) that
(4.21) \int_0^b P(t) \, d\mu(t) = \sum_{j=1}^{n} \lambda_{nj} P(t_{nj})
for any polynomial P ∈ Π_{2n-1}. Hence, the constants λ_nj are the Christoffel
numbers for Gaussian quadrature (cf. [52]). Since Christoffel numbers are positive, we see from (4.20) that the approximant [(n-1)/n](z) is itself a Stieltjes function of the form (4.15) corresponding to a discrete measure dμ_n. In particular, the zeros and poles of this approximant are interlaced along the cut (-∞, -1/b). With all these facts in hand, a simple normal families argument can be used to prove that [(n-1)/n](z) → f(z) in C*\(-∞, -1/b]. This is a classical result due to Markoff [33], which has been further extended by Stahl [50].
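The link between Pade denominators and orthogonal polynomials can be verified exactly in a small case. The sketch takes dμ to be Lebesgue measure on [0,1] (an illustrative choice, so that c_k = 1/(k+1) and f(z) = log(1+z)/z), computes Q_{1,2}(z) = 1 + q_1 z + q_2 z^2 from the two equations that annihilate the coefficients of z^2 and z^3 in Q_{1,2} f, forms q_2(t) = t^2 Q_{1,2}(-1/t), and checks the orthogonality relations using the moments. The function names are hypothetical.

```python
from fractions import Fraction


def moments(count):
    """c_k = integral of t^k over [0, 1] for Lebesgue measure."""
    return [Fraction(1, k + 1) for k in range(count)]


def denominator_1_2(c):
    """(q1, q2) with Q_{1,2}(z) = 1 + q1*z + q2*z^2 for f = sum (-1)^k c_k z^k."""
    a = [(-1) ** k * c[k] for k in range(4)]
    # Coefficients of z^2 and z^3 in Q*f must vanish:
    #   a2 + q1*a1 + q2*a0 = 0,   a3 + q1*a2 + q2*a1 = 0.
    determinant = a[1] * a[1] - a[0] * a[2]
    q1 = (a[0] * a[3] - a[1] * a[2]) / determinant   # Cramer's rule
    q2 = (a[2] * a[2] - a[1] * a[3]) / determinant
    return q1, q2


def inner_product_with_power(c, poly, j):
    """Integral of poly(t) * t^j dmu(t); poly lists coefficients low to high."""
    return sum(coef * c[i + j] for i, coef in enumerate(poly))


c = moments(4)
q1, q2 = denominator_1_2(c)
orthogonal_poly = [q2, -q1, Fraction(1)]   # t^2 * Q(-1/t) = t^2 - q1*t + q2
print(q1, q2, [inner_product_with_power(c, orthogonal_poly, j) for j in (0, 1)])
```

The result is q_2(t) = t^2 - t + 1/6, the monic Legendre polynomial of degree 2 shifted to [0,1], and both inner products vanish, in line with (4.19).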
The example of Stieltjes functions shows that the Pade theory provides a natural setting for generalizing the classical theory of orthogonal polynomials. In this regard, convergence results for PAs to functions of the form (4.15) with μ a complex measure, were obtained by Magnus [32], Nuttall and Wherry [38], and Stahl [51].
Many commonly occurring functions have "smooth" Taylor coefficients in the sense that a_{k-1} a_{k+1} / a_k^2 has a limit as k → ∞. Convergence properties of the PAs for such functions were investigated by Lubinsky [29], [30].
Space limitations preclude a discussion of results concerning the
convergence of subsequences of the rows, columns, or diagonals of the Pade
table. We also leave it for the reader to delve into results (such as the
Nuttall-Pommerenke theorem [5, §6.5]) that deal with the convergence in capacity
of near diagonal PAs.
We do wish to emphasize that various generalizations of PAs exist
that are quite useful; e.g. multipoint Pade approximants (rational functions
found by interpolation in distinct points), Faber-Pade approximants (rational
functions whose Faber series matches the Faber expansion of f as far as possible), multivariate Pade approximants; etc. (A dictionary [31] of these
generalizations is available, upon request, from this author.)
5. RATIONAL VERSUS POLYNOMIAL APPROXIMATION (WHAT A DIFFERENCE A DIVISION MAKES!)
We now discuss some essential differences between polynomial and
rational approximation in the complex variable setting. Some of the contrasts
are rooted in function theoretic properties, while others are more typical of
linear vs. nonlinear approximation theory (cf. the forthcoming book of Braess [9]).
Possibility of Convergence. For rational approximation, Runge's classical theorem asserts that if f is analytic on a compact set E, then f is the uniform limit on E of a sequence of rational functions. Unlike its polynomial version (Theorem 2.1), the hypothesis that C\E be connected is not needed; it is compensated for by choosing rational approximants that have poles in the components of C\E. For example, a function analytic on the annulus
E : r_1 ≤ |z| ≤ r_2 is the uniform limit of rational functions that have poles at z = 0 and z = ∞ (think of its Laurent series!).
To describe the more delicate problem of approximating functions in A(E), we let Π(E) denote the uniform limits on E of polynomials, and R(E) denote the uniform limits on E of rational functions whose poles lie outside E. Then the theorem of Mergelyan (Theorem 3.4) states that A(E) = Π(E) if and only if C\E is connected. In contrast, the compact sets E for which A(E) = R(E) cannot be characterized topologically; that is, this property is not invariant under a homeomorphism of the plane (cf. [20]). The most popular (and most tasteful) example of a compact set E for which A(E) ≠ R(E) is the Swiss cheese of A. Roth (cf. [17]), which she manufactured by removing a countable number of disjoint open disks from the closed unit disk. For further discussion of the possibility of rational approximation see Gamelin [18].
Existence of Best Approximants. For an arbitrary compact set E, the existence of best polynomial approximants from Π_m is a simple compactness argument. However, for best rational approximants from Π_{m,n} (n > 0), this argument must be modified to handle the possibility of poles tending to the boundary of E. Using normal families, Walsh [62, §12.2] proved that best rational approximants exist provided E contains no isolated points.
Uniqueness of Best Approximants. If f ∈ C[a,b] is real-valued, then Chebyshev showed that the best uniform approximation to f on [a,b] out of
(5.1) \Pi^r_{m,n} := \{ R \in \Pi_{m,n} : R has real coefficients \}
is unique (cf. [34, §9.2]). Surprisingly, this is no longer true if approximation to a real-valued f is done from Π_{m,n}; that is, if we allow rational approximants with complex coefficients. Indeed, as was shown by Saff and Varga [44], the function f(x) = x^2 has no unique best uniform approximation on [-1,1] out of Π_{1,1} (any such best rational r_{11} has complex coefficients, so that \bar{r}_{11} (≠ r_{11}) is also best). Further examples of this type, as well as non-uniqueness results for approximation on a disk can be found in [25], [42].
Given f ∈ A(E) we can nonetheless construct a table of best uniform rational approximants to f on E by making a specific choice for each pair (m,n). This analogue of the Pade table is called the Walsh array. The convergence theory for this array closely parallels the theory for the Pade table (e.g. Walsh [64] proved an analogue of Theorem 4.3). Moreover, the Pade table can be viewed as a limiting version of Walsh arrays where best approximation is done on disks E_ε : |z| ≤ ε with ε → 0 (cf. [55], [63]).
Degree of Convergence of Best Approximants. For f ∈ A(E), we set
E_n(f) := \inf\{ \| f - p \|_E : p \in \Pi_n \}, e_n(f) := \inf\{ \| f - R \|_E : R \in \Pi_{n,n} \}.
Clearly, e_n(f) ≤ E_n(f) for all n, and so the essential question is: Can e_n(f) tend to zero substantially faster than E_n(f)? (Let's rule out the trivial situation where f itself is rational.) The now famous example of Newman [36] answered this affirmatively for f(x) = |x| on E : [-1,1], where E_n(f) \asymp 1/n, while e_n(f) \asymp e^{-\pi\sqrt{n}} (cf. [12], [58]). Another example of the contrast is readily accessible to the reader. Using a simple calculus argument, one shows that for the partial sums s_n(x) := \sum_{k=0}^{n} x^k/k! of e^x, there holds for the sup norm on [0,∞),
\limsup_{n\to\infty} \| e^{-x} - 1/s_n(x) \|_{[0,\infty)}^{1/n} \le 1/2.
Replacing x by (1+x)/(1-x) we see that for f(x) := exp[-(1+x)/(1-x)] and E : [-1,1],
\limsup_{n\to\infty} e_n(f)^{1/n} \le 1/2.
On the other hand, Theorem 3.6 asserts that
\limsup_{n\to\infty} E_n(f)^{1/n} = 1
because f is not analytic at x = 1.
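The geometric rate for 1/s_n(x) can be observed numerically. The sketch below estimates the sup norm of e^{-x} - 1/s_n(x) on [0,∞) by sampling a long finite interval, which is an approximation: the cutoff x_max = 60 and the step size are assumptions that are adequate for the small n used here because the error decays for large x.

```python
import math


def partial_sum_exp(n, x):
    """s_n(x) = sum_{k=0}^{n} x^k / k!."""
    term, total = 1.0, 1.0
    for k in range(1, n + 1):
        term *= x / k
        total += term
    return total


def sup_error(n, x_max=60.0, step=0.01):
    """Grid estimate of sup over [0, x_max] of |e^{-x} - 1/s_n(x)|."""
    count = int(round(x_max / step))
    return max(
        abs(math.exp(-i * step) - 1.0 / partial_sum_exp(n, i * step))
        for i in range(count + 1)
    )


# Columns: n, estimated sup norm, its n-th root.
for n in (2, 4, 8):
    err = sup_error(n)
    print(n, err, err ** (1.0 / n))
```

The n-th roots in the last column stay below 1/2 for these n, in keeping with the lim sup estimate above.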
At present, there is no simple characterization of the functions f
for which e_n(f) ≪ E_n(f). However, some special classes of functions have
been investigated in this direction. For example, Goncar [23] has obtained the
precise geometric rate of convergence for rational approximation on an interval to
Stieltjes functions of the form (4.15). Several important results for classes
of real functions were obtained by Popov [40], Freud [16], and others. See also
the survey articles of Ganelius [19] and Newman [37] on the subject.
Analytic Continuation. A simple but important observation concerning the class Π_{n,n} of rational functions is that it is invariant under a bilinear transformation. Thus, unlike polynomials, rational functions can provide analytic continuations of functions to unbounded regions of the plane. The convergence properties of the diagonal Pade approximants to Stieltjes functions illustrate this point. Another example of the contrast is for Newman's example. It can be shown that the sequence of polynomials {p_n*} of best uniform approximation to f(x) = |x| on [-1,1] diverges on every continuum in C\[-1,1]; moreover (analogous to the Jentzsch theorem of §1), every point of [-1,1] is a limit point of zeros of the p_n* (cf. [7]). On the other hand, the best (real) rational approximants
R_n* to |x| out of Π^r_{n,n} have all their zeros and poles on the imaginary axis and satisfy (cf. [8])
\lim_{n\to\infty} R_n^*(z) = \begin{cases} z & \text{for } \mathrm{Re}\, z > 0, \\ -z & \text{for } \mathrm{Re}\, z < 0. \end{cases}
REFERENCES
1. V. M. Adamjan, D. Z. Arov, and M. G. Krein, "Analytic properties of Schmidt pairs for a Hankel operator and the generalized Schur-Takagi problem", Math. USSR Sbornik, 15 (1971), 31-73.
2. J. M. Anderson, "The Faber operator", In: Rational Approximation and Interpolation (P.R. Graves-Morris, E. B. Saff, and R. S. Varga, eds.), Lecture Notes in Math., Vol. 1105, Springer-Verlag, Berlin (1984), 1-10.
3. R. J. Arms and A. Edrei, "The Pade tables and continued fractions generated by totally positive sequences", In: Mathematical Essays Dedicated to A. J. Macintyre, Ohio University Press, Athens, Ohio (1970), 1-21.
4. G. A. Baker, Jr., Essentials of Pade Approximants, Academic Press, New York (1975).
5. G. A. Baker, Jr. and P. R. Graves-Morris, Pade Approximants Part I: Basic Theory, Encycl. of Math., Vol. 13, Cambridge Univ. Press, Cambridge (1981).
6. G. A. Baker, Jr. and P. R. Graves-Morris, Pade Approximants Part II: Extensions and Applications, Encycl. of Math., Vol. 14, Cambridge Univ. Press, Cambridge (1981).
7. H.-P. Blatt and E. B. Saff, "Behavior of zeros of polynomials of near best approximation", J. Approx. Theory, 46 (1986).
8. H.-P. Blatt, A. Iserles, and E. B. Saff, "Remarks on the behavior of zeros of best approximating polynomials and rational functions", (to appear).
9. D. Braess, Nonlinear Approximation, Springer-Verlag, Berlin, (to appear).
10. R. P. Brent, F. G. Gustavson, and D. Y. Yun, "Fast solution of Toeplitz systems of equations and computation of Pade approximants", Journal of Algorithms, 1 (1980), 259-295.
11. C. Brezinski, "The long history of continued fractions and Pade approximants", In: Pade Approximations and Applications (M. G. de Bruin, H. van Rossum, eds.), Lecture Notes in Math., Vol. 888, Springer-Verlag, Berlin (1981), 1-27.
12. A. P. Bulanov, "The asymptotics of the maximum deviation of |x| from rational functions", Mat. Sb. 76 (1968), 288-303.
13. J. H. Curtiss, "Faber polynomials and the Faber series", Amer. Math. Monthly, 78 (1971), 577-596.
14. P. Dienes, The Taylor Series, Dover, New York (1957).
15. G. Faber, "Uber polynomische Entwicklungen", Math. Ann. 57 (1903), 398-408.
16. G. Freud, "Uber die Approximation reeller Funktionen durch rationale gebrochene Funktionen", Acta Math. Acad. Sci. Hungar., 17 (1966), 313-324.
17. D. Gaier, Vorlesungen uber Approximation im Komplexen, Birkhauser Verlag, Basel (1980).
18. T. W. Gamelin, Uniform Algebras, Prentice-Hall, Englewood Cliffs, N.J. (1969).
19. T. Ganelius, W. K. Hayman and D. J. Newman, Lectures on Approximation and Value Distribution, Séminaire de Mathématiques Supérieures, Les Presses de l'Université de Montréal, Montréal, Canada (1982).
20. P. M. Gauthier, "On the possibility of rational approximation", In: Pade and Rational Approximation (E. B. Saff and R. S. Varga, eds.), Academic Press, New York (1977), 261-264.
21. K. O. Geddes and J. C. Mason, "Polynomial approximation by projections on the unit circle", SIAM J. Numer. Anal., 12 (1975), 111-120.
22. G. M. Golusin, Geometric Theory of Functions of a Complex Variable, Amer. Math. Soc., Vol. 26, Providence, R.I. (1969).
23. A. A. Goncar, "On the speed of rational approximation of some analytic functions", Math. USSR Sbornik, 34 (1978), 131-145.
24. W. B. Gragg, "The Padé table and its relation to certain algorithms of numerical analysis", SIAM Rev., 14 (1972), 1-62.
25. M. H. Gutknecht and L. N. Trefethen, "Nonuniqueness of best rational Chebyshev approximations on the unit disk", J. Approx. Theory, 39 (1983), 275-288.
26. P. Henrici, Applied and Computational Complex Analysis, Vol. I, John Wiley & Sons, New York (1974).
27. E. Hille, Analytic Function Theory (Introduction to Higher Mathematics, vol. II), Ginn and Co., Boston (1962).
28. N. S. Landkof, Foundations of Modern Potential Theory, Springer-Verlag, Berlin (1972).
29. D. S. Lubinsky, "Pade tables of entire functions of very slow and smooth growth", Constr. Approx. 1 (1985), 349-358.
30. D. S. Lubinsky, "Uniform convergence of rows of the Pade table for functions with smooth Maclaurin series coefficients", (to appear).
31. D. S. Lubinsky and E. B. Saff, "A dictionary of generalized Pade approximants", Institute for Constr. Math. Technical Report (1986), Univ. of South Fla.
32. A. P. Magnus, "Another theorem of convergence of complex weight Pade approximants", Pade Meeting, Luminy, France, 14-18 Oct. 1985.
33. A. Markoff, "Deux demonstrations de la convergence de certaines fractions continues", Acta Math., 19 (1895), 93-104.
34. G. Meinardus, Approximation of Functions: Theory and Numerical Methods, Springer-Verlag, Berlin (1967).
35. S. N. Mergelyan, "On the representation of functions by series of polynomials on closed sets", (Russian) Dokl. Akad. Nauk SSSR, 78, 405-408; Translations Amer. Math. Soc. No. 85 (1953).
36. D. J. Newman, "Rational approximation to |x| ", Michigan Math. J., 11 (1964), 11-14.
37. D. J. Newman, Approximation with Rational Functions, Regional Conference Series in Math, Vol. 41, Amer. Math. Soc, Providence, R.I. (1979).
38. J. Nuttall and C. J. Wherry, "Gaussian integration for complex weight Pade functions", J. Inst. Math. Appl., 21 (1978), 165-170.
39. O. Perron, Die Lehre von den Kettenbrüchen, Chelsea Pub. Co., New York (1929).
40. V. A. Popov, "Uniform rational approximation of the class V and its applications", Acta Math. Acad. Sci. Hungar, 29 (1977), 119-129.
41. W. Rudin, Real and Complex Analysis, McGraw-Hill, New York (1974).
42. A. Ruttan, "On the cardinality of a set of best complex rational approximations to a real function", In: Pade and Rational Approximations (E. B. Saff and R. S. Varga, eds.), Academic Press, New York (1977), 303-319.
43. E. B. Saff, "An extension of Montessus de Ballore's theorem on the convergence of interpolating rational functions", Journ. Approx. Theory, 6 (1972), 63-67.
44. E. B. Saff and R. S. Varga, "Nonuniqueness of best complex rational approximation to real functions on real intervals", J. Approx. Theory, 23 (1978), 78-85.
45. E. B. Saff and R. S. Varga, "On the zeros and poles of Pade approximants to e^z, III", Numer. Math., 30 (1978), 241-266.
46. E. B. Saff, "An introduction to the convergence theory of Pade approximants", In: Aspects of Contemporary Complex Analysis (D. A. Brannan, J. G. Clunie, eds.), Academic Press, New York (1980), 493-502.
47. W. E. Sewell, Degree of Approximation by Polynomials in the Complex Domain, Ann. of Math. Studies No. 9, Princeton Univ. Press, Princeton, N.J. (1942).
48. H. S. Shapiro, Topics in Approximation Theory, Lecture Notes in Math., Vol. 187, Springer-Verlag, Berlin (1971).
49. V. I. Smirnov and N. A. Lebedev, Functions of a Complex Variable: Constructive Theory, Iliffe Books Ltd., London (1968).
50. H. Stahl, Beiträge zum Problem der Konvergenz von Padé-Approximierenden, Dissertation, Technische Universität Berlin (1976).
51. H. Stahl, "Orthogonal polynomials with complex valued weight function", I & II, Constr. Approx. (to appear).
52. G. Szegő, Orthogonal Polynomials, 3rd ed., Amer. Math. Soc. Colloq. Publ., Vol. 23, Amer. Math. Soc., Providence, R.I. (1967).
53. L. N. Trefethen, "Near-circularity of the error curve in complex Chebyshev approximation", J. Approx. Theory, 31 (1981), 344-367.
54. L. N. Trefethen, "Rational Chebyshev approximation on the unit disk", Numer. Math., 37 (1981), 297-320.
55. L. N. Trefethen and M. H. Gutknecht, "On convergence and degeneracy in rational Pade and Chebyshev approximation," SIAM J. Math. Anal., 16 (1985), 198-210.
56. L. N. Trefethen and M. H. Gutknecht, "The Caratheodory-Fejer method for real rational approximation", SIAM J. Numer. Anal., 20 (1983), 420-436.
57. M. Tsuji, Potential Theory in Modern Function Theory, Dover, New York (1959).
58. N. S. Vjaceslavov, "On uniform approximation of |x| by rational functions", Soviet Math. Dokl., 16 (1975), 100-104.
59. H. S. Wall, Analytic Theory of Continued Fractions, Van Nostrand, Princeton, N.J. (1948).
60. H. Wallin, "On the convergence theory of Pade approximants", In: Linear Operators and Approximation, ISNM Vol. 20, Birkhäuser, Basel (1972), 461-469.
61. J. L. Walsh, "Über die Entwicklung einer analytischen Funktion nach Polynomen", Math. Ann., 96 (1926), 430-436.
62. J. L. Walsh, Interpolation and Approximation by Rational Functions in the Complex Domain, 3rd ed., Amer. Math. Soc. Colloq. Publ., Vol. 20, Amer. Math. Soc., Providence, R.I. (1960).
63. J. L. Walsh, "Pade approximants as limits of rational functions of best approximation", J. Math. Mech., 13 (1964), 305-312.
64. J. L. Walsh, "The convergence of sequences of rational functions of best approximation with some free poles", In: Proc. Sympos. Approximation of Functions (General Motors Res. Lab., 1964), Elsevier, Amsterdam (1965), 1-16.
E. B. Saff Institute for Constructive Mathematics Department of Mathematics University of South Florida Tampa, Florida 33620
Proceedings of Symposia in Applied Mathematics Volume 36, 1986
N-WIDTHS AND OPTIMAL RECOVERY
A. PINKUS
ABSTRACT. These lecture notes are intended as a short introduction to the theory of n-widths, and to the theory of optimal recovery. Some simple examples are studied in detail in an attempt to explain and motivate the main ideas.
1. GENERAL INTRODUCTION. In this lecture I hope to whet the reader's
interest in both the theory of n-widths and the theory of optimal recovery.
As such, these notes are intended as an introduction to the subject matter,
and not as an overview or survey.
The twinning of the two topics of n-widths and optimal recovery in one
lecture is somewhat artificial. Nonetheless a relationship does exist in the
type of problems considered. Both subjects differ from the more classical
problems of approximation theory in that they are concerned with determining
optimal subspaces, operators, algorithms, or whatever, with which to
approximate elements of an a priori given set. However because we are dealing
with two topics we have divided these notes into two distinct parts. In
Section A we discuss n-widths, and in Section B optimal recovery. In each of
the sections we present a simple example and try, using the example, to
motivate some of the main concepts and ideas of the theory. Readers interested
in more comprehensive surveys are urged to consult references [3], [4], [6],
[7] and [9].
A. N-WIDTHS
2. INTRODUCTION. Perhaps "the" classic problem of approximation theory is the
following. Given an element x of a normed linear space X, and an n-dimensional
subspace $X_n$ of X, find a best approximation to x from $X_n$, and determine the
value of the error, i.e. the measure of the distance of x from a best
approximant. Thus, for example, if X = H is a Hilbert space over $\mathbb{C}$ with inner product $(\cdot,\cdot)$, and $X_n$ is spanned by $v_1,\dots,v_n$ which, for convenience, we
1980 Mathematics Subject Classification. 41A46, 41A65. © 1986 American Mathematical Society
0160-7634/86 $1.00 + $.25 per page
http://dx.doi.org/10.1090/psapm/036/864365
assume to be an orthonormal basis for $X_n$, then $\sum_{i=1}^{n} (x,v_i)\, v_i$ is the unique
best approximant to x from $X_n$. The "error", generally denoted as $E(x;X_n)$, may
be expressed by
$$E(x;X_n) = \Big[\, \|x\|^2 - \sum_{i=1}^{n} |(x,v_i)|^2 \Big]^{1/2}.$$
Very often one is not so much interested in approximating a given element
of X, but in approximating a subset A of X. By this we mean determining
$$E(A;X_n) = \sup_{x \in A} \inf_{y \in X_n} \|x - y\|.$$
There are many reasons for considering such a quantity. One is often not
interested in a best approximant or in the specific error obtained, but in
measuring the error in terms of some other criteria such as, for example,
smoothness. We consider an example of such a problem.
We denote by $W_2^{(r)}[0,2\pi]$ the (Sobolev) space of real-valued $2\pi$-periodic
functions on $\mathbb{R}$, for which $f^{(r-1)}$ is absolutely continuous, and whose rth derivative on $[0,2\pi]$ exists as a function of $L^2[0,2\pi]$. Set $X = L^2[0,2\pi]$, and
$$A \ (= \tilde{B}_2^{(r)}) = \{ f : f \in W_2^{(r)}[0,2\pi],\ \|f^{(r)}\|_2 \le 1 \}.$$
Let $T_n$ denote the (2n+1)-dimensional subspace of trigonometric polynomials
of degree n, i.e.
$$T_n = \mathrm{span}\{ 1, \sin x, \cos x, \dots, \sin nx, \cos nx \}.$$
It is not at all difficult to prove that $E(A;T_n) = (n+1)^{-r}$ for all
non-negative integers n and r.
A more common, but totally equivalent way of stating this result is the
following. For every $f \in W_2^{(r)}[0,2\pi]$,
$$E(f;T_n) \le (n+1)^{-r}\, \|f^{(r)}\|_2,$$
and $(n+1)^{-r}$ is the best constant in the above inequality. It is often to
obtain inequalities of this form, with best constants, that we study the
quantities $E(A;X_n)$.
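The inequality above can be traced to Parseval's identity. The following derivation is a sketch added for illustration, for $r \ge 1$; the Fourier coefficient notation $a_k, b_k$ is an assumption and does not appear in the notes.

```latex
% Assumed notation: f(x) = a_0/2 + \sum_{k \ge 1} (a_k \cos kx + b_k \sin kx).
% The best L^2 approximant from T_n is the n-th partial sum of this series.
\begin{align*}
E(f;T_n)^2 &= \pi \sum_{k>n} (a_k^2 + b_k^2) \\
           &\le (n+1)^{-2r}\, \pi \sum_{k>n} k^{2r} (a_k^2 + b_k^2) \\
           &\le (n+1)^{-2r}\, \|f^{(r)}\|_2^2 .
\end{align*}
% Equality holds for f(x) = \cos (n+1)x, so the constant (n+1)^{-r} is best possible.
```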
As above, we assume that X is a normed linear space and A a subset of X.
For each n-dimensional subspace $X_n$, we have the associated quantity $E(A;X_n)$. In
1936, Kolmogorov [1] proposed the following idea. Instead of considering $E(A;X_n)$
for different but specific $X_n$, let us vary $E(A;X_n)$ over all n-dimensional
subspaces $X_n$ of X. We then search for n-dimensional subspaces (if they exist)
which best approximate A, and also for the associated minimum value of $E(A;X_n)$.
To state this in more precise mathematical terms, we have
DEFINITION 1. X is a normed linear space and A a subset of X. The n-width
of Kolmogorov of A in X is given by
$$d_n(A;X) = \inf_{X_n} E(A;X_n) = \inf_{X_n} \sup_{x \in A} \inf_{y \in X_n} \|x - y\|,$$
where the left-most infimum is taken over all n-dimensional subspaces $X_n$ of X.
We would like, if possible, to identify n-dimensional subspaces $X_n$ of X
for which $d_n(A;X) = E(A;X_n)$. Such subspaces are quite naturally said to be
optimal for $d_n(A;X)$.
The quantity $d_n(A;X)$ measures the extent to which A may be approximated by
n-dimensional subspaces of X. It is not only that $d_n(A;X)$ is an interesting
theoretical quantity (and it is), but also that knowledge of it can help us in
other problems. For example, suppose that while we may or may not know $d_n(A;X)$
precisely, we do know something about its asymptotic behaviour as $n \to \infty$.
$d_n(A;X)$ is a lower bound on the extent to which A is approximable by n-dimensional
subspaces. As such, if we have a given sequence $\{X_n\}$ of n-dimensional
subspaces and estimates for $E(A;X_n)$, then it is possible to judge whether it is
worthwhile spending energy, time, and money, in using better but more
complicated subspaces in our approximation process.
In the above general framework, very little of interest can be said. But
let us consider a specific example in detail. Before doing so we remark that
many other n-width concepts now abound in the literature. We will touch upon
some of these in the next few pages.
3. EXAMPLE. Set $X = L^\infty[0,1]$. The (Sobolev) space $W_\infty^{(1)}[0,1]$ is the set of
absolutely continuous real-valued functions defined on [0,1] for which f' exists
a.e. as an element of $L^\infty[0,1]$. We define
$$(A =)\ B_\infty^{(1)} = \{ f : f \in W_\infty^{(1)}[0,1],\ \|f'\|_\infty \le 1 \},$$
and we are interested in the quantity $d_n(B_\infty^{(1)};L^\infty)$.
A natural approximating subspace to consider is $X_n = \pi_{n-1}$, where $\pi_{n-1}$ is the set of algebraic polynomials of degree n-1 (dimension n). The quantity
$E(B_\infty^{(1)};\pi_{n-1})$ has been much studied and although an exact formula is not known
to the best of my knowledge, there does exist from Jackson's Theorem (see e.g.
[8, p. 22]) the upper bound $E(B_\infty^{(1)};\pi_{n-1}) \le 3/(n-1)$.
We now consider an even more elementary subspace. Let $S_n$ denote the subspace
(of dimension n) of left-continuous step functions with jumps at i/n, i = 1,...,n-1.
Thus $s \in S_n$ if $s(x) = c_i$ on ((i-1)/n, i/n], i = 1,...,n, for some choice of real
constants $\{c_i\}_{i=1}^{n}$. (Modify the first interval to include the point zero.) We
claim
PROPOSITION 1. $E(B_\infty^{(1)};S_n) = 1/(2n)$, n = 1,2,... .
PROOF. For $f \in B_\infty^{(1)}$, let $s_f \in S_n$ be uniquely defined by $s_f((2i-1)/2n) = f((2i-1)/2n)$, i = 1,...,n. We claim that $\|f - s_f\|_\infty \le 1/2n$. For $x \in [0,1]$, we have $x \in ((j-1)/n, j/n]$ for some j = 1,...,n (recall that x = 0 is in the first interval). Thus $f(x) - s_f(x) = f(x) - f((2j-1)/2n)$. Since $\|f'\|_\infty \le 1$, it follows that
$$|f(x) - f((2j-1)/2n)| \le |x - (2j-1)/2n| \le 1/2n.$$
Thus $\|f - s_f\|_\infty \le 1/2n$. To prove the desired equality, it remains to find an $f^* \in B_\infty^{(1)}$ for which $E(f^*;S_n) = 1/2n$. Set $f^*(x) = x$. Then it is easily seen that $E(f^*;S_n) = 1/2n$, and $s_{f^*}$ is in fact the unique best approximation to $f^*$ from $S_n$. This proves the proposition. □
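The interpolation step in this proof is easy to try numerically. The sketch below is an illustration only: the names `step_interpolant` and `sup_error` are hypothetical, NumPy is assumed, and the sup norm is estimated on a uniform grid rather than computed exactly.

```python
import numpy as np

def step_interpolant(f, n):
    """Return s_f: the step function on [0,1] with jumps at i/n that
    matches f at the midpoints (2i-1)/(2n), i = 1..n."""
    mids = (2 * np.arange(1, n + 1) - 1) / (2 * n)
    values = f(mids)

    def s(x):
        x = np.asarray(x, dtype=float)
        # x in ((j-1)/n, j/n] maps to j; x = 0 is placed in the first interval
        j = np.clip(np.ceil(x * n).astype(int), 1, n)
        return values[j - 1]

    return s

def sup_error(f, n, grid_size=20001):
    """Grid estimate (not an exact value) of ||f - s_f||_inf on [0,1]."""
    x = np.linspace(0.0, 1.0, grid_size)
    return float(np.max(np.abs(f(x) - step_interpolant(f, n)(x))))

identity_error = sup_error(lambda x: x, 5)   # extremal function f*(x) = x
sine_error = sup_error(np.sin, 8)            # another function with |f'| <= 1
```

For $f^*(x) = x$ the estimate should agree with 1/(2n), while for other functions in the class it should not exceed that value.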
We claim that $d_n(B_\infty^{(1)};L^\infty) = 1/2n$, for n = 1,2,... . ($d_0(B_\infty^{(1)};L^\infty) = \infty$,
since every constant function is in $B_\infty^{(1)}$.) From Proposition 1 we have $d_n(B_\infty^{(1)};L^\infty) \le 1/2n$. It is therefore necessary to prove the lower bound, i.e. $E(B_\infty^{(1)};X_n) \ge 1/2n$ for every n-dimensional subspace $X_n$ of $L^\infty[0,1]$. This problem is non-linear in nature and is generally the more difficult. In the proof of this result we use the following general theorem which we will not prove.
THEOREM 2 [2]. Let X be a normed linear space. Let $X_{n+1}$ and $X_n$ be (n+1)- and n-dimensional subspaces of X, respectively. There then exists an $x \in X_{n+1} \setminus \{0\}$ for which
$$E(x;X_n) = \|x\|,$$
i.e. the zero element is a best approximation to x from $X_n$.
This theorem is often used in obtaining lower bounds for n-widths. Let us see how.
Let $L_{n+1}$ denote the (n+1)-dimensional subspace of continuous functions on [0,1] which are linear on [(i-1)/n, i/n], i = 1,...,n. The key to the lower bound is Theorem 2 and this next result.
PROPOSITION 3. If $f \in L_{n+1}$ and $\|f\|_\infty \le 1/2n$, then $f \in B_\infty^{(1)}$.
PROOF. Obviously $L_{n+1} \subset W_\infty^{(1)}$. We must therefore prove that if $f \in L_{n+1}$ and $\|f\|_\infty \le 1/2n$, then $\|f'\|_\infty \le 1$. Assume that $f \in L_{n+1}$ and $\|f\|_\infty \le 1/2n$. Then $|f(i/n)| \le 1/2n$, i = 0,1,...,n. Since f is linear on [(i-1)/n, i/n], i = 1,...,n, then for $x \in ((i-1)/n, i/n)$,
$$|f'(x)| = n\, |f(i/n) - f((i-1)/n)| \le 1.$$ □
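The slope bound in Proposition 3 can be checked on examples. This is a small illustrative sketch, assuming NumPy; the helper `max_slope` and the sawtooth test function are not from the notes.

```python
import numpy as np

def max_slope(nodal_values):
    """||f'||_inf for the piecewise linear f on [0,1] with knots i/n and
    f(i/n) = nodal_values[i], i = 0..n."""
    v = np.asarray(nodal_values, dtype=float)
    n = len(v) - 1
    return float(n * np.max(np.abs(np.diff(v))))

n = 5
# Sawtooth alternating between +1/(2n) and -1/(2n): the extremal case.
sawtooth = np.array([(-1) ** i for i in range(n + 1)]) / (2 * n)
extremal_slope = max_slope(sawtooth)
```

The sawtooth attains slope 1 in absolute value, which suggests that the radius 1/2n in Proposition 3 cannot be enlarged for this particular subspace.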
PROPOSITION 4. If $X_n$ is any n-dimensional subspace of $L^\infty[0,1]$, then
$$E(B_\infty^{(1)};X_n) \ge 1/2n.$$
PROOF. Let $L_{n+1}$ be as above, and $X_n$ be any n-dimensional subspace of $L^\infty[0,1]$. Then from Theorem 2 there exists a non-zero $f \in L_{n+1}$, which we normalize
so that $\|f\|_\infty = 1/2n$, satisfying
$$E(f;X_n) = \|f\|_\infty = 1/2n.$$
From Proposition 3, $f \in B_\infty^{(1)}$. Thus
$$E(B_\infty^{(1)};X_n) \ge 1/2n.$$ □
To summarize, we have proved the following result.
THEOREM 5. Let $B_\infty^{(1)}$ be as previously defined. Then $d_n(B_\infty^{(1)};L^\infty) = 1/2n$,
n = 1,2,... . Furthermore $S_n$, the n-dimensional subspace of step functions
with jumps at i/n, i = 1,...,n-1, is an optimal subspace for $d_n(B_\infty^{(1)};L^\infty)$.
REMARK. Before proving Theorem 5 we noted that $E(B_\infty^{(1)};\pi_{n-1}) \le 3/(n-1)$. While we do not know if algebraic polynomials of degree n-1 are optimal for
$d_n(B_\infty^{(1)};L^\infty)$, it does follow from the above inequality that they are at least
asymptotically optimal in the sense that both quantities decrease to zero at
the same rate.
4. OTHER N-WIDTHS. Theorem 5 is a special case of a more general result. We
defer the statement of the general result to the next section. We will now use
the above example and its proof to motivate additional problems and definitions.
4.A. LINEAR N-WIDTH. In the proof of Proposition 1 we obtained the upper bound
constant 1/2n by the simple linear process of interpolating from $S_n$ to each
$f \in B_\infty^{(1)}$ at (2i-1)/2n, i = 1,...,n. That is, we did not calculate the best
approximation to each $f \in B_\infty^{(1)}$ from $S_n$. We calculated instead a linear
approximation. This suffices because the quantity $E(A;X_n)$ is a "worst case" measure.
It is not necessary when calculating $E(A;X_n)$ that we actually determine $E(x;X_n)$
for each $x \in A$.
In a Hilbert space setting the best approximation is an orthogonal
projection and is therefore a linear approximation. This is no longer the case
in a non-Hilbert space setting, for there the best approximation operator is a
nonlinear operator which is generally exceedingly difficult to determine exactly.
Linear approximations are easier to calculate, and are of interest in and of
themselves. Let us therefore consider linear approximations rather than best
approximations. In other words, we replace $E(A;X_n)$ by
$$E(A;P_n;X_n) = \sup_{x \in A} \|x - P_n x\|,$$
where $P_n$ is a continuous linear operator from X to $X_n$. ($P_n$ is said to be of
rank n if its range space is of dimension n.)
Analogous to the Kolmogorov n-width we now define what is termed the
linear n-width of A in X.
DEFINITION 2. X is a normed linear space and A a subset of X. The linear
n-width of A in X is defined by
$$\delta_n(A;X) = \inf_{P_n} \sup_{x \in A} \|x - P_n x\|,$$
where the infimum is taken over all continuous linear operators of rank at
most n.
If $\delta_n(A;X) = E(A;P_n^*;X_n^*)$ where $P_n^*$ is a continuous linear operator of rank
$\le n$, then $P_n^*$ is said to be optimal for $\delta_n$.
From our definitions, it follows that $d_n \le \delta_n$. If X is a Hilbert space,
then $d_n = \delta_n$. In general $\delta_n$ is an easier quantity to determine. When $d_n$ and $\delta_n$ are unequal, then serious problems generally arise in the computation of the
former. Both $d_n$ and $\delta_n$ depend on "worst case" situations. As such they may be
equal even in a non-Hilbert space setting. This is true of the example of the
previous section.
THEOREM 6. Let $B_\infty^{(1)}$ and $S_n$ be as previously defined. Then $\delta_n(B_\infty^{(1)};L^\infty) = 1/2n$,
n = 1,2,... . Furthermore $P_n$, the rank n linear operator defined by
interpolating from $S_n$ to each $f \in B_\infty^{(1)}$ at (2i-1)/2n, i = 1,...,n, is optimal
for $\delta_n(B_\infty^{(1)};L^\infty)$.
4.B. BERNSTEIN N-WIDTH. Let us examine, in our example, the proof of the lower
bound using Theorem 2. The following idea was used. Assume $X_{n+1}$ is an
(n+1)-dimensional subspace of X, and let $S(X_{n+1})$ denote the unit ball of $X_{n+1}$. If
$\lambda S(X_{n+1}) \subseteq A$, then from Theorem 2 (see Proposition 4), $d_n(A;X) \ge \lambda$. This
technique for obtaining lower bounds for $d_n$ has been so often used that it has
been codified.
DEFINITION 3. Let X be a normed linear space and A a closed, convex,
centrally symmetric ($x \in A$ implies $-x \in A$) subset of X. The Bernstein n-width
of A in X is defined by
$$b_n(A;X) = \sup_{X_{n+1}} \sup \{ \lambda : \lambda S(X_{n+1}) \subseteq A \},$$
where the $X_{n+1}$ range over all subspaces of X of dimension n+1.
Thus $d_n(A;X) \ge b_n(A;X)$. The quantity $b_n$ is often of interest, other than
simply as a lower bound for $d_n$. In our example we proved that $b_n(B_\infty^{(1)};L^\infty) = 1/2n$.
Restating this we may write
$$\sup_{f \in X_{n+1}} \|f'\|_\infty / \|f\|_\infty \ge 2n, \quad n = 1,2,\dots$$
for any (n+1)-dimensional subspace $X_{n+1}$ of $W_\infty^{(1)}$. From Proposition 3, equality
holds with $X_{n+1} = L_{n+1}$.
This constant 2n is, in some sense, a smallest "smoothness constant"
relating the size of f' with that of f. This fact should be considered in the
light of Markov's inequality. The well-known Markov's inequality for algebraic
polynomials of degree n states that if $p \in \pi_n$, then on [0,1]
$$\|p'\|_\infty \le 2n^2\, \|p\|_\infty,$$
and equality is attained (by the Chebyshev polynomial of the first kind).
Equivalently we may write
$$\sup_{p \in \pi_n} \|p'\|_\infty / \|p\|_\infty = 2n^2.$$
Algebraic polynomials are thus far from being optimal in the above sense.
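The Markov constant $2n^2$ on [0,1] can be observed numerically for the Chebyshev polynomial shifted to that interval. The sketch below is an illustration under assumptions: it uses `numpy.polynomial.chebyshev`, the function name `markov_ratio` is hypothetical, and the norms are grid estimates.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def markov_ratio(n, grid_size=200001):
    """Grid estimate of ||p'||_inf / ||p||_inf on [0,1] for p(x) = T_n(2x - 1)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                          # Chebyshev series of T_n
    dcoeffs = C.chebder(coeffs)              # Chebyshev series of T_n'
    t = np.linspace(-1.0, 1.0, grid_size)    # t = 2x - 1 runs over [-1,1]
    p = C.chebval(t, coeffs)
    dp = 2.0 * C.chebval(t, dcoeffs)         # chain rule: d/dx T_n(2x-1) = 2 T_n'(2x-1)
    return float(np.max(np.abs(dp)) / np.max(np.abs(p)))

ratio = markov_ratio(4)   # compare with 2 * 4**2 = 32 for polynomials and 2 * 4 = 8 for L_{n+1}
```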
4.C. GEL'FAND N-WIDTH. There is one other n-width concept which we wish to
introduce and it is the Gel'fand n-width of A in X. It is related to the
Kolmogorov n-width via a duality relationship which we will not discuss. It is
considered again in Section 6 and will reappear when dealing with optimal
recovery problems. The Gel'fand n-width is defined as follows.
DEFINITION 4. X is a normed linear space and A is a closed, convex,
centrally symmetric subset of X. The Gel'fand n-width of A in X is given by
$$d^n(A;X) = \inf_{L^n} \sup_{x \in A \cap L^n} \|x\|,$$
where $L^n$ varies over all subspaces of X of codimension n.
A subspace $L^n$ is said to be of codimension n if there exist n linearly
independent linear functionals $x_i^* \in X^*$ (the continuous dual of X), i = 1,...,n,
such that $L^n = \{ x : x_i^*(x) = 0,\ i = 1,\dots,n \}$.
There is no a priori relationship between $d_n(A;X)$ and $d^n(A;X)$. Either may
be larger. Perhaps surprisingly they are often equal. There is a simple reason
for this and it is that the inequalities $\delta_n(A;X) \ge d^n(A;X) \ge b_n(A;X)$ always
hold. The inequality $\delta_n(A;X) \ge d^n(A;X)$ follows from the fact that every rank
n continuous linear operator $P_n$ may be written in the form
$$P_n x = \sum_{i=1}^{n} x_i^*(x)\, x_i,$$
where $x_i^* \in X^*$, i = 1,...,n, and the $\{x_i\}_{i=1}^{n}$ span the range of $P_n$. Thus
$$\sup_{x \in A} \|x - P_n x\| \ge \sup_{\substack{x \in A \\ x_i^*(x) = 0,\ i = 1,\dots,n}} \|x\|,$$
from which follows the desired inequality. The inequality $d^n(A;X) \ge b_n(A;X)$ is
even simpler to prove. For every (n+1)-dimensional subspace $X_{n+1}$ of X, and for
every subspace $L^n$ of codimension n of X, $X_{n+1} \cap L^n \ne \{0\}$. Thus if $\lambda S(X_{n+1}) \subseteq A$,
then there exists an $x \in A \cap L^n$ for which $\|x\| = \lambda$. Hence $d^n(A;X) \ge \lambda$, and the
inequality follows. Thus in our previously considered example we necessarily
have $d^n(B_\infty^{(1)};L^\infty) = 1/2n$.
5. N-WIDTHS OF SOBOLEV SPACES. As stated earlier, the results of the previous
two sections may be considered as particular cases of a more general theorem.
We present this generalization and the remarks thereafter in an attempt to give
the reader a taste for the type of research done in this subject.
The Sobolev space $W_p^{(r)}[0,1]$ for $p \in [1,\infty]$, r a positive integer, is defined
as follows:
$$W_p^{(r)}[0,1] = \{ f : f^{(r-1)} \text{ abs. cont.},\ f^{(r)} \in L^p[0,1] \}.$$
Let $B_p^{(r)} = \{ f : f \in W_p^{(r)}[0,1],\ \|f^{(r)}\|_p \le 1 \}$, and for a nonnegative integer m,
set
$$x_+^m = \begin{cases} 0, & x < 0 \\ x^m, & x \ge 0. \end{cases}$$
THEOREM 7 [7]. Fix $p \in [1,\infty]$ and r a positive integer. Then for $n \ge r$,
$$d_n(B_p^{(r)};L^p) = d^n(B_p^{(r)};L^p) = \delta_n(B_p^{(r)};L^p) = b_n(B_p^{(r)};L^p).$$
Furthermore,
1) $X_n^* = \mathrm{span}\{ 1, x, \dots, x^{r-1}, (x-\xi_1)_+^{r-1}, \dots, (x-\xi_{n-r})_+^{r-1} \}$ is an optimal
subspace for $d_n(B_p^{(r)};L^p)$ for some choice of $0 < \xi_1 < \dots < \xi_{n-r} < 1$.
2) $L^n = \{ f : f(\eta_i) = 0,\ i = 1,\dots,n \}$ is an optimal subspace for $d^n(B_p^{(r)};L^p)$
for some choice of $0 < \eta_1 < \dots < \eta_n < 1$.
3) The rank n linear operator $P_n$ defined by interpolation from $X_n^*$ to $f \in B_p^{(r)}$ at
the $\{\eta_i\}_{i=1}^{n}$ is optimal for $\delta_n(B_p^{(r)};L^p)$.
It is also possible to identify an optimal subspace for $b_n(B_p^{(r)};L^p)$.
REMARK. The proof of Theorem 7 is far beyond the scope of this lecture.
However to give an inkling as to how $X_n^*$ arises, recall Taylor's formula with
remainder in integral form. $B_p^{(r)}$ is the set of functions
$$f(x) = \sum_{i=0}^{r-1} a_i x^i + \frac{1}{(r-1)!} \int_0^1 (x-y)_+^{r-1}\, h(y)\, dy,$$
where $\|h\|_p \le 1$ ($h = f^{(r)}$ and $a_i = f^{(i)}(0)/i!$, i = 0,1,...,r-1). The subspace
$X_n^*$ is the span of the first r monomial terms (which must appear since there is
no restriction on their coefficients) and the kernel $(x-y)_+^{r-1}$ evaluated at n-r
distinct points.
REMARK. Theorem 7 is also valid in certain mixed norm cases. Consider the
n-widths of $B_p^{(r)}$ in $L^q$, where p and q are arbitrary numbers in $[1,\infty]$. If $p = \infty$
or q = 1, then Theorem 7 holds (except that $b_n(B_p^{(r)};L^q)$ is unknown). It is
conjectured that Theorem 7 is valid (except for $b_n$) for all $p \ge q$. For p < q,
the situation is considerably more involved. No exact results are known and it
was only some years ago that the asymptotic behaviour of each of the n-widths
was determined. They do not all behave asymptotically in the same manner.
6. N-WIDTHS AS GENERALIZATIONS OF S-NUMBERS. Let T be a compact linear
operator mapping X into itself. Set
$$A = \{ Tx : \|x\| \le 1 \}.$$
The choice of A as the image of the unit ball under a linear map is a common
choice in the theory of n-widths. $B_p^{(r)}$ is of this form, aside from the free
r-dimensional polynomial subspace.
Assume for the moment that X = H is a Hilbert space with inner product
$(\cdot,\cdot)$. Associated with T is its adjoint $T^*$. The compact maps $T^*T$ and $TT^*$ are
self-adjoint and non-negative. They possess the same eigenvalues $\{\lambda_n(T)\}$,
n = 0,1,..., (given in non-increasing order of magnitude) which are all
non-negative numbers. The values $s_n(T) = [\lambda_n(T)]^{1/2}$ are called the s-numbers,
or singular values, of T. Functional analysts who study n-widths generally
regard them as generalizations of s-numbers, see e.g. Pietsch [5]. Let us
explain why.
There exist many well-known characterizations for the s-numbers of T. Thus
the "max-min" characterization is given by
$$s_n(T) = \sup_{X_{n+1}} \inf_{\substack{x \in X_{n+1} \\ x \ne 0}} \left[ \frac{(T^*Tx,x)}{(x,x)} \right]^{1/2} = \sup_{X_{n+1}} \inf_{\substack{x \in X_{n+1} \\ x \ne 0}} \frac{\|Tx\|}{\|x\|},$$
where $X_{n+1}$ varies over all subspaces of H of dimension n+1. But this last
quantity is simply a restatement of the definition of the Bernstein n-width
$b_n(A;H)$ for this choice of A. Thus $b_n(A;H) = s_n(T)$. In a totally analogous
manner, it may be seen that the classical "min-max" characterization of $s_n(T)$
given by
$$s_n(T) = \inf_{L^n} \sup_{\substack{x \in L^n \\ x \ne 0}} \left[ \frac{(T^*Tx,x)}{(x,x)} \right]^{1/2},$$
where the $L^n$ vary over all subspaces of H of codimension n, is essentially
the Gel'fand n-width $d^n(A;H)$. In this same vein $\delta_n(A;H) = s_n(T)$, since the
definition of $\delta_n(A;H)$ corresponds to the classical singular value decomposition
of T (and $d_n(A;H) = \delta_n(A;H)$ since we are in a Hilbert space). Thus $d_n(A;H) =
d^n(A;H) = \delta_n(A;H) = b_n(A;H) = s_n(T)$.
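In finite dimensions the identity $\delta_n(A;H) = s_n(T)$ can be observed with a singular value decomposition. The following is a minimal sketch, assuming NumPy and a randomly generated real matrix standing in for T; the function name `projection_error` is hypothetical. Indices are zero-based, as in the text ($s_0 \ge s_1 \ge \dots$).

```python
import numpy as np

def projection_error(T, n):
    """Return sup{ ||Tx - P Tx|| : ||x|| <= 1 } and the singular values of T,
    where P is the orthogonal projection onto the span of the first n left
    singular vectors of T."""
    U, s, Vt = np.linalg.svd(T)
    P = U[:, :n] @ U[:, :n].T            # rank-n orthogonal projection
    return float(np.linalg.norm(T - P @ T, 2)), s

rng = np.random.default_rng(0)
T = rng.standard_normal((6, 6))          # illustrative stand-in for the operator T
err, s = projection_error(T, 2)          # expected to agree with s[2]
```

This only exhibits one rank n operator attaining the value $s_n(T)$; that no rank n operator does better is the content of the characterizations quoted above.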
B. OPTIMAL RECOVERY
7. INTRODUCTION. Let X be a normed linear space and A a subset of X. In a
very general sense optimal recovery is concerned with the problem of estimating,
in as efficient a manner as possible, some specific information about elements
of A based on a number of given pieces of information.
Many problems fall into this wide setting. Before presenting a general
framework, let us consider some specific examples.
8. EXAMPLES.
8.A. RECOVERY OF A FUNCTIONAL. Let $X = L^\infty[0,1]$ and $A = B_\infty^{(1)}$ (see Section 3).
Assume that for each $f \in B_\infty^{(1)}$ we are given the values $f(x_i)$, i = 1,...,n, for some fixed $0 \le x_1 < \dots < x_n \le 1$. For convenience we set $I(f) = (f(x_1),\dots,f(x_n)) \in \mathbb{R}^n$,
and call I(f) the information vector. Let $y \in [0,1]$, fixed. The problem we
consider is that of optimally reconstructing f(y), for $f \in B_\infty^{(1)}$, based only on
the data I(f).
Any function T which maps I(f) to $\mathbb{R}$ is called an algorithm. The error of
the algorithm T is defined by
$$E(T) = \sup \{ |f(y) - T(I(f))| : f \in B_\infty^{(1)} \}.$$
The value
$$E^* = \inf \{ E(T) : T \}$$
is the intrinsic error in our problem. If $E^* = E(T^*)$ for some algorithm $T^*$, then
we say that $T^*$ is an optimal algorithm or provides for an optimal recovery of
f(y). The problem is to find $E^*$ and an optimal algorithm $T^*$.
An important tool in the solution of this problem is the following simple
lower bound for $E^*$.
PROPOSITION 8. $E^* \ge \sup \{ |f(y)| : f \in B_\infty^{(1)},\ I(f) = 0 \}$.
PROOF. Let $f \in B_\infty^{(1)}$ with I(f) = 0. Since $-f \in B_\infty^{(1)}$ and I(-f) = 0, it follows
that for every algorithm T,
$$E(T) \ge \max \{ |f(y) - T(0)|,\ |-f(y) - T(0)| \} \ge ( |f(y) - T(0)| + |f(y) + T(0)| )/2 \ge |f(y)|.$$
The claim now follows. □
We will prove that equality holds, calculate $E^*$, and identify an optimal
algorithm.
PROPOSITION 9. $E^* = \min \{ |y - x_i| : i = 1,\dots,n \}$. Furthermore, if
$|y - x_j| = \min \{ |y - x_i| : i = 1,\dots,n \}$, then the algorithm $T_y$ defined by
$T_y(I(f)) = f(x_j)$ is optimal.
PROOF. The function $f^*(x) = \min \{ |x - x_i| : i = 1,\dots,n \}$ is in $B_\infty^{(1)}$ and
satisfies $f^*(x_i) = 0$, i = 1,...,n. Thus from Proposition 8, $E^* \ge f^*(y) =
\min \{ |y - x_i| : i = 1,\dots,n \}$. For $T_y$ as above,
$$E^* \le E(T_y) = \sup \{ |f(y) - f(x_j)| : f \in B_\infty^{(1)} \} = |y - x_j| \le E^*.$$ □
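The algorithm of Proposition 9 amounts to returning the sample nearest to y. A small sketch follows, assuming NumPy; the function names and the particular nodes are illustrative choices, not taken from the notes.

```python
import numpy as np

def f_star(x, nodes):
    """f*(x) = min_i |x - x_i|: Lipschitz with constant 1 and zero at every node."""
    x = np.asarray(x, dtype=float)
    return np.min(np.abs(x[..., None] - nodes), axis=-1)

def nearest_sample_algorithm(info, nodes, y):
    """T_y: return the sampled value at the node closest to y."""
    j = int(np.argmin(np.abs(nodes - y)))
    return info[j]

nodes = np.array([0.1, 0.4, 0.9])
y = 0.6
lower_bound = float(f_star(y, nodes))                 # lower bound of Proposition 8
estimate = nearest_sample_algorithm(nodes, nodes, y)  # data of f(x) = x, so I(f) = nodes
```

Here the function f(x) = x attains the worst case: the estimate differs from f(y) by exactly the distance from y to the nearest node.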
8.B. RECOVERY OF A FUNCTION. Within this same framework, we change our
example somewhat. Assume that based on the same information vector I(f), we
are now interested in recovering not f(y), but the full function f on [0,1].
Our algorithms T are therefore functions from $\mathbb{R}^n$ to $L^\infty[0,1]$. As previously
$$E(T) = \sup \{ \|f - T(I(f))\|_\infty : f \in B_\infty^{(1)} \},$$
and $E^* = \inf \{ E(T) : T \}$.
Totally analogous to Proposition 8, we have
PROPOSITION 10. $E^* \ge \sup \{ \|f\|_\infty : f \in B_\infty^{(1)},\ I(f) = 0 \}$.
Thus, in particular, $E^* \ge \|f^*\|_\infty$ where $f^*$ is as defined in the proof of
Proposition 9. We will prove equality. To this end set $z_i = (x_i + x_{i+1})/2$,
i = 1,...,n-1, ($z_0 = 0$, $z_n = 1$). Let S denote the space of step functions
with jumps at $\{z_i\}_{i=1}^{n-1}$. For each $f \in B_\infty^{(1)}$, define $s_f \in S$ by $s_f(x_i) = f(x_i)$,
i = 1,...,n.
PROPOSITION 11. $E^* = \|f^*\|_\infty$. Furthermore, if $T^*$ is the algorithm given
by $T^*(I(f)) = s_f$, then $T^*$ is an optimal algorithm.
PROOF. For $T^*$ as above,
$$E^* \le E(T^*) = \sup \{ \|f - s_f\|_\infty : f \in B_\infty^{(1)} \}.$$
For $x \in (z_{i-1}, z_i]$, $|f(x) - s_f(x)| = |f(x) - f(x_i)| \le |x - x_i| = f^*(x)$. Thus
$$E^* \le E(T^*) \le \|f^*\|_\infty \le E^*.$$ □
8.C. OPTIMAL INFORMATION. We again alter our problem. We assume that we wish,
as in (8.B), to recover $f \in B_\infty^{(1)}$ based on I(f). However, now we may choose, a
priori, the n information functionals which constitute I(f). For ease of
discussion, let us assume that we may choose n points $x_1,\dots,x_n$ in [0,1],
which appear in I(f), at which to sample f.
We exhibit this dependence on I by letting E(T,I) and $E^*(I)$ denote the
E(T) and $E^*$ of (8.B). We are therefore concerned with the problem of evaluating
$$\mathcal{E} = \inf_{I} E^*(I),$$
where I ranges over all information vectors of the form $I(f) = (f(x_1),\dots,f(x_n))$
with $0 \le x_1 < \dots < x_n \le 1$. Set $\mathbf{x} = (x_1,\dots,x_n)$, and let $f^*(x,\mathbf{x})$ denote the
$f^*(x)$ of (8.B). It is now a simple matter to prove
PROPOSITION 12. $\mathcal{E} = \inf \{ \|f^*(\cdot,\mathbf{x})\|_\infty : \mathbf{x} \} = 1/2n$. Furthermore, an
optimal $\mathbf{x}$ is given by $\mathbf{x}^* = (1/2n, 3/2n, \dots, (2n-1)/2n)$.
Before continuing note that the value obtained is exactly the value of
the n-widths of $B_\infty^{(1)}$ in $L^\infty$. This is not a coincidence. We will discuss the
connection in the next section.
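Proposition 12 can be explored numerically, since $\|f^*(\cdot,\mathbf{x})\|_\infty$ has a simple closed form in terms of the gaps between sample points. The sketch below assumes NumPy; `worst_case_error` is a hypothetical helper name.

```python
import numpy as np

def worst_case_error(nodes):
    """||f*(., x)||_inf for f*(t) = min_i |t - x_i| on [0,1]: the largest of
    the two end gaps and half of each interior gap."""
    x = np.sort(np.asarray(nodes, dtype=float))
    half_gaps = np.max(np.diff(x), initial=0.0) / 2.0
    return float(max(x[0], 1.0 - x[-1], half_gaps))

n = 4
midpoints = (2 * np.arange(1, n + 1) - 1) / (2 * n)   # the points x* of Proposition 12
best = worst_case_error(midpoints)                    # expected value 1/(2n)
```

Any other choice of n points should give a value at least as large, because the two end gaps and the n-1 interior gaps must add up to 1.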
8.D. OPTIMAL RECOVERY WITH ERROR. We now return to the problem, considered
in (8.A), of recovering f(y), for $f \in B_\infty^{(1)}$, based on $I(f) = (f(x_1),\dots,f(x_n))$ for
fixed $0 \le x_1 < \dots < x_n \le 1$. However, let us assume that we do not know I(f) exactly. Errors may occur in our calculation and rather than being given I(f),
we are given $w = (w_1,\dots,w_n)$ where $|w_i - f(x_i)| \le \varepsilon_i$, i = 1,...,n. The error bounds $\varepsilon_i \ge 0$ are given fixed values. We therefore define, for an algorithm T
mapping $\mathbb{R}^n$ to $\mathbb{R}$,
$$E(T;\varepsilon) = \sup \{ |f(y) - T(w)| : f \in B_\infty^{(1)},\ |w_i - f(x_i)| \le \varepsilon_i,\ i = 1,\dots,n \},$$
and $E^*(\varepsilon) = \inf \{ E(T;\varepsilon) : T \}$.
This next result is totally analogous to Proposition 8.
PROPOSITION 13. $E^*(\varepsilon) \ge \sup \{ |f(y)| : f \in B_\infty^{(1)},\ |f(x_i)| \le \varepsilon_i,\ i = 1,\dots,n \}$.
Set $f^*(x;\varepsilon) = \min \{ \varepsilon_i + |x - x_i| : i = 1,\dots,n \}$. Then $f^*(\cdot;\varepsilon) \in B_\infty^{(1)}$ and
$0 \le f^*(x_i;\varepsilon) \le \varepsilon_i$, i = 1,...,n. Let $\varepsilon_j + |y - x_j| = \min \{ \varepsilon_i + |y - x_i| : i = 1,\dots,n \}$.
PROPOSITION 14. $E^*(\varepsilon) = f^*(y;\varepsilon)$, and the algorithm defined by $T_y(w) = w_j$
is optimal.
PROOF. From Proposition 13, E*(e) >_ f*(y;e). Let f€B^^ with
lwi - f(x-j)l l e-j 5 "» = l5...*n. Then
N-WIDTHS AND OPTIMAL RECOVERY 63
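$$\begin{aligned}
|f(y) - T_y(w)| &= |f(y) - w_j| \\
&\le |f(y) - f(x_j)| + |f(x_j) - w_j| \\
&\le |y - x_j| + \varepsilon_j \\
&= f^*(y;\varepsilon).
\end{aligned}$$
Thus $E^*(\varepsilon) \le E(T_y;\varepsilon) \le f^*(y;\varepsilon)$. □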
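The algorithm of Proposition 14 can be sketched in a few lines. The sample points, error bounds, test function, and perturbations below are hypothetical values chosen only for illustration; the test function $|x - 0.3|$ is one member of the class with Lipschitz constant 1.

```python
def f_star(y, nodes, eps):
    """The bound f*(y; eps) = min_i (eps_i + |y - x_i|)."""
    return min(e + abs(y - node) for node, e in zip(nodes, eps))


def optimal_estimate(y, nodes, eps, w):
    """T_y(w) = w_j, where j minimises eps_i + |y - x_i|."""
    j = min(range(len(nodes)), key=lambda i: eps[i] + abs(y - nodes[i]))
    return w[j]


def f(x):
    # Hypothetical function to recover; its Lipschitz constant is 1.
    return abs(x - 0.3)


nodes = [0.1, 0.5, 0.9]          # sample points x_i (illustrative)
eps = [0.05, 0.2, 0.01]          # error bounds eps_i (illustrative)
# Noisy data w_i, each within eps_i of f(x_i).
w = [f(nodes[0]) + 0.05, f(nodes[1]) - 0.2, f(nodes[2]) + 0.01]

y = 0.6
estimate = optimal_estimate(y, nodes, eps, w)
bound = f_star(y, nodes, eps)
print(abs(f(y) - estimate), bound)
```

The actual recovery error never exceeds the bound $f^*(y;\varepsilon)$, whatever admissible perturbation of the data is used.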
Analogues, with error, of examples (8B) and (8C) are similarly constructed.
9. GENERAL THEORY. The above simple examples are prototypes of some of the
problems considered in the theory of optimal recovery. In our general discussion
we will somewhat restrict ourselves. Thus, for example, all our operators will
be linear, and we will not touch upon problems of recovery with error as
exemplified by example (8D).
Let X, Y and Z be normed linear spaces. A is a subset of X which we assume
to be closed, convex, and centrally symmetric. By U we denote a linear
operator from X to Z. U(x), for $x \in A$, will be the element which we wish to
recover, and U is therefore termed the object operator. I is a linear operator
from X to Y, called the information operator. Any function T from I(A) to Z is
said to be an algorithm. Each algorithm gives rise to a recovery scheme with
error
$$E(T) = \sup \{ \|U(x) - T(I(x))\| : x \in A \}.$$
The value $E^* = \inf \{ E(T) : T \}$, where T ranges over all possible algorithms, is
called the intrinsic error of the process. If $E^* = E(T^*)$ for a specific
algorithm $T^*$, then $T^*$ is called an optimal algorithm and we have found an
optimal recovery for U on A.
There is no reason to suppose that optimal algorithms exist, or if they
do, are linear. To obtain such results, additional assumptions are needed.
However, certain very general properties do hold. As an analogue of Proposition
8 we have the following.
PROPOSITION 15. $E^* \ge \sup \{ \|U(x)\| : x \in A,\ I(x) = 0 \}$.
PROOF. Let $x \in A$ and I(x) = 0. By assumption $-x \in A$ and I(-x) = 0. Thus for
every algorithm T,
$$\|U(x) - T(0)\|,\ \|U(-x) - T(0)\| \le E(T).$$
Since U is linear, $2\|U(x)\| = \|(U(x) - T(0)) - (U(-x) - T(0))\| \le 2E(T)$, so that $\|U(x)\| \le E(T)$, proving the proposition. □
For ease of exposition, set
$$e^* = \sup \{ \|U(x)\| : x \in A,\ I(x) = 0 \}.$$
In all our examples we had $E^* = e^*$. However, this is not generally valid as the
following simple example shows.
EXAMPLE. Let X be $\mathbb{R}^3$ endowed with the Euclidean norm $\|x\|_2 = (x_1^2 + x_2^2 + x_3^2)^{1/2}$.
Set Z = X, $Y = \mathbb{R}$, U(x) = x,
$$A = \{ x : \|x\|_1 = |x_1| + |x_2| + |x_3| \le 1 \}$$
and $I(x) = x_1 + x_2 + x_3$. A simple calculation shows that $e^* = 1/\sqrt{2}$. Now
$$E^* = \inf_T \sup \{ \|x - T(I(x))\|_2 : \|x\|_1 \le 1 \} \ge \inf_T \max \{ \|e^i - T(1)\|_2 : i = 1,2,3 \},$$
where $e^i$ is the ith unit vector, i = 1,2,3. T(1) is a vector in $\mathbb{R}^3$. No vector
in $\mathbb{R}^3$ is of distance less than $\sqrt{2/3}$ from each of the $e^i$, i = 1,2,3. Thus
$E^* \ge \sqrt{2/3}$. (Equality in fact holds for the algorithm T(a) = (a/3,a/3,a/3).)
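The two numbers in this example are easy to check numerically. The sketch below relies on two observations that are not spelled out in the text: the map $x \mapsto x - T(I(x))$ is linear for T(a) = (a/3,a/3,a/3), so its norm over the $\ell_1$ ball is attained at a vertex $\pm e^i$; and the slice of the ball by the plane I(x) = 0 is a hexagon whose vertices are the permutations of (1/2, -1/2, 0). All function names are hypothetical.

```python
import itertools
import math


def norm2(v):
    return math.sqrt(sum(c * c for c in v))


def information(x):
    """I(x) = x_1 + x_2 + x_3."""
    return sum(x)


def algorithm(a):
    """The algorithm T(a) = (a/3, a/3, a/3) from the example."""
    return (a / 3, a / 3, a / 3)


# Vertices +-e^i of the l1 unit ball A.
ball_vertices = [tuple(sign if k == i else 0.0 for k in range(3))
                 for i in range(3) for sign in (1.0, -1.0)]
error_of_T = max(
    norm2([xi - ti for xi, ti in zip(x, algorithm(information(x)))])
    for x in ball_vertices
)

# Vertices of A intersected with the plane I(x) = 0.
kernel_vertices = set(itertools.permutations((0.5, -0.5, 0.0)))
e_star = max(norm2(x) for x in kernel_vertices)

print(e_star, error_of_T)
```

The printed values should agree with $1/\sqrt{2}$ and $\sqrt{2/3}$, so that $e^* < E^* \le 2e^*$ in this example.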
An opposite inequality to $E^* \ge e^*$ is the following.
THEOREM 16 [3, p.3]. $E^* \le 2e^*$.
PROOF. For each $y \in I(A)$, choose an $x'(y) \in A$ satisfying I(x'(y)) = y. We
define an algorithm T' by
$$T'(I(x)) = U(x'(I(x))).$$
Then
$$E(T') = \sup \{ \|Ux - T'(I(x))\| : x \in A \} = \sup \{ \|U(x - x'(I(x)))\| : x \in A \}.$$
Set w = x - x'(I(x)). Then I(w) = I(x) - I(x) = 0, and since A is convex and
centrally symmetric, $w/2 \in A$. Thus
$$E^* \le E(T') \le 2 \sup \{ \|Uw\| : w \in A,\ I(w) = 0 \} = 2e^*. \qquad \Box$$
REMARK. The T' constructed above is, in general, neither continuous nor
linear. If we demand a linear, continuous algorithm, then no inequality of the
above form is valid.
One set of assumptions which implies the equality $E^* = e^*$ is the following.
(Note that our examples do not quite satisfy these assumptions.)
THEOREM 17 [3, p.5]. Assume that there exists a function S from I(A) to X
for which $x - S(I(x)) \in A$, and $I(x - S(I(x))) = 0$ for every $x \in A$. Then $E^* = e^*$ and
$T^* = US$ is an optimal algorithm.
The proof of this theorem is an immediate consequence of the definitions
of $T^*$ and $e^*$.
A totally different set of restrictions gives us similar results.
Assume that X is a normed linear space over the reals and $Z = \mathbb{R}$, i.e. U
is a linear functional. In addition to the previous assumptions, we also
suppose that I(A) is absorbing in Y, i.e. for any $y \in Y$ there exists $\lambda > 0$
such that $\lambda y \in I(A)$. Let $Y^*$ denote the continuous dual of Y. Then
THEOREM 18 [3, p.16]. Under the above assumptions,
$$e^* = E^* = \inf_{T \in Y^*} \sup \{ |U(x) - T(I(x))| : x \in A \}.$$
Furthermore, if I(A) is a neighborhood of the origin in Y, and if there exists
a $T \in Y^*$ for which
$$\sup \{ |U(x) - T(I(x))| : x \in A \} < \infty,$$
then there exists an optimal algorithm in $Y^*$, i.e. one which is linear and
continuous.
We close these notes by examining a connection between certain n-widths
and problems of optimal recovery with optimal information. For convenience, we
now assume that X = Z, $Y = \mathbb{R}^n$, U(x) = x, and $I(x) = (x_1^*(x),\dots,x_n^*(x))$, where
$x_i^* \in X^*$ (the continuous dual of X), $i = 1,\dots,n$. Thus for each algorithm T,
$$E(T) \ge E^* \ge e^* = \sup \{ \|x\| : x \in A,\ I(x) = 0 \}.$$
Set $L^n = \{ x : I(x) = 0 \}$. Then $L^n$ is a subspace of codimension at most n, and
we may write
$$e^* = \sup \{ \|x\| : x \in A \cap L^n \}.$$
From the definition of the Gel'fand n-width $d^n(A;X)$, it follows that $e^* \ge d^n(A;X)$
for any choice of n continuous linear information functionals. In
particular
$$\inf_{L^n} E^* \ge d^n(A;X).$$
Let us also recall the definition of the linear n-width $\delta_n(A;X)$:
$$\delta_n(A;X) = \inf_{P_n} \sup_{x \in A} \|x - P_n x\|,$$
where $P_n$ is an operator of the form $P_n x = \sum_{i=1}^n x_i^*(x)\,x_i$. Taking the infimum over $P_n$ is equivalent to taking the infimum over $\{x_i^*\}_1^n$ and $\{x_i\}_1^n$. If we fix the
$\{x_i\}_1^n$ and take the infimum over the $\{x_i^*\}_1^n$ then we are searching for a best
continuous linear approximation to A from span $\{x_1,\dots,x_n\}$. On reversing the
process by fixing the $\{x_i^*\}_1^n$ and taking the infimum over the $\{x_i\}_1^n$, we are
searching for the optimal continuous linear algorithm based on the information $\{x_i^*\}_1^n$. As such,
$$\delta_n(A;X) \ge \inf \{ E^* : L^n \}.$$
Thus the two n-widths $\delta_n$ and $d^n$ are upper and lower bounds, respectively, in
the problem of optimal recovery of A with n optimal linear continuous pieces
of information. If $\delta_n(A;X) = d^n(A;X)$, and $P_n$ is optimal for $\delta_n(A;X)$, then $P_n$
gives rise to a continuous linear optimal algorithm for this problem (see
e.g. Theorem 7).
BIBLIOGRAPHY
1. Kolmogoroff, A., "Über die beste Annäherung von Funktionen einer gegebenen Funktionenklasse", Annals of Math., 37 (1936), 107-110.
2. Krein, M.G., Krasnosel'ski, M.A., Milman, D.P., "On deficiency numbers of linear operators in Banach spaces and on some geometric problems", Sb. Trudov Inst. Mat. Akad. Nauk SSSR, 11 (1948), 97-112.
3. Micchelli, C.A., Rivlin, T.J., "A survey of optimal recovery" in Optimal Estimation in Approximation Theory, eds. C.A. Micchelli, T.J. Rivlin, Plenum Press, New York, 1977, 1-54.
4. Micchelli, C.A., Rivlin, T.J., "Lectures on optimal recovery", preprint.
5. Pietsch, A., Nuclear Locally Convex Spaces, Springer-Verlag, Berlin, 1972.
6. Pinkus, A., n-Widths in Approximation Theory, Springer-Verlag, Berlin, 1985.
7. Pinkus, A., "n-Widths of Sobolev spaces in $L^p$", Constr. Approx., 1 (1985), 15-62.
8. Rivlin, T.J., An Introduction to the Approximation of Functions, Blaisdell, Waltham, Mass., 1969.
9. Traub, J.F., Wozniakowski, H., A General Theory of Optimal Algorithms, Academic Press, New York, 1980.
DEPARTMENT OF MATHEMATICS TECHNION HAIFA, ISRAEL
Proceedings of Symposia in Applied Mathematics Volume 36, 1986
Algorithms For Approximation
E. W. CHENEY
ABSTRACT. The solution of almost any concrete problem of approximation will require an algorithm ("recipe") for producing a solution. The advent of high-speed computing made it possible to calculate best approximations by means of iterative methods. Several procedures for this type of problem are outlined here. They illustrate some of the techniques used in the construction of algorithms and some of the criteria by which algorithms are judged.
1. Best Approximation from a Finite-Dimensional Subspace. A general problem in approximation can be stated thus: a normed linear space X and a finite-dimensional subspace Y in X are prescribed. For $x \in X$ we define the distance from x to Y by means of the equation
$$\operatorname{dist}(x,Y) = \inf \{ \|x - y\| : y \in Y \}.$$
We then seek to determine a "best approximation" of x in Y. That term describes any element y in Y such that
$$\|x - y\| = \operatorname{dist}(x,Y).$$
A straightforward attack on this problem involves first the selection of a basis for Y, say $\{b_1,\dots,b_n\}$. Then one attempts to locate a minimizing point for the functional $A : \mathbb{R}^n \to \mathbb{R}$ defined by
$$A(\lambda_1,\dots,\lambda_n) = \Big\| x - \sum_{i=1}^n \lambda_i b_i \Big\|.$$
The functional A has some endearing properties: it is continuous, nonnegative, and convex. On the other hand, it may be nondifferentiable, and it may be expensive to compute. The task of determining one or more minimizing points
1980 Mathematics Subject Classification. 41A45, 41A20, 41A50, 65D15
Key words and phrases. Algorithms, best approximation.
© 1986 American Mathematical Society 0160-7634/86 $1.00 + $.25 per page
http://dx.doi.org/10.1090/psapm/036/864366
for A can be turned over to a general-purpose computer program designed for
minimizing "arbitrary" real-valued functions of n real variables. Alternatively,
one can employ a program that takes advantage of the convexity of A. Further
degrees of specialization are possible in the codes used. Some procedures exploit
the fact that A arises from a norm. Finally, we can use algorithms which are
tailor-made for the particular norm and the particular subspace in the problem
at hand.
A general-purpose algorithm which applies to any normed linear space X and
to any finite-dimensional subspace Y will now be described. Its roots lie in the
work of E. Ya. Remes, and it is sometimes referred to as the First Algorithm of
Remes. See [Remes, 1934].
We begin by selecting a "norm-determining" subset $\Phi$ in the conjugate space
$X^*$. This means that for each $x \in X$,
$$\|x\| = \max \{ |\phi(x)| : \phi \in \Phi \}.$$
For example, $\Phi$ can be the entire unit ball in $X^*$, or the surface of the unit ball,
or the set of extreme points of the unit ball, or a set of "half of the extreme
points". (Clearly, if $\phi \in \Phi$, it is not necessary that $-\phi \in \Phi$.)
The problem to be solved is that of determining, for a given $x \in X$, one or
more points $y \in Y$ for which
$$\|x - y\| = \operatorname{dist}(x,Y).$$
The algorithm breaks this problem into a sequence of simpler problems. The
procedure is iterative, and produces a sequence $y_1, y_2, \dots$ in Y with the property
$$\lim_{k \to \infty} \|x - y_k\| = \operatorname{dist}(x,Y).$$
At the k-th step of this algorithm, a finite subset $\Phi_k \subset \Phi$ is given. This finite
set induces a semi-norm in X via the equation
$$\|u\|_k = \max_{\phi \in \Phi_k} |\phi(u)| \qquad (u \in X).$$
One then computes an element $y_k \in Y$ to minimize the expression $\|x - y\|_k$.
This is a more elementary problem than the original one because $\Phi_k$ is finite,
and some standard techniques drawn from the subject of linear programming are
applicable. Having found $y_k$, we choose an element $\phi_k$ in $\Phi$ so that
$$|\phi_k(x - y_k)| = \|x - y_k\|.$$
The k-th step terminates with the adjoining of $\phi_k$ to $\Phi_k$ to form $\Phi_{k+1}$.
The minimization problem that must be solved in step k is not as formidable
as it may seem, since the vector $y_{k-1}$ from the preceding step is a good starting
point in searching for $y_k$. If $\Phi_k = \{\phi_1,\dots,\phi_m\}$ and if $\{b_1,\dots,b_n\}$ is a basis for
Y, then $y_k$ will have the form $\sum_{j=1}^n \lambda_j b_j$, and the coefficients $\lambda_1,\dots,\lambda_n$ must
be chosen to minimize the expression
$$\max_{1 \le i \le m} \Big| \sum_{j=1}^n \lambda_j \phi_i(b_j) - \phi_i(x) \Big|.$$
This is a standard problem in matrix analysis, and good software is available to
solve it. See [Bartels and Golub, 1968], [Barrodale and Phillips, 1975], [Cline,
1976], and [Bartels, Conn and Charalambous, 1978]. The theoretical basis for
these matrix algorithms is discussed in Chapter II of [Cheney, 1966].
The initial set $\Phi_1$ in this algorithm should be chosen so that $\|\cdot\|_1$ is a genuine
norm on Y. This can always be achieved with a set of n elements if Y has
dimension n. Having made this assumption, we can now prove that
$$\lim_{k \to \infty} \|x - y_k\| = \operatorname{dist}(x,Y) = d.$$
One starts with an obvious inequality, valid for $1 \le k \le i$ and $y \in Y$:
$$\|x - y\|_1 \le \|x - y\|_k \le \|x - y\|_i \le \|x - y\|.$$
From this it follows immediately that
$$\|x - y_k\|_1 \le \|x - y_k\|_k \le \|x - y_i\|_i \le d.$$
Obviously the sequence $[y_k]$ is bounded in the norm $\|\cdot\|_1$, and hence also in the
norm $\|\cdot\|$, since Y is finite-dimensional. The sequence possesses cluster points,
and we let $y^*$ be any one of them. If $\varepsilon > 0$, select k so that $\|y_k - y^*\| < \varepsilon$; then
select i > k so that $\|y_i - y^*\| < \varepsilon$. It now follows that
$$\begin{aligned}
d \le \|x - y^*\| &\le \|x - y_k\| + \varepsilon \\
&= |\phi_k(x) - \phi_k(y_k)| + \varepsilon \\
&= \|x - y_k\|_i + \varepsilon \\
&\le \|x - y_i\|_i + \|y_i - y^*\|_i + \|y^* - y_k\|_i + \varepsilon \\
&\le d + 3\varepsilon.
\end{aligned}$$
Since $\varepsilon$ was arbitrary, $\|x - y^*\| = d$. This proves that each cluster point of the
sequence $[y_k]$ is a best approximation of x in Y.
In order to prove that $\|x - y_k\| \to d$, start with the observation that the
sequence $p_k = \|x - y_k\|$ is bounded. Let $p^*$ be any cluster point of $[p_k]$, and let
$p_{k_i} \to p^*$. Let $y^*$ be a cluster point of $[y_{k_i}]$. Then $p^* = \|x - y^*\| = d$ by the first
half of our proof. Thus the bounded sequence $[p_k]$ has only one cluster point,
and must therefore converge to it.
It is clear that in situations where x has a unique best approximation in
Y, the sequence of approximants $y_k$ in the algorithm will converge to the best
approximation.
An important property of this algorithm is that at each stage of the iteration, an upper and a lower bound are available for the unknown number d = dist(x, Y). In fact, we have
$$\|x - y_k\|_k \le d \le \min_{1 \le i \le k} \|x - y_i\|.$$
The lower bound in this inequality converges monotonically upward to d, and
the upper bound converges monotonically downward to d.
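To make the iteration and its two-sided bounds concrete, here is a minimal sketch for a deliberately simple special case, chosen so that no linear programming code is needed: Y is the one-dimensional space of constants, and $\Phi$ consists of point evaluations on a finite grid in [0, 1]. For constants, the discrete minimization over $\Phi_k$ is solved by the midrange of the sampled values. The function name, the grid, and the choice of the exponential function are assumptions made for illustration only.

```python
import math


def first_remes_constants(x, grid, start, max_steps=20, tol=1e-12):
    """First algorithm of Remes for Y = constants, Phi = evaluations on `grid`."""
    phi_k = [start]                 # the finite set Phi_k of evaluation points
    best_upper = math.inf
    history = []
    for _ in range(max_steps):
        values = [x(s) for s in phi_k]
        # Minimising max over Phi_k of |x(s) - c| gives the midrange c = y_k.
        y_k = 0.5 * (max(values) + min(values))
        lower = 0.5 * (max(values) - min(values))   # ||x - y_k||_k
        # Pick the evaluation point where |x - y_k| attains its full norm.
        s_new = max(grid, key=lambda s: abs(x(s) - y_k))
        best_upper = min(best_upper, abs(x(s_new) - y_k))
        history.append((y_k, lower, best_upper))
        if best_upper - lower <= tol:
            break
        phi_k.append(s_new)          # Phi_{k+1} = Phi_k plus the new functional
    return history


grid = [i / 100 for i in range(101)]
history = first_remes_constants(math.exp, grid, start=0.3)
for y_k, lower, upper in history:
    print(f"y_k = {y_k:.6f}   {lower:.6f} <= d <= {upper:.6f}")
```

In this toy case the bracket closes after a few steps; with a higher-dimensional Y the midrange step would have to be replaced by a genuine discrete minimax solver.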
A further observation is that the algorithm is not limited to finite-dimensional
subspaces, nor even to linear subspaces. The computations that must be carried
out in each step of the algorithm may be more difficult for a more general set of
approximants, but the basic strategy of the procedure can still be followed.
For approximation in a space C(S) of continuous functions on a compact
Hausdorff space S, with norm $\|x\| = \max_{s \in S} |x(s)|$, the set $\Phi$ is taken to be the
set of all point-evaluation functionals $\phi_s$:
$$\phi_s(x) = x(s), \qquad s \in S,\ x \in C(S).$$
In practical realizations of this algorithm, it is possible to keep the sets $\Phi_k$ from
growing bigger by eliminating one "old" element from $\Phi_k$ at the same time that
the "new" element $\phi_k$ is to be added. This must be done with some care. The
resulting algorithm is sometimes called the "Exchange Method". See [Stiefel,
1960].
THEOREM. The sequence $[y_k]$ generated by the Remes First Algorithm has at
least one cluster point. Each cluster point is a best approximation of x in Y.
Furthermore
$$\|x - y_k\|_k \le \operatorname{dist}(x,Y) \le \min_{1 \le i \le k} \|x - y_i\|$$
and these bounds converge monotonically to dist(x, Y).
2. The second algorithm of Remes. This algorithm is designed solely
for linear approximation in a space of continuous functions on a compact
interval [a, b]. In the space C[a, b], an n-dimensional subspace of approximants is
prescribed, and it must have a special property called the Haar property. An
n-dimensional subspace Y in C[a, b] is a Haar subspace if each nonzero element of
Y has at most $n-1$ zeros in [a, b]. This is an abstraction of the crucial property
possessed by the polynomials of degree < n. As in Section 1, we wish to be able
to compute the best approximation in Y for an arbitrary element x in C[a, b].
The point of discussing another algorithm is that this new one, although limited
in applicability, will be much more efficient than the one in Section 1. In fact,
under favorable circumstances the new one will converge quadratically. This
means that the successive approximations $y_1, y_2, \dots$ generated by the algorithm
will converge to a best approximation $y^*$ in accordance with an inequality of the
form
$$\|y^* - y_{k+1}\| \le c\,\|y^* - y_k\|^2.$$
This 2nd algorithm is also iterative. In the k-th step, a subset $S_k$ of [a, b] is
given. Each set $S_k$ will contain exactly n + 1 points, n being the dimension of
Y. As in the first algorithm, it is convenient to use the semi-norm
$$\|u\|_k = \max \{ |u(s)| : s \in S_k \}.$$
Each of these is a genuine norm on Y because of the Haar condition. The general
theory tells us that there is a unique element $y_k \in Y$ for which $\|x - y_k\|_k$ is a
minimum. This element $y_k$ is characterized by the fact that $x(s) - y_k(s)$ has
the magnitude $\|x - y_k\|_k$ at each point of $S_k$, and exhibits alternating signs as
s runs over $S_k$ from left to right. A typical graph of $x - y_k$ in the case n = 4 is
shown in the figure. A new set $S_{k+1}$ is constructed by taking the abscissae of
n + 1 local extremum points of $x - y_k$, care being exercised to ensure that $x - y_k$
alternates in sign on the points of $S_{k+1}$.
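The characterization of $y_k$ just described reduces its computation to a square linear system: one seeks coefficients and a level h such that $x(s_i) - y_k(s_i) = (-1)^i h$ at the n + 1 points of $S_k$. The sketch below does this for the assumed special case in which Y consists of polynomials of degree < n in the monomial basis; it covers only this one sub-step, not the exchange that produces $S_{k+1}$. The helper names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting; returns the solution vector."""
    size = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, size))
        sol[r] = (aug[r][size] - tail) / aug[r][r]
    return sol


def levelled_fit(x, reference):
    """Return (coeffs, h) with x(s_i) - p(s_i) = (-1)^i h on the reference set."""
    n = len(reference) - 1                      # dimension of Y
    matrix = [[s ** j for j in range(n)] + [(-1.0) ** i]
              for i, s in enumerate(reference)]
    rhs = [x(s) for s in reference]
    *coeffs, h = solve_linear_system(matrix, rhs)
    return coeffs, h


reference = [-1.0, 0.0, 1.0]                    # n + 1 = 3 points, so n = 2
coeffs, h = levelled_fit(lambda s: s * s, reference)
residuals = [s * s - sum(c * s ** j for j, c in enumerate(coeffs))
             for s in reference]
print(coeffs, h, residuals)
```

Here $|h| = \|x - y_k\|_k$, and the residuals on the reference set alternate in sign with common magnitude $|h|$.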
In the figure, the points marked "x" compose the set $S_k$. The abscissae of the
points marked "o" compose the set $S_{k+1}$. The set $S_{k+1}$ is further required to
have the property that $|x(s) - y_k(s)| \ge \|x - y_k\|_k$ for each point of $S_{k+1}$, and
the property that $\|x - y_k\|_{k+1} = \|x - y_k\|$. These restrictions make the choice
of $S_{k+1}$ a bit complicated.
THEOREM. The successive approximants $y_k$ converge to the best approximation
$y^*$ of x in Y. The errors converge to zero at least linearly: $\|y_k - y^*\| \le C\theta^k$,
with $0 < \theta < 1$. If x and the elements of Y are continuously differentiable and if
the endpoints of the interval are maximum points of $|x - y^*|$, then the algorithm
is quadratically convergent: $\|y^* - y_{k+1}\| \le c\,\|y^* - y_k\|^2$.
The proof of quadratic convergence has been given in [Veidinger, 1960]. The
linear convergence proof can be found in [Cheney, 1966]. The facts about Haar
subspaces which were used in the preceding discussion can also be found there.
The order-structure of the real line is essential in the 2nd Remes algorithm,
and no satisfying generalization to arbitrary Banach spaces is known. The
algorithm is therefore strictly limited in applicability, but the rapid convergence is a
compensating factor which makes it very popular.
The 2nd algorithm of Remes has been adapted for solving various nonlinear
approximation problems in the supremum norm. In every case, it is necessary
to have a characterization of best approximations in terms of the equi-oscillation
of the error function $x - y^*$. We say that a function u on [a, b] equi-oscillates
n times if there exist n + 1 points
$$s_0 < s_1 < \dots < s_n$$
in the interval such that $u(s_{i-1})\,u(s_i) = -\|u\|^2$ for $1 \le i \le n$. With this
terminology, the Chebyshev Alternation Theorem can be stated thus: In order
that an element y in an n-dimensional Haar subspace $Y \subset C[a, b]$ be the best
approximation to an element x in C[a, b], it is necessary and sufficient that the
error function $x - y$ equi-oscillate n times. The theorem is true also for
pseudo-norms of the type $\|x\|_F = \sup_{s \in F} |x(s)|$, provided that F is closed in [a, b].
Similar theorems are available for approximation by quotients y/z, where $y \in Y$,
$z \in Z$, z > 0, and Y and Z are finite-dimensional subspaces of C[a, b]. For
purposes of illustration, we give the classical case. Let $\Pi_n$ denote the space of
polynomials of degree < n, regarded as a subspace of C[a, b]. For $u \in C[a, b]$ we
write u > 0 if u(s) > 0 for all $s \in [a, b]$. A useful approximating class is defined
by
$$R^n_m = \{ p/q : p \in \Pi_n,\ q \in \Pi_m,\ q > 0 \}.$$
Each element x of C[a, b] possesses a unique best approximation $y^*$ in the set
$R^n_m$. It is characterized by the property that $x - y^*$ must equi-oscillate
$n + m + 2 - i$ times, where i is the largest integer for which $y^* \in R^{n-i}_{m-i}$. The "normal"
case occurs when $y^* \notin R^{n-1}_{m-1}$; then the number of equi-oscillations is (at least)
n + m + 2.
References to the Remes algorithm for rational approximation are [Werner,
1962], [Fraser and Hart, 1962], [Ralston, 1965], [Wetterling, 1963] and [Hart et
al., 1968]. For general nonlinear approximation problems see [Novodvorskii and
Pinsker, 1951], [Shenitzer, 1957] and [Braess, 1967].
3. The Differential Correction Algorithm. This algorithm is applicable
to the rational approximation problem (discussed in the preceding section) and
for a generalized version of that problem, which we now describe. In a space
C(S), two subspaces Y and Z are prescribed, and it is desired to approximate
an element x of C(S) by a function y/z, where $y \in Y$, $z \in Z$, and z > 0. The
restriction z > 0 means that z(s) > 0 for all $s \in S$. It entails no loss of generality
in the classical case, when S is an interval, $Y = \Pi_n$, and $Z = \Pi_m$.
The idea of the differential correction algorithm [Cheney and Loeb, 1961] is as
follows. Let $y \in Y$, $z \in Z$ and z > 0. Put $M = \|x - y/z\|$. We wish to compute
small corrections $\delta y$ and $\delta z$ to y and z so that
$$\|x - (y + \delta y)/(z + \delta z)\| = M - \delta M$$
with $\delta M > 0$. Putting $\bar{y} = y + \delta y$ and $\bar{z} = z + \delta z$, we have the following pointwise
inequalities:
$$\begin{aligned}
|x - \bar{y}/\bar{z}| &\le M - \delta M, \\
|x\bar{z} - \bar{y}| &\le \bar{z}M - \bar{z}\,\delta M = \bar{z}M - z\,\delta M - \delta z\,\delta M, \\
\frac{|x\bar{z} - \bar{y}| - \bar{z}M}{z} &\le -\delta M - \frac{\delta z\,\delta M}{z}.
\end{aligned}$$
For small perturbations, we can ignore the second-order term $\delta z\,\delta M$. Since $\delta M$
is to be as large as possible, it seems reasonable to select $\bar{y}$ and $\bar{z}$ to minimize
the expression
$$\max_{s \in S} \frac{|x(s)\bar{z}(s) - \bar{y}(s)| - \bar{z}(s)M}{z(s)}. \tag{1}$$
This minimization must be done with a suitable normalization for $\bar{z}$, such as
$\|\bar{z}\| = 1$, because otherwise the expression above can be driven to $-\infty$ with
certain choices of $\bar{y}$ and $\bar{z}$. In the differential correction algorithm, these corrections
are made iteratively and produce thereby a sequence of approximants $r_k = y_k/z_k$
which, under certain conditions, will converge to a best approximation of x.
Investigations of this algorithm are still going on. The quadratic convergence
was established in the discrete case in [Barrodale, Powell and Roberts, 1972].
Quadratic convergence for approximation on an interval was proved in [Dua and
Loeb, 1973]. An adaptive version of the algorithm has appeared in [Kaufman,
McCormick and Taylor, 1983]. Various further results have been given in
[Kaufman and Taylor, 1981] and [Powell and Cheney, 1986]. See the important survey
[Taylor, 1985].
In the "classical case", we take S to be a compact interval on the real line,
and we define
$$R^n_m = \{ y/z : y \in \Pi_n,\ z \in \Pi_m,\ z > 0 \text{ on } S \}.$$
The principal result, due to Dua and Loeb, is this:
THEOREM. Let x be an element of C(S) such that $\operatorname{dist}(x, R^n_m) < \operatorname{dist}(x, R^{n-1}_{m-1})$.
If the differential correction algorithm is started with an approximation $r_0$ such
that $\|x - r_0\| < \operatorname{dist}(x, R^{n-1}_{m-1})$, then the sequence $[r_k]$ converges quadratically to
the best approximation of x in $R^n_m$.
For "generalized" rational approximation, in which S is arbitrary and arbitrary
subspaces replace $\Pi_n$ and $\Pi_m$, the situation is not completely understood. If an
$\varepsilon > 0$ is fixed, and if the approximating family is defined to be
$$R = \{ y/z : y \in Y,\ z \in Z,\ \varepsilon \le z(s) \le 1 \text{ on } S \},$$
then a great simplification occurs in the analysis of the algorithm. In the first
place, each $x \in C(S)$ will possess a best approximation in R, provided only
that R is nonempty and that the subspaces Y and Z are finite dimensional. See
[Kaufman and Taylor, 1981]. These authors also prove the quadratic convergence
of the algorithm under certain conditions. The advantage of the differential
correction algorithm over the Remes Second algorithm lies principally in its
wider applicability, particularly to problems involving functions of two or more
variables.
The constrained minimization problem that occurs in each step of the
differential correction algorithm is to find $\bar{y} \in Y$ and $\bar{z} \in Z$ to minimize the expression in
(1), subject to a constraint such as $\|\bar{z}\| = 1$. This is usually done approximately
by taking a finite subset of S and applying a linear programming code.
4. The Diliberto-Straus Algorithm. This algorithm appeared in [Diliberto
and Straus, 1951], and solves the following approximation problem. We
are presented with a continuous function of two variables, i.e., an element x
of $C(S \times T)$, where S and T are compact Hausdorff spaces. It is desired to
approximate x as well as possible in the form
$$x(s,t) \approx u(s) + v(t)$$
where u and v are chosen freely in C(S) and C(T), respectively. Even in the
simplest case, where S and T are finite sets, the algorithm is of practical
importance because it solves a problem of optimal scaling (or "preconditioning") of
matrices.
This algorithm shares with the algorithms discussed previously the property
of being iterative. The formulas defining it are these:
$$\begin{aligned}
x_0 &= x, \qquad x_{2n+1} = x_{2n} - u_n, \qquad x_{2n+2} = x_{2n+1} - v_n, \\
u_n(s) &= \tfrac12 \max_t x_{2n}(s,t) + \tfrac12 \min_t x_{2n}(s,t), \\
v_n(t) &= \tfrac12 \max_s x_{2n+1}(s,t) + \tfrac12 \min_s x_{2n+1}(s,t).
\end{aligned}$$
A moment's reflection will convince one that $u_n$ is a best approximation of $x_{2n}$
by a function of s alone. Indeed, if s is momentarily held fixed, then the number
$u_n(s)$ is the constant which best approximates $x^s$. Here $x^s$ denotes the s-section
of x defined by $x^s(t) = x(s,t)$. The t-sections of x are defined by $x_t(s) = x(s,t)$.
In the same way, we see that $v_n$ is a best approximation of $x_{2n+1}$ by a function
of t alone. The formulas defining $u_n(s)$ and $v_n(t)$ produce continuous functions,
by elementary arguments.
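In the finite case, where x is simply a matrix indexed by S and T, the formulas above amount to alternately subtracting the midrange of every row and of every column. The following is a minimal sketch under that assumption; the function names, the test matrix, and the fixed number of sweeps are illustrative choices, not part of the original algorithm statement.

```python
def sup_norm(matrix):
    return max(abs(value) for row in matrix for value in row)


def diliberto_straus(x, sweeps=50):
    """Alternately remove row and column midranges from the matrix x."""
    rows, cols = len(x), len(x[0])
    residual = [row[:] for row in x]      # plays the role of x_n
    u_total = [0.0] * rows                # accumulated u(s)
    v_total = [0.0] * cols                # accumulated v(t)
    norms = [sup_norm(residual)]
    for _ in range(sweeps):
        # u_n(s) = (max_t + min_t) / 2 of the current residual, row by row.
        for s in range(rows):
            u = 0.5 * (max(residual[s]) + min(residual[s]))
            u_total[s] += u
            residual[s] = [value - u for value in residual[s]]
        norms.append(sup_norm(residual))
        # v_n(t) = (max_s + min_s) / 2 of the current residual, column by column.
        for t in range(cols):
            column = [residual[s][t] for s in range(rows)]
            v = 0.5 * (max(column) + min(column))
            v_total[t] += v
            for s in range(rows):
                residual[s][t] -= v
        norms.append(sup_norm(residual))
    return u_total, v_total, residual, norms


x = [[1.0, 2.0, 4.0], [3.0, 5.0, 9.0], [0.0, 7.0, 2.0]]
u, v, residual, norms = diliberto_straus(x)
print(norms[0], norms[-1])
```

The list `norms` records the sup-norms of the successive iterates, which should be non-increasing, and `u[s] + v[t]` is the approximation accumulated so far.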
A crucial property of the algorithm is that the iterates $x_1, x_2, \dots$ form an
equicontinuous sequence in $C(S \times T)$. The sequence is also bounded and has the
property $\|x_n\| \downarrow \operatorname{dist}(x, Y)$, where Y now denotes the subspace C(S) + C(T) in
$C(S \times T)$. All of this was already established by Diliberto and Straus. A proof
that the sequence $[x_n]$ converges in the space $C(S \times T)$, i.e., converges uniformly,
was eventually given in [Aumann, 1959]. Since it is clear from the construction
that $x - x_n \in Y$, we conclude that $\lim_{n \to \infty}(x - x_n)$ is a best approximation of
x in Y. (The subspace Y is closed.)
Although it is possible to construct examples in which the convergence of this
algorithm is slow, it works quite well in most practical problems. In [Bank,
1979], the Diliberto-Straus algorithm is used for preconditioning the matrices
which occur in numerically solving a partial differential equation. Only a few
steps of the algorithm are needed to accomplish the desired purpose. Some recent
work on this algorithm is contained in [Light and Cheney, 1980], [von Golitschek
and Cheney, 1979, 1983] and [Dyn, 1980].
The algorithm works in an arbitrary Banach space X with two subspaces
U and V provided that these subspaces have central proximity maps, and
provided that U + V is closed. A map $A : X \to U$ is called a proximity map
if $\|x - Ax\| = \operatorname{dist}(x, U)$ for each $x \in X$. The proximity map A is said to be
central if
$$\|x - Ax + u\| = \|x - Ax - u\|$$
for all $x \in X$ and all $u \in U$. In [Golomb, 1959] it is proved that under the
hypotheses given above, $\|x_n\| \downarrow \operatorname{dist}(x, Y)$, with Y = U + V. The exposition
in [Light and Cheney, 1985] is recommended for this result. If X is uniformly
convex, then $\lim(x - x_n)$ exists and is the best approximation of x in Y.
In terms of the proximity maps $A : X \to U$ and $B : X \to V$, the algorithm
reads as follows:
$$x_0 = x, \qquad x_{2n+1} = x_{2n} - Ax_{2n}, \qquad x_{2n+2} = x_{2n+1} - Bx_{2n+1}.$$
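One concrete setting in which these formulas can be tried out is Euclidean space, taking A and B to be orthogonal projections onto two lines through the origin. The sketch below makes exactly that assumption; the directions, the starting vector, and the number of sweeps are hypothetical values chosen for illustration.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def project_onto_line(x, direction):
    """Orthogonal projection of x onto the line spanned by `direction`."""
    scale = dot(x, direction) / dot(direction, direction)
    return [scale * d for d in direction]


def alternating_algorithm(x, u_dir, v_dir, sweeps=200):
    """Iterate x_{2n+1} = x_{2n} - A x_{2n}, x_{2n+2} = x_{2n+1} - B x_{2n+1}."""
    residual = list(x)                               # plays the role of x_n
    for _ in range(sweeps):
        a_part = project_onto_line(residual, u_dir)  # A x_{2n}
        residual = [r - p for r, p in zip(residual, a_part)]
        b_part = project_onto_line(residual, v_dir)  # B x_{2n+1}
        residual = [r - p for r, p in zip(residual, b_part)]
    # x - x_n is the accumulated approximation from U + V.
    return [xi - ri for xi, ri in zip(x, residual)]


x = [1.0, 2.0, 3.0]
u_dir = [1.0, 0.0, 0.0]          # U = span{(1, 0, 0)}
v_dir = [1.0, 1.0, 0.0]          # V = span{(1, 1, 0)}
approximation = alternating_algorithm(x, u_dir, v_dir)
print(approximation)
```

Since U + V is here the plane of vectors with vanishing third coordinate, the result can be compared with the orthogonal projection of x onto that plane.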
Unfortunately, the hypothesis that both proximity maps be central is very
restrictive. A notable case is Hilbert space, in which every orthogonal projection
onto a closed subspace is a central proximity map. With this observation, we
recover an older theorem of [von Neumann, 1950] which states that in Hilbert
space, the orthogonal projection of x onto U + V is $\lim_n (x - x_n)$, where $x_n$ is
as above. Thus it transpires that von Neumann's algorithm, which he stated
for Hilbert space, is the same as the algorithm of Diliberto and Straus, which
they stated for a space $C(S \times T)$. But von Neumann's algorithm (also called the
"alternating algorithm") works for any pair of closed subspaces in Hilbert space,
while the Diliberto-Straus algorithm works for C(S) + C(T), as a subspace of
$C(S \times T)$.
Generalizations of the algorithm for approximating functions in $L_1(S \times T)$
by functions in $L_1(S) + L_1(T)$ have been given in [Light, McCabe, Phillips
and Cheney, 1982], [Light and Holland, 1984], [Light, 1983] and [Light, 1984].
Generalizations to smooth and uniformly convex spaces have recently been given
in [Deutsch, 1979], [Franchetti and Light, 1984, 1985] and [Deutsch, 1984].
There is a natural extension of the algorithm to subspaces of $C(S \times T)$ having
the form $G \otimes C(T) + C(S) \otimes H$. Here G and H are finite-dimensional subspaces
in C(S) and C(T) respectively. If G has a basis $\{g_1,\dots,g_n\}$ then $G \otimes C(T)$ is
the linear space of all functions
$$\sum_{i=1}^n g_i(s)\,y_i(t)$$
where the coefficient functions $y_i$ run over C(T). The subspace $C(S) \otimes H$ is
defined similarly. The functions being used for approximants here are therefore
very general. Good approximations can be obtained by the so-called blending
methods introduced in [Gordon, 1971]. But methods for producing best
approximations are still lacking. That the natural extension of the Diliberto-Straus
algorithm fails was proved in [Dyn, 1980]. See also [von Golitschek and
Cheney, 1983]. For these more general subspaces, questions of existence of best
approximations remain open.
5. Recent Work on Nomographic Functions. A continuous bivariate
function x is said to be nomographic if there exist continuous univariate functions
u, v and f such that
$$x(s,t) = f(u(s) + v(t)).$$
In some intuitive sense, a nomographic function should have simpler structure
than a completely general bivariate function. As an example, the function cos(st)
is nomographic on the domain where s > 0 and t > 0, since
$$\cos(st) = \cos \circ \exp(\log s + \log t).$$
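The nomographic representation of cos(st) can be verified numerically on a small grid of positive arguments. The grid below is an arbitrary illustrative choice, and the names `f`, `u`, `v` simply mirror the symbols in the definition.

```python
import math


def f(w):
    """Outer function: cos composed with exp."""
    return math.cos(math.exp(w))


def u(s):
    return math.log(s)


def v(t):
    return math.log(t)


points = [0.1 * k for k in range(1, 31)]     # s, t in (0, 3]
max_gap = max(abs(math.cos(s * t) - f(u(s) + v(t)))
              for s in points for t in points)
print(max_gap)
```

The printed discrepancy should be at the level of floating-point rounding.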
Nomographic functions derive much of their interest from the fact that they are
"building blocks" from which all continuous functions on $\mathbb{R}^2$ can be constructed.
In fact, the following remarkable theorem of Kolmogorov and Arnold is valid:
THEOREM. Every continuous bivariate function on the square, $0 \le s \le 1$, $0 \le t \le 1$, is a sum of at most five nomographic functions.
Various refinements of the Kolmogorov-Arnold Theorem have been made, and
the reader should consult [Lorentz, 1966] for these improvements.
An interesting open problem is to devise an algorithm for producing good
approximations by nomographic functions. Some progress in this direction has
recently been made by von Golitschek, whose algorithm will now be described.
See [von Golitschek, 1984].
We are given $x \in C(S \times T)$, $f \in C(\mathbb{R})$, $g \in C(S)$, and $h \in C(T)$. We seek
two functions $u \in C(S)$ and $v \in C(T)$ which yield a minimum deviation in the
approximation
$$x(s,t) \approx f(u(s)h(t) + v(t)g(s)). \tag{2}$$
Further assumptions that are made are that f is strictly increasing, h > 0,
and g > 0. Of course, if we take h(t) = g(s) = 1, then in (2) we have a
nomographic approximation to the function x; notice however, that f has been
prescribed. Von Golitschek proves that the optimal u and v exist, and his proof is
constructive (algorithmic). The sets S and T can be arbitrary compact Hausdorff
spaces.
The algorithm proceeds as follows. First, a value of the parameter a is selected.
Ideally, we would use a = p, where
$$p = \inf_{u,v} \|x - f \circ (uh + vg)\|$$
but the value of p is usually not known. Next we define two functions to facilitate
notation:
$$K(s,t) = f^{-1}(x(s,t) - a)/g(s)h(t), \qquad L(s,t) = f^{-1}(x(s,t) + a)/g(s)h(t).$$
The algorithm generates two sequences $[u_n]$ and $[v_n]$ starting with
$$u_0(s) = 0, \qquad v_0(t) = \inf_s L(s,t).$$
Subsequent functions are defined by the formulas
$$u_n(s) = u_{n-1}(s) \vee \sup_t\,[K(s,t) - v_{n-1}(t)], \qquad v_n(t) = v_{n-1}(t) \wedge \inf_s\,[L(s,t) - u_n(s)].$$
The algorithm has two "stopping criteria". These are tests to be made in each
step as follows. If $v_n = v_{n-1}$ or $v_n < -2\|L\| - \|K\|$, STOP. In these equations,
$\vee$ and $\wedge$ are the pointwise maximum and minimum operations. The formulas in
the algorithm permit us to make some immediate observations:
(i) un and vn are continuous.
(ii) 0 < uo < ui < • • •
(iii) v0 > v± > v2 > - • • (iv) vn< L-un
or un -h vn < L = Z"1 o (x -f a)/gh
or / o (ungh + vnhg) < i + a or —a < x — f o (ungh -f- vnhg)
(v) Similarly,
x - f o (ungh -h Vn-xhg) < a
(vi) If vn = vn-i then by the preceding inequalities,
II* - / o (ungh -f vngh)\\ < a
These elementary arguments establish the first half of the next theorem.
THEOREM. If the algorithm stops at the $n$-th step with $v_n = v_{n-1}$ then
$$\| x - f \circ (u_n gh + v_n hg) \| \le \alpha.$$
Hence $\alpha \ge \rho$. If $\alpha > \rho$ then for some $n$, $v_n = v_{n+1}$.
Stated informally, if $\alpha$ is chosen greater than the minimum deviation $\rho$, then the algorithm will produce in finitely many steps an approximation to $x$ giving precision $\alpha$.
The dual result is as follows.
THEOREM. The inequality $\alpha < \rho$ is true if and only if the inequality $v_n < -2\|L\| - \|K\|$ is true for some $n$.
The only case in which infinite sequences are generated by the algorithm is the case when $\alpha = \rho$. In this case, the sequences $[u_n]$ and $[v_n]$ are equicontinuous in $C(S)$ and $C(T)$. Furthermore, they are bounded. By the monotonicity of these sequences, they converge to continuous functions, say $u$ and $v$. Then our previous inequalities show that
$$\| x - f \circ (ugh + vhg) \| \le \rho.$$
This provides the constructive proof of existence of an optimizing pair, which is
(ug, vh).
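The iteration above can be tried out on finite sets $S$ and $T$, where the suprema and infima become maxima and minima over grid points. The following is only a minimal sketch under that assumption, not von Golitschek's own implementation: the function name `golitschek`, the default choices $f = $ identity and $g = h = 1$, the iteration cap, and the reading of the test $v_n < -2\|L\| - \|K\|$ as "at some point of $T$" are all assumptions made here for illustration.

```python
import numpy as np

def golitschek(x, alpha, f_inv=lambda y: y, g=None, h=None, max_iter=1000):
    """Run the iteration on a grid; x[i, j] = x(s_i, t_j).

    f_inv is the inverse of the strictly increasing function f (identity by
    default); g and h are positive weights on S and T (all ones by default).
    Returns (u, v, status).
    """
    m, n = x.shape
    g = np.ones(m) if g is None else np.asarray(g, dtype=float)
    h = np.ones(n) if h is None else np.asarray(h, dtype=float)
    gh = np.outer(g, h)
    K = f_inv(x - alpha) / gh
    L = f_inv(x + alpha) / gh
    bound = -2.0 * np.abs(L).max() - np.abs(K).max()

    u = np.zeros(m)            # u_0 = 0
    v = L.min(axis=0)          # v_0(t) = min over s of L(s, t)
    for _ in range(max_iter):
        # u_n = u_{n-1} v max_t [K(s,t) - v_{n-1}(t)]
        u = np.maximum(u, (K - v[None, :]).max(axis=1))
        # v_n = v_{n-1} ^ min_s [L(s,t) - u_n(s)]
        v_new = np.minimum(v, (L - u[:, None]).min(axis=0))
        if np.array_equal(v_new, v):
            return u, v, "alpha >= rho"      # first stopping criterion
        v = v_new
        if v.min() < bound:
            return u, v, "alpha < rho"       # second stopping criterion
    return u, v, "undecided"
```

With the default arguments the approximant is $u(s) + v(t)$; when the first criterion fires, the returned pair satisfies $\max |x - (u + v)| \le \alpha$ up to rounding, and repeating the call with smaller values of `alpha` gives a crude bisection for $\rho$.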
BIBLIOGRAPHY
1. G. Aumann, Über approximative Nomographie, I, II and III, Bayer. Akad. Wiss. Math.-Nat. Kl. S.B. (1958), 137-155; ibid. (1959), 103-109; ibid. (1960), 27-34. MR 22#1101, 22#6968, 24#B1289.
2. R. Bank, An automatic scaling procedure for a d'Yakanov-Gunn iteration scheme, Linear Algebra and its Applications 28 (1979), 17-33.
3. I. Barrodale and C. Phillips, Algorithm 495: Solution of an overdetermined system of linear equations in the Chebyshev norm, ACM Trans. Math. Software 1 (1975), 264-270.
4. I. Barrodale, M.J.D. Powell and F.D.K. Roberts, The differential correction algorithm for rational approximation, SIAM J. Numerical Analysis 9 (1972), 493-504.
5. R.H. Bartels, A.R. Conn and C. Charalambous, On Cline's direct method for solving overdetermined linear systems in the $L_\infty$ sense, SIAM J. Numerical Analysis 15 (1978), 255-270.
6. R.H. Bartels and G.H. Golub, Stable numerical methods for obtaining the Chebyshev solution to an overdetermined system of linear equations, Comm. Assoc. Comput. Mach. 11 (1968), 401-406, 428-430, ibid. 12 (1969), 326.
7. D. Braess, Approximation mit Exponentialsummen, Computing 2 (1967), 309-321.
8. E.W. Cheney, "Introduction to Approximation Theory", McGraw-Hill, New York, 1966. 2nd Edition, Chelsea Publ. Co., New York, 1982.
9. E.W. Cheney, Five lectures on the algorithmic aspects of approximation theory, in "Topics in Numerical Analysis", ed. by P.R. Turner, Lecture Notes in Math., Springer, New York, 1984.
10. E.W. Cheney and H.L. Loeb, Two new algorithms for rational approximation, Numerische Math. 3 (1961), 72-75. MR 22#21692.
11. A.K. Cline, A descent method for the uniform solution to overdetermined systems of equations, SIAM J. Numerical Analysis 13 (1976), 293-309.
12. A.R. Curtis and M.J.D. Powell, On the convergence of exchange algorithms for calculating minimax approximations, Computer J. 9 (1966), 78-80.
13. F. Deutsch, Von Neumann's alternating method: the rate of convergence, in "Approximation Theory IV", C. Chui, L.L. Schumaker and J.D. Ward, eds., Academic Press, New York, 1984, 427-434.
14. F. Deutsch, The alternating method of von Neumann, in "Multivariate Approximation Theory", W. Schempp and K. Zeller, eds., ISNM Vol. 51, Birkhauser, Basel, 1979, 83-96.
15. S.P. Diliberto and E.G. Straus, On the approximation of a function of several variables by the sum of functions of fewer variables, Pacific J. Math. 1 (1951), 195-210, MR 13, p. 334.
16. S.N. Dua and H.L. Loeb, Further remarks on the differential correction algorithm, SIAM J. Numerical Analysis 10 (1973), 123-126.
17. C.B. Dunham, Chebyschev approximation by rationals with constrained denominators, J. Approximation Theory 37 (1983), 5-11.
18. C.B. Dunham, The weakened first algorithm of Remez, J. Approximation Theory 31 (1981), 97-98. MR 82m:41026.
19. N. Dyn, A straightforward generalization of Diliberto and Straus' algorithm does not work, J. Approximation Theory 30 (1980), 247-250.
20. C. Franchetti and W.A. Light, The alternating algorithm in uniformly convex spaces, J. London Math. Soc. (2) 29 (1984), 545-555.
21. C. Franchetti and W.A. Light, On the von Neumann alternating algorithm in Hilbert space, J. Math. Analysis and Applications (to appear).
22. W. Fraser and J.F. Hart, On the computation of rational approximations to continuous functions, Communications Association for Computing Machinery 5 (1962), 401-403, 414.
23. G.A. Gislason, An algorithm for constrained, nonlinear Tchebycheff approximation, in "Theory of Approximation", ed. by A.G. Law and B.N. Sahney, Academic Press, New York, 1976, 298-307.
24. A.A. Goldstein, On the stability of rational approximation, Numer. Math. 5 (1963), 431-438.
25. M. von Golitschek, Shortest path algorithms for the approximation by nomographic functions, in "Approximation Theory and Functional Analysis", P.L. Butzer, R.L. Stens and B. Sz.-Nagy, eds., Birkhauser Verlag, Basel, ISNM Vol. 65, 1984.
26. M. von Golitschek and E.W. Cheney, Failure of the alternating algorithm for best approximation of multivariate functions, J. Approximation Theory 38 (1983), 139-143.
27. M. von Golitschek and E.W. Cheney, On the algorithm of Diliberto and Straus for approximating bivariate functions by univariate ones, Numer. Funct. Analysis and Optimization 1 (1979), 341-363. MR 80g:41023.
28. M. Golomb, Approximation by functions of fewer variables, in "On Numerical Approximation", R. Langer, ed., University of Wisconsin Press, Madison, Wisconsin, 1959, 275-327, MR 21#962.
29. W.J. Gordon, Blending-function methods of bivariate and multivariate interpolation and approximation, SIAM J. Numerical Analysis 8 (1971), 158-177, MR 43#8209.
30. J.F. Hart, et al., "Computer Approximations", John Wiley, New York, 1968.
31. E.H. Kaufman, Jr., S.F. McCormick and G.D. Taylor, An adaptive differential-correction algorithm, J. Approximation Theory 37 (1983), 197-211.
32. E.H. Kaufman, Jr. and G.D. Taylor, Uniform approximation by rational functions having restricted denominators, J. Approximation Theory 32 (1981), 9-26, MR 84b:41014.
33. W.A. Light, The Diliberto-Straus algorithm in $L_1(X \times Y)$, J. Approximation Theory 38 (1983), 1-8. MR 84h:41048.
34. W.A. Light, Convergence of the Diliberto-Straus algorithm in $L_1(X \times Y)$, J. Numer. Functional Analysis and Optimization 3 (1981), 137-146.
35. W.A. Light and E.W. Cheney, "Approximation Theory in Tensor Product Spaces", Lecture Notes in Mathematics, Springer-Verlag, New York, to appear, 1986.
36. W.A. Light and E.W. Cheney, On the approximation of a bivariate function by the sum of univariate functions, J. Approximation Theory 29 (1980), 305-322, MR 82d:41023.
37. W.A. Light and S.M. Holland, The $L_1$-version of the Diliberto-Straus algorithm in $C(S \times T)$, Proc. Edinburgh Math. Soc. 27 (1984), 31-45.
38. W.A. Light, J.H. McCabe, G.M. Phillips and E.W. Cheney, The approximation of bivariate functions by sums of univariate ones using the $L_1$-metric, Proc. Edinburgh Math. Soc. 25 (1982), 173-181.
39. G.G. Lorentz, "Approximation of Functions", Holt, Rinehart and Winston, New York, 1966. (To be reprinted by Chelsea Publishing Co., 15 E. 26th Street, New York, N.Y. 10010.)
40. F.D. Murnaghan and J.W. Wrench, The determination of the Chebyshev approximating polynomial for a differentiable function, Math. Tables and Other Aids to Computation 13 (1959), 185-193.
41. E.N. Novodvorskii and I. Sh. Pinsker, On a process of equalization of maxima, Uspehi Math. Nauk 6 (1951), 174-181 (Russian).
42. M.J.D. Powell, "Approximation Theory and Methods", Cambridge University Press, 1981.
43. M.J.D. Powell and E.W. Cheney, The differential correction algorithm for generalized rational functions, CNA Report, University of Texas, 1984.
44. A. Ralston, Rational Chebyshev approximation by Remes algorithms, Numer. Math. 7 (1965), 322-330.
45. E. Ya. Remes, Sur le calcul effectif des polynomes d'approximation de Tchebichef, C.R. Acad. Sci. Paris 199 (1934), 337-340.
46. A. Shenitzer, Chebyshev approximation of a continuous function by a class of functions, J. Assoc. for Computing Machinery 4 (1957), 30-35.
47. E.L. Stiefel, Note on Jordan elimination, linear programming and Tchebycheff approximation, Numer. Math. 2 (1960), 1-17.
48. G.D. Taylor, The differential correction algorithm, in "Delay Equations, Approximation and Applications", ISNM Series, Birkhauser Verlag, Basel, to appear.
49. L. Veidinger, On the numerical determination of the best approximations in the Chebyshev sense, Numerische Math. 2 (1960), 99-105.
50. J. von Neumann, "Functional Operators, Vol. II", Annals of Mathematics Studies 22, Princeton University Press, 1950.
51. H. Werner, Rationale Tschebyscheff-Approximation, Eigenwerttheorie, und Differenzenrechnung, Arch. Rational Mech. Analysis 11 (1962), 368-384.
52. W. Wetterling, Ein Interpolationsverfahren zur Lösung der linearen Gleichungssysteme, die bei der rationalen Tschebyscheff-Approximation auftreten, Arch. Rational Mech. Analysis 12 (1963), 403-408.
Proceedings of Symposia in Applied Mathematics Volume 36, 1986
Algebraic Aspects of Interpolation
Charles A. Micchelli IBM T. J. Watson Research Center
P.O. Box 218 Yorktown Heights, N.Y. 10598
Introduction. This lecture contains basic facts about interpolation. As the title suggests we only discuss constructive methods for interpolation and do not address questions of convergence.
Conceptually, interpolation is the simplest method of approximation. A particular function is selected from a class of functions by the requirement that it match given values at a finite set of points in its domain.
Applications of interpolation in science and engineering are manifold indeed. Interpolation is a basic part of many numerical methods and so a rudimentary understanding of the elements of interpolation is important.
Much of what we present here is standard material and most books on numerical analysis and approximation theory treat this topic, [1, 39]. We will try to contrast the relative simplicity of univariate interpolation with the greater complexity and challenge encountered in the multivariate case.
The lecture is organized as follows:
Part 1. Univariate Interpolation:
1.1. Polynomial Interpolation.
1.2. Trigonometric Interpolation.
1.3. Chebyshev Systems.
1.4. Spline Interpolation.
Part 2. Multivariate Interpolation:
2.1. Interpolation on Special Configurations.
2.2. Optimal Interpolation.
2.3. Radon Transform and Interpolation.
AMS (MOS) Subject Classification 41A05
Key Words: interpolation, divided difference, conditionally positive definite, Radon transform.
© 1986 American Mathematical Society 0160-7634/86 $1.00 + $.25 per page
http://dx.doi.org/10.1090/psapm/036/864367
Univariate Interpolation.
1.1. Polynomial Interpolation.
We denote by $\pi_n$ the set of complex polynomials of degree $\le n$,
$$p(x) = a_0 + a_1 x + \cdots + a_n x^n.$$
Theorem 1. Given any distinct points $x_0, \dots, x_n$ (real or complex) and data $y_0, \dots, y_n$ (also real or complex) there exists a unique polynomial $p \in \pi_n$ such that
$$p(x_i) = y_i, \quad i = 0, 1, \dots, n.$$
Proof. This is immediate since any nonzero polynomial $p \in \pi_n$ has at most $n$ zeros. Alternatively, the determinant of the linear system which determines the coefficients of the polynomial has the value
$$\prod_{0 \le i < j \le n} (x_i - x_j).$$
The interpolation procedure above can be written in terms of Lagrange polynomials,
$$\ell_i(x) = \frac{\omega(x)}{(x - x_i)\,\omega'(x_i)}, \qquad \omega(x) = (x - x_0) \cdots (x - x_n),$$
as
$$p(x) = \sum_{i=0}^{n} y_i\, \ell_i(x).$$
Theorem 1 extends to interpolation of consecutive derivatives, sometimes called Hermite interpolation. Thus whenever $m_1, \dots, m_k$ are positive integers with $\sum_{j=1}^{k} m_j = n + 1$ there is a unique $p \in \pi_n$ with
$$p^{(i)}(t_j) = f^{(i)}(t_j), \quad i = 0, 1, \dots, m_j - 1, \ j = 1, \dots, k.$$
We denote this polynomial by $H(f \mid t_0, \dots, t_n)(t)$ where each $t_j$ is repeated with its multiplicity. The leading coefficient of $H(f \mid t_0, \dots, t_n)(t)$ is defined to be the $n$-th divided difference of $f$,
$$[t_0, \dots, t_n] f,$$
and $H(f \mid t_0, \dots, t_n)$ has the Newton representation
(1) $H(f \mid t_0, \dots, t_n)(t) = \sum_{j=0}^{n} [t_0, \dots, t_j] f \, (t - t_0) \cdots (t - t_{j-1}).$
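It can be shown that $[t_0, \dots, t_n] f$ is a linear combination of the values $f^{(j)}(t_i)$, $j = 0, \dots, m_i - 1$, $i = 1, \dots, k$, and even more can be said when $m_i = 1$, $i = 1, \dots, k$:
(2) $[t_0, \dots, t_n] f = \sum_{i=0}^{n} \dfrac{f(t_i)}{\prod_{j \ne i} (t_i - t_j)}.$
Another formula of a totally different sort says
(3) $[t_0, \dots, t_n] f = \int_{S^n} f^{(n)}\Big( \sum_{i=0}^{n} \sigma_i t_i \Big) \, d\sigma_1 \cdots d\sigma_n$
where
$$S^n = \Big\{ (\sigma_0, \dots, \sigma_n) : \sigma_i \ge 0, \ \sum_{i=0}^{n} \sigma_i = 1 \Big\}$$
is the regular $n$-simplex. This formula, called the Hermite-Genocchi formula, makes it apparent that whenever $f$ is a polynomial, $[t_0, \dots, t_n] f$ is a polynomial in $t_0, \dots, t_n$. One would have a difficult time seeing this directly from (2).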
Divided differences are quite useful. They are even helpful in a discussion of the plane wave approach for the construction of the fundamental function of a hyperbolic equation. In this context, it was shown in [26] that whenever $f$ is a polynomial and $z_1, \dots, z_m$ are the zeros of some homogeneous polynomial $Q(z, x)$ in $(z, x)$ where $x \in \mathbb{C}^s$ for some $s$, then $[z_1, \dots, z_m] f$ is again a polynomial in $x$.
The Neville-Aitken formula
$$H(f \mid t_0, \dots, t_k)(t) = \frac{(t - t_0)\, H(f \mid t_1, \dots, t_k)(t) - (t - t_k)\, H(f \mid t_0, \dots, t_{k-1})(t)}{t_k - t_0}, \qquad t_0 \ne t_k,$$
can be used to evaluate $H(f)(t)$. Also, we mention that the Newton form for Hermite interpolation, (1), can be evaluated by $n$ nested multiplications
$$p(t) = ( \cdots (d_n (t - t_{n-1}) + d_{n-1})(t - t_{n-2}) + \cdots + d_1)(t - t_0) + d_0, \qquad d_j = [t_0, \dots, t_j] f.$$
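For distinct points, the coefficients $d_j = [t_0, \dots, t_j] f$ and the nested multiplication can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the code assumes pairwise distinct $t_j$ (the Hermite case with repeated points would need derivative data in place of the difference quotients).

```python
def divided_differences(t, y):
    """Return [d_0, ..., d_n] with d_j = [t_0, ..., t_j] f for distinct t_j."""
    d = list(y)
    n = len(t)
    for k in range(1, n):
        # After pass k, d[j] holds [t_{j-k}, ..., t_j] f for every j >= k.
        for j in range(n - 1, k - 1, -1):
            d[j] = (d[j] - d[j - 1]) / (t[j] - t[j - k])
    return d

def newton_eval(t, d, x):
    """Evaluate the Newton form at x by n nested multiplications."""
    p = d[-1]
    for j in range(len(d) - 2, -1, -1):
        p = p * (x - t[j]) + d[j]
    return p
```

For example, with $f(x) = x^3 - 2x + 1$ sampled at $t = (0, 1, 2, 4)$ the last coefficient returned is the leading coefficient of $f$, and `newton_eval` reproduces $f$ at any other point.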
1.2. Trigonometric Interpolation.
A trigonometric polynomial has the form
$$t(x) = a_0 + \sum_{j=1}^{n} (a_j \cos jx + b_j \sin jx).$$
We use $T_n$ to represent the class of all such functions.
Theorem 2. Given any $N = 2n + 1$ distinct points $x_0, \dots, x_{N-1}$ in some interval of length $< 2\pi$ and data $y_0, \dots, y_{N-1}$ there is a unique $t \in T_n$ such that
(4) $t(x_i) = y_i, \quad i = 0, 1, \dots, N - 1.$
The Lagrange form for $t(x)$ is
$$t(x) = \sum_{i=0}^{N-1} y_i\, t_i(x)$$
where
$$t_i(x) = \prod_{\substack{k=0 \\ k \ne i}}^{N-1} \sin \frac{x - x_k}{2} \Big/ \prod_{\substack{k=0 \\ k \ne i}}^{N-1} \sin \frac{x_i - x_k}{2}.$$
Proof. The proof is as in the polynomial case. We may argue either that a nonzero trigonometric polynomial in $T_n$ has at most $N - 1$ zeros in any interval of length $< 2\pi$ or that the determinant of the linear system (4) is a nonzero constant multiple of
$$\prod_{0 \le i < j \le N-1} \sin \frac{x_i - x_j}{2}.$$
When the points are equally spaced, $x_k = 2\pi k/N$, $k = 0, 1, \dots, N - 1$, an important case for applications, more information is available.
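Theorem 3. Let $t(x) = \sum_{j=0}^{N-1} a_j e^{ijx}$ and
(5) $a_j = \dfrac{1}{N} \sum_{k=0}^{N-1} y_k\, \omega^{-jk}, \quad j = 0, 1, \dots, N - 1, \qquad \omega = e^{2\pi i/N};$
then
$$t(x_k) = y_k, \quad k = 0, 1, \dots, N - 1.$$
Proof.
$$t(x_k) = \sum_{j=0}^{N-1} a_j\, \omega^{jk} = \frac{1}{N} \sum_{j=0}^{N-1} \sum_{l=0}^{N-1} y_l\, \omega^{j(k-l)} = \sum_{l=0}^{N-1} y_l \Big( \frac{1}{N} \sum_{j=0}^{N-1} \omega^{j(k-l)} \Big) = y_k.$$
In this form, the coefficients of $t(x)$ can be computed in $O(N \log N)$ multiplications by using the fast Fourier transform, see Cooley, Tukey [7].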
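As a small numerical check of Theorem 3, the coefficients (5) can be obtained from a library FFT. The sketch below assumes NumPy, whose `np.fft.fft` computes $\sum_k y_k e^{-2\pi i jk/N}$, so dividing by $N$ gives the $a_j$ of (5); the helper names are hypothetical.

```python
import numpy as np

def dft_coefficients(y):
    """a_j = (1/N) * sum_k y_k * omega^(-j k), with omega = exp(2*pi*i/N)."""
    y = np.asarray(y, dtype=float)
    return np.fft.fft(y) / len(y)

def evaluate(a, x):
    """t(x) = sum_j a_j * exp(i j x) for a scalar x."""
    j = np.arange(len(a))
    return np.sum(a * np.exp(1j * j * x))
```

Evaluating `evaluate(a, 2 * np.pi * k / N)` for $k = 0, \dots, N-1$ returns the data $y_k$ up to rounding error.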
1.3 Chebyshev Systems.
The previous results suggest the following
Definition 1. A linearly independent set of continuous functions $\{u_0(x), \dots, u_n(x)\}$ is called a Chebyshev system on $[a, b]$ whenever for any $a \le x_0 < \cdots < x_n \le b$ and $y_0, \dots, y_n \in \mathbb{R}$ there is a unique $u(x) = \sum_{j=0}^{n} a_j u_j(x)$ satisfying
$$u(x_i) = y_i, \quad i = 0, 1, \dots, n.$$
Thus the determinant of the linear system
$$U\binom{0, 1, \dots, n}{x_0, x_1, \dots, x_n} := \det \| u_i(x_j) \|_{i,j = 0, \dots, n}$$
has a fixed sign for $a \le x_0 < \cdots < x_n \le b$.
Chebyshev systems are important in approximation theory. They have been extensively studied by many including S. Karlin [28], S. Karlin and W. J. Studden [27], and M. G. Krein [30].
Of course, $\pi_n$ is spanned by a Chebyshev system on $(-\infty, \infty)$ and $T_n$ is a Chebyshev space on any interval of length $< 2\pi$. We list below several other examples.
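Example 1. $u_j(x) = \dfrac{1}{1 + \lambda_j x}$, $j = 0, 1, \dots, n$, $|x| < 1$, $\lambda_j$ distinct and $|\lambda_j| < 1$. These functions form a Chebyshev system since, with $\alpha_j = \lambda_j^{-1}$,
$$U\binom{0, 1, \dots, n}{x_0, x_1, \dots, x_n} = \prod_{j=0}^{n} \alpha_j \cdot \frac{\prod_{0 \le k < j \le n} (\alpha_j - \alpha_k)(x_j - x_k)}{\prod_{j,k=0}^{n} (\alpha_j + x_k)}$$
(cf. Achieser [1]).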
The fundamental functions for interpolation at x, = X, are
L{x) = -—. , B{x) = n -±-
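Example 2. $u_i(x) = e^{\lambda_i x}$, $i = 0, 1, \dots, n$, $\lambda_i$ distinct, $x \in (-\infty, \infty)$. To see that these functions form a Chebyshev system we suppose that
$$u(x) = \sum_{i=0}^{n} a_i e^{\lambda_i x}$$
has $n + 1$ zeros; then $v = u' - \lambda_0 u$ has at least $n$ zeros. Since
$$v(x) = \sum_{i=1}^{n} a_i (\lambda_i - \lambda_0) e^{\lambda_i x},$$
induction on the number of functions can be used to show that they form a Chebyshev system.
Example 3. Given any positive continuous functions $w_0(x) > 0, \dots, w_n(x) > 0$, $x \in [a, b]$, then
$$u_0(x) = w_0(x)$$
$$u_1(x) = w_1(x) \int_a^x w_0(\sigma_0) \, d\sigma_0$$
$$\vdots$$
$$u_n(x) = w_n(x) \int_a^x w_{n-1}(\sigma_{n-1}) \int_a^{\sigma_{n-1}} w_{n-2}(\sigma_{n-2}) \cdots \int_a^{\sigma_1} w_0(\sigma_0) \, d\sigma_0 \cdots d\sigma_{n-1}$$
form a Chebyshev system, see Karlin, Studden [27]. More examples are given in [27, 28].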
1.4. Spline Functions.
Definition 2. Let $0 = \xi_0 < \xi_1 < \cdots < \xi_m < \xi_{m+1} = 1$, $\Delta_m = \{\xi_i\}_{i=1}^{m}$, and set
$$\mathcal{S}_n(\Delta_m) = \{ S : S|_{[\xi_i, \xi_{i+1})} \in \pi_n, \ S \in C^{n-1}(0, 1) \}.$$
An element of $\mathcal{S}_n(\Delta_m)$ is called a spline function of degree $n$ with knots at $\xi_1, \dots, \xi_m$.
It is easy to see that $\dim \mathcal{S}_n(\Delta_m) = n + m + 1$.
Spline functions are a frequent choice for interpolation of experimental data. Usually, in practical applications, splines of degree five or less are used. The degrees of freedom introduced by the knots are then used to interpolate data. Unlike polynomials, spline functions have a local character. The spline curve can be altered in a part of its domain without dramatically affecting it elsewhere. Spline functions are especially useful for constructing shape preserving (monotone/convex) interpolation. As for the spline interpolation problem, we next present a fundamental result of Schoenberg and Whitney [38].
Theorem 4. Given any points $0 < x_1 < \cdots < x_{n+m+1} < 1$, $n > 2$, and data $y_1, y_2, \dots, y_{n+m+1}$ there exists a unique spline $S \in \mathcal{S}_n(\Delta_m)$ such that
$$S(x_i) = y_i, \quad i = 1, 2, \dots, n + m + 1$$
if and only if
$$x_i < \xi_i < x_{i+n+1}, \quad i = 1, 2, \dots, m.$$
Proof. A proof of this result can be based on Rolle's theorem and induction on $m$ and $n$. A proof using determinants is given in [28].
The recommended approach for computing a spline interpolant is to express it in terms of B-splines. Thus we write
$$S(x) = \sum_{i=-n}^{m} \alpha_i\, M(x \mid \xi_i, \dots, \xi_{i+n+1})$$
where the additional knots are chosen arbitrarily and $M(x \mid \xi_i, \dots, \xi_{i+n+1})$ is the B-spline defined by
(6) $M(x \mid \xi_i, \dots, \xi_{i+n+1}) = [\xi_i, \dots, \xi_{i+n+1}] (\cdot - x)_+^n$
($x_+^n = (\max(x, 0))^n$). This leads to a banded linear system for the B-spline coefficients which can be solved by Gaussian elimination.
For cubic spline interpolation a tridiagonal system results because each B-spline has three knots in the interior of its support. Frequently, the knots are chosen to be the points of interpolation. Thus additional conditions are required to specify the spline uniquely. These can be chosen as boundary conditions at 0 and 1 (rather than additional interpolation conditions). For a discussion of various choices for these boundary conditions and numerical methods for the solution of the corresponding tridiagonal systems see de Boor [3].
The special structure of the matrices for periodic spline interpolation admits fast algorithms for their computation. Given any periodic data $y_0, \dots, y_{N-1}$, $y_{i+N} = y_i$, we pass an odd degree periodic spline with knots $\xi_i = i/N$, $i = 0, 1, \dots, N$, through the data, that is,
$$S(\xi_i) = y_i, \quad i = 0, 1, \dots, N - 1, \qquad S \in \mathcal{S}_{2r-1}(\Delta_{N-1}),$$
$$S^{(i)}(0) = S^{(i)}(1), \quad i = 0, 1, \dots, 2r - 2.$$
The existence of a unique periodic spline interpolant can be established by showing that any periodic spline $S(x)$ satisfies
$$\int_0^1 S^{(r)}(x)\, g^{(r)}(x) \, dx = 0$$
for any periodic function $g \in C^{(r)}[0, 1]$ which vanishes at $\xi_i$, $i = 0, 1, \dots, N - 1$. If we expand $S$ in a Fourier series
$$S(x) = \sum_{j=-\infty}^{\infty} s_j e^{2\pi i j x}$$
then $s_j = \tau_j a_j$ where $a_j = \frac{1}{N} \sum_{k=0}^{N-1} y_k\, \omega^{-jk}$ is the discrete Fourier transform of the data, see (5), and $\tau_j$ are attenuation factors given by $\tau_0 = 1$, $\tau_k = 0$ for $k \equiv 0 \pmod N$, $k \ne 0$, and otherwise by
/ sin irxk \ T * = \ vxk ) ?2r( c o s ™A:)> k^O(modN),
where qt are polynomials given recursively by
2
4 0 ( 0 = 1 , g/(0 = fr/.i(/)+ ° ^ /1
) q'l-iiO, / = l , 2 , . . . ,
[17]. Two useful applications of periodic spline interpolation are given in [22].
Spline interpolation methods are known to have many optimality properties. Next we describe a particularly striking instance of optimal spline interpolation (see also Theorem 10). For this purpose we need
Definition 3. A perfect spline $P$ of degree $n$ with knots $\xi_1, \dots, \xi_m$ such that $-\infty = \xi_0 < \xi_1 < \cdots < \xi_m < \xi_{m+1} = \infty$ is a function with the following properties: $P \in C^{n-1}(-\infty, \infty)$, $P|_{(\xi_i, \xi_{i+1})} \in \pi_n$, $i = 0, 1, \dots, m$, and $P^{(n)}(x)|_{(\xi_i, \xi_{i+1})} = (-1)^i a$ for some $a \in \mathbb{R}$.
The following result is proved in [34].
Theorem 5. Given any points $0 < x_1 < \cdots < x_{n+m} < 1$ there is a unique perfect spline $P$ (up to a sign) with knots $\xi_1, \dots, \xi_m$ such that $P(x_i) = 0$, $i = 1, \dots, n + m$, and $\| P^{(n)} \|_\infty = \max_{0 \le x \le 1} | P^{(n)}(x) | = 1$. Moreover, there is a unique $S \in \mathcal{S}_{n-1}(\Delta_m)$ such that
$$S(x_i) = f(x_i), \quad i = 1, \dots, n + m$$
and this interpolant satisfies
(7) $| S(x) - f(x) | \le | P(x) | \, \| f^{(n)} \|_\infty, \quad 0 \le x \le 1.$
Furthermore, there is no function $A : \mathbb{R}^{n+m} \to \mathbb{R}$ such that $A(f(x_1), \dots, f(x_{n+m}))$ gives a better estimate for $f(x)$, in the sense of (7), than $S(x)$.
In the terminology of [33], $S(x)$ above is the optimal recovery of $f(x)$ in $W_\infty^n[0, 1]$ from the information $f(x_1), \dots, f(x_{n+m})$. The interpolant described in Example 1 is the analog of this interpolant for the Hardy space on the disk, see [15]. In this case we suppose that $f(z)$ is analytic on the disk and the error in estimating $f(x)$ using the information $f(\lambda_0), \dots, f(\lambda_n)$ is bounded by the norm $\frac{1}{2\pi} \int_{-\pi}^{\pi} | f(e^{i\theta}) |^2 \, d\theta$. See the lecture of A. Pinkus for more on optimal recovery.
Part 2. Multivariate Interpolation.
Chebyshev systems are helpful for studying univariate interpolation. Unfortunately, though, as soon as we turn to multivariate interpolation we must leave them behind since there are no Chebyshev systems on $\mathbb{R}^s$, $s \ge 2$, [31]. To see this, suppose to the contrary that $u_0, \dots, u_n$, $n \ge 1$, are continuous functions on $\mathbb{R}^s$ with the property that
$$\det | u_i(x^j) | \ne 0$$
for any distinct points $x^0, \dots, x^n \in \mathbb{R}^s$. We join $x^0, x^1$ along nonintersecting paths $x^0(t), x^1(t) \in \mathbb{R}^s \setminus \{x^2, \dots, x^n\}$, $0 \le t \le 1$, so that $(x^0(0), x^1(0)) = (x^0, x^1)$ and $(x^0(1), x^1(1)) = (x^1, x^0)$. At $t = 1$ two columns of the determinant have been interchanged, so the determinant, which depends continuously on $t$ and never vanishes, changes sign along these paths, which is a contradiction.
Thus we see that there is no set of n universal functions which can be used for interpolation
at any n distinct points. We discuss below several ways to deal with this intrinsic difficulty of
multivariate interpolation. First we consider interpolation by polynomials.
Let $\pi_n(\mathbb{R}^s)$ be the space of polynomials of total degree $\le n$ on $\mathbb{R}^s$ and $h_n(\mathbb{R}^s)$ the homogeneous members of $\pi_n(\mathbb{R}^s)$ with exact degree $n$.
Theorem 6. Given any distinct points $x^0, \dots, x^n \in \mathbb{R}^s$ there is a $p \in \pi_n(\mathbb{R}^s)$ such that
$$p(x^i) = y_i, \quad i = 0, 1, \dots, n.$$
Note that $\dim \pi_n(\mathbb{R}^s) = \binom{n+s}{s} > n + 1$ for $s \ge 2$ and so the polynomial above is not unique. Here is an easy way to construct such a polynomial: Choose any vector $\lambda \in \mathbb{R}^s$ such that the points $\lambda \cdot x^i$, $i = 0, \dots, n$, are distinct. Then by Theorem 1 there is a polynomial $q \in \pi_n(\mathbb{R}^1)$ satisfying
$$q(\lambda \cdot x^i) = y_i, \quad i = 0, 1, \dots, n.$$
Thus $p(x) = q(\lambda \cdot x)$ provides the required interpolant.
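The construction in this proof is easy to try numerically. The following sketch assumes NumPy and a user-supplied direction $\lambda$; the function name `ridge_interpolant` is hypothetical, and solving the Vandermonde system directly is chosen only for brevity (it is poorly conditioned for larger $n$).

```python
import numpy as np

def ridge_interpolant(points, values, direction):
    """Return p(x) = q(direction . x), q of degree n, with p(points[i]) = values[i]."""
    points = np.asarray(points, dtype=float)
    direction = np.asarray(direction, dtype=float)
    s = points @ direction                       # projected nodes lambda . x^i
    if len(np.unique(s)) != len(s):
        raise ValueError("direction does not separate the points")
    V = np.vander(s, increasing=True)            # V[i, k] = s_i ** k
    coeffs = np.linalg.solve(V, np.asarray(values, dtype=float))

    def p(x):
        z = np.dot(np.asarray(x, dtype=float), direction)
        return np.polynomial.polynomial.polyval(z, coeffs)

    return p
```

The interpolant is constant along hyperplanes orthogonal to the chosen direction, which is exactly the behaviour discussed below under the name ridge function interpolation.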
Hence we see that there is a universal set of interpolating functions provided we are willing to use more functions than data! This observation suggests several questions. The first is whether or not there is a universal set of interpolating functions with dimension smaller than that of $\pi_n(\mathbb{R}^s)$. Better yet, what is the minimal $m$ such that there exist continuous functions $u_0, \dots, u_m$ on $\mathbb{R}^s$ with the property that for any distinct $x^0, \dots, x^n$ and data $y_0, \dots, y_n$ the equations
$$\sum_{i=0}^{m} a_i u_i(x^j) = y_j, \quad j = 0, 1, \dots, n$$
have a solution? We just showed that in the plane $n < m < \frac{1}{2}(n + 2)(n + 1)$. It is possible to do better than this! In fact, if we view $x^0, \dots, x^n \in \mathbb{R}^2$ as complex numbers and use Theorem 1 we see that the real and imaginary parts of $z^k$, $z = x + iy$, $k = 0, 1, \dots, n$, can be used for interpolation. Thus in the plane $m < 2n + 1$. For more information about this problem, see [37].
The proof of Theorem 6 also suggests using the functions $\{u_0(\lambda \cdot x), \dots, u_n(\lambda \cdot x)\}$ for multivariate interpolation where $\{u_0, \dots, u_n\}$ is a Chebyshev system. Even spline functions can be used provided due consideration is given to Theorem 5. We call this ridge function interpolation. The advantage of this method is clear: univariate interpolation methods can easily be used for multivariate interpolation. However, in general this approach is bound to give disastrous results. Ridge function interpolants only vary in the direction of $\lambda$. Thus two values of $\lambda \cdot x^i$ can be close, while the corresponding $x^i$'s and $y$'s are far apart. Nevertheless, there are data sets well fitted by ridge functions. For really difficult terrain, one may try using several directions, $\{u_0(\lambda^1 \cdot x), \dots, u_0(\lambda^k \cdot x), \dots, u_n(\lambda^1 \cdot x), \dots, u_n(\lambda^k \cdot x)\}$ and search for good choices for $\lambda^1, \dots, \lambda^k$ (projection pursuit). Statisticians have investigated similar questions, [11].
2.1 Interpolation on Special Configurations .
In this section we distinguish between methods of interpolation which are applicable for any distinct points $x^0, \dots, x^n$ (scattered data interpolation) and those which apply for special choices of $x^0, \dots, x^n$. For simplicity of presentation we restrict ourselves to $\mathbb{R}^2$. All our discussion extends to $\mathbb{R}^s$, the only difference being in notational complexity.
There are many methods for interpolation in the plane at special points. Probably the simplest is tensor product interpolation on a rectangular grid.
Theorem 7. Given Chebyshev systems $\{u_0(x), \dots, u_n(x)\}$, $x \in (a, b)$, and $\{v_0(y), \dots, v_m(y)\}$, $y \in (c, d)$, and any points $a < x_0 < \cdots < x_n < b$, $c < y_0 < \cdots < y_m < d$, then for any data $y_{ij}$, $i = 0, \dots, n$, $j = 0, 1, \dots, m$, there is a unique function $f(x, y) = \sum_{i=0}^{n} \sum_{j=0}^{m} a_{ij} u_i(x) v_j(y)$ such that
$$f(x_i, y_j) = y_{ij}, \quad i = 0, 1, \dots, n, \ j = 0, 1, \dots, m.$$
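Theorem 7 can be exercised numerically by solving two families of univariate problems, one in each variable. The sketch below is illustrative only: it assumes NumPy and takes both Chebyshev systems to be monomials, $u_k(x) = x^k$ and $v_k(y) = y^k$; the function name is hypothetical.

```python
import numpy as np

def tensor_interpolate(xs, ys, data):
    """Find a_ij with sum_ij a_ij x^i y^j = data[i, j] at the grid (xs[i], ys[j])."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    U = np.vander(xs, increasing=True)     # U[i, k] = u_k(x_i)
    V = np.vander(ys, increasing=True)     # V[j, k] = v_k(y_j)
    # The interpolation conditions read data = U @ A @ V.T.
    B = np.linalg.solve(U, np.asarray(data, dtype=float))   # B = A @ V.T
    A = np.linalg.solve(V, B.T).T

    def f(x, y):
        ux = x ** np.arange(len(xs))
        vy = y ** np.arange(len(ys))
        return ux @ A @ vy

    return f
```

The two `solve` calls correspond to applying the univariate interpolation operators in $x$ and in $y$ one after the other, so only matrices of size $n + 1$ and $m + 1$ are ever factored.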
The proof is straightforward. However, the importance of this result should not be underestimated. It provides a computationally useful way to interpolate data on a rectangular grid and is often used in applications (oil wells and mineral deposits are found with this method). Abstractly, the interpolant is formed as a tensor product of univariate interpolation. Thus if $(Ug)(x)$ is the unique interpolant to $g$ at $x_0, \dots, x_n$ from $\{u_0, \dots, u_n\}$ and $(Vg)(y)$ is similarly defined, then the interpolant above is the tensor product of $U$ and $V$. For example, tensor product polynomial interpolation has the Lagrange representation
$$p(x, y) = \sum_{i=0}^{n} \sum_{j=0}^{m} y_{ij}\, \ell_i(x)\, m_j(y)$$
where
$$\ell_i(x) = \frac{\omega(x)}{(x - x_i)\, \omega'(x_i)}, \qquad m_j(y) = \frac{v(y)}{(y - y_j)\, v'(y_j)},$$
$$\omega(x) = \prod_{i=0}^{n} (x - x_i), \qquad v(y) = \prod_{j=0}^{m} (y - y_j).$$
Our next observation provides a set of points for which interpolation by $\pi_n(\mathbb{R}^2)$ is solvable.
Theorem 8. Let $\ell_0, \ell_1, \dots, \ell_n$ be distinct parallel lines and $x^{i,0}, \dots, x^{i,i}$ distinct points on $\ell_i$. Then for any data $y_{i,0}, \dots, y_{i,i}$, $i = 0, 1, \dots, n$, there is a unique $p \in \pi_n(\mathbb{R}^2)$ such that
$$p(x^{i,j}) = y_{ij}, \quad j = 0, 1, \dots, i, \ i = 0, 1, \dots, n.$$
Proof. Bezout's theorem says that if $f$, $g$ are polynomials of degrees $n$, $m$ whose zero sets intersect only at points, they have at most $nm$ simultaneous solutions, [20]. Thus if $p \in \pi_n(\mathbb{R}^2)$ vanishes at $x^{i,0}, \dots, x^{i,i}$, $i = 1, \dots, n$, it must be a multiple of $\ell_1 \cdots \ell_n$. (Actually, in this case we can see directly that any polynomial of degree $\le n$ which vanishes at $n + 1$ points on a line must contain that line as a factor.) Since it also vanishes at $x^{0,0}$ it must be identically zero.
The Lagrange representation of $p$ can be obtained inductively as follows: We let $L$ be the Lagrange interpolant to the data along $\ell_n$. Then
$$p = L + \ell_n q$$
where $q$ is an interpolant to $\dfrac{y_{ij} - L(x^{i,j})}{\ell_n(x^{i,j})}$ on $\ell_i$, $i = 0, \dots, n - 1$, and $q \in \pi_{n-1}(\mathbb{R}^2)$.
Another result of this type is the following interpolation scheme which came up in the study of multivariate B-splines [10]. We will later give a "dual" version of this method.
Theorem 9. Let $\ell_0, \dots, \ell_{n+1}$ be lines such that each pair of lines intersect at a point and every such point lies on exactly two lines. If $x^1, \dots, x^N$ are the points of intersection, $N = (n + 1)(n + 2)/2$, then given any data $y_1, \dots, y_N$ there is a unique $p \in \pi_n(\mathbb{R}^2)$ such that $p(x^i) = y_i$, $i = 1, \dots, N$.
The Lagrange polynomials for this interpolation method are easy to obtain. Suppose $x^{i,j}$ is the intersection of $\ell_i$, $\ell_j$; then the Lagrange polynomial associated with $x^{i,j}$ is
$$\frac{\prod_{k \ne i,j} \ell_k(x)}{\prod_{k \ne i,j} \ell_k(x^{i,j})}.$$
In the remaining sections we consider methods for interpolation of scattered data.
2.2. Optimal Interpolation.
Here is a general procedure for interpolation of scattered data: We take a linear space $X$ of functions on some set $D \subseteq \mathbb{R}^s$ and a semi-norm on $X$. As our interpolant, we choose $f_{\mathrm{opt}} \in X$ such that $f_{\mathrm{opt}}(x^i) = y_i$, $i = 0, 1, \dots, n$, $\{x^0, \dots, x^n\} \subseteq D$, and
$$\| f_{\mathrm{opt}} \| \le \| f \|$$
for all $f \in X$ with $f(x^i) = y_i$, $i = 0, 1, \dots, n$.
For this to be a computationally viable method, computing $f_{\mathrm{opt}}$ should not be prohibitively expensive. This suggests (at the least) using Hilbert space semi-norms. Here is a well-known example of this type called natural spline interpolation.
Theorem 10. Let $x_0 < x_1 < \cdots < x_n$, $n \ge m - 1$, $m \ge 2$; then there is a unique function $f_{\mathrm{opt}}$ in
$$W_2^m[x_0, x_n] = \{ f : f, \dots, f^{(m-1)} \text{ absolutely continuous on } [x_0, x_n], \ f^{(m)} \in L^2[x_0, x_n] \}$$
such that $f_{\mathrm{opt}}(x_i) = y_i$, $i = 0, 1, \dots, n$, and
$$\int_{x_0}^{x_n} \big( f_{\mathrm{opt}}^{(m)}(x) \big)^2 dx \le \int_{x_0}^{x_n} \big( f^{(m)}(x) \big)^2 dx$$
for all $f \in W_2^m[x_0, x_n]$ with $f(x_i) = y_i$, $i = 0, 1, \dots, n$. Moreover, $f_{\mathrm{opt}}$ is determined by the equations
$$f_{\mathrm{opt}}^{(i)}(x_0) = f_{\mathrm{opt}}^{(i)}(x_n) = 0, \quad i = m, \dots, 2m - 1$$
$$f_{\mathrm{opt}}(x_i) = y_i, \quad i = 0, 1, \dots, n$$
$$f_{\mathrm{opt}}|_{(x_i, x_{i+1})} \in \pi_{2m-1}, \quad i = 0, 1, \dots, n - 1$$
$$f_{\mathrm{opt}} \in C^{2m-2}(-\infty, \infty).$$
Proof. Integration by parts can be used to show that
$$\int_{x_0}^{x_n} f_{\mathrm{opt}}^{(m)}(x)\, g^{(m)}(x) \, dx = 0$$
whenever $g \in W_2^m$ and $g(x_i) = 0$, $i = 0, 1, \dots, n$. This equation leads to the existence, uniqueness and minimality of $f_{\mathrm{opt}}$.
The next result extends Theorem 10 to higher dimensions, [12], see also [32].
Theorem 11. Let $W_2^m(\mathbb{R}^s)$, $m > s/2$, have norm
$$\| f \|^2 = \sum_{|\alpha| = m} \binom{m}{\alpha} \int_{\mathbb{R}^s} | D^\alpha f(x) |^2 \, dx,$$
$\alpha = (\alpha_1, \dots, \alpha_s)$, $|\alpha| = \sum_{i=1}^{s} \alpha_i$. Assume that if $r \in \pi_{m-1}(\mathbb{R}^s)$, $r(x^i) = 0$, $i = 0, 1, \dots, n$, it follows that $r = 0$. Then there is a unique $f_{\mathrm{opt}} \in W_2^m(\mathbb{R}^s)$ such that $f_{\mathrm{opt}}(x^i) = y_i$, $i = 0, 1, \dots, n$, and
$$\| f_{\mathrm{opt}} \| \le \| f \|$$
for all $f \in W_2^m(\mathbb{R}^s)$ with $f(x^i) = y_i$, $i = 0, 1, \dots, n$. Moreover, $f_{\mathrm{opt}}$ is given by
$$f_{\mathrm{opt}}(x) = p(x) + \sum_{i=0}^{n} a_i\, \phi(x - x^i)$$
for some $p \in \pi_{m-1}(\mathbb{R}^s)$ where
$$\phi(x) = \begin{cases} \|x\|^{2m-s} \log \|x\|, & s \text{ even} \\ \|x\|^{2m-s}, & s \text{ odd}, \end{cases}$$
$\|x\|$ is the Euclidean norm of $x$ and
$$\sum_{i=0}^{n} a_i\, q(x^i) = 0$$
for all $q \in \pi_{m-1}(\mathbb{R}^s)$.
The proof of this result uses the fact that $\phi(x)$ is the fundamental function for the $m$-th iterate of the Laplace operator,
$$\Delta^m \phi(x) = c\, \delta(x), \quad c \ne 0.$$
The special case $s = m = 2$ is called thin plate spline interpolation and it has been used for practical data fitting. Grimson [21] gives compelling reasons for the use of this interpolant in the computational theory of interpolating visual surface information. In this case, the semi-norm is
$$\iint \big( f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2 \big) \, dx \, dy.$$
The computational problem of determining $f_{\mathrm{opt}}$ is studied in [14]. It is suggested there that the appropriate linear system be preconditioned and a Richardson type iteration used, see also [21].
Optimal interpolation gives some justification for choosing one interpolant among all possible interpolating functions. However, to make use of such methods we are led to difficult optimization problems. Thus it would be worthwhile to have an alternative view of Theorem 11 which suggests other methods of interpolation. To this end, let us recall that the thin plate spline interpolant has the form
$$f_{\mathrm{opt}}(x) = \ell_0(x) + \sum_{i=0}^{n} a_i\, \| x - x^i \|^2 \log \| x - x^i \|$$
for some linear function $\ell_0$ which is determined by the equations
(8) $\sum_{i=0}^{n} a_i\, \ell(x^i) = 0, \quad \ell \in \pi_1(\mathbb{R}^2),$
$$f_{\mathrm{opt}}(x^i) = y_i, \quad i = 0, 1, \dots, n.$$
The existence and uniqueness of this function is established by using the Hilbert space structure of the extremal problem described in Theorem 11. When the points $x^0, \dots, x^n$ are not collinear then the existence of a unique solution of the above equations also follows just from the fact that
$$\sum_{i=0}^{n} \sum_{j=0}^{n} a_i a_j\, \| x^i - x^j \|^2 \log \| x^i - x^j \| > 0$$
whenever (8) holds and $a = (a_0, \dots, a_n) \ne 0$. Of course, this property of conditional positive definiteness, see [18], satisfied by $\phi(x)$ is directly linked to the extremal problem. Nevertheless, as it has an independent formulation it provides us with a matrix theoretic means to study other methods for multivariate interpolation, [35]. In particular, multiquadric surfaces (MQS) [25] described in the next theorem can also be treated from this point of view.
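Theorem 12. Given any distinct points $x^1, \ldots, x^n \in \mathbb{R}^s$ and data $y_1, \ldots, y_n$ there exists a unique
function of the form
\[ f(x) = \sum_{j=1}^{n} c_j \sqrt{1 + \|x - x^j\|^2} \]
such that
\[ f(x^i) = y_i, \quad i = 1, \ldots, n. \]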
Proof. We will actually show more, namely, that the matrix
\[ A = \left( \sqrt{1 + \|x^i - x^j\|^2} \right)_{i,j=1,\ldots,n} \]
has n-1 negative eigenvalues and one positive eigenvalue. We prove this by first observing that
A has at least one positive eigenvalue since
\[ \lambda_n = \max_{(x,x)=1} (Ax, x) \ge \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} > 0 \]
where $\lambda_1 \le \cdots \le \lambda_n$ are the eigenvalues of A. We show that $\lambda_{n-1} < 0$ by noticing
\[ \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} a_i a_j < 0, \quad \text{if } \sum_{i=1}^{n} a_i = 0, \ a \ne 0, \qquad (9) \]
since it then follows that
\[ \lambda_{n-1} \le \max_{(a,e)=0,\ (a,a)=1} (Aa, a) < 0, \quad e = (1, \ldots, 1). \]
To prove (9) we use the formula
\[ \sqrt{x} = \frac{1}{2\sqrt{\pi}} \int_0^\infty t^{-3/2} (1 - e^{-xt})\, dt, \quad x > 0. \qquad (10) \]
Substituting $1 + \|x^i - x^j\|^2$ into (10) gives us
\[ \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j (1 + \|x^i - x^j\|^2)^{1/2} = -\frac{1}{2\sqrt{\pi}} \int_0^\infty t^{-3/2} e^{-t} \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j e^{-t\|x^i - x^j\|^2}\, dt \]
when $\sum_i a_i = 0$. Since $(e^{-t\|x^i - x^j\|^2})_{i,j=1,\ldots,n}$ is a strictly positive definite matrix for t > 0, the result
follows. For numerical experiments with the MQS method see [16].
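The eigenvalue statement in the proof is easy to observe numerically. The sketch below, assuming NumPy and a small hypothetical point set in the plane, builds the multiquadric matrix, counts the signs of its eigenvalues and solves for the interpolant; the names `points`, `data` and `mqs` are illustrative only.

```python
import numpy as np

# Hypothetical data: six distinct points in R^2 with arbitrary values.
points = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [1.0, 1.0], [3.0, 1.0]])
data = np.array([1.0, -0.5, 2.0, 0.0, 0.7, 1.3])

diff = points[:, None, :] - points[None, :, :]
A = np.sqrt(1.0 + np.sum(diff ** 2, axis=2))      # A_ij = sqrt(1 + ||x^i - x^j||^2)

eigenvalues = np.linalg.eigvalsh(A)               # A is symmetric; ascending order
num_negative = int(np.sum(eigenvalues < 0))
num_positive = int(np.sum(eigenvalues > 0))

coefficients = np.linalg.solve(A, data)           # coefficients of the MQS interpolant

def mqs(x):
    """Evaluate sum_j c_j sqrt(1 + ||x - x^j||^2) at a point x."""
    x = np.asarray(x, dtype=float)
    return coefficients @ np.sqrt(1.0 + np.sum((points - x) ** 2, axis=1))

# Quadratic form (9) for a weight vector with zero sum.
weights = np.array([1.0, -1.0, 2.0, -2.0, 0.5, -0.5])
quadratic_form = weights @ A @ weights
```

Since A has no zero eigenvalue, the linear system for the coefficients is uniquely solvable, which is the content of Theorem 12.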
The proof used for Theorem 12 leads to other results on scattered data interpolation which
include the optimal interpolation method described in Theorem 11, see [35]. This method also
yields the result that the matrix $(\|x^i - x^j\|^\alpha)_{i,j=1,\ldots,n}$ has one positive eigenvalue and n-1 negative
eigenvalues independent of $x^1, \ldots, x^n \in \mathbb{R}^s$ when $0 < \alpha < 2$. For s = 1, the number of positive
and negative eigenvalues of this matrix is still independent of $x_1, \ldots, x_n$ (scalars in this case) for
all $\alpha > 0$, [13]. However, even for $\alpha = 3$, s = 2, n = 4, the number of positive and negative
eigenvalues does depend on the locations of $x^1, \ldots, x^n$, [2].
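For exponents in the range $0 < \alpha < 2$ the signature claim can be checked on examples. This is a small sketch assuming NumPy; the helper name `signature`, the tolerance and the sample points are our own choices, and the check covers only a few configurations rather than proving anything.

```python
import numpy as np

def signature(points, alpha):
    """Return (#positive, #negative) eigenvalues of the matrix (||x^i - x^j||^alpha)."""
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]                  # scalars are treated as points in R^1
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    eig = np.linalg.eigvalsh(dist ** alpha)
    tol = 1e-10 * np.max(np.abs(eig))             # guard against round-off near zero
    return int(np.sum(eig > tol)), int(np.sum(eig < -tol))

# Hypothetical planar configuration of six distinct points.
planar = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
results = {alpha: signature(planar, alpha) for alpha in (0.5, 1.0, 1.5)}
```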
2.3. Radon Transform and Interpolation.
In this last section we depart from our discussion of methods of interpolation and treat procedures
which use more than function values. In particular, we point out ways to pin down those
extra degrees of freedom in scattered data interpolation by polynomials described in Theorem 6.
We state the following result from Kergin [29].
Theorem 13. Given any integers s, n > 0 and points $x^0, \ldots, x^n \in \mathbb{R}^s$, not necessarily distinct, there
is a unique map $\mathcal{H} : C^n(\mathbb{R}^s) \to \pi_n(\mathbb{R}^s)$ satisfying:
(i) $\mathcal{H}$ is linear.
(ii) for every $f \in C^n(\mathbb{R}^s)$, every (homogeneous) polynomial $q \in h_k(\mathbb{R}^s)$, $0 \le k \le n$, and every
$J \subset \{0, 1, \ldots, n\}$ with $|J| = k + 1$ there exists $x \in [\{x^j : j \in J\}]$ (= convex hull of
$\{x^j : j \in J\}$) such that
\[ q(D)(\mathcal{H}(f) - f)(x) = 0. \]
This theorem extends to $\mathbb{R}^s$ the following mean value property of Hermite interpolation: for
every j, $H^{(j)}(f)(x)$ agrees with $f^{(j)}(x)$ at some $x \in [x_0, x_n]$, $x_0 \le \cdots \le x_n$.
Notice that by choosing k = 0 in Theorem 13 we get
\[ \mathcal{H}(f)(x^i) = f(x^i), \quad i = 0, 1, \ldots, n, \]
and if $x^0$ is repeated $\ell$ times then all derivatives of order $\ell - 1$ of f are matched at $x^0$ by
$\mathcal{H}(f)$. However, only in exceptional cases, s = 1 or $x^0 = \cdots = x^n$, do these conditions determine
$\mathcal{H}(f)$. The remaining functionals which determine this polynomial are obtained as follows. We
define
\[ \int_{[x^0, \ldots, x^k]} f := \int_{S^k} f(\alpha_0 x^0 + \cdots + \alpha_k x^k)\, d\alpha_1 \cdots d\alpha_k; \]
then $\int_{[\{x^j : j \in J\}]} q(D) f$, $q \in h_{|J|-1}(\mathbb{R}^s)$, are all invariant under $\mathcal{H}(f)$. The number of linearly independent
such functionals was shown by Kergin to be $\dim \pi_n(\mathbb{R}^s)$, [29]. This was the way he
proved Theorem 13. In particular, when s = 2 and $x^0, \ldots, x^n$ are in general position, that is, any
three points are noncollinear, the linear functionals which determine $\mathcal{H}(f)$ are point evaluations,
$f(x^j)$, $j = 0, 1, \ldots, n$, and line integrals between pairs of points, $\ell_{ij} = [x^i, x^j]$, $i \ne j$, of the derivative
of f in the direction normal to $\ell_{ij}$,
\[ \int_{\ell_{ij}} \frac{\partial f}{\partial n_{ij}}. \]
Thus $\mathcal{H}$ picks out in this fashion a unique polynomial in $\pi_n(\mathbb{R}^s)$ which interpolates f at
$x^0, \ldots, x^n$. Why is this a natural interpolation method and what is the algebraic foundation behind
its existence? The answer is that $\mathcal{H}(f \mid x^0, \ldots, x^n)$ and $H(f \mid t_0, \ldots, t_n)$ are closely connected.
One can actually define $\mathcal{H}$ by the property that
\[ \mathcal{H}(f_\lambda \mid x^0, \ldots, x^n)(x) = H(g \mid \lambda \cdot x^0, \ldots, \lambda \cdot x^n)(\lambda \cdot x) \qquad (11) \]
whenever $\lambda \in \mathbb{R}^s$, $f_\lambda(x) = g(\lambda \cdot x)$, $g \in C^n(\mathbb{R}^1)$. This equation brings to mind the Radon transform,
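[24]. Recall that the Radon transform $(R_\lambda f)(t)$, $\|\lambda\| = 1$, $t \ge 0$, of $f \in L^1(\mathbb{R}^s)$ is given by
\[ (R_\lambda f)(t) = \int_{\lambda \cdot x = t} f(x)\, dm(x), \]
where dm is Lebesgue measure on $\lambda \cdot x = t$. A distributional definition takes the form
\[ \int_{\mathbb{R}} g(t) (R_\lambda f)(t)\, dt = \int_{\mathbb{R}^s} g(\lambda \cdot x) f(x)\, dx, \quad g \in C_0(\mathbb{R}^1). \]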
Thus, specializing equation (11) to s = n + 1, x = 0, $(x^i)_j = \delta_{ij}$, $i, j = 0, 1, \ldots, n$, we get
\[ \mathcal{H}(f_\lambda \mid e^0, \ldots, e^n)(0) = H(g \mid \lambda_0, \ldots, \lambda_n)(0). \]
This means that the Radon transform of the distribution $g \to H(g \mid \lambda_0, \ldots, \lambda_n)(0)$ is the distribution
$f \to \mathcal{H}(f \mid e^0, \ldots, e^n)(0)$. The connection of the Radon transform to multivariate interpolation
is developed further in [4-6]. We use the terminology of these papers and say $\mathcal{H}$ lifts H because
equation (11) holds. This equation suggests the following constructive approach to Theorem 13,
[36]. Observe that since
\[ q(D) f_\lambda(x) = q(\lambda)\, g^{(k)}(\lambda \cdot x), \quad q \in h_k(\mathbb{R}^s), \]
we can verify that
\[ \mathcal{H}(f \mid x^0, \ldots, x^n)(x) = \sum_{i=0}^{n} \int_{[x^0, \ldots, x^i]} D_{x - x^0} \cdots D_{x - x^{i-1}} f \qquad (12) \]
($D_y f = y \cdot \nabla f$) because the right hand side becomes the Newton form of
$H(g \mid \lambda \cdot x^0, \ldots, \lambda \cdot x^n)(\lambda \cdot x)$ when $f(x) = g(\lambda \cdot x)$.
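Relation (11) says that on ridge functions $f(x) = g(\lambda \cdot x)$ the lifted map reduces to univariate interpolation at the projected nodes. The sketch below evaluates the right-hand side of (11) through the Newton form, under the simplifying assumption that the projections $\lambda \cdot x^i$ are distinct (so Hermite interpolation is plain Lagrange interpolation); the function names and the sample nodes are hypothetical.

```python
import numpy as np

def divided_differences(t, g_values):
    """Newton coefficients [t_0]g, [t_0,t_1]g, ..., [t_0,...,t_n]g for distinct nodes t."""
    t = np.asarray(t, dtype=float)
    coef = np.array(g_values, dtype=float)
    for order in range(1, len(t)):
        # The right-hand side is evaluated before assignment, so old values are used.
        coef[order:] = (coef[order:] - coef[order - 1:-1]) / (t[order:] - t[:-order])
    return coef

def newton_eval(t, coef, s):
    """Horner-type evaluation of the Newton form at the scalar s."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (s - t[k]) + coef[k]
    return result

def kergin_on_ridge(g, lam, nodes, x):
    """Right-hand side of (11): H(g | lam.x^0, ..., lam.x^n)(lam.x)."""
    t = np.asarray(nodes, dtype=float) @ np.asarray(lam, dtype=float)
    coef = divided_differences(t, g(t))
    return newton_eval(t, coef, float(np.dot(lam, x)))

# Hypothetical nodes in R^2 and a direction with distinct projections 0, 1, 0.5, 1.5.
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
lam = np.array([1.0, 0.5])
```

With n + 1 = 4 nodes the construction reproduces ridge polynomials of degree at most 3, in agreement with the range $\pi_n(\mathbb{R}^s)$ of the lifted map.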
As is well-known, certain compatibility conditions must be satisfied for a function to be in the
range of the Radon transform, [24]. In the present case, this is a property previously mentioned
about Hermite interpolation. Namely, whenever p is a polynomial, $H(p \mid \lambda_0, \ldots, \lambda_n)(0)$ is a
polynomial in $\lambda_0, \ldots, \lambda_n$. This is the algebraic explanation behind Theorem 13. As such, it is a
guide to other maps which can be lifted. For instance, it clearly holds for the family of mappings
\[ H_\ell(g \mid t_0, \ldots, t_n)(t) = H^{(\ell)}(g^{(-\ell)} \mid t_0, \ldots, t_n)(t), \quad 0 \le \ell \le n. \]
Notice that when $\ell = 1$, $t_0 < \cdots < t_n$, $H_1$ is the unique polynomial of degree $\le n - 1$ such that
\[ \int_{t_i}^{t_{i+1}} H_1(g \mid t_0, \ldots, t_n)(t)\, dt = \int_{t_i}^{t_{i+1}} g(t)\, dt, \quad i = 0, 1, \ldots, n - 1, \]
that is, $H_1$ is "area matching". It was shown in [6] that $H_1$ can be lifted, and for any $\ell$ in
[8,19,26]. The form of the lifted map depends on the dimension to which $H_\ell$ is lifted. A particularly
nice case occurs when $s = \ell + 1$, which was introduced in [23]. Hakopian showed that
$\mathcal{H}_\ell$ on $\mathbb{R}^{\ell+1}$ is uniquely determined by matching the integrals
\[ \int_{[\{x^j : j \in J\}]} f, \quad \forall\, |J| = \ell + 1. \]
In particular, for s = 2 and $\ell = 1$ we see that $\mathcal{H}_1 f \in \pi_{n-1}(\mathbb{R}^2)$ matches all line integrals of f
formed by pairs of distinct points chosen from $x^0, \ldots, x^n$. This polynomial is in a sense dual to the
interpolant of Theorem 9. We also mention that the Lagrange representation for $\mathcal{H}_\ell$ on $\mathbb{R}^s$
when $\ell + 1 > s$ is identified in [5]. The Lagrange polynomials in this case are ridge functions.
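The univariate "area matching" map $H_1$ mentioned above can be sketched directly from its definition: interpolate an antiderivative of g at the knots and differentiate. The code assumes NumPy's `numpy.polynomial.polynomial` helpers and distinct knots; the function name `area_matching_poly` and the sample knots are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def area_matching_poly(knots, antiderivative):
    """Coefficients (lowest degree first) of H_1(g | t_0, ..., t_n), where
    `antiderivative` is any function G with G' = g."""
    knots = np.asarray(knots, dtype=float)
    n = len(knots) - 1
    # With n + 1 distinct knots, a degree-n least-squares fit is exact interpolation of G.
    interpolant = P.polyfit(knots, antiderivative(knots), n)
    return P.polyder(interpolant)                 # degree <= n - 1

# Hypothetical example: g = cos with antiderivative G = sin.
knots = np.array([0.0, 0.5, 1.2, 2.0])
coeffs = area_matching_poly(knots, np.sin)
matched = P.polyint(coeffs)                       # an antiderivative of H_1(g)
```

Because the interpolant of G agrees with G at every knot, the integral of its derivative over $[t_i, t_{i+1}]$ equals $G(t_{i+1}) - G(t_i)$, which is the area matching property.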
It is interesting to note that certain multivariate splines come from lifting distributions. For
instance, the multivariate B-spline $M(x \mid x^0, \ldots, x^n)$ is a distribution defined by
\[ \int_{\mathbb{R}^s} M(x \mid x^0, \ldots, x^n) f(x)\, dx = \int_{S^n} f\Big( \sum_{j=0}^{n} \alpha_j x^j \Big)\, d\alpha_1 \cdots d\alpha_n. \]
As the name suggests, when $M(\cdot \mid x^0, \ldots, x^n)$ is a function it can be shown to be a piecewise
polynomial which is the natural extension of the univariate B-spline, [9], see also equation (6).
In general, for any polyhedral set $C \subseteq \mathbb{R}^n$ and $x^1, \ldots, x^n \in \mathbb{R}^s$ the corresponding polyhedral spline
$P_C(\cdot \mid x^1, \ldots, x^n)$ is defined by
\[ \int_{\mathbb{R}^s} f(x) P_C(x \mid x^1, \ldots, x^n)\, dx = \int_C f\Big( \sum_{i=1}^{n} \alpha_i x^i \Big)\, d\alpha_1 \cdots d\alpha_n. \]
When $C = \mathbb{R}^n_+$ we obtain $T(x \mid x^1, \ldots, x^n)$, the truncated power, and $C = [0, 1]^n$ gives the box
spline. Detailed properties of these spline functions are given in [9], see also the lecture of K.
Hollig.
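In the simplest case s = 1, $x^1 = \cdots = x^n = 1$ and $C = [0, 1]^n$, the defining identity says that the box spline is the density of a sum of n independent uniform variables, i.e. an n-fold convolution of the indicator of [0, 1]. The sketch below approximates it on a grid, assuming NumPy; the function name and the step size are our own choices and the result is only accurate to the order of the step.

```python
import numpy as np

def cardinal_box_spline(n, h=1e-3):
    """Grid approximation of the univariate box spline with n unit directions."""
    box = np.ones(int(round(1.0 / h)))            # samples of the indicator of [0, 1)
    density = box.copy()
    for _ in range(n - 1):
        density = np.convolve(density, box) * h   # Riemann sum for the convolution integral
    x = h * np.arange(len(density))
    return x, density

x2, hat = cardinal_box_spline(2)                  # piecewise linear hat on [0, 2]
x3, quadratic = cardinal_box_spline(3)            # quadratic cardinal B-spline on [0, 3]
```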
BIBLIOGRAPHY
1. N. I. Achieser, Theory of Approximation , Frederick Ungar Publishing Co., New York, 1956.
2. L. P. Bos and K. Salkauskas, On the matrix $[\,|x_i - x_j|^3\,]$ and the cubic spline continuity equations, to appear JAT.
3. C. de Boor, A Practical Guide to Splines , Springer Verlag, Berlin- Heidelberg, 1978.
4. A. S. Cavaretta, C. A. Micchelli, and A. Sharma, Multivariate interpolation and the Radon transform, Math. Z., 174 (1980), 263-279.
5. A. S. Cavaretta, Jr., T. N. T. Goodman, C. A. Micchelli and A. Sharma, Multivariate interpolation and the Radon transform, Part III: Lagrange representation, in Canadian Mathematical Society Conference Proceeding, 3 (1983) 37-50.
6. A. S. Cavaretta, C. A. Micchelli, and A. Sharma, Multivariate interpolation and the Radon transform, Part II: Some further examples, in Quantitative Approximation, eds. R. De Vore, K. Scherer, Academic Press, New York 1980, 49-62.
7. Cooley, J.W. and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series, Math. Comp. 19 (1965), 297-310.
8. W. Dahmen and C. A. Micchelli, On the linear independence of multivariate B-splines II: complete configuration, Math. Comp., 41 (1983) 143-163.
9. W. Dahmen and C. A. Micchelli, Recent progress in multivariate splines, in Approximation Theory IV, eds. C. K. Chui, L. L. Schumaker, J. W. Ward, Academic Press, New York, 1983, 27-121.
10. W. Dahmen and C. A. Micchelli, On the limits of multivariate B-splines, J. d'Analyse Math., 39(1981), 156-178.
11. D.Donoho and I. Johnstone, Projection- based smoothing and a duality with kernel methods, Department of Statistics, Report # 238. Stanford University, 1985.
12. Duchon, J., Splines minimizing rotation - invariant semi-norms in Sobolev spaces, Constructive Theory of Functions of Several Variables,, Lecture Notes in Mathematics 571 eds. W. Schempp and K. Zeller, Springer, Berlin-Heidelberg, 1977, 85-100.
13. N. Dyn, T. N. T. Goodman and C. A. Micchelli, Positive powers of certain conditionally negative definite matrices, IBM Research Report, #11202, 1985.
14. N. Dyn, D. Levin, and S. Rippa, Surface interpolation and smoothing by "Thin Plate Splines", Approximation Theory IV, eds. C. K. Chui, L. L. Schumaker, J. D. Ward, Academic Press, New York, 445-449
15. S. Fisher, C. A. Micchelli, Optimal sampling of holomorphic functions, Amer. J. Math, 106 (1984), 593-609.
16. Franke, R. Scattered data interpolation: tests of some methods, Math. Comp., 38(1982), 181-200.
17. Gautschi, W. Attenuation factors in practical Fourier analysis, Numer. Math. 18(1972), 373-400.
18. I. M. Gel'fand and M. I. Graev, N. Y. Vilenkin, Generalized Functions, Vol. 4 , Academic Press, New York, 1965.
19. T. N. T. Goodman, Interpolation in minimum semi-norm and multivariate B-splines, JAT, 37 (1983), 212-223.
20. Phillip Griffiths and Joseph Harris, Principles of Algebraic Geometry, John Wiley and Sons, New York, 1978.
21. W.E.L. Grimson, From Images to Surfaces, a Computational Study of the Human Early Visual System M.I.T. Press, Boston, 1981.
22. M. H. Gutknecht, Two applications of periodic splines, in Approximation Theory III, ed. E. W. Cheney, Academic Press, New York, 1980, 467-472.
23. H. Hakopian, Multivariate divided differences and multivariate interpolation of Lagrange and Hermite type, JAT,34 (1982), 286-305.
24. Helgason, S.,The Radon Transform, Birkhauser, Basel 1980.
25. R. L. Hardy, Multiquadratic equations of topography and other irregular surfaces, J. Geophys. Res. C. (1971).
26. K. Hollig and C. A. Micchelli, Divided differences, hyperbolic equations and lifting distributions, IBM Research Report, #11133, 1985.
27. S. Karlin and W. J. Studden, Tchebycheff Systems: with Applications in Analysis and Statistics,
Interscience, New York, 1966.
28. S. Karlin, Total Positivity, Stanford University Press, Stanford, 1968.
29. P. Kergin, A natural interpolation of $C^K$ functions, JAT, 19 (1980), 278-293.
30. M. G. Krein, The ideas of P. L. Chebyshev and A. A. Markoff in theory of limiting values of
integrals and their further developments, AMS Transl. Ser. 2, 12 (1951), 1-122.
31. Mairhuber, J., On Haar's theorem concerning Chebyshev approximation problems having a unique solution, PAMS 7 (1956), 609-615.
32. Meinguet, J., An intrinsic approach to multivariate spline interpolation at arbitrary points, Polynomial and Spline Approximation, ed. B. N. Sahney, D. Reidel, Dordrecht, 1979, 163-190.
33. C. A. Micchelli and T. J. Rivlin, Lectures in Optimal Recovery, Lecture Notes in Mathematics 1129, Springer-Verlag, Berlin-Heidelberg, 1985.
34. Micchelli, C. A., T. J. Rivlin and S. Winograd, The optimal recovery of smooth functions, Numer. Math., 26(1976), 191-200.
35. C. A. Micchelli, Interpolation of scattered data: distance matrices and conditionally positive definite functions, IBM Research Report, 1984, to appear in Constructive Approximation.
36. C. A. Micchelli, A constructive approach to Kergin interpolation, Rocky Mountain Journal of Mathematics, 10 (1980), 485-497.
37. Saskin, Ju. A., Interpolation families of functions and imbeddings of sets in Euclidean and projective spaces (Russian), Dokl. Akad. Nauk SSSR 174 (1967), 1030-1032; Soviet Math. Dokl. 8 (1967), 722-725.
38. I. J. Schoenberg and A. Whitney, On Polya frequency functions, III: The positivity of translation determinants with application to the interpolation problem by spline curves, TAMS, 74 (1953), 146-259.
39. J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag, Berlin-Heidelberg, 1980.
Proceedings of Symposia in Applied Mathematics Volume 36, 1986
MULTIVARIATE SPLINES
Klaus Hollig1
In this lecture the construction of multivariate splines on triangular meshes via multivariate
B-splines is described. B-splines in several variables can be defined geometrically, as
volume densities of convex polyhedra. From this general definition smoothness properties and
recurrence relations are derived. The B-splines corresponding to simplices and parallelepipeds
give rise to natural generalizations of univariate splines. For both cases it is shown how linear
combinations of B-splines have to be selected to yield a smooth spline space which admits
a local representation of polynomials. This yields the standard approximation properties for
piecewise polynomials familiar from univariate theory.
For simplex splines, the underlying mesh can be chosen almost arbitrarily while maximal
smoothness is preserved. While this is a definite advantage over tensor products, new ideas
are still needed to overcome computational difficulties resulting from the fairly complicated
structure of the mesh. Box splines are defined on regular (triangular) meshes. Therefore, many
of the advantages of tensor products and Bezier representations are maintained. In particular,
efficient algorithms based on subdivision techniques have been developed and this has led to
application of box spline methods in computer aided design.
1980 Mathematics Subject Classification 41A15
1 Supported by International Business Machines Corporation and National Science Foundation
Grant No. DMS-8351187
Sponsored by the United States Army under contract DAAG29-80-C-0041
© 1986 American Mathematical Society 0160-7634/86 $1.00 + $.25 per page
http://dx.doi.org/10.1090/psapm/036/864368
Multivariate B-Splines
There are several equivalent ways of defining the univariate B-spline $B(\cdot \mid t_0, \ldots, t_n)$. Perhaps
the least common approach would be to use a variant of the Hermite-Genocchi formula
\[ \int B(x \mid t_0, \ldots, t_n) \phi(x)\, dx = n! \int_{\sigma(n)} \phi\Big( \sum_{\nu=0}^{n} \lambda(\nu) t_\nu \Big)\, d\lambda(1) \cdots d\lambda(n) \qquad (1) \]
or the geometric interpretation of the B-spline due to Curry and Schoenberg,
\[ B(x \mid t_0, \ldots, t_n) = \mathrm{vol}_{n-1}(T \cap (\{x\} \times \mathbb{R}^{n-1})) / \mathrm{vol}_n(T). \qquad (2) \]
Here, $\sigma(n) := \{(\lambda(1), \ldots, \lambda(n)) : \lambda(\nu) \ge 0, \ \sum_{\nu=0}^{n} \lambda(\nu) = 1\}$ is the n-simplex with vertices
$e_0 = (0, \ldots, 0)$, $e_1 = (1, 0, \ldots, 0)$, ..., $e_n = (0, \ldots, 0, 1)$ and T is an n-simplex for which the
first component of each vertex coincides with one of the knots $t_\nu$. Both of the above identities
admit a natural generalization to several variables.
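For n = 2 the geometric formula (2) is easy to try out: T is a triangle, the cross-section is a line segment, and the result can be compared with the classical divided-difference (truncated power) representation of the quadratic... or rather, for three knots, piecewise linear B-spline. The sketch below is pure Python; the function names are hypothetical, the second coordinates of the triangle are an arbitrary choice, and the truncated power formula is a standard identity brought in for comparison, not part of the text.

```python
def polygon_area(vertices):
    """Shoelace formula for a simple polygon given by its vertices in order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def cross_section_length(vertices, x):
    """Length of the vertical cross-section {y : (x, y) in T} of a convex polygon T."""
    ys = []
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        if x0 != x1 and min(x0, x1) <= x <= max(x0, x1):
            ys.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return max(ys) - min(ys) if ys else 0.0

def geometric_bspline(knots, x):
    """Formula (2) for n = 2: cross-section length of a triangle divided by its area."""
    t0, t1, t2 = knots
    triangle = [(t0, 0.0), (t1, 1.0), (t2, 0.0)]  # only the first coordinates matter
    return cross_section_length(triangle, x) / polygon_area(triangle)

def truncated_power_bspline(knots, x):
    """n * [t_0, ..., t_n](. - x)_+^(n-1) for distinct knots and n >= 2."""
    n = len(knots) - 1
    total = 0.0
    for j, tj in enumerate(knots):
        weight = 1.0
        for i, ti in enumerate(knots):
            if i != j:
                weight *= tj - ti
        total += max(tj - x, 0.0) ** (n - 1) / weight
    return n * total
```

Changing the second coordinates of the triangle's vertices rescales the cross-section and the area by the same factor, which is why the quotient in (2) depends on the knots only.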
Definition 1 [BH82]. For $n \ge m$ denote by $P : \mathbb{R}^n \to \mathbb{R}^m$ the canonical projection and
let $Q \subset \mathbb{R}^n$ be a convex polyhedron with affine dimension m + k. The multivariate B-spline
B is the linear functional defined by
\[ \langle B, \phi \rangle := \int_Q \phi \circ P, \quad \phi \in C_0(\mathbb{R}^m), \qquad (3) \]
where the integral is taken with respect to (m + k)-dimensional measure. If $\mathrm{vol}_m(PQ) > 0$, B
can be identified with the bounded function
\[ B(x) := \mathrm{vol}_k(Q \cap P^{-1}x), \qquad (4) \]
i.e. B(x) is the k-dimensional volume of the cross-section of Q which is projected onto x (cf.
Figure 1).
The equivalence of (3) and (4) follows from Fubini's Theorem since, if $\mathrm{vol}_m(PQ) > 0$, the
right-hand side of (3) can be written as
\[ \int_{PQ} dx \Big( \int_{Q \cap P^{-1}x} \phi(x)\, dy \Big) = \int_{PQ} \mathrm{vol}_k(Q \cap P^{-1}x)\, \phi(x)\, dx. \]
Strictly speaking, the pointwise definition (4) is valid only for almost every x (in the sense
of Lebesgue measure). In the univariate case this difficulty is less apparent since a consistent
definition of B at discontinuities is possible, e.g. all B-splines are assumed to be continuous
from the right. In several variables there does not seem to exist a simple convention which
is compatible with the recurrence relations of Theorem 1 below. However, if B is continuous,
which is the case of practical interest, the problem does not arise.
< Figure 1 (labels: $\mathbb{R}^k$, B(x)) >
The geometric definition (4) is essentially due to de Boor [B76] who considered the special
case when Q is a simplex. The usefulness of the analytical definition (3) for analyzing simplex
splines was discovered by Micchelli [M80] which finally led to the general definition given in
[BH82].
It is obvious that B is nonnegative (as a functional on $C_0(\mathbb{R}^m)$) with support equal to
PQ. If $\mathrm{vol}_m(PQ) > 0$, it follows from Theorem 1 below that B is a polynomial of degree $\le k$
on any subset of $\mathbb{R}^m$ which is not intersected by the projection of any (m - 1)-dimensional
face of Q. Theorem 1 also implies that B is (r - 1) times continuously differentiable where r
is the smallest integer for which an (m + k - 1 - r)-dimensional face of Q is projected by P
into an (m - 1)-dimensional set.
Denote by $D_\xi$ the derivative in the direction $\xi$, i.e. $(D_\xi \phi) := \sum_\nu \xi(\nu) \partial_\nu \phi$ where $\partial_\nu$ is
the derivative with respect to the $\nu$-th variable. Moreover, denote by $Q_i$ the (m + k - 1)-dimensional
faces making up the boundary of Q, by $\eta_i$ the corresponding outward normals
and by $B_i$ the B-splines corresponding to the polyhedra $Q_i$ (cf. Figure 2).
Theorem 1 [BH82]. Assume that $\mathrm{vol}_n(Q) > 0$, i.e. that k = n - m.
(i) For any $z \in \mathbb{R}^n$,
\[ D_{Pz} B = -\sum_i (z \cdot \eta_i) B_i. \]
(ii) For all points x = Pz where B and $B_i$ are continuous,
\[ k B(x) = \sum_i ((b_i - z) \cdot \eta_i) B_i(x) \]
where $b_i$ is any point in the hyperplane containing $Q_i$.
The assumption that the polyhedron Q is nondegenerate is not essential. If k < n - m,
the affine hull of Q can be identified with $\mathbb{R}^{m+k}$ and the Theorem applies.
< Figure 2 >
A repeated application of Theorem 1 yields that, for $\xi \in \mathbb{R}^m$, $(D_\xi)^r B$ is a linear combination
of B-splines corresponding to (m + k - r)-dimensional faces $Q^r_i$. For r > k the supports
$PQ^r_i$ of these B-splines (interpreted as linear functionals in the sense of definition (3)) are
contained in hyperplanes. Therefore, B is a polynomial of degree $\le k$ on any region which is
not intersected by any of the sets $PQ^{k+1}_i$ (since all (k + 1)-th order derivatives of B vanish on
such a region).
If $\mathrm{vol}_m(PQ^r_i) > 0$, the B-spline corresponding to $Q^r_i$ can be identified with a bounded
function. Therefore, if $\mathrm{vol}_m(PQ^r_i) > 0$ for all i, the derivatives of order r of B are bounded
which implies that B is (r - 1) times continuously differentiable.
Proof of Theorem 1. The proof of (i) is immediate:
\[ \langle D_{Pz} B, \phi \rangle = -\langle B, D_{Pz} \phi \rangle = -\int_Q (D_{Pz} \phi)(Py)\, dy \]
\[ = -\int_Q (D_z(\phi \circ P))(y)\, dy = -\sum_i \int_{Q_i} (z \cdot \eta_i)\, \phi(Py)\, dy \]
\[ = -\sum_i (z \cdot \eta_i) \langle B_i, \phi \rangle. \]
This uses the fact that, by definition, the derivative $D_\xi B$ is the linear functional given by
$\phi \mapsto -\langle B, D_\xi \phi \rangle$ and that, by the chain rule,
\[ D_z(\phi \circ P) = (D_{Pz} \phi) \circ P. \]
Define $(D\phi)(x) := (D_x \phi)(x)$. The recurrence relation (ii) is a consequence of the identity
\[ DB = kB - \sum_i (b_i \cdot \eta_i) B_i. \qquad (5) \]
With x = Pz it follows from (i) and (5) that
\[ 0 = ((D - D_{Pz}) B)(x) = kB(x) - \sum_i (b_i \cdot \eta_i) B_i(x) + \sum_i (z \cdot \eta_i) B_i(x) \]
if B and $B_i$ are continuous at x.
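It remains to prove (5). By definition of D and the chain rule,
\[ (D\phi)(Py) = (D_{Py}\phi)(Py) = (D_y(\phi \circ P))(y) = (D(\phi \circ P))(y). \qquad (6) \]
Denote by $\chi_\nu$ the $\nu$-th coordinate function, i.e. $\chi_\nu(x) = x(\nu)$. Then, integrating by parts and
using definition (3),
\[ -\langle DB, \phi \rangle = -\Big\langle \sum_{\nu=1}^{m} \chi_\nu \partial_\nu B, \phi \Big\rangle = \Big\langle B, \sum_{\nu=1}^{m} \partial_\nu(\chi_\nu \phi) \Big\rangle \]
\[ = \sum_{\nu=1}^{m} \int_Q (\partial_\nu(\chi_\nu \phi)) \circ P = m \int_Q \phi \circ P + \sum_{\nu=1}^{m} \int_Q (\chi_\nu \partial_\nu \phi) \circ P \]
\[ = m \langle B, \phi \rangle + \int_Q (D\phi) \circ P, \]
and similarly,
\[ \sum_{\nu=1}^{n} \int_Q \partial_\nu(\chi_\nu (\phi \circ P)) = n \langle B, \phi \rangle + \int_Q D(\phi \circ P). \]
By (6), the last integral in the first identity equals the last integral in the second. Therefore,
\[ \langle DB, \phi \rangle = (n - m) \langle B, \phi \rangle - \sum_{\nu=1}^{n} \int_Q \partial_\nu(\chi_\nu (\phi \circ P)). \]
This proves (5) since, with $\eta(y)$ denoting the boundary normal of Q at y,
\[ \sum_{\nu=1}^{n} \int_Q \partial_\nu(\chi_\nu (\phi \circ P)) = \int_{\partial Q} (\eta(y) \cdot y)\, \phi(Py)\, dy \]
and, for $y \in Q_i$, $\eta(y) = \eta_i$ and $\eta_i \cdot y$ is constant.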
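The recurrence (ii) of Theorem 1 can be checked by hand in the smallest nontrivial setting n = 2, m = 1, k = 1, where Q is a convex polygon, B(x) is the length of its vertical cross-section and each $B_i$ is the density of arc length along an edge with respect to x. The sketch below assumes NumPy, counter-clockwise vertex order and points x that are not projections of vertices; the function name and the sample polygons are hypothetical.

```python
import numpy as np

def check_recurrence(vertices, z):
    """Return (k * B(x), sum_i ((b_i - z) . eta_i) * B_i(x)) with x = P z, for a convex
    polygon Q in R^2 (n = 2, m = 1, k = 1) whose vertices are listed counter-clockwise."""
    V = np.asarray(vertices, dtype=float)
    z = np.asarray(z, dtype=float)
    x = z[0]                                      # P projects onto the first coordinate
    heights, rhs = [], 0.0
    for i in range(len(V)):
        p, q = V[i], V[(i + 1) % len(V)]
        edge = q - p
        if edge[0] == 0.0 or not (min(p[0], q[0]) < x < max(p[0], q[0])):
            continue                              # B_i vanishes at x (or is not a function)
        heights.append(p[1] + edge[1] * (x - p[0]) / edge[0])
        length = np.linalg.norm(edge)
        eta = np.array([edge[1], -edge[0]]) / length   # outward unit normal for CCW order
        B_i = length / abs(edge[0])               # arc length per unit of x along the edge
        rhs += np.dot(p - z, eta) * B_i           # b_i = p lies on the line through the edge
    B = max(heights) - min(heights) if heights else 0.0
    return 1 * B, rhs

triangle = [(0.0, 0.0), (3.0, 0.0), (1.0, 1.0)]
quadrilateral = [(0.0, 0.0), (2.0, 0.0), (3.0, 1.0), (1.0, 2.0)]
```

The second coordinate of z is arbitrary, as the theorem requires only x = Pz; the contributions depending on it cancel between the lower and upper edges.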
Multivariate splines are, by definition, linear combinations of B-splines. However, it is
not obvious how the B-splines should be selected to yield good approximation properties of
the resulting spline space. De Boor [B76] suggested the following geometric construction.
Definition 2. Let $Q_* \subset \mathbb{R}^k$ be a convex polyhedron and assume that the collection of
convex polyhedra $\{Q : Q \in \Delta\}$ forms a partition of $\mathbb{R}^m \times Q_*$, and that $\mathrm{vol}_m(PQ) > 0$ for all
$Q \in \Delta$. The spline functions corresponding to the partition $\Delta$ are defined by
\[ S(\Delta) := \Big\{ \sum_{Q \in \Delta} a_Q B_Q : a_Q \in \mathbb{R} \Big\} \qquad (7) \]
where $B_Q$ denotes the B-spline corresponding to the polyhedron Q.
< Figure 3 >
It is clear from (4) that the B-splines $B_Q$, $Q \in \Delta$, form a partition of unity (up to the
constant factor $\mathrm{vol}_k(Q_*)$), i.e. that
\[ \sum_Q B_Q(x) = \mathrm{vol}_k(Q_*) \qquad (8) \]
for all x where the B-splines are continuous. This implies that the spline spaces $S(\Delta)$ are
dense in continuous functions as the partition $\Delta$ is refined.
Proposition 1. Set $h := \max\{\mathrm{diameter}(Q) : Q \in \Delta\}$ and choose $x_Q \in PQ$. Then, for
any continuous function f,
\[ \Big\| f - \sum_Q f(x_Q) B_Q / \mathrm{vol}_k(Q_*) \Big\|_\infty \le \omega(h) \]
where $\|\cdot\|_\infty$ denotes the $L_\infty$ norm on $\mathbb{R}^m$ and $\omega$ is the modulus of continuity of f.
Proof. By (8) and since the B-splines are nonnegative, we have for almost every x
\[ \Big| f(x) - \sum_Q f(x_Q) B_Q(x) / \mathrm{vol}_k(Q_*) \Big| = \Big| \sum_{B_Q(x) \ne 0} (f(x) - f(x_Q)) (B_Q(x) / \mathrm{vol}_k(Q_*)) \Big| \]
and for all Q for which $B_Q(x)$ is nonzero $|x - x_Q| \le h$.
In this generality, little more can be said about the approximation properties of the
spline spaces $S(\Delta)$. However, a particularly rich theory results if Q is either a simplex or a
parallelepiped. This is due to the fact that in both cases the faces which make up the boundary
of Q are of the same type as Q itself.
Simplex Splines
Historically, the case when Q is a simplex has been considered first. Simplex splines were
defined by de Boor in [B76] generalizing the geometric interpretation of univariate B-splines
due to Curry and Schoenberg. Micchelli [M80] discovered the recurrence relations. Then,
the author [H82] and independently Dahmen and Micchelli [DM82] described an appropriate
choice for the space $S(\Delta)$ which yields the approximation properties familiar from the univariate
theory. Subsequently many interesting results have been obtained and the reader is
referred to the survey article [DM84i].
Let U be a collection of not necessarily distinct vectors $\{u : u \in U\}$ and denote by #U
the number of vectors in U counting multiplicities and by [U] the convex hull of the vectors in
U.
Definition 1S. Let [U] be a simplex in $\mathbb{R}^n$ (#U = n + 1) and denote by $V := \{v = Pu :
u \in U\}$ the projections of the vertices of [U]. The normalized simplex spline $M_V$ is defined by
\[ M_V := B_{[U]} / \mathrm{vol}[U]. \qquad (9) \]
To justify this definition, one has to show that the right hand side of (9) depends only
on the projections of the vertices $v \in V$. This follows from definition (3) by a change of
variables,
\[ \mathrm{vol}[U]^{-1} \int_{[U]} \phi(Py)\, dy = n! \int_{\sigma(n)} \phi\Big( \sum_{v \in V} \lambda(v) v \Big)\, d\lambda(1) \cdots d\lambda(n), \qquad (10) \]
where $\sigma(n) := \{(\lambda(0), \ldots, \lambda(n)) : \sum_\nu \lambda(\nu) = 1, \ \lambda(\nu) \ge 0\}$.
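Theorem 1S [M80]. Let V be a collection of n + 1 points in $\mathbb{R}^m$ which span a proper
convex set.
(i) If $\xi = \sum_{v \in V} \lambda(v) v$ with $\sum_{v \in V} \lambda(v) = 0$, then
\[ D_\xi M_V = n \sum_{v \in V} \lambda(v) M_{V \setminus v} \]
where $V \setminus v$ is obtained from V by decreasing the multiplicity of v by one (e.g. by deleting v if
this vector occurs only once in V).
(ii) If $x = \sum_{v \in V} \lambda(v) v$ with $\sum_{v \in V} \lambda(v) = 1$ and $M_{V \setminus v}$, $v \in V$, are continuous at x, then
\[ M_V(x) = \frac{n}{n - m} \sum_{v \in V} \lambda(v) M_{V \setminus v}(x). \]
To derive this Theorem from Theorem 1, let [U] be a simplex in $\mathbb{R}^n$ with $\{v := Pu : u \in
U\} = V$. Set $B := \mathrm{vol}[U]\, M_V$ and $B_u := \mathrm{vol}[U \setminus u]\, M_{V \setminus v}$ and denote the normal of the face
$[U \setminus u]$ by $\eta_u$. Then, for any $b_u \in [U \setminus u]$,
\[ (b_u - u') \cdot \eta_u = \begin{cases} n\, \mathrm{vol}[U] / \mathrm{vol}[U \setminus u], & \text{if } u = u'; \\ 0, & \text{otherwise.} \end{cases} \qquad (11) \]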
< Figure 4 >
To prove (i), fix $u' \in U$ and set
\[ z := \sum_{u \in U} \lambda(v) u = \sum_{u \in U} \lambda(v) (u - u') \]
using that the sum of the weights $\lambda(v)$ is zero. By (11), for $u \ne u'$,
\[ -z \cdot \eta_u = -\lambda(v) (u - u') \cdot \eta_u = n \lambda(v) \frac{\mathrm{vol}[U]}{\mathrm{vol}[U \setminus u]} \]
and similarly,
\[ -z \cdot \eta_{u'} = -n \sum_{u \ne u'} \lambda(v)\, \mathrm{vol}[U] / \mathrm{vol}[U \setminus u'] = n \lambda(v') \frac{\mathrm{vol}[U]}{\mathrm{vol}[U \setminus u']}, \]
and (i) follows from the normalization of the simplex splines.
To prove (ii) we define z as before and note that
\[ b_{u'} - z = \sum_{u \in U} \lambda(v) (b_{u'} - u). \]
Again, by (11),
\[ (b_{u'} - z) \cdot \eta_{u'} = \lambda(v') (b_{u'} - u') \cdot \eta_{u'} = n \lambda(v') \frac{\mathrm{vol}[U]}{\mathrm{vol}[U \setminus u']}. \]
In view of the remarks following Theorem 1, the simplex spline is a piecewise polynomial
of degree $\le k = n - m$ which is (r - 1) times continuously differentiable where r is the smallest
integer for which (m + k - r) points from the "knot set" V lie in a hyperplane. Thus, if the
knots are in "general" position, $M_V$ is (k - 1) times continuously differentiable.
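In one variable (m = 1) the recurrence (ii) of Theorem 1S can be turned into a short recursive evaluator by putting all the weight on the smallest and the largest knot. The following pure Python sketch makes that choice of weights and adopts a right-continuity convention at the knots; both are our assumptions, and the function name is hypothetical.

```python
def simplex_spline_1d(knots, x):
    """Evaluate M(x | t_0, ..., t_n) for m = 1 via recurrence (ii) of Theorem 1S."""
    t = sorted(knots)
    n = len(t) - 1
    lo, hi = t[0], t[-1]
    if hi == lo or not (lo <= x < hi):
        return 0.0                       # outside the support, or a point mass
    if n == 1:
        return 1.0 / (hi - lo)           # normalized indicator of [t_0, t_1)
    # x = lam_lo * lo + lam_hi * hi with lam_lo + lam_hi = 1; all other weights are zero.
    lam_lo = (hi - x) / (hi - lo)
    lam_hi = (x - lo) / (hi - lo)
    without_lo = simplex_spline_1d(t[1:], x)     # multiplicity of the smallest knot reduced
    without_hi = simplex_spline_1d(t[:-1], x)    # multiplicity of the largest knot reduced
    return n / (n - 1) * (lam_lo * without_lo + lam_hi * without_hi)
```

For example, `simplex_spline_1d([0, 1, 2], 0.5)` evaluates the hat function with unit integral on [0, 2], and repeated knots such as `[0, 0, 1]` are handled by the point-mass branch.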
Figure 5 below gives a few examples of knot sets and corresponding meshes for simplex
splines in two variables. While in some cases the structure of the mesh (i.e. the hyperplanes
where derivatives of $M_V$ are discontinuous) is fairly complicated, this is no disadvantage in
itself since the explicit form of $M_V$ on each of the subregions is not needed in computations.
< Figure 5 (panels: $C^0$-quadratic, $C^1$-quadratic, $C^1$-cubic) >
Example 1. Let [W] be a proper simplex in $\mathbb{R}^m$ with vertices $\{w : w \in W\}$ and denote
by $\{\xi_w(x)\}_{w \in W}$ the barycentric coordinates of x with respect to W, i.e.
\[ x = \sum_{w \in W} \xi_w(x) w, \qquad 1 = \sum_{w \in W} \xi_w(x). \]
If the knot set V consists of the vertices of [W] with multiplicities $\alpha(w) + 1$, $w \in W$, then, up to a
normalizing factor, the simplex splines coincide on [W] with the polynomials in the Bernstein
form, i.e.
\[ M_V(x) = \frac{n!}{m!} \prod_{w \in W} \xi_w(x)^{\alpha(w)} / \alpha(w)!. \qquad (12) \]
This is most easily seen by checking that the right side of (12) satisfies the recurrence relation
(ii) of Theorem 1S.
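As a hedged sanity check of (12), consider the simplest case m = 1, W = {0, 1}, with the knot 0 taken twice and the knot 1 once, so that n = 2 and k = 1. We read the exponents as $\alpha(0) = 1$, $\alpha(1) = 0$, i.e. multiplicity $\alpha(w) + 1$ at the vertex w; this indexing is our assumption. On 0 < x < 1 the spline $M_{\{0,1\}}$ equals 1 and $M_{\{0,0\}}$ is a point mass at 0, hence vanishes, and vol[W] = 1, so no further normalizing factor appears:

```latex
\begin{align*}
\xi_0(x) &= 1 - x, \qquad \xi_1(x) = x, \qquad 0 < x < 1, \\
M_{\{0,0,1\}}(x) &= \frac{n}{n-m}\Bigl[\xi_0(x)\, M_{\{0,1\}}(x) + \xi_1(x)\, M_{\{0,0\}}(x)\Bigr]
  = 2\bigl[(1 - x)\cdot 1 + x \cdot 0\bigr] = 2(1 - x), \\
\frac{n!}{m!}\,\frac{\xi_0(x)^{1}}{1!}\,\frac{\xi_1(x)^{0}}{0!} &= 2(1 - x).
\end{align*}
```

Both expressions agree, and $2(1 - x)$ integrates to 1 over [0, 1], as a normalized simplex spline should.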
In principle, simplex splines can be defined by (7) with $Q_* := \sigma(k)$ and $B_Q := \mathrm{vol}[U]\, M_V$.
However, without further restrictions, the simplex splines $M_V$, $[U] \in \Delta$, need not be linearly
independent. Nevertheless, their linear span does contain all polynomials of degree $\le k$, and
this is the minimal requirement for good local approximation properties.
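Theorem 2S [DM82, H82]. For $\xi \in \mathbb{R}^m$ define the mapping
\[ (x, y) \mapsto G_\xi(x, y) := (x, (1 + \xi \cdot x) y) : \mathbb{R}^m \times \sigma(k) \to \mathbb{R}^m \times \mathbb{R}^k. \]
If all simplex splines $M_V$ are continuous at x, then
\[ (1 + \xi \cdot x)^k = \sum_{[U] \in \Delta} c_V(\xi) M_V(x) \qquad (13) \]
where
\[ c_V(\xi) := (k!/n!)\, \mathrm{sign}(U) \det \| G_\xi U \| \]
with $\det \| G_\xi U \|$ denoting the determinant of the (n + 1) × (n + 1) matrix with columns
\[ \begin{pmatrix} 1 \\ G_\xi u \end{pmatrix}, \quad u \in U, \]
and $\mathrm{sign}(U) \in \{-1, +1\}$ chosen so that $c_V(0)$ is positive.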
Identity (13) is the multivariate analogue of Marsden's identity for univariate splines. As in
the univariate case, this identity is the basis for the construction of dual linear functionals and
local approximation schemes [DM82, H82]. In two variables the identity is due to Goodman
and Lee [GL81] who also obtained a more explicit formula for the simplex spline coefficients.
Proof. For fixed x both sides of (13) are polynomials in $\xi$ and we may therefore assume
that $\|\xi\|$ is small. Small perturbations of the vertices do not change the combinatorial structure
of a triangulation. Moreover, $G_\xi$ maps the hyperplanes which form the boundary of $\mathbb{R}^m \times \sigma(k)$
onto hyperplanes. Therefore, for fixed x and small $\xi$, the simplices
\[ [G_\xi U] := [\{G_\xi u : u \in U\}] \]
form a partition of $\Omega := G_\xi(\mathbb{R}^m \times \sigma(k))$ in a neighborhood of x. This implies that (cf. Figure
6)
\[ (x, (1 + \xi \cdot x) \sigma(k)) = (x, \mathbb{R}^k) \cap \bigcup_{x \in P[U]} [G_\xi U]. \]
Computing the volume on both sides of this identity it follows from (4) and (9) that
\[ \frac{1}{k!} (1 + \xi \cdot x)^k = \sum_{x \in P[U]} \mathrm{vol}_k([G_\xi U] \cap P^{-1}x) = \sum_{x \in P[U]} \mathrm{vol}_n[G_\xi U]\, M_V(x) \]
which yields the Theorem.
< Figure 6 >
A drawback of definition (7) is that the spline space $S(\Delta)$ is defined via a triangulation
in n dimensions while the simplex splines depend only on the knots in $\mathbb{R}^m$. In [H82] a method
for constructing spline spaces from a triangulation of IRm was described. This construction is
a generalization of the process of "pulling apart" knots, i.e. of obtaining smooth splines as a
perturbation of piecewise polynomials without smoothness constraints.
Denote by
\[ [W_i] := [w_{i(0)}, \ldots, w_{i(m)}], \quad i \in I, \]
the simplices of a triangulation $\Delta_m$ of $\mathbb{R}^m$ with vertices $W := \{\ldots, w_{-1}, w_0, w_1, \ldots\}$. Moreover,
assume that the vertices are consistently ordered, i.e. if
\[ i(\nu) = i'(\nu'), \quad i(\mu) = i'(\mu'), \quad \text{with } \nu < \mu \text{ and } i, i' \in I, \]
then
\[ \nu' < \mu'. \]
Denote by $\Gamma$ all "index" sets of the form
\[ \gamma = ((\alpha(0), \beta(0)), \ldots, (\alpha(n), \beta(n))) \]
with
\[ \alpha(\nu) \in \{i(0), \ldots, i(m)\} \text{ for some } i \in I, \qquad \beta(\nu) \in \{0, \ldots, k\} \qquad (14) \]
and
\[ \alpha(\nu) \le \alpha(\nu + 1), \quad \beta(\nu) \le \beta(\nu + 1) \]
where one of the inequalities is strict.
As is indicated in Figure 7 below, the index sets $\gamma$ corresponding to a simplex $[W_i]$ can
be identified with the ordered sequences of length n + 1 = m + k + 1 from the set
\[ \{w_{i(0)}, \ldots, w_{i(m)}\} \times \{0, \ldots, k\}. \]
< Figure 7 (labels: i(0), i(1), ..., i(m)) >
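The index sets belonging to one simplex can be enumerated mechanically. In the sketch below the vertices $w_{i(0)}, \ldots, w_{i(m)}$ are represented by their positions 0, ..., m (an assumption made for illustration), and a sequence is admissible when both components are nondecreasing and at least one increases strictly at every step. The function name is hypothetical.

```python
from itertools import product
from math import comb

def index_sets(m, k):
    """All sequences ((a_0, b_0), ..., (a_n, b_n)), n = m + k, with a_v in {0..m},
    b_v in {0..k}, nondecreasing in both components and strictly increasing in at
    least one component at every step."""
    n = m + k
    chains = []

    def extend(chain):
        if len(chain) == n + 1:
            chains.append(tuple(chain))
            return
        a, b = chain[-1]
        for a_next in range(a, m + 1):
            for b_next in range(b, k + 1):
                if (a_next, b_next) != (a, b):
                    extend(chain + [(a_next, b_next)])

    for start in product(range(m + 1), range(k + 1)):
        extend([start])
    return chains
```

Since a + b must grow by at least one in each of the m + k steps, every admissible sequence runs from (0, 0) to (m, k) in unit steps, so there are $\binom{m+k}{m}$ of them. This equals the number of Bernstein polynomials of degree k in m variables.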
Definition 2S [H82]. Let F be a mapping from $\{\ldots, -1, 0, 1, \ldots\} \times \{0, \ldots, k\}$ to $\mathbb{R}^m$
and denote by $F(\gamma)$ the collection of vectors $\{F(\alpha(0), \beta(0)), \ldots, F(\alpha(n), \beta(n))\}$. Assume that
the union of the sets $[F(\gamma)]$ covers $\mathbb{R}^m$, that the range of F has no limit point and that each
$x \in \mathbb{R}^m$ is contained in at most finitely many of the sets $[F(\gamma)]$. Then, the spline space
$S(F, \Gamma)$ is defined as the linear span of the simplex splines $M_{F(\gamma)}$, $\gamma \in \Gamma$.
Note that the mapping F can be chosen almost arbitrarily, i.e. in analogy with the
univariate situation there is almost no restriction on the placement of the "knots" $F(\gamma)$. However,
there is no canonical choice for F which yields maximal smoothness or a well conditioned
simplex spline basis. This must still be viewed as one of the major drawbacks of these spline
spaces. On the other hand, for "almost all" choices of F, the space $S(F, \Gamma)$ consists of piecewise
polynomials of smoothness k - 1 and degree k which is in general fairly difficult to achieve
with other constructions.
Example 2. Let W be a partition of $\mathbb{R}$, i.e. the "simplices" $[W_i]$ are the intervals
$[w_i, w_{i+1}]$. Define F by
\[ F(\alpha, \beta) := t_{\alpha(k+1) - \beta} \]
where $\{\ldots, t_{-1}, t_0, t_1, \ldots\}$ is an increasing sequence of knots. Since m = 1, the index sets $\gamma$
are of the simple form
\[ \gamma = ((i, 0), (i, 1), \ldots, (i, j), (i + 1, j), \ldots, (i + 1, k)) \]
where $0 \le j \le k$ and i is any integer. Thus $F(\gamma)$ consists of the k + 2 consecutive knots
\[ t_{i(k+1) - j}, \ t_{i(k+1) - j + 1}, \ \ldots, \ t_{(i+1)(k+1) - j} \]
and therefore $S(F, \Gamma)$ is the standard space of univariate splines. However, Definition 2S is
more general since the sequence of knots does not have to be monotone increasing.
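The bookkeeping in Example 2 can be confirmed with a few lines of code. The sketch below maps an index set of the stated form to the subscripts of the knots $t_{\alpha(k+1) - \beta}$ and is pure Python; the function name is hypothetical.

```python
def knot_indices(i, j, k):
    """Sorted subscripts of the knots in F(gamma) for
    gamma = ((i, 0), ..., (i, j), (i + 1, j), ..., (i + 1, k))."""
    gamma = [(i, b) for b in range(0, j + 1)] + [(i + 1, b) for b in range(j, k + 1)]
    return sorted(a * (k + 1) - b for a, b in gamma)
```

For k = 1, i = 0 and j = 1, for instance, the index set ((0, 0), (0, 1), (1, 1)) yields the subscripts -1, 0, 1, i.e. three consecutive knots as expected for a piecewise linear B-spline.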
Example 3. For the particular choice
\[ F_*(\alpha, \beta) := w_\alpha, \quad \gamma = (\alpha, \beta) \in \Gamma, \qquad (15) \]
$S(F_*, \Gamma)$ consists of all piecewise polynomials of degree $\le k$ with respect to the triangulation
$\Delta_m$. This can be seen as follows. For $F_*$ defined by (15), the simplex splines which correspond
to different index sets i via (14) have disjoint support. Therefore, restricted to a simplex
$[w_{i(0)}, \ldots, w_{i(m)}]$ of $\Delta_m$, the spline space $S(F_*, \Gamma)$ reduces to the linear span of $M_{F_*(\gamma)}$ where
\[ F_*(\gamma) = (w_{\alpha(0)}, \ldots, w_{\alpha(n)}) \text{ with } \alpha(\nu) \in \{i(0), \ldots, i(m)\}, \]
i.e. the linear span of simplex splines with multiple knots. From Figure 7 it is clear that
all combinations of multiplicities occur and by Example 1 the corresponding simplex splines
coincide with the polynomials in the Bernstein form.
A small perturbation of the mapping $F_*$ can be interpreted as "pulling apart" multiple
knots, i.e. as deforming the space of (nonsmooth) piecewise polynomials into a space of smooth
splines. However, Definition 2S allows arbitrary perturbations as long as the combinatorial
relationship between the knot sets is preserved.
116 KLAUS HÖLLIG
Theorem 3 [H82]. With
$$U := \{(F(\alpha(0),\beta(0)), e_{\beta(0)}),\dots,(F(\alpha(n),\beta(n)), e_{\beta(n)})\}$$
and $[U] \in \Delta$ replaced by $\gamma \in \Gamma$, Theorem 2S remains valid for the spline space $S(F,\Gamma)$.
The proof of this result is based on the fact that the Fourier transform of the right hand
side of identity (13) is an entire function of the knots. Therefore, if the identity holds for small
perturbations of the knots, it remains valid globally.
Under additional assumptions on $F$, the linear independence of the simplex splines $M_{F(\gamma)}$,
$\gamma \in \Gamma$, can be established. Moreover, the standard error estimates are valid for simplex splines.
The practical implementation of algorithms for computing with simplex splines still seems to
be the major unsolved problem. However, one might think that, as for box splines, new
algorithms based on subdivision techniques can be developed.
Box Splines
The other natural choice for $Q$ in Definitions (3,4) is a parallelepiped, and this leads to
the definition of box splines. These splines were introduced by de Boor and DeVore
in [BD83] and their basic properties were studied in [BH82/3]. Box splines can be viewed
as generalizations of univariate cardinal splines. A variety of results on interpolation operators
[BHR85], combinatorial problems [DM85_1] and smooth piecewise polynomials on regular
meshes [BH83_1, BH83_2] have been obtained. Moreover, efficient algorithms for manipulating box
spline surfaces have been developed [Bö83, CLR84, DM84_2, P83/84], which is the basis for
applying box spline techniques to computer aided design.
Definition 1B [BD83, BH82/3]. Denote by $[U]$ the parallelepiped in $\mathbb{R}^n$ which is
spanned by the vectors $\{u : u \in U\}$, i.e.
$$[U] := \Big\{\sum_{u \in U} \lambda(u)\,u : 0 \le \lambda(u) \le 1\Big\}$$
and $\#U = n$. The corresponding normalized box spline is defined as
$$N_V := B_{[U]} / \operatorname{vol}[U]$$ (16)
where $V := \{v := Pu : u \in U\}$.
As for the simplex spline, the right hand side of (16) depends only on the projections
of the vectors in $U$ and
$$\langle N_V, \phi\rangle = \operatorname{vol}[U]^{-1} \int_{[U]} \phi(Py)\,dy = \int_{[0,1]^n} \phi\Big(\sum_{v \in V} \lambda(v)\,v\Big)\,d\lambda.$$ (17)
By the remarks following Theorem 1, $N_V$ is a piecewise polynomial of degree $k = n - m$
which is $(r-1)$ times continuously differentiable where $r$ is the smallest integer for which
$(m+k-r-1)$ of the vectors in $V$ do not span $\mathbb{R}^m$. In contrast to the simplex spline the
mesh for $N_V$ is quite regular. It consists of translates of hyperplanes which are spanned by
$(m-1)$ linearly independent vectors in $V$. Figure 8 below shows a few examples of meshes for
bivariate box splines.
< Figure 8: meshes for bivariate box splines. Panels: $C^0$-linear, $C^1$-quadratic, $C^2$-quartic >
Example 4. (i) If $m = 1$ and $v = 1$ for all $v \in V$, $N_V$ is the forward cardinal B-spline
$B(\cdot\,|\,0,\dots,k+1)$. To see this let $[U]$ be a parallelepiped with $v = Pu = 1$ for all $u \in U$ and
consider the standard triangulation of $[U]$ into $n!$ simplices $[U_\nu]$ with equal volume. For all
simplices $[U_\nu]$ the projections of the vertices are the integers $0,1,\dots,k+1$. Therefore,
$$B_{[U]} = \sum_\nu B_{[U_\nu]} = \Big(\sum_\nu \operatorname{vol}[U_\nu]\Big)\, M_{(0,1,\dots,k+1)} = \operatorname{vol}[U]\; B(\cdot\,|\,0,\dots,k+1).$$
(ii) If $V$ consists of the unit vectors $e_1,\dots,e_m$ with multiplicities $\alpha(1),\dots,\alpha(m)$ respectively,
then $N_V$ coincides with the tensor product B-spline with equally spaced knots,
$$(x(1),\dots,x(m)) \mapsto \prod_{\nu=1}^{m} B(x(\nu)\,|\,0,\dots,\alpha(\nu)).$$
This could be verified directly from (17), but is more easily seen from formula (19) below for
the Fourier transform of $N_V$.
(iii) For $m = 2$ and $V = \{(1,0), (1,1), (0,1)\}$, $N_V$ is the standard linear finite element. Adding
the vector $(1,-1)$ to $V$, one obtains the quadratic element which has been independently
derived by Zwart [Z73] and by Powell and Sabin [PS77]. Further examples can be found in the work
of Frederickson [F71].
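Theorem 1B [BH82/3]. Let $V$ be a collection of $n$ vectors which span $\mathbb{R}^m$.
(i) If $\xi = \sum_{v \in V} \lambda(v)\,v$, then
$$D_\xi N_V = \sum_{v} \lambda(v)\big(N_{V\setminus v} - N_{V\setminus v}(\cdot - v)\big).$$
(ii) If $x = \sum_{v \in V} \lambda(v)\,v$ and the box splines $N_{V\setminus v}$, $v \in V$, are continuous at $x$, then
$$N_V(x) = \frac{1}{n-m} \sum_{v} \big(\lambda(v)\,N_{V\setminus v}(x) + (1-\lambda(v))\,N_{V\setminus v}(x - v)\big).$$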
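The recurrence (ii) can be turned into a (slow) evaluation routine. The following is a minimal sketch under stated assumptions, not an implementation taken from the cited papers: the function name `box_spline` is hypothetical, the directions are stored as the columns of an $(m, n)$ array, the representation $x = \sum_v \lambda(v) v$ is chosen as the least-norm solution, and terms for which $V\setminus v$ no longer spans $\mathbb{R}^m$ are set to zero, which is only justified away from the lower dimensional support of those terms.

```python
import numpy as np


def box_spline(V: np.ndarray, x: np.ndarray) -> float:
    """Evaluate N_V(x) by the recurrence of Theorem 1B (ii).

    V is an (m, n) array whose columns are the direction vectors.
    Only meaningful at points where the lower order box splines are continuous.
    """
    V = np.asarray(V, dtype=float)
    x = np.asarray(x, dtype=float)
    m, n = V.shape
    if n < m or np.linalg.matrix_rank(V) < m:
        # V does not span R^m: treated as zero (assumption, see lead-in).
        return 0.0
    if n == m:
        # Base case: normalized indicator function of the parallelepiped [V].
        lam = np.linalg.solve(V, x)
        inside = bool(np.all((lam >= 0.0) & (lam < 1.0)))
        return 1.0 / abs(np.linalg.det(V)) if inside else 0.0
    # Any lambda with x = sum_v lambda(v) v is admissible; take the least-norm one.
    lam = np.linalg.pinv(V) @ x
    total = 0.0
    for i in range(n):
        v = V[:, i]
        rest = np.delete(V, i, axis=1)
        total += lam[i] * box_spline(rest, x) + (1.0 - lam[i]) * box_spline(rest, x - v)
    return total / (n - m)


# Linear finite element (three directions) and univariate quadratic B-spline.
hat_value = box_spline(np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]), np.array([0.5, 0.25]))
quad_value = box_spline(np.array([[1.0, 1.0, 1.0]]), np.array([1.5]))
print(hat_value, quad_value)
```

The cost grows exponentially with $n - m$, so a routine of this kind is suited to spot checks rather than rendering.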
The recurrence relation (i) has a particularly simple form if $\xi = v$ for some $v \in V$. Then,
$$D_v N_V = \nabla_v N_{V\setminus v},$$
where $(\nabla_v f)(x) := f(x) - f(x - v)$ is the backward difference operator. With $D_W := \prod_{w \in W} D_w$
and $\nabla_W := \prod_{w \in W} \nabla_w$, this yields
$$D_W N_V = \nabla_W N_{V\setminus W}.$$
In particular,
$$D_V N_V = \nabla_V\,\delta,$$
where $\delta$ denotes point evaluation at 0, i.e. $\langle \delta, \phi\rangle := \phi(0)$. Therefore,
$$\int_{\mathbb{R}^m} N_V\, D_V \phi = (\Delta_V \phi)(0),$$ (18)
which gives an integral representation for the forward difference operator $\Delta_V$ in terms of the
box spline $N_V$.
The derivation of the recurrence relations is almost identical with the proof of Theorem
1S. Let $[U]$ be a parallelepiped in $\mathbb{R}^n$ for which $V = \{v := Pu : u \in U\}$ and apply Theorem 1
with $Q := [U]$ and $B := \operatorname{vol}[U]\, N_V$. The boundary faces of $[U]$ consist of the parallelepipeds
$[U\setminus u]$ and their translates $u + [U\setminus u]$ with normals $\eta_u$ and $-\eta_u$ respectively. The corresponding
box splines are $B_{[U\setminus u]} = \operatorname{vol}[U\setminus u]\, N_{V\setminus v}$ and $B_{u+[U\setminus u]} = \operatorname{vol}[U\setminus u]\, N_{V\setminus v}(\cdot - v)$ (cf. Figure
9).
< Figure 9: boundary faces of $[U]$; surviving label: $u + [U\setminus u]$ >
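To prove (i), set $z = \sum_{u \in U} \lambda(v)\,u$. Then, since the vectors $u$, $u \ne u'$, span the boundary
face $[U\setminus u']$,
$$-z \cdot \eta_{u'} = -\sum_{u} \lambda(v)\,u \cdot \eta_{u'} = -\lambda(v')\,u' \cdot \eta_{u'} = \lambda(v')\,\frac{\operatorname{vol}[U]}{\operatorname{vol}[U\setminus u']}$$
and the assertion follows from the normalization of the box spline.
To prove (ii), define $z$ as before and choose the points $b_u$ in the boundary faces $[U\setminus u]$ and
$u + [U\setminus u]$ as $0$ and $u$ respectively. Then,
$$(0 - z) \cdot \eta_u = \lambda(v)\,\frac{\operatorname{vol}[U]}{\operatorname{vol}[U\setminus u]}$$
and
$$(u - z) \cdot (-\eta_u) = (1 - \lambda(v))\,u \cdot (-\eta_u) = (1 - \lambda(v))\,\frac{\operatorname{vol}[U]}{\operatorname{vol}[U\setminus u]}.$$
Setting $\phi(x) = \exp(-iy \cdot x)$ in (17), one sees that the Fourier transform of $N_V$ is
$$\hat N_V(y) = \prod_{v \in V} \frac{1 - \exp(-iy \cdot v)}{iy \cdot v}.$$ (19)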
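Formula (19) can be checked numerically against the integral representation (17). The sketch below is an illustration under stated assumptions: the function names are hypothetical, the right hand side of (17) is estimated by sampling $\lambda$ uniformly from $[0,1]^n$ with a fixed seed, and $y$ is chosen so that no product $y \cdot v$ vanishes.

```python
import numpy as np


def box_spline_ft(V: np.ndarray, y: np.ndarray) -> complex:
    """Product formula (19): prod_v (1 - exp(-i y.v)) / (i y.v).

    Requires y.v != 0 for every direction v (columns of V).
    """
    t = y @ V  # the numbers y . v, one per direction
    return complex(np.prod((1.0 - np.exp(-1j * t)) / (1j * t)))


def box_spline_ft_sampled(V: np.ndarray, y: np.ndarray,
                          samples: int = 200_000, seed: int = 0) -> complex:
    """Estimate <N_V, exp(-i y.x)> via (17) with lambda uniform on [0,1]^n."""
    rng = np.random.default_rng(seed)
    lam = rng.random((samples, V.shape[1]))
    points = lam @ V.T  # each row is sum_v lambda(v) v
    return complex(np.mean(np.exp(-1j * (points @ y))))


# Directions of the quadratic element of Example 4 (iii).
V = np.array([[1.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, -1.0]])
y = np.array([0.7, -1.3])
exact = box_spline_ft(V, y)
estimate = box_spline_ft_sampled(V, y)
print(abs(exact - estimate))
```

The two values should agree up to the sampling error, which is of the order of the inverse square root of the number of samples.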
From this it follows that
$$N_{V \cup V'} = N_V * N_{V'}$$ (20)
where $(f * g)(x) := \int f(x - y)\,g(y)\,dy$ denotes the convolution of $f$ and $g$. In particular, if $V'$
consists of a single vector $\xi$,
$$N_{V \cup \xi}(x) = \int_0^1 N_V(x - \lambda\xi)\,d\lambda.$$ (20')
This identity provides an alternative definition for $N_V$ via repeated "averaging" in the
direction of the vectors $v \in V$.
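The "averaging" construction can be tried out in the univariate case. The following is a minimal sketch, assuming that the averaging integral over $[0,1]$ is replaced by the midpoint rule; the names `averaged` and `indicator` are hypothetical.

```python
def averaged(f, xi: float, nodes: int = 400):
    """Return x -> int_0^1 f(x - lam * xi) dlam, approximated by the midpoint rule."""
    def g(x: float) -> float:
        return sum(f(x - (q + 0.5) / nodes * xi) for q in range(nodes)) / nodes
    return g


def indicator(x: float) -> float:
    """Box spline for the single direction V = {1}: indicator function of [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


hat = averaged(indicator, 1.0)   # V = {1, 1}: piecewise linear hat on [0, 2]
quadratic = averaged(hat, 1.0)   # V = {1, 1, 1}: quadratic cardinal B-spline on [0, 3]
print(hat(0.5), quadratic(1.5))
```

Each averaging step adds one direction and raises the polynomial degree by one; the quadrature error is the only approximation made here.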
Definition 2B [BD83, BH82/3]. Assume that the vectors in $V$ have integer coordinates
and that $V$ contains the unit vectors $e_1,\dots,e_m$. The space of cardinal splines corresponding
to $V$ is defined as
$$S(V) := \Big\{\sum_{j \in \mathbb{Z}^m} a_j\,N_V(\cdot - j) : a_j \in \mathbb{R}\Big\}$$ (21)
where $\mathbb{Z}$ denotes the integers.
Definition 2B is a special case of Definition 1 with $Q_* := [0,1]^{n-m}$ and the partition $\Delta$
consisting of translates of the parallelepiped which is spanned by the vectors $(e_1,0),\dots,(e_m,0)$
and $(v_{m+1},e_{m+1}),\dots,(v_n,e_n)$ where $V = \{e_1,\dots,e_m,v_{m+1},\dots,v_n\}$. The assumption that $V$
contains the unit vectors is no loss of generality since this can always be achieved by a change
of variables. However, for the proof of Theorem 2B below it is essential that all vectors $v$ are
chosen from a common lattice which, again by a change of variables, can be assumed to be the
lattice of vectors with integer coefficients.
Theorem 2B [BH82/3]. Denote by $\langle W\rangle$ the linear span of the vectors $\{w : w \in W\}$ and
define
$$\mathcal{A} := \{W \subset V : \langle V\setminus W\rangle \ne \mathbb{R}^m\}.$$
Then,
$$\pi \cap S(V) = \bigcap_{W \in \mathcal{A}} \ker D_W,$$ (22)
where $\pi$ denotes the space of polynomials.
Example 5. (i) As was pointed out in Example 4 (ii), the tensor product B-spline is
obtained when $V$ consists of the unit vectors. Assume that each unit vector occurs with
multiplicity $\alpha$, then $\mathcal{A}$ contains the sets
$$W_\nu = \{e_\nu,\dots,e_\nu\} \ (\alpha \text{ times}), \quad \nu = 1,\dots,m,$$
and any other set in $\mathcal{A}$ contains one of these sets as subset. Therefore, by (22), a polynomial
$p$ is in $S(V)$ if and only if it is annihilated by
$$D_{W_\nu} = \partial_\nu^{\alpha}, \quad \nu = 1,\dots,m.$$
(ii) If the vectors in $V$ are in "general" position, then all sets $W \in \mathcal{A}$ satisfy $\#W > k$. Thus,
by (22), all polynomials of degree $\le k$ are in $S(V)$. This is, e.g., the case for the quadratic
box spline of Example 4 (iii).
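The combinatorial condition in (22) is easy to explore by brute force. The following sketch, with the hypothetical name `minimal_annihilating_size`, computes the smallest cardinality $d$ of a set $W \subset V$ for which $V\setminus W$ fails to span $\mathbb{R}^m$. By (22), every polynomial of degree $< d$ is then annihilated by all $D_W$ in question and hence lies in $S(V)$. Directions are assumed to be stored as the columns of an $(m, n)$ array.

```python
from itertools import combinations

import numpy as np


def minimal_annihilating_size(V: np.ndarray) -> int:
    """Smallest #W over all W subset V such that V \\ W does not span R^m."""
    m, n = V.shape
    for size in range(n + 1):
        for W in combinations(range(n), size):
            keep = [c for c in range(n) if c not in W]
            rank = np.linalg.matrix_rank(V[:, keep]) if keep else 0
            if rank < m:
                return size
    return n  # not reached: W = V always qualifies


# Quadratic element of Example 4 (iii): directions in general position, k = 2.
zwart_powell = np.array([[1, 0, 1, 1],
                         [0, 1, 1, -1]])
# Tensor product case of Example 5 (i) with multiplicity 2.
tensor = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 1]])
print(minimal_annihilating_size(zwart_powell), minimal_annihilating_size(tensor))
```

For the four-direction set the result is $k + 1 = 3$, in line with Example 5 (ii); for the tensor product set it equals the multiplicity, in line with Example 5 (i).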
Proof of Theorem 2B. Let
$$p := \sum_{j \in \mathbb{Z}^m} a_j\,N_V(\cdot - j) \in \pi \cap S(V).$$
By the remark following Theorem 1B,
$$D_v p = \sum_j a_j\big(N_{V\setminus v}(\cdot - j) - N_{V\setminus v}(\cdot - j - v)\big) = \sum_j (a_j - a_{j-v})\,N_{V\setminus v}(\cdot - j),$$
where it was used that $v$ has integer coefficients. Repeating this argument,
$$D_W p = \sum_j (\nabla_W a)_j\,N_{V\setminus W}(\cdot - j).$$ (23)
For $W \in \mathcal{A}$, the box splines $N_{V\setminus W}(\cdot - j)$ have support on a set of measure zero which implies
that the polynomial $D_W p$ vanishes identically, i.e. $p$ lies in the kernel of $D_W$.
For the converse statement we first prove that
$$L := \bigcap_{W \in \mathcal{A}} \ker D_W \subset \pi.$$ (24)
Fix $\xi \in \mathbb{R}^m$. If $V' \notin \mathcal{A}$, then $\xi$ can be written as linear combination of the vectors in $V\setminus V'$.
Therefore, with suitable coefficients $c(w')$,
$$D_\xi D_{V'} = \sum_{w' \in V\setminus V'} c(w')\,D_{V' \cup w'}.$$
Iterating this identity, replacing $(D_\xi)^r D_{V'}$ by a linear combination of $(D_\xi)^{r-1} D_{V' \cup w'}$, $w' \in
V\setminus V'$, one arrives at
$$(D_\xi)^r = \Big(\sum_{V' \in \mathcal{A},\ \#V' \le r} a(V')\,(D_\xi)^{r-\#V'} D_{V'}\Big) + \Big(\sum_{V' \subset V,\ V' \notin \mathcal{A},\ \#V' = r} a(V')\,D_{V'}\Big)$$ (25)
with certain coefficients $a(V')$.
This proves (24) since, for $r > \#V$, the second sum on the right hand side of (25) is empty
and the derivatives $D_{V'}$ in the first sum vanish on functions in $L$.
To complete the proof of the Theorem we show by induction on $r$ that
$$\pi_r \cap L \subset S(V)$$
where $\pi_r$ denotes the space of polynomials of degree $\le r$. For the induction step, we prove
that
$$p \in \pi_r \cap L \quad \text{implies} \quad q := p - \sum_j p(j)\,N_V(\cdot - j) \in \pi_{r-1} \cap L.$$
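By (23),
$$D_W(p - q) = \sum_j (\nabla_W p)(j)\,N_{V\setminus W}(\cdot - j).$$
If $W \in \mathcal{A}$, then $D_W p = 0$ and by (18) also $(\nabla_W p)(j) = (\Delta_W p)(j') = 0$, which shows that
$p - q \in L$.
By (25) and since $q \in L$ (i.e. $D_{V'} q = 0$ for $V' \in \mathcal{A}$),
$$(D_\xi)^r q = \sum_{V' \subset V,\ V' \notin \mathcal{A},\ \#V' = r} a(V')\Big(D_{V'} p - \sum_j (\nabla_{V'} p)(j)\,N_{V\setminus V'}(\cdot - j)\Big).$$
Since $p$ is a polynomial of degree $\le r$, $D_{V'} p = \nabla_{V'} p$, and since $\sum_j N_{V\setminus V'}(\cdot - j) = 1$ it follows
that $(D_\xi)^r q = 0$.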
From Theorem 2B one can derive error estimates for approximation by box splines. Moreover,
the result is useful for studying approximation order for piecewise polynomials on regular
triangulations. For this and further results, the reader is referred to the work by de Boor and
the author [BH82/3, BH83_1, BH83_2] and Dahmen and Micchelli [DM84_1, DM85_1].
Surface Approximation
As pointed out in section 3, box splines are natural generalizations of tensor product
B-splines. The underlying triangular meshes yield more flexibility in the choice of degree
and smoothness while some of the attractive computational features of tensor products are
maintained. In this section a simple approximation scheme is described and shape preserving
properties of box spline expansions are discussed.
Denote by $N_V^h$ the (bivariate) box spline corresponding to a grid of meshsize $h$ and, slightly
changing the notation of the previous section, assume that $N_V^h$ is centered at 0, i.e.
$$N_V^h(x) := N_V(x/h + \xi_V)$$ (26)
where $\xi_V := \sum_{v \in V} v/2$ is the center of the box spline $N_V$ defined in (16). Moreover, denote by
$N_*^h$ the piecewise linear box spline corresponding to the directions $V_* := \{(1,0), (0,1), (1,1)\}$.
In the following it is always assumed that $V$ contains $V_*$. This excludes tensor product
splines and certain degenerate cases where the translates of the box splines $N_V$ are not linearly
independent, and therefore is no significant loss of generality.
Define the approximation scheme
$$f \mapsto S_V^h f := \sum_{j \in \mathbb{Z}^2} f(jh)\,N_V^h(\cdot - jh),$$ (27)
which is a generalization of Schoenberg's univariate variation diminishing spline approximant.
In particular, if $V = V_*$, then $S_*^h f$ is the piecewise linear interpolant to $f$ with respect to the
triangulation of $\mathbb{R}^2$ which is generated by the three directions (1,0), (0,1) and (1,1).
Proposition 2. The method (27) is second order accurate, i.e.
$$f(x) - (S_V^h f)(x) = O(h^2)$$ (28)
for any smooth function $f$.
Proof. The piecewise linear interpolant $S_*^h f$ is second order accurate. Therefore, arguing
by induction, it is sufficient to show that the estimate (28) remains valid if a vector $w$ is added
to the set of vectors $V$. From (20') and (26) one sees that
$$N_{V \cup w}^h(x) = h^{-1}(N_w^h * N_V^h)(x) := \int_{-1/2}^{1/2} N_V^h(x - \lambda h w)\,d\lambda$$ (29)
which implies
$$S_{V \cup w}^h f = h^{-1} N_w^h * S_V^h f.$$ (30)
Write the left hand side of (28) in the form
$$(f - h^{-1} N_w^h * f)(x) + \big(h^{-1} N_w^h * (f - S_V^h f)\big)(x).$$
The second term is of order $h^2$ since convolution by $h^{-1} N_w^h$ does not increase the maximum
norm. The first term equals
$$\int_{-1/2}^{1/2} \big(f(x) - f(x - \lambda h w)\big)\,d\lambda.$$
Adding $0 = \int_{-1/2}^{1/2} (D_w f)(x)\,\lambda h\,d\lambda$ to this expression, it follows that this term is also of order
$O(h^2)$.
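The second order accuracy can be observed numerically in the univariate analogue of (27), i.e. Schoenberg's scheme with a centered quadratic cardinal B-spline. The sketch below is an illustration of that univariate analogue only, not of the bivariate scheme itself; the function names are hypothetical and the closed form of the centered quadratic B-spline is the standard one.

```python
def centered_quadratic_bspline(t: float) -> float:
    """Quadratic cardinal B-spline with support [-3/2, 3/2]."""
    t = abs(t)
    if t <= 0.5:
        return 0.75 - t * t
    if t <= 1.5:
        return 0.5 * (1.5 - t) ** 2
    return 0.0


def schoenberg(f, h: float, x: float) -> float:
    """(S^h f)(x) = sum_j f(jh) B(x/h - j), summed over the few j with B != 0."""
    j0 = round(x / h)
    return sum(f(j * h) * centered_quadratic_bspline(x / h - j)
               for j in range(j0 - 2, j0 + 3))


def f(x: float) -> float:
    return x * x


errors = [schoenberg(f, h, 0.3) - f(0.3) for h in (0.1, 0.05, 0.025)]
print(errors)
```

For $f(x) = x^2$ the error equals $h^2/4$, so halving $h$ reduces it by a factor of four, which is the behavior stated in (28).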
Obviously, $S_V^h$ is a positive operator, i.e., if $f$ is nonnegative, then so is $S_V^h f$. Moreover,
$S_V^h$ preserves monotonicity and convexity, which is made more precise below.
Proposition 3 [DM85_2, G85].
(i) If, for some $\xi \in \mathbb{R}^2$, $D_\xi S_*^h f$ is nonnegative, then so is $D_\xi S_V^h f$.
(ii) If $S_*^h f$ is convex, then so is $S_V^h f$.
The piecewise linear spline $S_*^h f$ is called the "control polygon" of the box spline surface
$S_V^h f$. It interpolates the box spline coefficients at the points $j \in \mathbb{Z}^2$. The Proposition states
that the box spline surface has roughly the same "shape" as its control polygon, which is a
desirable feature for design purposes.
The proof of Proposition 3 is quite simple: it follows from the identity (30) and the
observation that convolution with a positive kernel preserves monotonicity and convexity.
E.g., for the proof of (ii) assume by induction that $S_V^h f$ is convex. Then, for $x = \sum_\nu \rho(\nu)\,x_\nu$
with $\sum_\nu \rho(\nu) = 1$ and $\rho(\nu) \ge 0$,
$$(S_{V \cup w}^h f)(x) = \int_{-1/2}^{1/2} (S_V^h f)\Big(\Big(\sum_\nu \rho(\nu)\,x_\nu\Big) - \lambda h w\Big)\,d\lambda \le \sum_\nu \rho(\nu)\,(S_{V \cup w}^h f)(x_\nu).$$
In principle, box splines can be evaluated via the recurrence relation of Theorem 1B (ii).
However, for approximate evaluation as is required, e.g. for rendering techniques, algorithms
based on subdivision techniques are considerably faster. For box splines such algorithms have
been developed by Böhm [Bö83], Cohen, Lyche and Riesenfeld [CLR84], Dahmen and Micchelli
[DM84_2] and Prautzsch [P83/84]. The idea can be described as follows.
A box spline expansion $\sum_j a_V^h(j)\,N_V^h(\cdot - jh)$ can be rewritten as a linear combination of
the box splines $N_V^{h/2}(\cdot - jh/2)$ corresponding to a refined grid, i.e.
$$\sum_j a_V^h(j)\,N_V^h(x - jh) = \sum_j a_V^{h/2}(j + \xi_V)\,N_V^{h/2}\big(x - (j + \xi_V)h/2\big),$$ (31)
where $\xi_V := \sum_{v \in V} v/2$. The shift by $\xi_V$ is necessary only if $\xi_V \notin \mathbb{Z}^2$ since then the mesh for
$N_V^{h/2}$ is not a refinement of the mesh corresponding to $N_V^h$. The subdivision process can be
repeated and, as has been shown in [D85], the sequence of control polygons converges to the
box spline surface at a quadratic rate. The coefficients $a_V^{h/2}$ in (31) can be computed via the
following
Algorithm.
(i) Define
$$a_*^{h/2}(i) := \sum_j a_V^h(j)\,N_*^h(ih/2 - jh), \quad i \in \mathbb{Z}^2.$$
(ii) Set $V' := V_*$.
(iii) If $V' = V$ stop,
else choose $w \in V\setminus V'$
and define
$$a_{V' \cup w}^{h/2}(j + \xi_{V'} + w/2) := \big(a_{V'}^{h/2}(j + \xi_{V'}) + a_{V'}^{h/2}(j + \xi_{V'} + w)\big)/2 \quad \text{for } j \in \mathbb{Z}^2.$$
(iv) Set $V' := V' \cup w$ and go to step (iii).
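The structure of the Algorithm (linear interpolation followed by repeated averaging) is easiest to see in its univariate analogue for uniform B-splines. The following is a sketch of that analogue only, under the assumption of a finite coefficient list, so that coefficients near the two ends are dropped by each averaging step; the function name `refine` is hypothetical.

```python
def refine(coeffs: list[float], degree: int) -> list[float]:
    """One subdivision step for a uniform univariate B-spline of degree >= 1."""
    # Analogue of step (i): refined coefficients of the piecewise linear
    # spline are obtained by linear interpolation.
    fine: list[float] = []
    for a, b in zip(coeffs, coeffs[1:]):
        fine += [a, (a + b) / 2]
    fine.append(coeffs[-1])
    # Analogue of step (iii), repeated: each added direction w = 1 averages
    # neighbouring coefficients.
    for _ in range(degree - 1):
        fine = [(a + b) / 2 for a, b in zip(fine, fine[1:])]
    return fine


refined = refine([0.0, 4.0, 8.0, 4.0], degree=2)
print(refined)
```

For degree 2 the result coincides with the familiar corner cutting weights $(3/4, 1/4)$ and $(1/4, 3/4)$ applied to neighbouring coefficients.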
Example 6. As was first observed by Böhm [Bö83], the algorithm takes on a particularly
simple form if
$$V = V_* \cup \dots \cup V_*,$$
i.e. if $V$ contains the vectors (1,0), (0,1) and (1,1) with equal multiplicity $r$. In this case three
applications of step (iii) of the Algorithm can be combined which results in
$$a_{V' \cup V_*}^{h/2}(j) = \frac{1}{8} \begin{bmatrix} 0 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 0 \end{bmatrix} a_{V'}^{h/2}(j), \quad j \in \mathbb{Z}^2,$$
where the weights in the square matrix are applied centered at the (double) index $j$.
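The combined weights can be recomputed by composing three two-point averages, one for each of the directions (1,0), (0,1) and (1,1). The sketch below does this with exact rational arithmetic; the name `average_along` is hypothetical, and the matrix is printed with the second coordinate decreasing from top to bottom, which is an assumption about the orientation of the printed stencil.

```python
from collections import defaultdict
from fractions import Fraction


def average_along(mask: dict, w: tuple) -> dict:
    """One averaging step: half of each weight stays, half moves by w."""
    out = defaultdict(Fraction)
    for (x, y), c in mask.items():
        out[(x, y)] += c / 2
        out[(x + w[0], y + w[1])] += c / 2
    return dict(out)


mask = {(0, 0): Fraction(1)}
for w in [(1, 0), (0, 1), (1, 1)]:
    mask = average_along(mask, w)

# Rows correspond to y = 2, 1, 0 (top to bottom), columns to x = 0, 1, 2.
stencil = [[mask.get((x, y), Fraction(0)) * 8 for x in range(3)] for y in (2, 1, 0)]
for row in stencil:
    print([int(c) for c in row])
```

The weights sum to 8, which accounts for the normalization by 1/8 after three averaging steps.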
Derivation of the Algorithm. For $V = V_*$, the new coefficients are obtained by linear
interpolation since the control polygon interpolates the box spline coefficients. This explains
step (i) of the algorithm. Now, one has to show that (31) remains valid if a vector $w$ is added
to the set $V$ and the coefficients $a_{V \cup w}^{h/2}$ are computed via step (iii). Convolve both sides of (31)
with $h^{-1} N_w^h$. Then, by (29), on the left hand side $N_V^h$ is replaced by $N_{V \cup w}^h$. For the right
hand side one obtains
$$\int_{-1/2}^{1/2} N_V^{h/2}(x - \lambda h w)\,d\lambda = \frac{1}{2} \int_{-1}^{1} N_V^{h/2}\big(x - \lambda (h/2) w\big)\,d\lambda = \frac{1}{2}\Big(\int_{-1}^{0} \dots + \int_{0}^{1} \dots\Big)$$
$$= \frac{1}{2}\Big(N_{V \cup w}^{h/2}(x - hw/4) + N_{V \cup w}^{h/2}(x + hw/4)\Big).$$
Therefore, using that $y := x - (j + \xi_V)(h/2) - (h/2)(w/2) = x - (j + \xi_{V \cup w})(h/2)$, the right
hand side of (31) equals
$$\sum_j a_V^{h/2}(j + \xi_V)\,\frac{1}{2}\Big(N_{V \cup w}^{h/2}\big(x - (j + \xi_{V \cup w})h/2\big) + N_{V \cup w}^{h/2}\big(x - (j + \xi_{V \cup w})h/2 + hw/2\big)\Big)$$
$$= \sum_j \frac{1}{2}\Big(a_V^{h/2}(j + \xi_V) + a_V^{h/2}(j + \xi_V + w)\Big)\,N_{V \cup w}^{h/2}\big(x - (j + \xi_{V \cup w})h/2\big)$$
which establishes the formula for the coefficients.
Bibliography
[Bö83] W. Böhm, Subdividing multivariate splines, Computer Aided Design 15 (1983), 345-352.
[B76] C. de Boor, Splines as linear combinations of B-splines, in Approximation Theory II,
G. G. Lorentz, C. K. Chui and L. L. Schumaker, eds., Academic Press (1976), 1-47.
[BD83] C. de Boor and R. DeVore, Approximation by smooth multivariate splines, Trans. Amer.
Math. Soc. 276 (1983), 775-788.
[BH82] C. de Boor and K. Höllig, Recurrence relations for multivariate B-splines, Proc. Amer. Math. Soc. 85 (1982), 397-400.
[BH82/3] C. de Boor and K. Höllig, B-splines from parallelepipeds, J. Analyse Math. 42 (1982/3),
99-115.
[BH83_1] C. de Boor and K. Höllig, Approximation order from bivariate $C^1$-cubics: A counterexample,
Proc. Amer. Math. Soc. 87 (1983), 649-655.
[BH83_2] C. de Boor and K. Höllig, Bivariate box splines and smooth pp functions on a three-
direction mesh, J. Comput. Appl. Math. 9 (1983), 13-28.
[BHR85] C. de Boor, K. Höllig and S. D. Riemenschneider, Convergence of cardinal series, Proc.
Amer. Math. Soc., to appear.
[CLR84] E. Cohen, T. Lyche and R. Riesenfeld, Discrete box-splines and refinement algorithms,
Computer Aided Geometric Design 1 (1984), 131-148.
[D85] W. Dahmen, Subdivision algorithms converge quadratically, Tech. Rep. 710 (1985),
Sonderforschungsbereich Universität Bonn.
[DM82] W. Dahmen and C. A. Micchelli, On the linear independence of multivariate B-splines, I. Triangulations of simploids, SIAM
J. Numer. Anal. 19 (1982), 993-1012.
[DM84_1] W. Dahmen and C. A. Micchelli, Recent progress in multivariate splines, in Approximation
Theory IV, C. K. Chui, L. L. Schumaker and J. Ward, eds., Academic Press,
New York (1984), 27-121.
[DM84_2] W. Dahmen and C. A. Micchelli, Subdivision algorithms for the generation of box-spline
surfaces, Computer Aided Geometric Design, to appear.
[DM85_1] W. Dahmen and C. A. Micchelli, Combinatorial aspects of multivariate splines, Tech. Rep.
722 (1985), Sonderforschungsbereich Universität Bonn.
[DM85_2] W. Dahmen and C. A. Micchelli, Convexity of multivariate Bernstein polynomials and box
spline surfaces, Tech. Rep. 735 (1985), Sonderforschungsbereich Universität Bonn.
[F71] P. O. Frederickson, Generalized triangular splines, Tech. Rep. 7-71, Lakehead University,
1971.
[GL81] T. N. T. Goodman and S. L. Lee, Spline approximation operators of Bernstein-Schoenberg
type in one and two variables, J. Approx. Theory 33 (1981), 248-263.
[G85] T. N. T. Goodman, private communication.
[H82] K. Höllig, Multivariate splines, SIAM J. Numer. Anal. 19 (1982), 1013-1031.
[M80] C. A. Micchelli, A constructive approach to Kergin interpolation in $\mathbb{R}^k$: Multivariate
B-splines and Lagrange interpolation, Rocky Mountain J. Math. 10 (1980), 485-497.
[PS77] M. J. D. Powell and M. A. Sabin, Piecewise quadratic approximation on triangles, ACM
Trans. Math. Software 3 (1977), 316-325.
[P83/84] H. Prautzsch, Unterteilungsalgorithmen für multivariate Splines, ein geometrischer Zugang,
Dissertation, Technische Universität Braunschweig (1984).
[Z73] P. Zwart, Multivariate splines with nondegenerate partitions, SIAM J. Numer. Anal. 10
(1973), 665-673.
COMPUTER SCIENCES DEPARTMENT
UNIVERSITY OF WISCONSIN-MADISON
MADISON, WISCONSIN 53706
INDEX
adaptive, 18 alternating algorithm, 75 alternation, 4-5, 71-72 analytic
continuation, 44 function, 17, 22, 35, 89
approximation, adaptive, 17-18 best, 2, 67 L2, 22, 24 complex, 21 good,10 linear, 55 near-best, 22 nonlinear, 16, 72 polynomial, see polynomial rational, see rational
attenuation factors, 88
Bernstein, 10, 12, 16, 34, 56, 111, 115 Bezier representation
of a pp function, 103 Bezout's theorem, 92 bivariate, 76-78 Blaschke product, 31 blending methods, 76 box spline, 100, 116-125 B-spline, 87, 92, 100, 103
calculation of b.a., 8 Caratheodory-Fejer Theorem, 32 capacity, 29 CF approximation, 32 characterization
of b.a., 5, 30 Chebyshev
expansion or series, 28 polynomials, 9, 11, 28, 31, 57 space, 6
system, 6, 85-86, 89 Chebyshev's Theorem, 4 Christoffel numbers, 42 circularity (of the error), 31-32 computer-aided design, 116 continued fraction, 38 control polygon, 123 convexity
strict, 3 convolution, 13
degree of approximation, 12, 23, 34, 43-44, 121
differential correction (algorithm), 72 Diliberto-Straus (algorithm), 74 dist, 2, 67-68 divided difference, 82 duality, 57, 68, 92
eigenvalues, 32, 59, 96-97 electrostatics, 29 entire (functions), 23, 40 equi-oscillation (criterion), 71-72 exchange method, 70 existence, 2, 43 exponentials, 6
Faber polynomials, 26 series, 26-28
Favard, 12 Fejer, 32 Fekete points, 29 finite element, 117 Fourier
series, 8, 88 transform, 119
discrete, 88 fast, 85
Gauss quadrature, 42 Gel'fand n-width, 57, 65 von Golitschek (algorithm), 76
Haar space, 6, 30, 70, 72 Hankel matrix, 32 Hardy space, 89 Hermite, 23, 28, 82
-Genocchi formula, 83, 104
interpolation, 10, 56, 81-102 by polynomials, 9, 21, 23, 28, 82-84
good points for, 11, 28-29 quasi-, 16
intrinsic error, 60
Jackson kernel, 14
Jackson's Theorem, 12, 34, 53 Joukowski transformation, 27
Kergin interpolation, 97-98 Kolmogorov
-Arnold Theorem, 76 criterion, 30 n-width, 52
Lagrange form, 11, 82, 84, 86, 92, 100 Lebesgue function, 11 least-squares, 22 lifting of a map, 99-100 linear
n-width, 55-56 programming, 68, 74
Markov inequality, 57 Marsden's identity, 112 Mergelyan's theorem, 33 minimization, 67-68 modulus of continuity, 12, 108 multipoint, 42 multiquadric surface, 96 multivariate, 42, 76-78, 89-100, 103
near circularity, 31 von Neumann (algorithm), 75 Neville-Aitken formula, 83 Newman's approximation
to the absolute value, 16-17, 44 Newton form, 83 nonlinear, 16, 55, 72 nomographic (functions), 76
n-width, 51-60 Bernstein, 56 Gel'fand, 57, 65 linear, 55
optimal algorithm, 60 interpolation, 93-95 recovery, 60 spline interpolation, 89, 93 subspace, 53
asymptotically, 55 orthogonal, 8, 22
polynomial, 41 projector, 55
Pade approximant, 35
multipoint, 44 multivariate, 44
table, 37 partition of unity, 19 periodic (spline), 88 Perron, 40 piecewise polynomials, 14-15, 16
see also spline Pólya frequency sequence, 40 polynomials,
algebraic, 1, 5, 9, 11, 16-17, 21-22, 55 trigonometric, see trigonometric
positive definite, conditionally, 96
potential, 29 power function, 6
truncated, 14 projector or projection, 10, 22
minimal, 10, 22 proximity map, 75
central, 75
quasi-interpolant, 16
Radon transform, 99 rational (function)s, 16, 18, 32, 33, 35,
42-45, 72-74 realist, 27 recovery,
optimal, 61-66 recurrence relations, 104-106, 109-110,
117-118 Remez algorithm, 68, 70
Riemann mapping theorem, 25 ridge function, 91 Rouche's theorem, 31 Runge's theorem, 25, 33
scattered data, 91 Schoenberg, 87, 104, 109, 122 shape preservation, 122 signature
extremal, 33 simplex spline, 109-116 singular value decomposition, 60 Sobolev space,
n-width of, 58-59 spline, 14-15, 54-55, 86-89
box, see box spline B-, see B-spline cardinal, 117 free knot, 17 "natural", 93 perfect, 89 periodic, 88 polyhedral, 100 simplex, see simplex spline thin plate, 95
Stieltjes function, 40, 44 subdivision algorithm, 103, 123 Swiss cheese, 43 s-numbers, 59
Taylor polynomial, 14, 21-22, 26, 33, 58 tensor product, 75, 91, 117, 120, 122 Toeplitz determinant, 37 transfinite diameter, 29 trigonometric polynomials, 6, 9, 13, 52,
84-85 truncated power, 100
uniqueness, 2, 4, 5, 43, 69
Vandermonde, 9, 29, 82 vive la difference!, 42
Walsh, 34 array, 43
Weierstrass Approximation Theorem, 1, 11,33
winding number, 31