
Lecture 3 - University of Waterloo, links.uwaterloo.ca/amath391docs/week2.pdf

Lecture 3

Convergence of sequences in metric spaces

In the previous section, we introduced the idea of a metric space (X, d): A set X along with a

metric d that assigns distances between any two elements x, y ∈ X. It now remains to discuss the

important idea of convergence in such metric spaces. This will be easy, since you have already seen

the idea of convergence of sequences of real numbers.

Recall that a sequence {xn} of real numbers is said to converge to a limit x ∈ R if the following

holds true:

Given any ε > 0, there exists an integer Nε > 0 such that

|xn − x| < ε for all n > Nε. (1)

The ε-subscripted N, i.e., Nε, indicates that the value of N will generally be dependent upon the value

of ε. (Most probably, the smaller we make ε, thus “squeezing” the tail of the sequence closer to x, the

larger the required value of N.)

But look again at the LHS of the inequality in (1) – not surprisingly, it is the distance between

xn and the limit x, i.e., |xn−x| = d(xn, x), where the metric d on the set of real numbers was discussed

earlier. Therefore, we can rewrite the above requirement for convergence of a sequence {xn} as follows,

Given any ε > 0, there exists an integer Nε > 0 such that

d(xn, x) < ε for all n > Nε. (2)

This requirement is equivalent to the statement,

lim_{n→∞} d(xn, x) = 0, (3)

which is also written as follows

d(xn, x) → 0 as n → ∞. (4)

The beautiful thing is that the above result carries over to metric spaces in general – regardless

of what comprises these spaces, e.g., numbers, functions, sets, measures, etc. We say that a sequence

{xn} of elements of a metric space X, i.e., xn ∈ X, converges to a limit x ∈ X if

d(xn, x) → 0 as n → ∞, (5)


acknowledging that this is a shorthand notation for the more proper ε-Nε definition.

The above definition of convergence is fine if you happen to know the limit x of your sequence

{xn}. You may then be able to “measure” the distances d(xn, x) and show that this sequence of distances converges

to zero. But what if you don’t know the limit x? Can you still characterize a sequence {xn} as

being convergent, or possibly convergent, if you don’t actually know the limit x? The mathematician

Cauchy struggled with this problem in his study of the real numbers and came up with the following

definition (Cauchy 1821):

A sequence of real numbers {xn} is said to be a Cauchy sequence if, for any ε > 0, there

exists an Nε > 0 such that

|xn − xm| < ε for all n, m > Nε. (6)

In other words, the elements xn of the “tail” of the sequence are getting closer and closer to each

other. Cauchy then proved the following remarkable result:

(Cauchy 1821) A sequence of real numbers {xn} is convergent (to a limit x ∈ R) if and

only if it is a Cauchy sequence.

The “if and only if”, i.e., equivalence, of Cauchy sequences and convergence to a limit, is true because

of the completeness of the real number system. We’ll return to this idea in a moment.

The idea of a Cauchy sequence is easily brought into a general metric space setting:

Let (X, d) be a metric space. Then a sequence of elements {xn} is said to be a Cauchy

sequence in (X, d) if, for any ε > 0, there exists an Nε > 0 such that

d(xn, xm) < ε for all n, m > Nε. (7)

We have simply replaced the distance function |x−y| on the real number line with the metric d(x, y) in

our metric space. This definition is now applicable to real numbers, ordered n-tuples in Rn, functions,

etc.

In general, however, Cauchy’s “if and only if” result for sequences of real numbers may not hold.

We do have the following result, however:


Theorem: Let {xn} be a convergent sequence in a metric space (X, d). Then {xn} is a Cauchy

sequence.

The proof of this theorem is so straightforward that it is worthy of mention here. (It is a direct

analogue of the proof for real numbers.)

Proof: Since {xn} is convergent, there exists an x ∈ X (the limit of the sequence) such that the

following holds: For any ε > 0, there exists an Nε > 0 such that d(xn, x) < ε/2 for all n > Nε. (Note

that we’re using ε/2 on the RHS, for reasons that will be clear below – it’s quite OK to do this.) Then,

for any n, m > Nε, it follows, from the triangle inequality, that

d(xn, xm) ≤ d(xn, x) + d(x, xm) < ε/2 + ε/2 = ε. (8)

Just to repeat the final conclusion: d(xn, xm) < ε for all n, m > Nε. Therefore, by definition, the

sequence {xn} is Cauchy, proving the theorem.
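The theorem is easy to illustrate numerically. Here is a small Python sketch (our own example, with xn = 1/n converging to x = 0 in (R, |·|)) that checks the Cauchy property over a finite stretch of the tail:

```python
# Sketch: verify numerically that the convergent sequence x_n = 1/n is Cauchy.
# Since |1/n - 0| < eps/2 whenever n > 2/eps, the proof above suggests taking
# N_eps = 2/eps; we then check d(x_n, x_m) < eps over a finite tail segment.

def x(n):
    return 1.0 / n

eps = 1e-3
N_eps = int(2 / eps)

tail = [x(n) for n in range(N_eps + 1, N_eps + 500)]
max_gap = max(abs(a - b) for a in tail for b in tail)
print(max_gap < eps)   # all sampled tail elements lie within eps of each other
```

Of course, a finite check is only an illustration; the proof above is what establishes the result for the entire tail.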

And what about the converse? Are all Cauchy sequences convergent? The answer is “no” – it

depends upon the metric space X in which we are working. Here is an example.

Example: Let X = Q, the set of rational numbers, with metric d(x, y) = |x − y|. Consider the

following sequence,

x1 = 3, x2 = 31/10, x3 = 314/100, x4 = 3141/1000, · · · . (9)

The element xn is obtained by truncating the decimal expansion of π to n− 1 decimal digits. It can

be shown (left as an exercise – it’s not that difficult) that this sequence is Cauchy. And it darn well

looks like it is convergent, i.e., converging to π. But this limit point π is NOT in the metric space Q.

Therefore the sequence {xn} is NOT convergent.
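The Cauchy property of the sequence (9) can be seen concretely in a short Python sketch (a sketch only: the float math.pi stands in for the exact decimal expansion, and only a finite range of indices is checked):

```python
from fractions import Fraction
import math

# Sketch: the truncated decimal expansions of pi, as in (9).  Each x_n is a
# rational number; the float math.pi stands in for the exact expansion.
def x(n):
    scale = 10 ** (n - 1)                     # keep n-1 decimal digits
    return Fraction(math.floor(math.pi * scale), scale)

print(x(1), x(2), x(3))                       # 3 31/10 157/50

# Cauchy property: for m > n, x_m refines x_n, so |x_n - x_m| < 10^-(n-1).
checks = all(abs(x(n) - x(m)) < Fraction(1, 10 ** (n - 1))
             for n in range(1, 12) for m in range(n, 12))
print(checks)                                 # True
```

Every xn lies in Q, and the mutual distances shrink like 10^{−(n−1)}, yet no rational number can serve as the limit.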

Of course, if you extended the metric space to be the set of real numbers R, then the sequence

{xn} is convergent – it converges to an element of R, the non-rational number π. As you may already

know, the real numbers are said to be the completion of the rational numbers.

For this reason, we have the following definition:


Definition: A metric space (X, d) is said to be complete if every Cauchy sequence {xn} converges

(to an element x ∈ X).

We can now come up with some easy examples:

1. The metric space X = R, the set of real numbers, is complete.

2. The metric space X = Q, the set of rational numbers, is not complete.

3. The closed interval X = [a, b] of real numbers is complete.

4. The interval X = (a, b] is not complete. (It’s possible to construct a sequence which converges

to a, which is not an element of X.) Likewise, the interval (a, b) is not complete.

In a short time, we shall arrive at some interesting conclusions regarding metric spaces of functions.

First, however, we examine the consequence of convergence with respect to a couple of important

metrics.

Convergence of sequences of continuous functions with respect to the d∞ and d2

metrics

Convergence in d∞ metric

Here we examine the consequence of convergence of sequences of continuous functions with respect to

the d∞ metric. Recall that for two functions f, g ∈ C[a, b], the distance between them according to

this metric is given by

d∞(f, g) = max_{a≤x≤b} |f(x) − g(x)|. (10)

The distance between f and g is the maximum difference of their values on [a, b]. An important

consequence of this is the following:

d∞(f, g) < ε implies that |f(x) − g(x)| < ε ∀ x ∈ [a, b]. (11)

This implies that for all x ∈ [a, b], the difference between f(x) and g(x) is less than ε. This is a very

“tight” closeness between the graphs of f(x) and g(x).

Now suppose that we have a sequence of functions fn ∈ C[a, b] that converges to a limit function

f ∈ C[a, b] with respect to the d∞ metric. This means that

d∞(fn, f) → 0 as n → ∞. (12)


Using the ε definition: For any ε > 0, there exists an Nε > 0 such that

d∞(fn, f) < ε for all n > Nε. (13)

But from (11), this implies that

|fn(x) − f(x)| < ε for all n > Nε and all x ∈ [a, b]. (14)

First of all, let us consider a fixed value of x in [a, b]. As we push ε toward 0, the values fn(x)

for n > Nε are forced to be closer and closer toward f(x). In other words, the real numbers fn(x)

converge to f(x), i.e.,

lim_{n→∞} fn(x) = f(x) or simply fn(x) → f(x). (15)

This is true for any x ∈ [a, b]. This type of convergence is known as pointwise convergence.

But there is another important aspect of this result: the convergence is uniform: For any

x ∈ [a, b], the values of fn(x) lie within ε of f(x).

Another way to see this is to note that Eq. (14) implies the following (from the property of

absolute values):

f(x) − ε < fn(x) < f(x) + ε for all n > Nε. (16)

In other words, the graphs of the fn(x) functions for n > Nε lie inside an ε-ribbon of the graph of

f(x). Letting ε go to zero reduces the width of this ribbon. The fact that for a given ε > 0, the same

Nε will work for all values of x in [a, b] implies uniform convergence. This is made possible by the fact

that the interval [a, b] is closed and bounded.
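Here is a minimal Python sketch of convergence in the d∞ metric (the example fn(x) = x² + 1/n is our own, and the max over [a, b] is approximated by a max over a finite grid of sample points):

```python
# Sketch: f_n(x) = x**2 + 1/n on [0, 1] converges to f(x) = x**2 in d_inf.
# The max over [a, b] is approximated by a max over a dense grid.

def d_inf(f, g, a=0.0, b=1.0, samples=10001):
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - g(x)) for x in xs)

f = lambda x: x ** 2
for n in (1, 10, 100):
    fn = lambda x, n=n: x ** 2 + 1.0 / n
    print(n, d_inf(fn, f))    # the difference is the constant 1/n, so d_inf = 1/n
```

For this pair the ε-ribbon picture is exact: the graph of fn sits a constant height 1/n above the graph of f, and the same Nε works for every x in [0, 1].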

Convergence in d2 metric

Recall that for two functions f, g ∈ C[a, b], the distance between them according to the d2 metric is

given by

d2(f, g) = [ ∫_a^b [f(x) − g(x)]^2 dx ]^{1/2}. (17)

Note that

d2(f, g) < ε implies that [ ∫_a^b [f(x) − g(x)]^2 dx ]^{1/2} < ε. (18)

This metric involves an integration over the functions f and g and not a comparison of their values

over the interval. Unlike the case for the d∞ metric, Eq. (18) does not imply that the maximum

difference between f(x) and g(x) values must be less than ε. It is possible that the values of f(x) and

g(x) differ significantly, but only over a sufficiently small interval.


To illustrate, consider the following example. Let f(x) = 0 and let gn(x), n = 2, 3, · · ·, be a sequence

of functions defined over [0, 1] as follows: For n ≥ 2, define

gn(x) =

    0,                0 ≤ x ≤ 1/2 − 1/n,
    1 + n(x − 1/2),   1/2 − 1/n ≤ x ≤ 1/2,
    1 − n(x − 1/2),   1/2 ≤ x ≤ 1/2 + 1/n,
    0,                1/2 + 1/n ≤ x ≤ 1.        (19)

The graph of this seemingly complicated formula is a “hat” centered at x = 1/2, as sketched below.

[Sketch: the “hat” y = gn(x) on [0, 1]: zero outside [1/2 − 1/n, 1/2 + 1/n], rising linearly to height 1 at x = 1/2, shown together with y = f(x) = 0.]

We now compute the d2 distance between f and gn. After a little algebra, we find that

d2(f, gn) = [ ∫_0^1 [f(x) − gn(x)]^2 dx ]^{1/2} = [ ∫_{1/2−1/n}^{1/2+1/n} [gn(x)]^2 dx ]^{1/2} = [ 2/(3n) ]^{1/2}. (20)

We can make n as large as we please, thereby making the d2 distance between f and gn as small as

desired. In other words, d2(f, gn) → 0 as n → ∞.

But note that for all n ≥ 2, |f(1/2) − gn(1/2)| = 1, which implies that d∞(f, gn) = 1 for all n ≥ 2.

In summary,

1. In the d2 metric, the functions gn are approaching the function f(x) = 0 as n → ∞.

2. In the d∞ metric, the functions gn remain a constant distance of 1 away from function f(x) = 0.

Therefore, they cannot converge to f(x) = 0 in the d∞ metric.
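Both statements are easy to check numerically. The following Python sketch approximates the integral in (20) with a midpoint rule and compares the two metrics for the hat functions:

```python
import math

# Sketch: the "hat" functions g_n of (19) on [0, 1], with f = 0.
def g(n, x):
    if abs(x - 0.5) >= 1.0 / n:
        return 0.0
    return 1.0 - n * abs(x - 0.5)   # equivalent to the two linear pieces in (19)

def d2(n, samples=200000):
    # midpoint-rule approximation of [ integral_0^1 g_n(x)^2 dx ]^(1/2)
    h = 1.0 / samples
    total = sum(g(n, (i + 0.5) * h) ** 2 for i in range(samples)) * h
    return math.sqrt(total)

n = 50
print(abs(d2(n) - math.sqrt(2.0 / (3 * n))) < 1e-4)   # matches (20): True
print(g(n, 0.5))   # the peak value 1.0, so d_inf(f, g_n) = 1 for every n
```

Raising n shrinks the d2 distance like 1/√n while the d∞ distance stays pinned at 1, which is exactly the gap between the two notions of closeness described above.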

Let us make one more observation regarding the gn functions. If we focus our attention on a fixed

value of x ∈ [0, 1], note that the behaviour of the real numbers gn(x) as n → ∞ may be summarized

as follows:


1. If x = 1/2, then gn(1/2) = 1 for all n ≥ 2, implying that

lim_{n→∞} gn(1/2) = 1. (21)

2. If x ≠ 1/2, then for some N > 0, gn(x) = 0 for all n > N. (The hat just gets thinner and

thinner.) This implies that

lim_{n→∞} gn(x) = 0, x ≠ 1/2. (22)

To summarize, from a pointwise perspective, the sequence of functions gn(x), in the limit n → ∞,

approaches the function

g(x) =

    0, x ≠ 1/2,
    1, x = 1/2.        (23)

For this reason, g(x) is known as the pointwise limit function of the sequence of functions {gn(x)}.

Clearly, g(x) is not a continuous function on [0, 1] since it is discontinuous at x = 1/2. Note that

g(x) differs from the zero function f(x) = 0 only at x = 1/2. From the perspective of the d2 metric,

these functions are identical, i.e.,

d2(f, g) = [ ∫_0^1 [f(x) − g(x)]^2 dx ]^{1/2} = 0. (24)

The integral does not register the difference of the functions at the single point x = 1/2. More on this

later.

The question that you may now ask is, “So, which metric is better?” The answer is, “It depends

on what you want.” If you are working on a problem that demands functions to be close to each other

over an interval, e.g., uniform convergence, then you use the d∞ metric. On the other hand, if the

difference between functions is more aptly described in terms of integrals, then the d2 metric might

be more useful.

In signal and image processing, the space of continuous functions is too restrictive. This should

be quite evident in the case of images which, by their very nature, are not continuous functions –

every edge of an image, e.g., boundary of an object, represents a curve of discontinuity of the function

u(x, y) representing the image.

The fact that you can have convergence of functions in the d2 metric while still having function

values differing by significant amounts will help us to understand the phenomenon of “ringing”, i.e.,

the Gibbs phenomenon, that occurs when approximating functions with Fourier series.


Finally, we comment that the above phenomenon observed for the d2 metric will also be observed

for the other dp metrics, p ≥ 1, that are formulated in terms of integrals.

Appendix: Completeness/incompleteness of the function space C[a, b]

The material in this section was not covered in the lecture. It is included here for the

purpose of “completeness” for those interested. It will not be covered on any examina-

tion.

We now return to the idea of completeness, as applied to the space of continuous functions C[a, b]

on a closed interval. We shall simply state the following result. Its proof is probably discussed in an

advanced course on analysis (and certainly covered in AMATH 731, “Applied Functional Analysis”).

Theorem: The metric space (C[a, b], d∞) is complete.

In other words, if the metric d∞ is employed, a Cauchy sequence of continuous functions {fn} ⊂ C[a, b]

converges to a limit function f which is continuous, i.e., f ∈ C[a, b]. (Remember that the

sequence {fn} is Cauchy as measured by the metric d∞.)

On the other hand:

Theorem: The metric space (C[a, b], d2) is not complete.

In other words, a sequence of continuous functions {fn} ⊂ C[a, b] which is Cauchy with respect

to the d2 metric may converge to a function f that is NOT continuous, i.e., NOT an element of C[a, b].

An example was presented earlier – the “hat function” centered at x = 1/2. Here is another example.

Example: Let [a, b] = [0, 1] and consider the sequence of functions fn(x) = x^n, n = 1, 2, · · ·, sketched schematically below.

Clearly, fn ∈ C[a, b]. Let’s examine the behaviour of these functions as n → ∞.

1. For any x such that 0 ≤ x < 1, xn → 0 as n → ∞.

2. For x = 1, clearly, xn = 1.


[Sketch: graphs of y = x, y = x^2, y = x^3 on [0, 1], flattening toward 0 on [0, 1) with all curves passing through (1, 1).]

Some graphs of xn on [0, 1].

We’ll simply state the fact that the sequence {fn} is Cauchy in d2 metric (and leave it to the reader

as an optional exercise). Instead, we’ll focus on the limit f of this sequence which, in this case, can

be found rather easily.

Pointwise, it appears that the functions fn(x) are converging to the function f(x) given by

f(x) =

    0, 0 ≤ x < 1,
    1, x = 1.        (25)

Clearly, f(x) is not a continuous function – it is discontinuous at x = 1. Let us now examine the

distances d2(fn, f):

d2(fn, f) = [ ∫_0^1 [fn(x) − f(x)]^2 dx ]^{1/2} = [ ∫_0^1 [x^n − 0]^2 dx ]^{1/2} = 1/√(2n + 1). (26)

(The single point x = 1, at which f(x) = 1, can be ignored in the integration, since its contribution to

the integral is zero: fn(1) − f(1) = 0. Even if fn(1) ≠ f(1), the contribution to the integral would be

zero.) Clearly, d2(fn, f) → 0 as n → ∞, i.e., the sequence of functions {fn} is converging to f in d2

metric.

In summary, we have constructed a sequence of continuous functions fn that converges, in d2

metric, to a discontinuous function, i.e., a function that does not belong to C[0, 1]. Therefore, the

metric space (C[a, b], d2) is not complete.
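A quick Python sketch (our own crude midpoint Riemann sum) confirms the value of the integral behind (26), ∫₀¹ x^(2n) dx = 1/(2n + 1):

```python
from fractions import Fraction

# Sketch: check integral_0^1 x^(2n) dx = 1/(2n+1), the integral in (26),
# against a crude midpoint Riemann sum, giving d2(f_n, f) = 1/sqrt(2n+1).

def riemann(n, samples=100000):
    h = 1.0 / samples
    return sum(((i + 0.5) * h) ** (2 * n) for i in range(samples)) * h

for n in (1, 5, 20):
    exact = Fraction(1, 2 * n + 1)
    print(n, abs(riemann(n) - float(exact)) < 1e-6)   # True for each n
```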


The moral of the story

The moral of the story is that if we wish to talk about convergence in the d2 metric – which we shall

have to do in our applications to signal and image processing – then we’ll have to extend the class of

functions considered. The space C[a, b] of continuous functions is insufficient. We must be prepared

to allow discontinuous functions.

The “completion” of the space C[a, b] with respect to the d2 metric is the space denoted as L2[a, b].

It is the metric space of functions for which the following distance is finite,

d2(f, g) = [ ∫_a^b [f(x) − g(x)]^2 dx ]^{1/2} < ∞. (27)

More on this later.

End of Appendix


Lecture 4

Normed linear spaces

Metric spaces – sets with distance functions – are very nice, but they are not sufficient for the appli-

cations we wish to study. We would like to be able to add and subtract signals and images, even take

linear combinations of them. For this reason, we would like to work with spaces that have a vector

space structure. In the literature, vector spaces are also called “linear spaces.”

Definition: Let X be a real (or complex) vector space. A real-valued function ‖x‖ defined on X is a

norm on X if the following properties are satisfied:

1. (positivity) ‖x‖ ≥ 0 for all x ∈ X.

2. (strict positivity) ‖x‖ = 0 if and only if x = 0.

3. (triangle inequality) ‖x+ y‖ ≤ ‖x‖+ ‖y‖.

4. (homogeneity) ‖αx‖ = |α| ‖x‖ for any scalar α (α ∈ R or α ∈ C) and any x ∈ X.

‖x‖ is the “length” of x ∈ X.

The pair (X, ‖ · ‖) is called a normed linear space.

The norm ‖ · ‖ defines a metric on X,

d(x, y) = ‖x− y‖, x, y ∈ X. (28)

It can be verified that this metric satisfies all of the required properties of a metric. In particular,

the triangle inequality for norms given above guarantees that the above metric satisfies the triangle

inequality for metrics: If we replace x with x− z and y with −y + z in Property 3 above, we obtain

‖x− y‖ ≤ ‖x− z‖+ ‖y − z‖, (29)

or

d(x, y) ≤ d(x, z) + d(z, y). (30)

As a result, normed linear spaces are metric spaces.
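This can be spot-checked numerically. The sketch below (our own example, using the Euclidean norm on R³ and random vectors) tests the induced metric against the triangle inequality (30):

```python
import math
import random

# Sketch: the metric d(x, y) = ||x - y|| induced by the Euclidean norm on R^3,
# spot-checked against the triangle inequality d(x, y) <= d(x, z) + d(z, y).

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def d(x, y):
    return norm([a - b for a, b in zip(x, y)])

random.seed(0)
ok = True
for _ in range(1000):
    x, y, z = ([random.uniform(-1, 1) for _ in range(3)] for _ in range(3))
    ok = ok and d(x, y) <= d(x, z) + d(z, y) + 1e-12   # small float tolerance
print(ok)   # True
```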


If we now consider a normed linear space X as a metric space with metric d(x, y) = ‖x − y‖, then

we may ask whether or not it is complete, i.e., whether all Cauchy sequences in X with respect to this

Definition: A complete normed space is called a Banach space.

This is in honour of Stefan Banach (1892-1945), a distinguished Polish mathematician, considered to

be the founder of modern functional analysis.

Examples:

1. The space X = Rn. Of course, you are very familiar with the Euclidean magnitude of a vector,

i.e., an n-tuple, x = (x1, x2, · · · , xn), to characterize its length as follows,

‖x‖2 = (x1^2 + x2^2 + · · · + xn^2)^{1/2}. (31)

This is a particular example of a family of norms that can be assigned to elements of Rn, the

p-norms,

‖x‖p = [ |x1|^p + |x2|^p + · · · + |xn|^p ]^{1/p}, p ≥ 1. (32)

As we saw for the p-metrics, the limit p → ∞ is a special case:

‖x‖∞ = max_{1≤i≤n} |xi|. (33)

By virtue of the completeness property of the real numbers, the normed linear spaces (Rn, ‖ · ‖p) are complete for all p ≥ 1 as well as the case p = ∞. They are all Banach spaces.
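A minimal Python sketch of the p-norms (the helper p_norm is our own, with math.inf playing the role of p = ∞):

```python
import math

# Sketch: the p-norms (32) and the infinity norm (33) on R^n.
def p_norm(x, p):
    if p == math.inf:
        return max(abs(c) for c in x)
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

x = [3.0, -4.0]
print(p_norm(x, 1))          # 7.0
print(p_norm(x, 2))          # 5.0  (the Euclidean length of eq. (31))
print(p_norm(x, math.inf))   # 4.0
```

Note how the norms decrease as p grows, from the sum of magnitudes at p = 1 down to the largest single component at p = ∞.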

2. X = C[a, b], the space of continuous functions on the interval [a, b]. Here, we use the “infinity

norm:” For an f ∈ C[a, b],

‖f‖∞ = max_{a≤x≤b} |f(x)|. (34)

It generates the d∞ metric studied earlier: For f, g ∈ C[a, b],

d∞(f, g) = ‖f − g‖∞ = max_{a≤x≤b} |f(x) − g(x)|. (35)

3. The lp sequence spaces: For a p ≥ 1,

lp = { x = (x1, x2, · · ·) | ∑_{i=1}^∞ |xi|^p < ∞ }. (36)


We define the p-norm as follows: For an x = (x1, x2, · · ·),

‖x‖p = [ ∑_{i=1}^∞ |xi|^p ]^{1/p}. (37)

As an example, consider the sequence,

x = ( 1, 1/2, 1/3, · · · , 1/n, · · · ), (38)

x ∈ lp for p ≥ 2 but x ∉ l1. This follows from the fact that the sum ∑_{n=1}^∞ 1/n^p converges for

p = 2, 3, 4, · · · but not for p = 1.
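The two membership claims can be illustrated with partial sums (a finite computation can only suggest, not prove, convergence or divergence):

```python
# Sketch: partial sums of sum 1/n^p for the sequence (38).  For p = 2 the
# partial sums stay bounded (x lies in l^2); for p = 1 they grow without
# bound (x is not in l^1), reflecting the divergent harmonic series.

def partial_sum(p, terms):
    return sum(1.0 / n ** p for n in range(1, terms + 1))

print(partial_sum(2, 10 ** 6) < 2.0)    # True: the full sum is pi^2/6 ≈ 1.645
print(partial_sum(1, 10 ** 6) > 14.0)   # True: harmonic sum grows like ln(n)
```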

4. The space of real-valued, p-integrable functions on [a, b], denoted as Lp[a, b] and defined as

follows:

Lp[a, b] = { f : [a, b] → R | ∫_a^b |f(x)|^p dx < ∞ }, p = 1, 2, · · · . (39)

The associated p-norms are given by

‖f‖p = [ ∫_a^b |f(x)|^p dx ]^{1/p}. (40)

These norms define the dp metrics introduced in a previous lecture: For f, g ∈ Lp[a, b],

dp(f, g) = ‖f − g‖p = [ ∫_a^b |f(x) − g(x)|^p dx ]^{1/p}. (41)

The most commonly used normed linear space in this family is the case p = 2, namely, the space

L2[a, b] of square integrable functions with norm,

‖f‖2 = [ ∫_a^b |f(x)|^2 dx ]^{1/2}. (42)

(We say that a function f ∈ L2[a, b] is “square integrable” since the integration of its squared

magnitude, |f(x)|2, over the interval [a, b] is finite.) The metric associated with this norm is the

usual “L2 metric”,

d2(f, g) = ‖f − g‖2 = [ ∫_a^b |f(x) − g(x)|^2 dx ]^{1/2}. (43)

As you may recall – and as will certainly be discussed very shortly – this is the function space

that is relevant to Fourier series.


Note that for any p ≥ 1, the space Lp[a, b] includes the space of continuous functions C[a, b].

This follows from the fact that continuous functions on a closed interval are bounded: For each

f ∈ C[a, b], there exists an M ≥ 0 such that |f(x)| ≤ M for all x ∈ [a, b]. It then follows that

∫_a^b |f(x)|^p dx ≤ ∫_a^b M^p dx = M^p (b − a) < ∞. (44)

Therefore f ∈ Lp.

But the Lp spaces include a great deal more: discontinuous functions, even unbounded functions.

(The latter won’t be needed in our applications.) And the Lp spaces are not identical. For

example, consider the example [a, b] = [0, 1]. The function f(x) = x−1/2 is an element of L1[0, 1]

since

∫_0^1 |f(x)| dx = ∫_0^1 x^{−1/2} dx = 2 < ∞. (45)

But it is not an element of L2[0, 1] since

∫_0^1 |f(x)|^2 dx = ∫_0^1 x^{−1} dx diverges. (46)

Therefore, there are functions which are in L1[0, 1] but not in L2[0, 1], suggesting that L2[a, b] ⊂ L1[a, b]. This result can be generalized to Lp[a, b] ⊂ Lq[a, b] when p > q. But this is beyond the

scope of the course.
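Since the antiderivatives of x^{−1/2} and x^{−1} are elementary, the behaviour of (45) and (46) can be examined by letting the lower limit of integration shrink toward 0. A small Python sketch (our own):

```python
import math

# Sketch: f(x) = x**(-1/2) on (0, 1].  From the antiderivatives,
#   integral from eps to 1 of x^(-1/2) dx = 2 - 2*sqrt(eps)  -> 2   (f in L^1)
#   integral from eps to 1 of x^(-1)   dx = -ln(eps)  -> infinity   (f not in L^2)

for eps in (1e-2, 1e-4, 1e-8):
    l1_part = 2.0 - 2.0 * math.sqrt(eps)
    l2_part = -math.log(eps)    # integral of |f|^2 = x^(-1) over [eps, 1]
    print(eps, l1_part, l2_part)
```

The L1 integrals settle down near 2 while the L2 integrals keep growing, which is the divergence claimed in (46).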

Note: The Lp spaces may easily be extended to include complex-valued functions f : [a, b] → C.

The definition in (39) holds, with “R” simply replaced by “C”.

5. The space of real/complex-valued p-integrable functions on the real line R, i.e.

Lp(R) = { f : R → R (or C) | ∫_{−∞}^∞ |f(x)|^p dx < ∞ }, p = 1, 2, · · · , (47)

with associated p-norms,

‖f‖p = [ ∫_{−∞}^∞ |f(x)|^p dx ]^{1/p}. (48)

It will be necessary to consider functions defined on the entire real line in this course. From your

previous encounters with improper integrals, you will not be surprised to know that functions


in Lp(R) must satisfy a rather stringent condition – that

f(x) → 0 as |x| → ∞. (49)

It’s actually a little more complicated than this – the rate of decay will depend on p.

“Best approximation” in normed linear spaces

We now come to an extremely important concept of this course. You are, of course, very familiar with

finite-dimensional normed linear spaces, such as X = Rn. In such cases, which we shall denote as

“dim(X) = n”, suppose that we have a set of n linearly independent elements ui ∈ X, 1 ≤ i ≤ n, i.e.,

c1u1 + c2u2 + · · ·+ cnun = 0 if and only if c1 = c2 = · · · = cn = 0. (50)

As you know, the {ui} form a basis for X: Given an element v ∈ X, there exists a set of coefficients

ci ∈ R, 1 ≤ i ≤ n, such that

v = c1u1 + c2u2 + · · ·+ cnun. (51)

This course is concerned with the approximation of functions which, as stated in a previous lecture,

are elements of infinite-dimensional spaces, i.e., dim(X) = ∞. In such cases, we have to be a little

more careful. Suppose that we have a set of linearly independent elements ui ∈ X, 1 ≤ i ≤ n. Define

Sn to be the span of the ui, i.e.,

Sn = span {u1, u2, · · · , un}

= {x ∈ X | x = c1u1 + c2u2 + · · ·+ cnun for some (c1, c2, · · · , cn) ∈ Rn}. (52)

Sn is an n-dimensional subspace of the infinite-dimensional space X.

Now let v be an arbitrary element of X, depicted schematically below. The best approximation

in Sn to v ∈ X is given by the element vn ∈ Sn that is closest to v in terms of the distance function

defined by the norm on X, i.e.,

d(v, x) = ‖v − x‖. (53)

We shall denote this minimum distance as

∆n = min_{x∈Sn} ‖x − v‖ = ‖vn − v‖. (54)


[Diagram: the point v ∈ X, the subspace Sn, the best approximation vn ∈ Sn, and the minimum distance ∆n = ‖vn − v‖.]

Best (finite-dimensional) approximation vn ∈ Sn to v ∈ X.

Once again, vn is the element in Sn that lies closest to v, as measured by the metric that is

defined by the norm ‖ · ‖ in X.

The above statement regarding vn, which involves the preceding equation (54), may be written

mathematically and more compactly by means of the so-called “arg” notation. We write,

vn = arg min_{x∈Sn} ‖x − v‖. (55)

This statement may be read as follows: “vn is the ‘argument’ or ‘element’ in Sn at which the mini-

mization over Sn is achieved”. In other words, “vn is the element in Sn that minimizes the distance

to v, i.e., ‖x − v‖.”

(Note: At this point we should make a parenthetical remark that, technically speaking, after the phrase “given

by the element vn ∈ Sn” should be added “(provided it exists)”. As well, the “min” in the equation should

be replaced by “inf”. There are pathological situations where a minimum value for the error is approached

in some limit by a sequence of approximations, with the actual best approximation not existing. But in most

applications, including those considered here, this will not be the case, i.e., Eq. (54) will hold. And in the case

of Hilbert spaces, i.e., complete inner product spaces – to be discussed next – such a “minimizer” always exists,

and it is unique.)

Since the approximation vn lies in Sn, it will have an expansion in the basis set ui, i.e.,

vn = c1u1 + c2u2 + · · · + cnun. (56)

We then write that

v ≈ vn = c1u1 + c2u2 + · · ·+ cnun, (57)


often adding the phrase “in the norm ‖ · ‖ or associated metric on X” (e.g., “in L2 norm,” “in d2

metric” or even “in L2 metric”). The error of this approximation or simply “approximation error” is

∆n = ‖v − vn‖ . (58)

As we increase n, i.e., the number of basis elements ui employed, the approximation cannot get

worse, i.e., the error ∆n cannot increase. (If some of the basis elements uk are not needed, i.e., their

associated coefficients are zero, then the approximation error will simply remain the same.) Hopefully, the

∆n will decrease as n increases. And, indeed, we would hope that ∆n → 0 as n → ∞. Of course, if,

for some n, v ∈ Sn, then ∆n = 0, i.e., v = vn, admitting the expansion in (57).

Note: In the discussion above, there has been no mention of orthogonality. This is because normed

linear spaces, in general, are not necessarily inner product spaces that are equipped with an inner

product, from which comes the property of orthogonality. The special case of best approximation in

inner product spaces will be discussed in the next section of this course.

We now consider some examples of best approximations in normed linear spaces.

1. Example No. 1: Let X = C[a, b], the normed linear space of continuous functions on [a, b].

The functions uk(x) = x^{k−1}, k = 1, 2, · · ·, i.e., {1, x, x^2, · · ·}, form a linearly independent set,

hence a basis in C[a, b]. In this case, given a function f ∈ C[a, b], the best approximation in Sn

would be the polynomial approximation of degree n − 1 to f having the form,

f(x) ≅ vn = c0 + c1x + · · · + cn−1x^{n−1}. (59)

If we use the ‖ · ‖∞ norm on this space, i.e.,

‖f‖∞ = max_{x∈[a,b]} |f(x)|, (60)

the best approximation vn ∈ Sn is obtained by solving the following optimization problem,

min_{c0,···,cn−1} ‖f − vn‖∞ = min_{c0,···,cn−1} max_{x∈[a,b]} |f(x) − c0 − c1x − · · · − cn−1x^{n−1}|. (61)

From a practical point of view, this is a very complicated optimization problem.

(a) The special case n = 1. Here, we use only one function from the basis set, i.e., u_1(x) = 1. The best approximation v_1 ∈ S_1 is the best constant approximation to f(x) in the d∞ metric, i.e., f(x) ≃ c, where c is obtained by minimizing the distance function,

∆∞(c) = d∞(f, c) = ‖f − c‖∞ = max_{x∈[a,b]} |f(x) − c| . (62)

If a formula for f(x) is available, we may be able to remove the absolute value by considering the various intervals over which f(x) − c is positive and negative. In Problem Set No. 1, you are asked to find the best constant approximation to f(x) = x² on [0, 1] using the above distance function.
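As a quick numerical check of this special case (a Python/NumPy sketch, not part of the original notes): for a continuous f, the minimizer of (62) is the midpoint c = (min f + max f)/2, so for f(x) = x² on [0, 1] the best constant in the d∞ metric is c = 1/2, with error 1/2. A brute-force grid search agrees:

```python
import numpy as np

# Best constant approximation to f(x) = x^2 on [0, 1] in the sup (d_inf) metric.
# For a continuous f, the minimizer of max|f(x) - c| is the midpoint
# c* = (min f + max f)/2, with error (max f - min f)/2.  We verify this
# against a brute-force grid search over candidate constants c.
x = np.linspace(0.0, 1.0, 1001)
f = x**2

midpoint = 0.5 * (f.min() + f.max())                    # analytic candidate: 1/2

cs = np.linspace(-1.0, 2.0, 3001)                       # candidate constants
errors = np.abs(f[None, :] - cs[:, None]).max(axis=1)   # Delta_inf(c) on the grid
c_star = cs[np.argmin(errors)]

print(midpoint, c_star, errors.min())   # 0.5  0.5  0.5
```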

2. Example No. 2: Let X = L1[a, b], the normed linear space of functions on [a, b] satisfying the condition

‖f‖_1 = ∫_a^b |f(x)| dx < ∞ . (63)

Recall that the metric associated with this norm is given by

d_1(f, g) = ‖f − g‖_1 = ∫_a^b |f(x) − g(x)| dx , f, g ∈ L1[a, b] . (64)

Recall that the space of continuous functions C[a, b] is a subset of this space. The functions u_k(x) = x^{k−1}, k = 1, 2, · · ·, also form a basis in this space. Given a function f ∈ L1[0, 1], its best approximation in S_n is given by the solution to the following optimization problem,

min_{c_0,···,c_{n−1}} ‖f − v_n‖_1 = min_{c_0,···,c_{n−1}} ∫_a^b |f(x) − c_0 − c_1x − · · · − c_{n−1}x^{n−1}| dx . (65)

This problem actually turns out to be a bit more tractable – it can be solved by the method of linear programming, since the optimization problem is linear in the unknowns c_i.

(a) The special case n = 1. Here, we use only one function from the basis set, i.e., u_1(x) = 1. The best approximation v_1 ∈ S_1 is the best constant approximation to f(x) in the L1 metric, i.e., f(x) ≃ c, where we minimize the distance function,

∆_1(c) = d_1(f, c) = ‖f − c‖_1 = ∫_a^b |f(x) − c| dx , (66)

where we have replaced c_0 with c. Because the integrand contains an absolute value, we cannot directly use differentiation methods to find the c-value which minimizes ∆_1(c). If a formula for f(x) is available and not too complicated, we may be able to evaluate the integral by splitting it over the intervals on which f(x) − c is positive and negative. In Problem Set No. 1, you are asked to find the best constant approximation to f(x) = x² on [0, 1] using the above distance function.
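The L1 case can likewise be explored numerically (a Python/NumPy sketch, not part of the original notes; the grid sizes are arbitrary). The minimizer of (66) is a “median” of the values of f: for f(x) = x² on [0, 1], measure{x : x² ≤ c} = √c = 1/2 gives c = 1/4, and a discretized grid search is consistent with this:

```python
import numpy as np

# Best constant approximation to f(x) = x^2 on [0, 1] in the L1 metric,
# by minimizing a discretized Delta_1(c) = int_0^1 |f(x) - c| dx.
# (Since the interval has length 1, the mean over a fine grid approximates
# the integral.)  The minimizer is a median of the values of f; here
# measure{x : x^2 <= c} = sqrt(c) = 1/2 gives c = 1/4.
x = np.linspace(0.0, 1.0, 2001)
f = x**2

cs = np.linspace(0.0, 1.0, 1001)                        # candidate constants
errors = np.abs(f[None, :] - cs[:, None]).mean(axis=1)  # ~ Delta_1(c)
c_star = cs[np.argmin(errors)]

print(c_star)   # close to 0.25
```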


3. Example No. 3: Let X = L2[a, b], the normed linear space of functions on [a, b] satisfying the condition

‖f‖_2² = ∫_a^b [f(x)]² dx < ∞ . (67)

Recall that the metric associated with this norm is given by

d_2(f, g) = ‖f − g‖_2 = [∫_a^b [f(x) − g(x)]² dx]^{1/2} , f, g ∈ L2[a, b] . (68)

Once again, the space of continuous functions C[a, b] is a subset of this space. The functions u_k(x) = x^{k−1}, k = 1, 2, · · ·, also form a basis in this space. Given a function f ∈ L2[0, 1], its best approximation in S_n is given by the solution to the following optimization problem,

min_{c_0,···,c_{n−1}} ‖f − v_n‖_2² = min_{c_0,···,c_{n−1}} ∫_a^b [f(x) − c_0 − c_1x − · · · − c_{n−1}x^{n−1}]² dx . (69)

Note that we have chosen to minimize the squared L2 distance, since this avoids square roots. (Minimizing the square of a non-negative function is equivalent to minimizing the function itself.) This problem can, in principle, be solved analytically by finding the stationary points of the squared distance function,

∆_2²(c_0, c_1, · · · , c_{n−1}) = ∫_a^b [f(x) − c_0 − c_1x − · · · − c_{n−1}x^{n−1}]² dx . (70)

This is one of the reasons that the metric associated with the space L2 is used in most signal and image processing applications. We'll illustrate with two special cases.

(a) The special case n = 1. Here again, we use only one function from the basis set, i.e., u_1(x) = 1. The best approximation v_1 ∈ S_1 is once again the best constant approximation to f(x), but this time with respect to the L2 metric, i.e., f(x) ≃ c, where we minimize the squared L2 distance function,

h(c) = ∆_2²(c) = [d_2(f, c)]² = ‖f − c‖_2² = ∫_a^b [f(x) − c]² dx . (71)

There are two ways to solve this minimization problem.

• Method No. 1: Expand the integrand and integrate to produce an expression for ∆_2²(c) in terms of c:

h(c) = ∆_2²(c) = ∫_a^b [f(x)]² dx − 2c ∫_a^b f(x) dx + c² ∫_a^b dx
= ∫_a^b [f(x)]² dx − 2c ∫_a^b f(x) dx + c²(b − a) . (72)

Find the critical points:

h′(c) = −2 ∫_a^b f(x) dx + 2c(b − a) . (73)

Setting h′(c) = 0 yields the result,

c = (1/(b − a)) ∫_a^b f(x) dx . (74)

Note that h′′(c) = 2(b − a) > 0, so the critical point is a global minimum.

In summary, the best constant approximation to a function f(x) using the L2 metric on [a, b] is given by

f(x) ≃ (1/(b − a)) ∫_a^b f(x) dx . (75)

Note that this constant is the mean or average value of f(x) over [a, b], often denoted as f̄_{[a,b]}. This is a very well known result in signal and image processing.

• Method No. 2: Use Leibniz' Rule to differentiate the integral in (71) directly,

h′(c) = −2 ∫_a^b [f(x) − c] dx = 0 , (76)

which yields the same result for c.

In Problem Set No. 1, you are asked to provide the best piecewise-constant approximation to the function f(x) = x² on [0, 1] using the above method.
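A numerical sanity check of this result (a Python/NumPy sketch, not part of the original notes): for f(x) = x² on [0, 1] the mean value is 1/3, and it beats nearby constants in the discretized squared error:

```python
import numpy as np

# Best constant approximation in the L2 metric: by (75), c* is the mean
# value of f over [a, b].  Check for f(x) = x^2 on [0, 1], whose mean is 1/3,
# and confirm that the mean beats nearby constants in the discretized
# mean-squared error.
x = np.linspace(0.0, 1.0, 200001)
f = x**2

c_mean = f.mean()                       # ~ (1/(1-0)) * int_0^1 x^2 dx = 1/3

for c in (c_mean - 0.01, c_mean + 0.01):
    assert ((f - c_mean)**2).mean() < ((f - c)**2).mean()
print(c_mean)   # ~ 0.3333
```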

(b) The special case n = 2. We now use two functions from the basis set, i.e., u_1(x) = 1 and u_2(x) = x, to produce the best approximation, v_2(x), to f(x) having the form,

f(x) ≃ c_0 + c_1 x . (77)

We must now minimize the following squared L2 distance function,

h(c_0, c_1) = ∆_2²(c_0, c_1) = ‖f − c_0 − c_1 x‖_2² = ∫_a^b [f(x) − c_0 − c_1 x]² dx , (78)

which is now a function of two variables, c_0 and c_1. Critical points (c_0, c_1) must satisfy the following stationarity conditions,

∂h/∂c_0 = −2 ∫_a^b [f(x) − c_0 − c_1 x] dx = 0 ,
∂h/∂c_1 = −2 ∫_a^b [f(x) − c_0 − c_1 x] x dx = 0 . (79)

These conditions yield the following pair of inhomogeneous linear equations in the unknowns c_0 and c_1,

[∫_a^b dx] c_0 + [∫_a^b x dx] c_1 = ∫_a^b f(x) dx ,
[∫_a^b x dx] c_0 + [∫_a^b x² dx] c_1 = ∫_a^b x f(x) dx . (80)

The integrals on the LHS are easily evaluated in terms of the endpoints a and b. Assuming that the integrals on the RHS involving f(x) can be evaluated, the system is easily solved using Cramer's Rule.

This system of equations may be viewed as a continuous version of the classical method of least-squares best approximation of data points (x_i, f(x_i)) by the straight line y = c_0 + c_1 x.
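As a concrete instance of the system (80) (a Python/NumPy sketch, not part of the original notes): for f(x) = x² on [0, 1] the moment integrals can be written down exactly, and the resulting 2 × 2 system gives the best line y = x − 1/6:

```python
import numpy as np

# The normal equations (80) for the continuous least-squares line fit to
# f(x) = x^2 on [a, b] = [0, 1].  The LHS moment integrals are
#   int 1 dx = 1, int x dx = 1/2, int x^2 dx = 1/3,
# and the RHS integrals are int x^2 dx = 1/3 and int x^3 dx = 1/4.
A = np.array([[1.0,     1.0/2.0],
              [1.0/2.0, 1.0/3.0]])
rhs = np.array([1.0/3.0, 1.0/4.0])

c0, c1 = np.linalg.solve(A, rhs)        # exact solution: c0 = -1/6, c1 = 1
print(c0, c1)                           # best line: y = x - 1/6
```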

4. Example No. 4: We return to the approximations that are yielded by partial sums of the Fourier series of a function f(x) defined on the interval [−π, π], i.e.,

f(x) ≃ S_N(x) = a_0 + ∑_{n=1}^N [a_n cos nx + b_n sin nx] , (81)

where the coefficients a_n and b_n are given in Eq. (2) of Lecture 2. The relevant normed linear space is L2[−π, π], which is also an inner product space. The approximation S_N(x) is an element of a (2N + 1)-dimensional space that is spanned by the basis functions,

1, cos x, sin x, · · · , cos Nx, sin Nx . (82)

Here, we simply state that S_N(x) is the best approximation to f(x) in this space and that the coefficients a_n and b_n are obtained from the inner product defined on this space. We shall justify these statements in the next section, where we discuss approximations in an inner product, or “Hilbert,” space.

Once again, we mention that in the discussion preceding the above examples, nothing was said about the “orthogonality” of the basis set {u_n}. That is because nothing could be said! We have been working with normed linear spaces, which are not necessarily inner product spaces. Only in inner product spaces, the subject of the next section, can we have the property of orthogonality.

Inner product spaces

Of course, you are familiar with the idea of inner product spaces – at least finite-dimensional ones. Let X be an abstract vector space with an inner product, denoted “〈· , ·〉”, a mapping from X × X to R (or perhaps C). The inner product satisfies the following conditions:

1. 〈x+ y, z〉 = 〈x, z〉+ 〈y, z〉, ∀x, y, z ∈ X ,

2. 〈αx, y〉 = α〈x, y〉, ∀x, y ∈ X, α ∈ R,

3. 〈x, y〉 = 〈y, x〉, ∀x, y ∈ X,

4. 〈x, x〉 ≥ 0, ∀x ∈ X, 〈x, x〉 = 0 if and only if x = 0.

We then say that (X, 〈 , 〉) is an inner product space.

In the case that the field of scalars is C, Property 3 above becomes

3. 〈x, y〉 = \overline{〈y, x〉}, ∀x, y ∈ X,

where the bar indicates complex conjugation. Note that, together with Property 2, this implies that 〈x, αy〉 = ᾱ〈x, y〉, ∀x, y ∈ X, α ∈ C.

Note: For anyone who has taken courses in Mathematical Physics, the above properties may look slightly different from what you have seen, as regards the complex conjugation. In Physics, the usual convention is to complex-conjugate the first entry, i.e., 〈αx, y〉 = ᾱ〈x, y〉.

The inner product defines a norm as follows,

〈x, x〉 = ‖x‖² or ‖x‖ = √〈x, x〉 . (83)

(Note: An inner product always generates a norm, but not the other way around, i.e., a norm is not always expressible in terms of an inner product. You may have seen this in earlier courses – the norm has to satisfy the so-called “parallelogram law.”)

And where there is a norm, there is a metric: the norm defined by the inner product 〈· , ·〉 defines the following metric,

d(x, y) = ‖x − y‖ = √〈x − y, x − y〉 , ∀x, y ∈ X. (84)

And where there is a metric, we can discuss convergence of sequences, etc.

A complete inner product space is called a Hilbert space, in honour of the celebrated mathematician David Hilbert (1862–1943).

Finally, the inner product satisfies the following relation, called the “Cauchy–Schwarz inequality,”

|〈x, y〉| ≤ ‖x‖ ‖y‖ , ∀x, y ∈ X. (85)

You probably saw this relation in your studies of finite-dimensional inner product spaces, e.g., R^n. It holds in abstract spaces as well.
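A quick numerical check of (85) (a Python/NumPy sketch, not part of the original notes), using the Euclidean inner product on R^10:

```python
import numpy as np

# Numerical sanity check of the Cauchy-Schwarz inequality (85),
# |<x, y>| <= ||x|| ||y||, for random vectors in R^10 with the Euclidean
# inner product.
rng = np.random.default_rng(0)
for _ in range(1000):
    u = rng.standard_normal(10)
    v = rng.standard_normal(10)
    assert abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v) + 1e-12
print("Cauchy-Schwarz held in all 1000 trials")
```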


Lecture 5

Inner product spaces (cont’d)

Examples:

1. X = R^n. Here, for x = (x_1, x_2, · · · , x_n) and y = (y_1, y_2, · · · , y_n),

〈x, y〉 = x_1y_1 + x_2y_2 + · · · + x_ny_n . (86)

The norm induced by the inner product is the familiar Euclidean norm, i.e.,

‖x‖ = ‖x‖_2 = [∑_{i=1}^n x_i²]^{1/2} . (87)

And associated with this norm is the Euclidean metric, i.e.,

d(x, y) = ‖x − y‖ = [∑_{i=1}^n (x_i − y_i)²]^{1/2} . (88)

You'll note that the inner product generates only the p = 2 norm and metric. R^n is also complete, i.e., it is a Hilbert space.

2. X = C^n – a minor modification of the real vector space case. Here, for x = (x_1, x_2, · · · , x_n) and y = (y_1, y_2, · · · , y_n),

〈x, y〉 = x_1ȳ_1 + x_2ȳ_2 + · · · + x_nȳ_n , (89)

where the bars denote complex conjugation. The norm induced by the inner product will be

‖x‖ = ‖x‖_2 = [∑_{i=1}^n |x_i|²]^{1/2} , (90)

and the associated metric is

d(x, y) = ‖x − y‖ = [∑_{i=1}^n |x_i − y_i|²]^{1/2} . (91)

C^n is complete, therefore a Hilbert space.

3. The sequence space l2 introduced earlier: here, x = (x_1, x_2, · · ·) ∈ l2 implies that ∑_{i=1}^∞ x_i² < ∞. For x, y ∈ l2, the inner product is defined as

〈x, y〉 = ∑_{i=1}^∞ x_i y_i . (92)

Note that l2 is the only lp space for which an inner product exists. It is a Hilbert space.


4. X = C[a, b], the space of continuous functions on [a, b] with the ‖·‖∞ norm, is NOT an inner product space: the sup norm fails the parallelogram law.

5. The space of (real- or complex-valued) square-integrable functions L2[a, b] introduced earlier. Here,

‖f‖² = 〈f, f〉 = ∫_a^b |f(x)|² dx < ∞. (93)

The inner product in this space is given by

〈f, g〉 = ∫_a^b f(x)ḡ(x) dx , (94)

where the bar allows for complex-valued functions (and may be ignored in the real-valued case).

As in the case of sequence spaces, L2 is the only Lp space for which an inner product exists. It is also a Hilbert space.

6. The space of (real- or complex-valued) square-integrable functions L2(R) on the real line, also introduced earlier. Here,

‖f‖² = 〈f, f〉 = ∫_{−∞}^{∞} |f(x)|² dx < ∞. (95)

The inner product in this space is given by

〈f, g〉 = ∫_{−∞}^{∞} f(x)ḡ(x) dx . (96)

Once again, L2 is the only Lp space for which an inner product exists. It is also a Hilbert space, and it will be the primary space in which we work for the remainder of this course.

Orthogonality in inner product spaces

An important property of inner product spaces is “orthogonality.” Let X be an inner product space. If 〈x, y〉 = 0 for two elements x, y ∈ X, then x and y are said to be orthogonal (to each other). Mathematically, this relation is written as “x ⊥ y,” just as is done for vectors in R^n.

We're now going to need a few ideas and definitions for the discussion that is coming up.

• Recall that a subspace Y of a vector space X is a nonempty subset Y ⊂ X such that for all y_1, y_2 ∈ Y and all scalars c_1, c_2, the element c_1y_1 + c_2y_2 ∈ Y, i.e., Y is itself a vector space. This implies that Y must contain the zero element, y = 0.


• Moreover, any subspace Y is convex: for every x, y ∈ Y, the “segment” joining x and y, i.e., the set of all convex combinations,

z = αx + (1 − α)y , 0 ≤ α ≤ 1, (97)

is contained in Y.

• A vector space X is said to be the direct sum of two subspaces Y and Z, written as follows,

X = Y ⊕ Z, (98)

if each x ∈ X has a unique representation of the form

x = y + z, y ∈ Y, z ∈ Z. (99)

The sets Y and Z are said to be algebraic complements of each other.

Note: The concept of an algebraic complement does not have to invoke the use of orthogonality. (Unfortunately, it was used in the lecture.)

• In R^n, and in inner product spaces X in general, it is convenient to consider spaces that are orthogonal to each other. Let S ⊂ X be a subset of X and define

S⊥ = {x ∈ X | x ⊥ S} = {x ∈ X | 〈x, y〉 = 0 ∀y ∈ S}. (100)

The set S⊥ is said to be the orthogonal complement of S.

Aside: Some remarks regarding the idea of a “complement”

This section was not covered in the lecture. It is included here only for supplementary purposes. You will not be examined on this material.

The concept of an algebraic complement does not have to invoke the use of orthogonality. With thanks to a student who once raised the question of algebraic vs. orthogonal complementarity in class, let us consider the following example.

Let X denote the (11-dimensional) vector space of polynomials of degree at most 10, i.e.,

X = {∑_{k=0}^{10} c_k x^k , c_k ∈ R, 0 ≤ k ≤ 10}. (101)

Equivalently,

X = span {1, x, x², · · · , x^{10}}. (102)

Now define

Y = span {1, x, x², x³, x⁴, x⁵} , Z = span {x⁶, x⁷, x⁸, x⁹, x^{10}}. (103)

First of all, Y and Z are subspaces of X. Furthermore, X is the direct sum of the subspaces Y and Z. However, the spaces Y and Z are not orthogonal complements of each other.

For the notion of orthogonal complementarity, we would first have to define an interval of support, e.g., [0, 1], over which the inner products of the functions are defined. (And then we would have to make sure that all inner products involving these functions are defined.) Using the linearly independent functions x^k, 0 ≤ k ≤ 10, one can then construct (via Gram–Schmidt orthogonalization) an orthogonal set of polynomial basis functions {φ_k(x)}, 0 ≤ k ≤ 10, over X. It is possible that the first few members of this orthogonal set will be built from the functions x^k, 0 ≤ k ≤ 5, which come from the set Y. But the remaining members of the orthogonal set will contain higher powers of x, i.e., x^k, 6 ≤ k ≤ 10, as well as lower powers of x, i.e., x^k, 0 ≤ k ≤ 5. In other words, the remaining members of the orthogonal set will not be elements of the set Z alone – they will have nonzero components in Y as well. See also Example 3 below.


Examples:

1. Let X be the Hilbert space R^3 and S ⊂ X defined as S = span{(1, 0, 0)}. Then S⊥ = span{(0, 1, 0), (0, 0, 1)}. In this case, both S and S⊥ are subspaces.

2. As before, X = R^3, but with S = {(c, 0, 0) | c ∈ [0, 1]}. Now S is no longer a subspace but simply a subset of X. Nevertheless, S⊥ is the same set as in 1. above, i.e., S⊥ = span{(0, 1, 0), (0, 0, 1)}: we have to include all elements of X that are orthogonal to the elements of S. That being said, we shall normally be working more along the lines of Example 1, i.e., subspaces and their orthogonal complements.

3. Further to the discussion of algebraic vs. orthogonal complementarity, consider the same spaces X, Y and Z as defined in Eqs. (102) and (103), but defined over the interval [−1, 1]. The orthogonal polynomials φ_k(x) over [−1, 1] that may be constructed from the functions x^k, 0 ≤ k ≤ 10, are the so-called Legendre polynomials, P_n(x), listed below:

n    P_n(x)
0    1
1    x
2    (1/2)(3x² − 1)
3    (1/2)(5x³ − 3x)
4    (1/8)(35x⁴ − 30x² + 3)
5    (1/8)(63x⁵ − 70x³ + 15x)
6    (1/16)(231x⁶ − 315x⁴ + 105x² − 5)
7    (1/16)(429x⁷ − 693x⁵ + 315x³ − 35x)
8    (1/128)(6435x⁸ − 12012x⁶ + 6930x⁴ − 1260x² + 35)
9    (1/128)(12155x⁹ − 25740x⁷ + 18018x⁵ − 4620x³ + 315x)
10   (1/256)(46189x^{10} − 109395x⁸ + 90090x⁶ − 30030x⁴ + 3465x² − 63)     (104)

These polynomials satisfy the following orthogonality property,

∫_{−1}^{1} P_m(x) P_n(x) dx = (2/(2n + 1)) δ_{mn} , (105)

where δ_{mn} is the Kronecker delta,

δ_{mn} = 1 if m = n, and δ_{mn} = 0 if m ≠ n. (106)
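The orthogonality relation (105) is easy to verify numerically (a Python/NumPy sketch, not part of the original notes; `numpy.polynomial.legendre` supplies Legendre evaluation and Gauss–Legendre quadrature):

```python
import numpy as np
from numpy.polynomial import legendre

# Verify the orthogonality relation (105) for the Legendre polynomials:
# int_{-1}^{1} P_m(x) P_n(x) dx = 2/(2n+1) * delta_{mn}.  A 21-point
# Gauss-Legendre rule integrates polynomials up to degree 41 exactly,
# which covers every product P_m P_n with m, n <= 10.
nodes, weights = legendre.leggauss(21)

def P(n, x):
    """Evaluate the degree-n Legendre polynomial at x."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return legendre.legval(x, c)

for m in range(11):
    for n in range(11):
        integral = np.sum(weights * P(m, nodes) * P(n, nodes))
        expected = 2.0 / (2 * n + 1) if m == n else 0.0
        assert abs(integral - expected) < 1e-10
print("orthogonality (105) verified for 0 <= m, n <= 10")
```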

From the above table, we see that the Legendre polynomials P_n(x), 0 ≤ n ≤ 5, belong to the space Y, whereas the polynomials P_n, 6 ≤ n ≤ 10, do not belong solely to Z. This shows that the spaces Y and Z are not orthogonal complements. However, the following spaces are orthogonal complements:

Y′ = span{P_0, P_1, P_2, P_3, P_4, P_5} , Z′ = span{P_6, P_7, P_8, P_9, P_10}. (107)

(Actually, Y′ is identical to the Y defined earlier.)

There is, however, another decomposition going on in this space, which is made possible by the fact that the interval [−1, 1] is symmetric with respect to the point x = 0. Note that the polynomials P_n(x) are either even or odd. This suggests that we should consider the following subsets of X,

Y″ = {u ∈ X | u is an even function} , Z″ = {u ∈ X | u is an odd function}. (108)

It is a quite simple exercise to show that any function f(x) defined on an interval [−a, a] may be written as the sum of an even function and an odd function. This implies that any element u ∈ X may be expressed in the form

u = v + w, v ∈ Y″, w ∈ Z″. (109)

Therefore the spaces Y″ and Z″ are algebraic complements. In terms of the inner product of functions over [−1, 1], Y″ and Z″ are also orthogonal complements, since

∫_{−1}^{1} f(x)g(x) dx = 0, (110)

if f and g have different parity.
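The even/odd decomposition and the orthogonality relation (110) can be checked numerically (a Python/NumPy sketch, not part of the original notes; the test function is arbitrary):

```python
import numpy as np

# Split a function on [-1, 1] into its even and odd parts,
# f = f_even + f_odd, and check numerically (Gauss-Legendre quadrature)
# that the two parts are orthogonal in the L2 inner product (110).
nodes, weights = np.polynomial.legendre.leggauss(40)

def f(x):
    return np.exp(x) + x**3          # an arbitrary test function

def f_even(x):
    return 0.5 * (f(x) + f(-x))

def f_odd(x):
    return 0.5 * (f(x) - f(-x))

# pointwise decomposition f = f_even + f_odd
assert np.allclose(f(nodes), f_even(nodes) + f_odd(nodes))

# <f_even, f_odd> over [-1, 1] vanishes (an odd integrand)
inner = np.sum(weights * f_even(nodes) * f_odd(nodes))
print(inner)
```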


The discussion that follows will be centered around Hilbert spaces, i.e., complete inner product spaces. This is because we shall need the closure properties of these spaces, i.e., that they contain the limits of all Cauchy sequences. The following result is very important.

The “Projection Theorem” for Hilbert spaces

Theorem: Let H be a Hilbert space and Y ⊂ H any closed subspace of H. (Note: This means that Y contains its limit points. In the case of finite-dimensional spaces, e.g., R^n, every subspace is closed, but a subspace of an infinite-dimensional vector space need not be closed.) Now let Z = Y⊥. Then for any x ∈ H, there is a unique decomposition of the form

x = y + z, y ∈ Y, z ∈ Z = Y⊥. (111)

The point y is called the (orthogonal) projection of x on Y.

This is an extremely important result from analysis, and equally important in applications. We'll examine its implications a little later, in terms of “best approximations” in a Hilbert space. In the figure below, we provide a sketch that will hopefully illustrate the situation.

[Figure: a sketch of the Hilbert space H, showing the subspace Y through 0, a point x ∈ H, and its orthogonal projection y ∈ Y.]

The space Y is contained between the two lines that emanate from 0. Note that Y lies on both sides of 0: if p ∈ Y, then −p, which lies on the “other side” of 0, is also an element of Y. The point y ∈ Y is the orthogonal projection of x onto the set Y, and may be viewed as the point at which the line dropped from x meets Y “perpendicularly.” The examples that we consider should clarify this idea.


From Eq. (111), we can define a mapping P_Y : H → Y, the projection of H onto Y, so that

P_Y : x ↦ y. (112)

Note that

P_Y : H → Y, P_Y : Y → Y, P_Y : Y⊥ → {0}. (113)

Furthermore, P_Y is an idempotent operator, i.e.,

P_Y² = P_Y . (114)

This follows from the fact that P_Y(x) = y and P_Y(y) = y, implying that P_Y(P_Y(x)) = P_Y(x).

Finally, note that

‖x‖² = ‖y‖² + ‖z‖². (115)

This follows from the fact that the norm is defined by means of the inner product:

‖x‖² = 〈x, x〉
= 〈y + z, y + z〉
= 〈y, y〉 + 〈y, z〉 + 〈z, y〉 + 〈z, z〉
= ‖y‖² + ‖z‖², (116)

where the final line results from the fact that 〈y, z〉 = 〈z, y〉 = 0, since y ∈ Y and z ∈ Z = Y⊥.

Example: Let H = R^3, Y = span{(1, 0, 0)} and Y⊥ = span{(0, 1, 0), (0, 0, 1)}. Then x = (1, 2, 3) admits the unique decomposition,

(1, 2, 3) = (1, 0, 0) + (0, 2, 3), (117)

where y = (1, 0, 0) ∈ Y and z = (0, 2, 3) ∈ Y⊥. Here y is the unique projection of x on Y.
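This example can be written out in a few lines of code (a Python/NumPy sketch, not part of the original notes), which also checks the idempotence (114) and the Pythagorean relation (115):

```python
import numpy as np

# The decomposition (117): project x = (1, 2, 3) onto Y = span{(1, 0, 0)}
# in R^3, and check idempotence (114) and the Pythagorean relation (115).
e1 = np.array([1.0, 0.0, 0.0])
P_Y = np.outer(e1, e1)                   # matrix of the projection onto Y

x = np.array([1.0, 2.0, 3.0])
y = P_Y @ x                              # (1, 0, 0), the projection
z = x - y                                # (0, 2, 3), lies in Y-perp

assert np.allclose(P_Y @ P_Y, P_Y)       # P_Y^2 = P_Y
assert np.isclose(y @ z, 0.0)            # y orthogonal to z
assert np.isclose(x @ x, y @ y + z @ z)  # ||x||^2 = ||y||^2 + ||z||^2
print(y, z)
```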

Orthogonal/orthonormal sets of a Hilbert space

Let H denote a Hilbert space. A set {u_1, u_2, · · · , u_n} ⊂ H is said to form an orthogonal set in H if

〈u_i, u_j〉 = 0 for i ≠ j. (118)

If, in addition,

〈u_i, u_i〉 = ‖u_i‖² = 1, 1 ≤ i ≤ n, (119)

then the {u_i} are said to form an orthonormal set in H.

You will not be surprised by the following result, since you have most probably seen it in earlier courses in linear algebra.

Theorem: An orthogonal set {u_1, u_2, · · ·} not containing the zero element is linearly independent.

Proof: Assume that there are scalars c_1, c_2, · · · , c_n such that

c_1u_1 + c_2u_2 + · · · + c_nu_n = 0. (120)

For each k = 1, 2, · · · , n, form the inner product of both sides of the above equation with u_k, i.e.,

〈u_k, c_1u_1 + c_2u_2 + · · · + c_nu_n〉 = 〈u_k, 0〉 = 0. (121)

By the orthogonality of the u_i, the LHS of the above equation reduces to c_k‖u_k‖², implying that

c_k‖u_k‖² = 0, k = 1, 2, · · · , n. (122)

By assumption, however, u_k ≠ 0, implying that ‖u_k‖² ≠ 0. This implies that all scalars c_k are zero, which means that the set {u_1, u_2, · · · , u_n} is linearly independent.

Note: As you have also seen in courses in linear algebra, given a linearly independent set {v_1, v_2, · · · , v_n}, we can always construct an orthonormal set {e_1, e_2, · · · , e_n} via the Gram–Schmidt orthogonalization procedure. Moreover,

span{v_1, v_2, · · · , v_n} = span{e_1, e_2, · · · , e_n}. (123)

More on this later.
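A minimal sketch of the procedure (Python/NumPy, not part of the original notes; the input vectors are arbitrary linearly independent vectors in R³):

```python
import numpy as np

# A minimal classical Gram-Schmidt sketch: turn a linearly independent
# list of vectors into an orthonormal list with the same span.
def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for e in basis:
            w = w - (w @ e) * e          # subtract the projection onto e
        basis.append(w / np.linalg.norm(w))
    return basis

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
es = gram_schmidt(vs)

# Gram matrix of the output should be the identity (orthonormality)
G = np.array([[ei @ ej for ej in es] for ei in es])
assert np.allclose(G, np.eye(3))
print(np.round(G, 12))
```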

We have now arrived at the most important result of this section.


Best approximation in Hilbert spaces

Recall the idea of the best approximation in normed linear spaces, discussed a couple of lectures ago. Let X be an infinite-dimensional normed linear space. Furthermore, let u_i ∈ X, 1 ≤ i ≤ n, be a set of n linearly independent elements of X, and define the n-dimensional subspace,

S_n = span{u_1, u_2, · · · , u_n}. (124)

Then let x be an arbitrary element of X. We wish to find the best approximation to x in the subspace S_n. It will be given by the element y_n ∈ S_n that lies closest to x, i.e.,

y_n = arg min_{v∈S_n} ‖x − v‖. (125)

(The variables used above may be different from those used in the earlier lecture.)

We are going to use the same idea of best approximation, but in a Hilbert space setting, where we have the additional property that an inner product exists in our space. This, of course, opens the door to the idea of orthogonality, which will play an important role.

The best approximation in Hilbert spaces may be phrased as follows:

Theorem: Let {e_1, e_2, · · · , e_n} be an orthonormal set in a Hilbert space H, and define Y = span{e_i}_{i=1}^n, a subspace of H. Then for any x ∈ H, the best approximation of x in Y is given by the unique element

y = P_Y(x) = ∑_{k=1}^n c_k e_k (the projection of x onto Y), (126)

where

c_k = 〈x, e_k〉, k = 1, 2, · · · , n. (127)

The c_k are called the Fourier coefficients of x w.r.t. the set {e_k}. Furthermore,

∑_{k=1}^n |c_k|² ≤ ‖x‖². (128)

Proof: Any element v ∈ Y may be written in the form

v = ∑_{k=1}^n c_k e_k. (129)


The best approximation to x in Y is the point y ∈ Y that minimizes the distance ‖x − v‖ over v ∈ Y, i.e.,

y = arg min_{v∈Y} ‖x − v‖. (130)

In other words, we must find scalars c_1, c_2, · · · , c_n such that the distance,

f(c_1, c_2, · · · , c_n) = ‖x − ∑_{k=1}^n c_k e_k‖ , (131)

is minimized. Here, f : R^n (or C^n) → R. It is easier to consider the non-negative squared distance function,

g(c_1, c_2, · · · , c_n) = ‖x − ∑_{k=1}^n c_k e_k‖² = 〈x − ∑_{k=1}^n c_k e_k, x − ∑_{l=1}^n c_l e_l〉 ≥ 0. (132)

Minimizing g is equivalent to minimizing f.

Here, we consider the real-scalar case, i.e., c_i ∈ R, which is somewhat simpler than the complex case. (The latter, which of course includes the former, is proved in an Appendix at the end of this day's lecture.) Using the linearity properties of the inner product, we can first expand the right-hand side into four terms as follows,

〈x − ∑_{k=1}^n c_k e_k, x − ∑_{l=1}^n c_l e_l〉 = 〈x, x〉 − 〈x, ∑_{l=1}^n c_l e_l〉 − 〈∑_{k=1}^n c_k e_k, x〉 + 〈∑_{k=1}^n c_k e_k, ∑_{l=1}^n c_l e_l〉 . (133)

The first term on the RHS is simply 〈x, x〉 = ‖x‖². The second term, using once again the linearity properties of the inner product, may be expressed as follows,

〈x, ∑_{l=1}^n c_l e_l〉 = ∑_{l=1}^n 〈x, c_l e_l〉 = ∑_{l=1}^n c_l 〈x, e_l〉 . (134)

Likewise, the third term becomes

〈∑_{k=1}^n c_k e_k, x〉 = ∑_{k=1}^n c_k 〈e_k, x〉 = ∑_{k=1}^n c_k 〈x, e_k〉 . (135)

The final inner product on the RHS of Eq. (133) becomes,

〈∑_{k=1}^n c_k e_k, ∑_{l=1}^n c_l e_l〉 = ∑_{k=1}^n ∑_{l=1}^n c_k c_l 〈e_k, e_l〉 = ∑_{k=1}^n c_k² , (136)


where the final equality is a result of the orthonormality of the e_k.

From all of these calculations, Eq. (133) becomes,

g(c_1, c_2, · · · , c_n) = ‖x‖² − ∑_{l=1}^n c_l 〈x, e_l〉 − ∑_{k=1}^n c_k 〈x, e_k〉 + ∑_{k=1}^n c_k² . (137)

Recall that we would like to minimize this function of n variables. We first impose the necessary stationarity conditions for a minimum,

∂g/∂c_p = −〈x, e_p〉 − 〈e_p, x〉 + 2c_p = −2〈x, e_p〉 + 2c_p = 0, p = 1, 2, · · · , n. (138)

Therefore,

c_p = 〈x, e_p〉, p = 1, 2, · · · , n, (139)

which identifies a unique point c ∈ R^n, to which corresponds a unique element y ∈ Y. To check that this point corresponds to a minimum, we examine the second partial derivatives,

∂²g/∂c_q ∂c_p = 2δ_{qp}. (140)

In other words, the Hessian matrix is diagonal and positive definite, so the critical point corresponds to a global minimum.

Finally, substitution of these (real or complex) values of c_k into the squared distance function in Eq. (132) yields the result

g(c_1, c_2, · · · , c_n) = ‖x‖² − ∑_{l=1}^n |c_l|² − ∑_{k=1}^n |c_k|² + ∑_{k=1}^n |c_k|²
= ‖x‖² − ∑_{k=1}^n |c_k|²
≥ 0, (141)

which then implies Eq. (128). The proof of the Theorem is complete.

Some additional comments regarding the best approximation:

1. The above result implies that the element x ∈ H may be expressed uniquely as

x = y + z, y ∈ Y, z ∈ Z = Y⊥. (142)

To see this, define

z = x − y = x − ∑_{k=1}^n c_k e_k, (143)


where the c_k are given by (127). For l = 1, 2, · · · , n, take the inner product of e_l with both sides of this equation to give

〈z, e_l〉 = 〈x, e_l〉 − ∑_{k=1}^n c_k 〈e_k, e_l〉 = 〈x, e_l〉 − c_l = 0. (144)

Therefore z ⊥ e_l, l = 1, 2, · · · , n, implying that z ∈ Y⊥.

2. The term z in (142) may be viewed as the residual in the approximation x ≈ y. Its norm then defines the error of approximation,

‖x − y‖ = ‖z‖. (145)

3. As the dimension n of the orthonormal set {e_1, e_2, · · · , e_n} is increased, we expect to obtain better approximations to the element x – unless, of course, x is an element of one of these finite-dimensional spaces, in which case we arrive at zero approximation error, and no further improvement is possible. Let us designate Y_n = span{e_1, e_2, · · · , e_n} and y_n = P_{Y_n}(x), the best approximation to x in Y_n. Then the error of this approximation is given by

‖z_n‖ = ‖x − y_n‖. (146)

We expect that ‖z_{n+1}‖ ≤ ‖z_n‖.

4. Note also that the inequality

∑_{k=1}^n |c_k|² ≤ ‖x‖² (147)

holds for all appropriate values of n > 0. (If H is finite-dimensional, i.e., dim(H) = N, then n = 1, 2, · · · , N.) In other words, the partial sums on the left are bounded from above. This inequality, known as Bessel's inequality, will have important consequences.
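Bessel's inequality, and the monotone decrease of the approximation errors in Comment 3, can be illustrated numerically (a Python/NumPy sketch, not part of the original notes; the Riemann-sum discretization and the choice f(x) = x are assumptions of the sketch):

```python
import numpy as np

# Numerical illustration of the theorem and of Bessel's inequality (147) in
# L2[-pi, pi]: project f(x) = x onto growing orthonormal trigonometric sets
#   e_0 = 1/sqrt(2 pi), e_{2k-1} = cos(kx)/sqrt(pi), e_{2k} = sin(kx)/sqrt(pi),
# approximating the inner-product integrals by Riemann sums on a fine grid.
N = 200000
x = np.linspace(-np.pi, np.pi, N, endpoint=False)
dx = 2 * np.pi / N
f = x

def e(j):
    if j == 0:
        return np.full(N, 1.0 / np.sqrt(2 * np.pi))
    k = (j + 1) // 2
    trig = np.cos(k * x) if j % 2 == 1 else np.sin(k * x)
    return trig / np.sqrt(np.pi)

norm2_f = np.sum(f * f) * dx             # ||f||^2 = 2 pi^3 / 3

prev_err = np.inf
for n in (1, 3, 5, 11, 21):              # orthonormal sets of growing size
    cs = [np.sum(f * e(j)) * dx for j in range(n)]   # Fourier coefficients
    y = sum(c * e(j) for j, c in enumerate(cs))      # best approximation
    err2 = np.sum((f - y)**2) * dx                   # squared error ||z_n||^2
    assert sum(c * c for c in cs) <= norm2_f + 1e-8  # Bessel's inequality
    assert err2 <= prev_err + 1e-12                  # errors non-increasing
    prev_err = err2
print(norm2_f, prev_err)
```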

Appendix: Best approximation in the complex-scalar case

We now consider the squared distance function g in Eq. (132) in the case that H is a complex Hilbert space, i.e., (c_1, · · · , c_n) ∈ C^n. Writing c̄ for the complex conjugate of c, and recalling that 〈e_k, x〉 is the complex conjugate of 〈x, e_k〉:

g(c_1, c_2, · · · , c_n) = ‖x − ∑_{k=1}^n c_k e_k‖²
= 〈x − ∑_{k=1}^n c_k e_k, x − ∑_{l=1}^n c_l e_l〉
= ‖x‖² − 〈∑_{k=1}^n c_k e_k, x〉 − 〈x, ∑_{l=1}^n c_l e_l〉 + ∑_{k=1}^n ∑_{l=1}^n 〈c_k e_k, c_l e_l〉
= ‖x‖² − ∑_{k=1}^n [ c_k 〈e_k, x〉 + c̄_k 〈x, e_k〉 − |c_k|² ]
= ‖x‖² + ∑_{k=1}^n [ |〈x, e_k〉|² − c_k 〈e_k, x〉 − c̄_k 〈x, e_k〉 + |c_k|² ] − ∑_{k=1}^n |〈x, e_k〉|²
= ‖x‖² + ∑_{k=1}^n |c_k − 〈x, e_k〉|² − ∑_{k=1}^n |〈x, e_k〉|². (148)

The first and last terms are fixed. The middle term is a sum of nonnegative numbers, and its minimum value, zero, is achieved when all of those terms vanish. Consequently, g(c_1, · · · , c_n) is a minimum if and only if c_k = 〈x, e_k〉 for k = 1, 2, · · · , n. As in the real case, we have

‖x‖² − ∑_{k=1}^n |〈x, e_k〉|² ≥ 0. (149)
