
Introduction to Calculus in Several Variables

Michael E. Taylor


Contents

0. One-variable calculus
1. The derivative
2. Inverse function and implicit function theorem
3. Fundamental local existence theorem for ODE
4. The Riemann integral in n variables
5. Integration on surfaces
6. Differential forms
7. Products and exterior derivatives of forms
8. The general Stokes formula
9. The classical Gauss, Green, and Stokes formulas
10. Holomorphic functions and harmonic functions
11. The Brouwer fixed-point theorem

A. Metric spaces, convergence, and compactness
B. Partitions of unity
C. Differential forms and the change of variable formula
D. Remarks on power series
E. The Weierstrass theorem and the Stone-Weierstrass theorem
F. Convolution approach to the Weierstrass approximation theorem
G. Fourier series – first steps
H. Inner product spaces


Introduction

The goal of this text is to present a compact treatment of the basic results of multivariable calculus, to students who have had 3 semesters of calculus, including an introductory treatment of multivariable calculus, and who have had a course in linear algebra, but who could use a logical development of the basics, particularly of the integral.

We begin with an introductory section, §0, presenting the elements of one-dimensional calculus. We first define the Riemann integral of a class of functions on an interval. We then introduce the derivative, and establish the Fundamental Theorem of Calculus, relating differentiation and integration as essentially inverse operations. Further results are dealt with in the exercises, such as the change of variable formula for integrals, and the Taylor formula with remainder.

In §1 we define the derivative of a function F : O → Rⁿ → Rᵐ, when O is an open subset of Rⁿ, as a linear map from Rⁿ to Rᵐ. We establish some basic properties, such as the chain rule. We use the one-dimensional integral as a tool to show that, if the matrix of first order partial derivatives of F is continuous on O, then F is differentiable on O. We also discuss two convenient multi-index notations for higher derivatives, and derive the Taylor formula with remainder for a smooth function F on O ⊂ Rⁿ.

In §2 we establish the Inverse Function Theorem, stating that a smooth map F : O → Rⁿ with an invertible derivative DF(p) has a smooth inverse defined near q = F(p). We derive the Implicit Function Theorem as a consequence of this. As a tool in proving the Inverse Function Theorem, we use a fixed point theorem known as the Contraction Mapping Principle.

In §3 we establish the fundamental local existence theorem for systems of ordinary differential equations. ODE is not a major topic in this course, but a treatment of the existence theory fits in well here, as it makes further use of the Contraction Mapping Principle.

In §4 we take up the multidimensional Riemann integral. The basic definition is quite parallel to the one-dimensional case, but a number of central results, while parallel in statement to the one-dimensional case, require more elaborate demonstrations in higher dimensions. This section is the most demanding in these notes, but it is in a sense the heart of the course. One central result is the change of variable formula for multidimensional integrals. Another is the reduction of multiple integrals to iterated integrals.

In §5 we pass from integrals of functions defined on domains in Rⁿ to integrals of functions defined on n-dimensional surfaces. This includes the study of surface area. Surface area is defined via a “metric tensor,” an object that is a little more complicated than a vector.

In §6 we introduce a further class of objects that are defined on surfaces, differential forms. A k-form can be integrated over a k-dimensional surface, endowed with an extra piece of structure, an “orientation.” The change of variable formula established in §4 plays a central role in establishing this. Properties of differential forms are developed further in the next two sections. In §7 we define exterior products of forms, and interior products of forms with vector fields. Then we define the exterior derivative of a form. Section 8 is devoted to the general Stokes formula, an important integral identity which contains as special cases classical identities of Green, Gauss, and Stokes. These special cases are discussed in some detail in §9.

In §10 we use Green’s Theorem to derive fundamental properties of holomorphic functions of a complex variable. Sprinkled throughout earlier sections are some allusions to functions of complex variables, particularly in some of the exercises in §§1–2. Readers with no previous exposure to complex variables might wish to return to these exercises after getting through §10. In this section, we also discuss some results on the closely related study of harmonic functions. One result is Liouville’s Theorem, stating that a bounded harmonic function on all of Rⁿ must be constant. When specialized to holomorphic functions on C = R², this yields a proof of the Fundamental Theorem of Algebra.

In §11 we present the Brouwer Fixed Point Theorem, which states that a continuous map from the closed unit ball in Rⁿ to itself must have a fixed point. We give a proof of this that makes use of the Stokes formula for differential forms. This fixed point theorem plays an important role in several areas of mathematics, as the student who continues to study the subject will see. Here we present it principally as an illustration of the usefulness of the calculus of differential forms.

Appendix A at the end of these notes covers some basic notions of metric spaces and compactness, used from time to time throughout the notes, particularly in the study of the Riemann integral and in the proof of the fundamental existence theorem for ODE. Appendix B discusses partitions of unity, useful particularly in the proof of the Stokes formula. Appendix C presents a proof of the change of variable formula for multiple integrals, following an idea presented by Lax in [L], which is somewhat shorter than that given in §4. Appendix D discusses the remainder term in the Taylor series of a function. Appendix E gives the Weierstrass theorem on approximating a continuous function by polynomials, and an extension, known as the Stone-Weierstrass theorem, which is a useful tool in analysis. Appendix F treats another approach to the Weierstrass approximation theorem. Appendix G provides an introduction to Fourier series. Appendix H gives some basic material on inner product spaces.

It has been our express intention to make this presentation of multivariable calculus short. As part of this package, the exercises play a particularly important role in developing the material. In addition to exercises providing practice in applying results established in the text, there are also exercises asking the reader to supply details for some of the arguments used in the text. There are also scattered exercises intended to give the reader a fresh perspective on topics familiar from previous study. We mention in particular exercises on determinants and on the cross product in §1, and on trigonometric functions in §3.


0. One-variable calculus

In this brief discussion of one variable calculus, we first consider the integral, and then relate it to the derivative. We will define the Riemann integral of a bounded function over an interval I = [a, b] on the real line. To do this, we partition I into smaller intervals. A partition P of I is a finite collection of subintervals {J_k : 0 ≤ k ≤ N}, disjoint except for their endpoints, whose union is I. We can order the J_k so that J_k = [x_k, x_{k+1}], where

(0.1) x0 < x1 < · · · < xN < xN+1, x0 = a, xN+1 = b.

We call the points x_k the endpoints of P. We set

(0.2) ℓ(J_k) = x_{k+1} − x_k, maxsize(P) = max_{0≤k≤N} ℓ(J_k).

We then set

(0.3) \overline{I}_P(f) = ∑_k sup_{J_k} f(x) · ℓ(J_k), \underline{I}_P(f) = ∑_k inf_{J_k} f(x) · ℓ(J_k).

(See (A.10)–(A.11) for the definition of sup and inf.) Note that \underline{I}_P(f) ≤ \overline{I}_P(f). These quantities should approximate the Riemann integral of f, if the partition P is sufficiently “fine.”
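The upper and lower sums of (0.3) are directly computable. Here is a minimal numerical sketch, not from the text; the choices f(x) = x² on I = [0, 1] and a uniform partition are assumed for illustration.

```python
# Upper and lower sums of (0.3) for f(x) = x^2 on I = [0, 1].
# Since f is increasing on [0, 1], sup_{J_k} f and inf_{J_k} f are attained
# at the right and left endpoints of each J_k.

def darboux_sums(f, points):
    """Given partition endpoints x_0 < ... < x_{N+1}, return (lower, upper)."""
    lower = upper = 0.0
    for a, b in zip(points, points[1:]):
        lower += f(a) * (b - a)  # inf of an increasing f on [a, b] is f(a)
        upper += f(b) * (b - a)  # sup of an increasing f on [a, b] is f(b)
    return lower, upper

f = lambda x: x * x
N = 100
points = [k / N for k in range(N + 1)]
lo, up = darboux_sums(f, points)
print(lo, up)  # lower sum <= 1/3 <= upper sum, squeezing as N grows
```

Refining the partition (larger N) shrinks the gap up − lo, consistent with the role these sums play in (0.6)–(0.7).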

To be more precise, if P and Q are two partitions of I, we say P refines Q, and write P ≻ Q, if P is formed by partitioning each interval in Q. Equivalently, P ≻ Q if and only if all the endpoints of Q are also endpoints of P. It is easy to see that any two partitions have a common refinement; just take the union of their endpoints, to form a new partition. Note also that

(0.4) P ≻ Q =⇒ \overline{I}_P(f) ≤ \overline{I}_Q(f), and \underline{I}_P(f) ≥ \underline{I}_Q(f).

Consequently, if P_1 and P_2 are any two partitions and Q is a common refinement, we have

(0.5) \underline{I}_{P_1}(f) ≤ \underline{I}_Q(f) ≤ \overline{I}_Q(f) ≤ \overline{I}_{P_2}(f).

Now, whenever f : I → R is bounded, the following quantities are well defined:

(0.6) \overline{I}(f) = inf_{P∈Π(I)} \overline{I}_P(f), \underline{I}(f) = sup_{P∈Π(I)} \underline{I}_P(f),

where Π(I) is the set of all partitions of I. Clearly, by (0.5), \underline{I}(f) ≤ \overline{I}(f). We then say that f is Riemann integrable provided \underline{I}(f) = \overline{I}(f), and in such a case, we set

(0.7) ∫_a^b f(x) dx = ∫_I f(x) dx = \underline{I}(f) = \overline{I}(f).

We will denote the set of Riemann integrable functions on I by R(I). We derive some basic properties of the Riemann integral.


Proposition 0.1. If f, g ∈ R(I), then f + g ∈ R(I), and

(0.8) ∫_I (f + g) dx = ∫_I f dx + ∫_I g dx.

Proof. If J_k is any subinterval of I, then

sup_{J_k} (f + g) ≤ sup_{J_k} f + sup_{J_k} g,

so, for any partition P, we have \overline{I}_P(f + g) ≤ \overline{I}_P(f) + \overline{I}_P(g). Also, using common refinements, we can simultaneously approximate \overline{I}(f) and \overline{I}(g) by \overline{I}_P(f) and \overline{I}_P(g). Thus the characterization (0.6) implies \overline{I}(f + g) ≤ \overline{I}(f) + \overline{I}(g). A parallel argument implies \underline{I}(f + g) ≥ \underline{I}(f) + \underline{I}(g), and the proposition follows.

Next, there is a fair supply of Riemann integrable functions.

Proposition 0.2. If f is continuous on I, then f is Riemann integrable.

Proof. Any continuous function on a compact interval is bounded (see Proposition A.14) and uniformly continuous (see Proposition A.15); let ω(δ) be a modulus of continuity for f, so

(0.9) |x − y| ≤ δ =⇒ |f(x) − f(y)| ≤ ω(δ), ω(δ) → 0 as δ → 0.

Then

(0.10) maxsize(P) ≤ δ =⇒ \overline{I}_P(f) − \underline{I}_P(f) ≤ ω(δ) · ℓ(I),

which yields the proposition.

We denote the set of continuous functions on I by C(I). Thus Proposition 0.2 says

C(I) ⊂ R(I).

The proof of Proposition 0.2 provides a criterion on a partition guaranteeing that \overline{I}_P(f) and \underline{I}_P(f) are close to ∫_I f dx when f is continuous. We produce an extension, showing when \overline{I}_P(f) and \overline{I}(f) are close, and when \underline{I}_P(f) and \underline{I}(f) are close, given f bounded on I. Given a partition P_0 of I, set

(0.11) minsize(P_0) = min{ℓ(J_k) : J_k ∈ P_0}.


Proposition 0.3. Let P_0 and P be two partitions of I. Assume

(0.12) maxsize(P) ≤ (1/k) minsize(P_0).

Let |f| ≤ M on I. Then

(0.13) \overline{I}_P(f) ≤ \overline{I}_{P_0}(f) + (2M/k) ℓ(I),
       \underline{I}_P(f) ≥ \underline{I}_{P_0}(f) − (2M/k) ℓ(I).

Proof. Let P_1 denote the minimal common refinement of P and P_0. Consider on the one hand those intervals in P that are contained in intervals in P_0, and on the other hand those intervals in P that are not contained in intervals in P_0 (whose lengths sum to ≤ ℓ(I)/k). We obtain

\overline{I}_P(f) ≤ \overline{I}_{P_1}(f) + (2M/k) ℓ(I),
\underline{I}_P(f) ≥ \underline{I}_{P_1}(f) − (2M/k) ℓ(I).

Since also \overline{I}_{P_1}(f) ≤ \overline{I}_{P_0}(f) and \underline{I}_{P_1}(f) ≥ \underline{I}_{P_0}(f), we obtain (0.13).

The following corollary is sometimes called Darboux’s Theorem.

Corollary 0.4. Let P_ν be a sequence of partitions of I into ν intervals J_{νk}, 1 ≤ k ≤ ν, such that

maxsize(P_ν) −→ 0.

If f : I → R is bounded, then

(0.14) \underline{I}_{P_ν}(f) → \underline{I}(f) and \overline{I}_{P_ν}(f) → \overline{I}(f).

Consequently,

(0.15) f ∈ R(I) ⇐⇒ I(f) = lim_{ν→∞} ∑_{k=1}^ν f(ξ_{νk}) ℓ(J_{νk}),

for arbitrary ξ_{νk} ∈ J_{νk}, in which case the limit is ∫_I f dx.

The sums on the right side of (0.15) are called Riemann sums, approximating ∫_I f dx (when f is Riemann integrable).
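The convergence of Riemann sums in (0.15) is easy to observe numerically. The following sketch is illustrative only; the choices f(x) = x² on [0, 1] (so ∫_I f dx = 1/3), equal subintervals, and midpoints as the sample points ξ_{νk} are assumptions made here, not taken from the text.

```python
# Riemann sums of (0.15) for f(x) = x^2 on [0, 1], using nu equal
# subintervals and the midpoint of each J_{nu k} as the sample point.

def riemann_sum(f, a, b, nu):
    h = (b - a) / nu
    return sum(f(a + (k + 0.5) * h) * h for k in range(nu))

approx = [riemann_sum(lambda x: x * x, 0.0, 1.0, nu) for nu in (10, 100, 1000)]
errors = [abs(s - 1 / 3) for s in approx]
print(errors)  # errors shrink toward 0 as maxsize(P_nu) -> 0
```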

Remark. A second proof of Proposition 0.1 can readily be deduced from Corollary 0.4.

One should be warned that, once such a specific choice of P_ν and ξ_{νk} has been made, the limit on the right side of (0.15) might exist for a bounded function f that is not Riemann integrable. This and other phenomena are illustrated by the following example of a function which is not Riemann integrable. For x ∈ I, set

(0.16) ϑ(x) = 1 if x ∈ Q, ϑ(x) = 0 if x ∉ Q,

where Q is the set of rational numbers. Now every interval J ⊂ I of positive length contains points in Q and points not in Q, so for any partition P of I we have \overline{I}_P(ϑ) = ℓ(I) and \underline{I}_P(ϑ) = 0, hence

(0.17) \overline{I}(ϑ) = ℓ(I), \underline{I}(ϑ) = 0.

Note that, if P_ν is a partition of I into ν equal subintervals, then we could pick each ξ_{νk} to be rational, in which case the limit on the right side of (0.15) would be ℓ(I), or we could pick each ξ_{νk} to be irrational, in which case this limit would be zero. Alternatively, we could pick half of them to be rational and half to be irrational, and the limit would be (1/2) ℓ(I).

Associated to the Riemann integral is a notion of size of a set S, called content. If S is a subset of I, define the “characteristic function”

(0.18) χ_S(x) = 1 if x ∈ S, 0 if x ∉ S.

We define “upper content” cont⁺ and “lower content” cont⁻ by

(0.19) cont⁺(S) = \overline{I}(χ_S), cont⁻(S) = \underline{I}(χ_S).

We say S “has content,” or “is contented,” if these quantities are equal, which happens if and only if χ_S ∈ R(I), in which case the common value of cont⁺(S) and cont⁻(S) is

(0.20) m(S) = ∫_I χ_S(x) dx.

It is easy to see that

(0.21) cont⁺(S) = inf{ ∑_{k=1}^N ℓ(J_k) : S ⊂ J_1 ∪ ⋯ ∪ J_N },

where J_k are intervals. Here, we require S to be contained in the union of a finite collection of intervals.

See the appendix at the end of this section for a generalization of Proposition 0.2, giving a sufficient condition for a bounded function to be Riemann integrable on I, in terms of the upper content of its set of discontinuities.

There is a more sophisticated notion of the size of a subset of I, called Lebesgue measure. The key to the construction of Lebesgue measure is to cover a set S by a countable (either finite or infinite) set of intervals. The outer measure of S ⊂ I is defined by

(0.22) m*(S) = inf{ ∑_{k≥1} ℓ(J_k) : S ⊂ ⋃_{k≥1} J_k }.


Here {J_k} is a finite or countably infinite collection of intervals. Clearly

(0.23) m*(S) ≤ cont⁺(S).

Note that, if S = I ∩ Q, then χ_S = ϑ, defined by (0.16). In this case it is easy to see that cont⁺(S) = ℓ(I), but m*(S) = 0. Zero is the “right” measure of this set. More material on the development of measure theory can be found in a number of books, including [Fol] and [T2].

It is useful to note that ∫_I f dx is additive in I, in the following sense.

Proposition 0.5. If a < b < c, f : [a, c] → R, f_1 = f|_{[a,b]}, f_2 = f|_{[b,c]}, then

(0.24) f ∈ R([a, c]) ⇐⇒ f_1 ∈ R([a, b]) and f_2 ∈ R([b, c]),

and, if this holds,

(0.25) ∫_a^c f dx = ∫_a^b f_1 dx + ∫_b^c f_2 dx.

Proof. Since any partition of [a, c] has a refinement for which b is an endpoint, we may as well consider a partition P = P_1 ∪ P_2, where P_1 is a partition of [a, b] and P_2 is a partition of [b, c]. Then

(0.26) \overline{I}_P(f) = \overline{I}_{P_1}(f_1) + \overline{I}_{P_2}(f_2), \underline{I}_P(f) = \underline{I}_{P_1}(f_1) + \underline{I}_{P_2}(f_2),

so

(0.27) \overline{I}_P(f) − \underline{I}_P(f) = {\overline{I}_{P_1}(f_1) − \underline{I}_{P_1}(f_1)} + {\overline{I}_{P_2}(f_2) − \underline{I}_{P_2}(f_2)}.

Since both terms in braces in (0.27) are ≥ 0, we have equivalence in (0.24). Then (0.25) follows from (0.26) upon taking sufficiently fine partitions.

Let I = [a, b]. If f ∈ R(I), then f ∈ R([a, x]) for all x ∈ [a, b], and we can consider the function

(0.28) g(x) = ∫_a^x f(t) dt.

If a ≤ x_0 ≤ x_1 ≤ b, then

(0.29) g(x_1) − g(x_0) = ∫_{x_0}^{x_1} f(t) dt,

so, if |f| ≤ M,

(0.30) |g(x_1) − g(x_0)| ≤ M |x_1 − x_0|.

In other words, if f ∈ R(I), then g is Lipschitz continuous on I.

A function g : (a, b) → R is said to be differentiable at x ∈ (a, b) provided there exists the limit

(0.31) lim_{h→0} (1/h) [g(x + h) − g(x)] = g′(x).

When such a limit exists, g′(x), also denoted dg/dx, is called the derivative of g at x. Clearly g is continuous wherever it is differentiable.

The next result is part of the Fundamental Theorem of Calculus.


Theorem 0.6. If f ∈ C([a, b]), then the function g, defined by (0.28), is differentiable at each point x ∈ (a, b), and

(0.32) g′(x) = f(x).

Proof. Parallel to (0.29), we have, for h > 0,

(0.33) (1/h) [g(x + h) − g(x)] = (1/h) ∫_x^{x+h} f(t) dt.

If f is continuous at x, then, for any ε > 0, there exists δ > 0 such that |f(t) − f(x)| ≤ ε whenever |t − x| ≤ δ. Thus the right side of (0.33) is within ε of f(x) whenever h ∈ (0, δ]. Thus the desired limit exists as h ↘ 0. A similar argument treats h ↗ 0.

The next result is the rest of the Fundamental Theorem of Calculus.

Theorem 0.7. If G is differentiable and G′(x) is continuous on [a, b], then

(0.34) ∫_a^b G′(t) dt = G(b) − G(a).

Proof. Consider the function

(0.35) g(x) = ∫_a^x G′(t) dt.

We have g ∈ C([a, b]), g(a) = 0, and, by Theorem 0.6,

g′(x) = G′(x), ∀ x ∈ (a, b).

Thus f(x) = g(x)−G(x) is continuous on [a, b], and

(0.36) f ′(x) = 0, ∀ x ∈ (a, b).

We claim that (0.36) implies f is constant on [a, b]. Granted this, since f(a) = g(a) − G(a) = −G(a), we have f(x) = −G(a) for all x ∈ [a, b], so the integral (0.35) is equal to G(x) − G(a) for all x ∈ [a, b]. Taking x = b yields (0.34).

The fact that (0.36) implies f is constant on [a, b] is a consequence of the following result, known as the Mean Value Theorem.

Theorem 0.8. Let f : [a, β] → R be continuous, and assume f is differentiable on (a, β). Then ∃ ξ ∈ (a, β) such that

(0.37) f′(ξ) = (f(β) − f(a)) / (β − a).


Proof. Replacing f(x) by f(x) − κ(x − a), where κ is the right side of (0.37), we can assume without loss of generality that f(a) = f(β). Then we claim that f′(ξ) = 0 for some ξ ∈ (a, β). Indeed, since [a, β] is compact, f must assume a maximum and a minimum on [a, β]. If f(a) = f(β), one of these must be assumed at an interior point ξ, at which f′ clearly vanishes. (If both the maximum and the minimum are assumed at the endpoints, then f is constant, and f′ vanishes identically.)

Now, to see that (0.36) implies f is constant on [a, b], suppose not; then ∃ β ∈ (a, b] such that f(β) ≠ f(a). Then just apply Theorem 0.8 to f on [a, β]. This completes the proof of Theorem 0.7.
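Theorem 0.7 can be sanity-checked numerically: a fine Riemann sum of G′ should approach G(b) − G(a). This sketch is illustrative; the choice G(t) = sin t on [0, 1] and the midpoint Riemann sum are assumptions made here.

```python
# Numerical sketch of (0.34): for G(t) = sin t on [0, 1], a fine midpoint
# Riemann sum of G'(t) = cos t should approach G(1) - G(0) = sin 1.
import math

def riemann_sum(f, a, b, n):
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * h for k in range(n))

lhs = riemann_sum(math.cos, 0.0, 1.0, 2000)
rhs = math.sin(1.0) - math.sin(0.0)
print(abs(lhs - rhs))  # small, shrinking as n grows
```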

If a function G is differentiable on (a, b), and G′ is continuous on (a, b), we say G is a C¹ function, and write G ∈ C¹((a, b)). Inductively, we say G ∈ C^k((a, b)) provided G′ ∈ C^{k−1}((a, b)).

An easy consequence of the definition (0.31) of the derivative is that, for any real constants a, b, and c,

f(x) = ax² + bx + c =⇒ df/dx = 2ax + b.

Now, it is a simple enough step to replace a, b, c by y, z, w in these formulas. Having done that, we can regard y, z, and w as variables, along with x:

F(x, y, z, w) = yx² + zx + w.

We can then hold y, z, and w fixed (e.g., set y = a, z = b, w = c), and then differentiate with respect to x; we get

∂F/∂x = 2yx + z,

the partial derivative of F with respect to x. Generally, if F is a function of n variables, x_1, …, x_n, we set

∂F/∂x_j (x_1, …, x_n) = lim_{h→0} (1/h) [F(x_1, …, x_{j−1}, x_j + h, x_{j+1}, …, x_n) − F(x_1, …, x_{j−1}, x_j, x_{j+1}, …, x_n)],

whenever the limit exists. Section 1 carries on with a further investigation of the derivative of a function of several variables.
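The defining difference quotient above is easy to evaluate on the text's example F(x, y, z, w) = yx² + zx + w. A minimal sketch (the sample point and step size h are assumptions made here for illustration):

```python
# Difference quotient for the partial derivative of F(x,y,z,w) = y x^2 + z x + w
# with respect to x, which the text computes exactly as 2yx + z.

def F(x, y, z, w):
    return y * x**2 + z * x + w

def dF_dx(x, y, z, w, h=1e-6):
    # the other variables y, z, w are held fixed
    return (F(x + h, y, z, w) - F(x, y, z, w)) / h

x, y, z, w = 1.5, 2.0, 3.0, 4.0
exact = 2 * y * x + z
print(dF_dx(x, y, z, w), exact)  # difference quotient vs. 2yx + z
```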

Complementary results on Riemann integrability

Here we provide a condition, more general than Proposition 0.2, which guarantees Riemann integrability.


Proposition 0.9. Let f : I → R be a bounded function, with I = [a, b]. Suppose that the set S of points of discontinuity of f has the property

(0.38) cont+(S) = 0.

Then f ∈ R(I).

Proof. Say |f(x)| ≤ M. Take ε > 0. As in (0.21), take intervals J_1, …, J_N such that S ⊂ J_1 ∪ ⋯ ∪ J_N and ∑_{k=1}^N ℓ(J_k) < ε. In fact, fatten each J_k such that S is contained in the interior of this collection of intervals. Consider a partition P_0 of I, whose intervals include J_1, …, J_N, amongst others, which we label I_1, …, I_K. Now f is continuous on each interval I_ν, so, subdividing each I_ν as necessary, hence refining P_0 to a partition P_1, we arrange that sup f − inf f < ε on each such subdivided interval. Denote these subdivided intervals I′_1, …, I′_L. It readily follows that

0 ≤ \overline{I}_{P_1}(f) − \underline{I}_{P_1}(f) < ∑_{k=1}^N 2M ℓ(J_k) + ∑_{k=1}^L ε ℓ(I′_k) < 2εM + ε ℓ(I).

Since ε can be taken arbitrarily small, this establishes that f ∈ R(I).

Remark. An even better result is that such f is Riemann integrable provided m*(S) = 0, where m*(S) is defined by (0.22). Standard books on measure theory, including [Fol] and [T2], establish this.

We mention an alternative characterization of \overline{I}(f) and \underline{I}(f), which can be useful. Given I = [a, b], we say g : I → R is piecewise constant on I (and write g ∈ PK(I)) provided there exists a partition P = {J_k} of I such that g is constant on the interior of each interval J_k. Clearly PK(I) ⊂ R(I). It is easy to see that, if f : I → R is bounded,

(0.39) \overline{I}(f) = inf{ ∫_I f_1 dx : f_1 ∈ PK(I), f_1 ≥ f },
       \underline{I}(f) = sup{ ∫_I f_0 dx : f_0 ∈ PK(I), f_0 ≤ f }.

Hence, given f : I → R bounded,

(0.40) f ∈ R(I) ⇔ for each ε > 0, ∃ f_0, f_1 ∈ PK(I) such that f_0 ≤ f ≤ f_1 and ∫_I (f_1 − f_0) dx < ε.

This can be used to prove

(0.41) f, g ∈ R(I) =⇒ fg ∈ R(I),

via the fact that

(0.42) f_j, g_j ∈ PK(I) =⇒ f_j g_j ∈ PK(I).

Exercises

1. Let c > 0 and let f : [ac, bc] → R be Riemann integrable. Working directly with the definition of integral, show that

(0.43) ∫_a^b f(cx) dx = (1/c) ∫_{ac}^{bc} f(x) dx.

More generally, show that

(0.44) ∫_{a−d/c}^{b−d/c} f(cx + d) dx = (1/c) ∫_{ac}^{bc} f(x) dx.

2. Let f : I × S → R be continuous, where I = [a, b] and S ⊂ Rⁿ. Take ϕ(y) = ∫_I f(x, y) dx. Show that ϕ is continuous on S.
Hint. If f_j : I → R are continuous and |f_1(x) − f_2(x)| ≤ δ on I, then

(0.45) |∫_I f_1 dx − ∫_I f_2 dx| ≤ ℓ(I) δ.

3. With f as in Exercise 2, suppose g_j : S → R are continuous and a ≤ g_0(y) < g_1(y) ≤ b. Take ϕ(y) = ∫_{g_0(y)}^{g_1(y)} f(x, y) dx. Show that ϕ is continuous on S.
Hint. Make a change of variables, linear in x, to reduce this to Exercise 2.

4. Suppose f : (a, b) → (c, d) and g : (c, d) → R are differentiable. Show that h(x) = g(f(x)) is differentiable and

(0.46) h′(x) = g′(f(x)) f′(x).

This is the chain rule.
Hint. Peek at the proof of the chain rule in §1.

5. If f_1 and f_2 are differentiable on (a, b), show that f_1(x)f_2(x) is differentiable and

(0.47) (d/dx)(f_1(x) f_2(x)) = f_1′(x) f_2(x) + f_1(x) f_2′(x).

If f_2(x) ≠ 0, ∀ x ∈ (a, b), show that f_1(x)/f_2(x) is differentiable and

(0.48) (d/dx)(f_1(x)/f_2(x)) = (f_1′(x) f_2(x) − f_1(x) f_2′(x)) / f_2(x)².

6. If f : (a, b) → R is differentiable, show that

(0.49) (d/dx) f(x)ⁿ = n f(x)^{n−1} f′(x),

for all positive integers n. If f(x) ≠ 0 ∀ x ∈ (a, b), show that this holds for all n ∈ Z.
Hint. To get (0.49) for positive n, use induction and apply (0.47).

7. Let ϕ : [a, b] → [A, B] be C¹ on a neighborhood J of [a, b], with ϕ′(x) > 0 for all x ∈ [a, b]. Assume ϕ(a) = A, ϕ(b) = B. Show that the identity

(0.50) ∫_A^B f(y) dy = ∫_a^b f(ϕ(t)) ϕ′(t) dt,

for any f ∈ C(J), follows from the chain rule and the Fundamental Theorem of Calculus.
Hint. Replace b by x, B by ϕ(x), and differentiate.
Note that this result contains that of Exercise 1.
(Optional) Try to establish (0.50) directly by working with the definition of the integral as a limit of Riemann sums.

8. Show that, if f and g are C¹ on a neighborhood of [a, b], then

(0.51) ∫_a^b f(s) g′(s) ds = −∫_a^b f′(s) g(s) ds + [f(b)g(b) − f(a)g(a)].

This transformation of integrals is called “integration by parts.”

9. Let f : (−a, a) → R be a C^{j+1} function. Show that, for x ∈ (−a, a),

(0.52) f(x) = f(0) + f′(0)x + (f″(0)/2) x² + ⋯ + (f^{(j)}(0)/j!) x^j + R_j(x),

where

(0.53) R_j(x) = ∫_0^x ((x − s)^j / j!) f^{(j+1)}(s) ds.

This is Taylor’s formula with remainder.
Hint. Use induction. If (0.52)–(0.53) holds for 0 ≤ j ≤ k, show that it holds for j = k + 1, by showing that

(0.54) ∫_0^x ((x − s)^k / k!) f^{(k+1)}(s) ds = (f^{(k+1)}(0)/(k+1)!) x^{k+1} + ∫_0^x ((x − s)^{k+1}/(k+1)!) f^{(k+2)}(s) ds.


To establish this, use the integration by parts formula (0.51), with f(s) replaced by f^{(k+1)}(s), and with appropriate g(s). See Exercise 7 of §1 for another approach. Note that another presentation of (0.53) is

(0.55) R_j(x) = (x^{j+1}/(j+1)!) ∫_0^1 f^{(j+1)}((1 − t^{1/(j+1)}) x) dt.
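The identity (0.52)–(0.53) can be checked numerically: the Taylor polynomial plus the integral remainder should reproduce f(x) exactly, up to quadrature error. This is an illustrative sketch only; the choices f = exp, x = 0.5, j = 3, and a midpoint rule for the remainder integral are assumptions made here.

```python
# Check of (0.52)-(0.53) for f = exp (so f^{(k)} = exp and f^{(k)}(0) = 1):
# Taylor polynomial of degree j plus the remainder integral R_j(x),
# computed with a fine midpoint rule, should equal exp(x).
import math

def remainder(x, j, n=4000):
    # R_j(x) = integral_0^x ((x - s)^j / j!) f^{(j+1)}(s) ds, f = exp
    h = x / n
    return sum((x - (k + 0.5) * h) ** j / math.factorial(j)
               * math.exp((k + 0.5) * h) * h for k in range(n))

x, j = 0.5, 3
taylor = sum(x**k / math.factorial(k) for k in range(j + 1))
print(abs(taylor + remainder(x, j) - math.exp(x)))  # near machine precision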

10. Assume f : (−a, a) → R is a C^j function. Show that, for x ∈ (−a, a), (0.52) holds, with

(0.56) R_j(x) = (1/(j−1)!) ∫_0^x (x − s)^{j−1} [f^{(j)}(s) − f^{(j)}(0)] ds.

Hint. Apply (0.53) with j replaced by j − 1. Add and subtract f^{(j)}(0) to the factor f^{(j)}(s) in the resulting integrand. See Appendix D for further discussion.

11. Given I = [a, b], show that

(0.57) f, g ∈ R(I) =⇒ fg ∈ R(I),

as advertised in (0.41).

12. Assume f_k ∈ R(I) and f_k → f uniformly on I. Prove that f ∈ R(I) and

(0.58) ∫_I f_k dx −→ ∫_I f dx.

13. Given I = [a, b], I_ε = [a + ε, b − ε], assume f_k ∈ R(I), |f_k| ≤ M on I for all k, and

(0.59) f_k −→ f uniformly on I_ε,

for all ε ∈ (0, (b − a)/2). Prove that f ∈ R(I) and (0.58) holds.

14. We say f ∈ R(R) provided f|_{[k,k+1]} ∈ R([k, k + 1]) for each k ∈ Z, and

(0.60) ∑_{k=−∞}^∞ ∫_k^{k+1} |f(x)| dx < ∞.

If f ∈ R(R), we set

(0.61) ∫_{−∞}^∞ f(x) dx = lim_{k→∞} ∫_{−k}^k f(x) dx.

Formulate and demonstrate basic properties of the integral over R of elements of R(R).


1. The derivative

Let O be an open subset of Rⁿ, and F : O → Rᵐ a continuous function. We say F is differentiable at a point x ∈ O, with derivative L, if L : Rⁿ → Rᵐ is a linear transformation such that, for y ∈ Rⁿ, small,

(1.1) F(x + y) = F(x) + Ly + R(x, y)

with

(1.2) ‖R(x, y)‖ / ‖y‖ → 0 as y → 0.

We denote the derivative at x by DF(x) = L. With respect to the standard bases of Rⁿ and Rᵐ, DF(x) is simply the matrix of partial derivatives,

(1.3) DF(x) = (∂F_j/∂x_k) = \begin{pmatrix} ∂F_1/∂x_1 & ⋯ & ∂F_1/∂x_n \\ ⋮ & & ⋮ \\ ∂F_m/∂x_1 & ⋯ & ∂F_m/∂x_n \end{pmatrix},

so that, if v = (v_1, …, v_n)ᵗ (regarded as a column vector), then

(1.4) DF(x)v = \begin{pmatrix} ∑_k (∂F_1/∂x_k) v_k \\ ⋮ \\ ∑_k (∂F_m/∂x_k) v_k \end{pmatrix}.

It will be shown below that F is differentiable whenever all the partial derivatives exist and are continuous on O. In such a case we say F is a C¹ function on O. More generally, F is said to be C^k if all its partial derivatives of order ≤ k exist and are continuous. If F is C^k for all k, we say F is C^∞.

In (1.2), we can use the Euclidean norm on Rⁿ and Rᵐ. This norm is defined by

(1.5) ‖x‖ = (x_1² + ⋯ + x_n²)^{1/2}

for x = (x_1, …, x_n) ∈ Rⁿ. Any other norm would do equally well.

An application of the Fundamental Theorem of Calculus, to functions of each variable x_j separately, yields the following. If we assume F : O → Rᵐ is differentiable in each variable separately, and that each ∂F/∂x_j is continuous on O, then

(1.6) F(x + y) = F(x) + ∑_{j=1}^n [F(x + z_j) − F(x + z_{j−1})] = F(x) + ∑_{j=1}^n A_j(x, y) y_j,

      A_j(x, y) = ∫_0^1 (∂F/∂x_j)(x + z_{j−1} + t y_j e_j) dt,


where z_0 = 0, z_j = (y_1, …, y_j, 0, …, 0), and {e_j} is the standard basis of Rⁿ. Consequently,

(1.7) F(x + y) = F(x) + ∑_{j=1}^n (∂F/∂x_j)(x) y_j + R(x, y),

      R(x, y) = ∑_{j=1}^n y_j ∫_0^1 [(∂F/∂x_j)(x + z_{j−1} + t y_j e_j) − (∂F/∂x_j)(x)] dt.

Now (1.7) implies F is differentiable on O, as we stated below (1.4). Thus we have established the following.

Proposition 1.1. If O is an open subset of Rⁿ and F : O → Rᵐ is of class C¹, then F is differentiable at each point x ∈ O.

As is shown in many calculus texts, one can use the Mean Value Theorem instead of the Fundamental Theorem of Calculus, and obtain a slightly sharper result.

Let us give some examples of derivatives. First, take n = 2, m = 1, and set

(1.8) F(x) = (sin x_1)(sin x_2).

Then

(1.9) DF(x) = ((cos x_1)(sin x_2), (sin x_1)(cos x_2)).

Next, take n = m = 2 and

(1.10) F(x) = \begin{pmatrix} x_1 x_2 \\ x_1² − x_2² \end{pmatrix}.

Then

(1.11) DF(x) = \begin{pmatrix} x_2 & x_1 \\ 2x_1 & −2x_2 \end{pmatrix}.
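The matrix (1.11) can be recovered, approximately, from the defining difference quotients. A minimal sketch (the sample point and step size are illustrative assumptions):

```python
# Finite-difference check of (1.11): for F(x) = (x1 x2, x1^2 - x2^2),
# the matrix of difference quotients approximates
# DF(x) = [[x2, x1], [2 x1, -2 x2]].

def F(x1, x2):
    return (x1 * x2, x1**2 - x2**2)

def jacobian(x1, x2, h=1e-6):
    f0 = F(x1, x2)
    fx = F(x1 + h, x2)  # perturb x1
    fy = F(x1, x2 + h)  # perturb x2
    return [[(fx[j] - f0[j]) / h, (fy[j] - f0[j]) / h] for j in range(2)]

x1, x2 = 1.0, 2.0
J = jacobian(x1, x2)
exact = [[x2, x1], [2 * x1, -2 * x2]]
print(J, exact)
```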

We can replace Rⁿ and Rᵐ by more general finite-dimensional real vector spaces, isomorphic to Euclidean space. For example, the space M(n, R) of real n × n matrices is isomorphic to R^{n²}. Consider the function

(1.12) S : M(n, R) −→ M(n, R), S(X) = X².

We have

(1.13) (X + Y)² = X² + XY + YX + Y² = X² + DS(X)Y + R(X, Y),

with R(X, Y) = Y², and hence

(1.14) DS(X)Y = XY + YX.
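Since the remainder in (1.13) is exactly Y², replacing Y by tY makes the error O(t²), which is easy to observe. A minimal sketch with hand-picked 2×2 matrices (the particular X, Y, and t are illustrative assumptions):

```python
# Check of (1.13)-(1.14) for S(X) = X^2 on 2x2 matrices: the error
# S(X) + t*DS(X)Y - S(X + tY) equals -t^2 Y^2, hence is O(t^2).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(A, B, c=1.0):          # A + c*B, entrywise
    return [[A[i][j] + c * B[i][j] for j in range(2)] for i in range(2)]

X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[0.5, -1.0], [2.0, 0.25]]
t = 1e-3
XtY = add(X, Y, t)                       # X + tY
S_XtY = matmul(XtY, XtY)                 # S(X + tY)
DSY = add(matmul(X, Y), matmul(Y, X))    # DS(X)Y = XY + YX
diff = add(add(matmul(X, X), DSY, t), S_XtY, -1.0)  # S(X) + t DS(X)Y - S(X+tY)
err = max(abs(diff[i][j]) for i in range(2) for j in range(2))
print(err)  # of size t^2 * (entries of Y^2)
```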


For our next example, we take

(1.15) O = Gl(n, R) = {X ∈ M(n, R) : det X ≠ 0},

which, as shown below, is open in M(n, R). We consider

(1.16) Φ : Gl(n, R) −→ M(n, R), Φ(X) = X^{−1},

and compute DΦ(I). We use the following. If, for A ∈ M(n, R),

(1.17) ‖A‖ = sup{‖Av‖ : v ∈ Rⁿ, ‖v‖ ≤ 1},

then

(1.18) A, B ∈ M(n, R) ⇒ ‖AB‖ ≤ ‖A‖ · ‖B‖, hence Y ∈ M(n, R) ⇒ ‖Y^k‖ ≤ ‖Y‖^k.

Also

(1.19) S_k = I − Y + Y² − ⋯ + (−1)^k Y^k
       ⇒ Y S_k = S_k Y = Y − Y² + Y³ − ⋯ + (−1)^k Y^{k+1}
       ⇒ (I + Y) S_k = S_k (I + Y) = I + (−1)^k Y^{k+1},

hence

(1.20) ‖Y‖ < 1 =⇒ (I + Y)^{−1} = ∑_{k=0}^∞ (−1)^k Y^k = I − Y + Y² − ⋯,

so

(1.21) DΦ(I)Y = −Y.

Related calculations show that Gl(n, R) is open in M(n, R). In fact, given X ∈ Gl(n, R), Y ∈ M(n, R),

(1.22) X + Y = X(I + X^{−1}Y),

which by (1.20) is invertible as long as

(1.23) ‖X^{−1}Y‖ < 1.
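The content of (1.20)–(1.21) is that for small Y, (I + Y)^{−1} agrees with I − Y up to an error of size ‖Y‖². A minimal 2×2 sketch (the explicit cofactor inverse and the particular Y are illustrative assumptions):

```python
# Sketch of (1.20)-(1.21) for 2x2 matrices: with Y small, the inverse
# (I + Y)^{-1}, computed by the 2x2 cofactor formula, is close to I - Y,
# i.e. Phi(I + Y) - Phi(I) is approximately -Y, with O(||Y||^2) error.

def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

t = 1e-3
Y = [[t, 2 * t], [-t, 0.5 * t]]
IplusY = [[1 + Y[0][0], Y[0][1]], [Y[1][0], 1 + Y[1][1]]]
inv = inv2(IplusY)
# entrywise deviation of (I + Y)^{-1} from the linearization I - Y
err = max(abs(inv[i][j] - ((1.0 if i == j else 0.0) - Y[i][j]))
          for i in range(2) for j in range(2))
print(err)  # of order ||Y||^2, consistent with the series (1.20)
```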

One can proceed from here to compute DΦ(X). See the exercises.

We return to general considerations, and derive the chain rule for the derivative. Let F : O → Rᵐ be differentiable at x ∈ O, as above, let U be a neighborhood of z = F(x) in Rᵐ, and let G : U → R^k be differentiable at z. Consider H = G ∘ F. We have

(1.24) H(x + y) = G(F(x + y))
             = G(F(x) + DF(x)y + R(x, y))
             = G(z) + DG(z)(DF(x)y + R(x, y)) + R_1(x, y)
             = G(z) + DG(z)DF(x)y + R_2(x, y),

with

‖R_2(x, y)‖ / ‖y‖ → 0 as y → 0.

Thus G ∘ F is differentiable at x, and

(1.25) D(G ∘ F)(x) = DG(F(x)) · DF(x).

Another useful remark is that, by the Fundamental Theorem of Calculus, applied to ϕ(t) = F(x + ty),

(1.26) F(x + y) = F(x) + ∫_0^1 DF(x + ty) y dt,

provided F is C¹. For a typical application, see (6.6).

For the study of higher order derivatives of a function, the following result is fundamental.

Proposition 1.2. Assume F : O → Rᵐ is of class C², with O open in Rⁿ. Then, for each x ∈ O, 1 ≤ j, k ≤ n,

(1.27) (∂/∂x_j)(∂F/∂x_k)(x) = (∂/∂x_k)(∂F/∂x_j)(x).

Proof. It suffices to take m = 1. We label our function f : O → R. For 1 ≤ j ≤ n, we set

(1.28) Δ_{j,h} f(x) = (1/h)(f(x + h e_j) − f(x)),

where {e_1, …, e_n} is the standard basis of Rⁿ. The mean value theorem (for functions of x_j alone) implies that if ∂_j f = ∂f/∂x_j exists on O, then, for x ∈ O, h > 0 sufficiently small,

(1.29) Δ_{j,h} f(x) = ∂_j f(x + α_j h e_j),

for some α_j ∈ (0, 1), depending on x and h. Iterating this, if ∂_j(∂_k f) exists on O, then, for x ∈ O, h > 0 sufficiently small,

(1.30) Δ_{k,h} Δ_{j,h} f(x) = ∂_k(Δ_{j,h} f)(x + α_k h e_k)
                           = Δ_{j,h}(∂_k f)(x + α_k h e_k)
                           = ∂_j ∂_k f(x + α_k h e_k + α_j h e_j),

with α_j, α_k ∈ (0, 1). Here we have used the elementary result

(1.31) ∂_k Δ_{j,h} f = Δ_{j,h}(∂_k f).

We deduce the following.


Proposition 1.3. If ∂_k f and ∂_j ∂_k f exist on O and ∂_j ∂_k f is continuous at x_0 ∈ O, then

(1.32) ∂_j ∂_k f(x_0) = lim_{h→0} Δ_{k,h} Δ_{j,h} f(x_0).

Clearly

(1.33) Δ_{k,h} Δ_{j,h} f = Δ_{j,h} Δ_{k,h} f,

so we have the following, which easily implies Proposition 1.2.

Corollary 1.4. In the setting of Proposition 1.3, if also ∂_j f and ∂_k ∂_j f exist on O and ∂_k ∂_j f is continuous at x_0, then

(1.34) ∂j∂kf(x0) = ∂k∂jf(x0).
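The iterated difference quotients of (1.28)–(1.30) can be evaluated directly, and by Proposition 1.3 they approximate the mixed partial. A minimal sketch (the choice f(x_1, x_2) = sin x_1 sin x_2 and the evaluation point are illustrative assumptions):

```python
# Iterated difference quotient Delta_{k,h} Delta_{j,h} f of (1.28)-(1.30)
# for f(x1, x2) = sin(x1) sin(x2); it approximates
# d_1 d_2 f (x) = cos(x1) cos(x2), as in (1.32).
import math

def f(x1, x2):
    return math.sin(x1) * math.sin(x2)

def mixed(x1, x2, h):
    # Delta_{2,h} Delta_{1,h} f (x), expanded into four function values;
    # by (1.33) the other order Delta_{1,h} Delta_{2,h} gives the same value.
    return (f(x1 + h, x2 + h) - f(x1, x2 + h)
            - f(x1 + h, x2) + f(x1, x2)) / h**2

x1, x2, h = 0.7, 0.3, 1e-4
exact = math.cos(x1) * math.cos(x2)
print(mixed(x1, x2, h), exact)
```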

We now describe two convenient notations to express higher order derivatives of a C^k function f : Ω → R, where Ω ⊂ Rⁿ is open. In one, let J be a k-tuple of integers between 1 and n; J = (j_1, …, j_k). We set

(1.35) f^{(J)}(x) = ∂_{j_k} ⋯ ∂_{j_1} f(x), ∂_j = ∂/∂x_j.

We set |J| = k, the total order of differentiation. As we have seen in Proposition 1.2, ∂_i ∂_j f = ∂_j ∂_i f provided f ∈ C²(Ω). It follows that, if f ∈ C^k(Ω), then ∂_{j_k} ⋯ ∂_{j_1} f = ∂_{ℓ_k} ⋯ ∂_{ℓ_1} f whenever (ℓ_1, …, ℓ_k) is a permutation of (j_1, …, j_k). Thus, another convenient notation to use is the following. Let α be an n-tuple of non-negative integers, α = (α_1, …, α_n). Then we set

(1.36) f^{(α)}(x) = ∂_1^{α_1} ⋯ ∂_n^{α_n} f(x), |α| = α_1 + ⋯ + α_n.

Note that, if |J| = |α| = k and f ∈ C^k(Ω),

(1.37) f^{(J)}(x) = f^{(α)}(x), with α_i = #{ℓ : j_ℓ = i}.

Correspondingly, there are two expressions for monomials in x = (x_1, …, x_n):

(1.38) x^J = x_{j_1} ⋯ x_{j_k}, x^α = x_1^{α_1} ⋯ x_n^{α_n},

and x^J = x^α provided J and α are related as in (1.37). Both these notations are called “multi-index” notations.

We now derive Taylor’s formula with remainder for a smooth function F : Ω → R, making use of these multi-index notations. We will apply the one variable formula (0.52)–(0.53), i.e.,

(1.39) ϕ(t) = ϕ(0) + ϕ′(0)t + (1/2)ϕ″(0)t² + ⋯ + (1/k!)ϕ^{(k)}(0)t^k + r_k(t),

with

(1.40) r_k(t) = (1/k!) ∫_0^t (t − s)^k ϕ^{(k+1)}(s) ds,

given ϕ ∈ C^{k+1}(I), I = (−a, a). (See the exercises, and also Appendix D for further discussion.) Let us assume 0 ∈ Ω, and that the line segment from 0 to x is contained in Ω. We set ϕ(t) = F(tx), and apply (1.39)–(1.40) with t = 1. Applying the chain rule, we have

(1.41) ϕ′(t) = ∑_{j=1}^n ∂_j F(tx) x_j = ∑_{|J|=1} F^{(J)}(tx) x^J.

Differentiating again, we have

(1.42) ϕ′′(t) = ∑_{|J|=1,|K|=1} F^{(J+K)}(tx) x^{J+K} = ∑_{|J|=2} F^{(J)}(tx) x^J,

where, if |J| = k, |K| = ℓ, we take J + K = (j1, . . . , jk, k1, . . . , kℓ). Inductively, we have

(1.43) ϕ^{(k)}(t) = ∑_{|J|=k} F^{(J)}(tx) x^J.

Hence, from (1.39) with t = 1,

F(x) = F(0) + ∑_{|J|=1} F^{(J)}(0) x^J + · · · + (1/k!) ∑_{|J|=k} F^{(J)}(0) x^J + R_k(x),

or, more briefly,

(1.44) F(x) = ∑_{|J|≤k} (1/|J|!) F^{(J)}(0) x^J + R_k(x),

where

(1.45) R_k(x) = (1/k!) ∑_{|J|=k+1} ( ∫₀¹ (1 − s)^k F^{(J)}(sx) ds ) x^J.

This gives Taylor’s formula with remainder for F ∈ C^{k+1}(Ω), in the J-multi-index notation. We also want to write the formula in the α-multi-index notation. We have

(1.46) ∑_{|J|=k} F^{(J)}(tx) x^J = ∑_{|α|=k} ν(α) F^{(α)}(tx) x^α,

where

(1.47) ν(α) = #{J : α = α(J)},

and we define the relation α = α(J) to hold provided the condition (1.37) holds, or equivalently provided x^J = x^α. Thus ν(α) is uniquely defined by

(1.48) ∑_{|α|=k} ν(α) x^α = ∑_{|J|=k} x^J = (x1 + · · · + xn)^k.

One sees that, if |α| = k, then ν(α) is equal to the product of the number of combinations of k objects, taken α1 at a time, times the number of combinations of k − α1 objects, taken α2 at a time, . . . , times the number of combinations of k − (α1 + · · · + αn−1) objects, taken αn at a time. Thus

(1.49) ν(α) = (k choose α1)(k − α1 choose α2) · · · (k − α1 − · · · − αn−1 choose αn) = k!/(α1! α2! · · · αn!).

In other words, for |α| = k,

(1.50) ν(α) = k!/α!, where α! = α1! · · · αn!.
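The identity ν(α) = k!/α! in (1.50) can be verified by brute-force counting against the definition (1.47); the sketch below enumerates every k-tuple J for a small, arbitrarily chosen α.

```python
from itertools import product
from math import factorial, prod

# Check of (1.47)-(1.50): nu(alpha) = #{J : alpha = alpha(J)} equals the
# multinomial coefficient k!/alpha!.

def nu_by_counting(alpha):
    n, k = len(alpha), sum(alpha)
    return sum(1 for J in product(range(1, n + 1), repeat=k)
               if all(J.count(i + 1) == alpha[i] for i in range(n)))

alpha = (2, 1, 1)                                  # so k = 4, n = 3
formula = factorial(sum(alpha)) // prod(factorial(a) for a in alpha)
print(nu_by_counting(alpha), formula)              # -> 12 12
```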

Thus the Taylor formula (1.44) can be rewritten

(1.51) F(x) = ∑_{|α|≤k} (1/α!) F^{(α)}(0) x^α + R_k(x),

where

(1.52) R_k(x) = ∑_{|α|=k+1} ((k + 1)/α!) ( ∫₀¹ (1 − s)^k F^{(α)}(sx) ds ) x^α.
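As a numeric illustration of (1.51), one can evaluate the Taylor polynomial ∑_{|α|≤k} (1/α!) F^{(α)}(0) x^α for a hypothetical test function whose derivatives are known in closed form; here F(x, y) = e^{x+2y}, for which F^{(α)}(0) = 2^{α2}.

```python
from math import exp, factorial

# Degree-k Taylor polynomial of F(x, y) = exp(x + 2y) about 0, per (1.51),
# compared against the exact value; the remainder R_k shrinks with k.

def taylor_poly(x, y, k):
    total = 0.0
    for a1 in range(k + 1):
        for a2 in range(k + 1 - a1):
            total += 2.0**a2 / (factorial(a1) * factorial(a2)) * x**a1 * y**a2
    return total

x, y = 0.1, 0.05
exact = exp(x + 2 * y)
for k in range(1, 5):
    print(k, abs(taylor_poly(x, y, k) - exact))   # error decreases with k
```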

The formula (1.51)–(1.52) holds for F ∈ C^{k+1}. It is significant that (1.51), with a variant of (1.52), holds for F ∈ C^k. In fact, for such F, we can apply (1.52) with k replaced by k − 1, to get

(1.53) F(x) = ∑_{|α|≤k−1} (1/α!) F^{(α)}(0) x^α + R_{k−1}(x),

with

(1.54) R_{k−1}(x) = ∑_{|α|=k} (k/α!) ( ∫₀¹ (1 − s)^{k−1} F^{(α)}(sx) ds ) x^α.

We can add and subtract F^{(α)}(0) to F^{(α)}(sx) in the integrand above, and obtain the following.


Proposition 1.5. If F ∈ C^k on a ball B_r(0), the formula (1.51) holds for x ∈ B_r(0), with

(1.55) R_k(x) = ∑_{|α|=k} (k/α!) ( ∫₀¹ (1 − s)^{k−1} [F^{(α)}(sx) − F^{(α)}(0)] ds ) x^α.

Remark. Note that (1.55) yields the estimate

(1.56) |R_k(x)| ≤ ∑_{|α|=k} (|x^α|/α!) sup_{0≤s≤1} |F^{(α)}(sx) − F^{(α)}(0)|.

The term corresponding to |J| = 2 in (1.44), or |α| = 2 in (1.51), is of particular interest. It is

(1.57) (1/2) ∑_{|J|=2} F^{(J)}(0) x^J = (1/2) ∑_{j,k=1}^n (∂²F/∂xk∂xj)(0) xj xk.

We define the Hessian of a C² function F : O → R as an n × n matrix:

(1.58) D²F(y) = ( (∂²F/∂xk∂xj)(y) ).

Then the power series expansion of second order about 0 for F takes the form

(1.59) F(x) = F(0) + DF(0)x + (1/2) x · D²F(0)x + R₂(x),

where, by (1.56),

(1.60) |R₂(x)| ≤ C_n |x|² sup_{0≤s≤1, |α|=2} |F^{(α)}(sx) − F^{(α)}(0)|.

In all these formulas we can translate coordinates and expand about y ∈ O. For example, (1.59) extends to

(1.61) F(x) = F(y) + DF(y)(x − y) + (1/2)(x − y) · D²F(y)(x − y) + R₂(x, y),

with

(1.62) |R₂(x, y)| ≤ C_n |x − y|² sup_{0≤s≤1, |α|=2} |F^{(α)}(y + s(x − y)) − F^{(α)}(y)|.

Example. If we take F(x) as in (1.8), so DF(x) is as in (1.9), then

D²F(x) = ( −sin x1 sin x2    cos x1 cos x2 )
         (  cos x1 cos x2   −sin x1 sin x2 ).


The results (1.61)–(1.62) are useful for extremal problems, i.e., determining where a sufficiently smooth function F : O → R has local maxima and local minima. Clearly if F ∈ C¹(O) and F has a local maximum or minimum at x0 ∈ O, then DF(x0) = 0. In such a case, we say x0 is a critical point of F. To check what kind of critical point x0 is, we look at the n × n matrix A = D²F(x0), assuming F ∈ C²(O). By Proposition 1.2, A is a symmetric matrix. A basic result in linear algebra is that if A is a real, symmetric n × n matrix, then Rn has an orthonormal basis of eigenvectors, v1, . . . , vn, satisfying Avj = λj vj; the real numbers λj are the eigenvalues of A. We say A is positive definite if all λj > 0, and we say A is negative definite if all λj < 0. We say A is strongly indefinite if some λj > 0 and another λk < 0. Equivalently, given a real, symmetric matrix A,

(1.63) A positive definite ⇐⇒ v · Av ≥ C|v|²,
       A negative definite ⇐⇒ v · Av ≤ −C|v|²,

for some C > 0, all v ∈ Rn, and

(1.64) A strongly indefinite ⇐⇒ ∃ v, w ∈ Rn, nonzero, such that
       v · Av ≥ C|v|², w · Aw ≤ −C|w|²,

for some C > 0. In light of (1.61)–(1.62), we have the following result.

Proposition 1.6. Assume F ∈ C²(O) is real valued, O open in Rn. Let x0 ∈ O be a critical point for F. Then
(i) D²F(x0) positive definite ⇒ F has a local minimum at x0,
(ii) D²F(x0) negative definite ⇒ F has a local maximum at x0,
(iii) D²F(x0) strongly indefinite ⇒ F has neither a local maximum nor a local minimum at x0.

In case (iii), we say x0 is a saddle point for F.

The following is a test for positive definiteness.

Proposition 1.7. Let A = (aij) be a real, symmetric, n × n matrix. For 1 ≤ ℓ ≤ n, form the ℓ × ℓ matrix Aℓ = (aij)_{1≤i,j≤ℓ}. Then

(1.65) A positive definite ⇐⇒ det Aℓ > 0, ∀ ℓ ∈ {1, . . . , n}.

Regarding the implication ⇒, note that if A is positive definite, then det A = det An is the product of its eigenvalues, all > 0, hence is > 0. Also in this case, it follows from (1.63) that each Aℓ must be positive definite, hence have positive determinant, so we have ⇒.

The implication ⇐ is easy enough for 2 × 2 matrices. If A is symmetric and det A > 0, then either both its eigenvalues are positive (so A is positive definite) or both are negative (so A is negative definite). In the latter case, A1 = (a11) must be negative, so we have ⇐ in this case.

We prove ⇐ for n ≥ 3, using induction. The inductive hypothesis implies that if det Aℓ > 0 for each ℓ ≤ n, then An−1 is positive definite. The next lemma then guarantees that A = An has at least n − 1 positive eigenvalues. The hypothesis that det A > 0 does not allow that the remaining eigenvalue be ≤ 0, so all the eigenvalues of A must be positive. Thus Proposition 1.7 is proven, once we have the following.


Lemma 1.8. In the setting of Proposition 1.7, if An−1 is positive definite, then A = An has at least n − 1 positive eigenvalues.

Proof. Since A is symmetric, Rn has an orthonormal basis v1, . . . , vn of eigenvectors of A; Avj = λj vj. If the conclusion of the lemma is false, at least two of the eigenvalues, say λ1, λ2, are ≤ 0. Let W = Span(v1, v2), so

w ∈ W =⇒ w · Aw ≤ 0.

Since W has dimension 2, Rn−1 ⊂ Rn satisfies Rn−1 ∩ W ≠ 0, so there exists a nonzero w ∈ Rn−1 ∩ W, and then

w · An−1 w = w · Aw ≤ 0,

contradicting the hypothesis that An−1 is positive definite.

Remark. Given (1.65), we see by taking A ↦ −A that if A is a real, symmetric n × n matrix,

(1.66) A negative definite ⇐⇒ (−1)^ℓ det Aℓ > 0, ∀ ℓ ∈ {1, . . . , n}.
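The criterion (1.65) is easy to run by machine. The sketch below, with hypothetical helper names, tests the leading principal minors of a small symmetric matrix using a hand-rolled determinant (cofactor expansion down the first column, as in the auxiliary exercises on determinants).

```python
# Sylvester-type check of Proposition 1.7 on a classic positive definite matrix.

def det(M):
    """Determinant by cofactor expansion along the first column."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1)**j * M[j][0] * det([row[1:] for i, row in enumerate(M) if i != j])
               for j in range(n))

def positive_definite(A):
    """All leading principal minors det A_l, 1 <= l <= n, are positive."""
    n = len(A)
    return all(det([row[:l] for row in A[:l]]) > 0 for l in range(1, n + 1))

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]       # positive definite
B = [[-a for a in row] for row in A]            # -A is negative definite
print(positive_definite(A))                     # -> True
print(positive_definite(B))                     # -> False
```

Consistently with (1.66), the minors of B alternate in sign rather than staying positive.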

We return to higher order power series formulas with remainder, and complement Proposition 1.5. Let us go back to (1.39)–(1.40) and note that the integral in (1.40) is 1/(k + 1) times a weighted average of ϕ^{(k+1)}(s) over s ∈ [0, t]. Hence we can write

r_k(t) = (1/(k + 1)!) ϕ^{(k+1)}(θt), for some θ ∈ [0, 1],

if ϕ is of class C^{k+1}. This is the Lagrange form of the remainder; see Appendix D for more on this, and for a comparison with the Cauchy form of the remainder. If ϕ is of class C^k, we can replace k + 1 by k in (1.39) and write

(1.67) ϕ(t) = ϕ(0) + ϕ′(0)t + · · · + (1/(k − 1)!) ϕ^{(k−1)}(0) t^{k−1} + (1/k!) ϕ^{(k)}(θt) t^k,

for some θ ∈ [0, 1]. Plugging (1.67) into (1.43) for ϕ(t) = F(tx) gives

(1.68) F(x) = ∑_{|J|≤k−1} (1/|J|!) F^{(J)}(0) x^J + (1/k!) ∑_{|J|=k} F^{(J)}(θx) x^J,

for some θ ∈ [0, 1] (depending on x and on k, but not on J), when F is of class C^k on a neighborhood B_r(0) of 0 ∈ Rn. Similarly, using the α-multi-index notation, we have, as an alternative to (1.53)–(1.54),

(1.69) F(x) = ∑_{|α|≤k−1} (1/α!) F^{(α)}(0) x^α + ∑_{|α|=k} (1/α!) F^{(α)}(θx) x^α,


for some θ ∈ [0, 1] (depending on x and on |α|, but not on α), if F ∈ C^k(B_r(0)). Note also that

(1.70) (1/2) ∑_{|J|=2} F^{(J)}(θx) x^J = (1/2) ∑_{j,k=1}^n (∂²F/∂xk∂xj)(θx) xj xk
                                       = (1/2) x · D²F(θx) x,

with D²F(y) as in (1.58), so if F ∈ C²(B_r(0)), we have, as an alternative to (1.59),

(1.71) F(x) = F(0) + DF(0)x + (1/2) x · D²F(θx) x,

for some θ ∈ [0, 1].

We next complement the multi-index notations for higher derivatives of a function F by a multi-linear notation, defined as follows. If k ∈ N, F ∈ C^k(U), and y ∈ U ⊂ Rn, set

(1.72) D^k F(y)(u1, . . . , uk) = ∂_{t_1} · · · ∂_{t_k} F(y + t1 u1 + · · · + tk uk) |_{t1=···=tk=0},

for u1, . . . , uk ∈ Rn. For k = 1, this formula is equivalent to the definition of DF given at the beginning of this section. For k = 2, we have

(1.73) D²F(y)(u, v) = u · D²F(y) v,

with D²F(y) as in (1.58). Generally, (1.72) defines D^k F(y) as a symmetric, k-linear form in u1, . . . , uk ∈ Rn.

We can relate (1.72) to the J-multi-index notation as follows. We start with

(1.74) ∂_{t_1} F(y + t1 u1 + · · · + tk uk) = ∑_{|J|=1} F^{(J)}(y + Σ tj uj) u1^J,

and inductively obtain

(1.75) ∂_{t_1} · · · ∂_{t_k} F(y + Σ tj uj) = ∑_{|J1|=···=|Jk|=1} F^{(J1+···+Jk)}(y + Σ tj uj) u1^{J1} · · · uk^{Jk},

hence

(1.76) D^k F(y)(u1, . . . , uk) = ∑_{|J1|=···=|Jk|=1} F^{(J1+···+Jk)}(y) u1^{J1} · · · uk^{Jk}.

In particular, if u1 = · · · = uk = u,

(1.77) D^k F(y)(u, . . . , u) = ∑_{|J|=k} F^{(J)}(y) u^J.


Hence (1.68) yields

(1.78) F(x) = F(0) + DF(0)x + · · · + (1/(k − 1)!) D^{k−1}F(0)(x, . . . , x) + (1/k!) D^k F(θx)(x, . . . , x),

for some θ ∈ [0, 1], if F ∈ C^k(B_r(0)). In fact, rather than appealing to (1.68), we can note that

ϕ(t) = F(tx) =⇒ ϕ^{(k)}(t) = ∂_{t_1} · · · ∂_{t_k} ϕ(t + t1 + · · · + tk) |_{t1=···=tk=0} = D^k F(tx)(x, . . . , x),

and obtain (1.78) directly from (1.67). We can also use the notation

D^j F(y) x^{⊗j} = D^j F(y)(x, . . . , x),

with j copies of x within the last set of parentheses, and rewrite (1.78) as

(1.79) F(x) = F(0) + DF(0)x + · · · + (1/(k − 1)!) D^{k−1}F(0) x^{⊗(k−1)} + (1/k!) D^k F(θx) x^{⊗k}.

Note how (1.78) and (1.79) generalize (1.71).

Exercises

1. Consider the following function f : R2 → R:

f(x, y) = (cos x)(cos y).

Find all its critical points, and determine which of these are local maxima, local minima, and saddle points.
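A numeric companion to this exercise (not a substitute for the hand calculation): at a critical point, the type is read off from the eigenvalues of the 2 × 2 Hessian (1.58), per Proposition 1.6. The Hessian of f(x, y) = (cos x)(cos y) is computed by hand below; the helper names are illustrative.

```python
from math import cos, sin, pi, sqrt

# Classify a few critical points of f(x, y) = cos(x)cos(y) by the eigenvalues
# of its Hessian: d2f/dx2 = d2f/dy2 = -cos(x)cos(y), d2f/dxdy = sin(x)sin(y).

def hessian(x, y):
    return [[-cos(x) * cos(y), sin(x) * sin(y)],
            [ sin(x) * sin(y), -cos(x) * cos(y)]]

def eigenvalues_2x2(A):
    a, b, c, d = A[0][0], A[0][1], A[1][0], A[1][1]
    t, det = a + d, a * d - b * c
    disc = sqrt(t * t - 4 * det)        # real, since A is symmetric
    return (t - disc) / 2, (t + disc) / 2

for p in [(0.0, 0.0), (pi, 0.0), (pi / 2, pi / 2)]:
    lam = eigenvalues_2x2(hessian(*p))
    kind = ("saddle" if lam[0] < 0 < lam[1]
            else "local max" if lam[1] < 0 else "local min")
    print(p, kind)
```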

2. Let M(n,R) denote the space of real n × n matrices. Assume F, G : M(n,R) → M(n,R) are of class C¹. Show that H(X) = F(X)G(X) defines a C¹ map H : M(n,R) → M(n,R), and

DH(X)Y = DF (X)Y G(X) + F (X)DG(X)Y.

3. Let Gl(n,R) ⊂ M(n,R) denote the set of invertible matrices. Show that

Φ : Gl(n,R) −→ M(n,R),   Φ(X) = X⁻¹

is of class C¹ and that

DΦ(X)Y = −X⁻¹ Y X⁻¹.

4. Identify R² and C via z = x + iy. Then multiplication by i on C corresponds to applying

J = ( 0 −1 )
    ( 1  0 ).


Let O ⊂ R² be open, f : O → R² be C¹. Say f = (u, v). Regard Df(x, y) as a 2 × 2 real matrix. One says f is holomorphic, or complex-analytic, provided the Cauchy–Riemann equations hold:

∂u/∂x = ∂v/∂y,   ∂u/∂y = −∂v/∂x.

Show that this is equivalent to the condition

Df(x, y) J = J Df(x, y).

Generalize to O open in Cm, f : O → Cn.

5. Let f be C¹ on a region in R² containing [a, b] × {y}. Show that, as h → 0,

(1/h)[f(x, y + h) − f(x, y)] −→ (∂f/∂y)(x, y), uniformly on [a, b] × {y}.

Hint. Show that the left side is equal to

(1/h) ∫₀ʰ (∂f/∂y)(x, y + s) ds,

and use the uniform continuity of ∂f/∂y on [a, b] × [y − δ, y + δ]; cf. Proposition A.15.

6. In the setting of Exercise 5, show that

(d/dy) ∫ₐᵇ f(x, y) dx = ∫ₐᵇ (∂f/∂y)(x, y) dx.

7. Considering the power series

f(x) = f(y) + f′(y)(x − y) + · · · + (f^{(j)}(y)/j!)(x − y)^j + R_j(x, y),

show that

∂R_j/∂y = −(1/j!) f^{(j+1)}(y)(x − y)^j,   R_j(x, x) = 0.

Use this to re-derive (0.53), and hence (1.22)–(1.23).

We define “big oh” and “little oh” notation:

f(x) = O(x) (as x → 0) ⇔ |f(x)/x| ≤ C as x → 0,
f(x) = o(x) (as x → 0) ⇔ f(x)/x → 0 as x → 0.


8. Let O ⊂ Rn be open and y ∈ O. Show that

f ∈ C^{k+1}(O) ⇒ f(x) = ∑_{|α|≤k} (1/α!) f^{(α)}(y)(x − y)^α + O(|x − y|^{k+1}),

f ∈ C^k(O) ⇒ f(x) = ∑_{|α|≤k} (1/α!) f^{(α)}(y)(x − y)^α + o(|x − y|^k).

Exercises 9–11 deal with properties of the determinant, as a differentiable function on spaces of matrices. Foundational work on determinants is developed in an accompanying set of auxiliary exercises on determinants.

9. Let Mn×n be the space of complex n × n matrices, det : Mn×n → C the determinant. Show that, if I is the identity matrix,

D det(I)B = Tr B,

i.e.,

(d/dt) det(I + tB)|_{t=0} = Tr B.

10. If A(t) = (ajk(t)) is a curve in Mn×n, use the expansion of (d/dt) det A(t) as a sum of n determinants, in which the rows of A(t) are successively differentiated, to show that, for A ∈ Mn×n,

D det(A)B = Tr(Cof(A)ᵗ · B),

where Cof(A) is the cofactor matrix of A.

11. Suppose A ∈ Mn×n is invertible. Using

det(A + tB) = (det A) det(I + tA⁻¹B),

show that

D det(A)B = (det A) Tr(A⁻¹B).

Comparing the result of Exercise 10, deduce Cramer’s formula:

(det A) A⁻¹ = Cof(A)ᵗ.
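The identity of Exercise 9 is easy to probe numerically: det(I + tB) = 1 + t Tr B + O(t²), so a small-t difference quotient should land near Tr B. The sketch below uses an explicit 3 × 3 determinant and an arbitrary hypothetical matrix B.

```python
# Finite-difference check of (d/dt) det(I + tB)|_{t=0} = Tr B.

def det3(M):
    return (M[0][0]*(M[1][1]*M[2][2] - M[1][2]*M[2][1])
          - M[0][1]*(M[1][0]*M[2][2] - M[1][2]*M[2][0])
          + M[0][2]*(M[1][0]*M[2][1] - M[1][1]*M[2][0]))

B = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [4.0, 1.0, 2.0]]
t = 1e-6
IptB = [[(1.0 if i == j else 0.0) + t * B[i][j] for j in range(3)]
        for i in range(3)]
deriv = (det3(IptB) - 1.0) / t
trace = B[0][0] + B[1][1] + B[2][2]
print(deriv, trace)        # both close to 2.0
```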

12. Assume G : U → O, F : O → Ω. Show that

F, G ∈ C¹ =⇒ F ◦ G ∈ C¹.

(Hint. Use (1.7).) Show that, for any k ∈ N,

F, G ∈ C^k =⇒ F ◦ G ∈ C^k.


13. Show that the map Φ : Gl(n,R) → Gl(n,R) given by Φ(X) = X⁻¹ is C^k for each k, i.e., Φ ∈ C^∞.
Hint. Start with the material of Exercise 3.

Auxiliary exercises on determinants

If Mn×n denotes the space of n × n complex matrices, we want to show that there is a map

(1.79) det : Mn×n → C

which is uniquely specified as a function ϑ : Mn×n → C satisfying:
(a) ϑ is linear in each column aj of A,
(b) ϑ(Ã) = −ϑ(A) if Ã is obtained from A by interchanging two columns,
(c) ϑ(I) = 1.

1. Let A = (a1, . . . , an), where the aj are column vectors; aj = (a1j, . . . , anj)ᵗ. Show that, if (a) holds, we have the expansion

(1.80) det A = ∑_j aj1 det(ej, a2, . . . , an) = · · ·
             = ∑_{j1,...,jn} a_{j1 1} · · · a_{jn n} det(e_{j1}, e_{j2}, . . . , e_{jn}),

where {e1, . . . , en} is the standard basis of Cn.

2. Show that, if (b) and (c) also hold, then

(1.81) det A = ∑_{σ∈Sn} (sgn σ) a_{σ(1)1} a_{σ(2)2} · · · a_{σ(n)n},

where Sn is the set of permutations of {1, . . . , n}, and

(1.82) sgn σ = det(e_{σ(1)}, . . . , e_{σ(n)}) = ±1.

To define sgn σ, the “sign” of a permutation σ, we note that every permutation σ can be written as a product of transpositions: σ = τ1 · · · τν, where a transposition of {1, . . . , n} interchanges two elements and leaves the rest fixed. We say sgn σ = 1 if ν is even and sgn σ = −1 if ν is odd. It is necessary to show that sgn σ is independent of the choice of such a product representation. (Referring to (1.82) begs the question until we know that det is well defined.)


3. Let σ ∈ Sn act on a function of n variables by

(1.83) (σf)(x1, . . . , xn) = f(x_{σ(1)}, . . . , x_{σ(n)}).

Let P be the polynomial

(1.84) P(x1, . . . , xn) = ∏_{1≤j<k≤n} (xj − xk).

Show that

(1.85) (σP)(x) = (sgn σ) P(x),

and that this implies that sgn σ is well defined.

4. Deduce that there is a unique determinant satisfying (a)–(c), and that it is given by (1.81).

5. Show that (1.81) implies

(1.86) det A = det Aᵗ.

Conclude that one can replace columns by rows in the characterization (a)–(c) of determinants.
Hint. a_{σ(j)j} = a_{ℓτ(ℓ)} with ℓ = σ(j), τ = σ⁻¹. Also, sgn σ = sgn τ.

6. Show that, if (a)–(c) hold (for rows), it follows that

(d) ϑ(Ã) = ϑ(A) if Ã is obtained from A by adding c ρℓ to ρk, for some c ∈ C,

where ρ1, . . . , ρn are the rows of A.
Re-prove the uniqueness of ϑ satisfying (a)–(d) (for rows) by applying row operations to A until either some row vanishes or A is converted to I.

7. Show that

(1.87) det(AB) = (det A)(det B).

Hint. For fixed B ∈ Mn×n, compare ϑ1(A) = det(AB) and ϑ2(A) = (det A)(det B). For uniqueness, use an argument from Exercise 6.

8. Show that

(1.88) det ( 1  a12 · · · a1n )       ( 1  0   · · · 0   )
           ( 0  a22 · · · a2n )       ( 0  a22 · · · a2n )
           ( ⋮   ⋮         ⋮  ) = det ( ⋮  ⋮          ⋮  ) = det A11,
           ( 0  an2 · · · ann )       ( 0  an2 · · · ann )

where A11 = (ajk)_{2≤j,k≤n}.
Hint. Do the first identity by the analogue of (d), for columns. Then exploit uniqueness for det on M(n−1)×(n−1).

9. Deduce that det(ej, a2, . . . , an) = (−1)^{j−1} det A1j, where Akj is formed by deleting the kth column and the jth row from A.

10. Deduce from the first sum in (1.80) that

(1.89) det A = ∑_{j=1}^n (−1)^{j−1} aj1 det A1j.

More generally, for any k ∈ {1, . . . , n},

(1.90) det A = ∑_{j=1}^n (−1)^{j−k} ajk det Akj.

This is called an expansion of det A by minors, down the kth column. By definition, the cofactor matrix of A is given by

Cof(A)jk = (−1)^{j−k} det Akj.

11. Show that

(1.91) det ( a11 a12 · · · a1n )
           (     a22 · · · a2n ) = a11 a22 · · · ann.
           (          ⋱     ⋮  )
           (               ann )

Hint. Use (1.88) and induction.

Auxiliary exercises on the cross product

1. If u, v ∈ R³, show that the formula

(1.92) w · (u × v) = det ( w1 u1 v1 )
                         ( w2 u2 v2 )
                         ( w3 u3 v3 )

for u × v = Π(u, v) defines uniquely a bilinear map Π : R³ × R³ → R³. Show that it satisfies

i × j = k,   j × k = i,   k × i = j,

where {i, j, k} is the standard basis of R³.
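Expanding the determinant in (1.92) along its first column gives the familiar component formula for u × v, which the following sketch writes out and checks against the basis relations above.

```python
# Cross product from the determinant formula (1.92), expanded in components.

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(cross(i, j))    # -> (0, 0, 1), i.e. k
print(cross(j, k))    # -> (1, 0, 0), i.e. i
print(cross(k, i))    # -> (0, 1, 0), i.e. j
```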

2. We say T ∈ SO(3) provided that T is a real 3 × 3 matrix satisfying TᵗT = I and det T > 0 (hence det T = 1). Show that

(1.93) T ∈ SO(3) =⇒ Tu × Tv = T(u × v).

Hint. Multiply the 3 × 3 matrix in Exercise 1 on the left by T.

3. Show that, if θ is the angle between u and v in R³, then

(1.94) |u × v| = |u| |v| |sin θ|.

Hint. Check this for u = i, v = ai + bj, and use Exercise 2 to show this suffices.

4. Show that κ : R³ → Skew(3), the set of antisymmetric real 3 × 3 matrices, given by

(1.95) κ(y1, y2, y3) = (  0  −y3   y2 )
                       (  y3   0  −y1 )
                       ( −y2  y1    0 )

satisfies

(1.96) Kx = y × x, K = κ(y).

Show that, with [A, B] = AB − BA,

(1.97) κ(x × y) = [κ(x), κ(y)],   Tr(κ(x)κ(y)ᵗ) = 2 x · y.


2. Inverse function and implicit function theorem

The Inverse Function Theorem gives a condition under which a function can be locally inverted. This theorem and its corollary, the Implicit Function Theorem, are fundamental results in multivariable calculus. First we state the Inverse Function Theorem. Here, we assume k ≥ 1.

Theorem 2.1. Let F be a Ck map from an open neighborhood Ω of p0 ∈ Rn to Rn, with q0 = F(p0). Suppose the derivative DF(p0) is invertible. Then there is a neighborhood U of p0 and a neighborhood V of q0 such that F : U → V is one-to-one and onto, and F⁻¹ : V → U is a Ck map. (One says F : U → V is a diffeomorphism.)

First we show that F is one-to-one on a neighborhood of p0, under these hypotheses. In fact, we establish the following result, of interest in its own right.

Proposition 2.2. Assume Ω ⊂ Rn is open and convex, and let f : Ω → Rn be C¹. Assume that the symmetric part of Df(u) is positive definite, for each u ∈ Ω. Then f is one-to-one on Ω.

Proof. Take distinct points u1, u2 ∈ Ω, and set u2 − u1 = w. Consider ϕ : [0, 1] → R,given by

ϕ(t) = w · f(u1 + tw).

Then ϕ′(t) = w · Df(u1 + tw)w > 0 for t ∈ [0, 1], so ϕ(0) ≠ ϕ(1). But ϕ(0) = w · f(u1) and ϕ(1) = w · f(u2), so f(u1) ≠ f(u2).

To continue the proof of Theorem 2.1, let us set

(2.1) f(u) = A(F(p0 + u) − q0),   A = DF(p0)⁻¹.

Then f(0) = 0 and Df(0) = I, the identity matrix. We show that f maps a neighborhood of 0 one-to-one and onto some neighborhood of 0. Proposition 2.2 applies, so we know f is one-to-one on some neighborhood O of 0. We next show that the image of O under f contains a neighborhood of 0.

We can write

(2.2) f(u) = u + R(u), R(0) = 0, DR(0) = 0.

For v small, we want to solve

(2.3) f(u) = v.

This is equivalent to u + R(u) = v, so let

(2.4) Tv(u) = v −R(u).


Thus solving (2.3) is equivalent to solving

(2.5) Tv(u) = u.

We look for a fixed point u = K(v) = f⁻¹(v). Also, we want to prove that DK(0) = I, i.e., that K(v) = v + r(v) with r(v) = o(‖v‖). (The “little oh” notation is defined in Exercise 8 of §1.) If we succeed in doing this, it follows easily that, for general x close to q0, G(x) = F⁻¹(x) is defined, and DG(q0) = DF(p0)⁻¹. A parallel argument, with p0 replaced by nearby u and x = F(u), gives

(2.6) DG(x) = (DF(G(x)))⁻¹.

Then a simple inductive argument shows that G is Ck if F is Ck. See Exercise 6 at the end of this section for an approach to this last argument.

A tool we will use to solve (2.5) is the following general result, known as the Contraction Mapping Principle.

Theorem 2.3. Let X be a complete metric space, and let T : X → X satisfy

(2.7) dist(Tx, Ty) ≤ r dist(x, y),

for some r < 1. (We say T is a contraction.) Then T has a unique fixed point x. For any y0 ∈ X, T^k y0 → x as k → ∞.

Proof. Pick y0 ∈ X and let yk = T^k y0. Then dist(yk, yk+1) ≤ r^k dist(y0, y1), so

(2.8) dist(yk, yk+m) ≤ dist(yk, yk+1) + · · · + dist(yk+m−1, yk+m)
                     ≤ (r^k + · · · + r^{k+m−1}) dist(y0, y1)
                     ≤ r^k (1 − r)⁻¹ dist(y0, y1).

It follows that (yk) is a Cauchy sequence, so it converges; yk → x. Since Tyk = yk+1 and T is continuous, it follows that Tx = x, i.e., x is a fixed point. Uniqueness of the fixed point is clear from the estimate dist(Tx, Tx′) ≤ r dist(x, x′), which implies dist(x, x′) = 0 if x and x′ are fixed points. This proves Theorem 2.3.
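The iteration T^k y0 → x in Theorem 2.3 is directly computable. As a concrete (hypothetical) example, T(x) = cos x is a contraction on [0, 1], since it maps [0, 1] into itself and |T′(x)| = |sin x| ≤ sin 1 < 1 there; iterating converges to the unique fixed point.

```python
from math import cos

# Fixed-point iteration from the proof of Theorem 2.3, for T(x) = cos(x).

x = 0.0
for _ in range(100):
    x = cos(x)
print(x)                 # the unique fixed point of cos on [0, 1]
print(abs(cos(x) - x))   # residual, essentially 0
```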

Returning to the problem of solving (2.5), we pick b > 0 such that

(2.9) B_{2b}(0) ⊂ O and ‖w‖ ≤ 2b ⇒ ‖DR(w)‖ ≤ 1/2.

We claim that

(2.10) ‖v‖ ≤ b =⇒ Tv : Xv → Xv,

where

(2.11) Xv = {u ∈ O : ‖u − v‖ ≤ Av},   Av = sup_{‖w‖≤2‖v‖} ‖R(w)‖.


See Fig. 2.1. To prove that (2.10) holds, note that

(2.12) Tv(u) − v = −R(u),

so we need to show that

(2.13) ‖v‖ ≤ b, u ∈ Xv =⇒ ‖R(u)‖ ≤ Av.

Indeed,

(2.14) u ∈ Xv =⇒ ‖u‖ ≤ ‖v‖ + Av,

and, by (2.9) and (2.11),

(2.15) ‖v‖ ≤ b =⇒ Av ≤ ‖v‖,

and we have

(2.16) u ∈ Xv =⇒ ‖u‖ ≤ 2‖v‖ (hence, parenthetically, Xv ⊂ B_{2b}(0))
              =⇒ ‖R(u)‖ ≤ sup_{‖w‖≤2‖v‖} ‖R(w)‖ = Av.

This establishes (2.10). Now, given uj ∈ Xv, ‖v‖ ≤ b,

(2.17) ‖Tv(u1) − Tv(u2)‖ = ‖R(u2) − R(u1)‖ ≤ (1/2)‖u1 − u2‖,

the last inequality by (2.9), so the map (2.10) is a contraction, if ‖v‖ ≤ b. Hence there exists a unique fixed point u = K(v) ∈ Xv. Also, since u ∈ Xv,

(2.18) ‖K(v) − v‖ ≤ Av = o(‖v‖),

so DK(0) = I, and the Inverse Function Theorem is proved.

Thus, if DF is invertible on the domain of F, then F is a local diffeomorphism. Stronger hypotheses are needed to guarantee that F is a global diffeomorphism onto its range. Proposition 2.2 provides one tool for doing this. Here is a slight strengthening.

Corollary 2.4. Assume Ω ⊂ Rn is open and convex, and that F : Ω → Rn is C¹. Assume there exist n × n matrices A and B such that the symmetric part of A DF(u) B is positive definite for each u ∈ Ω. Then F maps Ω diffeomorphically onto its image, an open set in Rn.

Proof. Exercise.


We make a comment about solving the equation F(x) = y, under the hypotheses of Theorem 2.1, when y is close to q0. The fact that finding the fixed point for Tv in (2.9) is accomplished by taking the limit of Tv^k(v) implies that, when y is sufficiently close to q0, the sequence (xk), defined by

(2.19) x0 = p0,   xk+1 = xk + DF(p0)⁻¹(y − F(xk)),

converges to the solution x. An analysis of the rate at which xk → x, and F(xk) → y, can be made by applying F to (2.19), yielding

F(xk+1) = F(xk + DF(p0)⁻¹(y − F(xk)))
        = F(xk) + DF(xk)DF(p0)⁻¹(y − F(xk)) + R(xk, DF(p0)⁻¹(y − F(xk))),

and hence

(2.20) y − F(xk+1) = (I − DF(xk)DF(p0)⁻¹)(y − F(xk)) + R(xk, y − F(xk)),

with ‖R(xk, y − F(xk))‖ = o(‖y − F(xk)‖).

It turns out that replacing p0 by xk in (2.19) yields a faster approximation. This method, known as Newton’s method, is described in the exercises.

We consider some examples of maps to which Theorem 2.1 applies. First, we look at

(2.21) F : (0,∞) × R −→ R²,   F(r, θ) = (r cos θ, r sin θ) = (x(r, θ), y(r, θ)).

Then

(2.22) DF(r, θ) = ( ∂x/∂r  ∂x/∂θ ) = ( cos θ  −r sin θ )
                  ( ∂y/∂r  ∂y/∂θ )   ( sin θ   r cos θ ),

so

(2.23) det DF(r, θ) = r cos² θ + r sin² θ = r.
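The computation (2.23) can be double-checked by finite differences, approximating each column of DF(r, θ) numerically and taking the 2 × 2 determinant (a sketch, with hypothetical helper names).

```python
from math import cos, sin

# Finite-difference check that det DF(r, theta) = r for the polar map (2.21).

def F(r, th):
    return (r * cos(th), r * sin(th))

def jacobian_det(r, th, h=1e-6):
    x0, y0 = F(r, th)
    xr, yr = F(r + h, th)
    xt, yt = F(r, th + h)
    dxdr, dydr = (xr - x0) / h, (yr - y0) / h
    dxdt, dydt = (xt - x0) / h, (yt - y0) / h
    return dxdr * dydt - dxdt * dydr

for r, th in [(1.0, 0.0), (2.0, 1.0), (0.5, -2.0)]:
    print(r, th, jacobian_det(r, th))    # close to r in each case
```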

Hence DF(r, θ) is invertible for all (r, θ) ∈ (0,∞) × R. Theorem 2.1 implies that each (r0, θ0) ∈ (0,∞) × R has a neighborhood U, and (x0, y0) = (r0 cos θ0, r0 sin θ0) has a neighborhood V, such that F is a smooth diffeomorphism of U onto V. In this simple situation, it can be verified directly that

(2.24) F : (0,∞) × (−π, π) −→ R² \ {(x, 0) : x ≤ 0}

is a smooth diffeomorphism.

Note that DF(1, 0) = I in (2.22). Let us check the domain of applicability of Proposition 2.2. The symmetric part of DF(r, θ) in (2.22) is

(2.25) S(r, θ) = (       cos θ        (1/2)(1 − r) sin θ )
                 ( (1/2)(1 − r) sin θ       r cos θ      ).


By Proposition 1.7, this is positive definite if and only if

(2.26) cos θ > 0,

and

(2.27) det S(r, θ) = r cos² θ − (1/4)(1 − r)² sin² θ > 0.

Now (2.26) holds for θ ∈ (−π/2, π/2), but not on all of (−π, π). Furthermore, (2.27) holds for (r, θ) in a neighborhood of (r0, θ0) = (1, 0), but it does not hold on all of (0,∞) × (−π/2, π/2). We see that Proposition 2.2 does not capture the full force of (2.24).

We move on to another example. As in §1, we can extend Theorem 2.1, replacing Rn by a finite-dimensional real vector space, isometric to a Euclidean space, such as M(n,R) ≈ R^{n²}. As an example, consider

(2.28) Exp : M(n,R) −→ M(n,R),   Exp(X) = e^X = ∑_{k=0}^∞ (1/k!) X^k.

Since

(2.29) Exp(Y) = I + Y + (1/2)Y² + · · · ,

we have

(2.30) D Exp(0)Y = Y, ∀ Y ∈ M(n,R),

so D Exp(0) is invertible. Then Theorem 2.1 implies that there exist a neighborhood U of 0 ∈ M(n,R) and a neighborhood V of I ∈ M(n,R) such that Exp : U → V is a smooth diffeomorphism.
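A truncation of the series (2.28) suffices to probe (2.30) numerically: since Exp(tY) = I + tY + O(t²), the difference quotient (Exp(tY) − I)/t should approximate Y for small t. The sketch below uses a hypothetical 2 × 2 matrix Y and plain-list matrix arithmetic.

```python
# Truncated matrix exponential (2.28) and a finite-difference check of (2.30).

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(X, terms=20):
    n = len(X)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # I
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, X)]   # X^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

Y = [[0.0, 1.0], [-1.0, 0.5]]
t = 1e-6
E = expm([[t * v for v in row] for row in Y])
diff = [[(E[i][j] - float(i == j)) / t for j in range(2)] for i in range(2)]
print(diff)    # approximately Y itself
```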

To motivate the next result, we consider the following example. Take a > 0 and consider the equation

(2.31) x² + y² = a²,   F(x, y) = x² + y².

Note that

(2.32) DF(x, y) = (2x 2y),   DxF(x, y) = 2x,   DyF(x, y) = 2y.

The equation (2.31) defines y “implicitly” as a smooth function of x if |x| < a. Explicitly,

(2.33) |x| < a =⇒ y = √(a² − x²).

Similarly, (2.31) defines x implicitly as a smooth function of y if |y| < a; explicitly,

(2.34) |y| < a =⇒ x = √(a² − y²).


Now, given x0 ∈ R, a > 0, there exists y0 ∈ R such that F(x0, y0) = a² if and only if |x0| ≤ a. Furthermore,

(2.35) given F(x0, y0) = a², DyF(x0, y0) ≠ 0 ⇔ |x0| < a.

Similarly, given y0 ∈ R, there exists x0 such that F(x0, y0) = a² if and only if |y0| ≤ a, and

(2.36) given F(x0, y0) = a², DxF(x0, y0) ≠ 0 ⇔ |y0| < a.

Note also that, whenever (x, y) ∈ R² and F(x, y) = a² > 0,

(2.37) DF(x, y) ≠ 0,

so either DxF(x, y) ≠ 0 or DyF(x, y) ≠ 0, and, as seen above, whenever (x0, y0) ∈ R² and F(x0, y0) = a² > 0, we can solve F(x, y) = a² either for y as a smooth function of x, for x near x0, or for x as a smooth function of y, for y near y0.

We move from these observations to the next result, the Implicit Function Theorem.

Theorem 2.5. Suppose U is a neighborhood of x0 ∈ Rm, V a neighborhood of y0 ∈ Rℓ, and we have a Ck map

(2.38) F : U × V −→ Rℓ,   F(x0, y0) = u0.

Assume DyF(x0, y0) is invertible. Then the equation F(x, y) = u0 defines y = g(x, u0) for x near x0 (satisfying g(x0, u0) = y0), with g a Ck map.

To prove this, consider H : U × V → Rm × Rℓ defined by

(2.39) H(x, y) = (x, F(x, y)).

(Actually, regard (x, y) and (x, F(x, y)) as column vectors.) We have

(2.40) DH = (  I     0  )
            ( DxF   DyF ).

Thus DH(x0, y0) is invertible, so J = H⁻¹ exists on a neighborhood of (x0, u0), and is Ck, by the Inverse Function Theorem. It is clear that J(x, u0) has the form

(2.41) J(x, u0) = (x, g(x, u0)),

and g is the desired map.

Here is an example where Theorem 2.5 applies. Set

(2.42) F : R⁴ −→ R²,   F(u, v, x, y) = ( x(u² + v²) )
                                       (  xu + yv   ).


We have

(2.43) F(2, 0, 1, 1) = ( 4 )
                       ( 2 ).

Note that

(2.44) Du,vF(u, v, x, y) = ( 2xu  2xv )
                           (  x    y  ),

hence

(2.45) Du,vF(2, 0, 1, 1) = ( 4  0 )
                           ( 1  1 )

is invertible, so Theorem 2.5 (with (u, v) in place of y and (x, y) in place of x) implies that the equation

(2.46) F(u, v, x, y) = ( 4 )
                       ( 2 )

defines smooth functions

(2.47) u = u(x, y),   v = v(x, y),

for (x, y) near (x0, y0) = (1, 1), satisfying (2.46), with (u(1, 1), v(1, 1)) = (2, 0).

Let us next focus on the case ℓ = 1 of Theorem 2.5, so

(2.48) z = (x, y) ∈ Rn, x ∈ Rn−1, y ∈ R, F(z) ∈ R.

Then DyF = ∂yF. If F(x0, y0) = u0, Theorem 2.5 says that if

(2.49) ∂yF(x0, y0) ≠ 0,

then one can solve

(2.50) F(x, y) = u0 for y = g(x, u0),

for x near x0 (satisfying g(x0, u0) = y0), with g a Ck function. This phenomenon was illustrated in (2.31)–(2.35). To generalize the observations involving (2.36)–(2.37), we note the following. Set (x, y) = z = (z1, . . . , zn), z0 = (x0, y0). The condition (2.49) is that ∂_{z_n}F(z0) ≠ 0. Now a simple permutation of variables allows us to assume

(2.51) ∂_{z_j}F(z0) ≠ 0,   F(z0) = u0,

and deduce that one can solve

(2.52) F(z) = u0, for zj = g(z1, . . . , zj−1, zj+1, . . . , zn).

Let us record this result, changing notation and replacing z by x.


Proposition 2.6. Let Ω be a neighborhood of x0 ∈ Rn. Assume we have a Ck function

(2.53) F : Ω −→ R,   F(x0) = u0,

and assume

(2.54) DF(x0) ≠ 0, i.e., (∂1F(x0), . . . , ∂nF(x0)) ≠ 0.

Then there exists j ∈ {1, . . . , n} such that one can solve F(x) = u0 for

(2.55) xj = g(x1, . . . , xj−1, xj+1, . . . , xn),

with (x10, . . . , xj0, . . . , xn0) = x0, for a Ck function g.

Remark. For F : Ω → R, it is common to denote DF(x) by ∇F(x),

(2.56) ∇F(x) = (∂1F(x), . . . , ∂nF(x)).

Here is an example to which Proposition 2.6 applies. Using the notation (x, y) = (x1, x2), set

(2.57) F : R² −→ R,   F(x, y) = x² + y² − x.

Then

(2.58) ∇F(x, y) = (2x − 1, 2y),

which vanishes if and only if x = 1/2, y = 0. Hence Proposition 2.6 applies if and only if (x0, y0) ≠ (1/2, 0).

Let us give an example involving a real-valued function on M(n,R), namely

(2.59) det : M(n,R) −→ R.

As indicated in Exercise 11 of §1 (the first exercise set), if det X ≠ 0,

(2.60) D det(X)Y = (det X) Tr(X⁻¹Y),

so

(2.61) det X ≠ 0 =⇒ D det(X) ≠ 0.

We deduce that, if

(2.62) X0 ∈ M(n,R),   det X0 = a ≠ 0,


then, writing

(2.63) X = (xjk)_{1≤j,k≤n},

there exist µ, ν ∈ {1, . . . , n} such that the equation

(2.64) det X = a

has a smooth solution of the form

(2.65) xµν = g(xαβ : (α, β) ≠ (µ, ν)),

such that, if the argument of g consists of the matrix entries of X0 other than the µ, ν entry, then the left side of (2.65) is the µ, ν entry of X0.

Let us return to the setting of Theorem 2.5, with ℓ not necessarily equal to 1. In notation parallel to that of (2.51), we assume F is a Ck map,

(2.66) F : Ω −→ Rℓ,   F(z0) = u0,

where Ω is a neighborhood of z0 in Rn. We assume

(2.67) DF(z0) : Rn −→ Rℓ is surjective.

Then, upon reordering the variables z = (z1, . . . , zn), we can write z = (x, y), x = (x1, . . . , xn−ℓ), y = (y1, . . . , yℓ), such that DyF(z0) is invertible, and Theorem 2.5 applies. Thus (for this reordering of variables), we have a Ck solution to

(2.68) F(x, y) = u0,   y = g(x, u0),

satisfying y0 = g(x0, u0), z0 = (x0, y0).

To give one example to which this result applies, we take another look at F : R⁴ → R² in (2.42). We have

(2.69) DF(u, v, x, y) = ( 2xu  2xv  u² + v²  0 )
                        (  x    y      u     v ).

The reader is invited to determine for which (u, v, x, y) ∈ R⁴ the matrix on the right side of (2.69) has rank 2.

Here is another example, involving a map defined on M(n,R). Set

(2.70) F : M(n,R) −→ R²,   F(X) = ( det X )
                                  (  Tr X ).

Parallel to (2.60), if det X ≠ 0, Y ∈ M(n,R),

(2.71) DF(X)Y = ( (det X) Tr(X⁻¹Y) )
                (       Tr Y       ).


Hence, given det X ≠ 0, DF(X) : M(n,R) → R² is surjective if and only if

(2.72) L : M(n,R) → R²,   LY = ( Tr(X⁻¹Y) )
                               (   Tr Y   )

is surjective. This is seen to be the case if and only if X is not a scalar multiple of the identity I ∈ M(n,R).

Exercises

1. Suppose F : U → Rn is a C² map, p ∈ U, U open in Rn, and DF(p) is invertible. With q = F(p), define a map N on a neighborhood of p by

(2.73) N(x) = x + DF(x)⁻¹(q − F(x)).

Show that there exist ε > 0 and C < ∞ such that, for 0 ≤ r < ε,

‖x − p‖ ≤ r =⇒ ‖N(x) − p‖ ≤ C r².

Conclude that, if ‖x1 − p‖ ≤ r with r < min(ε, 1/(2C)), then xj+1 = N(xj) defines a sequence converging very rapidly to p. This is the basis of Newton’s method for solving F(p) = q for p.
Hint. Apply F to both sides of (2.73).
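A small sketch of the iteration (2.73) in two variables: take the hypothetical map F(x, y) = (x² − y², 2xy) (i.e., z ↦ z² under the identification of R² with C), and solve F(p) = (0, 2), whose exact solution with positive coordinates is p = (1, 1). The 2 × 2 Jacobian is inverted by hand inside the step.

```python
# Newton's method (2.73) for F(x, y) = (x^2 - y^2, 2xy), target q = (0, 2).

def F(x, y):
    return (x * x - y * y, 2 * x * y)

def newton_step(x, y, q):
    # DF(x, y) = [[2x, -2y], [2y, 2x]]; solve DF * (dx, dy) = q - F(x, y).
    fx, fy = F(x, y)
    rx, ry = q[0] - fx, q[1] - fy
    det = 4 * (x * x + y * y)
    dx = (2 * x * rx + 2 * y * ry) / det
    dy = (-2 * y * rx + 2 * x * ry) / det
    return x + dx, y + dy

x, y = 2.0, 0.5                       # starting guess
for _ in range(10):
    x, y = newton_step(x, y, (0.0, 2.0))
print(x, y)                           # converges quadratically toward (1, 1)
```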

2. Applying Newton’s method to f(x) = 1/x, show that you get a fast approximation to division using only addition and multiplication.
Hint. Carry out the calculation of N(x) in this case and notice a “miracle.”
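One way the calculation in this exercise can go (a sketch, under the assumption that one solves 1/x − a = 0 to compute 1/a): the Newton map simplifies to N(x) = x(2 − ax), which indeed involves only addition and multiplication, and the iteration converges quadratically when the initial guess satisfies 0 < x < 2/a.

```python
# Division by Newton's method: iterate x <- x*(2 - a*x) to approximate 1/a.

a = 7.0
x = 0.1                    # initial guess in (0, 2/a)
for _ in range(8):
    x = x * (2.0 - a * x)
print(x, 1.0 / a)          # both approximately 0.142857...
```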

3. Identify R^{2n} with Cn via z = x + iy, as in Exercise 4 of §1. Let U ⊂ R^{2n} be open, F : U → R^{2n} be C¹. Assume p ∈ U, DF(p) invertible. If F⁻¹ : V → U is given as in Theorem 2.1, show that F⁻¹ is holomorphic provided F is.

4. Let O ⊂ Rn be open. We say a function f ∈ C∞(O) is real analytic provided that, for each x0 ∈ O, we have a convergent power series expansion

(2.74) f(x) = Σ_{α≥0} (1/α!) f^(α)(x0)(x − x0)^α,

valid in a neighborhood of x0. Show that we can let x be complex in (2.74), and obtain an extension of f to a neighborhood of O in Cn. Show that the extended function is holomorphic, i.e., satisfies the Cauchy-Riemann equations.
Remark. It can be shown that, conversely, any holomorphic function has a power series expansion. See §10. For the next exercise, assume this as known.

5. Let O ⊂ Rn be open, p ∈ O, f : O → Rn be real analytic, with Df(p) invertible. Take f⁻¹ : V → U as in Theorem 2.1. Show f⁻¹ is real analytic.
Hint. Consider a holomorphic extension F : Ω → Cn of f and apply Exercise 3.

6. Use (2.6) to show that if a C1 diffeomorphism F has a C1 inverse G, and if actually F is Ck, then also G is Ck.
Hint. Use induction on k. Write (2.6) as

𝒢(x) = Φ ∘ ℱ ∘ G(x),

with Φ(X) = X⁻¹, as in Exercises 3 and 13 of §1, 𝒢(x) = DG(x), ℱ(x) = DF(x). Apply Exercise 12 of §1 to show that, in general,

G, ℱ, Φ ∈ Cℓ =⇒ 𝒢 ∈ Cℓ.

Deduce that if one is given F ∈ Ck and one knows that G ∈ Ck−1, then this result applies to give 𝒢 = DG ∈ Ck−1, hence G ∈ Ck.

7. Show that there is a neighborhood O of (1, 0) ∈ R2 and there are functions u, v, w ∈ C1(O) (u = u(x, y), etc.) satisfying the equations

u³ + v³ − xw³ = 0,
u² + yw² + v = 1,
xu + yvw = 1,

for (x, y) ∈ O, and satisfying

u(1, 0) = 1, v(1, 0) = 0, w(1, 0) = 1.

Hint. Define F : R5 → R3 by

F(u, v, w, x, y) = (u³ + v³ − xw³, u² + yw² + v, xu + yvw)^t.

Then F(1, 0, 1, 1, 0) = (0, 1, 1)^t. Evaluate the 3 × 3 matrix D_{u,v,w}F(1, 0, 1, 1, 0). Compare (2.42)–(2.47).

8. Consider F : M(n,R) → M(n,R), given by F(X) = X². Show that F is a diffeomorphism of a neighborhood of the identity matrix I onto a neighborhood of I. Show that F is not a diffeomorphism of a neighborhood of

( 1   0 )
( 0  −1 )

onto a neighborhood of I (in case n = 2).

9. Prove Corollary 2.4.


3. Fundamental local existence theorem for ODE

The goal of this section is to establish existence of solutions to an ODE

(3.1) dy/dt = F(t, y), y(t0) = y0.

We will prove the following fundamental result.

Theorem 3.1. Let y0 ∈ O, an open subset of Rn, and let I ⊂ R be an interval containing t0. Suppose F is continuous on I × O and satisfies the following Lipschitz estimate in y:

(3.2) ‖F(t, y1) − F(t, y2)‖ ≤ L‖y1 − y2‖

for t ∈ I, yj ∈ O. Then the equation (3.1) has a unique solution on some t-interval containing t0.

To begin the proof, we note that the equation (3.1) is equivalent to the integral equation

(3.3) y(t) = y0 + ∫_{t0}^{t} F(s, y(s)) ds.

Existence will be established via the Picard iteration method, which is the following. Guess y0(t), e.g., y0(t) = y0. Then set

(3.4) yk(t) = y0 + ∫_{t0}^{t} F(s, y_{k−1}(s)) ds.

We aim to show that, as k → ∞, yk(t) converges to a (unique) solution of (3.3), at least for t close enough to t0.
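The iteration (3.4) is easy to carry out numerically. The sketch below applies it to dy/dt = y, y(0) = 1 on [0, 1], discretizing each integral with the trapezoid rule; the grid size and iteration count are illustrative choices, not from the text.

```python
import math

def picard_iterate(F, t0, y0, ts, n_iter):
    """Apply y_k(t) = y0 + int_{t0}^t F(s, y_{k-1}(s)) ds, trapezoid rule."""
    ys = [y0 for _ in ts]                 # y_0(t) = y0, the constant guess
    for _ in range(n_iter):
        new, acc = [y0], 0.0
        for i in range(1, len(ts)):
            h = ts[i] - ts[i - 1]
            acc += 0.5 * h * (F(ts[i - 1], ys[i - 1]) + F(ts[i], ys[i]))
            new.append(y0 + acc)
        ys = new
    return ys

ts = [i / 1000 for i in range(1001)]      # grid on J = [0, 1]
ys = picard_iterate(lambda t, y: y, 0.0, 1.0, ts, 20)
print(abs(ys[-1] - math.e))               # small: the solution is y(t) = e^t
```

Each iterate is (up to quadrature error) the next Taylor polynomial of e^t, a concrete instance of the geometric convergence that the contraction argument below guarantees.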

To do this, we use the Contraction Mapping Principle, established in §2. We look for a fixed point of T, defined by

(3.5) (Ty)(t) = y0 + ∫_{t0}^{t} F(s, y(s)) ds.

Let

(3.6) X = {u ∈ C(J,Rn) : u(t0) = y0, sup_{t∈J} ‖u(t) − y0‖ ≤ K}.

Here J = [t0 − ε, t0 + ε], where ε will be chosen, sufficiently small, below. The quantity K is picked so that {y : ‖y − y0‖ ≤ K} is contained in O, and we also suppose J ⊂ I. Then there exists M such that

(3.7) sup_{s∈J, ‖y−y0‖≤K} ‖F(s, y)‖ ≤ M.


Then, provided

(3.8) ε ≤ K/M,

we have

(3.9) T : X → X.

Now, using the Lipschitz hypothesis (3.2), we have, for t ∈ J,

(3.10) ‖(Ty)(t) − (Tz)(t)‖ ≤ ∫_{t0}^{t} L‖y(s) − z(s)‖ ds ≤ ε L sup_{s∈J} ‖y(s) − z(s)‖,

assuming y and z belong to X. It follows that T is a contraction on X provided one has

(3.11) ε < 1/L,

in addition to the hypotheses above. This proves Theorem 3.1.

In view of the lower bound on the length of the interval J on which the existence theorem works, it is easy to show that the only way a solution can fail to be globally defined, i.e., to exist for all t ∈ I, is for y(t) to "explode to infinity" by leaving every compact set K ⊂ O, as t → t1, for some t1 ∈ I.

Often one wants to deal with a higher-order ODE. There is a standard method of reducing an nth-order ODE

(3.12) y^(n)(t) = f(t, y, y′, . . . , y^(n−1))

to a first-order system. One sets u = (u0, . . . , u_{n−1}) with

(3.13) u0 = y, uj = y^(j),

and then

(3.14) du/dt = (u1, . . . , u_{n−1}, f(t, u0, . . . , u_{n−1})) = g(t, u).

If y takes values in Rk, then u takes values in Rkn.

If the system (3.1) is non-autonomous, i.e., if F explicitly depends on t, it can be converted to an autonomous system (one with no explicit t-dependence) as follows. Set z = (t, y). We then have

(3.15) dz/dt = (1, dy/dt) = (1, F(z)) = G(z).


Sometimes this process destroys important features of the original system (3.1). For example, if (3.1) is linear, (3.15) might be nonlinear. Nevertheless, the trick of converting (3.1) to (3.15) has some uses.

Many systems of ODE are difficult to solve explicitly. There is one very basic class of ODE which can be solved explicitly, in terms of integrals, namely the single first-order linear ODE:

(3.16) dy/dt = a(t)y + b(t), y(0) = y0,

where a(t) and b(t) are continuous real or complex valued functions. Set

(3.17) A(t) = ∫_0^t a(s) ds.

Then (3.16) can be written as

(3.18) e^{A(t)} (d/dt)(e^{−A(t)} y) = b(t).

See Exercise 5 below for a comment on the exponential function. From (3.18) we get

(3.19) y(t) = e^{A(t)} y0 + e^{A(t)} ∫_0^t e^{−A(s)} b(s) ds.

In an appendix to this section, we will discuss the flow generated by a vector field X on an open set U ⊂ Rn, in terms of solutions to dy/dt = X(y).

Exercises

1. Solve the initial value problem

dy/dt = y², y(0) = a,

given a ∈ R. On what t-interval is the solution defined?

2. Under the hypotheses of Theorem 3.1, if y solves (3.1) for t ∈ [T0, T1], and y(t) ∈ K, compact in O, for all such t, prove that y(t) extends to a solution for t ∈ [S0, S1], with S0 < T0, S1 > T1, as stated below (3.11).

3. Let M be a compact smooth surface in Rn. Suppose F : Rn → Rn is a smooth map (vector field), such that, for each x ∈ M, F(x) is tangent to M, i.e., the line γx(t) = x + tF(x) is tangent to M at x, at t = 0. Show that, if x ∈ M, then the initial value problem

dy/dt = F(y), y(0) = x

has a solution for all t ∈ R, and y(t) ∈ M for all t.
Hint. Locally, straighten out M to be a linear subspace of Rn, to which F is tangent. Use uniqueness. Material in §2 helps do this local straightening.

4. Show that the initial value problem

dx/dt = −x(x² + y²), dy/dt = −y(x² + y²), x(0) = x0, y(0) = y0

has a solution for all t ≥ 0, but not for all t < 0, unless (x0, y0) = (0, 0).

5. Let a ∈ R. Show that the unique solution to u′(t) = au(t), u(0) = 1 is given by

(3.20) u(t) = Σ_{j=0}^∞ (a^j/j!) t^j.

We denote this function by u(t) = e^{at}, the exponential function. We also write exp(t) = e^t.
Hint. Integrate the series term by term and use the Fundamental Theorem of Calculus.
Alternative. Setting u0(t) = 1, and using the Picard iteration method (3.4) to define the sequence uk(t), show that uk(t) = Σ_{j=0}^k a^j t^j / j!.

6. Show that, for all s, t ∈ R,

(3.21) e^{a(s+t)} = e^{as} e^{at}.

Hint. Show that u1(t) = e^{a(s+t)} and u2(t) = e^{as} e^{at} solve the same initial value problem.

7. Show that exp : R → (0,∞) is a diffeomorphism. We denote the inverse by

log : (0,∞) −→ R.

Show that v(x) = log x solves the ODE dv/dx = 1/x, v(1) = 0, and deduce that

(3.22) ∫_1^x (1/y) dy = log x.

8. Let a ∈ R, i = √−1. Show that the unique solution to f′(t) = iaf, f(0) = 1 is given by

(3.23) f(t) = Σ_{j=0}^∞ ((ia)^j/j!) t^j.

We denote this function by f(t) = e^{iat}. Show that, for all s, t ∈ R,

(3.24) e^{ia(s+t)} = e^{ias} e^{iat}.


9. Write

(3.25) e^{it} = Σ_{j=0}^∞ ((−1)^j/(2j)!) t^{2j} + i Σ_{j=0}^∞ ((−1)^j/(2j+1)!) t^{2j+1} = u(t) + iv(t).

Show that

u′(t) = −v(t), v′(t) = u(t).

We denote these functions by u(t) = cos t, v(t) = sin t. The identity

(3.26) e^{it} = cos t + i sin t

is called Euler's formula.

Auxiliary exercises on trigonometric functions

We use the definitions of sin t and cos t given in Exercise 9 of the last exercise set.

1. Use (3.24) to derive the identities

(3.27) sin(x + y) = sin x cos y + cos x sin y,
       cos(x + y) = cos x cos y − sin x sin y.

2. Use (3.26)–(3.27) to show that

(3.28) sin² t + cos² t = 1, cos² t = (1/2)(1 + cos 2t).

3. Show that

γ(t) = (cos t, sin t)

is a map of R onto the unit circle S¹ ⊂ R2 with non-vanishing derivative, and, as t increases, γ(t) moves monotonically, counterclockwise.
We define π to be the smallest number t1 ∈ (0,∞) such that γ(t1) = (−1, 0), so

cos π = −1, sin π = 0.

Show that 2π is the smallest number t2 ∈ (0,∞) such that γ(t2) = (1, 0), so

cos 2π = 1, sin 2π = 0.

Show that

cos(t + 2π) = cos t,   cos(t + π) = − cos t,
sin(t + 2π) = sin t,   sin(t + π) = − sin t.


Show that γ(π/2) = (0, 1), and that

cos(t + π/2) = − sin t,   sin(t + π/2) = cos t.

4. Show that sin : (−π/2, π/2) → (−1, 1) is a diffeomorphism. We denote its inverse by

arcsin : (−1, 1) −→ (−π/2, π/2).

Show that u(t) = arcsin t solves the ODE

du/dt = 1/√(1 − t²), u(0) = 0.

Hint. Apply the chain rule to sin(u(t)) = t.
Deduce that, for t ∈ (−1, 1),

(3.29) arcsin t = ∫_0^t dx/√(1 − x²).

5. Show that

e^{πi/3} = 1/2 + (√3/2)i,   e^{πi/6} = √3/2 + (1/2)i.

Hint. First compute (1/2 + (√3/2)i)³ and use Exercise 3. Then compute e^{πi/2} e^{−πi/3}.
For intuition behind these formulas, look at Fig. 3.1.

6. Show that sin(π/6) = 1/2, and hence that

π/6 = ∫_0^{1/2} dx/√(1 − x²) = Σ_{n=0}^∞ (an/(2n+1)) (1/2)^{2n+1},

where

a0 = 1,   a_{n+1} = ((2n+1)/(2n+2)) an.

Show that

π/6 − Σ_{n=0}^k (an/(2n+1)) (1/2)^{2n+1} < 4^{−k} / (3(2k+3)).

Using a calculator, sum the series over 0 ≤ n ≤ 20, and verify that

π ≈ 3.141592653589 · · ·


7. For x ≠ (k + 1/2)π, k ∈ Z, set

tan x = sin x / cos x.

Show that 1 + tan² x = 1/cos² x. Show that w(x) = tan x satisfies the ODE

dw/dx = 1 + w², w(0) = 0.

8. Show that tan : (−π/2, π/2) → R is a diffeomorphism. Denote the inverse by

arctan : R −→ (−π/2, π/2).

Show that

(3.30) arctan y = ∫_0^y dx/(1 + x²).

3b. Vector fields and flows

Let U ⊂ Rn be open. A vector field on U is a smooth map

(3.31) X : U −→ Rn.

Consider the corresponding ODE

(3.32) dy/dt = X(y), y(0) = x,

with x ∈ U. A curve y(t) solving (3.32) is called an integral curve of the vector field X. It is also called an orbit. For fixed t, write

(3.33) y = y(t, x) = F^t_X(x).

The locally defined F^t_X, mapping (a subdomain of) U to U, is called the flow generated by the vector field X. In (3.33), y is a smooth function of (t, x). A proof of this can be found in §6 of [T:1].

The vector field X defines a differential operator on scalar functions, as follows:

(3.34) LX f(x) = lim_{h→0} h⁻¹[f(F^h_X x) − f(x)] = (d/dt) f(F^t_X x)|_{t=0}.


We also use the common notation

(3.35) LX f(x) = Xf,

that is, we apply X to f as a first order differential operator.
Note that, if we apply the chain rule to (3.34) and use (3.32), we have

(3.36) LX f(x) = X(x) · ∇f(x) = Σ aj(x) ∂f/∂xj,

if X = Σ aj(x) ej, with {ej} the standard basis of Rn. In particular, using the notation (3.35), we have

(3.37) aj(x) = X xj.

In the notation (3.35),

(3.38) X = Σ aj(x) ∂/∂xj.

We next consider mapping properties of vector fields. If F : V → W is a diffeomorphism between two open domains in Rn, and Y is a vector field on W, we define a vector field F# Y on V so that

(3.39) F^t_{F#Y} = F⁻¹ ∘ F^t_Y ∘ F,

or equivalently, by the chain rule,

(3.40) F#Y(x) = (DF⁻¹)(F(x)) Y(F(x)).

In particular, if U ⊂ Rn is open and X is a vector field on U, defining a flow F^t, then for a vector field Y, F^t_# Y is defined on most of U, for |t| small, and we can define the Lie derivative:

(3.41) LX Y = lim_{h→0} h⁻¹(F^h_# Y − Y) = (d/dt) F^t_# Y|_{t=0},

as a vector field on U.

Another natural construction is the operator-theoretic bracket:

(3.42) [X, Y] = XY − YX,

where the vector fields X and Y are regarded as first order differential operators on C∞(U). One verifies that (3.42) defines a vector field on U. In fact, if X = Σ aj(x) ∂/∂xj, Y = Σ bj(x) ∂/∂xj, then

(3.43) [X, Y] = Σ_{j,k} (ak ∂bj/∂xk − bk ∂aj/∂xk) ∂/∂xj.

The basic elementary fact about the Lie bracket is the following.


Theorem 3.2. If X and Y are smooth vector fields, then

(3.44) LX Y = [X, Y].

Proof. We examine LX Y = (d/ds) F^s_{X#} Y|_{s=0}, using (3.40), which implies that

(3.45) Ys(x) = F^s_{X#} Y(x) = DF^{−s}_X(F^s_X(x)) Y(F^s_X(x)).

Let us set Gs = DF^{−s}_X. Note that Gs : U → End(Rn). Hence, for x ∈ U, DGs(x) is an element of Hom(Rn, End(Rn)). Taking the s-derivative of (3.45), we have

(3.46) (d/ds) Ys(x) = −DX(F^s_X(x)) Y(F^s_X(x)) + DGs(F^s_X(x)) X(F^s_X(x)) Y(F^s_X(x))
                      + DF^{−s}_X(F^s_X(x)) DY(F^s_X(x)) X(F^s_X(x)).

Note that G0(x) = I ∈ End(Rn) for all x ∈ U, so DG0 = 0. Thus

(3.47) LX Y = (d/ds) Ys(x)|_{s=0} = −DX(x) Y(x) + DY(x) X(x),

which agrees with the formula (3.43) for [X, Y].

Corollary 3.3. If X and Y are smooth vector fields on U, then

(3.48) (d/dt) F^t_{X#} Y = F^t_{X#} [X, Y]

for all t.

Proof. Since locally F^{t+s}_X = F^s_X F^t_X, we have the same identity for F^{t+s}_{X#}, which yields (3.48) upon taking the s-derivative.


4. The Riemann integral in n variables

We define the Riemann integral of a bounded function f : R → R, where R ⊂ Rn is a cell, i.e., a product of intervals R = I1 × · · · × In, where Iν = [aν, bν] are intervals in R. Recall that a partition of an interval I = [a, b] is a finite collection of subintervals {Jk : 0 ≤ k ≤ N}, disjoint except for their endpoints, whose union is I. We can take Jk = [xk, xk+1], where

(4.1) a = x0 < x1 < · · · < xN < xN+1 = b.

Now, if one has a partition of each Iν into Jν1 ∪ · · · ∪ Jν,N(ν), then a partition P of R consists of the cells

(4.2) Rα = J1α1 × J2α2 × · · · × Jnαn,

where 0 ≤ αν ≤ N(ν). For such a partition, define

(4.3) maxsize(P) = max_α diam Rα,

where (diam Rα)² = ℓ(J1α1)² + · · · + ℓ(Jnαn)². Here, ℓ(J) denotes the length of an interval J. Each cell has n-dimensional volume

(4.4) V(Rα) = ℓ(J1α1) · · · ℓ(Jnαn).

Sometimes we use Vn(Rα) for emphasis on the dimension. We also use A(R) for V2(R), and, of course, ℓ(R) for V1(R).

We set

(4.5) ĪP(f) = Σ_α sup_{Rα} f(x) V(Rα),
      IP(f) = Σ_α inf_{Rα} f(x) V(Rα).

Note that IP(f) ≤ ĪP(f). These quantities should approximate the Riemann integral of f, if the partition P is sufficiently "fine."

To be more precise, if P and Q are two partitions of R, we say P refines Q, and write P ≻ Q, if each partition of each interval factor Iν of R involved in the definition of Q is further refined in order to produce the partitions of the factors Iν, used to define P, via (4.2). It is an exercise to show that any two partitions of R have a common refinement. Note also that

(4.6) P ≻ Q =⇒ ĪP(f) ≤ ĪQ(f), and IP(f) ≥ IQ(f).


Consequently, if P1, P2 are any two partitions of R and Q is a common refinement, we have

(4.7) I_{P1}(f) ≤ IQ(f) ≤ ĪQ(f) ≤ Ī_{P2}(f).

Now, whenever f : R → R is bounded, the following quantities are well defined:

(4.8) Ī(f) = inf_{P∈Π(R)} ĪP(f),   I(f) = sup_{P∈Π(R)} IP(f),

where Π(R) is the set of all partitions of R, as defined above. Clearly, by (4.7), I(f) ≤ Ī(f). We then say that f is Riemann integrable (on R) provided I(f) = Ī(f), and in such a case, we set

(4.9) ∫_R f(x) dV(x) = Ī(f) = I(f).

We will denote the set of Riemann integrable functions on R by R(R). If dim R = 2, we will often use dA(x) instead of dV(x) in (4.9), and we will often use simply dx.

We derive some basic properties of the Riemann integral. First, the proofs of Proposition 0.3 and Corollary 0.4 readily extend, to give:

Proposition 4.1. Let Pν be any sequence of partitions of R such that

maxsize(Pν) = δν → 0,

and let ξνα be any choice of one point in each cell Rνα in the partition Pν. Then, whenever f ∈ R(R),

(4.10) ∫_R f(x) dV(x) = lim_{ν→∞} Σ_α f(ξνα) V(Rνα).

Also, we can extend the proof of Proposition 0.1, to obtain:

Proposition 4.2. If fj ∈ R(R) and cj ∈ R, then c1f1 + c2f2 ∈ R(R), and

(4.11) ∫_R (c1f1 + c2f2) dV = c1 ∫_R f1 dV + c2 ∫_R f2 dV.

Next, we establish an integrability result analogous to Proposition 0.2.

Proposition 4.3. If f is continuous on R, then f ∈ R(R).

Proof. As in the proof of Proposition 0.2, we have that

(4.12) maxsize(P) ≤ δ =⇒ ĪP(f) − IP(f) ≤ ω(δ) · V(R),

where ω(δ) is a modulus of continuity for f on R. This proves the proposition.


When the number of variables exceeds one, it becomes more important to identify some nice classes of discontinuous functions on R that are Riemann integrable. A useful tool for this is the following notion of size of a set S ⊂ R, called content. Extending (0.18)–(0.19), we define "upper content" cont⁺ and "lower content" cont⁻ by

(4.13) cont⁺(S) = Ī(χS), cont⁻(S) = I(χS),

where χS is the characteristic function of S. We say S has content, or "is contented," if these quantities are equal, which happens if and only if χS ∈ R(R), in which case the common value of cont⁺(S) and cont⁻(S) is

(4.14) V(S) = ∫_R χS(x) dV(x).

We mention that, if S = I1 × · · · × In is a cell, it is readily verified that the definitions in (4.5), (4.8), and (4.13) yield

cont⁺(S) = cont⁻(S) = ℓ(I1) · · · ℓ(In),

so the definition of V(S) given by (4.14) is consistent with that given in (4.4).

It is easy to see that

(4.15) cont⁺(S) = inf {Σ_{k=1}^N V(Rk) : S ⊂ R1 ∪ · · · ∪ RN},

where Rk are cells contained in R. In the formal definition, the Rk in (4.15) should be part of a partition P of R, as defined above, but if R1, . . . , RN are any cells in R, they can be chopped up into smaller cells, some perhaps thrown away, to yield a finite cover of S by cells in a partition of R, so one gets the same result.

It is an exercise to see that, for any set S ⊂ R,

(4.16) cont⁺(S) = cont⁺(S̄),

where S̄ is the closure of S.
We note that, generally, for a bounded function f on R,

(4.17) Ī(f) + I(1 − f) = V(R).

This follows directly from (4.5). In particular, given S ⊂ R,

(4.18) cont⁻(S) + cont⁺(R \ S) = V(R).

Using this together with (4.16), with S and R \ S switched, we have

(4.19) cont⁻(S) = cont⁻(S°),

where S° is the interior of S. The difference S̄ \ S° is called the boundary of S, and denoted bS.
Note that

(4.20) cont⁻(S) = sup {Σ_{k=1}^N V(Rk) : R1 ∪ · · · ∪ RN ⊂ S},

where here we take R1, . . . , RN to be cells within a partition P of R, and let P vary over all partitions of R. Now, given a partition P of R, classify each cell in P as either being contained in R \ S̄, or intersecting bS, or contained in S°. Letting P vary over all partitions of R, we see that

(4.21) cont⁺(S) = cont⁺(bS) + cont⁻(S).

In particular, we have:

Proposition 4.4. If S ⊂ R, then S is contented if and only if cont+(bS) = 0.

If a set Σ ⊂ R has the property that cont⁺(Σ) = 0, we say that Σ has content zero, or is a nil set. Clearly Σ is nil if and only if Σ̄ is nil. It follows easily from Proposition 4.2 that, if Σj are nil, 1 ≤ j ≤ K, then ⋃_{j=1}^K Σj is nil.

If S1, S2 ⊂ R and S = S1 ∪ S2, then S̄ = S̄1 ∪ S̄2 and S° ⊃ S1° ∪ S2°. Hence bS ⊂ b(S1) ∪ b(S2). It follows then from Proposition 4.4 that, if S1 and S2 are contented, so is S1 ∪ S2. Clearly, if Sj are contented, so are Sjᶜ = R \ Sj. It follows that, if S1 and S2 are contented, so is S1 ∩ S2 = (S1ᶜ ∪ S2ᶜ)ᶜ. A family F of subsets of R such that R ∈ F, Sj ∈ F ⇒ S1 ∪ S2 ∈ F, and S1 ∈ F ⇒ R \ S1 ∈ F is called an algebra of subsets of R. Algebras of sets are automatically closed under finite intersections also. We see that:

Proposition 4.5. The family of contented subsets of R is an algebra of sets.

The following result specifies a useful class of Riemann integrable functions.

Proposition 4.6. If f : R → R is bounded and the set S of points of discontinuity of f is a nil set, then f ∈ R(R).

Proof. Suppose |f| ≤ M on R, and take ε > 0. Take a partition P of R, and write P = P′ ∪ P′′, where cells in P′ do not meet S, and cells in P′′ do intersect S. Since cont⁺(S) = 0, we can pick P so that the cells in P′′ have total volume ≤ ε. Now f is continuous on each cell in P′. Further refining the partition if necessary, we can assume that f varies by ≤ ε on each cell in P′. Thus

(4.22) ĪP(f) − IP(f) ≤ [V(R) + 2M]ε.

This proves the proposition.

To give an example, suppose K ⊂ R is a closed set such that bK is nil. Let f : K → R be continuous. Define f̃ : R → R by

(4.23) f̃(x) = f(x) for x ∈ K,   f̃(x) = 0 for x ∈ R \ K.

Then the set of points of discontinuity of f̃ is contained in bK. Hence f̃ ∈ R(R). We set

(4.24) ∫_K f dV = ∫_R f̃ dV.

In connection with this, we note the following fact, whose proof is an exercise. Suppose R and R̃ are cells, with R ⊂ R̃. Suppose that g ∈ R(R) and that g̃ is defined on R̃, to be equal to g on R and to be 0 on R̃ \ R. Then

(4.25) g̃ ∈ R(R̃), and ∫_R̃ g̃ dV = ∫_R g dV.

This can be shown by an argument involving refining any given pair of partitions of R and R̃, respectively, to a pair of partitions P_R and P_R̃ with the property that each cell in P_R is a cell in P_R̃.

The following is a useful criterion for a set S ⊂ Rn to have content zero.

Proposition 4.7. Let Σ ⊂ Rn−1 be a closed bounded set and let g : Σ → R be continuous. Then the graph of g,

G = {(x, g(x)) : x ∈ Σ},

is a nil subset of Rn.

Proof. Put Σ in a cell R0 ⊂ Rn−1. Suppose |g| ≤ M on Σ. Take N ∈ Z⁺ and set ε = M/N. Pick a partition P0 of R0, sufficiently fine that g varies by at most ε on each set Σ ∩ Rα, for any cell Rα ∈ P0. Partition the interval I = [−M, M] into 2N equal intervals Jν, of length ε. Then the cells Qαν = Rα × Jν form a partition of R0 × I. Now, over each cell Rα ∈ P0, there lie at most 2 cells Qαν which intersect G, so cont⁺(G) ≤ 2ε · V(R0). Letting N → ∞, we have the proposition.

Similarly, for any j ∈ {1, . . . , n}, the graph of xj as a continuous function of the complementary variables is a nil set in Rn. So are finite unions of such graphs. Such sets arise as boundaries of many ordinary-looking regions in Rn.

Here is a further class of nil sets.

Proposition 4.8. Let S be a compact nil subset of Rn and f : S → Rn a Lipschitz map. Then f(S) is a nil subset of Rn.

Proof. The Lipschitz hypothesis on f is that, for p, q ∈ S,

|f(p) − f(q)| ≤ C1|p − q|.

If we cover S with k cells (in a partition), of total volume ≤ α, each cubical with edge size δ, then f(S) is covered by k sets of diameter ≤ C1√n δ, hence it can be covered by k cubical cells of edge size C1√n δ, having total volume ≤ (C1√n)ⁿ α. From this we have the (not very sharp) general bound

(4.26) cont⁺(f(S)) ≤ (C1√n)ⁿ cont⁺(S),

which proves the proposition.

In evaluating n-dimensional integrals, it is usually convenient to reduce them to iterated integrals. The following is a special case of a result known as Fubini's Theorem.


Theorem 4.9. Let Σ ⊂ Rn−1 be a closed, bounded, contented set and let gj : Σ → R be continuous, with g0(x) < g1(x) on Σ. Take

(4.27) Ω = {(x, y) ∈ Rn : x ∈ Σ, g0(x) ≤ y ≤ g1(x)}.

Then Ω is a contented set in Rn. If f : Ω → R is continuous, then

(4.28) ϕ(x) = ∫_{g0(x)}^{g1(x)} f(x, y) dy

is continuous on Σ, and

(4.29) ∫_Ω f dVn = ∫_Σ ϕ dVn−1,

i.e.,

(4.30) ∫_Ω f dVn = ∫_Σ (∫_{g0(x)}^{g1(x)} f(x, y) dy) dVn−1(x).

Proof. The continuity of (4.28) is an exercise in one-variable integration; see Exercises 2–3 of §0. Let ω(δ) be a modulus of continuity for g0, g1, and f, and also ϕ. We also can assume that ω(δ) ≥ δ.

Put Σ in a cell R0 and let P0 be a partition of R0. If A ≤ g0 ≤ g1 ≤ B, partition the interval [A, B], and from this and P0 construct a partition P of R = R0 × [A, B]. We denote a cell in P0 by Rα and a cell in P by Rαℓ = Rα × Jℓ. Pick points ξαℓ ∈ Rαℓ.

Write P0 = P′0 ∪ P′′0 ∪ P′′′0, consisting respectively of cells inside Σ°, meeting bΣ, and inside R0 \ Σ̄. Similarly write P = P′ ∪ P′′ ∪ P′′′, consisting respectively of cells inside Ω°, meeting bΩ, and inside R \ Ω̄, as illustrated in Fig. 4.1. For fixed α, let

z′(α) = {ℓ : Rαℓ ∈ P′},

and let z′′(α) and z′′′(α) be similarly defined. Note that

z′(α) ≠ ∅ ⇐⇒ Rα ∈ P′0,

provided we assume maxsize(P) ≤ δ and 2δ < min [g1(x) − g0(x)], as we will from here on. It follows from (0.9)–(0.10) that

(4.31) |Σ_{ℓ∈z′(α)} f(ξαℓ) ℓ(Jℓ) − ∫_{A(α)}^{B(α)} f(x, y) dy| ≤ (B − A) ω(δ), ∀ x ∈ Rα,


where ⋃_{ℓ∈z′(α)} Jℓ = [A(α), B(α)]. Note that A(α) and B(α) are within 2ω(δ) of g0(x) and g1(x), respectively, for all x ∈ Rα, if Rα ∈ P′0. Hence, if |f| ≤ M,

(4.32) |∫_{A(α)}^{B(α)} f(x, y) dy − ϕ(x)| ≤ 4Mω(δ).

Thus, with C = B − A + 4M,

(4.33) |Σ_{ℓ∈z′(α)} f(ξαℓ) ℓ(Jℓ) − ϕ(x)| ≤ Cω(δ), ∀ x ∈ Rα ∈ P′0.

Multiplying by Vn−1(Rα) and summing over Rα ∈ P′0, we have

(4.34) |Σ_{Rαℓ∈P′} f(ξαℓ) Vn(Rαℓ) − Σ_{Rα∈P′0} ϕ(xα) Vn−1(Rα)| ≤ C V(R0) ω(δ),

where xα is an arbitrary point in Rα.

Now, if P0 is a sufficiently fine partition of R0, it follows from the proof of Proposition 4.6 that the second sum in (4.34) is arbitrarily close to ∫_Σ ϕ dVn−1, since bΣ has content zero. Furthermore, an argument such as used to prove Proposition 4.7 shows that bΩ has content zero, and one verifies that, for a sufficiently fine partition, the first sum in (4.34) is arbitrarily close to ∫_Ω f dVn. This proves the desired identity (4.29).

We next take up the change of variables formula for multiple integrals, extending the one-variable formula, (0.42). We begin with a result on linear changes of variables. The set of invertible real n × n matrices is denoted Gl(n,R).

Proposition 4.10. Let f be a continuous function with compact support in Rn. If A ∈ Gl(n,R), then

(4.35) ∫ f(x) dV = |det A| ∫ f(Ax) dV.

Proof. Let G be the set of elements A ∈ Gl(n,R) for which (4.35) is true. Clearly I ∈ G. Using det A⁻¹ = (det A)⁻¹ and det AB = (det A)(det B), we conclude that G is a subgroup of Gl(n,R). To prove the proposition, it will suffice to show that G contains all elements of the following 3 forms, since the method of applying elementary row operations to reduce a matrix shows that any element of Gl(n,R) is a product of a finite number of these elements. Here, {ej : 1 ≤ j ≤ n} denotes the standard basis of Rn, and σ a permutation of {1, . . . , n}:

(4.36) A1 ej = e_{σ(j)},
       A2 ej = cj ej, cj ≠ 0,
       A3 e2 = e2 + c e1, A3 ej = ej for j ≠ 2.

The first two cases are elementary consequences of the definition of the Riemann integral, and can be left as exercises.


We show that (4.35) holds for transformations of the form A3 by using Theorem 4.9 (in a special case), to reduce it to the case n = 1. Given f ∈ C(R), compactly supported, and b ∈ R, we clearly have

(4.37) ∫ f(x) dx = ∫ f(x + b) dx.

Now, for the case A = A3, with x = (x1, x′), we have

(4.38) ∫ f(x1 + cx2, x′) dVn(x) = ∫ (∫ f(x1 + cx2, x′) dx1) dVn−1(x′)
                                = ∫ (∫ f(x1, x′) dx1) dVn−1(x′),

the second identity by (4.37). Thus we get (4.35) in case A = A3, so the proposition is proved.

It is desirable to extend Proposition 4.10 to some discontinuous functions. Given a cell R and f : R → R, bounded, we say

(4.39) f ∈ C(R) ⇐⇒ the set of discontinuities of f is nil.

Proposition 4.6 implies

(4.40) C(R) ⊂ R(R).

From the closure of the class of nil sets under finite unions it is clear that C(R) is closed under sums and products, i.e., that C(R) is an algebra of functions on R. We will denote by Cc(Rn) the set of functions f : Rn → R such that f has compact support and its set of discontinuities is nil. Any f ∈ Cc(Rn) is supported in some cell R, and f|_R ∈ C(R). The following result extends Proposition 4.10. A stronger extension will be given in Theorem 4.15.

Proposition 4.11. Given A ∈ Gl(n,R), the identity (4.35) holds for all f ∈ Cc(Rn).

Proof. Assume |f| ≤ M. Suppose supp f is in the interior of a cell R. Let S be the set of discontinuities of f. The function g(x) = f(Ax) has A⁻¹S as its set of discontinuities, which is also nil, by Proposition 4.8. Thus g ∈ Cc(Rn).

Take ε > 0. Let P be a partition of R, of maxsize ≤ δ, such that S is covered by cells in P of total volume ≤ ε. Write P = P′ ∪ P′′ ∪ P′′′, where P′′ consists of cells meeting S, P′′′ consists of all cells touching cells in P′′, and P′ consists of all the rest. Then P′′ ∪ P′′′ consists of cells of total volume ≤ 3ⁿε. Thus if we set

(4.41) B = ⋃_{P′} Rα,

we have that B is compact, B ∩ S = ∅, and V(R \ B) ≤ 3ⁿε.


Now it is an exercise to construct a continuous function ϕ : R → R such that

(4.42) ϕ = 1 on B, ϕ = 0 near S, 0 ≤ ϕ ≤ 1,

and set

(4.43) f = ϕf + (1 − ϕ)f = f1 + f2.

We see that f1 is continuous on Rn, with compact support, |f2| ≤ M, and f2 is supported on R \ B, which is a union of cells of total volume ≤ 3ⁿε.

Now Proposition 4.10 applies directly to f1, to yield

(4.44) ∫ f1(y) dV(y) = |det A| ∫ f1(Ax) dV(x).

We have f2 ∈ Cc(Rn), and g2(x) = f2(Ax) ∈ Cc(Rn). Also, g2 is supported by A⁻¹(R \ B), which by the estimate (4.26) has the property cont⁺(A⁻¹(R \ B)) ≤ (3C1√n)ⁿ ε. Thus

(4.45) |∫ f2(y) dV(y)| ≤ 3ⁿMε,   |∫ f2(Ax) dV(x)| ≤ (3C1√n)ⁿ Mε.

Taking ε → 0, we have the proposition.

In particular, if Σ ⊂ Rn is a compact set that is contented, then its characteristic function χΣ, which has bΣ as its set of points of discontinuity, belongs to Cc(Rn), and we deduce:

Corollary 4.12. If Σ ⊂ Rn is a compact, contented set and A ∈ Gl(n,R), then A(Σ) = {Ax : x ∈ Σ} is contented, and

(4.46) V(A(Σ)) = |det A| V(Σ).

We now extend Proposition 4.10 to nonlinear changes of variables.

Proposition 4.13. Let O and Ω be open in Rn, G : O → Ω a C1 diffeomorphism, and f a continuous function with compact support in Ω. Then

(4.47) ∫_Ω f(y) dV(y) = ∫_O f(G(x)) |det DG(x)| dV(x).

Proof. It suffices to prove the result under the additional assumption that f ≥ 0, which we make from here on.

If K = supp f, put G⁻¹(K) inside a rectangle R and let P = {Rα} be a partition of R. Let A be the set of α such that Rα ⊂ O. For α ∈ A, G(Rα) ⊂ Ω. See Fig. 4.2. Pick P fine enough that α ∈ A whenever Rα ∩ G⁻¹(K) is nonempty. Note that bG(Rα) = G(bRα), so G(Rα) is contented, in view of Propositions 4.4 and 4.8.


Let ξα be the center of Rα, and let R̃α = Rα − ξα, a cell with center at the origin. Then

(4.48) G(ξα) + DG(ξα)(R̃α) = ηα + Hα

is an n-dimensional parallelepiped which is close to G(Rα), if Rα is small enough. See Fig. 4.3. In fact, given ε > 0, if δ > 0 is small enough and maxsize(P) ≤ δ, then we have

(4.49) ηα + (1 + ε)Hα ⊃ G(Rα),

for all α ∈ A. Now, by (4.46),

(4.50) V(Hα) = |det DG(ξα)| V(Rα).

Hence

(4.51) V(G(Rα)) ≤ (1 + ε)ⁿ |det DG(ξα)| V(Rα).

Now we have

(4.52) ∫ f dV = Σ_{α∈A} ∫_{G(Rα)} f dV
             ≤ Σ_{α∈A} sup_{Rα} f ∘ G(x) V(G(Rα))
             ≤ (1 + ε)ⁿ Σ_{α∈A} sup_{Rα} f ∘ G(x) |det DG(ξα)| V(Rα).

To see that the first line of (4.52) holds, note that fχ_{G(Rα)} is Riemann integrable, by Proposition 4.6; note also that Σ_α fχ_{G(Rα)} = f except on a set of content zero. Then the additivity result in Proposition 4.2 applies. The first inequality in (4.52) is elementary; the second inequality uses (4.51) and f ≥ 0. If we set

(4.53) h(x) = f ∘ G(x) |det DG(x)|,

then we have

(4.54) sup_{Rα} f ∘ G(x) |det DG(ξα)| ≤ sup_{Rα} h(x) + Mω(δ),

provided |f| ≤ M and ω(δ) is a modulus of continuity for DG. Taking arbitrarily fine partitions, we get

(4.55) ∫_Ω f dV ≤ ∫_O h dV.

If we apply this result, with G replaced by G⁻¹, O and Ω switched, and f replaced by h, given by (4.53), we have

(4.56) ∫_O h dV ≤ ∫_Ω h ∘ G⁻¹(y) |det DG⁻¹(y)| dV(y) = ∫_Ω f dV.

The inequalities (4.55) and (4.56) together yield the identity (4.47).

Now the same argument used to establish Proposition 4.11 from Proposition 4.10 yields the following improvement of Proposition 4.13. We say f ∈ Cc(Ω) if f ∈ Cc(Rn) has support in an open set Ω.


Proposition 4.14. Let O and Ω be open in Rn, G : O → Ω a C1 diffeomorphism, and f ∈ Cc(Ω). Then f ∘ G ∈ Cc(O), and

(4.57) ∫_Ω f(y) dV(y) = ∫_O f(G(x)) |det DG(x)| dV(x).

Proposition 4.14 is adequate for garden-variety functions, but it is natural to extend it to more general Riemann-integrable functions. Say f ∈ Rc(Rn) if f has compact support, say in some cell R, and f ∈ R(R). If Ω ⊂ Rn is open and f ∈ Rc(Rn) has support in Ω, we say f ∈ Rc(Ω).

Theorem 4.15. Let O and Ω be open in Rn, G : O → Ω a C1 diffeomorphism. If f ∈ Rc(Ω), then f ∘ G ∈ Rc(O), and (4.57) holds.

Proof. We can find a compact K ⊂ Ω, a sequence of partitions Pν of a cell containing K,and functions ψν , ϕν , constant on the interior of each cell in the partition Pν , such that

supp ψν , ϕν ⊂ K, ψν ≤ f ≤ ϕν ,

and, with A = ∫_Ω f(y) dV(y),

A − 1/ν ≤ ∫ ψν(y) dV(y) ≤ A ≤ ∫ ϕν(y) dV(y) ≤ A + 1/ν.

Now ϕν and ψν belong to Cc(Ω), so Proposition 4.14 applies. Thus

A − 1/ν ≤ ∫_O ψν(G(x)) |det DG(x)| dV(x) ≤ A ≤ ∫_O ϕν(G(x)) |det DG(x)| dV(x) ≤ A + 1/ν.

The proof is completed by the following result, whose proof in turn is an exercise:

Lemma 4.16. Let F : R → R be bounded, A ∈ R. Suppose that, for each ν ∈ Z⁺, there exist Ψν, Φν ∈ R(R) such that

Ψν ≤ F ≤ Φν

and

A − 1/ν ≤ ∫_R Ψν(x) dV(x) ≤ ∫_R Φν(x) dV(x) ≤ A + 1/ν.

Then F ∈ R(R) and ∫_R F(x) dV(x) = A.

The most frequently invoked case of the change of variable formula, in the case n = 2, involves the following change from Cartesian to polar coordinates:

(4.58) x = r cos θ, y = r sin θ.


Thus, take G(r, θ) = (r cos θ, r sin θ). We have

(4.59) DG(r, θ) = ( cos θ   −r sin θ
                    sin θ    r cos θ ),   det DG(r, θ) = r.

For example, if ρ ∈ (0,∞) and

(4.60) Dρ = {(x, y) ∈ R² : x² + y² ≤ ρ²},

then, for f ∈ C(Dρ),

(4.61) ∫_{Dρ} f(x, y) dA = ∫_0^ρ ∫_0^{2π} f(r cos θ, r sin θ) r dθ dr.

To get this, we first apply Proposition 4.14, with O = [ε, ρ] × [0, 2π − ε], then apply Theorem 4.9, then let ε ↘ 0.
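The formula (4.61) is easy to confirm numerically. The following sketch (the helper `polar_integral` and the test integrand x² + y² are illustrative choices, not from the text) integrates in polar coordinates with the Jacobian factor r, using a midpoint rule.

```python
import math

# Hypothetical numerical check of (4.61): integrate f(x,y) = x^2 + y^2
# over the disk of radius rho = 1 in polar coordinates, including the
# Jacobian factor r from det DG(r, theta) = r.
def polar_integral(f, rho, nr=400, nt=400):
    dr, dt = rho / nr, 2 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr           # midpoint in r
        for j in range(nt):
            t = (j + 0.5) * dt       # midpoint in theta
            total += f(r * math.cos(t), r * math.sin(t)) * r * dr * dt
    return total

approx = polar_integral(lambda x, y: x**2 + y**2, 1.0)
exact = math.pi / 2                  # int_0^1 int_0^{2pi} r^2 * r dtheta dr
print(approx, exact)
```

For this integrand the exact value is ∫_0^1 ∫_0^{2π} r² · r dθ dr = π/2, and the quadrature agrees to several decimal places.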

It is often useful to integrate a function whose support is not bounded. Generally, given a bounded function f : R^n → R, we say f ∈ R(R^n) provided f|_R ∈ R(R) for each cell R ⊂ R^n, and ∫_R |f| dV ≤ C, for some C < ∞, independent of R. If f ∈ R(R^n), we set

(4.62) ∫_{R^n} f dV = lim_{s→∞} ∫_{Rs} f dV,   Rs = {x ∈ R^n : |xj| ≤ s, ∀ j}.

The existence of the limit in (4.62) can be established as follows. If M < N, then

∫_{RN} f dV − ∫_{RM} f dV = ∫_{RN\RM} f dV,

which is dominated in absolute value by ∫_{RN\RM} |f| dV. If f ∈ R(R^n), then aN = ∫_{RN} |f| dV is a bounded monotone sequence, which hence converges, so

∫_{RN\RM} |f| dV = ∫_{RN} |f| dV − ∫_{RM} |f| dV −→ 0, as M, N → ∞.

It can be shown that, if Kν is any sequence of compact contented subsets of R^n such that each Rs, for s < ∞, is contained in all Kν for ν sufficiently large, i.e., ν ≥ N(s), then, whenever f ∈ R(R^n),

(4.63) ∫_{R^n} f dV = lim_{ν→∞} ∫_{Kν} f dV.

Change of variables formulas and Fubini's Theorem extend to this case. For example, the limiting case of (4.61) as ρ → ∞ is

(4.64) f ∈ C(R²) ∩ R(R²) =⇒ ∫_{R²} f(x, y) dA = ∫_0^∞ ∫_0^{2π} f(r cos θ, r sin θ) r dθ dr.


The following is a good example. Take f(x, y) = e^{−x²−y²}. We have

(4.65) ∫_{R²} e^{−x²−y²} dA = ∫_0^∞ ∫_0^{2π} e^{−r²} r dθ dr = 2π ∫_0^∞ e^{−r²} r dr.

Now, methods of §0 allow the substitution s = r², so

∫_0^∞ e^{−r²} r dr = (1/2) ∫_0^∞ e^{−s} ds = 1/2.

Hence

(4.66) ∫_{R²} e^{−x²−y²} dA = π.

On the other hand, Theorem 4.9 extends to give

(4.67) ∫_{R²} e^{−x²−y²} dA = ∫_{−∞}^∞ ∫_{−∞}^∞ e^{−x²−y²} dy dx
       = (∫_{−∞}^∞ e^{−x²} dx)(∫_{−∞}^∞ e^{−y²} dy).

Note that the two factors in the last product are equal. We deduce that

(4.68) ∫_{−∞}^∞ e^{−x²} dx = √π.

We can generalize (4.67), to obtain (via (4.68))

(4.69) ∫_{R^n} e^{−|x|²} dVn = (∫_{−∞}^∞ e^{−x²} dx)^n = π^{n/2}.

The integrals (4.65)–(4.69) are called Gaussian integrals, and their evaluation has many uses. We shall see some in §5.
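The Gaussian evaluations (4.68)–(4.69) are also easy to confirm numerically; in the sketch below, `gauss_1d` is a hypothetical helper that truncates the integral at |x| ≤ 8, where the tail of e^{−x²} is negligible.

```python
import math

# Sketch: numerically confirm the Gaussian integrals (4.68) and (4.69).
def gauss_1d(n=200000, L=8.0):
    # midpoint rule for int_{-L}^{L} e^{-x^2} dx; tail beyond L = 8 is tiny
    h = 2 * L / n
    return sum(math.exp(-(-L + (k + 0.5) * h) ** 2) for k in range(n)) * h

I = gauss_1d()
print(I, math.sqrt(math.pi))          # (4.68): both ~ 1.7724538509
# (4.69): the n-dimensional Gaussian integral is I^n = pi^{n/2}
for n in (2, 3, 4):
    print(I ** n, math.pi ** (n / 2))
```

The n-dimensional check uses only the factorization in (4.67); no n-dimensional quadrature is needed.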

Exercises

1. Show that any two partitions of a cell R have a common refinement.
Hint. Consider the argument given for the one-dimensional case in §0.

2. Write down a careful proof of the identity (4.16), i.e., cont⁺(S) = cont⁺(S̄).


3. Write down the details of the argument giving (4.25), on the independence of the integral from the choice of cell.

4. Write down a direct proof that the transformation formula (4.35) holds for those linear transformations of the form A1 and A2 in (4.36). Compare Exercise 1 of §0.

5. Show that there is a continuous function ϕ having the properties in (4.42).
Hint. To start, let K be a compact neighborhood of S, disjoint from B, and let ψ(x) = dist(x, K). Alter this to get ϕ.

6. Write down the details for a proof of Proposition 4.14.

7. Consider spherical polar coordinates on R³, given by

x = ρ sin ϕ cos θ,  y = ρ sin ϕ sin θ,  z = ρ cos ϕ,

i.e., take G(ρ, ϕ, θ) = (ρ sin ϕ cos θ, ρ sin ϕ sin θ, ρ cos ϕ). Show that

det DG(ρ, ϕ, θ) = ρ² sin ϕ.

Use this to compute the volume of the unit ball in R3.
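One way to see what the exercise claims: integrating the Jacobian ρ² sin ϕ over (0, 1) × (0, π) × (0, 2π) should reproduce the volume 4π/3 of the unit ball. A minimal numerical sketch (the helper `ball_volume` is illustrative):

```python
import math

# Integrate the spherical-coordinate Jacobian rho^2 sin(phi) over
# (0,1) x (0,pi) by the midpoint rule; the integrand does not depend
# on theta, so the theta integral contributes a factor of 2*pi.
def ball_volume(n=400):
    drho, dphi = 1.0 / n, math.pi / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * drho
        for j in range(n):
            phi = (j + 0.5) * dphi
            total += rho**2 * math.sin(phi) * drho * dphi
    return 2 * math.pi * total

print(ball_volume(), 4 * math.pi / 3)   # both ~ 4.18879
```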

8. Prove Lemma 4.16.

9. Prove the following variant of Lemma 4.16:

Lemma 4.16A. Let F : R → R be bounded. Suppose there exist Ψν, Φν ∈ R(R) such that

Ψν ≤ F ≤ Φν,   ∫_R Φν(x) dV(x) − ∫_R Ψν(x) dV(x) ≤ 1/ν.

Then F ∈ R(R).

10. Given f1, f2 ∈ R(R), prove that f1 f2 ∈ R(R).
Hint. Considering the effect of adding constants, reduce to considering positive fj. Take partitions Pν and positive functions ψjν, ϕjν, constant on the interior of each cell in Pν, such that

0 < A ≤ ψjν ≤ fj ≤ ϕjν ≤ B

and

∫ ψjν(x) dV(x), ∫ ϕjν(x) dV(x) → ∫ fj(x) dV(x).

Show that the result of Exercise 9 holds for F = f1 f2, Ψν = ψ1ν ψ2ν, Φν = ϕ1ν ϕ2ν.


11. If R is a cell and S ⊂ R is a contented set, and f ∈ R(R), define

∫_S f(x) dV(x) = ∫_R χ_S(x) f(x) dV(x).

Show that, if Sj ⊂ R are contented and they are disjoint (or more generally cont⁺(S1 ∩ S2) = 0), then, for f ∈ R(R),

∫_{S1∪S2} f(x) dV(x) = ∫_{S1} f(x) dV(x) + ∫_{S2} f(x) dV(x).

12. Establish the convergence result (4.63), for all f ∈ R(Rn).

13. Extend the change of variable formula to elements of R(Rn).

14. Let f : Rn → R be unbounded. For s ∈ (0,∞), set

fs(x) = f(x) if |f(x)| ≤ s,
        0    otherwise.

Assume fs ∈ R(R^n) for each s < ∞, and assume

∫_{R^n} |fs(x)| dx ≤ B < ∞, ∀ s < ∞.

In such a case, say f ∈ R#(R^n). Set

∫_{R^n} f(x) dx = lim_{s→∞} ∫_{R^n} fs(x) dx.

Show that this limit exists whenever f ∈ R#(Rn).

15. Extend the change of variable formula to functions in R#(R^n). Show that

f(x) = |x|^{−a} e^{−|x|²} ∈ R#(R^n) ⟺ a < n.

16. Theorem 4.9, relating multiple integrals and iterated integrals, played the following role in the proof of the change of variable formula (4.47). Namely, it was used to establish the identity (4.50) for the volume of the parallelepiped Hα, via an appeal to Corollary 4.12, hence to Proposition 4.10, whose proof relied on Theorem 4.9.

Try to establish Corollary 4.12 directly, without using Theorem 4.9, in the case when Σ is either a cell or the image of a cell under an element of Gl(n, R).


17. Assume f ∈ R(R), |f| ≤ M, and let ϕ : [−M, M] → R be Lipschitz and monotone. Show directly from the definition that ϕ ∘ f ∈ R(R).

18. If ϕ : [−M, M] → R is continuous and piecewise linear, show that you can write ϕ = ϕ1 − ϕ2 with ϕj Lipschitz and monotone. Deduce that f ∈ R(R) ⇒ ϕ ∘ f ∈ R(R) when ϕ is piecewise linear.

19. Assume uν ∈ R(R) and that uν → u uniformly on R. Show that u ∈ R(R). Deduce that if f ∈ R(R), |f| ≤ M, and ψ : [−M, M] → R is continuous, then ψ ∘ f ∈ R(R).

Exercises on row reduction and matrix products

We consider the following three types of row operations on an n × n matrix A = (a_{jk}). If σ is a permutation of {1, . . . , n}, let

ρσ(A) = (a_{σ(j)k}).

If c = (c1, . . . , cn), and all cj are nonzero, set

µc(A) = (c_j^{−1} a_{jk}).

Finally, if c ∈ R and µ ≠ ν, define

εµνc(A) = (b_{jk}),   b_{νk} = a_{νk} − c a_{µk},   b_{jk} = a_{jk} for j ≠ ν.

We relate these operations to left multiplication by matrices Pσ, Mc, and Eµνc, defined by the following actions on the standard basis e1, . . . , en of R^n:

Pσ ej = e_{σ(j)},   Mc ej = cj ej,

and

Eµνc eµ = eµ + c eν,   Eµνc ej = ej for j ≠ µ.

1. Show that

A = Pσ ρσ(A),   A = Mc µc(A),   A = Eµνc εµνc(A).

2. Show that Pσ⁻¹ = P_{σ⁻¹}.

3. Show that, if µ ≠ ν, then Eµνc = Pσ⁻¹ E21c Pσ, for some permutation σ.

4. If B = ρσ(A) and C = µc(B), show that A = PσMcC. Generalize this to other caseswhere a matrix C is obtained from a matrix A via a sequence of row operations.

5. If A is an invertible, real n × n matrix (i.e., A ∈ Gl(n, R)), then the rows of A form a basis of R^n. Use this to show that A can be transformed to the identity matrix via a sequence of row operations. Deduce that any A ∈ Gl(n, R) can be written as a finite product of matrices of the form Pσ, Mc and Eµνc, hence as a finite product of matrices of the form listed in (4.36).
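The identities in Exercise 1 are easy to spot-check numerically. The sketch below (the random matrix and the specific σ, c are illustrative choices) verifies that each row operation is undone by left multiplication with the corresponding matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))

# permutation: rho_sigma(A) has rows a_{sigma(j)k}; P_sigma e_j = e_{sigma(j)}
sigma = np.array([2, 0, 3, 1])
rho = A[sigma, :]
P = np.zeros((n, n)); P[sigma, np.arange(n)] = 1.0
assert np.allclose(A, P @ rho)

# scaling: mu_c(A) divides row j by c_j; M_c = diag(c)
c = np.array([2.0, -1.0, 0.5, 3.0])
mu = A / c[:, None]
M = np.diag(c)
assert np.allclose(A, M @ mu)

# row replacement: eps subtracts c times row mu from row nu;
# E_{mu nu c} e_mu = e_mu + c e_nu, i.e. E = I with E[nu, mu] = c
mu_i, nu_i, cc = 1, 3, 2.5
eps = A.copy(); eps[nu_i] = A[nu_i] - cc * A[mu_i]
E = np.eye(n); E[nu_i, mu_i] = cc
assert np.allclose(A, E @ eps)
print("all three identities verified")
```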


5. Integration on surfaces

A smooth m-dimensional surface M ⊂ R^n is characterized by the following property. Given p ∈ M, there is a neighborhood U of p in M and a smooth map ϕ : O → U, from an open set O ⊂ R^m bijectively to U, with injective derivative at each point. Such a map ϕ is called a coordinate chart on M. We call U ⊂ M a coordinate patch. If all such maps ϕ are smooth of class C^k, we say M is a surface of class C^k. In §8 we will define analogous notions of a C^k surface with boundary, and of a C^k surface with corners.

There is an abstraction of the notion of a surface, namely the notion of a “manifold,”which is useful in more advanced studies. A treatment of calculus on manifolds can befound in [Spi].

If ϕ : O → U is a Ck coordinate chart, such as described above, and ϕ(x0) = p, we set

(5.1) TpM = Range Dϕ(x0),

a linear subspace of R^n of dimension m, and we denote by NpM its orthogonal complement. It is useful to consider the following map. Pick a linear isomorphism A : R^{n−m} → NpM, and define

(5.2) Φ : O × Rn−m −→ Rn, Φ(x, z) = ϕ(x) + Az.

Thus Φ is a Ck map defined on an open subset of Rn. Note that

(5.3) DΦ(x0, 0)(v, w) = Dϕ(x0)v + Aw,

so DΦ(x0, 0) : R^n → R^n is surjective, hence bijective, so the Inverse Function Theorem applies; Φ maps some neighborhood of (x0, 0) diffeomorphically onto a neighborhood of p ∈ R^n.

Suppose there is another C^k coordinate chart, ψ : Ω → U. Since ϕ and ψ are by hypothesis one-to-one and onto, it follows that F = ψ⁻¹ ∘ ϕ : O → Ω is a well defined map, which is one-to-one and onto. See Fig. 5.1. In fact, we can say more.

Lemma 5.1. Under the hypotheses above, F is a Ck diffeomorphism.

Proof. It suffices to show that F and F⁻¹ are C^k on a neighborhood of x0 and y0, respectively, where ϕ(x0) = ψ(y0) = p. Let us define a map Ψ in a fashion similar to (5.2). To be precise, we set T̃pM = Range Dψ(y0), and let ÑpM be its orthogonal complement. (Shortly we will show that T̃pM = TpM, but we are not quite ready for that.) Then pick a linear isomorphism B : R^{n−m} → ÑpM and set Ψ(y, z) = ψ(y) + Bz, for (y, z) ∈ Ω × R^{n−m}. Again, Ψ is a C^k diffeomorphism from a neighborhood of (y0, 0) onto a neighborhood of p.

It follows that Ψ⁻¹ ∘ Φ is a C^k diffeomorphism from a neighborhood of (x0, 0) onto a neighborhood of (y0, 0). Now note that, for x close to x0 and y close to y0,

(5.4) Ψ⁻¹ ∘ Φ(x, 0) = (F(x), 0),   Φ⁻¹ ∘ Ψ(y, 0) = (F⁻¹(y), 0).


These identities imply that F and F−1 have the desired regularity.

Thus, when there are two such coordinate charts, ϕ : O → U, ψ : Ω → U, we have a C^k diffeomorphism F : O → Ω such that

(5.5) ϕ = ψ F.

By the chain rule,

(5.6) Dϕ(x) = Dψ(y) DF (x), y = F (x).

In particular this implies that Range Dϕ(x0) = Range Dψ(y0), so TpM in (5.1) is independent of the choice of coordinate chart. It is called the tangent space to M at p.

We next define an object called the metric tensor on M. Given a coordinate chart ϕ : O → U, there is associated an m × m matrix G(x) = (g_{jk}(x)) of functions on O, defined in terms of the inner product of vectors tangent to M:

(5.7) g_{jk}(x) = Dϕ(x)ej · Dϕ(x)ek = (∂ϕ/∂xj) · (∂ϕ/∂xk) = ∑_{ℓ=1}^n (∂ϕℓ/∂xj)(∂ϕℓ/∂xk),

where {ej : 1 ≤ j ≤ m} is the standard orthonormal basis of R^m. Equivalently,

(5.8) G(x) = Dϕ(x)t Dϕ(x).

We call (gjk) the metric tensor of M, on U , with respect to the coordinate chart ϕ : O → U .Note that this matrix is positive-definite. From a coordinate-independent point of view,the metric tensor on M specifies inner products of vectors tangent to M , using the innerproduct of Rn.

If we take another coordinate chart ψ : Ω → U, we want to compare (g_{jk}) with H = (h_{jk}), given by

(5.9) hjk(y) = Dψ(y)ej ·Dψ(y)ek, i.e., H(y) = Dψ(y)t Dψ(y).

As seen above we have a diffeomorphism F : O → Ω such that (5.5)–(5.6) hold. Consequently,

(5.10) G(x) = DF (x)t H(y) DF (x),

or equivalently,

(5.11) g_{jk}(x) = ∑_{i,ℓ} (∂Fi/∂xj)(∂Fℓ/∂xk) h_{iℓ}(y).

We now define the notion of surface integral on M. If f : M → R is a continuous function supported on U, we set

(5.12) ∫_M f dS = ∫_O f ∘ ϕ(x) √g(x) dx,


where

(5.13) g(x) = det G(x).

We need to know that this is independent of the choice of coordinate chart ϕ : O → U. Thus, if we use ψ : Ω → U instead, we want to show that (5.12) is equal to ∫_Ω f ∘ ψ(y) √h(y) dy, where h(y) = det H(y). Indeed, since f ∘ ψ ∘ F = f ∘ ϕ, we can apply the change of variable formula, Proposition 4.14, to get

(5.14) ∫_Ω f ∘ ψ(y) √h(y) dy = ∫_O f ∘ ϕ(x) √(h(F(x))) |det DF(x)| dx.

Now, (5.10) implies that

(5.15) √g(x) = |det DF(x)| √h(y),

so the right side of (5.14) is seen to be equal to (5.12), and our surface integral is well defined, at least for f supported in a coordinate patch. More generally, if f : M → R has compact support, write it as a finite sum of terms, each supported on a coordinate patch, and use (5.12) on each patch.
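As an illustration of (5.12)–(5.13), one can compute the metric tensor by finite differences for the standard chart on the unit sphere S² and integrate √g numerically; the helpers below are illustrative, and the result should be the area 4π.

```python
import math

# Chart on S^2: phi(u, v) = (sin u cos v, sin u sin v, cos u)
def phi(u, v):
    return (math.sin(u) * math.cos(v), math.sin(u) * math.sin(v), math.cos(u))

def sqrt_g(u, v, h=1e-5):
    # columns of D(phi) by central differences, then g = det(Dphi^t Dphi)
    du = [(a - b) / (2 * h) for a, b in zip(phi(u + h, v), phi(u - h, v))]
    dv = [(a - b) / (2 * h) for a, b in zip(phi(u, v + h), phi(u, v - h))]
    g11 = sum(x * x for x in du)
    g12 = sum(x * y for x, y in zip(du, dv))
    g22 = sum(y * y for y in dv)
    return math.sqrt(max(g11 * g22 - g12 * g12, 0.0))

def sphere_area(n=300):
    du, dv = math.pi / n, 2 * math.pi / n
    return sum(sqrt_g((i + 0.5) * du, (j + 0.5) * dv) * du * dv
               for i in range(n) for j in range(n))

print(sphere_area(), 4 * math.pi)    # both ~ 12.566
```

For this chart √g = sin u analytically, so the quadrature is essentially integrating sin u over [0, π] times 2π.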

Let us pause to consider the special cases m = 1 and m = 2. For m = 1, we are considering a curve in R^n, say ϕ : [a, b] → R^n. Then G(x) is a 1 × 1 matrix, namely G(x) = |ϕ′(x)|². If we denote the curve in R^n by γ, rather than M, the formula (5.12) becomes

(5.16) ∫_γ f ds = ∫_a^b f ∘ ϕ(x) |ϕ′(x)| dx.

In case m = 2, let us consider a surface M ⊂ R³, with a coordinate chart ϕ : O → U ⊂ M. For f supported in U, an alternative way to write the surface integral is

(5.17) ∫_M f dS = ∫_O f ∘ ϕ(x) |∂1ϕ × ∂2ϕ| dx1 dx2,

where u × v is the cross product of vectors u and v in R³. To see this, we compare this integrand with the one in (5.12). In this case,

(5.18) g = det( ∂1ϕ·∂1ϕ   ∂1ϕ·∂2ϕ
                ∂2ϕ·∂1ϕ   ∂2ϕ·∂2ϕ ) = |∂1ϕ|²|∂2ϕ|² − (∂1ϕ·∂2ϕ)².

Recall from (1.47) that |u × v| = |u| |v| |sin θ|, where θ is the angle between u and v. Equivalently, since u · v = |u| |v| cos θ,

(5.19) |u × v|² = |u|²|v|²(1 − cos² θ) = |u|²|v|² − (u · v)².


Thus we see that |∂1ϕ× ∂2ϕ| = √g, in this case, and (5.17) is equivalent to (5.12).

An important class of surfaces is the class of graphs of smooth functions. Let u ∈ C¹(Ω), for an open Ω ⊂ R^{n−1}, and let M be the graph of z = u(x). The map ϕ(x) = (x, u(x)) provides a natural coordinate system, in which the metric tensor is given by

(5.20) g_{jk}(x) = δ_{jk} + (∂u/∂xj)(∂u/∂xk).

If u is C¹, we see that g_{jk} is continuous. To calculate g = det(g_{jk}) at a given point p ∈ Ω, if ∇u(p) ≠ 0, rotate coordinates so that ∇u(p) is parallel to the x1 axis. We see that

(5.21) √g = (1 + |∇u|²)^{1/2}.

In particular, the (n− 1)-dimensional volume of the surface M is given by

(5.22) V_{n−1}(M) = ∫_M dS = ∫_Ω (1 + |∇u(x)|²)^{1/2} dx.

Particularly important examples of surfaces are the unit spheres Sn−1 in Rn,

(5.23) S^{n−1} = {x ∈ R^n : |x| = 1}.

Spherical polar coordinates on Rn are defined in terms of a smooth diffeomorphism

(5.24) R : (0, ∞) × S^{n−1} −→ R^n \ {0},   R(r, ω) = rω.

Let (hℓm) denote the metric tensor on S^{n−1} (induced from its inclusion in R^n) with respect to some coordinate chart ϕ : O → U ⊂ S^{n−1}. Then, with respect to the coordinate chart Φ : (0, ∞) × O → R^n given by Φ(r, y) = rϕ(y), the Euclidean metric tensor can be written

(5.25) (e_{jk}) = ( 1       0
                    0   r² hℓm ).

To see that the off-diagonal terms vanish, i.e., ∂rΦ · ∂xjΦ = 0, note that ϕ(x) · ϕ(x) = 1 ⇒ ∂xjϕ(x) · ϕ(x) = 0. Now (5.25) yields

(5.26) √e = r^{n−1} √h.

We therefore have the following result for integrating a function in spherical polar coordinates.

(5.27) ∫_{R^n} f(x) dx = ∫_{S^{n−1}} [∫_0^∞ f(rω) r^{n−1} dr] dS(ω).


We next compute the (n − 1)-dimensional area A_{n−1} of the unit sphere S^{n−1} ⊂ R^n, using (5.27) together with the computation

(5.28) ∫_{R^n} e^{−|x|²} dx = π^{n/2},

from (4.69). First note that, whenever f(x) = ϕ(|x|), (5.27) yields

(5.29) ∫_{R^n} ϕ(|x|) dx = A_{n−1} ∫_0^∞ ϕ(r) r^{n−1} dr.

In particular, taking ϕ(r) = e^{−r²} and using (5.28), we have

(5.30) π^{n/2} = A_{n−1} ∫_0^∞ e^{−r²} r^{n−1} dr = (1/2) A_{n−1} ∫_0^∞ e^{−s} s^{n/2−1} ds,

where we used the substitution s = r² to get the last identity. We hence have

(5.31) A_{n−1} = 2π^{n/2} / Γ(n/2),

where Γ(z) is Euler’s Gamma function, defined for z > 0 by

(5.32) Γ(z) = ∫_0^∞ e^{−s} s^{z−1} ds.

We need to complement (5.31) with some results on Γ(z) allowing a computation of Γ(n/2) in terms of more familiar quantities. Of course, setting z = 1 in (5.32), we immediately get

(5.33) Γ(1) = 1.

Also, setting n = 1 in (5.30), we have

π^{1/2} = 2 ∫_0^∞ e^{−r²} dr = ∫_0^∞ e^{−s} s^{−1/2} ds,

or

(5.34) Γ(1/2) = π^{1/2}.

We can proceed inductively from (5.33)–(5.34) to a formula for Γ(n/2) for any n ∈ Z⁺, using the following.


Lemma 5.2. For all z > 0,

(5.35) Γ(z + 1) = zΓ(z).

Proof. We can write

Γ(z + 1) = −∫_0^∞ (d/ds e^{−s}) s^z ds = ∫_0^∞ e^{−s} (d/ds)(s^z) ds,

the last identity by integration by parts. The last expression here is seen to equal the right side of (5.35).

Consequently, for k ∈ Z⁺,

(5.36) Γ(k) = (k − 1)!,   Γ(k + 1/2) = (k − 1/2)(k − 3/2) ⋯ (1/2) π^{1/2}.

Thus (5.31) can be rewritten

(5.37) A_{2k−1} = 2π^k / (k − 1)!,   A_{2k} = 2π^k / [(k − 1/2)(k − 3/2) ⋯ (1/2)].

We discuss another important example of a smooth surface, in the space M(n) ≈ R^{n²} of real n × n matrices, namely SO(n), the set of matrices T ∈ M(n) satisfying TᵗT = I and det T > 0 (hence det T = 1). The exponential map Exp : M(n) → M(n) defined by Exp(A) = e^A has the property

(5.38) Exp : Skew(n) −→ SO(n),

where Skew(n) is the set of skew-symmetric matrices in M(n). Also D Exp(0)A = A; hence

(5.39) D Exp(0) = ι : Skew(n) → M(n).

It follows from the Inverse Function Theorem that there is a neighborhood O of 0 in Skew(n) which is mapped by Exp diffeomorphically onto a smooth surface U ⊂ M(n), of dimension m = n(n − 1)/2. Furthermore, U is a neighborhood of I in SO(n). For general T ∈ SO(n), we can define maps

(5.40) ϕT : O −→ SO(n), ϕT (A) = T Exp(A),

and obtain coordinate charts in SO(n), which is consequently a smooth surface of dimension n(n − 1)/2 in M(n). Note that SO(n) is a closed bounded subset of M(n); hence it is compact.
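The property (5.38) can be checked numerically: exponentiating a skew-symmetric matrix should land in SO(n). The sketch below computes the matrix exponential by a simple truncated power series (adequate for small matrices; a production code would use a library routine).

```python
import numpy as np

def expm(A, terms=30):
    # truncated power series for e^A; term holds A^k / k!
    X, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        X = X + term
    return X

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
A = B - B.T                      # skew-symmetric: A^t = -A
T = expm(A)
print(np.allclose(T.T @ T, np.eye(3)))    # T^t T = I
print(np.isclose(np.linalg.det(T), 1.0))  # det T = 1, so T is in SO(3)
```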

We use the inner product on M(n) computed componentwise; equivalently,

(5.41) 〈A,B〉 = Tr (BtA) = Tr (BAt).

This produces a metric tensor on SO(n). The surface integral over SO(n) has the following important invariance property.


Proposition 5.3. Given f ∈ C(SO(n)), if we set

(5.42) ρT f(X) = f(XT),   λT f(X) = f(TX),

for T,X ∈ SO(n), we have

(5.43) ∫_{SO(n)} ρT f dS = ∫_{SO(n)} λT f dS = ∫_{SO(n)} f dS.

Proof. Given T ∈ SO(n), the maps RT, LT : M(n) → M(n) defined by RT(X) = XT, LT(X) = TX are easily seen from (5.41) to be isometries. Thus they yield maps of SO(n) to itself which preserve the metric tensor, proving (5.43).

Since SO(n) is compact, its total volume V(SO(n)) = ∫_{SO(n)} 1 dS is finite. We define the integral with respect to "Haar measure"

(5.44) ∫_{SO(n)} f(g) dg = (1/V(SO(n))) ∫_{SO(n)} f dS.

This is used in many arguments involving “averaging over rotations.”

Exercises

1. Define ϕ : [0, θ] → R² to be ϕ(t) = (cos t, sin t). Show that, if 0 < θ ≤ 2π, the image of [0, θ] under ϕ is an arc of the unit circle, of length θ. Deduce that the unit circle in R² has total length 2π. This result follows also from (5.37).
Remark. Use the definition of π given in the auxiliary problem set after §3. This length formula provided the original definition of π, in ancient Greek geometry.

2. Compute the volume of the unit ball B^n = {x ∈ R^n : |x| ≤ 1}.
Hint. Apply (5.29) with ϕ = χ_{[0,1]}.

3. Taking the upper half of the sphere S^n to be the graph of x_{n+1} = (1 − |x|²)^{1/2}, for x ∈ B^n, the unit ball in R^n, deduce from (5.22) and (5.29) that

A_n = 2A_{n−1} ∫_0^1 (r^{n−1} / √(1 − r²)) dr = 2A_{n−1} ∫_0^{π/2} (sin θ)^{n−1} dθ.

Use this to get an alternative derivation of the formula (5.37) for A_n.
Hint. Rewrite this formula as

A_n = A_{n−1} b_{n−1},   b_k = ∫_0^π sin^k θ dθ.


To analyze b_k, you can write, on the one hand,

b_{k+2} = b_k − ∫_0^π sin^k θ cos² θ dθ,

and on the other, using integration by parts,

b_{k+2} = ∫_0^π cos θ (d/dθ)(sin^{k+1} θ) dθ.

Deduce that

b_{k+2} = ((k + 1)/(k + 2)) b_k.

4. Suppose M is a surface in R^n of dimension 2, and ϕ : O → U ⊂ M is a coordinate chart, with O ⊂ R². Set ϕjk(x) = (ϕj(x), ϕk(x)), so ϕjk : O → R². Show that the formula (5.12) for the surface integral is equivalent to

∫_M f dS = ∫_O f ∘ ϕ(x) √(∑_{j<k} (det Dϕjk(x))²) dx.

Hint. Show that the quantity under √ is equal to (5.18).

5. If M is an m-dimensional surface, ϕ : O → U ⊂ M a coordinate chart, for J = (j1, . . . , jm) set

ϕ_J(x) = (ϕ_{j1}(x), . . . , ϕ_{jm}(x)),   ϕ_J : O → R^m.

Show that the formula (5.12) is equivalent to

∫_M f dS = ∫_O f ∘ ϕ(x) √(∑_{j1<···<jm} (det Dϕ_J(x))²) dx.

Hint. Reduce to the following. For fixed x0 ∈ O, the quantity under the square root is equal to g(x) at x = x0, in the case Dϕ(x0) = (Dϕ1(x0), . . . , Dϕm(x0), 0, . . . , 0).
Reconsider this problem when working on the exercises for §6.

6. Let M be the graph in R^{n+1} of x_{n+1} = u(x), x ∈ O ⊂ R^n. Show that, for p = (x, u(x)) ∈ M, TpM (given as in (5.1)) has a 1-dimensional orthogonal complement NpM, spanned by (−∇u(x), 1). We set N = (1 + |∇u|²)^{−1/2}(−∇u, 1), and call it the (upward-pointing) unit normal to M.

7. Let M be as in Exercise 6, and define N as done there. Show that, for a continuous function f : M → R^{n+1},

∫_M f · N dS = ∫_O f(x, u(x)) · (−∇u(x), 1) dx.


The left side is often denoted ∫_M f · dS.

8. Let M be a 2-dimensional surface in R³, covered by a single coordinate chart, ϕ : O → M. Suppose f : M → R³ is continuous. Show that, if ∫_M f · dS is defined as in Exercise 7, then

∫_M f · dS = ∫_O f(ϕ(x)) · (∂1ϕ × ∂2ϕ) dx.

9. Consider a symmetric n × n matrix A = (a_{jk}) of the form a_{jk} = vj vk. Show that the range of A is the one-dimensional space spanned by v = (v1, . . . , vn) (if this is nonzero). Deduce that A has exactly one nonzero eigenvalue, namely λ = |v|². Use this to give another derivation of (5.21) from (5.20).

10. Let Ω ⊂ R^n be open and u : Ω → R be a C^k map. Fix c ∈ R and consider

S = {x ∈ Ω : u(x) = c}.

Assume S ≠ ∅ and that ∇u(x) ≠ 0 for all x ∈ S. Show that S is a C^k surface of dimension n − 1. Show that, for each p ∈ S, TpS has a 1-dimensional orthogonal complement NpS spanned by ∇u(p).
Hint. Use the Implicit Function Theorem.

11. Let S be as in Exercise 10. Assume moreover that there is a C^k map ϕ : O → R, with O ⊂ R^{n−1} open, such that u(x′, ϕ(x′)) = c, and that x′ ↦ (x′, ϕ(x′)) parametrizes S. Show that

∫_S f dS = ∫_O f (|∇u| / |∂nu|) dx′,

where the functions in the integrand on the right are evaluated at (x′, ϕ(x′)).
Hint. Compare the formula in Exercise 6 for N with the fact that ±N = ∇u/|∇u|, and keep in mind the formula (5.22).


6. Differential forms

It is very desirable to be able to make constructions which depend as little as possible on a particular choice of coordinate system. The calculus of differential forms, whose study we now take up, is one convenient set of tools for this purpose.

We start with the notion of a 1-form. It is an object that gets integrated over a curve; formally, a 1-form on Ω ⊂ R^n is written

(6.1) α = ∑_j aj(x) dxj.

If γ : [a, b] → Ω is a smooth curve, we set

(6.2) ∫_γ α = ∫_a^b ∑_j aj(γ(t)) γ′j(t) dt.

In other words,

(6.3) ∫_γ α = ∫_I γ*α,

where I = [a, b] and

γ*α = ∑_j aj(γ(t)) γ′j(t) dt

is the pull-back of α under the map γ. More generally, if F : O → Ω is a smooth map (O ⊂ R^m open), the pull-back F*α is a 1-form on O defined by

(6.4) F*α = ∑_{j,k} aj(F(y)) (∂Fj/∂yk) dyk.

The usual change of variable for integrals gives

(6.5) ∫_γ α = ∫_σ F*α

if γ is the curve F ∘ σ.

If F : O → Ω is a diffeomorphism, and

(6.6) X = ∑_j bj(x) ∂/∂xj


is a vector field on Ω, recall from (3.40) that we have the vector field on O :

(6.7) F#X(y) = DF⁻¹(p) X(p),   p = F(y).

If we define a pairing between 1-forms and vector fields on Ω by

(6.8) 〈X, α〉 = ∑_j bj(x) aj(x) = b · a,

a simple calculation gives

(6.9) 〈F#X, F*α〉 = 〈X, α〉 ∘ F.

Thus, a 1-form on Ω is characterized at each point p ∈ Ω as a linear transformation of the space of vectors at p to R.

More generally, we can regard a k-form α on Ω as a k-multilinear map on vector fields:

(6.10) α(X1, . . . , Xk) ∈ C∞(Ω);

we impose the further condition of anti-symmetry when k ≥ 2:

(6.11) α(X1, . . . , Xj , . . . , X`, . . . , Xk) = −α(X1, . . . , X`, . . . , Xj , . . . , Xk).

Let us note that a 0-form is simply a function.

There is a special notation we use for k-forms. If 1 ≤ j1 < · · · < jk ≤ n, j = (j1, . . . , jk), we set

we set

(6.12) α = ∑_j aj(x) dx_{j1} ∧ · · · ∧ dx_{jk},

where

(6.13) aj(x) = α(Dj1 , . . . , Djk), Dj = ∂/∂xj .

More generally, we assign meaning to (6.12) summed over all k-indices (j1, . . . , jk), where we identify

(6.14) dxj1 ∧ · · · ∧ dxjk= (sgn σ) dxjσ(1) ∧ · · · ∧ dxjσ(k) ,

σ being a permutation of {1, . . . , k}. If any jm = jℓ (m ≠ ℓ), then (6.14) vanishes. A common notation for the statement that α is a k-form on Ω is

(6.15) α ∈ Λk(Ω).

In particular, we can write a 2-form β as

(6.16) β = ∑ b_{jk}(x) dxj ∧ dxk


and pick coefficients satisfying b_{jk}(x) = −b_{kj}(x). According to (6.12)–(6.13), if we set U = ∑ uj(x) ∂/∂xj and V = ∑ vj(x) ∂/∂xj, then

(6.17) β(U, V) = 2 ∑ b_{jk}(x) uj(x) vk(x).

If b_{jk} is not required to be antisymmetric, one gets β(U, V) = ∑ (b_{jk} − b_{kj}) uj vk.

If F : O → Ω is a smooth map as above, we define the pull-back F*α of a k-form α, given by (6.12), to be

given by (6.12), to be

(6.18) F*α = ∑_j aj(F(y)) (F*dx_{j1}) ∧ · · · ∧ (F*dx_{jk}),

where

(6.19) F*dxj = ∑_ℓ (∂Fj/∂yℓ) dyℓ,

the algebraic computation in (6.18) being performed using the rule (6.14). Extending (6.9), if F is a diffeomorphism, we have

(6.20) (F*α)(F#X1, . . . , F#Xk) = α(X1, . . . , Xk) ∘ F.

If B = (bjk) is an n× n matrix, then, by (6.14),

(6.21) (∑_k b_{1k} dxk) ∧ (∑_k b_{2k} dxk) ∧ · · · ∧ (∑_k b_{nk} dxk)
       = (∑_σ (sgn σ) b_{1σ(1)} b_{2σ(2)} · · · b_{nσ(n)}) dx1 ∧ · · · ∧ dxn
       = (det B) dx1 ∧ · · · ∧ dxn.

Hence, if F : O → Ω is a C1 map between two domains of dimension n, and

(6.22) α = A(x) dx1 ∧ · · · ∧ dxn

is an n-form on Ω, then

(6.23) F ∗α = det DF (y) A(F (y)) dy1 ∧ · · · ∧ dyn.

Comparison with the change of variable formula for multiple integrals suggests that one has an intrinsic definition of ∫_Ω α when α is an n-form on Ω, n = dim Ω. To implement this, we need to take into account that det DF(y) rather than |det DF(y)| appears in (6.21). We say a smooth map F : O → Ω between two open subsets of R^n preserves orientation if det DF(y) is everywhere positive. The object called an "orientation" on Ω can be identified as an equivalence class of nowhere vanishing n-forms on Ω, two such forms being equivalent if one is a multiple of another by a positive function in C∞(Ω);


the standard orientation on R^n is determined by dx1 ∧ · · · ∧ dxn. If S is an n-dimensional surface in R^{n+k}, an orientation on S can also be specified by a nowhere vanishing form ω ∈ Λ^n(S). If such a form exists, S is said to be orientable. The equivalence class of positive multiples a(x)ω is said to consist of "positive" forms. A smooth map ψ : S → M between oriented n-dimensional surfaces preserves orientation provided ψ*σ is positive on S whenever σ ∈ Λ^n(M) is positive. If S is oriented, one can choose coordinate charts which are all orientation preserving. We mention that there exist surfaces that cannot be oriented, such as the famous "Möbius strip."

We define the integral of an n-form over an oriented n-dimensional surface as follows. First, if α is an n-form supported on an open set Ω ⊂ R^n, given by (6.22), then we set

(6.24) ∫_Ω α = ∫_Ω A(x) dV(x),

the right side defined as in §4. If O is also open in R^n and F : O → Ω is an orientation preserving diffeomorphism, we have

(6.25) ∫_O F*α = ∫_Ω α,

as a consequence of (6.23) and the change of variable formula (4.47). More generally, if S is an n-dimensional surface with an orientation, say the image of an open set O ⊂ R^n by ϕ : O → S, carrying the natural orientation of O, we can set

(6.26) ∫_S α = ∫_O ϕ*α

for an n-form α on S. If it takes several coordinate patches to cover S, define ∫_S α by writing α as a sum of forms, each supported on one patch.

We need to show that this definition of ∫_S α is independent of the choice of coordinate system on S (as long as the orientation of S is respected). Thus, suppose ϕ : O → U ⊂ S and ψ : Ω → U ⊂ S are both coordinate patches, so that F = ψ⁻¹ ∘ ϕ : O → Ω is an orientation-preserving diffeomorphism, as in Fig. 5.1 of the last section. We need to check that, if α is an n-form on S, supported on U, then

(6.27) ∫_O ϕ*α = ∫_Ω ψ*α.

To establish this, we first show that, for any form α of any degree,

(6.28) ψ ∘ F = ϕ =⇒ ϕ*α = F*ψ*α.

It suffices to check (6.28) for α = dxj. Then (6.19) gives ψ*dxj = ∑_ℓ (∂ψj/∂xℓ) dxℓ, so

(6.29) F*ψ*dxj = ∑_{ℓ,m} (∂Fℓ/∂xm)(∂ψj/∂xℓ) dxm,   ϕ*dxj = ∑_m (∂ϕj/∂xm) dxm;


but the identity of these forms follows from the chain rule:

(6.30) Dϕ = (Dψ)(DF) =⇒ ∂ϕj/∂xm = ∑_ℓ (∂ψj/∂xℓ)(∂Fℓ/∂xm).

Now that we have (6.28), we see that the left side of (6.27) is equal to

(6.31) ∫_O F*(ψ*α),

which is equal to the right side of (6.27), by (6.25). Thus the integral of an n-form over an oriented n-dimensional surface is well defined.

Exercises

1. If F : U0 → U1 and G : U1 → U2 are smooth maps and α ∈ Λ^k(U2), then (6.28) implies

(G ∘ F)*α = F*(G*α) in Λ^k(U0).

In the special case that Uj = R^n and F and G are linear maps, and k = n, show that this identity implies

det(GF) = (det F)(det G).

2. Let Λ^k R^n denote the space of k-forms (6.12) with constant coefficients. Show that

dim_R Λ^k R^n = n! / (k! (n − k)!).

If T : R^m → R^n is linear, then T* preserves this class of spaces; we denote the map

Λ^k T* : Λ^k R^n −→ Λ^k R^m.

Similarly, replacing T by T* yields

Λ^k T : Λ^k R^m −→ Λ^k R^n.

3. Show that Λ^k T is uniquely characterized as a linear map from Λ^k R^m to Λ^k R^n which satisfies

(Λ^k T)(v1 ∧ · · · ∧ vk) = (Tv1) ∧ · · · ∧ (Tvk),   vj ∈ R^m.

4. If {e1, . . . , en} is the standard orthonormal basis of R^n, define an inner product on Λ^k R^n by declaring an orthonormal basis to be

{e_{j1} ∧ · · · ∧ e_{jk} : 1 ≤ j1 < · · · < jk ≤ n}.

Show that, if {u1, . . . , un} is any other orthonormal basis of R^n, then the set {u_{j1} ∧ · · · ∧ u_{jk} : 1 ≤ j1 < · · · < jk ≤ n} is an orthonormal basis of Λ^k R^n.

5. Let ϕ : O → R^n be smooth, with O ⊂ R^m open. Show that, for each x ∈ O,

‖Λ^m Dϕ(x)ω‖² = det(Dϕ(x)ᵗ Dϕ(x)),

where ω = e1 ∧ · · · ∧ em. Take another look at Exercise 5 of §5.


7. Products and exterior derivatives of forms

Having discussed the notion of a differential form as something to be integrated, we now consider some operations on forms. There is a wedge product, or exterior product, characterized as follows. If α ∈ Λ^k(Ω) has the form (6.12) and if

(7.1) β = ∑_i bi(x) dx_{i1} ∧ · · · ∧ dx_{iℓ} ∈ Λ^ℓ(Ω),

define

(7.2) α ∧ β = ∑_{j,i} aj(x) bi(x) dx_{j1} ∧ · · · ∧ dx_{jk} ∧ dx_{i1} ∧ · · · ∧ dx_{iℓ}

in Λ^{k+ℓ}(Ω). A special case of this arose in (6.18)–(6.21). We retain the equivalence (6.14). It follows easily that

(7.3) α ∧ β = (−1)k`β ∧ α.

In addition, there is an interior product of α ∈ Λ^k(Ω) with a vector field X on Ω, producing ιXα = α⌋X ∈ Λ^{k−1}(Ω), defined by

(7.4) (α⌋X)(X1, . . . , X_{k−1}) = α(X, X1, . . . , X_{k−1}).

Consequently, if α = dx_{j1} ∧ · · · ∧ dx_{jk} and Di = ∂/∂xi, then

(7.5) α⌋D_{jℓ} = (−1)^{ℓ−1} dx_{j1} ∧ · · · ∧ d̂x_{jℓ} ∧ · · · ∧ dx_{jk},

where d̂x_{jℓ} denotes removing the factor dx_{jℓ}. Furthermore,

i ∉ {j1, . . . , jk} =⇒ α⌋Di = 0.

If F : O → Ω is a diffeomorphism and α, β are forms and X a vector field on Ω, it is readily verified that

(7.6) F*(α ∧ β) = (F*α) ∧ (F*β),   F*(α⌋X) = (F*α)⌋(F#X).

We make use of the operators ∧k and ιk on forms:

(7.7) ∧k α = dxk ∧ α,   ιk α = α⌋Dk.

There is the following useful anticommutation relation:

(7.8) ∧kι` + ι`∧k = δk`,


where δkℓ is 1 if k = ℓ, 0 otherwise. This is a fairly straightforward consequence of (7.5). We also have

(7.9) ∧j ∧k + ∧k ∧j = 0, ιjιk + ιkιj = 0.

From (7.8)–(7.9) one says that the operators {ιj, ∧j : 1 ≤ j ≤ n} generate a "Clifford algebra."
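The relation (7.8) can be verified mechanically by representing k-forms with constant coefficients as dictionaries mapping sorted index tuples to coefficients; the representation and helper names below are illustrative choices, not from the text.

```python
from itertools import combinations
import random

def wedge(k, form):
    # the operator wedge_k of (7.7): dx_k ^ (.)
    out = {}
    for J, a in form.items():
        if k in J:
            continue                           # dx_k ^ dx_k = 0
        pos = sum(1 for j in J if j < k)       # transpositions to sort k in
        K = tuple(sorted(J + (k,)))
        out[K] = out.get(K, 0) + (-1) ** pos * a
    return out

def iota(k, form):
    # the operator iota_k of (7.7): contraction with D_k, using (7.5)
    out = {}
    for J, a in form.items():
        if k not in J:
            continue
        pos = J.index(k)                       # sign (-1)^{pos} as in (7.5)
        out[J[:pos] + J[pos + 1:]] = out.get(J[:pos] + J[pos + 1:], 0) + (-1) ** pos * a
    return out

def add(f, g):
    out = dict(f)
    for J, a in g.items():
        out[J] = out.get(J, 0) + a
    return {J: a for J, a in out.items() if a != 0}

# random 2-form on R^4; check wedge_k iota_l + iota_l wedge_k = delta_{kl}
random.seed(0)
alpha = {J: random.randint(1, 9) for J in combinations(range(4), 2)}
for k in range(4):
    for l in range(4):
        lhs = add(wedge(k, iota(l, alpha)), iota(l, wedge(k, alpha)))
        assert lhs == (alpha if k == l else {})
print("anticommutation relations verified")
```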

Another important operator on forms is the exterior derivative:

(7.10) d : Λk(Ω) −→ Λk+1(Ω),

defined as follows. If α ∈ Λk(Ω) is given by (6.12), then

(7.11) dα = ∑_{j,ℓ} (∂aj/∂xℓ) dxℓ ∧ dx_{j1} ∧ · · · ∧ dx_{jk}.

Equivalently,

(7.12) dα = ∑_{ℓ=1}^n ∂ℓ ∧ℓ α,

where ∂ℓ = ∂/∂xℓ and ∧ℓ is given by (7.7). The antisymmetry dxm ∧ dxℓ = −dxℓ ∧ dxm, together with the identity ∂²aj/∂xℓ∂xm = ∂²aj/∂xm∂xℓ, implies

(7.13) d(dα) = 0,

for any differential form α. We also have a product rule:

(7.14) d(α ∧ β) = (dα) ∧ β + (−1)kα ∧ (dβ), α ∈ Λk(Ω), β ∈ Λj(Ω).

The exterior derivative has the following important property under pull-backs:
$$(7.15)\qquad F^*(d\alpha) = d\,F^*\alpha,$$
if $\alpha\in\Lambda^k(\Omega)$ and $F : O \to \Omega$ is a smooth map. To see this, extend (7.14) to a formula for $d(\alpha\wedge\beta_1\wedge\cdots\wedge\beta_\ell)$ and use this to apply $d$ to $F^*\alpha$, obtaining
$$(7.16)\qquad d\,F^*\alpha = \sum_{j,\ell} \frac{\partial}{\partial x_\ell}\bigl(a_j\circ F(x)\bigr)\, dx_\ell\wedge(F^*dx_{j_1})\wedge\cdots\wedge(F^*dx_{j_k}) + \sum_{j,\nu} (\pm)\, a_j\bigl(F(x)\bigr)\,(F^*dx_{j_1})\wedge\cdots\wedge d(F^*dx_{j_\nu})\wedge\cdots\wedge(F^*dx_{j_k}).$$
Now the definition (6.18)–(6.19) of pull-back gives directly that
$$(7.17)\qquad F^*dx_i = \sum_\ell \frac{\partial F_i}{\partial x_\ell}\, dx_\ell = dF_i,$$
and hence $d(F^*dx_i) = d\,dF_i = 0$, so only the first sum in (7.16) contributes to $d\,F^*\alpha$. Meanwhile,
$$(7.18)\qquad F^*d\alpha = \sum_{j,m} \frac{\partial a_j}{\partial x_m}\bigl(F(x)\bigr)\,(F^*dx_m)\wedge(F^*dx_{j_1})\wedge\cdots\wedge(F^*dx_{j_k}),$$
so (7.15) follows from the identity
$$\sum_\ell \frac{\partial}{\partial x_\ell}\bigl(a_j\circ F(x)\bigr)\, dx_\ell = \sum_m \frac{\partial a_j}{\partial x_m}\bigl(F(x)\bigr)\, F^*dx_m,$$
which in turn follows from the chain rule.

If $d\alpha = 0$, we say $\alpha$ is closed; if $\alpha = d\beta$ for some $\beta\in\Lambda^{k-1}(\Omega)$, we say $\alpha$ is exact. Formula (7.13) implies that every exact form is closed. The converse is not always true globally. Consider the multi-valued angular coordinate $\theta$ on $\mathbb{R}^2\setminus\{(0,0)\}$; $d\theta$ is a single-valued closed form on $\mathbb{R}^2\setminus\{(0,0)\}$ which is not globally exact. An important result of H. Poincar\'e is that every closed form is locally exact. A proof can be found in [T:1]; here we discuss one basic case, which will be used in §10.

Proposition 7.1. If $\alpha$ is a 1-form on $B = \{x\in\mathbb{R}^2 : |x| < 1\}$ and $d\alpha = 0$, then there exists a real-valued $u\in C^\infty(B)$ such that $\alpha = du$.

In fact, let us set
$$(7.19)\qquad u_j(x) = \int_{\gamma_j(x)} \alpha,$$
where $\gamma_j(x)$ is a path from $0$ to $x = (x_1, x_2)$ consisting of two line segments. If $j = 1$, the path goes first from $0$ to $(0, x_2)$ and then from $(0, x_2)$ to $x$; if $j = 2$, it goes first from $0$ to $(x_1, 0)$ and then from $(x_1, 0)$ to $x$. See Fig. 7.1. It is easy to verify that $\partial u_j/\partial x_j = \alpha_j(x)$. We claim that $u_1 = u_2$, or equivalently that
$$(7.20)\qquad \int_{\sigma(x)} \alpha = 0,$$
where $\sigma(x)$ is the closed path consisting of $\gamma_2(x)$ followed by $\gamma_1(x)$, in reverse. In fact, Stokes' formula, which we will establish in the next section, implies that
$$(7.21)\qquad \int_{\sigma(x)} \alpha = \int_{R(x)} d\alpha,$$
where $R(x)$ is the rectangle whose boundary is $\sigma(x)$. If $d\alpha = 0$, then (7.21) vanishes, and we have the desired function $u$: $u = u_1 = u_2$.
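As a numerical illustration of the construction in the proof (the specific form is our own choice), take the closed 1-form $\alpha = 2xy\,dx + x^2\,dy = d(x^2 y)$ and compute $u_1$ and $u_2$ by integrating $\alpha$ along the two two-segment paths; both recover $x^2 y$:

```python
# alpha = a1 dx + a2 dy with a1 = 2xy, a2 = x^2; alpha = d(x^2 y) is closed.

def a1(x, y):
    return 2 * x * y

def a2(x, y):
    return x * x

def line_integral(f, const, t0, t1, horizontal, n=2000):
    """Integrate f along a horizontal or vertical segment (midpoint rule)."""
    h = (t1 - t0) / n
    s = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        s += f(t, const) if horizontal else f(const, t)
    return s * h

def u1(x, y):  # path (0,0) -> (0,y) -> (x,y)
    return line_integral(a2, 0.0, 0.0, y, False) + line_integral(a1, y, 0.0, x, True)

def u2(x, y):  # path (0,0) -> (x,0) -> (x,y)
    return line_integral(a1, 0.0, 0.0, x, True) + line_integral(a2, x, 0.0, y, False)

x, y = 0.6, -0.4
assert abs(u1(x, y) - u2(x, y)) < 1e-9
assert abs(u1(x, y) - x * x * y) < 1e-9
```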


Exercises

1. If $\alpha$ is a $k$-form, verify the formula (7.14), i.e., $d(\alpha\wedge\beta) = (d\alpha)\wedge\beta + (-1)^k\alpha\wedge d\beta$. If $\alpha$ is closed and $\beta$ is exact, show that $\alpha\wedge\beta$ is exact.

2. Let $F$ be a vector field on $U$, open in $\mathbb{R}^3$, $F = \sum_1^3 f_j(x)\,\partial/\partial x_j$. Consider the 1-form $\varphi = \sum_1^3 f_j(x)\,dx_j$. Show that $d\varphi$ and $\operatorname{curl} F$ are related in the following way:
$$\operatorname{curl} F = \sum_1^3 g_j(x)\,\partial/\partial x_j,$$
$$d\varphi = g_1(x)\,dx_2\wedge dx_3 + g_2(x)\,dx_3\wedge dx_1 + g_3(x)\,dx_1\wedge dx_2.$$

3. If $F$ and $\varphi$ are related as in Exercise 2, show that $\operatorname{curl} F$ is uniquely specified by the relation
$$d\varphi\wedge\alpha = \langle\operatorname{curl} F, \alpha\rangle\,\omega$$
for all 1-forms $\alpha$ on $U\subset\mathbb{R}^3$, where $\omega = dx_1\wedge dx_2\wedge dx_3$ is the volume form.

4. Let $B$ be a ball in $\mathbb{R}^3$, $F$ a smooth vector field on $B$. Show that
$$\exists\, u\in C^\infty(B) \text{ s.t. } F = \operatorname{grad} u \implies \operatorname{curl} F = 0.$$
Hint. Compare $F = \operatorname{grad} u$ with $\varphi = du$.

5. Let $B$ be a ball in $\mathbb{R}^3$ and $G$ a smooth vector field on $B$. Show that
$$\exists \text{ vector field } F \text{ s.t. } G = \operatorname{curl} F \implies \operatorname{div} G = 0.$$
Hint. If $G = \sum_1^3 g_j(x)\,\partial/\partial x_j$, consider $\psi = g_1(x)\,dx_2\wedge dx_3 + g_2(x)\,dx_3\wedge dx_1 + g_3(x)\,dx_1\wedge dx_2$. Compute $d\psi$.

6. Show that the 1-form $d\theta$ mentioned just before Proposition 7.1 is given by
$$d\theta = \frac{x\,dy - y\,dx}{x^2 + y^2}.$$
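A short numerical check (an illustration, not a solution of the exercise) confirms that this form integrates to $2\pi$ over the unit circle; since an exact form integrates to $0$ over any closed curve, this shows $d\theta$ is closed but not globally exact:

```python
import math

# Integrate (x dy - y dx)/(x^2 + y^2) over the unit circle (midpoint rule).
n = 10000
total = 0.0
for k in range(n):
    t = 2 * math.pi * (k + 0.5) / n
    x, y = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t), math.cos(t)   # derivative of the parametrization
    total += (x * dy - y * dx) / (x * x + y * y) * (2 * math.pi / n)
assert abs(total - 2 * math.pi) < 1e-9
```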

For the next set of exercises, let $\Omega$ be a planar domain, and let $X = f(x,y)\,\partial/\partial x + g(x,y)\,\partial/\partial y$ be a nonvanishing vector field on $\Omega$. Consider the 1-form $\alpha = g(x,y)\,dx - f(x,y)\,dy$.

7. Let $\gamma : I \to \Omega$ be a smooth curve, $I = (a,b)$. Show that the image $C = \gamma(I)$ is the image of an integral curve of $X$ if and only if $\gamma^*\alpha = 0$. Consequently, with slight abuse of notation, one describes the integral curves by $g\,dx - f\,dy = 0$. If $\alpha$ is exact, i.e., $\alpha = du$, conclude that the level curves of $u$ are the integral curves of $X$.

8. A function $\varphi$ is called an integrating factor if $\tilde\alpha = \varphi\alpha$ is exact, i.e., if $d(\varphi\alpha) = 0$, provided $\Omega$ is simply connected. Show that an integrating factor always exists, at least locally. Show that $\varphi = e^v$ is an integrating factor if and only if $Xv = -\operatorname{div} X$. Find an integrating factor for $\alpha = (x^2 + y^2 - 1)\,dx - 2xy\,dy$.


8. The general Stokes formula

The Stokes formula involves integrating a $k$-form over a $k$-dimensional surface with boundary. We first define that concept. Let $S$ be a smooth $k$-dimensional surface (say in $\mathbb{R}^N$), and let $M$ be an open subset of $S$, such that its closure $\overline{M}$ (in $\mathbb{R}^N$) is contained in $S$. Its boundary is $\partial M = \overline{M}\setminus M$. We say $M$ is a smooth surface with boundary if $\partial M$ is also a smooth $(k-1)$-dimensional surface. In such a case, any $p\in\partial M$ has a neighborhood $U\subset S$ with a coordinate chart $\varphi : O \to U$, where $O$ is an open neighborhood of $0$ in $\mathbb{R}^k$, such that $\varphi(0) = p$ and $\varphi$ maps $\{x\in O : x_1 = 0\}$ onto $U\cap\partial M$.

If $S$ is oriented, then $M$ is oriented, and $\partial M$ inherits an orientation, uniquely determined by the following requirement: if
$$(8.1)\qquad M = \mathbb{R}^k_- = \{x\in\mathbb{R}^k : x_1 \le 0\},$$
then $\partial M = \{(x_2,\dots,x_k)\}$ has the orientation determined by $dx_2\wedge\cdots\wedge dx_k$.

We can now state the Stokes formula.

Proposition 8.1. Given a compactly supported $(k-1)$-form $\beta$ of class $C^1$ on an oriented $k$-dimensional surface $M$ (of class $C^2$) with boundary $\partial M$, with its natural orientation,
$$(8.2)\qquad \int_M d\beta = \int_{\partial M} \beta.$$

Proof. Using a partition of unity and invariance of the integral and the exterior derivative under coordinate transformations, it suffices to prove this when $M$ has the form (8.1). In that case, we will be able to deduce (8.2) from the Fundamental Theorem of Calculus. Indeed, if
$$(8.3)\qquad \beta = b_j(x)\, dx_1\wedge\cdots\wedge \widehat{dx_j}\wedge\cdots\wedge dx_k,$$
with $b_j(x)$ of bounded support, we have
$$(8.4)\qquad d\beta = (-1)^{j-1}\,\frac{\partial b_j}{\partial x_j}\, dx_1\wedge\cdots\wedge dx_k.$$
If $j > 1$, we have
$$(8.5)\qquad \int_M d\beta = (-1)^{j-1}\int\Bigl(\int_{-\infty}^\infty \frac{\partial b_j}{\partial x_j}\, dx_j\Bigr)\, dx' = 0,$$
and also $\kappa^*\beta = 0$, where $\kappa : \partial M \to \overline{M}$ is the inclusion. On the other hand, for $j = 1$, we have
$$(8.6)\qquad \int_M d\beta = \int\Bigl(\int_{-\infty}^0 \frac{\partial b_1}{\partial x_1}\, dx_1\Bigr)\, dx_2\cdots dx_k = \int b_1(0, x')\, dx' = \int_{\partial M}\beta.$$
This proves Stokes' formula (8.2).

It is useful to allow singularities in $\partial M$. We say a point $p\in\overline{M}$ is a corner of dimension $\nu$ if there is a neighborhood $U$ of $p$ in $\overline{M}$ and a $C^2$ diffeomorphism of $U$ onto a neighborhood of $0$ in
$$(8.7)\qquad K = \{x\in\mathbb{R}^k : x_j \le 0 \text{ for } 1 \le j \le k-\nu\},$$
where $k$ is the dimension of $M$. If $M$ is a $C^2$ surface and every point $p\in\partial M$ is a corner (of some dimension), we say $M$ is a $C^2$ surface with corners. In such a case, $\partial M$ is a locally finite union of $C^2$ surfaces with corners. The following result extends Proposition 8.1.

Proposition 8.2. If $M$ is a $C^2$ surface of dimension $k$, with corners, and $\beta$ is a compactly supported $(k-1)$-form of class $C^1$ on $\overline{M}$, then (8.2) holds.

Proof. It suffices to establish this when $\beta$ is supported on a small neighborhood of a corner $p\in\partial M$, of the form $U$ described above. Hence it suffices to show that (8.2) holds whenever $\beta$ is a $(k-1)$-form of class $C^1$, with compact support on $K$ in (8.7); and we can take $\beta$ to have the form (8.3). Then, for $j > k-\nu$, (8.5) still holds, while, for $j \le k-\nu$, we have, as in (8.6),
$$(8.8)\qquad \int_K d\beta = (-1)^{j-1}\int\Bigl(\int_{-\infty}^0 \frac{\partial b_j}{\partial x_j}\, dx_j\Bigr)\, dx_1\cdots\widehat{dx_j}\cdots dx_k = (-1)^{j-1}\int b_j(x_1,\dots,x_{j-1},0,x_{j+1},\dots,x_k)\, dx_1\cdots\widehat{dx_j}\cdots dx_k = \int_{\partial K}\beta.$$

The reason we required $M$ to be a surface of class $C^2$ (with corners) in Propositions 8.1 and 8.2 is the following. Due to the formulas (6.18)–(6.19) for a pull-back, if $\beta$ is of class $C^j$ and $F$ is of class $C^\ell$, then $F^*\beta$ is generally of class $C^\mu$, with $\mu = \min(j, \ell-1)$. Thus, if $j = \ell = 1$, $F^*\beta$ might be only of class $C^0$, so there is not a well-defined notion of a differential form of class $C^1$ on a $C^1$ surface, though such a notion is well defined on a $C^2$ surface. This problem can be overcome, and one can extend Propositions 8.1–8.2 to the case where $M$ is a $C^1$ surface (with corners), and $\beta$ is a $(k-1)$-form with the property that both $\beta$ and $d\beta$ are continuous. We will not go into the details. Substantially more sophisticated generalizations are given in [Fed].

Exercises

1. Consider the region $\Omega\subset\mathbb{R}^2$ defined by
$$\Omega = \{(x,y) : 0 \le y \le x^2,\ 0 \le x \le 1\}.$$
Show that the boundary points $(1,0)$ and $(1,1)$ are corners, but $(0,0)$ is not a corner. The boundary of $\Omega$ is too sharp at $(0,0)$ to be a corner; it is called a "cusp." Extend Proposition 8.2 to treat this region.

2. Suppose $U\subset\mathbb{R}^n$ is an open set with smooth boundary $M = \partial U$, and $U$ has the standard orientation, determined by $dx_1\wedge\cdots\wedge dx_n$. (See the paragraph above (6.23).) Let $\varphi\in C^1(\mathbb{R}^n)$ satisfy $\varphi(x) < 0$ for $x\in U$, $\varphi(x) > 0$ for $x\in\mathbb{R}^n\setminus\overline{U}$, and $\operatorname{grad}\varphi(x)\ne 0$ for $x\in\partial U$, so $\operatorname{grad}\varphi$ points out of $U$. Show that the natural orientation on $\partial U$, as defined just before Proposition 8.1, is the same as the following: the equivalence class of forms $\beta\in\Lambda^{n-1}(\partial U)$ defining the orientation on $\partial U$ satisfies the property that $d\varphi\wedge\beta$ is a positive multiple of $dx_1\wedge\cdots\wedge dx_n$ on $\partial U$.

3. Suppose $U = \{x\in\mathbb{R}^n : x_n < 0\}$. Show that the orientation on $\partial U$ described above is that of $(-1)^{n-1}\,dx_1\wedge\cdots\wedge dx_{n-1}$. If $V = \{x\in\mathbb{R}^n : x_n > 0\}$, what orientation does $\partial V$ inherit?

4. Extend the special case of Poincar\'e's Lemma given in Proposition 7.1 to the case where $\alpha$ is a closed 1-form on $B = \{x\in\mathbb{R}^n : |x| < 1\}$, i.e., from the case $\dim B = 2$ to higher dimensions.

9. The classical Gauss, Green, and Stokes formulas

The case of (8.2) where $S = \Omega$ is a region in $\mathbb{R}^2$ with smooth boundary yields the classical Green Theorem. In this case, we have
$$(9.1)\qquad \beta = f\,dx + g\,dy, \qquad d\beta = \Bigl(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\Bigr)\,dx\wedge dy,$$
and hence (8.2) becomes the following.

Proposition 9.1. If $\Omega$ is a region in $\mathbb{R}^2$ with smooth boundary, and $f$ and $g$ are smooth functions on $\overline{\Omega}$, which vanish outside some compact set in $\overline{\Omega}$, then
$$(9.2)\qquad \iint_\Omega \Bigl(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\Bigr)\,dx\,dy = \int_{\partial\Omega} (f\,dx + g\,dy).$$

Note that, if we have a vector field $X = X_1\,\partial/\partial x + X_2\,\partial/\partial y$ on $\Omega$, then the integrand on the left side of (9.2) is
$$(9.3)\qquad \frac{\partial X_1}{\partial x} + \frac{\partial X_2}{\partial y} = \operatorname{div} X,$$
provided $g = X_1$, $f = -X_2$. We obtain
$$(9.4)\qquad \iint_\Omega (\operatorname{div} X)\,dx\,dy = \int_{\partial\Omega} (-X_2\,dx + X_1\,dy).$$

If $\partial\Omega$ is parametrized by arc length, as $\gamma(s) = \bigl(x(s), y(s)\bigr)$, with orientation as defined for Proposition 8.1, then the unit normal $\nu$ to $\partial\Omega$, pointing out of $\Omega$, is given by $\nu(s) = \bigl(y'(s), -x'(s)\bigr)$, and (9.4) is equivalent to
$$(9.5)\qquad \iint_\Omega (\operatorname{div} X)\,dx\,dy = \int_{\partial\Omega} \langle X, \nu\rangle\,ds.$$
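Green's formula (9.2) in fact persists for smooth $f, g$ on the closed region (compare §9B). As a numerical illustration (the choice of $f$ and $g$ is ours), with $f = -x^2 y$ and $g = x y^2$ on the unit disk, $\partial g/\partial x - \partial f/\partial y = x^2 + y^2$, and both sides of (9.2) equal $\pi/2$:

```python
import math

def f(x, y):
    return -x * x * y

def g(x, y):
    return x * y * y

# boundary integral of f dx + g dy over the unit circle (midpoint rule)
n = 4000
lhs = 0.0
for k in range(n):
    t = 2 * math.pi * (k + 0.5) / n
    x, y = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t), math.cos(t)
    lhs += (f(x, y) * dx + g(x, y) * dy) * (2 * math.pi / n)

# area integral of x^2 + y^2 over the disk, in polar coordinates:
# the integrand is r^2 and the area element is r dr dtheta
nr = 2000
rhs = 0.0
for i in range(nr):
    r = (i + 0.5) / nr
    rhs += (r ** 2) * r * (1.0 / nr) * 2 * math.pi

assert abs(lhs - math.pi / 2) < 1e-6
assert abs(rhs - math.pi / 2) < 1e-6
```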

This is a special case of Gauss' Divergence Theorem. We now derive a more general form of the Divergence Theorem. We begin with a definition of the divergence of a vector field on a surface $M$.

Let $M$ be a region in $\mathbb{R}^n$, or an $n$-dimensional surface in $\mathbb{R}^{n+k}$, provided with a volume form
$$(9.6)\qquad \omega_M \in \Lambda^n M.$$
Let $X$ be a vector field on $M$. Then the divergence of $X$, denoted $\operatorname{div} X$, is a function on $M$ given by
$$(9.7)\qquad (\operatorname{div} X)\,\omega_M = d(\omega_M\lrcorner X).$$
If $M = \mathbb{R}^n$, with the standard volume element
$$(9.8)\qquad \omega = dx_1\wedge\cdots\wedge dx_n,$$
and if
$$(9.9)\qquad X = \sum_j X^j(x)\,\frac{\partial}{\partial x_j},$$
then
$$(9.10)\qquad \omega\lrcorner X = \sum_{j=1}^n (-1)^{j-1} X^j(x)\, dx_1\wedge\cdots\wedge\widehat{dx_j}\wedge\cdots\wedge dx_n.$$
Hence, in this case, (9.7) yields the familiar formula
$$(9.11)\qquad \operatorname{div} X = \sum_{j=1}^n \partial_j X^j,$$
where we use the notation
$$(9.12)\qquad \partial_j f = \frac{\partial f}{\partial x_j}.$$

Suppose now that $M$ is endowed with both an orientation and a metric tensor $g_{jk}(x)$. Then $M$ carries a natural volume element $\omega_M$, determined by the condition that, if one has an orientation-preserving coordinate system in which $g_{jk}(p_0) = \delta_{jk}$, then $\omega_M(p_0) = dx_1\wedge\cdots\wedge dx_n$. This condition produces the following formula, in any orientation-preserving coordinate system:
$$(9.13)\qquad \omega_M = \sqrt{g}\; dx_1\wedge\cdots\wedge dx_n, \qquad g = \det(g_{jk}),$$
by the same sort of calculations as done in (5.10)–(5.15).

We now compute $\operatorname{div} X$ when the volume element on $M$ is given by (9.13). We have
$$(9.14)\qquad \omega_M\lrcorner X = \sum_j (-1)^{j-1} X^j\sqrt{g}\; dx_1\wedge\cdots\wedge\widehat{dx_j}\wedge\cdots\wedge dx_n,$$
and hence
$$(9.15)\qquad d(\omega_M\lrcorner X) = \partial_j\bigl(\sqrt{g}\, X^j\bigr)\, dx_1\wedge\cdots\wedge dx_n.$$
Here, as below, we use the summation convention. Hence the formula (9.7) gives
$$(9.16)\qquad \operatorname{div} X = g^{-1/2}\,\partial_j\bigl(g^{1/2} X^j\bigr).$$
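As a sanity check on (9.16), consider polar coordinates on $\mathbb{R}^2$, where $g_{jk} = \operatorname{diag}(1, r^2)$ and $\sqrt{g} = r$. The following numerical sketch (ours; the test field and the names `Xr`, `Xt` are assumptions of the sketch) compares (9.16) in polar coordinates with the Euclidean formula (9.11):

```python
import math

h = 1e-5

# Polar components of a vector field X = Xr d/dr + Xt d/dtheta (arbitrary smooth test field)
def Xr(r, t):
    return r * r * math.sin(t) + r

def Xt(r, t):
    return math.cos(t) * (1 + r)

def div_polar(r, t):
    # (9.16) with sqrt(g) = r:  div X = r^{-1} [ d_r(r Xr) + d_theta(r Xt) ]
    dr = ((r + h) * Xr(r + h, t) - (r - h) * Xr(r - h, t)) / (2 * h)
    dt = (r * Xt(r, t + h) - r * Xt(r, t - h)) / (2 * h)
    return (dr + dt) / r

def X_cart(x, y):
    # d/dr = cos t d/dx + sin t d/dy ;  d/dtheta = -y d/dx + x d/dy
    r, t = math.hypot(x, y), math.atan2(y, x)
    xr, xt = Xr(r, t), Xt(r, t)
    return (xr * math.cos(t) - y * xt, xr * math.sin(t) + x * xt)

def div_cart(x, y):
    dx = (X_cart(x + h, y)[0] - X_cart(x - h, y)[0]) / (2 * h)
    dy = (X_cart(x, y + h)[1] - X_cart(x, y - h)[1]) / (2 * h)
    return dx + dy

x0, y0 = 0.8, 0.5
r0, t0 = math.hypot(x0, y0), math.atan2(y0, x0)
assert abs(div_polar(r0, t0) - div_cart(x0, y0)) < 1e-6
```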

We now derive the Divergence Theorem, as a consequence of Stokes' formula, which we recall is
$$(9.17)\qquad \int_M d\alpha = \int_{\partial M} \alpha,$$
for an $(n-1)$-form $\alpha$ on $\overline{M}$, assumed to be a smooth compact oriented surface with boundary. If $\alpha = \omega_M\lrcorner X$, formula (9.7) gives
$$(9.18)\qquad \int_M (\operatorname{div} X)\,\omega_M = \int_{\partial M} \omega_M\lrcorner X.$$
This is one form of the Divergence Theorem. We will produce an alternative expression for the integrand on the right before stating the result formally.

Given that $\omega_M$ is the volume form for $M$ determined by a Riemannian metric, we can write the interior product $\omega_M\lrcorner X$ in terms of the volume element $\omega_{\partial M}$ on $\partial M$, with its induced orientation and Riemannian metric, as follows. Pick coordinates on $M$, centered at $p_0\in\partial M$, such that $\partial M$ is tangent to the hyperplane $\{x_1 = 0\}$ at $p_0 = 0$ (with $M$ to the left of $\partial M$), and such that $g_{jk}(p_0) = \delta_{jk}$, so $\omega_M(p_0) = dx_1\wedge\cdots\wedge dx_n$. Consequently, $\omega_{\partial M}(p_0) = dx_2\wedge\cdots\wedge dx_n$. It follows that, at $p_0$,
$$(9.19)\qquad j^*(\omega_M\lrcorner X) = \langle X, \nu\rangle\,\omega_{\partial M},$$
where $\nu$ is the unit vector normal to $\partial M$, pointing out of $M$, and $j : \partial M \to \overline{M}$ the natural inclusion. The two sides of (9.19), which are both defined in a coordinate-independent fashion, are hence equal on $\partial M$, and the identity (9.18) becomes
$$(9.20)\qquad \int_M (\operatorname{div} X)\,\omega_M = \int_{\partial M} \langle X, \nu\rangle\,\omega_{\partial M}.$$

Finally, we adopt notation of the sort used in §§4–5. We denote the volume element on $M$ by $dV$ and that on $\partial M$ by $dS$, obtaining the Divergence Theorem:

Theorem 9.2. If $M$ is a compact surface with boundary, $X$ a smooth vector field on $\overline{M}$, then
$$(9.21)\qquad \int_M (\operatorname{div} X)\,dV = \int_{\partial M} \langle X, \nu\rangle\,dS,$$
where $\nu$ is the unit outward-pointing normal to $\partial M$.

The only point left to mention here is that $M$ need not be orientable. Indeed, we can treat the integrals in (9.21) as surface integrals, as in §5, and note that all objects in (9.21) are independent of a choice of orientation. To prove the general case, just use a partition of unity supported on orientable pieces.

We obtain some further integral identities. First, we apply (9.21) with $X$ replaced by $uX$. We have the following "derivation" identity:
$$(9.22)\qquad \operatorname{div}(uX) = u\operatorname{div} X + \langle du, X\rangle = u\operatorname{div} X + Xu,$$
which follows easily from the formula (9.16). The Divergence Theorem immediately gives
$$(9.23)\qquad \int_M (\operatorname{div} X)\,u\,dV + \int_M Xu\,dV = \int_{\partial M} \langle X, \nu\rangle\,u\,dS.$$
Replacing $u$ by $uv$ and using the derivation identity $X(uv) = (Xu)v + u(Xv)$, we have
$$(9.24)\qquad \int_M \bigl[(Xu)v + u(Xv)\bigr]\,dV = -\int_M (\operatorname{div} X)\,uv\,dV + \int_{\partial M} \langle X, \nu\rangle\,uv\,dS.$$

It is very useful to apply (9.23) to a gradient vector field $X$. If $v$ is a smooth function on $M$, $\operatorname{grad} v$ is a vector field satisfying
$$(9.25)\qquad \langle \operatorname{grad} v, Y\rangle = \langle Y, dv\rangle,$$
for any vector field $Y$, where the brackets on the left are given by the metric tensor on $M$ and those on the right by the natural pairing of vector fields and 1-forms. Hence $\operatorname{grad} v = X$ has components $X^j = g^{jk}\partial_k v$, where $(g^{jk})$ is the matrix inverse of $(g_{jk})$.

Applying $\operatorname{div}$ to $\operatorname{grad} v$, we have the Laplace operator:
$$(9.26)\qquad \Delta v = \operatorname{div}\operatorname{grad} v = g^{-1/2}\,\partial_j\bigl(g^{jk} g^{1/2}\,\partial_k v\bigr).$$
When $M$ is a region in $\mathbb{R}^n$ and we use the standard Euclidean metric, so $\operatorname{div} X$ is given by (9.11), we have the Laplace operator on Euclidean space:
$$(9.27)\qquad \Delta v = \frac{\partial^2 v}{\partial x_1^2} + \cdots + \frac{\partial^2 v}{\partial x_n^2}.$$

Now, setting $X = \operatorname{grad} v$ in (9.23), we have $Xu = \langle\operatorname{grad} u, \operatorname{grad} v\rangle$, and $\langle X, \nu\rangle = \langle\nu, \operatorname{grad} v\rangle$, which we call the normal derivative of $v$, and denote $\partial v/\partial\nu$. Hence
$$(9.28)\qquad \int_M u(\Delta v)\,dV = -\int_M \langle\operatorname{grad} u, \operatorname{grad} v\rangle\,dV + \int_{\partial M} u\,\frac{\partial v}{\partial\nu}\,dS.$$
If we interchange the roles of $u$ and $v$ and subtract, we have
$$(9.29)\qquad \int_M u(\Delta v)\,dV = \int_M (\Delta u)v\,dV + \int_{\partial M} \Bigl[u\,\frac{\partial v}{\partial\nu} - \frac{\partial u}{\partial\nu}\,v\Bigr]\,dS.$$

Formulas (9.28)–(9.29) are also called Green formulas. We will make further use of them in §10.

We return to the Green formula (9.2), and give it another formulation. Consider a vector field $Z = (f, g, h)$ on a region in $\mathbb{R}^3$ containing the planar surface $U = \{(x, y, 0) : (x, y)\in\Omega\}$. If we form
$$(9.30)\qquad \operatorname{curl} Z = \det\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial_x & \partial_y & \partial_z \\ f & g & h \end{pmatrix},$$
we see that the integrand on the left side of (9.2) is the $\mathbf{k}$-component of $\operatorname{curl} Z$, so (9.2) can be written
$$(9.31)\qquad \iint_U (\operatorname{curl} Z)\cdot \mathbf{k}\,dA = \int_{\partial U} (Z\cdot T)\,ds,$$
where $T$ is the unit tangent vector to $\partial U$. To see how to extend this result, note that $\mathbf{k}$ is a unit normal field to the planar surface $U$.

To formulate and prove the extension of (9.31) to any compact oriented surface with boundary in $\mathbb{R}^3$, we use the relation between curl and exterior derivative discussed in Exercises 2–3 of §7. In particular, if we set
$$(9.32)\qquad F = \sum_{j=1}^3 f_j(x)\,\frac{\partial}{\partial x_j}, \qquad \varphi = \sum_{j=1}^3 f_j(x)\,dx_j,$$
then $\operatorname{curl} F = \sum_1^3 g_j(x)\,\partial/\partial x_j$, where
$$(9.33)\qquad d\varphi = g_1(x)\,dx_2\wedge dx_3 + g_2(x)\,dx_3\wedge dx_1 + g_3(x)\,dx_1\wedge dx_2.$$

Now suppose $M$ is a smooth oriented $(n-1)$-dimensional surface with boundary in $\mathbb{R}^n$. Using the orientation of $M$, we pick a unit normal field $N$ to $M$ as follows. Take a smooth function $v$ which vanishes on $M$ but such that $\nabla v(x)\ne 0$ on $M$. Thus $\nabla v$ is normal to $M$. Let $\sigma\in\Lambda^{n-1}(M)$ define the orientation of $M$. Then $dv\wedge\sigma = a(x)\,dx_1\wedge\cdots\wedge dx_n$, where $a(x)$ is nonvanishing on $M$. For $x\in M$, we take $N(x) = \nabla v(x)/|\nabla v(x)|$ if $a(x) > 0$, and $N(x) = -\nabla v(x)/|\nabla v(x)|$ if $a(x) < 0$. We call $N$ the "positive" unit normal field to the oriented surface $M$, in this case. Part of the motivation for this characterization of $N$ is that, if $\Omega\subset\mathbb{R}^n$ is an open set with smooth boundary $M = \partial\Omega$, and we give $M$ the induced orientation, as described in §8, then the positive normal field $N$ just defined coincides with the unit normal field pointing out of $\Omega$. Compare Exercises 2–3 of §8.

Now, if $G = (g_1, \dots, g_n)$ is a vector field defined near $M$, then
$$(9.34)\qquad \int_M (N\cdot G)\,dS = \int_M \Bigl(\sum_{j=1}^n (-1)^{j-1} g_j(x)\,dx_1\wedge\cdots\widehat{dx_j}\cdots\wedge dx_n\Bigr).$$

This result follows from (9.19). When $n = 3$ and $G = \operatorname{curl} F$, we deduce from (9.32)–(9.33) that
$$(9.35)\qquad \iint_M d\varphi = \iint_M (N\cdot\operatorname{curl} F)\,dS.$$
Furthermore, in this case we have
$$(9.36)\qquad \int_{\partial M} \varphi = \int_{\partial M} (F\cdot T)\,ds,$$
where $T$ is the unit tangent vector to $\partial M$, specified as follows by the orientation of $\partial M$: if $\tau\in\Lambda^1(\partial M)$ defines the orientation of $\partial M$, then $\langle T, \tau\rangle > 0$ on $\partial M$. We call $T$ the "forward" unit tangent vector field to the oriented curve $\partial M$. By the calculations above, we have the classical Stokes formula:

Proposition 9.3. If $M$ is a compact oriented surface with boundary in $\mathbb{R}^3$, and $F$ is a $C^1$ vector field on a neighborhood of $\overline{M}$, then
$$(9.37)\qquad \iint_M (N\cdot\operatorname{curl} F)\,dS = \int_{\partial M} (F\cdot T)\,ds,$$
where $N$ is the positive unit normal field on $M$ and $T$ the forward unit tangent field to $\partial M$.

Exercises

1. Given a "Hamiltonian vector field"
$$H_f = \sum_{j=1}^n \Bigl[\frac{\partial f}{\partial\xi_j}\,\frac{\partial}{\partial x_j} - \frac{\partial f}{\partial x_j}\,\frac{\partial}{\partial\xi_j}\Bigr],$$
calculate $\operatorname{div} H_f$ from (9.11).

2. Show that, if $F : \mathbb{R}^3\to\mathbb{R}^3$ is a linear rotation, then, for a $C^1$ vector field $Z$ on $\mathbb{R}^3$,
$$(9.38)\qquad F_\#(\operatorname{curl} Z) = \operatorname{curl}(F_\# Z).$$

3. Let $M$ be the graph in $\mathbb{R}^3$ of a smooth function $z = u(x, y)$, $(x, y)\in\mathcal{O}\subset\mathbb{R}^2$, a bounded region with smooth boundary (maybe with corners). Show that
$$(9.39)\qquad \iint_M (\operatorname{curl} F\cdot N)\,dS = \iint_{\mathcal{O}} \Bigl[\Bigl(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\Bigr)\Bigl(-\frac{\partial u}{\partial x}\Bigr) + \Bigl(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\Bigr)\Bigl(-\frac{\partial u}{\partial y}\Bigr) + \Bigl(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\Bigr)\Bigr]\,dx\,dy,$$
where $\partial F_j/\partial x$ and $\partial F_j/\partial y$ are evaluated at $\bigl(x, y, u(x, y)\bigr)$. Show that
$$(9.40)\qquad \int_{\partial M} (F\cdot T)\,ds = \int_{\partial\mathcal{O}} \Bigl(\bar F_1 + \bar F_3\,\frac{\partial u}{\partial x}\Bigr)\,dx + \Bigl(\bar F_2 + \bar F_3\,\frac{\partial u}{\partial y}\Bigr)\,dy,$$
where $\bar F_j(x, y) = F_j\bigl(x, y, u(x, y)\bigr)$. Apply Green's Theorem, with $f = \bar F_1 + \bar F_3(\partial u/\partial x)$, $g = \bar F_2 + \bar F_3(\partial u/\partial y)$, to show that the right sides of (9.39) and (9.40) are equal, hence proving Stokes' Theorem in this case.

4. Let $M\subset\mathbb{R}^n$ be the graph of a function $x_n = u(x')$, $x' = (x_1, \dots, x_{n-1})$. If
$$\beta = \sum_{j=1}^n (-1)^{j-1} g_j(x)\,dx_1\wedge\cdots\wedge\widehat{dx_j}\wedge\cdots\wedge dx_n,$$
as in (9.34), and $\varphi(x') = \bigl(x', u(x')\bigr)$, show that
$$\varphi^*\beta = (-1)^n\Bigl[\sum_{j=1}^{n-1} g_j\bigl(x', u(x')\bigr)\,\frac{\partial u}{\partial x_j} - g_n\bigl(x', u(x')\bigr)\Bigr]\,dx_1\wedge\cdots\wedge dx_{n-1} = (-1)^{n-1}\, G\cdot(-\nabla u, 1)\,dx_1\wedge\cdots\wedge dx_{n-1},$$
where $G = (g_1, \dots, g_n)$, and verify the identity (9.34) in this case.
Hint. For the last part, recall Exercises 2–3 of §8, regarding the orientation of $M$.

5. Let $S$ be a smooth oriented 2-dimensional surface in $\mathbb{R}^3$, and $M$ an open subset of $S$, with smooth boundary; see Fig. 9.1. Let $N$ be the positive unit normal field to $S$, defined by its orientation. For $x\in\partial M$, let $\nu(x)$ be the unit vector, tangent to $M$, normal to $\partial M$, and pointing out of $M$, and let $T$ be the forward unit tangent vector field to $\partial M$. Show that, on $\partial M$,
$$N\times\nu = T, \qquad \nu\times T = N.$$

6. If $M$ is an oriented $(n-1)$-dimensional surface in $\mathbb{R}^n$, with positive unit normal field $N$, show that the volume element $\omega_M$ on $M$ is given by
$$\omega_M = \omega\lrcorner N,$$
where $\omega = dx_1\wedge\cdots\wedge dx_n$ is the standard volume form on $\mathbb{R}^n$. Deduce that the volume element on the unit sphere $S^{n-1}\subset\mathbb{R}^n$ is given by
$$\omega_{S^{n-1}} = \sum_{j=1}^n (-1)^{j-1} x_j\,dx_1\wedge\cdots\wedge\widehat{dx_j}\wedge\cdots\wedge dx_n,$$
if $S^{n-1}$ inherits the orientation as the boundary of the unit ball.

9B. Direct proof of the Divergence Theorem

Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$, with a smooth boundary $\partial\Omega$. Hence, for each $p\in\partial\Omega$, there is a neighborhood $U$ of $p$ in $\mathbb{R}^n$, a rotation of coordinate axes, and a $C^1$ function $u : O\to\mathbb{R}$, defined on an open set $O\subset\mathbb{R}^{n-1}$, such that
$$\Omega\cap U = \{x\in\mathbb{R}^n : x_n \le u(x'),\ x'\in O\}\cap U,$$
where $x = (x', x_n)$, $x' = (x_1, \dots, x_{n-1})$.

We aim to prove that, given $f\in C^1(\overline{\Omega})$ and any constant vector $e\in\mathbb{R}^n$,
$$(9.41)\qquad \int_\Omega e\cdot\nabla f(x)\,dx = \int_{\partial\Omega} (e\cdot N)\,f\,dS,$$
where $dS$ is surface measure on $\partial\Omega$ and $N(x)$ is the unit normal to $\partial\Omega$, pointing out of $\Omega$. At $x = \bigl(x', u(x')\bigr)\in\partial\Omega$, we have
$$(9.42)\qquad N = (1 + |\nabla u|^2)^{-1/2}\,(-\nabla u, 1).$$

To prove (9.41), we may as well suppose $f$ is supported in such a neighborhood $U$. Then we have
$$(9.43)\qquad \int_\Omega \frac{\partial f}{\partial x_n}\,dx = \int_O \Bigl(\int_{x_n\le u(x')} \partial_n f(x', x_n)\,dx_n\Bigr)\,dx' = \int_O f\bigl(x', u(x')\bigr)\,dx' = \int_{\partial\Omega} (e_n\cdot N)\,f\,dS.$$
The first identity in (9.43) follows from Theorem 4.9, the second identity from the Fundamental Theorem of Calculus, and the third identity from the identification
$$dS = \bigl(1 + |\nabla u|^2\bigr)^{1/2}\,dx',$$
established in (5.21). We use the standard basis $e_1, \dots, e_n$ of $\mathbb{R}^n$.

Such an argument works when $e_n$ is replaced by any constant vector $e$ with the property that we can represent $\partial\Omega\cap U$ as the graph of a function $y_n = u(y')$, with the $y_n$-axis parallel to $e$. In particular, it works for $e = e_n + ae_j$, for $1\le j\le n-1$ and for $|a|$ sufficiently small. Thus, we have
$$(9.44)\qquad \int_\Omega (e_n + ae_j)\cdot\nabla f(x)\,dx = \int_{\partial\Omega} (e_n + ae_j)\cdot N\, f\,dS.$$
If we subtract (9.43) from this and divide the result by $a$, we obtain (9.41) for $e = e_j$, for all $j$, and hence (9.41) holds in general.

Note that replacing $e$ by $e_j$ and $f$ by $f_j$ in (9.41), and summing over $1\le j\le n$, yields
$$(9.45)\qquad \int_\Omega (\operatorname{div} F)\,dx = \int_{\partial\Omega} N\cdot F\,dS,$$
for the vector field $F = (f_1, \dots, f_n)$. This is the usual statement of Gauss' Divergence Theorem, as given in Theorem 9.2 (specialized to domains in $\mathbb{R}^n$).

Reversing the argument leading from (9.2) to (9.5), we also have another proof of Green's Theorem, in the form (9.2).


10. Holomorphic functions and harmonic functions

Let $f$ be a complex-valued $C^1$ function on a region $\Omega\subset\mathbb{R}^2$. We identify $\mathbb{R}^2$ with $\mathbb{C}$, via $z = x + iy$, and write $f(z) = f(x, y)$. We say $f$ is holomorphic on $\Omega$ if it satisfies the Cauchy-Riemann equation:
$$(10.1)\qquad \frac{\partial f}{\partial x} = \frac{1}{i}\,\frac{\partial f}{\partial y}.$$
Note that $f(z) = z^k$ has this property, for any $k\in\mathbb{Z}$, via an easy calculation of the partial derivatives of $(x + iy)^k$. Thus $z^k$ is holomorphic on $\mathbb{C}$ if $k\in\mathbb{Z}^+$, and on $\mathbb{C}\setminus\{0\}$ if $k\in\mathbb{Z}^-$. Another example is the exponential function, $e^z = e^x e^{iy}$. On the other hand, $x^2 + y^2$ is not holomorphic.

The following is often useful for producing more holomorphic functions:

Lemma 10.1. If $f$ and $g$ are holomorphic on $\Omega$, so is $fg$.

Proof. We have
$$(10.2)\qquad \frac{\partial}{\partial x}(fg) = \frac{\partial f}{\partial x}\,g + f\,\frac{\partial g}{\partial x}, \qquad \frac{\partial}{\partial y}(fg) = \frac{\partial f}{\partial y}\,g + f\,\frac{\partial g}{\partial y},$$
so if $f$ and $g$ satisfy the Cauchy-Riemann equation, so does $fg$.

We next apply Green’s theorem to the line integral∫

∂Ωf dz =

∫∂Ω

f(dx + i dy). Clearly(9.2) applies to complex-valued functions, and if we set g = if, we get

(10.3)∫

∂Ω

f dz =∫∫

Ω

(i∂f

∂x− ∂f

∂y

)dx dy.

Whenever f is holomorphic, the integrand on the right side of (10.3) vanishes, so we havethe following result, known as Cauchy’s Integral Theorem:

Theorem 10.2. If f ∈ C1(Ω) is holomorphic, then

(10.4)∫

∂Ω

f(z) dz = 0.

Until further notice, we assume Ω is a bounded region in C, with smooth boundary.Using (10.4), we can establish Cauchy’s Integral Formula:

101

Theorem 10.3. If $f\in C^1(\overline{\Omega})$ is holomorphic and $z_0\in\Omega$, then
$$(10.5)\qquad f(z_0) = \frac{1}{2\pi i}\int_{\partial\Omega} \frac{f(z)}{z - z_0}\,dz.$$

Proof. Note that $g(z) = f(z)/(z - z_0)$ is holomorphic on $\Omega\setminus\{z_0\}$. Let $D_r$ be the disk of radius $r$ centered at $z_0$. Pick $r$ so small that $\overline{D}_r\subset\Omega$. Then (10.4) implies
$$(10.6)\qquad \int_{\partial\Omega} \frac{f(z)}{z - z_0}\,dz = \int_{\partial D_r} \frac{f(z)}{z - z_0}\,dz.$$
To evaluate the integral on the right, parametrize the curve $\partial D_r$ by $\gamma(\theta) = z_0 + re^{i\theta}$. Hence $dz = ire^{i\theta}\,d\theta$, so the integral on the right is equal to
$$(10.7)\qquad \int_0^{2\pi} \frac{f(z_0 + re^{i\theta})}{re^{i\theta}}\,ire^{i\theta}\,d\theta = i\int_0^{2\pi} f(z_0 + re^{i\theta})\,d\theta.$$
As $r\to 0$, this tends in the limit to $2\pi i f(z_0)$, so (10.5) is established.
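A numerical illustration (ours, with $f(z) = e^z$): evaluating the contour integral in (10.5) over a circle about $z_0$ recovers $f(z_0)$ to high accuracy:

```python
import cmath
import math

# (1 / 2 pi i) * contour integral of exp(z)/(z - z0) over |z - z0| = r
z0, r, n = 0.3 + 0.2j, 0.7, 4000
total = 0 + 0j
for k in range(n):
    t = 2 * math.pi * (k + 0.5) / n
    z = z0 + r * cmath.exp(1j * t)
    dz = 1j * r * cmath.exp(1j * t) * (2 * math.pi / n)
    total += cmath.exp(z) / (z - z0) * dz
value = total / (2j * math.pi)
assert abs(value - cmath.exp(z0)) < 1e-10
```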

Suppose $f\in C^1(\overline{\Omega})$ is holomorphic, $z_0\in D_r\subset\Omega$, where $D_r$ is the disk of radius $r$ centered at $z_0$, and suppose $z\in D_r$. Then Theorem 10.3 implies
$$(10.8)\qquad f(z) = \frac{1}{2\pi i}\int_{\partial\Omega} \frac{f(\zeta)}{(\zeta - z_0) - (z - z_0)}\,d\zeta.$$
We have the infinite series expansion
$$(10.9)\qquad \frac{1}{(\zeta - z_0) - (z - z_0)} = \frac{1}{\zeta - z_0}\sum_{n=0}^\infty \Bigl(\frac{z - z_0}{\zeta - z_0}\Bigr)^n,$$
valid as long as $|z - z_0| < |\zeta - z_0|$. Hence, given $|z - z_0| < r$, this series is uniformly convergent for $\zeta\in\partial\Omega$, and we have
$$(10.10)\qquad f(z) = \frac{1}{2\pi i}\sum_{n=0}^\infty \int_{\partial\Omega} \frac{f(\zeta)}{\zeta - z_0}\Bigl(\frac{z - z_0}{\zeta - z_0}\Bigr)^n\,d\zeta.$$
Thus, for $z\in D_r$, $f(z)$ has the convergent power series expansion
$$(10.11)\qquad f(z) = \sum_{n=0}^\infty a_n(z - z_0)^n, \qquad a_n = \frac{1}{2\pi i}\int_{\partial\Omega} \frac{f(\zeta)}{(\zeta - z_0)^{n+1}}\,d\zeta.$$
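As a numerical check of (10.11) (an illustration, with $f$ and $z_0$ chosen by us): for $f(z) = e^z$ and $z_0 = 0$, the coefficient integrals return $a_n = 1/n!$:

```python
import cmath
import math

r, m = 1.0, 4000
for n_coef in range(6):
    total = 0 + 0j
    for k in range(m):
        t = 2 * math.pi * (k + 0.5) / m
        z = r * cmath.exp(1j * t)
        dz = 1j * z * (2 * math.pi / m)      # dz = i r e^{it} dt on |z| = r
        total += cmath.exp(z) / z ** (n_coef + 1) * dz
    a_n = total / (2j * math.pi)
    assert abs(a_n - 1 / math.factorial(n_coef)) < 1e-10
```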

Note that, when (10.5) is applied to $\Omega = D_r$, the disk of radius $r$ centered at $z_0$, the computation (10.7) yields
$$(10.12)\qquad f(z_0) = \frac{1}{2\pi}\int_0^{2\pi} f(z_0 + re^{i\theta})\,d\theta = \frac{1}{\ell(\partial D_r)}\int_{\partial D_r} f(z)\,ds(z),$$
when $f$ is holomorphic and $C^1$ on $\overline{D}_r$, and $\ell(\partial D_r) = 2\pi r$ is the length of the circle $\partial D_r$. This is a mean value property, which extends to harmonic functions on domains in $\mathbb{R}^n$, as we will see below.

Note that we can write (10.1) as $(\partial_x + i\partial_y)f = 0$; applying the operator $\partial_x - i\partial_y$ to this gives
$$(10.13)\qquad \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0$$
for any holomorphic function. A general $C^2$ solution to (10.13) on a region $\Omega\subset\mathbb{R}^2$ is called a harmonic function. More generally, if $O$ is an open set in $\mathbb{R}^n$, a function $f\in C^2(O)$ is called harmonic if $\Delta f = 0$ on $O$, where, as in (9.27),
$$(10.14)\qquad \Delta f = \frac{\partial^2 f}{\partial x_1^2} + \cdots + \frac{\partial^2 f}{\partial x_n^2}.$$

Generalizing (10.12), we have the following, known as the mean value property of harmonic functions:

Proposition 10.4. Let $\Omega\subset\mathbb{R}^n$ be open, $u\in C^2(\Omega)$ be harmonic, $p\in\Omega$, and $B_R(p) = \{x\in\mathbb{R}^n : |x - p|\le R\}\subset\Omega$. Then
$$(10.15)\qquad u(p) = \frac{1}{A\bigl(\partial B_R(p)\bigr)}\int_{\partial B_R(p)} u(x)\,dS(x).$$

For the proof, set
$$(10.16)\qquad \psi(r) = \frac{1}{A(S^{n-1})}\int_{S^{n-1}} u(p + r\omega)\,dS(\omega),$$
for $0 < r\le R$. We have $\psi(R)$ equal to the right side of (10.15), while clearly $\psi(r)\to u(p)$ as $r\to 0$. Now
$$(10.17)\qquad \psi'(r) = \frac{1}{A(S^{n-1})}\int_{S^{n-1}} \omega\cdot\nabla u(p + r\omega)\,dS(\omega) = \frac{1}{A\bigl(\partial B_r(p)\bigr)}\int_{\partial B_r(p)} \frac{\partial u}{\partial\nu}\,dS(x).$$
At this point, we establish:

Lemma 10.5. If $O\subset\mathbb{R}^n$ is a bounded domain with smooth boundary and $u\in C^2(\overline{O})$ is harmonic in $O$, then
$$(10.18)\qquad \int_{\partial O} \frac{\partial u}{\partial\nu}(x)\,dS(x) = 0.$$

Proof. Apply the Green formula (9.29), with $M = O$ and $v = 1$. If $\Delta u = 0$, every integrand in (9.29) vanishes, except the one appearing in (10.18), so this integrates to zero.

It follows from this lemma that (10.17) vanishes, so $\psi(r)$ is constant. This completes the proof of (10.15).
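A numerical illustration of (10.15) in $\mathbb{R}^2$ (our example): $u(x, y) = x^2 - y^2 + 3x$ is harmonic, and its average over a circle about $p$ equals $u(p)$:

```python
import math

def u(x, y):
    return x * x - y * y + 3 * x   # harmonic: u_xx + u_yy = 2 - 2 = 0

p, R, n = (0.4, -0.9), 0.55, 2000
avg = 0.0
for k in range(n):
    t = 2 * math.pi * (k + 0.5) / n
    avg += u(p[0] + R * math.cos(t), p[1] + R * math.sin(t)) / n
assert abs(avg - u(*p)) < 1e-10
```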

We can integrate the identity (10.15), to obtain
$$(10.19)\qquad u(p) = \frac{1}{V\bigl(B_R(p)\bigr)}\int_{B_R(p)} u(x)\,dV(x),$$
where $u\in C^2\bigl(B_R(p)\bigr)$ is harmonic. This is another expression of the mean value property.

The mean value property of harmonic functions has a number of important consequences. Here we mention one result, known as Liouville's Theorem.

Proposition 10.5. If $u\in C^2(\mathbb{R}^n)$ is harmonic on all of $\mathbb{R}^n$ and bounded, then $u$ is constant.

Proof. Pick any two points $p, q\in\mathbb{R}^n$. We have, for any $r > 0$,
$$(10.20)\qquad u(p) - u(q) = \frac{1}{V\bigl(B_r(0)\bigr)}\Bigl[\int_{B_r(p)} u(x)\,dx - \int_{B_r(q)} u(x)\,dx\Bigr].$$
Note that $V\bigl(B_r(0)\bigr) = C_n r^n$, where $C_n$ is evaluated in problem 2 of §5. Thus
$$(10.21)\qquad |u(p) - u(q)| \le \frac{1}{C_n r^n}\int_{\Delta(p,q,r)} |u(x)|\,dx,$$
where
$$(10.22)\qquad \Delta(p, q, r) = B_r(p)\,\triangle\, B_r(q) = \bigl(B_r(p)\setminus B_r(q)\bigr)\cup\bigl(B_r(q)\setminus B_r(p)\bigr).$$
Note that, if $a = |p - q|$, then $\Delta(p, q, r)\subset B_{r+a}(p)\setminus B_{r-a}(p)$; hence
$$(10.23)\qquad V\bigl(\Delta(p, q, r)\bigr) \le C(p, q)\, r^{n-1}, \qquad r\ge 1.$$
It follows that, if $|u(x)|\le M$ for all $x\in\mathbb{R}^n$, then
$$(10.24)\qquad |u(p) - u(q)| \le M\, C_n^{-1}\, C(p, q)\, r^{-1}, \qquad \forall\, r\ge 1.$$
Taking $r\to\infty$, we obtain $u(p) - u(q) = 0$, so $u$ is constant.

We will now use Liouville's Theorem to prove the Fundamental Theorem of Algebra:

Theorem 10.6. If $p(z) = a_n z^n + a_{n-1}z^{n-1} + \cdots + a_1 z + a_0$ is a polynomial of degree $n\ge 1$ ($a_n\ne 0$), then $p(z)$ must vanish somewhere in $\mathbb{C}$.

Proof. Consider
$$(10.25)\qquad f(z) = \frac{1}{p(z)}.$$
If $p(z)$ does not vanish anywhere in $\mathbb{C}$, then $f(z)$ is holomorphic on all of $\mathbb{C}$. On the other hand,
$$(10.26)\qquad f(z) = \frac{1}{z^n}\cdot\frac{1}{a_n + a_{n-1}z^{-1} + \cdots + a_0 z^{-n}},$$
so
$$(10.27)\qquad |f(z)| \longrightarrow 0, \quad\text{as } |z|\to\infty.$$
Thus $f$ is bounded on $\mathbb{C}$, if $p(z)$ has no roots. By Proposition 10.5 (applied to the real and imaginary parts of $f$, which are harmonic by (10.13)), $f(z)$ must be constant, which is impossible, so $p(z)$ must have a complex root.

From the fact that every holomorphic function $f$ on $O\subset\mathbb{R}^2$ is harmonic, it follows that its real and imaginary parts are harmonic. This result has a converse. Let $u\in C^2(O)$ be harmonic. Consider the 1-form
$$(10.28)\qquad \alpha = -\frac{\partial u}{\partial y}\,dx + \frac{\partial u}{\partial x}\,dy.$$
We have $d\alpha = (\Delta u)\,dx\wedge dy$, so $\alpha$ is closed if and only if $u$ is harmonic. Now, if $O$ is diffeomorphic to a disk, it follows from Proposition 7.1 that $\alpha$ is exact on $O$ whenever it is closed, so, in such a case,
$$(10.29)\qquad \Delta u = 0 \text{ on } O \implies \exists\, v\in C^1(O) \text{ s.t. } \alpha = dv.$$
In other words,
$$(10.30)\qquad \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.$$
This is precisely the Cauchy-Riemann equation (10.1) for $f = u + iv$, so we have:

Proposition 10.7. If $O\subset\mathbb{R}^2$ is diffeomorphic to a disk and $u\in C^2(O)$ is harmonic, then $u$ is the real part of a holomorphic function on $O$.

The function $v$ (which is unique up to an additive constant) is called the harmonic conjugate of $u$.
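As an illustration (our example, not the text's): for $u = x^2 - y^2$, integrating the closed form $\alpha = -(\partial u/\partial y)\,dx + (\partial u/\partial x)\,dy$ along the two-segment path of Proposition 7.1 produces the harmonic conjugate $v = 2xy$, so that $f = u + iv = z^2$ is holomorphic:

```python
# alpha = -u_y dx + u_x dy with u = x^2 - y^2, so -u_y = 2y and u_x = 2x.

def ux(x, y):
    return 2 * x

def uy(x, y):
    return -2 * y

def v(x, y, n=4000):
    """Integrate alpha along (0,0) -> (x,0) -> (x,y) by the midpoint rule."""
    s = 0.0
    for k in range(n):                  # horizontal leg: dy = 0, integrate -uy dx
        t = (k + 0.5) * x / n
        s += -uy(t, 0.0) * (x / n)
    for k in range(n):                  # vertical leg: dx = 0, integrate ux dy
        t = (k + 0.5) * y / n
        s += ux(x, t) * (y / n)
    return s

x0, y0 = 0.7, -0.2
assert abs(v(x0, y0) - 2 * x0 * y0) < 1e-9
```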

We close this section with a brief mention of holomorphic functions on a domain $O\subset\mathbb{C}^n$. We say $f\in C^1(O)$ is holomorphic provided it satisfies
$$(10.31)\qquad \frac{\partial f}{\partial x_j} = \frac{1}{i}\,\frac{\partial f}{\partial y_j}, \qquad 1\le j\le n.$$
Suppose $z\in O$, $z = (z_1, \dots, z_n)$, and suppose $\zeta\in O$ whenever $|z - \zeta| < r$. Then, by successively applying Cauchy's integral formula (10.5) to each complex variable $z_j$, we have that
$$(10.32)\qquad f(z) = (2\pi i)^{-n}\int_{\gamma_n}\cdots\int_{\gamma_1} f(\zeta)\,(\zeta_1 - z_1)^{-1}\cdots(\zeta_n - z_n)^{-1}\,d\zeta_1\cdots d\zeta_n,$$
where $\gamma_j$ is any simple counterclockwise curve about $z_j$ in $\mathbb{C}$ with the property that $|\zeta_j - z_j| < r/\sqrt{n}$ for all $\zeta_j\in\gamma_j$.

Consequently, if $p\in\mathbb{C}^n$ and $O$ contains the "polydisc" $\overline{D} = \{z\in\mathbb{C}^n : |z - p|\le\sqrt{n}\,\delta\}$, then, for $z\in D$, the interior of $\overline{D}$, we have
$$(10.33)\qquad f(z) = (2\pi i)^{-n}\int_{C_n}\cdots\int_{C_1} f(\zeta)\bigl[(\zeta_1 - p_1) - (z_1 - p_1)\bigr]^{-1}\cdots\bigl[(\zeta_n - p_n) - (z_n - p_n)\bigr]^{-1}\,d\zeta_1\cdots d\zeta_n,$$
where $C_j = \{\zeta\in\mathbb{C} : |\zeta - p_j| = \delta\}$. Then, parallel to (10.8)–(10.11), we have
$$(10.34)\qquad f(z) = \sum_{\alpha\ge 0} c_\alpha (z - p)^\alpha,$$
for $z\in D$, where $\alpha = (\alpha_1, \dots, \alpha_n)$ is a multi-index, $(z - p)^\alpha = (z_1 - p_1)^{\alpha_1}\cdots(z_n - p_n)^{\alpha_n}$, as in (1.13), and
$$(10.35)\qquad c_\alpha = (2\pi i)^{-n}\int_{C_n}\cdots\int_{C_1} f(\zeta)\,(\zeta_1 - p_1)^{-\alpha_1-1}\cdots(\zeta_n - p_n)^{-\alpha_n-1}\,d\zeta_1\cdots d\zeta_n.$$
Thus holomorphic functions on open domains in $\mathbb{C}^n$ have convergent power series expansions.

We refer to [Ahl] and [Hil] for more material on holomorphic functions of one complex variable, and to [Kr] for material on holomorphic functions of several complex variables. A source of much information on harmonic functions is [Kel]. Further material on these subjects can also be found in [T].

Exercises

1. Look again at Exercise 4 in §1.

2. Look again at Exercises 3–5 in §2.

3. Let $O, \Omega$ be open in $\mathbb{C}$. If $f$ is holomorphic on $O$, with range in $\Omega$, and $g$ is holomorphic on $\Omega$, show that $h = g\circ f$ is holomorphic on $O$.
Hint. Chain rule.

4. If $f(x) = \varphi(|x|)$ on $\mathbb{R}^n$, show that
$$\Delta f(x) = \varphi''(|x|) + \frac{n-1}{|x|}\,\varphi'(|x|).$$
In particular, show that $|x|^{-(n-2)}$ is harmonic on $\mathbb{R}^n\setminus\{0\}$ if $n\ge 3$, and that $\log|x|$ is harmonic on $\mathbb{R}^2\setminus\{0\}$.
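As a numerical check of the $n = 3$ case (ours, not a solution of the exercise): a finite-difference Laplacian applied to $|x|^{-1}$ is nearly zero away from the origin, while for $\varphi(r) = r^2$ the displayed formula gives $\Delta f = 2 + 2(n-1) = 6$:

```python
import math

h = 1e-3

def lap3(f, p):
    """Finite-difference Laplacian in R^3 (central second differences)."""
    s = 0.0
    for i in range(3):
        q1 = list(p); q1[i] += h
        q2 = list(p); q2[i] -= h
        s += (f(q1) - 2 * f(p) + f(q2)) / (h * h)
    return s

def inv_r(p):   # |x|^{-(n-2)} with n = 3
    return 1.0 / math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)

def r_sq(p):    # phi(r) = r^2, so Delta f = 2n = 6 in R^3
    return p[0] ** 2 + p[1] ** 2 + p[2] ** 2

p = [0.6, -0.8, 1.1]
assert abs(lap3(inv_r, p)) < 1e-4
assert abs(lap3(r_sq, p) - 6.0) < 1e-6
```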

If $O, \Omega$ are open in $\mathbb{R}^n$, a smooth map $\varphi : O\to\Omega$ is said to be conformal provided the matrix function $G(x) = D\varphi(x)^t\, D\varphi(x)$ is a multiple of the identity, $G(x) = \gamma(x)I$. Recall formula (5.2).

5. Suppose $n = 2$ and $\varphi$ preserves orientation. Show that $\varphi$ (pictured as a function $\varphi : O\to\mathbb{C}$) is conformal if and only if it is holomorphic. If $\varphi$ reverses orientation, $\varphi$ is conformal $\Leftrightarrow$ $\overline{\varphi}$ is holomorphic (we say $\varphi$ is anti-holomorphic).

6. If $O$ and $\Omega$ are open in $\mathbb{R}^2$ and $u$ is harmonic on $\Omega$, show that $u\circ\varphi$ is harmonic on $O$, whenever $\varphi : O\to\Omega$ is a smooth conformal map.
Hint. Use Exercise 3.

7. Let $e^z$ be defined by
$$e^z = e^x e^{iy}, \qquad z = x + iy,\quad x, y\in\mathbb{R},$$
where $e^x$ and $e^{iy}$ are defined by (3.20), (3.25). Show that $e^z$ is holomorphic in $z\in\mathbb{C}$, and
$$e^z = \sum_{k=0}^\infty \frac{z^k}{k!}.$$

8. For $z\in\mathbb{C}$, set
$$\cos z = \tfrac12\bigl(e^{iz} + e^{-iz}\bigr), \qquad \sin z = \tfrac1{2i}\bigl(e^{iz} - e^{-iz}\bigr).$$
Show that these functions agree with the definitions of $\cos t$ and $\sin t$ given in (3.25)–(3.26), for $z = t\in\mathbb{R}$. Show that $\cos z$ and $\sin z$ are holomorphic in $z\in\mathbb{C}$.


11. The Brouwer fixed-point theorem

Here we make use of the calculus of differential forms to provide simple proofs of someimportant topological results of Brouwer. The first two results concern retractions. If Yis a subset of X, by definition a retraction of X onto Y is a map ϕ : X → Y such thatϕ(x) = x for all x ∈ Y .

Proposition 11.1. There is no smooth retraction ϕ : B → S^{n−1} of the closed unit ball B in Rn onto its boundary S^{n−1}.

In fact, it is just as easy to prove the following more general result. The approach we use is adapted from [Kan].

Proposition 11.2. If M is a compact oriented n-dimensional surface with nonempty boundary ∂M, there is no smooth retraction ϕ : M → ∂M.

Proof. Pick ω ∈ Λ^{n−1}(∂M) to be the volume form on ∂M, so ∫_{∂M} ω > 0. Now apply Stokes' theorem to β = ϕ*ω. If ϕ is a retraction, then ϕ ∘ j(x) = x, where j : ∂M → M is the natural inclusion. Hence j*ϕ*ω = ω, so we have

(11.1) ∫_{∂M} ω = ∫_M dϕ*ω.

But dϕ*ω = ϕ*dω = 0, so the integral (11.1) is zero. This is a contradiction, so there can be no retraction.

A simple consequence of this is the famous Brouwer Fixed-Point Theorem. We first present the smooth case.

Theorem 11.3. If F : B → B is a smooth map on the closed unit ball in Rn, then F has a fixed point.

Proof. We are claiming that F(x) = x for some x ∈ B. If not, define ϕ(x) to be the endpoint of the ray from F(x) to x, continued until it hits ∂B = S^{n−1}. It is clear that ϕ would be a smooth retraction, contradicting Proposition 11.1.
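The geometric step in this proof is concrete: ϕ(x) is found by solving the quadratic |F(x) + t(x − F(x))| = 1 for its root t ≥ 1. The following sketch (an assumed illustration, with hypothetical sample points x and F(x), not from the text) carries this out in R^2.

```python
import numpy as np

def retraction_point(Fx, x):
    # Solve |Fx + t*(x - Fx)|^2 = 1; the larger root has t >= 1 when |x| <= 1,
    # so the returned point lies on the unit sphere, on the ray from Fx through x.
    d = x - Fx
    a = d @ d
    b = 2.0 * (Fx @ d)
    c = Fx @ Fx - 1.0
    t = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return Fx + t * d

# Hypothetical sample data: a point x in the ball and a value F(x) != x.
x = np.array([0.2, 0.1])
Fx = np.array([-0.3, 0.4])
p = retraction_point(Fx, x)
print(np.linalg.norm(p))   # ≈ 1.0: phi(x) lies on the unit sphere
```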

Now we give the general case, using the Stone-Weierstrass theorem (discussed in Appendix E) to reduce it to Theorem 11.3.

Theorem 11.4. If G : B → B is a continuous map on the closed unit ball in Rn, then G has a fixed point.

Proof. If not, then

inf_{x∈B} |G(x) − x| = δ > 0.

The Stone-Weierstrass theorem (Appendix E) implies there exists a polynomial P such that |P(x) − G(x)| < δ/8 for all x ∈ B. Set

F(x) = (1 − δ/8) P(x).

Then F : B → B and |F(x) − G(x)| < δ/2 for all x ∈ B, so

inf_{x∈B} |F(x) − x| > δ/2.

This contradicts Theorem 11.3.


A. Metric spaces, convergence, and compactness

A metric space is a set X, together with a distance function d : X × X → [0,∞), having the properties that

(A.1) d(x, y) = 0 ⇐⇒ x = y,
      d(x, y) = d(y, x),
      d(x, y) ≤ d(x, z) + d(y, z).

The third of these properties is called the triangle inequality. An example of a metric space is the set of rational numbers Q, with d(x, y) = |x − y|. Another example is X = Rn, with

d(x, y) = √((x1 − y1)^2 + · · · + (xn − yn)^2).

If (xν) is a sequence in X, indexed by ν = 1, 2, 3, . . . , i.e., by ν ∈ Z+, one says xν → y if d(xν, y) → 0 as ν → ∞. One says (xν) is a Cauchy sequence if d(xν, xµ) → 0 as µ, ν → ∞. One says X is a complete metric space if every Cauchy sequence converges to a limit in X. Some metric spaces are not complete; for example, Q is not complete. You can take a sequence (xν) of rational numbers such that xν → √2, which is not rational. Then (xν) is Cauchy in Q, but it has no limit in Q.

If a metric space X is not complete, one can construct its completion X̄ as follows. Let an element ξ of X̄ consist of an equivalence class of Cauchy sequences in X, where we say (xν) ∼ (yν) provided d(xν, yν) → 0. We write the equivalence class containing (xν) as [xν]. If ξ = [xν] and η = [yν], we can set d(ξ, η) = lim_{ν→∞} d(xν, yν), and verify that this is well defined, and makes X̄ a complete metric space.

If the completion of Q is constructed by this process, you get R, the set of real numbers. This construction provides a good way to develop the basic theory of the real numbers.

There are a number of useful concepts related to the notion of closeness. We define some of them here. First, if p is a point in a metric space X and r ∈ (0,∞), the set

(A.2) B_r(p) = {x ∈ X : d(x, p) < r}

is called the open ball about p of radius r. Generally, a neighborhood of p ∈ X is a set containing such a ball, for some r > 0.

A set U ⊂ X is called open if it contains a neighborhood of each of its points. The complement of an open set is said to be closed. The following result characterizes closed sets.

Proposition A.1. A subset K ⊂ X of a metric space X is closed if and only if

(A.3) xj ∈ K, xj → p ∈ X =⇒ p ∈ K.

Proof. Assume K is closed, xj ∈ K, xj → p. If p ∉ K, then p ∈ X \ K, which is open, so some Bε(p) ⊂ X \ K, and d(xj, p) ≥ ε for all j. This contradiction implies p ∈ K.

Conversely, assume (A.3) holds, and let q ∈ U = X \ K. If B_{1/n}(q) is not contained in U for any n, then there exists xn ∈ K ∩ B_{1/n}(q), hence xn → q, contradicting (A.3). This completes the proof.

The following is straightforward.


Proposition A.2. If {Uα} is a family of open sets in X, then ∪α Uα is open. If {Kα} is a family of closed subsets of X, then ∩α Kα is closed.

Given S ⊂ X, we denote by S̄ (the closure of S) the smallest closed subset of X containing S, i.e., the intersection of all the closed sets Kα ⊂ X containing S. The following result is straightforward.

Proposition A.3. Given S ⊂ X, p ∈ S̄ if and only if there exist xj ∈ S such that xj → p.

Given S ⊂ X, p ∈ X, we say p is an accumulation point of S if and only if, for each ε > 0, there exists q ∈ S ∩ Bε(p), q ≠ p. It follows that p is an accumulation point of S if and only if each Bε(p), ε > 0, contains infinitely many points of S. One straightforward observation is that all points of S̄ \ S are accumulation points of S.

The interior of a set S ⊂ X is the largest open set contained in S, i.e., the union of all the open sets contained in S. Note that the complement of the interior of S is equal to the closure of X \ S.

We now turn to the notion of compactness. We say a metric space X is compact provided the following property holds:

(A) Each sequence (xk) in X has a convergent subsequence.

We will establish various properties of compact metric spaces, and provide various equivalent characterizations. For example, it is easily seen that (A) is equivalent to:

(B) Each infinite subset S ⊂ X has an accumulation point.

The following property is known as total boundedness:

Proposition A.4. If X is a compact metric space, then

(C) Given ε > 0, there exists a finite set {x1, . . . , xN} such that Bε(x1), . . . , Bε(xN) cover X.

Proof. Take ε > 0 and pick x1 ∈ X. If Bε(x1) = X, we are done. If not, pick x2 ∈ X \ Bε(x1). If Bε(x1) ∪ Bε(x2) = X, we are done. If not, pick x3 ∈ X \ [Bε(x1) ∪ Bε(x2)]. Continue, taking x_{k+1} ∈ X \ [Bε(x1) ∪ · · · ∪ Bε(xk)], if Bε(x1) ∪ · · · ∪ Bε(xk) ≠ X. Note that, for 1 ≤ i, j ≤ k,

i ≠ j =⇒ d(xi, xj) ≥ ε.

If one never covers X this way, consider S = {xj : j ∈ N}. This is an infinite set with no accumulation point, so property (B) is contradicted.

Corollary A.5. If X is a compact metric space, it has a countable dense subset.

Proof. Given ε = 2^{−n}, let Sn be a finite set of points xj such that the balls Bε(xj) cover X. Then C = ∪n Sn is a countable dense subset of X.

Here is another useful property of compact metric spaces, which will eventually be generalized even further, in (E) below.


Proposition A.6. Let X be a compact metric space. Assume K1 ⊃ K2 ⊃ K3 ⊃ · · · form a decreasing sequence of closed subsets of X. If each Kn ≠ ∅, then ∩n Kn ≠ ∅.

Proof. Pick xn ∈ Kn. If (A) holds, (xn) has a convergent subsequence, x_{n_k} → y. Since {x_{n_k} : k ≥ ℓ} ⊂ K_{n_ℓ}, which is closed, we have y ∈ ∩n Kn.

Corollary A.7. Let X be a compact metric space. Assume U1 ⊂ U2 ⊂ U3 ⊂ · · · form an increasing sequence of open subsets of X. If ∪n Un = X, then UN = X for some N.

Proof. Consider Kn = X \ Un.

The following is an important extension of Corollary A.7.

Proposition A.8. If X is a compact metric space, then it has the property:

(D) Every open cover {Uα : α ∈ A} of X has a finite subcover.

Proof. Each Uα is a union of open balls, so it suffices to show that (A) implies the following:

(D’) Every cover {Bα : α ∈ A} of X by open balls has a finite subcover.

Let C = {zj : j ∈ N} ⊂ X be a countable dense subset of X, as in Corollary A.5. Each Bα is a union of balls B_{rj}(zj), with zj ∈ C ∩ Bα, rj rational. Thus it suffices to show that

(D”) Every countable cover {Bj : j ∈ N} of X by open balls has a finite subcover.

For this, we set

Un = B1 ∪ · · · ∪ Bn

and apply Corollary A.7.

The following is a convenient alternative to property (D):

(E) If Kα ⊂ X are closed and ∩α Kα = ∅, then some finite intersection is empty.

Considering Uα = X \ Kα, we see that

(D) ⇐⇒ (E).

The following result completes Proposition A.8.

Theorem A.9. For a metric space X,

(A) ⇐⇒ (D).

Proof. By Proposition A.8, (A) ⇒ (D). To prove the converse, it will suffice to show that (E) ⇒ (B). So let S ⊂ X and assume S has no accumulation point. We claim:

Such S must be closed.


Indeed, if z ∈ S̄ and z ∉ S, then z would have to be an accumulation point of S. Say S = {xα : α ∈ A}. Set Kα = S \ {xα}. Then each Kα has no accumulation point, hence Kα ⊂ X is closed. Also ∩α Kα = ∅. Hence, if (E) holds, there exists a finite set F ⊂ A such that ∩_{α∈F} Kα = ∅. Hence S = ∪_{α∈F} {xα} is finite, so indeed (E) ⇒ (B).

Remark. So far we have that for every metric space X,

(A) ⇐⇒ (B) ⇐⇒ (D) ⇐⇒ (E) =⇒ (C).

We claim that (C) implies the other conditions if X is complete. Of course, compactness implies completeness, but (C) may hold for incomplete X, e.g., X = (0, 1) ⊂ R.

Proposition A.10. If X is a complete metric space with property (C), then X is compact.

Proof. It suffices to show that (C) ⇒ (B) if X is a complete metric space. So let S ⊂ X be an infinite set. Cover X by balls B_{1/2}(x1), . . . , B_{1/2}(xN). One of these balls contains infinitely many points of S, and so does its closure; call this closure X1 = B̄_{1/2}(y1). Now cover X by finitely many balls of radius 1/4; their intersections with X1 provide a cover of X1. One such set contains infinitely many points of S, and so does its closure X2 = B̄_{1/4}(y2) ∩ X1. Continue in this fashion, obtaining

X1 ⊃ X2 ⊃ X3 ⊃ · · · ⊃ Xk ⊃ X_{k+1} ⊃ · · · , Xj ⊂ B̄_{2^{−j}}(yj),

each containing infinitely many points of S. One sees that (yj) forms a Cauchy sequence. If X is complete, it has a limit, yj → z, and z is seen to be an accumulation point of S.

If Xj, 1 ≤ j ≤ m, is a finite collection of metric spaces, with metrics dj, we can define a Cartesian product metric space

(A.4) X = ∏_{j=1}^m Xj, d(x, y) = d1(x1, y1) + · · · + dm(xm, ym).

Another choice of metric is δ(x, y) = √(d1(x1, y1)^2 + · · · + dm(xm, ym)^2). The metrics d and δ are equivalent, i.e., there exist constants C0, C1 ∈ (0,∞) such that

(A.5) C0 δ(x, y) ≤ d(x, y) ≤ C1 δ(x, y), ∀ x, y ∈ X.

A key example is Rm, the Cartesian product of m copies of the real line R.

We describe some important classes of compact spaces.

Proposition A.11. If Xj are compact metric spaces, 1 ≤ j ≤ m, so is X = ∏_{j=1}^m Xj.

Proof. If (xν) is an infinite sequence of points in X, say xν = (x_{1ν}, . . . , x_{mν}), pick a convergent subsequence of (x_{1ν}) in X1, and consider the corresponding subsequence of (xν), which we relabel (xν). Using this, pick a convergent subsequence of (x_{2ν}) in X2. Continue. Having a subsequence such that x_{jν} → yj in Xj for each j = 1, . . . , m, we then have a convergent subsequence in X.

The following result is useful for calculus on Rn.


Proposition A.12. If K is a closed bounded subset of Rn, then K is compact.

Proof. The discussion above reduces the problem to showing that any closed interval I = [a, b] in R is compact. This compactness is a corollary of Proposition A.10. For pedagogical purposes, we redo the argument here, since in this concrete case it can be streamlined.

Suppose S is a subset of I with infinitely many elements. Divide I into 2 equal subintervals, I1 = [a, b1], I2 = [b1, b], b1 = (a + b)/2. Then either I1 or I2 must contain infinitely many elements of S. Say Ij does. Let x1 be any element of S lying in Ij. Now divide Ij in two equal pieces, Ij = Ij1 ∪ Ij2. One of these intervals (say Ijk) contains infinitely many points of S. Pick x2 ∈ Ijk to be one such point (different from x1). Then subdivide Ijk into two equal subintervals, and continue. We get an infinite sequence of distinct points xν ∈ S, and |xν − x_{ν+k}| ≤ 2^{−ν}(b − a), for k ≥ 1. Since R is complete, (xν) converges, say to y ∈ I. Any neighborhood of y contains infinitely many points in S, so we are done.

If X and Y are metric spaces, a function f : X → Y is said to be continuous provided xν → x in X implies f(xν) → f(x) in Y. An equivalent condition, which the reader is invited to verify, is

(A.6) U open in Y =⇒ f−1(U) open in X.

Proposition A.13. If X and Y are metric spaces, f : X → Y continuous, and K ⊂ X compact, then f(K) is a compact subset of Y.

Proof. If (yν) is an infinite sequence of points in f(K), pick xν ∈ K such that f(xν) = yν. If K is compact, we have a subsequence xνj → p in X, and then yνj → f(p) in Y.

If f : X → R is continuous, we say f ∈ C(X). A useful corollary of Proposition A.13 is:

Proposition A.14. If X is a compact metric space and f ∈ C(X), then f assumes a maximum and a minimum value on X.

Proof. We know from Proposition A.13 that f(X) is a compact subset of R. Hence f(X) is bounded, say f(X) ⊂ I = [a, b]. Repeatedly subdividing I into equal halves, as in the proof of Proposition A.12, at each stage throwing out intervals that do not intersect f(X), and keeping only the leftmost and rightmost interval amongst those remaining, we obtain points α ∈ f(X) and β ∈ f(X) such that f(X) ⊂ [α, β]. Then α = f(x0) for some x0 ∈ X is the minimum and β = f(x1) for some x1 ∈ X is the maximum.

At this point, the reader might take a look at the proof of the Mean Value Theorem, given in §0, which applies this result.

If S ⊂ R is a nonempty, bounded set, Proposition A.12 implies its closure S̄ is compact. The function η : S̄ → R, η(x) = x is continuous, so by Proposition A.14 it assumes a maximum and a minimum on S̄. We set

(A.7) sup S = max_{x∈S̄} x, inf S = min_{x∈S̄} x,

when S is bounded. More generally, if S ⊂ R is nonempty and bounded from above, say S ⊂ (−∞, B], we can pick A < B such that S ∩ [A, B] is nonempty, and set

(A.8) sup S = sup S ∩ [A, B].


Similarly, if S ⊂ R is nonempty and bounded from below, say S ⊂ [A,∞), we can pick B > A such that S ∩ [A, B] is nonempty, and set

(A.9) inf S = inf S ∩ [A, B].

If X is a nonempty set and f : X → R is bounded from above, we set

(A.10) sup_{x∈X} f(x) = sup f(X),

and if f : X → R is bounded from below, we set

(A.11) inf_{x∈X} f(x) = inf f(X).

If f is not bounded from above, we set sup f = +∞, and if f is not bounded from below, we set inf f = −∞.

Given a metric space X, f : X → R, and xn → x, we set

(A.11A) lim sup_{n→∞} f(xn) = lim_{n→∞} (sup_{k≥n} f(xk)),

and

(A.11B) lim inf_{n→∞} f(xn) = lim_{n→∞} (inf_{k≥n} f(xk)).

We return to the notion of continuity. A function f ∈ C(X) is said to be uniformly continuous provided that, for any ε > 0, there exists δ > 0 such that

(A.12) x, y ∈ X, d(x, y) ≤ δ =⇒ |f(x)− f(y)| ≤ ε.

An equivalent condition is that f have a modulus of continuity, i.e., a monotonic function ω : [0, 1) → [0,∞) such that δ ↘ 0 ⇒ ω(δ) ↘ 0, and such that

(A.13) x, y ∈ X, d(x, y) ≤ δ ≤ 1 =⇒ |f(x)− f(y)| ≤ ω(δ).

Not all continuous functions are uniformly continuous. For example, if X = (0, 1) ⊂ R, then f(x) = sin 1/x is continuous, but not uniformly continuous, on X. The following result is useful, for example, in the development of the Riemann integral in §1.

Proposition A.15. If X is a compact metric space and f ∈ C(X), then f is uniformly continuous.

Proof. If not, there exist xν, yν ∈ X and ε > 0 such that d(xν, yν) ≤ 2^{−ν} but

(A.14) |f(xν) − f(yν)| ≥ ε.

Taking a convergent subsequence xνj → p, we also have yνj → p. Now continuity of f at p implies f(xνj) → f(p) and f(yνj) → f(p), contradicting (A.14).

If X and Y are metric spaces, the space C(X, Y) of continuous maps f : X → Y has a natural metric structure, under some additional hypotheses. We use

(A.15) D(f, g) = sup_{x∈X} d(f(x), g(x)).

This sup exists provided f(X) and g(X) are bounded subsets of Y, where to say B ⊂ Y is bounded is to say d : B × B → [0,∞) has bounded image. In particular, this supremum exists if X is compact. The following result is useful in the proof of the fundamental local existence result for ODE, in §3.


Proposition A.16. If X is a compact metric space and Y is a complete metric space, then C(X, Y), with the metric (A.15), is complete.

Proof. That D(f, g) satisfies the conditions to define a metric on C(X, Y) is straightforward. We check completeness. Suppose (fν) is a Cauchy sequence in C(X, Y), so, as ν → ∞,

(A.16) sup_{k≥0} sup_{x∈X} d(f_{ν+k}(x), fν(x)) ≤ εν → 0.

Then in particular (fν(x)) is a Cauchy sequence in Y for each x ∈ X, so it converges, say to g(x) ∈ Y. It remains to show that g ∈ C(X, Y) and that fν → g in the metric (A.15).

In fact, taking k → ∞ in the estimate above, we have

(A.17) sup_{x∈X} d(g(x), fν(x)) ≤ εν → 0,

i.e., fν → g uniformly. It remains only to show that g is continuous. For this, let xj → x in X and fix ε > 0. Pick N so that εN < ε. Since fN is continuous, there exists J such that j ≥ J ⇒ d(fN(xj), fN(x)) < ε. Hence

j ≥ J ⇒ d(g(xj), g(x)) ≤ d(g(xj), fN(xj)) + d(fN(xj), fN(x)) + d(fN(x), g(x)) < 3ε.

This completes the proof.

In case Y = R, C(X, R) = C(X), introduced earlier in this appendix. The distance function (A.15) can be written

D(f, g) = ‖f − g‖_sup, ‖f‖_sup = sup_{x∈X} |f(x)|.

‖f‖_sup is a norm on C(X). Generally, a norm on a vector space V is an assignment f ↦ ‖f‖ ∈ [0,∞), satisfying

‖f‖ = 0 ⇔ f = 0, ‖af‖ = |a| ‖f‖, ‖f + g‖ ≤ ‖f‖ + ‖g‖,

given f, g ∈ V and a scalar a (in R or C). A vector space equipped with a norm is called a normed vector space. It is then a metric space, with distance function D(f, g) = ‖f − g‖. If the space is complete, one calls V a Banach space.

In particular, by Proposition A.16, C(X) is a Banach space, when X is a compact metric space.

We next give a couple of slightly more sophisticated results on compactness. The following extension of Proposition A.11 is a special case of Tychonov's Theorem.

Proposition A.17. If {Xj : j ∈ Z+} are compact metric spaces, so is X = ∏_{j=1}^∞ Xj.

Here, we can make X a metric space by setting

(A.18) d(x, y) = ∑_{j=1}^∞ 2^{−j} d_j(p_j(x), p_j(y)) / (1 + d_j(p_j(x), p_j(y))),


where pj : X → Xj is the projection onto the jth factor. It is easy to verify that, if xν ∈ X, then xν → y in X, as ν → ∞, if and only if, for each j, pj(xν) → pj(y) in Xj.

Proof. Following the argument in Proposition A.11, if (xν) is an infinite sequence of points in X, we obtain a nested family of subsequences

(A.19) (xν) ⊃ (x^1_ν) ⊃ (x^2_ν) ⊃ · · · ⊃ (x^j_ν) ⊃ · · ·

such that p_ℓ(x^j_ν) converges in X_ℓ, for 1 ≤ ℓ ≤ j. The next step is a diagonal construction. We set

(A.20) ξν = x^ν_ν ∈ X.

Then, for each j, after throwing away a finite number N(j) of elements, one obtains from (ξν) a subsequence of the sequence (x^j_ν) in (A.19), so p_ℓ(ξν) converges in X_ℓ for all ℓ. Hence (ξν) is a convergent subsequence of (xν).

The next result is a special case of Ascoli’s Theorem.

Proposition A.18. Let X and Y be compact metric spaces, and fix a modulus of continuity ω(δ). Then

(A.21) Cω = {f ∈ C(X, Y) : d(f(x), f(x′)) ≤ ω(d(x, x′)) ∀ x, x′ ∈ X}

is a compact subset of C(X, Y).

Proof. Let (fν) be a sequence in Cω. Let Σ be a countable dense subset of X, as in Corollary A.5. For each x ∈ Σ, (fν(x)) is a sequence in Y, which hence has a convergent subsequence. Using a diagonal construction similar to that in the proof of Proposition A.17, we obtain a subsequence (ϕν) of (fν) with the property that ϕν(x) converges in Y, for each x ∈ Σ, say

(A.22) ϕν(x) → ψ(x),

for all x ∈ Σ, where ψ : Σ → Y.

So far, we have not used (A.21). This hypothesis will now be used to show that ϕν converges uniformly on X. Pick ε > 0. Then pick δ > 0 such that ω(δ) < ε/3. Since X is compact, we can cover X by finitely many balls Bδ(xj), 1 ≤ j ≤ N, xj ∈ Σ. Pick M so large that ϕν(xj) is within ε/3 of its limit for all ν ≥ M (when 1 ≤ j ≤ N). Now, for any x ∈ X, picking ℓ ∈ {1, . . . , N} such that d(x, x_ℓ) ≤ δ, we have, for k ≥ 0, ν ≥ M,

(A.23) d(ϕ_{ν+k}(x), ϕν(x)) ≤ d(ϕ_{ν+k}(x), ϕ_{ν+k}(x_ℓ)) + d(ϕ_{ν+k}(x_ℓ), ϕν(x_ℓ)) + d(ϕν(x_ℓ), ϕν(x))
       ≤ ε/3 + ε/3 + ε/3.

Thus (ϕν(x)) is Cauchy in Y for all x ∈ X, hence convergent. Call the limit ψ(x), so we now have (A.22) for all x ∈ X. Letting k → ∞ in (A.23) we have uniform convergence of ϕν to ψ. Finally, passing to the limit ν → ∞ in

(A.24) d(ϕν(x), ϕν(x′)) ≤ ω(d(x, x′))


gives ψ ∈ Cω.

We want to restate Proposition A.18, bringing in the notion of equicontinuity. Given metric spaces X and Y, and a set of maps F ⊂ C(X, Y), we say F is equicontinuous at a point x0 ∈ X provided

(A.25) ∀ ε > 0, ∃ δ > 0 such that ∀ x ∈ X, f ∈ F,
       dX(x, x0) < δ =⇒ dY(f(x), f(x0)) < ε.

We say F is equicontinuous on X if it is equicontinuous at each point of X. We say F is uniformly equicontinuous on X provided

(A.26) ∀ ε > 0, ∃ δ > 0 such that ∀ x, x′ ∈ X, f ∈ F,
       dX(x, x′) < δ =⇒ dY(f(x), f(x′)) < ε.

Note that (A.26) is equivalent to the existence of a modulus of continuity ω such that F ⊂ Cω, given by (A.21). It is useful to record the following result.

Proposition A.19. Let X and Y be metric spaces, F ⊂ C(X, Y). Assume X is compact. Then

(A.27) F equicontinuous =⇒ F is uniformly equicontinuous.

Proof. The argument is a variant of the proof of Proposition A.15. In more detail, suppose there exist xν, x′ν ∈ X, ε > 0, and fν ∈ F such that d(xν, x′ν) ≤ 2^{−ν} but

(A.28) d(fν(xν), fν(x′ν)) ≥ ε.

Taking a convergent subsequence xνj → p ∈ X, we also have x′νj → p. Now equicontinuity of F at p implies that there exists N < ∞ such that

(A.29) d(g(xνj), g(p)) < ε/2, ∀ j ≥ N, g ∈ F,

contradicting (A.28).

Putting together Propositions A.18 and A.19 then gives the following.

Proposition A.20. Let X and Y be compact metric spaces. If F ⊂ C(X, Y) is equicontinuous on X, then it has compact closure in C(X, Y).

We next define the notion of a connected space. A metric space X is said to be connected provided that it cannot be written as the union of two disjoint nonempty open subsets. The following is a basic class of examples.


Proposition A.21. Each interval I in R is connected.

Proof. Suppose A ⊂ I is nonempty, with nonempty complement B ⊂ I, and both sets are open. Take a ∈ A, b ∈ B; we can assume a < b. Let ξ = sup{x ∈ [a, b] : x ∈ A}. This exists, as a consequence of the basic fact that R is complete.

Now we obtain a contradiction, as follows. Since A = I \ B is closed in I, ξ ∈ A. But then, since A is open, there must be a neighborhood (ξ − ε, ξ + ε) contained in A; this is not possible.

We say X is path-connected if, given any p, q ∈ X, there is a continuous map γ : [0, 1] → X such that γ(0) = p and γ(1) = q. It is an easy consequence of Proposition A.21 that X is connected whenever it is path-connected.

The next result, known as the Intermediate Value Theorem, is frequently useful.

Proposition A.22. Let X be a connected metric space and f : X → R continuous. Assume p, q ∈ X, and f(p) = a < f(q) = b. Then, given any c ∈ (a, b), there exists z ∈ X such that f(z) = c.

Proof. Under the hypotheses, A = {x ∈ X : f(x) < c} is open and contains p, while B = {x ∈ X : f(x) > c} is open and contains q. If X is connected, then A ∪ B cannot be all of X; so any point in its complement has the desired property.

Exercises

1. If X is a metric space, with distance function d, show that

|d(x, y)− d(x′, y′)| ≤ d(x, x′) + d(y, y′),

and hence

d : X × X −→ [0,∞) is continuous.

2. Let ϕ : [0,∞) → [0,∞) be a C2 function. Assume

ϕ(0) = 0, ϕ′ > 0, ϕ′′ < 0.

Prove that if d(x, y) is symmetric and satisfies the triangle inequality, so does

δ(x, y) = ϕ(d(x, y)).

Hint. Show that such ϕ satisfies ϕ(s + t) ≤ ϕ(s) + ϕ(t), for s, t ∈ R+.

3. Show that the function d(x, y) defined by (A.18) satisfies (A.1).
Hint. Consider ϕ(r) = r/(1 + r).

4. Let X be a compact metric space. Assume fj, f ∈ C(X) and

fj(x) ↗ f(x), ∀ x ∈ X.

Prove that fj → f uniformly on X. (This result is called Dini's theorem.)
Hint. For ε > 0, let Kj(ε) = {x ∈ X : f(x) − fj(x) ≥ ε}. Note that Kj(ε) ⊃ K_{j+1}(ε) ⊃ · · · .
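As a numerical illustration of Dini's theorem (an assumed example, not from the text): on X = [0, 1], the functions f_j(x) = x − x^j/j increase pointwise to f(x) = x, and the convergence is in fact uniform, with sup |f_j − f| = 1/j.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 1001)
sup_errs = []
for j in range(1, 6):
    fj = xs - xs**j / j                                   # f_j increases pointwise to f(x) = x
    assert np.all(fj <= xs - xs**(j + 1) / (j + 1) + 1e-15)  # monotone in j
    sup_errs.append(float(np.max(np.abs(fj - xs))))
print(sup_errs)   # [1.0, 0.5, 0.3333..., 0.25, 0.2], i.e. sup|f_j - f| = 1/j -> 0
```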

Given a metric space X and f : X → [−∞,∞], we say f is lower semicontinuous provided

f^{−1}((c,∞]) ⊂ X is open, ∀ c ∈ R.

We say f is upper semicontinuous provided

f^{−1}([−∞, c)) is open, ∀ c ∈ R.

5. Show that

f is lower semicontinuous ⇐⇒ f^{−1}([−∞, c]) is closed, ∀ c ∈ R,

and

f is upper semicontinuous ⇐⇒ f^{−1}([c,∞]) is closed, ∀ c ∈ R.

6. Show that

f is lower semicontinuous ⇐⇒ xn → x implies lim inf f(xn) ≥ f(x).

Show that

f is upper semicontinuous ⇐⇒ xn → x implies lim sup f(xn) ≤ f(x).

7. Given S ⊂ X, show that

χS is lower semicontinuous ⇐⇒ S is open.
χS is upper semicontinuous ⇐⇒ S is closed.

8. If X is a compact metric space, show that

f : X → R is lower semicontinuous =⇒ min f is achieved.

9. In the setting of (A.4), let

δ(x, y) = (d1(x1, y1)^2 + · · · + dm(xm, ym)^2)^{1/2}.

Show that

δ(x, y) ≤ d(x, y) ≤ √m δ(x, y).
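The inequality of Exercise 9 can be spot-checked numerically (an assumed illustration, not from the text), taking each factor metric d_j to be the absolute value on R and random points in R^m.

```python
import numpy as np

m = 5
rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.standard_normal(m), rng.standard_normal(m)
    diffs = np.abs(x - y)                      # d_j(x_j, y_j) with d_j = |.| on R
    d = float(diffs.sum())                     # the sum metric d of (A.4)
    delta = float(np.sqrt((diffs**2).sum()))   # the metric delta of Exercise 9
    assert delta - 1e-12 <= d <= np.sqrt(m) * delta + 1e-12
```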


10. Let X and Y be compact metric spaces. Show that if F ⊂ C(X, Y) is compact, then F is equicontinuous. (This is a converse to Proposition A.20.)

11. Recall that a Banach space is a complete normed linear space. Consider C1(I), where I = [0, 1], with norm

‖f‖_{C1} = sup_I |f| + sup_I |f′|.

Show that C1(I) is a Banach space.

12. Let F = {f ∈ C1(I) : ‖f‖_{C1} ≤ 1}. Show that F has compact closure in C(I). Find a function in the closure of F that is not in C1(I).


B. Partitions of unity

In the text we have made occasional use of partitions of unity, and we include some material on this topic here. We begin by defining and constructing a continuous partition of unity on a compact metric space X, subordinate to an open cover {Uj : 1 ≤ j ≤ N} of X. By definition, this is a family of continuous functions ϕj : X → R such that

(B.1) ϕj ≥ 0, supp ϕj ⊂ Uj, ∑_j ϕj = 1.

To construct such a partition of unity, we do the following. First, it can be shown that there is an open cover {Vj : 1 ≤ j ≤ N} of X and open sets Wj such that

(B.2) V̄j ⊂ Wj ⊂ W̄j ⊂ Uj.

Given this, let ψj(x) = dist(x, X \ Wj). Then ψj is continuous, supp ψj ⊂ W̄j ⊂ Uj, and ψj is strictly positive on V̄j. Hence Ψ = ∑_j ψj is continuous and strictly positive on X, and we see that

(B.3) ϕj(x) = Ψ(x)^{−1} ψj(x)

yields such a partition of unity.

We indicate how to construct the sets Vj and Wj used above, starting with V1 and W1.

Note that the set K1 = X \ (U2 ∪ · · · ∪ UN) is a compact subset of U1. Assume it is nonempty; otherwise just throw U1 out and relabel the sets Uj. Now set

V1 = {x ∈ U1 : dist(x, K1) < (1/3) dist(x, X \ U1)},

and

W1 = {x ∈ U1 : dist(x, K1) < (2/3) dist(x, X \ U1)}.

To construct V2 and W2, proceed as above, but use the cover {U2, . . . , UN, V1}. Continue until done.
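A minimal concrete sketch of formula (B.3) (an assumed example, not from the text): on X = [0, 1] with the cover U1 = [0, 0.6), U2 = (0.4, 1], choose slightly smaller intervals W1, W2 and set ψj(x) = dist(x, X \ Wj).

```python
import numpy as np

def psi(x, a, b):
    # distance from x to the complement in [0,1] of the interval W = (a, b);
    # an endpoint of X touching W contributes no complement on that side
    lo = x - a if a > 0 else np.inf
    hi = b - x if b < 1 else np.inf
    return np.maximum(np.minimum(lo, hi), 0.0)

xs = np.linspace(0.0, 1.0, 501)
psi1 = psi(xs, 0.0, 0.55)            # W1 = [0, 0.55)
psi2 = psi(xs, 0.45, 1.0)            # W2 = (0.45, 1]
Psi = psi1 + psi2                    # strictly positive, since W1, W2 cover X
phi1, phi2 = psi1 / Psi, psi2 / Psi  # formula (B.3)

assert np.all(Psi > 0)
assert np.allclose(phi1 + phi2, 1.0)
assert np.all((phi1 >= 0) & (phi2 >= 0))
```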

Given a smooth compact surface M (perhaps with boundary), covered by coordinate patches Uj (1 ≤ j ≤ N), one can construct a smooth partition of unity on M, subordinate to this cover. The main additional tool for this is the construction of a function ψ ∈ C∞_0(Rn) such that

(B.4) ψ(x) = 1 for |x| ≤ 1/2, ψ(x) = 0 for |x| ≥ 1.

One way to get this is to start with the function on R given by

(B.5) f0(x) = e^{−1/x} for x > 0, f0(x) = 0 for x ≤ 0.


It is an exercise to show that f0 ∈ C∞(R). Now the function

f1(x) = f0(x) f0(1/2 − x)

belongs to C∞(R) and is zero outside the interval [0, 1/2]. Hence the function

f2(x) = ∫_{−∞}^x f1(s) ds

belongs to C∞(R), is zero for x ≤ 0, and equals some positive constant (say C2) for x ≥ 1/2. Then

ψ(x) = (1/C2) f2(1 − |x|)

is a function on Rn with the desired properties.

With this function in hand, to construct the smooth partition of unity mentioned above is an exercise we recommend to the reader.
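A numerical sketch (not from the text) of this cutoff construction in one variable, with the integral defining f2 approximated by a midpoint rule:

```python
import numpy as np

def f0(x):
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, np.exp(-1.0 / np.where(x > 0, x, 1.0)), 0.0)

def f1(x):
    return f0(x) * f0(0.5 - x)              # zero outside [0, 1/2]

def integrate(f, a, b, n=20000):
    # composite midpoint rule, adequate for this smooth integrand
    if b <= a:
        return 0.0
    t = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
    return float(np.sum(f(t)) * (b - a) / n)

C2 = integrate(f1, 0.0, 0.5)                # f2(x) = C2 for x >= 1/2

def psi(x):
    t = min(max(1.0 - abs(x), 0.0), 0.5)
    return integrate(f1, 0.0, t) / C2       # (1/C2) f2(1 - |x|)

assert abs(psi(0.3) - 1.0) < 1e-12          # psi = 1 for |x| <= 1/2
assert psi(1.2) == 0.0                      # psi = 0 for |x| >= 1
assert 0.0 < psi(0.8) < 1.0                 # smooth transition in between
```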


C. Differential forms and the change of variable formula

The change of variable formula for one-variable integrals,

(C.1) ∫_a^t f(ϕ(x)) ϕ′(x) dx = ∫_{ϕ(a)}^{ϕ(t)} f(x) dx,

given f continuous and ϕ of class C1, is easily established, via the fundamental theorem of calculus and the chain rule. We recall how this was done in §0. If we denote the left side of (C.1) by A(t) and the right by B(t), we apply these results to get

(C.2) A′(t) = f(ϕ(t)) ϕ′(t) = B′(t),

and since A(a) = B(a) = 0, another application of the fundamental theorem of calculus (or simply the mean value theorem) gives A(t) = B(t).
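As a quick numerical sanity check of (C.1) (an assumed example, not from the text), take f(x) = cos x and ϕ(x) = x² on [a, t] = [0, 1] and compare both sides by quadrature.

```python
import math

def midpoint(g, a, b, n=20000):
    # composite midpoint rule on [a, b]
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

f = math.cos
phi = lambda x: x * x
dphi = lambda x: 2.0 * x

lhs = midpoint(lambda x: f(phi(x)) * dphi(x), 0.0, 1.0)
rhs = midpoint(f, phi(0.0), phi(1.0))    # integral of cos over [0, 1] = sin 1
assert abs(lhs - rhs) < 1e-6
assert abs(rhs - math.sin(1.0)) < 1e-6
```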

For multiple integrals, the change of variable formula takes the following form, given in Proposition 4.13:

Theorem C.1. Let O, Ω be open sets in Rn and ϕ : O → Ω be a C1 diffeomorphism. Given f continuous on Ω, with compact support, we have

(C.3) ∫_O f(ϕ(x)) |det Dϕ(x)| dx = ∫_Ω f(x) dx.

There are many variants of Theorem C.1. In particular, one wants to extend the class of functions f for which (C.3) holds, but once one has Theorem C.1 as stated, such extensions are relatively painless. See the derivation of Theorem 4.15.

Let’s face it; the proof of Theorem C.1 given in §4 was a grim affair, involving careful estimates of volumes of images of small cubes under the map ϕ and numerous pesky details. Recently, P. Lax [L] found a fresh approach to the proof of the multidimensional change of variable formula. More precisely, [L] established the following result, from which Theorem C.1 is an easy consequence.

Theorem C.2. Let ϕ : Rn → Rn be a C1 map. Assume ϕ(x) = x for |x| ≥ R. Let f be a continuous function on Rn with compact support. Then

(C.4) ∫ f(ϕ(x)) det Dϕ(x) dx = ∫ f(x) dx.

We will give a variant of the proof of [L]. One difference between this proof and that of [L] is that we use the language of differential forms.

Proof of Theorem C.2. Via standard approximation arguments, it suffices to prove this when ϕ is C2 and f ∈ C1_0(Rn), which we will assume from here on.


To begin, pick A > 0 such that f(x − Ae1) is supported in {x : |x| > R}, where e1 = (1, 0, . . . , 0). Also take A large enough that the image of {x : |x| ≤ R} under ϕ does not intersect the support of f(x − Ae1). We can set

(C.5) F(x) = f(x) − f(x − Ae1) = (∂ψ/∂x1)(x),

where

(C.6) ψ(x) = ∫_0^A f(x − se1) ds, ψ ∈ C1_0(Rn).

Then we have the following identities involving n-forms:

(C.7) α = F(x) dx1 ∧ · · · ∧ dxn = (∂ψ/∂x1) dx1 ∧ · · · ∧ dxn
        = dψ ∧ dx2 ∧ · · · ∧ dxn
        = d(ψ dx2 ∧ · · · ∧ dxn),

i.e., α = dβ, with β = ψ dx2 ∧ · · · ∧ dxn a compactly supported (n − 1)-form of class C1. Now the pull-back of α under ϕ is given by

(C.8) ϕ*α = F(ϕ(x)) det Dϕ(x) dx1 ∧ · · · ∧ dxn.

Furthermore, the right side of (C.8) is equal to

(C.9) f(ϕ(x)) det Dϕ(x) dx1 ∧ · · · ∧ dxn − f(x − Ae1) dx1 ∧ · · · ∧ dxn.

Hence we have

(C.10) ∫ f(ϕ(x)) det Dϕ(x) dx1 · · · dxn − ∫ f(x) dx1 · · · dxn = ∫ ϕ*α = ∫ ϕ*dβ = ∫ d(ϕ*β),

where we use the general identity

(C.11) ϕ*dβ = d(ϕ*β),

a consequence of the chain rule. On the other hand, a very special case of Stokes' theorem applies to

(C.12) ϕ*β = γ = ∑_j γj(x) dx1 ∧ · · · ∧ d̂xj ∧ · · · ∧ dxn (the hat indicating that the factor dxj is omitted),

with γj ∈ C1_0(Rn). Namely,

(C.13) dγ = ∑_j (−1)^{j−1} (∂γj/∂xj) dx1 ∧ · · · ∧ dxn,


and hence, by the fundamental theorem of calculus,

(C.14) ∫ dγ = 0.

This gives the desired identity (C.4), from (C.10).
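The change of variable formula can be spot-checked numerically (an assumed illustration, not from the text): take the diffeomorphism ϕ(x, y) = (x e^{y/10}, y) of R², whose Jacobian determinant is e^{y/10}, and a compactly supported bump f, and compare both sides of (C.3) by a Riemann sum.

```python
import numpy as np

def f(u, v):
    r2 = u**2 + v**2
    return np.where(r2 < 1.0, (1.0 - r2)**3, 0.0)   # C^2 bump supported in the unit disk

n = 1201
t = np.linspace(-3.0, 3.0, n)       # box containing supp f and its preimage under phi
h = t[1] - t[0]
X, Y = np.meshgrid(t, t)

jac = np.exp(Y / 10)                 # det D(phi) for phi(x, y) = (x e^{y/10}, y)
lhs = np.sum(f(X * jac, Y) * jac) * h * h
rhs = np.sum(f(X, Y)) * h * h        # exact value of the integral of f is pi/4

assert abs(lhs - rhs) < 1e-3
assert abs(rhs - np.pi / 4) < 1e-3
```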

We make some remarks on Theorem C.2. Note that ϕ is not assumed to be one-to-one or onto. In fact, as noted in [L], the identity (C.4) implies that such ϕ must be onto, and this has important topological implications. We mention that, if one puts absolute values around det Dϕ(x) in (C.4), the appropriate formula is

(C.15) ∫ f(ϕ(x)) |det Dϕ(x)| dx = ∫ f(x) n(x) dx,

where n(x) = #{y : ϕ(y) = x}. A proof of (C.15) can be found in texts on geometric measure theory.

As noted in [L], Theorem C.2 was proven in [B-D]. The proof there makes use of differential forms and Stokes' theorem, but it is quite different from the proof given here. A crucial difference is that the proof in [B-D] requires that one knows the change of variable formula as formulated in Theorem C.1.


D. Remarks on power series

If a function f is sufficiently differentiable on an interval in R containing x and y, the Taylor expansion about y reads

(D.1) f(x) = f(y) + f′(y)(x − y) + · · · + (1/n!) f^{(n)}(y)(x − y)^n + Rn(x, y).

Here, Tn(x, y) = f(y) + · · · + f^{(n)}(y)(x − y)^n/n! is that polynomial of degree n in x all of whose x-derivatives of order ≤ n, evaluated at y, coincide with those of f. This prescription makes the formula for Tn(x, y) easy to derive. The analysis of the remainder term Rn(x, y) is more subtle. One useful result about this remainder is the following. Say x > y, and for simplicity assume f^{(n+1)} is continuous on [y, x]; we say f ∈ C^{n+1}([y, x]). Then

(D.2) m ≤ f^{(n+1)}(ξ) ≤ M, ∀ ξ ∈ [y, x] =⇒ m (x − y)^{n+1}/(n + 1)! ≤ Rn(x, y) ≤ M (x − y)^{n+1}/(n + 1)!.

Under our hypotheses, this result is equivalent to the Lagrange form of the remainder:

(D.3) R_n(x, y) = (1/(n + 1)!) (x − y)^{n+1} f^(n+1)(ζ_n),

for some ζ_n between x and y. There are various proofs of (D.3). One will be given below.
One of our purposes here is to comment on how effective estimates on R_n(x, y) are in determining the convergence of the infinite series

(D.4) ∑_{k=0}^∞ (f^(k)(y)/k!) (x − y)^k

to f(x). That is to say, we want to perceive that R_n(x, y) → 0 as n → ∞, in appropriate circumstances. Before we look at how effective the estimate (D.2) is at this job, we want to introduce another player, and along the way discuss the derivation of various formulas for the remainder in (D.1).
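The bound (D.2) is easy to test numerically; the following sketch (illustrative choices, not from the text: f(x) = e^x, y = 0, n = 5, so that f^(n+1) = e^x, m = 1, and M = e^x) checks that the actual remainder lies between the two bounds.

```python
from math import exp, factorial

# Check the two-sided bound (D.2) for f(x) = e^x, y = 0.
x, n = 1.0, 5
Tn = sum(x**k / factorial(k) for k in range(n + 1))   # Taylor polynomial
Rn = exp(x) - Tn                                      # remainder R_n(x, 0)

lo = 1.0 * x**(n + 1) / factorial(n + 1)              # m (x-y)^{n+1}/(n+1)!
hi = exp(x) * x**(n + 1) / factorial(n + 1)           # M (x-y)^{n+1}/(n+1)!
assert lo <= Rn <= hi
```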

A simple formula for R_n(x, y) follows upon taking the y-derivative of both sides of (D.1); we are assuming that f is at least (n + 1)-fold differentiable. When we do this (applying the Leibniz formula to those terms that are products) an enormous amount of cancellation arises, and the formula collapses to

(D.5) ∂R_n/∂y = −(1/n!) f^(n+1)(y)(x − y)^n, R_n(x, x) = 0.

If we concentrate on R_n(x, y) as a function of y and look at the difference quotient [R_n(x, y) − R_n(x, x)]/(y − x), an immediate consequence of the mean value theorem is that

(D.6) R_n(x, y) = (1/n!) (x − y) (x − ξ_n)^n f^(n+1)(ξ_n),

for some ξ_n between x and y. This result, known as Cauchy's formula for the remainder, has a slightly more complicated appearance than (D.3), but as we will see it has advantages over Lagrange's formula. The application of the mean value theorem to obtain (D.6) does not require the continuity of f^(n+1), but we do not want to dwell on that point.

If f^(n+1) is continuous, we can apply the Fundamental Theorem of Calculus to (D.5), in the y-variable, and obtain the basic integral formula

(D.7) R_n(x, y) = (1/n!) ∫_y^x (x − s)^n f^(n+1)(s) ds.
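The integral formula (D.7) can be verified numerically; in the sketch below (illustrative choices, not from the text: f = sin, y = 0, n = 4, and a midpoint-rule quadrature) the directly computed remainder agrees with the integral.

```python
from math import sin, cos, factorial

# Numerical check of (D.7) for f = sin, y = 0:
# R_n(x, 0) = sin(x) - T_n(x) should equal (1/n!) ∫_0^x (x-s)^n f^(n+1)(s) ds.
def deriv_sin(k, s):                 # k-th derivative of sin, evaluated at s
    return [sin, cos, lambda t: -sin(t), lambda t: -cos(t)][k % 4](s)

x, n, N = 2.0, 4, 200000
Tn = sum(deriv_sin(k, 0.0) * x**k / factorial(k) for k in range(n + 1))
Rn = sin(x) - Tn

h = x / N                            # midpoint rule for the integral in (D.7)
integral = sum((x - (j + 0.5) * h)**n * deriv_sin(n + 1, (j + 0.5) * h)
               for j in range(N)) * h
assert abs(Rn - integral / factorial(n)) < 1e-8
```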

Another proof of (D.7) is indicated in Exercise 9 of §0. If we think of the integral in (D.7) as (x − y) times the mean value of the integrand, we see (D.6) as a consequence. On the other hand, if we want to bring a factor of (x − y)^{n+1} outside the integral in (D.7), the change of variable x − s = t(x − y) gives the integral formula

(D.8) R_n(x, y) = (1/n!) (x − y)^{n+1} ∫_0^1 t^n f^(n+1)(ty + (1 − t)x) dt.

If we think of this integral as 1/(n + 1) times a weighted mean value of f^(n+1), we recover the Lagrange formula (D.3).

From the Lagrange form (D.3) of the remainder in the Taylor series (D.1) we have the estimate

(D.9) |R_n(x, y)| ≤ (|x − y|^{n+1}/(n + 1)!) sup_{ζ∈I(x,y)} |f^(n+1)(ζ)|,

where I(x, y) is the open interval from x to y (either (x, y) or (y, x), disregarding the trivial case x = y). Meanwhile, from the Cauchy form (D.6) of the remainder we have the estimate

(D.10) |R_n(x, y)| ≤ (|x − y|/n!) sup_{ξ∈I(x,y)} |(x − ξ)^n f^(n+1)(ξ)|.

We now study how effective these estimates are in determining that various power series converge.

We begin with a look at these remainder estimates for the power series expansion about the origin of the simple function

(D.11) f(x) = 1/(1 − x).

We have, for x ≠ 1,

(D.12) f^(k)(x) = k! (1 − x)^{−k−1},

and formula (D.1) becomes

(D.13) 1/(1 − x) = 1 + x + · · · + x^n + R_n(x, 0).

Of course, everyone knows that the infinite series

(D.14) 1 + x + · · · + x^n + · · ·

converges to f(x) in (D.11), precisely for x ∈ (−1, 1). What we are interested in is what can be deduced from the estimate (D.9), which, for the function (D.11), takes the form

(D.15) |R_n(x, 0)| ≤ |x|^{n+1} · sup_{ζ∈I(x,0)} |1 − ζ|^{−n−2}.

We consider two cases. First, if x ≤ 0, then |1 − ζ| ≥ 1 for ζ ∈ I(x, 0), so

(D.16) x ≤ 0 =⇒ |R_n(x, 0)| ≤ |x|^{n+1}.

Thus the estimate (D.9) implies that R_n(x, 0) → 0 in (D.13), for all x ∈ (−1, 0]. Suppose however that x ≥ 0. What we have from (D.15) is

(D.17) x ≥ 0 =⇒ |R_n(x, 0)| ≤ |x|^{n+1} sup_{0≤ζ≤x} |1 − ζ|^{−n−2} = (1/(1 − x)) (x/(1 − x))^{n+1}.

This tends to 0 as n → ∞ if and only if x < 1 − x, i.e., if and only if x < 1/2. What we have is the following:

Conclusion. The estimate (D.9) implies the convergence of the Taylor series (about the origin) for the function f(x) = 1/(1 − x), only for −1 < x < 1/2.

This example points to a weakness in the estimate (D.9). Now let us see how well we can do with the estimate (D.10). For the function (D.11), this takes the form

(D.18) |R_n(x, 0)| ≤ (n + 1) |x| sup_{ξ∈I(x,0)} |x − ξ|^n / |1 − ξ|^{n+2}.

For −1 < x ≤ 0 one has an estimate like (D.16), with a harmless factor of (n + 1) thrown in. On the other hand, one readily verifies that

0 ≤ ξ ≤ x < 1 =⇒ (x − ξ)/(1 − ξ) ≤ x,

so we deduce from (D.18) that

(D.19) 0 ≤ x < 1 =⇒ |R_n(x, 0)| ≤ (n + 1) x^{n+1}/(1 − x),

which does tend to 0 for all x ∈ [0, 1).
One might be wondering if one could come up with some more complicated example, for which Cauchy's form is effective only on an interval shorter than the interval of convergence. In fact, you can't. Cauchy's form of the remainder is always effective in the interior of the interval of convergence. A proof of this, making use of some complex analysis, is given in [T3].
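The contrast between the two estimates is easy to see numerically. The sketch below (illustrative values, not from the text) evaluates the right sides of (D.17) and (D.19) for f(x) = 1/(1 − x) at x = 0.7 ∈ (1/2, 1): the Lagrange-based bound diverges, while the Cauchy-based bound tends to 0 and still dominates the true remainder.

```python
# Compare the Lagrange bound (D.17) and the Cauchy bound (D.19) for
# f(x) = 1/(1-x) at x = 0.7.
x = 0.7
lagrange = lambda n: (1 / (1 - x)) * (x / (1 - x))**(n + 1)   # RHS of (D.17)
cauchy   = lambda n: (n + 1) * x**(n + 1) / (1 - x)           # RHS of (D.19)

assert lagrange(50) > lagrange(10) > 1      # blows up: x/(1-x) = 7/3 > 1
assert cauchy(50) < cauchy(10) < 1          # tends to 0
actual = abs(1 / (1 - x) - sum(x**k for k in range(51)))      # |R_50(x, 0)|
assert actual <= cauchy(50)
```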

We look at some more power series, and see when convergence can be established at an endpoint of an interval of convergence, using the estimate (D.10), i.e.,

(D.20) |R_n(x, y)| ≤ C_n(x, y), C_n(x, y) = (|x − y|/n!) sup_{ξ∈I(x,y)} |(x − ξ)^n f^(n+1)(ξ)|.

We consider the following family of examples:

(D.21) f(x) = (1 − x)^a, a > 0.

The power series expansion has radius of convergence 1 (if a is not an integer) and, as we will see, one has convergence at both endpoints, +1 and −1, whenever a > 0. Let us see when C_n(±1, 0) → 0. We have

(D.22) f^(n+1)(x) = (−1)^{n+1} a(a − 1) · · · (a − n) (1 − x)^{a−n−1}.

Hence

(D.23) C_n(−1, 0) = |a(a − 1) · · · (a − n)/n!| sup_{−1<ξ<0} |−1 − ξ|^n/|1 − ξ|^{n+1−a} = |a(1 − a)(1 − a/2) · · · (1 − a/n)| = O(n^{−a}),

as one can see by applying the log, and using log(1 − a/k) ≤ −a/k for k > a. (Compare the proof of Proposition D.1.) Hence C_n(−1, 0) → 0 as n → ∞, whenever a > 0 in (D.21). On the other hand,

(D.24) C_n(1, 0) = |a(a − 1) · · · (a − n)/n!| sup_{0<ξ<1} (1 − ξ)^{a−1}.

If a ∈ (0, 1), this is identically +∞, while if a ≥ 1 it is O(n^{−a}), as above.

Conclusion. The estimate (D.20) is successful at establishing the convergence of the Taylor series (about the origin) for the function f(x) = (1 − x)^a, at x = −1, whenever a > 0. It fails to establish the convergence at x = +1, when 0 < a < 1, but it is successful when a ≥ 1.

We mention that convergence for x ∈ (−1, 1) is easily checked for the power series of the functions (D.21). The failure of (D.20) to establish convergence at x = +1 does not imply failure of such convergence. In fact we have the following result, which will be useful in Appendix E.

Proposition D.1. Given a > 0, the Taylor series about the origin for the function f(x) = (1 − x)^a converges absolutely and uniformly to f(x) on the closed interval [−1, 1].

Proof. As noted, the series is

(D.25) ∑_{n=0}^∞ c_n x^n, c_n = (−1)^n a(a − 1) · · · (a − n + 1)/n!,

and, by an analysis parallel to (D.23),

(D.26) |c_n| ≤ C n^{−1−a}.

In more detail, if n − 1 > a,

(D.26A) c_n = −(a/n) ∏_{1≤k≤a} (1 − a/k) ∏_{a<k≤n−1} (1 − a/k),

which we can write as c_n = (A/n) b_n, where b_n denotes the last product in (D.26A). Then

log b_n ≤ − ∑_{a<k≤n−1} a/k ≤ −a log n + β,

so

b_n ≤ e^{−a log n + β} = γ n^{−a},

giving (D.26).
Since the right side of (D.26) is summable (by the integral test) whenever a > 0, we see that the series (D.25) does converge absolutely and uniformly on [−1, 1]; so its limit is a continuous function f_a on [−1, 1]. The remark above has shown that f_a(x) = (1 − x)^a for x ∈ [−1, 1); by continuity this identity also holds at x = 1.
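The coefficient recursion implicit in (D.25) makes Proposition D.1 easy to probe numerically; the helper below is an ad hoc sketch (a = 1/2, 20000 terms), not from the text, checking the endpoint values (1 − (−1))^{1/2} = √2 and (1 − 1)^{1/2} = 0.

```python
# Partial sums of the Taylor series (D.25) for (1-x)^a at x = ±1, for a = 1/2.
a = 0.5

def partial_sum(x, N):
    s, c = 0.0, 1.0                  # c = c_n, updated via c_{n+1}/c_n
    for n in range(N):
        s += c * x**n
        c *= -(a - n) / (n + 1)      # c_{n+1} = -c_n (a - n)/(n + 1)
    return s

assert abs(partial_sum(-1, 20000) - 2**a) < 1e-2
assert abs(partial_sum(1, 20000) - 0.0) < 1e-2
```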

The material above has emphasized the study of the expansion (D.1) for infinitely smooth f, concentrating on the issue of convergence as n → ∞. The behavior for fixed n (e.g., n = 2) as x → y is also of great interest, and in this connection it is important to note that (D.1) holds, with a useful formula for R_n(x, y), when f is merely C^n, not necessarily C^{n+1}. So suppose f ∈ C^n, i.e., f, f′, . . . , f^(n) are continuous on an interval I about y. Then the result (D.7) holds, with n replaced by n − 1; i.e., for x ∈ I we have

(D.27) f(x) = f(y) + f′(y)(x − y) + · · · + (1/(n − 1)!) f^(n−1)(y)(x − y)^{n−1} + R_{n−1}(x, y),

with

(D.28) R_{n−1}(x, y) = (1/(n − 1)!) ∫_y^x (x − s)^{n−1} f^(n)(s) ds.

Now we can add and subtract f^(n)(y) to the factor f^(n)(s) in the integrand above, and obtain

(D.29) R_{n−1}(x, y) = (1/n!) f^(n)(y)(x − y)^n + (1/(n − 1)!) ∫_y^x (x − s)^{n−1} [f^(n)(s) − f^(n)(y)] ds.

This establishes the following.


Proposition D.2. Assume f has n continuous derivatives on an interval I containing y. Then, for x ∈ I, the formula (D.1) holds, with

(D.30) R_n(x, y) = (1/(n − 1)!) ∫_y^x (x − s)^{n−1} [f^(n)(s) − f^(n)(y)] ds.

Note that since the integral in (D.30) equals x − y times the value of the integrand at some point s = ξ between x and y, we can write a "Cauchy form" of the remainder (D.30) as

(D.31) R_n(x, y) = (1/(n − 1)!) [f^(n)(ξ) − f^(n)(y)] (x − ξ)^{n−1} (x − y).

Alternatively, parallel to (D.8), we can write

(D.32) R_n(x, y) = ((x − y)^n/(n − 1)!) ∫_0^1 [f^(n)(sx + (1 − s)y) − f^(n)(y)] (1 − s)^{n−1} ds,

and obtain a “Lagrange form”:

(D.33) R_n(x, y) = ((x − y)^n/n!) [f^(n)(ζ) − f^(n)(y)],

for some ζ between x and y. Note that (D.33) also follows by replacing n by n−1 in (D.3).


E. The Weierstrass theorem and the Stone-Weierstrass theorem

The following result of Weierstrass is a very useful tool in analysis.

Theorem E.1. Given a compact interval I, any continuous function f on I is a uniform limit of polynomials.

Otherwise stated, our goal is to prove that the space C(I) of continuous (real valued) functions on I is equal to P(I), the uniform closure in C(I) of the space of polynomials. Our starting point will be the result that the power series for (1 − x)^a converges uniformly on [−1, 1], for any a > 0. This was established in §D, and we will use it, with a = 1/2.

From the identity x^{1/2} = (1 − (1 − x))^{1/2}, we have x^{1/2} ∈ P([0, 2]). More to the point, from the identity

(E.1) |x| = (1 − (1 − x^2))^{1/2},

we have |x| ∈ P([−√2, √2]). Using |x| = b^{−1}|bx|, for any b > 0, we see that |x| ∈ P(I) for any interval I = [−c, c], and also for any closed subinterval, hence for any compact interval I. By translation, we have

(E.2) |x− a| ∈ P(I)

for any compact interval I. Using the identities

(E.3) max(x, y) = (1/2)(x + y) + (1/2)|x − y|, min(x, y) = (1/2)(x + y) − (1/2)|x − y|,

we see that for any a ∈ R and any compact I,

(E.4) max(x, a), min(x, a) ∈ P(I).

We next note that P(I) is an algebra of functions, i.e.,

(E.5) f, g ∈ P(I), c ∈ R =⇒ f + g, fg, cf ∈ P(I).

Using this, one sees that, given f ∈ P(I), with range in a compact interval J, one has h ◦ f ∈ P(I) for all h ∈ P(J). Hence f ∈ P(I) ⇒ |f| ∈ P(I), and, via (E.3), we deduce that

(E.6) f, g ∈ P(I) =⇒ max(f, g), min(f, g) ∈ P(I).

Suppose now that I′ = [a′, b′] is a subinterval of I = [a, b]. With the notation x_+ = max(x, 0), we have

(E.7) f_{II′}(x) = min((x − a′)_+, (b′ − x)_+) ∈ P(I).


This is a piecewise linear function, equal to zero on I \ I′, with slope 1 from a′ to the midpoint m′ of I′, and slope −1 from m′ to b′.

Now if I is divided into N equal subintervals, any continuous function on I that is linear on each such subinterval can be written as a linear combination of such "tent functions," so it belongs to P(I). Finally, any f ∈ C(I) can be uniformly approximated by such piecewise linear functions, so we have f ∈ P(I), proving the theorem.
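The first step of this proof, the polynomial approximation of |x| via (E.1) and the binomial series of §D, can be illustrated numerically (the helper name and truncation level below are ad hoc choices, not from the text):

```python
# Partial sums of the binomial series for (1 - t)^{1/2}, evaluated at
# t = 1 - x^2, are polynomials in x converging uniformly to |x| on [-1, 1].
def sqrt_series(t, N):               # partial sum of (1-t)^{1/2} about t = 0
    s, c = 0.0, 1.0
    for n in range(N):
        s += c * t**n
        c *= -(0.5 - n) / (n + 1)
    return s

xs = [k / 100 - 1 for k in range(201)]                    # grid on [-1, 1]
err = max(abs(sqrt_series(1 - x * x, 4000) - abs(x)) for x in xs)
assert err < 0.02                    # uniform error is small, worst near x = 0
```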

A far reaching extension of Theorem E.1, due to M. Stone, is the following result, known as the Stone-Weierstrass theorem.

Theorem E.2. Let X be a compact metric space, A a subalgebra of C_R(X), the algebra of real valued continuous functions on X. Suppose 1 ∈ A and that A separates points of X, i.e., for distinct p, q ∈ X, there exists h_{pq} ∈ A with h_{pq}(p) ≠ h_{pq}(q). Then the closure Ā is equal to C_R(X).

We present the proof in eight steps.

Step 1. By Theorem E.1, if f ∈ Ā and ϕ : R → R is continuous, then ϕ ◦ f ∈ Ā.

Step 2. Consequently, if f_j ∈ Ā, then

(E.8) max(f_1, f_2) = (1/2)|f_1 − f_2| + (1/2)(f_1 + f_2) ∈ Ā,

and similarly min(f_1, f_2) ∈ Ā.

Step 3. It follows from the hypotheses that if p, q ∈ X and p ≠ q, then there exists f_{pq} ∈ A, equal to 1 at p and to 0 at q.

Step 4. Apply an appropriate continuous ϕ : R → R to get g_{pq} = ϕ ◦ f_{pq} ∈ Ā, equal to 1 on a neighborhood of p and to 0 on a neighborhood of q, and satisfying 0 ≤ g_{pq} ≤ 1 on X.

Step 5. Fix p ∈ X and let U be an open neighborhood of p. By Step 4, given q ∈ X \ U, there exists g_{pq} ∈ Ā such that g_{pq} = 1 on a neighborhood O_q of p, equal to 0 on a neighborhood Ω_q of q, satisfying 0 ≤ g_{pq} ≤ 1 on X.
Now {Ω_q} is an open cover of X \ U, so there exists a finite subcover Ω_{q_1}, . . . , Ω_{q_N}. Let

(E.9) g_{pU} = min_{1≤j≤N} g_{pq_j} ∈ Ā.

Then g_{pU} = 1 on O = ∩_1^N O_{q_j}, an open neighborhood of p, g_{pU} = 0 on X \ U, and 0 ≤ g_{pU} ≤ 1 on X.

Step 6. Take K ⊂ U ⊂ X, K closed, U open. By Step 5, for each p ∈ K, there exists g_{pU} ∈ Ā, equal to 1 on a neighborhood O_p of p, and equal to 0 on X \ U.
Now {O_p} covers K, so there exists a finite subcover O_{p_1}, . . . , O_{p_M}. Let

(E.10) g_{KU} = max_{1≤j≤M} g_{p_j U} ∈ Ā.

We have

(E.11) g_{KU} = 1 on K, 0 on X \ U, and 0 ≤ g_{KU} ≤ 1 on X.

Step 7. Take f ∈ C_R(X) such that 0 ≤ f ≤ 1 on X. Fix k ∈ N and set

(E.12) K_ℓ = {x ∈ X : f(x) ≥ ℓ/k},

so X = K_0 ⊃ · · · ⊃ K_ℓ ⊃ K_{ℓ+1} ⊃ · · · ⊃ K_k ⊃ K_{k+1} = ∅. Define open U_ℓ ⊃ K_ℓ by

(E.13) U_ℓ = {x ∈ X : f(x) > (ℓ − 1)/k}, so X \ U_ℓ = {x ∈ X : f(x) ≤ (ℓ − 1)/k}.

By Step 6, there exist ψ_ℓ ∈ Ā such that

(E.14) ψ_ℓ = 1 on K_ℓ, ψ_ℓ = 0 on X \ U_ℓ, and 0 ≤ ψ_ℓ ≤ 1 on X.

Let

(E.15) f_k = max_{0≤ℓ≤k} (ℓ/k) ψ_ℓ ∈ Ā.

It follows that f_k ≥ ℓ/k on K_ℓ and f_k ≤ (ℓ − 1)/k on X \ U_ℓ, for all ℓ. Hence f_k ≥ (ℓ − 1)/k on K_{ℓ−1} and f_k ≤ ℓ/k on U_{ℓ+1}. In other words,

(E.16) (ℓ − 1)/k ≤ f(x) ≤ ℓ/k =⇒ (ℓ − 1)/k ≤ f_k(x) ≤ ℓ/k,

so

(E.17) |f(x) − f_k(x)| ≤ 1/k, ∀ x ∈ X.

Step 8. It follows from Step 7 that if f ∈ C_R(X) and 0 ≤ f ≤ 1 on X, then f ∈ Ā. It is an easy final step to see that f ∈ C_R(X) ⇒ f ∈ Ā.

Theorem E.2 has a complex analogue. In that case, we add the assumption that f ∈ A ⇒ f̄ ∈ A, and conclude that Ā = C(X). This is easily reduced to the real case.

Here are a couple of applications of Theorem E.2, in its real and complex forms:

Corollary E.3. If X is a compact subset of R^n, then every f ∈ C(X) is a uniform limit of polynomials on R^n.

Corollary E.4. The space of trigonometric polynomials, given by

(E.18) ∑_{k=−N}^N a_k e^{ikθ},

is dense in C(S1).


F. Convolution approach to the Weierstrass approximation theorem

If u is bounded and continuous on R and f is integrable (say f ∈ R(R)) we define the convolution f ∗ u by

(F.1) f ∗ u(x) = ∫_{−∞}^∞ f(y) u(x − y) dy.

Clearly

(F.2) ∫ |f| dx = A, |u| ≤ M on R =⇒ |f ∗ u| ≤ AM on R.

Also, a change of variable gives

(F.3) f ∗ u(x) = ∫_{−∞}^∞ f(x − y) u(y) dy.

We want to analyze the convolution action of a sequence of integrable functions f_n on R that satisfy the following conditions:

(F.4) f_n ≥ 0, ∫ f_n dx = 1, ∫_{R\I_n} f_n dx = ε_n → 0,

where

(F.5) I_n = [−δ_n, δ_n], δ_n → 0.

Let u ∈ C(R) be supported on a bounded interval [−A, A], or more generally, assume

(F.6) u ∈ C(R), |u| ≤ M on R,

and u is uniformly continuous on R, so with δ_n as in (F.5),

(F.7) |x − x′| ≤ 2δ_n =⇒ |u(x) − u(x′)| ≤ ε_n → 0.

We aim to prove the following.

Proposition F.1. If f_n ∈ R(R) satisfy (F.4)–(F.5) and if u ∈ C(R) is bounded and uniformly continuous (satisfying (F.6)–(F.7)), then

(F.8) u_n = f_n ∗ u −→ u, uniformly on R, as n → ∞.


Proof. To start, write

(F.9) u_n(x) = ∫ f_n(y) u(x − y) dy = ∫_{I_n} f_n(y) u(x − y) dy + ∫_{R\I_n} f_n(y) u(x − y) dy = v_n(x) + r_n(x).

Clearly

(F.10) |r_n(x)| ≤ M ε_n, ∀ x ∈ R.

Next,

(F.11) v_n(x) − u(x) = ∫_{I_n} f_n(y)[u(x − y) − u(x)] dy − ε_n u(x),

so

(F.12) |v_n(x) − u(x)| ≤ ε_n + M ε_n, ∀ x ∈ R,

hence

(F.13) |u_n(x) − u(x)| ≤ ε_n + 2M ε_n, ∀ x ∈ R,

yielding (F.8).

Here is a sequence of functions (fn) satisfying (F.4)–(F.5). First, set

(F.14) g_n(x) = (1/A_n)(x^2 − 1)^n, A_n = ∫_{−1}^1 (x^2 − 1)^n dx,

and then set

(F.15) f_n(x) = g_n(x) for |x| ≤ 1, 0 for |x| ≥ 1.

It is readily verified that such (f_n) satisfy (F.4)–(F.5). We will use this sequence in Proposition F.1 to prove the Weierstrass approximation theorem, whose statement we recall:

Proposition F.2. If I = [a, b] ⊂ R is a closed, bounded interval, and f ∈ C(I), then there exist polynomials p_n(x) such that

(F.16) pn −→ f, uniformly on I.


Proof. To start, we note that by an affine change of variable, there is no loss of generality in assuming that

(F.17) I = [−1/4, 1/4].

Next, given I as in (F.17) and f ∈ C(I), it is easy to extend f to a function

(F.18) u ∈ C(R), u(x) = 0 for |x| ≥ 1/2.

Now, with f_n as in (F.14)–(F.15), we can apply Proposition F.1 to deduce that

(F.19) u_n(x) = ∫ f_n(y) u(x − y) dy =⇒ u_n → u uniformly on R.

Now

(F.20) |x| ≤ 1/2 =⇒ u(x − y) = 0 for |y| > 1 =⇒ u_n(x) = ∫ g_n(y) u(x − y) dy,

that is,

(F.21) |x| ≤ 1/2 =⇒ u_n(x) = p_n(x),

where

(F.22) p_n(x) = ∫ g_n(y) u(x − y) dy = ∫ g_n(x − y) u(y) dy.

The last identity makes it clear that each p_n(x) is a polynomial in x. Since (F.19) and (F.21) imply

(F.23) p_n −→ u uniformly on [−1/2, 1/2],

we have (F.16).
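The whole argument can be imitated numerically; the sketch below (an ad hoc grid-based quadrature, with u a tent function vanishing for |x| ≥ 1/2, all illustrative choices not from the text) convolves u with the kernel of (F.14)–(F.15) and watches the uniform error on [−1/4, 1/4] shrink as n grows.

```python
import numpy as np

def u(x):                                    # tent function, zero for |x| >= 1/2
    return np.maximum(0.0, 0.5 - np.abs(x))

y = np.linspace(-1.0, 1.0, 4001)             # quadrature grid on supp f_n
dy = y[1] - y[0]

def sup_error(n):                            # n even, so (y^2 - 1)^n >= 0
    g = (y**2 - 1.0)**n
    g = g / (np.sum(g) * dy)                 # normalize: discrete ∫ f_n = 1
    xs = np.linspace(-0.25, 0.25, 101)
    un = np.array([np.sum(g * u(x - y)) * dy for x in xs])   # (f_n * u)(x)
    return float(np.max(np.abs(un - u(xs))))

e2, e20, e200 = sup_error(2), sup_error(20), sup_error(200)
assert e200 < e20 < e2                       # uniform error decreases with n
assert e200 < 0.06
```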


G. Fourier series – first steps

We work on T^1 = R/(2πZ), which under θ ↦ e^{iθ} is equivalent to S^1 = {z ∈ C : |z| = 1}. Given f ∈ C(T^1), or more generally f ∈ R(T^1), we set, for k ∈ Z,

(G.1) f̂(k) = (1/2π) ∫_0^{2π} f(θ) e^{−ikθ} dθ.

We call f̂(k) the Fourier coefficients of f. We say

(G.2) f ∈ A(T^1) ⇐⇒ ∑_{k=−∞}^∞ |f̂(k)| < ∞.

We aim to prove the following.

Proposition G.1. Given f ∈ C(T^1), if f ∈ A(T^1), then

(G.3) f(θ) = ∑_{k=−∞}^∞ f̂(k) e^{ikθ}.

Proof. Given ∑ |f̂(k)| < ∞, the right side of (G.3) is absolutely and uniformly convergent, defining

(G.4) g(θ) = ∑_{k=−∞}^∞ f̂(k) e^{ikθ}, g ∈ C(T^1),

and our task is to show that f ≡ g. Making use of the identities

(G.5) (1/2π) ∫_0^{2π} e^{iℓθ} dθ = 0 if ℓ ≠ 0, and 1 if ℓ = 0,

we get f̂(k) = ĝ(k), for all k ∈ Z. Let us set u = f − g. We have

(G.6) u ∈ C(T^1), û(k) = 0, ∀ k ∈ Z.

It remains to show that this implies u ≡ 0. To prove this, we use Corollary E.4, which implies that, for each v ∈ C(T^1), there exist trigonometric polynomials, i.e., finite linear combinations v_N of {e^{ikθ} : k ∈ Z}, such that

(G.7) v_N −→ v uniformly on T^1.


Now (G.6) implies

∫_{T^1} u(θ) \overline{v_N(θ)} dθ = 0, ∀ N,

and passing to the limit, using (G.7), gives

(G.8) ∫_{T^1} u(θ) \overline{v(θ)} dθ = 0, ∀ v ∈ C(T^1).

Taking v = u gives

(G.9) ∫_{T^1} |u(θ)|^2 dθ = 0,

forcing u ≡ 0, and completing the proof.

Remark. One can extend Proposition G.1 slightly, allowing a priori that f ∈ R(T^1) and f ∈ A(T^1). In such a case, (G.4)–(G.8) continue to hold, and it can be deduced from (G.8) that u(θ) = 0, except perhaps on a set of upper content 0. Hence (G.3) holds, except on a set of upper content 0. However, this is not an extension worthy of emphasizing. It just means that f can be altered on a set of upper content 0 to be continuous.

We seek conditions on f that imply (G.2). Integration by parts for f ∈ C^1(T^1) gives, for k ≠ 0,

(G.10) f̂(k) = (1/2π) ∫_0^{2π} f(θ) (i/k) ∂_θ(e^{−ikθ}) dθ = (1/2πik) ∫_0^{2π} f′(θ) e^{−ikθ} dθ,

hence

(G.11) |f̂(k)| ≤ (1/2π|k|) ∫_0^{2π} |f′(θ)| dθ.

If f ∈ C^2(T^1), we can integrate by parts a second time, and get

(G.12) f̂(k) = −(1/2πk^2) ∫_0^{2π} f′′(θ) e^{−ikθ} dθ,

hence

|f̂(k)| ≤ (1/2πk^2) ∫_0^{2π} |f′′(θ)| dθ.


In concert with

(G.13) |f̂(k)| ≤ (1/2π) ∫_0^{2π} |f(θ)| dθ,

which follows from (G.1), we have

(G.14) |f̂(k)| ≤ (1/(2π(k^2 + 1))) ∫_0^{2π} [|f′′(θ)| + |f(θ)|] dθ.

Hence

(G.15) f ∈ C^2(T^1) =⇒ ∑ |f̂(k)| < ∞.

We will sharpen this implication below. We start with an interesting example. Consider

(G.16) f(θ) = |θ|, −π ≤ θ ≤ π,

and extend this to be periodic of period 2π, yielding f ∈ C(T1). We have

(G.17) f̂(k) = (1/2π) ∫_{−π}^π |θ| e^{−ikθ} dθ = −[1 − (−1)^k] (1/πk^2),

for k ≠ 0, while f̂(0) = π/2. This is clearly a summable series, so f ∈ A(T^1), and Proposition G.1 implies that, for −π ≤ θ ≤ π,

(G.18) |θ| = π/2 − ∑_{k odd} (2/πk^2) e^{ikθ} = π/2 − (4/π) ∑_{ℓ=0}^∞ (1/(2ℓ + 1)^2) cos(2ℓ + 1)θ.

Now, evaluating this at θ = 0 yields the identity

(G.19) ∑_{ℓ=0}^∞ 1/(2ℓ + 1)^2 = π^2/8.

Using this, we can evaluate

(G.20) S = ∑_{k=1}^∞ 1/k^2,


as follows. We have

(G.21) ∑_{k=1}^∞ 1/k^2 = ∑_{k≥1 odd} 1/k^2 + ∑_{k≥2 even} 1/k^2 = π^2/8 + (1/4) ∑_{ℓ=1}^∞ 1/ℓ^2,

hence S − S/4 = π^2/8, so

(G.22) ∑_{k=1}^∞ 1/k^2 = π^2/6.

We see from (G.17) that if f is given by (G.16), then f̂(k) satisfies

(G.23) |f̂(k)| ≤ C/(k^2 + 1).

This is a special case of the following generalization of (G.15).

Proposition G.2. Let f be Lipschitz continuous and piecewise C^2 on S^1. Then (G.23) holds.

Proof. Here we are assuming f is C^2 on S^1 \ {p_1, . . . , p_ℓ}, and f′ and f′′ have limits at each of the endpoints of the associated intervals in S^1, but f is not assumed to be differentiable at the endpoints p_ℓ. We can write f as a sum of functions f_ν, each of which is Lipschitz on S^1, C^2 on S^1 \ {p_ν}, and f′_ν and f′′_ν have limits as one approaches p_ν from either side. It suffices to show that each f̂_ν(k) satisfies (G.23). Now g(θ) = f_ν(θ + p_ν − π) is singular only at θ = π, and ĝ(k) = f̂_ν(k) e^{ik(p_ν−π)}, so it suffices to prove Proposition G.2 when f has a singularity only at θ = π. In other words, f ∈ C^2([−π, π]), and f(−π) = f(π).

In this case, we still have (G.10), since the endpoint contributions from integration by parts still cancel. A second integration by parts gives, in place of (G.12),

(G.24) f̂(k) = (1/2πik) ∫_{−π}^π f′(θ) (i/k) ∂_θ(e^{−ikθ}) dθ = −(1/2πk^2) [∫_{−π}^π f′′(θ) e^{−ikθ} dθ + f′(π) − f′(−π)],

which yields (G.23).
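A direct quadrature of (G.1) for f(θ) = |θ| reproduces the coefficients computed in (G.17), consistent with the decay bound (G.23); the midpoint rule and the grid size below are ad hoc choices, not from the text.

```python
import cmath
from math import pi

# (G.17): for f(θ) = |θ|, the coefficients (G.1) are -2/(pi k^2) for odd k
# and 0 for even k ≠ 0.
def fourier_coeff(k, N=20000):               # midpoint rule for (G.1)
    h = 2 * pi / N
    total = 0.0 + 0.0j
    for j in range(N):
        theta = -pi + (j + 0.5) * h
        total += abs(theta) * cmath.exp(-1j * k * theta)
    return total * h / (2 * pi)

for k in (1, 3, 4):
    c = fourier_coeff(k)
    exact = -2 / (pi * k**2) if k % 2 else 0.0
    assert abs(c.real - exact) < 1e-5 and abs(c.imag) < 1e-9
```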

We next make use of (G.5) to produce results on ∫_{T^1} |f(θ)|^2 dθ, starting with the following.

Proposition G.3. Given f ∈ A(T^1),

(G.25) ∑ |f̂(k)|^2 = (1/2π) ∫_{T^1} |f(θ)|^2 dθ.


More generally, if also g ∈ A(T^1),

(G.26) ∑ f̂(k) \overline{ĝ(k)} = (1/2π) ∫_{T^1} f(θ) \overline{g(θ)} dθ.

Proof. Switching order of summation and integration and using (G.5), we have

(G.27) (1/2π) ∫_{T^1} f(θ) \overline{g(θ)} dθ = (1/2π) ∫_{T^1} ∑_{j,k} f̂(j) \overline{ĝ(k)} e^{i(j−k)θ} dθ = ∑_k f̂(k) \overline{ĝ(k)},

giving (G.26). Taking g = f gives (G.25).

We will extend the scope of Proposition G.3 below. Closely tied to this is the issue of convergence of S_N f to f as N → ∞, where

(G.28) S_N f(θ) = ∑_{|k|≤N} f̂(k) e^{ikθ}.

Clearly f ∈ A(S^1) ⇒ S_N f → f uniformly on T^1 as N → ∞. Here, we are interested in convergence in L^2-norm, where

(G.29) ‖f‖_{L^2}^2 = (1/2π) ∫_{T^1} |f(θ)|^2 dθ.

Given f ∈ R(T^1), this defines a "norm," satisfying the following result, called the triangle inequality:

(G.30) ‖f + g‖_{L^2} ≤ ‖f‖_{L^2} + ‖g‖_{L^2}.

See Appendix H for details on this. Behind these results is the fact that

(G.31) ‖f‖_{L^2}^2 = (f, f)_{L^2},

where, when f and g belong to R(T1), we set

(G.32) (f, g)_{L^2} = (1/2π) ∫_{S^1} f(θ) \overline{g(θ)} dθ.

Thus the content of (G.25) is that

(G.33) ∑ |f̂(k)|^2 = ‖f‖_{L^2}^2,


and that of (G.26) is that

(G.34) ∑ f̂(k) \overline{ĝ(k)} = (f, g)_{L^2}.

The left side of (G.33) is the square norm of the sequence (f̂(k)) in ℓ^2. Generally, a sequence (a_k) (k ∈ Z) belongs to ℓ^2 if and only if

(G.35) ‖(a_k)‖_{ℓ^2}^2 = ∑ |a_k|^2 < ∞.

There is an associated inner product

(G.36) ((a_k), (b_k)) = ∑ a_k b̄_k.

As in (G.30), one has (see Appendix H)

(G.37) ‖(a_k) + (b_k)‖_{ℓ^2} ≤ ‖(a_k)‖_{ℓ^2} + ‖(b_k)‖_{ℓ^2}.

As for the notion of L2-norm convergence, we say

(G.38) f_ν → f in L^2 ⇐⇒ ‖f − f_ν‖_{L^2} → 0.

There is a similar notion of convergence in ℓ^2. Clearly

(G.39) ‖f − f_ν‖_{L^2} ≤ sup_θ |f(θ) − f_ν(θ)|.

In view of the uniform convergence S_N f → f for f ∈ A(T^1) noted above, we have

(G.40) f ∈ A(T^1) =⇒ S_N f → f in L^2, as N → ∞.

The triangle inequality implies

(G.41) |‖f‖_{L^2} − ‖S_N f‖_{L^2}| ≤ ‖f − S_N f‖_{L^2},

and clearly (by Proposition G.3)

(G.42) ‖S_N f‖_{L^2}^2 = ∑_{k=−N}^N |f̂(k)|^2,

so

(G.43) ‖f − S_N f‖_{L^2} → 0 as N → ∞ =⇒ ‖f‖_{L^2}^2 = ∑ |f̂(k)|^2.

We now consider more general functions f ∈ R(T^1). With f̂(k) and S_N f defined by (G.1) and (G.28), we define R_N f by

(G.44) f = S_N f + R_N f.


Note that ∫_{T^1} f(θ) e^{−ikθ} dθ = ∫_{T^1} S_N f(θ) e^{−ikθ} dθ for |k| ≤ N, hence

(G.45) (f, S_N f)_{L^2} = (S_N f, S_N f)_{L^2},

and hence

(G.46) (S_N f, R_N f)_{L^2} = 0.

Consequently,

(G.47) ‖f‖_{L^2}^2 = (S_N f + R_N f, S_N f + R_N f)_{L^2} = ‖S_N f‖_{L^2}^2 + ‖R_N f‖_{L^2}^2.

In particular,

(G.48) ‖S_N f‖_{L^2} ≤ ‖f‖_{L^2}.

We are now in a position to prove the following.

Lemma G.4. Let f, fν be square integrable on S1. Assume

(G.49) lim_{ν→∞} ‖f − f_ν‖_{L^2} = 0,

and, for each ν,

(G.50) lim_{N→∞} ‖f_ν − S_N f_ν‖_{L^2} = 0.

Then

(G.51) lim_{N→∞} ‖f − S_N f‖_{L^2} = 0.

Proof. Writing f − S_N f = (f − f_ν) + (f_ν − S_N f_ν) + S_N(f_ν − f), and using the triangle inequality, we have, for each ν,

(G.52) ‖f − S_N f‖_{L^2} ≤ ‖f − f_ν‖_{L^2} + ‖f_ν − S_N f_ν‖_{L^2} + ‖S_N(f_ν − f)‖_{L^2}.

Taking N →∞ and using (G.48), we have

(G.53) lim sup_{N→∞} ‖f − S_N f‖_{L^2} ≤ 2‖f − f_ν‖_{L^2},

for each ν. Then (G.49) yields the desired conclusion (G.51).

Given f ∈ C(S^1), we have trigonometric polynomials f_ν → f uniformly on T^1, and clearly (G.50) holds for each such f_ν. Thus Lemma G.4 yields the following.

(G.54) f ∈ C(S^1) =⇒ S_N f → f in L^2, and ∑ |f̂(k)|^2 = ‖f‖_{L^2}^2.


Lemma G.4 also applies to many discontinuous functions. Consider, for example

(G.55) f(θ) = 0 for −π < θ < 0, 1 for 0 < θ < π.

We can set, for ν ∈ N,

(G.56) f_ν(θ) = 0 for −π < θ < 0, νθ for 0 ≤ θ ≤ 1/ν, 1 for 1/ν ≤ θ < π.

Then each f_ν ∈ C(T^1). (In fact, f_ν ∈ A(T^1), by Proposition G.2.) Also, one can check that ‖f − f_ν‖_{L^2}^2 ≤ 1/ν. Thus the conclusion in (G.54) holds for f given by (G.55).
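For the step function (G.55), the resulting Parseval identity can be checked by hand: from (G.1), f̂(0) = 1/2 and f̂(k) = (1 − (−1)^k)/(2πik), so both sides of ∑ |f̂(k)|^2 = ‖f‖_{L^2}^2 equal 1/2. A numerical sketch (the truncation level is an arbitrary choice):

```python
from math import pi

# Parseval for (G.55): |f̂(k)|^2 = 1/(pi k)^2 for odd k, 0 for even k ≠ 0,
# plus |f̂(0)|^2 = 1/4; the right side is (1/2π) ∫_0^π 1 dθ = 1/2.
lhs = 0.25 + 2 * sum(1.0 / (pi * k)**2 for k in range(1, 2 * 10**6, 2))
rhs = 0.5
assert abs(lhs - rhs) < 1e-6
```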

More generally, any piecewise continuous function on T^1 is an L^2 limit of continuous functions, so the conclusion of (G.54) holds for them. To go further, let us recall the class of Riemann integrable functions. A function f : T^1 → R is Riemann integrable provided f is bounded (say |f| ≤ M) and, for each δ > 0, there exist piecewise constant functions g_δ and h_δ on T^1 such that

(G.57) g_δ ≤ f ≤ h_δ, and ∫_{T^1} (h_δ(θ) − g_δ(θ)) dθ < δ.

Then

(G.58) ∫_{T^1} f(θ) dθ = lim_{δ→0} ∫_{T^1} g_δ(θ) dθ = lim_{δ→0} ∫_{T^1} h_δ(θ) dθ.

Note that we can assume |h_δ|, |g_δ| < M + 1, and so

(G.59) (1/2π) ∫_{T^1} |f(θ) − g_δ(θ)|^2 dθ ≤ ((M + 1)/π) ∫_{T^1} |h_δ(θ) − g_δ(θ)| dθ < ((M + 1)/π) δ,

so g_δ → f in L^2-norm. A function f : T^1 → C is Riemann integrable provided its real and imaginary parts are. In such a case, there are also piecewise constant functions f_ν → f in L^2-norm, so

(G.60) f ∈ R(T^1) =⇒ S_N f → f in L^2, and ∑ |f̂(k)|^2 = ‖f‖_{L^2}^2.

This is not the end of the story. There are unbounded functions on T^1 that are square integrable, such as

(G.61) f(θ) = |θ|^{−α} on [−π, π], 0 < α < 1/2.


In such a case, one can take f_ν(θ) = min(f(θ), ν), ν ∈ N. Then each f_ν is continuous and ‖f − f_ν‖_{L^2} → 0 as ν → ∞. Hence the conclusion of (G.60) holds for such f.

The ultimate theory of functions for which the result

(G.62) S_N f −→ f in L^2-norm

holds was produced by H. Lebesgue in what is now known as the theory of Lebesgue measure and integration. There is the notion of measurability of a function f : T^1 → C. One says f ∈ L^2(T^1) provided f is measurable and ∫_{T^1} |f(θ)|^2 dθ < ∞, the integral here being the Lebesgue integral. Actually, L^2(T^1) consists of equivalence classes of such functions, where f_1 ∼ f_2 if and only if ∫ |f_1(θ) − f_2(θ)|^2 dθ = 0. With ℓ^2 as in (G.35), it is then the case that

(G.63) F : L^2(T^1) −→ ℓ^2,

given by

(G.64) (Ff)(k) = f̂(k),

is one-to-one and onto, with

(G.65) ∑ |f̂(k)|^2 = ‖f‖_{L^2}^2, ∀ f ∈ L^2(T^1),

and

(G.66) S_N f −→ f in L^2, ∀ f ∈ L^2(T^1).

For the reader who has not seen Lebesgue integration, we refer to books on the subject(e.g., [T2]) for more information.

We mention two key propositions which, together with the arguments given above, establish these results. The fact that Ff ∈ ℓ^2 for all f ∈ L^2(T^1) and (G.65)–(G.66) hold follows via Lemma G.4 from the following.

Proposition A. Given f ∈ L^2(T^1), there exist f_ν ∈ C(T^1) such that f_ν → f in L^2.

As for the surjectivity of F in (G.63), note that, given (a_k) ∈ ℓ^2, the sequence

f_ν(θ) = ∑_{|k|≤ν} a_k e^{ikθ}

satisfies, for μ > ν,

‖f_μ − f_ν‖_{L^2}^2 = ∑_{ν<|k|≤μ} |a_k|^2 → 0 as ν → ∞.

That is to say, (f_ν) is a Cauchy sequence in L^2(T^1). Surjectivity follows from the fact that Cauchy sequences in L^2(T^1) always converge to a limit:


Proposition B. If (f_ν) is a Cauchy sequence in L^2(T^1), there exists f ∈ L^2(T^1) such that f_ν → f in L^2-norm.

Proofs of these results can be found in the standard texts on measure theory and integration, such as [T2].

We now establish a sufficient condition for a function f to belong to A(T^1), more general than that in Proposition G.2.

Proposition G.5. If f is a continuous, piecewise C^1 function on S^1, then ∑ |f̂(k)| < ∞.

Proof. As in the proof of Proposition G.2, we can reduce the problem to the case f ∈ C^1([−π, π]), f(−π) = f(π). In such a case, with g = f′ ∈ C([−π, π]), the integration by parts argument (G.10) gives

(G.67) f̂(k) = (1/ik) ĝ(k), k ≠ 0.

By (G.60),

(G.68) ∑ |ĝ(k)|^2 = ‖g‖_{L^2}^2.

Also, by Cauchy’s inequality (cf. Appendix H),

(G.69) ∑_{k≠0} |f̂(k)| ≤ (∑_{k≠0} 1/k^2)^{1/2} (∑_{k≠0} |ĝ(k)|^2)^{1/2} ≤ C ‖g‖_{L^2}.

This completes the proof.

Exercises

1. Compute f̂(k) when

f(θ) = 1 for 0 < θ < π, 0 for −π < θ < 0.

Then use (G.60) to obtain another proof of (G.22).

2. Apply (G.60) when f(θ) is given by (G.16). Use this to show that

∑_{k=1}^∞ 1/k^4 = π^4/90.


H. Inner product spaces

On occasion, particularly in Appendix G, we have looked at norms and inner products on spaces of functions, such as C(S^1) and R(R), which are vector spaces. Generally, a complex vector space V is a set on which there are operations of vector addition:

(H.1) f, g ∈ V =⇒ f + g ∈ V,

and multiplication by an element of C (called scalar multiplication):

(H.2) a ∈ C, f ∈ V =⇒ af ∈ V,

satisfying the following properties. For vector addition, we have

(H.3) f + g = g + f, (f + g) + h = f + (g + h), f + 0 = f, f + (−f) = 0.

For multiplication by scalars, we have

(H.4) a(bf) = (ab)f, 1 · f = f.

Furthermore, we have two distributive laws:

(H.5) a(f + g) = af + ag, (a + b)f = af + bf.

These properties are readily verified for the function spaces mentioned above.
An inner product on a complex vector space V assigns to elements f, g ∈ V the quantity (f, g) ∈ C, in a fashion that obeys the following three rules:

(H.6) (a_1 f_1 + a_2 f_2, g) = a_1(f_1, g) + a_2(f_2, g), (f, g) = \overline{(g, f)}, (f, f) > 0 unless f = 0.

A vector space equipped with an inner product is called an inner product space. For example,

(H.7) (f, g) = (1/2π) ∫_{S^1} f(θ) \overline{g(θ)} dθ

defines an inner product on C(S^1), and also on R(S^1), where we identify two functions that differ only on a set of upper content zero. Similarly,

(H.8) (f, g) = ∫_{−∞}^∞ f(x) \overline{g(x)} dx


defines an inner product on R(R) (where, again, we identify two functions that differ only on a set of upper content zero).
As another example, we define ℓ^2 to consist of sequences (a_k)_{k∈Z} such that

(H.9) ∑_{k=−∞}^∞ |a_k|^2 < ∞.

An inner product on ℓ^2 is given by

(H.10) ((a_k), (b_k)) = ∑_{k=−∞}^∞ a_k b̄_k.

Given an inner product on V, one says the object ‖f‖ defined by

(H.11) ‖f‖ = √(f, f)

is the norm on V associated with the inner product. Generally, a norm on V is a function f ↦ ‖f‖ satisfying

(H.12) ‖af‖ = |a| · ‖f‖, a ∈ C, f ∈ V,
(H.13) ‖f‖ > 0 unless f = 0,
(H.14) ‖f + g‖ ≤ ‖f‖ + ‖g‖.

The property (H.14) is called the triangle inequality. A vector space equipped with a norm is called a normed vector space. We can define a distance function on such a space by

(H.15) d(f, g) = ‖f − g‖.

Properties (H.12)–(H.14) imply that d : V × V → [0, ∞) satisfies the properties in (A.1), making V a metric space.

If ‖f‖ is given by (H.11), from an inner product satisfying (H.6), it is clear that (H.12)–(H.13) hold, but (H.14) requires a demonstration. Note that

(H.16) ‖f + g‖^2 = (f + g, f + g) = ‖f‖^2 + (f, g) + (g, f) + ‖g‖^2 = ‖f‖^2 + 2 Re(f, g) + ‖g‖^2,

while

(H.17) (‖f‖ + ‖g‖)^2 = ‖f‖^2 + 2‖f‖ · ‖g‖ + ‖g‖^2.

Thus to establish (H.14) it suffices to prove the following, known as Cauchy's inequality.


Proposition H.1. For any inner product on a vector space V , with ‖f‖ defined by (H.11),

(H.18) |(f, g)| ≤ ‖f‖ · ‖g‖, ∀ f, g ∈ V.

Proof. We start with

(H.19) 0 ≤ ‖f − g‖^2 = ‖f‖^2 − 2 Re(f, g) + ‖g‖^2,

which implies

(H.20) 2 Re(f, g) ≤ ‖f‖^2 + ‖g‖^2, ∀ f, g ∈ V.

Replacing f by af for arbitrary a ∈ C of absolute value 1 yields 2 Re a(f, g) ≤ ‖f‖^2 + ‖g‖^2, for all such a, hence

(H.21) 2|(f, g)| ≤ ‖f‖^2 + ‖g‖^2, ∀ f, g ∈ V.

Replacing f by tf and g by t^{−1}g for arbitrary t ∈ (0, ∞), we have

(H.22) 2|(f, g)| ≤ t^2 ‖f‖^2 + t^{−2} ‖g‖^2, ∀ f, g ∈ V, t ∈ (0, ∞).

If we take t^2 = ‖g‖/‖f‖, we obtain the desired inequality (H.18). This assumes f and g are both nonzero, but (H.18) is trivial if f or g is 0.
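Both (H.18) and the triangle inequality (H.14) are easy to spot-check numerically; the sketch below (random vectors in C^50, an illustrative choice not from the text) uses the inner product (H.10).

```python
import random

# Check Cauchy's inequality (H.18) and the triangle inequality (H.14) for
# the inner product (H.10) on finite sequences.
random.seed(1)
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]

ip = lambda u, v: sum(a * b.conjugate() for a, b in zip(u, v))
norm = lambda u: ip(u, u).real ** 0.5

assert abs(ip(f, g)) <= norm(f) * norm(g) + 1e-12          # (H.18)
fg = [a + b for a, b in zip(f, g)]
assert norm(fg) <= norm(f) + norm(g) + 1e-12               # (H.14)
```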

An inner product space V is called a Hilbert space if it is a complete metric space, i.e., if every Cauchy sequence (f_ν) in V has a limit in V. The space ℓ^2 has this completeness property, but C(S^1), with inner product (H.7), does not, nor does R(S^1). Appendix A describes a process of constructing the completion of a metric space. When applied to an incomplete inner product space, it produces a Hilbert space. When this process is applied to C(S^1), the completion is the space L^2(S^1). An alternative construction of L^2(S^1) uses the Lebesgue integral.



Index

A
algebra
  of sets 51
analytic function
  complex 24
  real 35
anticommutation relation 81
arcsin 43
arctan 44
area
  of spheres 69
Ascoli's theorem 114
autonomous system 39
averaging over rotations 71

B
basis 28, 64
boundary 51
Brouwer fixed-point theorem 106

C
Cauchy sequence 33, 108
Cauchy-Riemann equations 24, 36, 99, 103
Cauchy's integral formula 99, 104
Cauchy's integral theorem 99
chain rule 14, 16, 19, 44, 66, 105
change of variable formula 54, 56, 59, 62, 67, 77, 118
characteristic function 8, 50
closed set 108
closure 50, 109
cofactor 30
compact space 109
complete metric space 33, 108
completion of a metric space 108
conformal map 105
connected set 115
content 8, 50
continuous 6, 49
  uniformly 6, 112
Contraction Mapping Principle 33, 38
coordinate chart 65
corner 87
cos 42, 43, 105
Cramer's formula 26
cross product 31, 67
curl 84, 93, 94
cusp 88

D
Darboux's Theorem 7
derivation 92
derivative 9, 16
determinant 26, 28, 77
diagonal construction 114
diffeomorphism 32, 42, 56, 65
differential equation 38
differential form 75
  closed 83
  exact 83
div 84, 89, 90
Divergence Theorem 89, 91, 97

E
Euclidean norm 16
Euler's formula 42
exponential 41, 42
  of a matrix 70
exterior derivative 82

F
fixed point 33, 106
flow 45
Fubini's Theorem 53, 60
Fundamental Theorem of Algebra 102
Fundamental Theorem of Calculus 9, 10, 14, 17, 25, 86, 97

G
Gamma function 69
Gaussian integral 60
Gl(n,R) 54, 64
grad 84
Green's formula 93
Green's Theorem 89, 95, 99

H
Haar measure 71
harmonic conjugate 103
harmonic function 101
Hessian 20
holomorphic function 24, 36, 99

I
Implicit Function Theorem 34, 74
initial value problem 41
integral curve 45, 85
integral equation 38
integrating factor 85
integration by parts 14, 70
interior 51, 109
interior product 76
Intermediate Value Theorem 115
Inverse Function Theorem 32, 65, 70
iterated integral 25, 53

L
Laplace operator 92
Lie bracket 46
Liouville's Theorem 102
Lipschitz 9, 38, 52, 63
log 42

M
matrix 16, 21, 23, 54, 64, 66
maximum 11, 22, 112
mean value property 102
Mean Value Theorem 10, 112
metric space 108
metric tensor 66, 71
minimum 11, 22, 112
modulus of continuity 112
multi-index 19

N
neighborhood 108
Newton's method 36
nil set 51, 52
normal derivative 92
normal vector
  outward 89, 91, 96
  positive 93, 94, 96

O
open set 108
orbit 45
orientation 77, 86, 88, 90

P
partial derivative 16
partition 5, 48, 61
  of unity 86, 92, 116
permutation 28
  sign of 28
pi (π) 43, 44, 72
Picard iteration 38, 41
Poincare lemma 83, 88
polar coordinates 59, 61, 69
positive-definite matrix 22, 32, 66
power series 25, 36, 100, 121
pull-back 75, 77, 82

R
rational numbers 7, 108
real numbers 108
Riemann integral 5, 48
row reduction 64

S
saddle point 22
sin 42, 43, 105
Skew(n) 70
SO(n) 31, 70
sphere
  area of 69
spherical polar coordinates 61, 68
Stokes formula 86, 94, 95, 106
Stone-Weierstrass theorem 128
surface 65
surface integral 67, 72

T
tan 44
tangent space 65
tangent vector
  forward 94, 96
Taylor's formula with remainder 14, 19, 122
triangle inequality 108
trigonometric functions 43
Tychonov's theorem 113

V
vector field 45, 89, 90, 95, 96
volume 48, 68, 71
volume form 90, 96

W
wedge product 81
Weierstrass theorem 127