
The Calculus of Several Variables

Robert C. Rogers

September 29, 2011

It is now known to science that there are many more dimensions than the classical four. Scientists say that these don’t normally impinge on the world because the extra dimensions are very small and curve in on themselves, and that since reality is fractal most of it is tucked inside itself. This means either that the universe is more full of wonders than we can hope to understand or, more probably, that scientists make things up as they go along.

Terry Pratchett


Contents

1 Introduction

I Precalculus of Several Variables

2 Vectors, Points, Norm, and Dot Product

3 Angles and Projections

4 Matrix Algebra

5 Systems of Linear Equations and Gaussian Elimination

6 Determinants

7 The Cross Product and Triple Product in R3

8 Lines and Planes

9 Functions, Limits, and Continuity

10 Functions from R to Rn

11 Functions from Rn to R

12 Functions from Rn to Rm
    12.1 Vector Fields
    12.2 Parameterized Surfaces

13 Curvilinear Coordinates
    13.1 Polar Coordinates in R2
    13.2 Cylindrical Coordinates in R3
    13.3 Spherical Coordinates in R3

II Differential Calculus of Several Variables

14 Introduction to Differential Calculus

15 Derivatives of Functions from R to Rn

16 Derivatives of Functions from Rn to R
    16.1 Partial Derivatives
    16.2 Higher Order Partial Derivatives
    16.3 The Chain Rule for Partial Derivatives

17 Derivatives of Functions from Rn to Rm
    17.1 Partial Derivatives
    17.2 The Total Derivative Matrix
    17.3 The Chain Rule for Mappings

18 Gradient, Divergence, and Curl
    18.1 The Gradient
    18.2 The Divergence
    18.3 The Curl

19 Differential Operators in Curvilinear Coordinates
    19.1 Differential Operators Polar Coordinates
    19.2 Differential Operators in Cylindrical Coordinates
    19.3 Differential Operators in Spherical Coordinates

20 Differentiation Rules
    20.1 Linearity
    20.2 Product Rules
    20.3 Second Derivative Rules

21 Eigenvalues

22 Quadratic Approximation and Taylor’s Theorem
    22.1 Quadratic Approximation of Real-Valued Functions
    22.2 Taylor’s Theorem

23 Max-Min Problems
    23.1 First Derivative Test
    23.2 Second Derivative Test
    23.3 Lagrange Multipliers

24 Nonlinear Systems of Equations
    24.1 The Inverse Function Theorem
    24.2 The Implicit Function Theorem

III Integral Calculus of Several Variables

25 Introduction to Integral Calculus

26 Riemann Volume in Rn

27 Integrals Over Volumes in Rn
    27.1 Basic Definitions and Properties
    27.2 Basic Properties of the Integral
    27.3 Integrals Over Rectangular Regions
    27.4 Integrals Over General Regions in R2
    27.5 Change of Order of Integration in R2
    27.6 Integrals over Regions in R3

28 The Change of Variables Formula

29 Hausdorff Dimension and Measure

30 Integrals over Curves
    30.1 The Length of a Curve
    30.2 Integrals of Scalar Fields Along Curves
    30.3 Integrals of Vector Fields Along Paths

31 Integrals Over Surfaces
    31.1 Regular Regions and Boundary Orientation
    31.2 Parameterized Regular Surfaces and Normals
    31.3 Oriented Surfaces with Corners
    31.4 Surface Area
    31.5 Scalar Surface Integrals
    31.6 Surface Flux Integrals
    31.7 Generalized (n − 1)-Dimensional Surfaces

IV The Fundamental Theorems of Vector Calculus

32 Introduction to the Fundamental Theorem of Calculus

33 Green’s Theorem in the Plane

34 Fundamental Theorem of Gradients

35 Stokes’ Theorem

36 The Divergence Theorem

37 Integration by Parts

38 Conservative vector fields

Chapter 1

Introduction

This book is about the calculus of functions whose domain or range or both are vector-valued rather than real-valued. Of course, this subject is much too big to be covered completely in a single book. The full scope of the topic contains at least all of ordinary differential equations, partial differential equations, and differential geometry. The physical applications include thermodynamics, fluid mechanics, elasticity, electromagnetism, and cosmology. Since a comprehensive treatment is so ambitious, and since few undergraduates devote more than a semester to direct study of this subject, this book focuses on a much more limited goal. The book will try to develop a series of definitions and results that are parallel to those in an elementary course in the calculus of functions of a single variable.

Consider the following “syllabus” for an elementary calculus course.

1. Precalculus

• The arithmetic and algebra of real numbers.

• The geometry of lines in the plane: slopes, intercepts, intersections, angles, trigonometry.

• The concept of a function whose domain and range are both real numbers and whose graph is a curve in the plane.

• The concepts of limit and continuity.

2. The Derivative

• The definition of the derivative as the limit of the slopes of secant lines of a function.

• The interpretation of the derivative as the slope of the tangent line.

• The characterization of the tangent line as the “best linear approximation” of a differentiable function.

• The development of various differentiation rules for products, composites, and other combinations of functions.

• The calculation of higher order derivatives and their geometric interpretation.

• The application of the derivative to max/min problems.

3. The Integral

• The calculation of the area under a curve as the limit of a Riemann sum of the area of rectangles.

• The proof that for a continuous function (and a large class of simple discontinuous functions) the calculation of area is independent of the choice of partitioning strategy.

4. The Fundamental Theorem of Calculus

• The “fundamental theorem of calculus”: a demonstration that the derivative and integral are “inverse operations.”

• The calculation of integrals using antiderivatives.

• Derivation of “integration by substitution” formulas from the fundamental theorem and the chain rule.

• Derivation of “integration by parts” from the fundamental theorem and the product rule.

Now, this might be an unusual way to present calculus to someone learning it for the first time, but it is at least a reasonable way to think of the subject in review. We will use it as a framework for our study of the calculus of several variables. This will help us to see some of the interconnections between what can seem like a huge body of loosely related definitions and theorems.¹

¹ In fact, the interconnections are even richer than this development indicates. It is important not to get the impression that this is the whole story. It is simply a place to start. Nonetheless, it is a good starting point and will provide a structure firm enough to build on.

While our structure is parallel to the calculus of functions of a single variable, there are important differences.

1. Precalculus

• The arithmetic and algebra of real numbers is replaced by linear algebra of vectors and matrices.

• The geometry of the plane is replaced by geometry in Rn.

• Graphs in the plane are now graphs in higher dimensions (and may be difficult to visualize).

2. The Derivative

• Differential calculus for functions whose domain is one-dimensional turns out to be very similar to elementary calculus no matter how large the dimension of the range.

• For functions with a higher-dimensional domain, there are many ways to think of “the derivative.”

3. The Integral

• We will consider several types of domains over which we will integrate functions: curves, surfaces, oddly shaped regions in space.

4. The Fundamental Theorem of Calculus

• We will find a whole hierarchy of generalizations of the fundamental theorem.

Our general procedure will be to follow the path of an elementary calculus course and focus on what changes and what stays the same as we change the domain and range of the functions we consider.

Remark 1.1 (On notation). A wise man once said that, “The more important a mathematical subject is, the more versions of notation will be used for that subject.” If the converse of that statement is true, vector calculus must be extremely important. There are many notational schemes for vector calculus. They are used by different groups of mathematicians and in different application areas. There is no real hope that their use will be standardized in the near future. This text will use a variety of notations and will use different notations in different contexts. I will try to be clear about this, but learning how to read and interpret the different notations should be an important goal for students of this material.

Remark 1.2 (On prerequisites). Readers are assumed to be familiar with the following subjects.

• Basic notions of algebra and very elementary set theory.

• Integral and differential calculus of a single variable.

• Linear algebra including solution of systems of linear equations, matrix manipulation, eigenvalues and eigenvectors, and elementary vector space concepts such as basis and dimension.

• Elementary ordinary differential equations.

• Elementary calculations on real-valued functions of two or three variables such as partial differentiation, integration, and basic graphing.

Of course, a number of these subjects are reviewed extensively, and I am mindful of the fact that one of the most important goals of any course is to help the student to finally understand the material that was covered in the previous course. This study of vector calculus is a great opportunity to gain proficiency and greater insight into the subjects listed above.

Remark 1.3 (On proofs). This text is intended for use by mathematicians and other scientists and engineers. While the primary focus will be on the calculation of various quantities related to the subject, some effort will be made to provide a rigorous background for those calculations, particularly in those cases where the proofs reveal underlying structure. Indeed, many of the calculations in this subject can seem like nothing more than complicated recipes if one doesn’t make an attempt to understand the theory behind them. On the other hand, this subject is full of places where the proofs of general theorems are technical nightmares that reveal little (at least to me), and this type of proof will be avoided.

Remark 1.4 (On reading this book). My intention in writing this book is to provide a fairly terse treatment of the subject that can realistically be read cover-to-cover in the course of a busy semester. I’ve tried to cut down on extraneous detail. However, while most of the exposition is aimed at solving the problems directly posed in the text, there are a number of discussions that are intended to give the reader a glimpse into subjects that will open up in later courses and texts. (Presenting a student with interesting ideas that he or she won’t quite understand is another important goal of any course.) Many of these ideas are presented in the problems. I encourage students to read even those problems that they have not been assigned as homework.

Part I

Precalculus of Several Variables

Chapter 2

Vectors, Points, Norm, and Dot Product

In this part of the book we study material analogous to that studied in a typical “precalculus” course. While these courses cover some topics like functions, limits, and continuity that are closely tied to the study of calculus, the most important part of such a course is probably the broader topic of algebra. That is true in this course as well, but with an added complication. Since we will be dealing with multidimensional objects – vectors – we spend a great deal of time discussing linear algebra. We cover only relatively elementary aspects of this subject, and the reader is assumed to be somewhat familiar with them.

Definition 2.1. We define a vector v ∈ Rn to be an n-tuple of real numbers

v = (v1, v2, . . . , vn),

and refer to the numbers vi, i = 1, . . . , n as the components of the vector. We define two operations on the set of vectors: scalar multiplication

cv = c(v1, v2, . . . , vn) = (cv1, cv2, . . . , cvn)

for any real number c ∈ R and vector v ∈ Rn, and vector addition

v + w = (v1, v2, . . . , vn) + (w1, w2, . . . , wn) = (v1 +w1, v2 +w2, . . . , vn +wn)

for any pair of vectors v ∈ Rn and w ∈ Rn.
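The two operations in Definition 2.1 act component by component, which is easy to see in a few lines of code. The following is a minimal sketch, assuming vectors are stored as Python tuples; the function names `scalar_multiply` and `vector_add` are illustrative choices, not notation from the text.

```python
def scalar_multiply(c, v):
    """Return c*v: every component of v is multiplied by the scalar c."""
    return tuple(c * vi for vi in v)


def vector_add(v, w):
    """Return v + w: corresponding components are added."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return tuple(vi + wi for vi, wi in zip(v, w))


v = (1, 2, 3)
w = (4, 0, -1)
print(scalar_multiply(2, v))  # (2, 4, 6)
print(vector_add(v, w))       # (5, 2, 2)
```

The length check mirrors the requirement that both vectors lie in the same space Rn.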

Definition 2.2. If we have a collection of vectors v1, v2, . . . , vk and scalars c1, c2, . . . , ck we refer to

c1v1 + c2v2 + · · · + ckvk

as a linear combination of the vectors v1, v2, . . . , vk.

Remark 2.3. We typically use boldface, lowercase, Latin letters to represent abstract vectors in Rn. Another fairly common notation represents a vector by a generic component with a “free index,” a subscript (usually, i, j, or k) assumed to range over the values from 1 to n. In this scheme, the vector v would be denoted by vi, the vector x by xi, etc.

Remark 2.4. At this point we make no distinction between vectors displayed as columns or rows. In most cases, the choice of visual display is merely a matter of convenience. Of course, when we involve vectors in matrix multiplication the distinction will be important, and we adopt a standard in that context.

Definition 2.5. We say that two vectors are parallel if one is a scalar multiple of the other. That is, x is parallel to y if there exists c ∈ R such that

x = cy.

Remark 2.6. At this juncture, we have given the space of vectors Rn only an algebraic structure. We can add a geometric structure by choosing an origin and a set of n perpendicular Cartesian axes for n-dimensional geometric space. With these choices made, every point X can be represented uniquely by its Cartesian coordinates (x1, x2, . . . , xn). We then associate with every ordered pair of points X = (x1, x2, . . . , xn) and Y = (y1, y2, . . . , yn) the vector

$$\overrightarrow{XY} = (y_1 - x_1,\; y_2 - x_2,\; \dots,\; y_n - x_n).$$

We think of this vector as a directed line segment or arrow pointing from the tail at X to the head at Y. Note that a vector can be moved by “parallel transport” so that its tail is anywhere in space. For example, the vector v = (1, 1) can be represented as a line segment with its tail at X = (3, 4) and head at Y = (4, 5) or with tail at X′ = (−5, 7) and head at Y′ = (−4, 8).

This geometric structure makes vector addition and subtraction quite interesting. Figure 2.1 presents a parallelogram with sides formed by the vectors x and y. The diagonals of the parallelogram represent the sum and difference of these vectors.

Figure 2.1: This parallelogram has sides with the vectors x and y. The diagonals of the parallelogram represent the sum and difference of the vectors. The sum can be obtained graphically by placing the tail of y at the head of x (or vice versa). The difference of two vectors is a directed line segment connecting the heads of the vectors. Note that the “graphic” sum of x and y − x is y.

Definition 2.7. We define a set of vectors ei ∈ Rn, 1 ≤ i ≤ n called the standard basis. These have the form

$$\begin{aligned}
\mathbf{e}_1 &= (1, 0, 0, \dots, 0),\\
\mathbf{e}_2 &= (0, 1, 0, \dots, 0),\\
&\;\;\vdots\\
\mathbf{e}_n &= (0, 0, 0, \dots, 1).
\end{aligned}$$

In component form, these vectors can be written

$$(\mathbf{e}_i)_j = \delta_{ij} = \begin{cases} 0 & \text{if } i \neq j,\\ 1 & \text{if } i = j. \end{cases}$$

Here we have defined δij, which is called the Kronecker delta function.

In the special case of R3 it is common to denote the standard basis by

i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1).

Remark 2.8. Note that any vector v = (v1, v2, . . . , vn) ∈ Rn can be written as a linear combination of the standard basis vectors

$$\mathbf{v} = \sum_{i=1}^{n} v_i \mathbf{e}_i.$$

Definition 2.9. We define the (Euclidean) norm of a vector x ∈ Rn to be

$$\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} = \sqrt{\sum_{i=1}^{n} x_i^2}.$$

A vector e is called a unit vector if ‖e‖ = 1.

The distance between points X and Y (corresponding to the vectors x and y) is given by

$$\|\overrightarrow{XY}\| = \|\mathbf{y} - \mathbf{x}\|.$$

The dot product (or inner product) of two vectors x ∈ Rn and y ∈ Rn is given by

$$\mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = \sum_{i=1}^{n} x_i y_i.$$
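The three quantities in Definition 2.9 can be computed directly from their formulas. The sketch below assumes vectors and points are plain tuples of numbers; the names `dot`, `norm`, and `distance` are illustrative and not part of the text.

```python
import math


def dot(x, y):
    """Dot product: the sum of the products of corresponding components."""
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Euclidean norm: the square root of the sum of squared components."""
    return math.sqrt(sum(xi * xi for xi in x))


def distance(X, Y):
    """Distance between points X and Y: the norm of the vector from X to Y."""
    return norm(tuple(yi - xi for xi, yi in zip(X, Y)))


print(dot((3, 4), (1, -2)))       # 3*1 + 4*(-2) = -5
print(norm((3, 4)))               # 5.0
print(distance((3, 4), (4, 5)))   # norm of (1, 1), i.e. the square root of 2
```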

Remark 2.10. Note that for any nonzero vector v we can find a unit vector e parallel to that vector by defining

$$\mathbf{e} = \frac{\mathbf{v}}{\|\mathbf{v}\|}.$$

In doing this we say we have normalized v.
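Normalization is a one-line computation once the norm is available. This is a small sketch under the assumption that vectors are tuples of floats or integers; the name `normalize` is a hypothetical helper, and the explicit check reflects the requirement that the vector be nonzero.

```python
import math


def normalize(v):
    """Return the unit vector v / ||v||; v must be nonzero."""
    length = math.sqrt(sum(vi * vi for vi in v))
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(vi / length for vi in v)


e = normalize((3, 4))
print(e)  # (0.6, 0.8)
```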

Remark 2.11. The standard basis vectors have an important relation to the dot product.

$$v_i = \mathbf{v} \cdot \mathbf{e}_i.$$

Thus, for any vector v

$$\mathbf{v} = \sum_{i=1}^{n} (\mathbf{v} \cdot \mathbf{e}_i)\,\mathbf{e}_i.$$

Let us now note a few important properties of the dot product.

Theorem 2.12. For all x,y,w ∈ Rn and every c ∈ R we have the following.

1. (x + y) ·w = x ·w + y ·w.

2. c(x · y) = (cx) · y = x · (cy).

3. x · y = y · x.

4. x · x ≥ 0.

5. x · x = 0 if and only if x = 0 = (0, 0, . . . , 0).

These are easy to prove directly from the formula for the dot product, and we leave the proof to the reader. (See Problem 2.8.)

Of course, there is an obvious relation between the norm and the dot product

$$\|\mathbf{x}\| = \sqrt{\mathbf{x} \cdot \mathbf{x}}. \tag{2.1}$$

However, we now prove a more subtle and interesting relationship.

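Theorem 2.13 (Cauchy-Schwarz inequality). For all x, y ∈ Rn

|x · y| ≤ ‖x‖‖y‖.

Proof. For any real number z ∈ R we compute

$$\begin{aligned}
0 &\le \|\mathbf{x} - z\mathbf{y}\|^2\\
&= (\mathbf{x} - z\mathbf{y}) \cdot (\mathbf{x} - z\mathbf{y})\\
&= \mathbf{x} \cdot (\mathbf{x} - z\mathbf{y}) - z\mathbf{y} \cdot (\mathbf{x} - z\mathbf{y})\\
&= \mathbf{x} \cdot \mathbf{x} - z\,\mathbf{x} \cdot \mathbf{y} - z\,\mathbf{y} \cdot \mathbf{x} + z^2\,\mathbf{y} \cdot \mathbf{y}\\
&= \|\mathbf{x}\|^2 - 2z(\mathbf{x} \cdot \mathbf{y}) + z^2\|\mathbf{y}\|^2.
\end{aligned}$$

We note that the quantity on the final line is a quadratic polynomial in the variable z. (It has the form $az^2 + bz + c$.) Since the polynomial is never negative, its discriminant ($b^2 - 4ac$) must not be positive (or else there would be two distinct real roots of the polynomial). Thus,

$$(2\,\mathbf{x} \cdot \mathbf{y})^2 - 4\|\mathbf{x}\|^2\|\mathbf{y}\|^2 \le 0,$$

or

$$(\mathbf{x} \cdot \mathbf{y})^2 \le \|\mathbf{x}\|^2\|\mathbf{y}\|^2.$$

Taking the square root of both sides and using the fact that $|a| = \sqrt{a^2}$ for any real number a gives us the Cauchy-Schwarz inequality.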
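A numerical spot check is no substitute for the proof, but it can make the inequality concrete. The sketch below tests it on randomly generated vectors and on one pair of parallel vectors, where equality is expected; the helper names, the sample size, and the small relative tolerance for floating-point rounding are all assumptions made for illustration.

```python
import math
import random


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    # Uses relation (2.1): the norm is the square root of x . x.
    return math.sqrt(dot(x, x))


def cauchy_schwarz_holds(x, y, rel_tol=1e-12):
    # Allow a tiny relative slack for floating-point rounding.
    return abs(dot(x, y)) <= norm(x) * norm(y) * (1 + rel_tol)


random.seed(0)  # fixed seed so the run is repeatable
pairs = [
    ([random.uniform(-10, 10) for _ in range(5)],
     [random.uniform(-10, 10) for _ in range(5)])
    for _ in range(1000)
]
all_hold = all(cauchy_schwarz_holds(x, y) for x, y in pairs)
print(all_hold)

# Equality case: y is a scalar multiple of x.
x, y = (1, 2, 3), (2, 4, 6)
equality_case = math.isclose(abs(dot(x, y)), norm(x) * norm(y))
print(equality_case)
```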

We now note that the norm has the following important properties.

Theorem 2.14. For all x,y ∈ Rn and every c ∈ R we have the following.

1. ‖x‖ ≥ 0.

2. ‖x‖ = 0 if and only if x = 0 = (0, 0, . . . , 0).

3. ‖cx‖ = |c|‖x‖.

4. ‖x + y‖ ≤ ‖x‖+ ‖y‖ (The triangle inequality).

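Proof. One can prove the first three properties directly from the formula for the norm. These are left to the reader in Problem 2.9. To prove the triangle inequality we use the Cauchy-Schwarz inequality and note that

$$\begin{aligned}
\|\mathbf{x} + \mathbf{y}\|^2 &= (\mathbf{x} + \mathbf{y}) \cdot (\mathbf{x} + \mathbf{y})\\
&= \mathbf{x} \cdot (\mathbf{x} + \mathbf{y}) + \mathbf{y} \cdot (\mathbf{x} + \mathbf{y})\\
&= \mathbf{x} \cdot \mathbf{x} + \mathbf{x} \cdot \mathbf{y} + \mathbf{y} \cdot \mathbf{x} + \mathbf{y} \cdot \mathbf{y}\\
&= \|\mathbf{x}\|^2 + 2\,\mathbf{x} \cdot \mathbf{y} + \|\mathbf{y}\|^2\\
&\le \|\mathbf{x}\|^2 + 2|\mathbf{x} \cdot \mathbf{y}| + \|\mathbf{y}\|^2\\
&\le \|\mathbf{x}\|^2 + 2\|\mathbf{x}\|\|\mathbf{y}\| + \|\mathbf{y}\|^2\\
&= (\|\mathbf{x}\| + \|\mathbf{y}\|)^2.
\end{aligned}$$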

Taking the square root of both sides gives us the result.

Remark 2.15. While some of the proofs above have relied heavily on the specific formulas for the norm and dot product, these theorems hold for more abstract norms and inner products. (See Problem 2.10.) Such concepts are useful in working with (for instance) spaces of functions in partial differential equations where a common “inner product” between two functions defined on the domain Ω is given by the formula

$$\langle f, g \rangle = \int_{\Omega} f(\mathbf{x})\,g(\mathbf{x})\,d\mathbf{x}.$$

We will not be working with general inner products in this course, but it is worth noting that the concepts of the dot product and norm can be extended to more general objects and that these extensions are very useful in applications.

Problems

Problem 2.1. Let x = (2, 5, −1), y = (4, 0, 8), and z = (1, −6, 7).
(a) Compute x + y.
(b) Compute z − x.
(c) Compute 5x.
(d) Compute 3z + 6y.
(e) Compute 4x − 2y + 3z.

Problem 2.2. Let x = (1, 3, 1), y = (2, −1, −3), and z = (5, 1, −2).
(a) Compute x + y.
(b) Compute z − x.
(c) Compute −3x.
(d) Compute 4z − 2y.
(e) Compute x + 4y − 5z.

Problem 2.3. For the following two-dimensional vectors, create a graph that represents x, y, −x, −y, x − y, and y − x.
(a) x = (2, 1), y = (−1, 4).
(b) x = (0, −3), y = (3, 4).
(c) x = (4, 2), y = (−5, 6).

Problem 2.4. For the points X and Y below compute the vectors $\overrightarrow{XY}$ and $\overrightarrow{YX}$.
(a) X = (4, 2, 6), Y = (−2, 3, 1).
(b) X = (0, 1, −4), Y = (3, 6, 9).
(c) X = (5, 0, 5), Y = (1, 2, 1).

Problem 2.5. Let x = (1, −2, 0) and z = (−1, −4, 3).
(a) Compute ‖x‖.
(b) Compute ‖z‖ − ‖x‖.
(c) Compute ‖z − x‖.
(d) Compute x · z.
(e) Compute $\dfrac{\mathbf{x} \cdot \mathbf{z}}{\|\mathbf{z}\|^2}\,\mathbf{z}$.

Problem 2.6. Let x = (2, 0, 1) and y = (1, −3, 2).
(a) Compute ‖x‖.
(b) Compute ‖y‖ − ‖x‖.
(c) Compute ‖y − x‖.
(d) Compute x · y.
(e) Compute $\dfrac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{y}\|^2}\,\mathbf{y}$.

Problem 2.7. Use graphs of “generic” vectors x, y and x + y in the plane to explain how the triangle inequality gets its name. Show geometrically the case where equality is obtained.

Problem 2.8. Use the formula for the dot product of vectors in Rn to prove Theorem 2.12.

Problem 2.9. Use the formula for the norm of a vector in Rn to prove the first three parts of Theorem 2.14.

Problem 2.10. Instead of using the formula for the norm of a vector in Rn, use (2.1) and the properties of the dot product given in Theorem 2.12 to prove the first three parts of Theorem 2.14. (Note that the proofs of the Cauchy-Schwarz inequality and the triangle inequality depended only on Theorem 2.12, not the specific formulas for the norm or dot product.)

Problem 2.11. Show that

$$\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2$$

if and only if

$$\mathbf{x} \cdot \mathbf{y} = 0.$$

Problem 2.12.
(a) Prove that if x · y = 0 for every y ∈ Rn then x = 0.
(b) Prove that if u · y = v · y for every y ∈ Rn then u = v.

Problem 2.13. The idea of a norm can be generalized beyond the particular case of the Euclidean norm defined above. In more general treatments, any function on a vector space satisfying the four properties of Theorem 2.14 is referred to as a norm. Show that the following two functions on Rn satisfy the four properties and are therefore norms.

$$\|\mathbf{x}\|_1 = |x_1| + |x_2| + \cdots + |x_n|.$$

$$\|\mathbf{x}\|_\infty = \max_{i = 1, 2, \dots, n} |x_i|.$$

Problem 2.14. In R2 graph the three sets

$$\begin{aligned}
S_1 &= \{\mathbf{x} = (x, y) \in \mathbb{R}^2 \mid \|\mathbf{x}\| \le 1\},\\
S_2 &= \{\mathbf{x} = (x, y) \in \mathbb{R}^2 \mid \|\mathbf{x}\|_1 \le 1\},\\
S_3 &= \{\mathbf{x} = (x, y) \in \mathbb{R}^2 \mid \|\mathbf{x}\|_\infty \le 1\}.
\end{aligned}$$

Here ‖·‖ is the Euclidean norm and ‖·‖1 and ‖·‖∞ are defined in Problem 2.13.

Problem 2.15. Show that there are constants c1, C1, c∞ and C∞ such that for every x ∈ Rn

$$c_1\|\mathbf{x}\|_1 \le \|\mathbf{x}\| \le C_1\|\mathbf{x}\|_1,$$

$$c_\infty\|\mathbf{x}\|_\infty \le \|\mathbf{x}\| \le C_\infty\|\mathbf{x}\|_\infty.$$

We say that pairs of norms satisfying this type of relationship are equivalent.

Chapter 3

Angles and Projections

While “angle” is a natural concept in R2 or R3, it is much harder to visualize in higher dimensions. In Problem 3.5, the reader is asked to use the law of cosines from trigonometry to show that if x and y are in the plane (R2) then

x · y = ‖x‖‖y‖ cos θ.

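In light of this, we define the angle between two general vectors in Rn by extending the formula above in the following way. We note that if x and y are both nonzero then the Cauchy-Schwarz inequality gives us

$$\frac{|\mathbf{x} \cdot \mathbf{y}|}{\|\mathbf{x}\|\|\mathbf{y}\|} \le 1,$$

or

$$-1 \le \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|} \le 1.$$

This tells us that $\frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|}$ is in the domain of the inverse cosine function, so we define

$$\theta = \cos^{-1}\left(\frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|}\right) \in [0, \pi]$$

to be the angle between x and y. This gives us

$$\cos\theta = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|}.$$

We state this definition formally and generalize the concept of perpendicular vectors in the following.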

Definition 3.1. For any two nonzero vectors x and y in Rn we define the angle θ between the two vectors by

$$\theta = \cos^{-1}\left(\frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\|\mathbf{y}\|}\right) \in [0, \pi].$$

We say that x and y are orthogonal if x · y = 0. A set of vectors {v1, v2, . . . , vk} is said to be an orthogonal set if

$$\mathbf{v}_i \cdot \mathbf{v}_j = 0 \quad \text{if } i \neq j.$$

We say that a set of vectors {w1, w2, . . . , wk} is orthonormal if it is orthogonal and each vector in the set is a unit vector. That is,

$$\mathbf{w}_i \cdot \mathbf{w}_j = \delta_{ij} = \begin{cases} 0 & \text{if } i \neq j,\\ 1 & \text{if } i = j. \end{cases}$$
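The angle formula translates directly into code. The following sketch assumes vectors are tuples of numbers; the name `angle` is illustrative. The clamp to [−1, 1] is a floating-point precaution added here, since rounding can push the computed ratio slightly outside the interval that the Cauchy-Schwarz inequality guarantees in exact arithmetic.

```python
import math


def angle(x, y):
    """Angle in [0, pi] between two nonzero vectors x and y."""
    d = sum(xi * yi for xi, yi in zip(x, y))
    nx = math.sqrt(sum(xi * xi for xi in x))
    ny = math.sqrt(sum(yi * yi for yi in y))
    if nx == 0 or ny == 0:
        raise ValueError("the angle is defined only for nonzero vectors")
    ratio = d / (nx * ny)
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding error
    return math.acos(ratio)


print(angle((1, 0), (0, 1)))            # pi/2: the vectors are orthogonal
print(angle((1, 0, 0, 0), (-2, 0, 0, 0)))  # pi: the vectors point in opposite directions
```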

Example 3.2. The standard basis ei is an example of an orthonormal set.

Example 3.3. The set

$$\{(1, 1),\ (1, -1)\}$$

is an orthogonal set in R2. The set

$$\{(1/\sqrt{2},\ 1/\sqrt{2}),\ (1/\sqrt{2},\ -1/\sqrt{2})\}$$

is an orthonormal set in R2.

The following computation is often useful.

Definition 3.4. Let y ∈ Rn be nonzero. For any vector x ∈ Rn we define the orthogonal projection of x onto y by

$$p_{\mathbf{y}}(\mathbf{x}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{y}\|^2}\,\mathbf{y}.$$

The projection has the following properties. (See Figure 3.1.)

Lemma 3.5. For any y ≠ 0 and x in Rn we have

1. py(x) is parallel to y,

2. py(x) is orthogonal to x− py(x).

The first assertion follows directly from the definition of parallel vectors. The second can be shown by direct computation and is left to the reader. (See Problem 3.8.)

Figure 3.1: Orthogonal projection. (The figure shows the vectors y, x, py(x), and x − py(x).)

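Example 3.6. Let x = (1, 2, −1) and y = (4, 0, 3). Then x · y = 1 and ‖y‖ = 5, so

$$p_{\mathbf{y}}(\mathbf{x}) = \frac{1}{25}(4, 0, 3).$$

Note that

$$\mathbf{x} - p_{\mathbf{y}}(\mathbf{x}) = \left(\frac{21}{25},\ 2,\ -\frac{28}{25}\right)$$

and that (since py(x) and y are parallel)

$$p_{\mathbf{y}}(\mathbf{x}) \cdot (\mathbf{x} - p_{\mathbf{y}}(\mathbf{x})) = \mathbf{y} \cdot (\mathbf{x} - p_{\mathbf{y}}(\mathbf{x})) = 0.$$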
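The numbers in Example 3.6 can be reproduced exactly with rational arithmetic. This is a minimal sketch, assuming vectors with integer components stored as tuples; `project` and `dot` are hypothetical helper names, and `fractions.Fraction` from the Python standard library is used so that no rounding occurs.

```python
from fractions import Fraction


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def project(x, y):
    """Orthogonal projection of x onto the nonzero vector y (integer components)."""
    yy = dot(y, y)  # this is ||y||^2
    if yy == 0:
        raise ValueError("cannot project onto the zero vector")
    scale = Fraction(dot(x, y), yy)
    return tuple(scale * yi for yi in y)


x = (1, 2, -1)
y = (4, 0, 3)
p = project(x, y)                             # (4/25, 0, 3/25)
r = tuple(xi - pi for xi, pi in zip(x, p))    # (21/25, 2, -28/25)
print(p)
print(r)
print(dot(p, r))  # 0, so p is orthogonal to x - p
```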

Problems

Problem 3.1. Compute the angle between the following pairs of vectors.
(a) x = (−1, 0, 1, 1), y = (2, 2, 1, 0).
(b) x = (3, 0, −1, 0, 1), y = (−1, 1, 2, 1, 0).
(c) x = (−1, 0, 1), y = (5, 1, 0).

Problem 3.2. Let x = (1, −2, 0), y = (−3, 0, 1), and z = (−1, −4, 3).
(a) Compute py(x).
(b) Compute px(y).
(c) Compute py(z).
(d) Compute pz(x).

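Problem 3.3. Determine whether each of the following is an orthogonal set.

(a) $$\left\{ \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},\ \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix} \right\}.$$

(b) $$\left\{ \begin{pmatrix} 0 \\ 0 \\ -1 \\ 1 \end{pmatrix},\ \begin{pmatrix} -1 \\ 1 \\ -1 \\ -1 \end{pmatrix},\ \begin{pmatrix} 2 \\ 0 \\ 1 \\ 1 \end{pmatrix} \right\}.$$

(c) $$\left\{ \begin{pmatrix} -1 \\ 0 \\ 0 \\ -1 \end{pmatrix},\ \begin{pmatrix} -1 \\ 1 \\ 0 \\ 1 \end{pmatrix},\ \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} \right\}.$$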

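Problem 3.4. Determine whether each of the following is an orthonormal set.

(a) $$\left\{ \begin{pmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \\ 0 \\ \tfrac{2}{3} \end{pmatrix},\ \begin{pmatrix} \tfrac{2}{3} \\ \tfrac{1}{3} \\ 0 \\ -\tfrac{2}{3} \end{pmatrix},\ \begin{pmatrix} \tfrac{2}{3} \\ -\tfrac{2}{3} \\ \tfrac{1}{3} \\ \tfrac{1}{3} \end{pmatrix} \right\}.$$

(b) $$\left\{ \begin{pmatrix} \tfrac{1}{\sqrt{2}} \\ 0 \\ -\tfrac{1}{\sqrt{2}} \end{pmatrix},\ \begin{pmatrix} \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{3}} \end{pmatrix},\ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\}.$$

(c) $$\left\{ \begin{pmatrix} \tfrac{1}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{2}} \\ 0 \end{pmatrix},\ \begin{pmatrix} \tfrac{1}{\sqrt{6}} \\ \tfrac{1}{\sqrt{6}} \\ \tfrac{2}{\sqrt{6}} \end{pmatrix},\ \begin{pmatrix} \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{3}} \\ -\tfrac{1}{\sqrt{3}} \end{pmatrix} \right\}.$$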

Problem 3.5. If x and y are any two vectors in the plane, and θ is the (smallest) angle between them, the law of cosines¹ from trigonometry says

$$\|\mathbf{x} - \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\|\mathbf{x}\|\|\mathbf{y}\|\cos\theta.$$

Use this to derive the identity

$$\mathbf{x} \cdot \mathbf{y} = \|\mathbf{x}\|\|\mathbf{y}\|\cos\theta.$$

¹ Note that the law of cosines reduces to the Pythagorean theorem if θ = π/2.

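Problem 3.6. Suppose {w1, w2, . . . , wn} is an orthonormal set. Suppose that for some constants c1, c2, . . . , cn we have

$$\mathbf{x} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + \cdots + c_n\mathbf{w}_n.$$

Show that for any i = 1, 2, . . . , n

$$\mathbf{x} \cdot \mathbf{w}_i = c_i.$$

Problem 3.7. Let {w1, w2, . . . , wk} be an orthonormal set in Rn and let x ∈ Rn. Show that

$$\sum_{i=1}^{k} (\mathbf{x} \cdot \mathbf{w}_i)^2 \le \|\mathbf{x}\|^2.$$

Hint: use the fact that

$$0 \le \left\| \mathbf{x} - \sum_{i=1}^{k} (\mathbf{x} \cdot \mathbf{w}_i)\,\mathbf{w}_i \right\|^2.$$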

Problem 3.8. Show that for any vectors y ≠ 0 and x in Rn the projection py(x) is orthogonal to x − py(x).

Problem 3.9. Show that for any x and nonzero y in Rn

$$p_{\mathbf{y}}(p_{\mathbf{y}}(\mathbf{x})) = p_{\mathbf{y}}(\mathbf{x}).$$

That is, the projection operator applied twice is the same as the projection operator applied once.

Chapter 4

Matrix Algebra

In this section we define the most basic notations and computations involving matrices.

Definition 4.1. An m × n (read “m by n”) matrix A is a rectangular array of mn numbers arranged in m rows and n columns.

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.$$

We call

$$\begin{pmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{pmatrix}$$

the ith row of A, (1 ≤ i ≤ m), and we call

$$\begin{pmatrix} a_{1j}\\ a_{2j}\\ \vdots\\ a_{mj} \end{pmatrix}$$

the jth column of A, (1 ≤ j ≤ n). We call the number aij in the ith row and the jth column the ijth entry of the matrix A. The terms element and component are also used instead of “entry.”

An abstract matrix A is often denoted by a typical entry

$$a_{ij}$$

with two free indices i (assumed to range from 1 to m) and j (assumed to range from 1 to n).

Remark 4.2. The entries in matrices are assumed to be real numbers in this text. Complex entries are considered in more complete treatments and will be mentioned briefly in our treatment of eigenvalues.

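Definition 4.3. As with vectors in Rn, we can define scalar multiplication of any number c with an m × n matrix A.

$$cA = c\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
= \begin{pmatrix}
ca_{11} & ca_{12} & \cdots & ca_{1n}\\
ca_{21} & ca_{22} & \cdots & ca_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
ca_{m1} & ca_{m2} & \cdots & ca_{mn}
\end{pmatrix}.$$

The ijth entry of cA is given by

$$c\,a_{ij}.$$

We can also define matrix addition provided the matrices have the same number of rows and the same number of columns. As with vector addition we simply add corresponding entries

$$A + B = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
+ \begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1n}\\
b_{21} & b_{22} & \cdots & b_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
b_{m1} & b_{m2} & \cdots & b_{mn}
\end{pmatrix}
= \begin{pmatrix}
a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n}\\
a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn}
\end{pmatrix}.$$

The ijth entry of A + B is given by

$$a_{ij} + b_{ij}.$$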

Remark 4.4. Matrix addition is clearly commutative as defined, i.e.

A+B = B +A.

We define scalar multiplication to be commutative as well:

Ac = cA.

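Example 4.5. Let

$$A = \begin{pmatrix} 3 & -2 & 4\\ -1 & 5 & 7 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 1 & -3\\ 2 & 8 & -9 \end{pmatrix}, \quad
C = \begin{pmatrix} 6 & 0\\ 4 & -7 \end{pmatrix}.$$

Then

$$A + B = \begin{pmatrix} 3 + 0 & -2 + 1 & 4 - 3\\ -1 + 2 & 5 + 8 & 7 - 9 \end{pmatrix}
= \begin{pmatrix} 3 & -1 & 1\\ 1 & 13 & -2 \end{pmatrix},$$

and

$$2C = \begin{pmatrix} 2(6) & 2(0)\\ 2(4) & 2(-7) \end{pmatrix}
= \begin{pmatrix} 12 & 0\\ 8 & -14 \end{pmatrix}.$$

The sum A + C is not well defined since the dimensions of the matrices do not match.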
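The computations of Example 4.5 can be checked with a short script. The sketch below assumes a matrix is stored as a list of rows (a list of lists); the helper names `shape`, `scalar_multiply`, and `matrix_add` are illustrative. The dimension check reproduces the observation that A + C is not defined.

```python
def shape(A):
    """Return (number of rows, number of columns) of a list-of-rows matrix."""
    return len(A), len(A[0])


def scalar_multiply(c, A):
    """Return cA: every entry of A is multiplied by c."""
    return [[c * a for a in row] for row in A]


def matrix_add(A, B):
    """Return A + B; both matrices must have the same dimensions."""
    if shape(A) != shape(B):
        raise ValueError("matrices must have the same dimensions")
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


A = [[3, -2, 4], [-1, 5, 7]]
B = [[0, 1, -3], [2, 8, -9]]
C = [[6, 0], [4, -7]]

print(matrix_add(A, B))       # [[3, -1, 1], [1, 13, -2]]
print(scalar_multiply(2, C))  # [[12, 0], [8, -14]]
# matrix_add(A, C) raises ValueError: A is 2 x 3 while C is 2 x 2.
```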

Definition 4.6. If A is an m × p matrix and B is a p × n matrix, then the matrix product AB is an m × n matrix C whose ijth entry is given by

$$c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}.$$

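Remark 4.7. We note that the ijth entry of AB is computed by taking the ith row of A and the jth column of B (which we have required to be the same length (p)). We multiply the row and the column term by term and add the products (as we do in taking the dot product of two vectors).

$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p}\\
\vdots & \vdots & & \vdots\\
a_{i1} & a_{i2} & \cdots & a_{ip}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mp}
\end{pmatrix}
\begin{pmatrix}
b_{11} & \cdots & b_{1j} & \cdots & b_{1n}\\
b_{21} & \cdots & b_{2j} & \cdots & b_{2n}\\
\vdots & & \vdots & & \vdots\\
b_{p1} & \cdots & b_{pj} & \cdots & b_{pn}
\end{pmatrix}
= \begin{pmatrix}
c_{11} & \cdots & c_{1j} & \cdots & c_{1n}\\
\vdots & & \vdots & & \vdots\\
c_{i1} & \cdots & \boxed{c_{ij}} & \cdots & c_{in}\\
\vdots & & \vdots & & \vdots\\
c_{m1} & \cdots & c_{mj} & \cdots & c_{mn}
\end{pmatrix}.$$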

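Example 4.8.

$$\begin{pmatrix} 6 & 0\\ 4 & -7 \end{pmatrix}
\begin{pmatrix} 0 & 1 & -3\\ 2 & 8 & -9 \end{pmatrix}
= \begin{pmatrix}
6(0) + 0(2) & 6(1) + 0(8) & 6(-3) + 0(-9)\\
4(0) - 7(2) & 4(1) - 7(8) & 4(-3) - 7(-9)
\end{pmatrix}
= \begin{pmatrix} 0 & 6 & -18\\ -14 & -52 & 51 \end{pmatrix}.$$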
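The entry formula for the matrix product can be written as a triple loop, here compressed into nested comprehensions. This is a sketch that assumes matrices are lists of rows; `matrix_multiply` is an illustrative name. It reproduces the product computed in Example 4.8.

```python
def matrix_multiply(A, B):
    """Return AB, where A is m x p and B is p x n (lists of rows)."""
    m, p = len(A), len(A[0])
    if len(B) != p:
        raise ValueError("columns of A must match rows of B")
    n = len(B[0])
    # Entry (i, j) is the sum over k of A[i][k] * B[k][j]:
    # row i of A paired term by term with column j of B.
    return [
        [sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
        for i in range(m)
    ]


C = [[6, 0], [4, -7]]
B = [[0, 1, -3], [2, 8, -9]]
print(matrix_multiply(C, B))  # [[0, 6, -18], [-14, -52, 51]]
```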

Remark 4.9. Matrix multiplication is associative. That is,

A(BC) = (AB)C

whenever the dimensions of the matrices match appropriately. This is easiest to demonstrate using component notation. We use the associative law for multiplication of numbers and the fact that finite sums can be taken in any order (the commutative law of addition) to get the following.

$$\sum_{j=1}^{n} a_{ij}\left(\sum_{k=1}^{m} b_{jk} c_{kl}\right)
= \sum_{j=1}^{n} \sum_{k=1}^{m} a_{ij} b_{jk} c_{kl}
= \sum_{k=1}^{m} \left(\sum_{j=1}^{n} a_{ij} b_{jk}\right) c_{kl}.$$

Remark 4.10. Matrix multiplication is not commutative. To compute AB we must match the number of columns of the left matrix A with the number of rows of the right matrix B. The matrix BA. . .

• might not be defined at all,

• might be defined but of a different size than AB, or

• might have the same size as AB but have different entries.

We will see examples of this in Problem 4.2 below.
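A small computation shows the second and third possibilities concretely. The matrices below are arbitrary illustrative choices (they are not taken from the problems), and `matrix_multiply` is a hypothetical helper acting on lists of rows.

```python
def matrix_multiply(A, B):
    """Return AB for list-of-rows matrices; columns of A must match rows of B."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must match rows of B")
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# Same size, different entries.
A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
print(matrix_multiply(A, B))  # [[2, 1], [1, 1]]
print(matrix_multiply(B, A))  # [[1, 1], [1, 2]]

# Both products defined, but of different sizes: P is 2 x 3 and Q is 3 x 2.
P = [[1, 0, 2], [0, 1, 0]]
Q = [[1, 0], [0, 1], [1, 1]]
PQ = matrix_multiply(P, Q)    # a 2 x 2 matrix
QP = matrix_multiply(Q, P)    # a 3 x 3 matrix
print(len(PQ), len(PQ[0]), len(QP), len(QP[0]))  # 2 2 3 3
```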

Remark 4.11. When a vector x ∈ Rn is being used in matrix multiplication, we will always regard it as a column vector or n × 1 matrix.

$$\mathbf{x} = \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}.$$

Definition 4.12. If A is an n × n matrix, the elements aii, i = 1, . . . , n are called the diagonal elements of the matrix. A is said to be diagonal if aij = 0 whenever i ≠ j.

Definition 4.13. The n × n diagonal matrix I with all diagonal elementsaii = 1, i = 1, . . . , n is called the n× n identity matrix.

I =

1 0 · · · 00 1 · · · 0...

.... . .

...0 0 · · · 1

.

The entries of the identity can be represented by the Kronecker delta function, $\delta_{ij}$, which equals 1 when $i = j$ and 0 otherwise.

The identity matrix is a multiplicative identity. For any $m \times n$ matrix $B$ we have

\[ IB = BI = B. \]

Note that here we are using the same symbol to denote the $m \times m$ and the $n \times n$ identity in the first and second instances respectively. This ambiguity in our notation rarely causes problems. In component form, this equation can be written

\[ \sum_{k=1}^{m} \delta_{ik} b_{kj} = \sum_{k=1}^{n} b_{ik} \delta_{kj} = b_{ij}. \]

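Definition 4.14. We say that an $n \times n$ matrix $A$ is invertible if there exists a matrix $A^{-1}$ such that

\[ AA^{-1} = A^{-1}A = I. \]

Example 4.15. Suppose that

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

and $ad - bc \neq 0$. Then one can check directly that

\[ A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}. \]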
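The "direct check" can also be carried out numerically. The sketch below assumes integer entries and uses exact rational arithmetic; the helper names and the sample matrix are illustrative choices, not taken from the text.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] using the formula of Example 4.15."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0, so the formula does not apply")
    scale = Fraction(1) / det  # exact arithmetic, no rounding error
    return [[scale * d, -scale * b], [-scale * c, scale * a]]


def mat_mul_2x2(A, B):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[3, 4], [2, 7]]  # illustrative matrix with ad - bc = 13
A_inv = inverse_2x2(A)
identity = [[1, 0], [0, 1]]
# Both products should give the identity, as Definition 4.14 requires.
print(mat_mul_2x2(A, A_inv) == identity and mat_mul_2x2(A_inv, A) == identity)  # True
```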

Lemma 4.16. A matrix has at most one inverse.

The proof of this is left to the reader in Problem 4.3.

Lemma 4.17. Suppose $A$ and $B$ are invertible $n \times n$ matrices. Then

\[ (AB)^{-1} = B^{-1}A^{-1}. \]

The proof of this is left to the reader in Problem 4.4.

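Definition 4.18. The transpose of an $m \times n$ matrix $A$ is the $n \times m$ matrix $A^T$ obtained by using the rows of $A$ as the columns of $A^T$. That is, the $ij$th entry of $A^T$ is the $ji$th entry of $A$.

\[ a^T_{ij} = a_{ji}. \]

We say that a matrix is symmetric if $A = A^T$ and skew if $-A = A^T$.

Example 4.19.

\[
\begin{pmatrix} 0 & 1 & -3 \\ 2 & 8 & -9 \end{pmatrix}^T
= \begin{pmatrix} 0 & 2 \\ 1 & 8 \\ -3 & -9 \end{pmatrix}.
\]

The matrix

\[ \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix} \]

is symmetric. The matrix

\[ \begin{pmatrix} 0 & 2 & 5 \\ -2 & 0 & -4 \\ -5 & 4 & 0 \end{pmatrix} \]

is skew.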

The next lemma follows immediately from the definition.

Lemma 4.20. For any matrices $A$ and $B$ and scalar $c$ we have

1. $(A^T)^T = A$.

2. $(A + B)^T = A^T + B^T$ if $A$ and $B$ are both $m \times n$.

3. $(cA)^T = c(A^T)$.

4. $(AB)^T = B^T A^T$ if $A$ is $m \times p$ and $B$ is $p \times n$.

5. $(A^{-1})^T = (A^T)^{-1}$ if $A$ is an invertible $n \times n$ matrix.

The proof of this is left to the reader in Problem 4.5. We also note the following.

Lemma 4.21. If $A$ is an $m \times n$ matrix, $\mathbf{x} \in \mathbb{R}^m$, and $\mathbf{y} \in \mathbb{R}^n$ then

\[ \mathbf{x} \cdot (A\mathbf{y}) = (A^T\mathbf{x}) \cdot \mathbf{y}. \]

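Proof. If we look at this equation in component form we see that it follows directly from the definition of multiplication by the transpose and the associative and commutative laws for multiplication of numbers. Since $A$ is $m \times n$, the row index $i$ runs from 1 to $m$ and the column index $j$ runs from 1 to $n$:

\[
\sum_{i=1}^{m} x_i \Big( \sum_{j=1}^{n} a_{ij} y_j \Big)
= \sum_{i=1}^{m} \sum_{j=1}^{n} x_i a_{ij} y_j
= \sum_{j=1}^{n} \Big( \sum_{i=1}^{m} x_i a_{ij} \Big) y_j.
\]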
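A quick numerical check of Lemma 4.21 may help fix the roles of $m$ and $n$. This is a sketch with hypothetical helper names and arbitrarily chosen vectors; only the matrix is borrowed from Example 4.8.

```python
def mat_vec(A, v):
    """Product of a matrix (list of rows) with a column vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    """Rows of A become the columns of the result."""
    return [list(col) for col in zip(*A)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


A = [[0, 1, -3], [2, 8, -9]]  # 2 x 3, so m = 2 and n = 3
x = [1, 2]                    # x in R^m
y = [3, -1, 4]                # y in R^n

lhs = dot(x, mat_vec(A, y))             # x . (A y)
rhs = dot(mat_vec(transpose(A), x), y)  # (A^T x) . y
print(lhs, rhs)  # -89 -89
```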

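Definition 4.22. An $n \times n$ matrix $Q$ is orthogonal if

\[ QQ^T = Q^TQ = I. \]

That is, if $Q^T = Q^{-1}$.

Example 4.23. The $2 \times 2$ matrix

\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]

is orthogonal since

\[
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
=
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
=
\begin{pmatrix} \cos^2\theta + \sin^2\theta & 0 \\ 0 & \cos^2\theta + \sin^2\theta \end{pmatrix}
= I.
\]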
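The same verification can be done in floating point for a particular angle. The following sketch assumes a small tolerance is acceptable for comparing entries; the function names and the angle 0.7 are illustrative.

```python
import math


def rotation(theta):
    """The 2 x 2 matrix of Example 4.23 for the angle theta (in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def transpose(M):
    return [list(col) for col in zip(*M)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def is_orthogonal(Q, tol=1e-12):
    """Check Q Q^T = Q^T Q = I entry by entry, up to the tolerance tol."""
    n = len(Q)
    for P in (mat_mul(Q, transpose(Q)), mat_mul(transpose(Q), Q)):
        for i in range(n):
            for j in range(n):
                target = 1.0 if i == j else 0.0
                if not math.isclose(P[i][j], target, abs_tol=tol):
                    return False
    return True


print(is_orthogonal(rotation(0.7)))      # True
print(is_orthogonal([[1, 1], [0, 1]]))   # False
```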

Problems

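Problem 4.1. Let

\[
A = \begin{pmatrix} 2 & 3 \\ -4 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 7 \\ 1 & -5 \end{pmatrix}, \quad
C = \begin{pmatrix} 6 & 1 \\ 7 & -8 \\ -2 & 4 \end{pmatrix}, \quad
D = \begin{pmatrix} 9 & 3 & 0 \\ 0 & 4 & 7 \end{pmatrix}.
\]

(a) Compute $2A$.
(b) Compute $4A - 2B$.
(c) Compute $C - 3D^T$.
(d) Compute $2C^T + 5D$.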

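Problem 4.2. Let

\[
A = \begin{pmatrix} 2 & 3 \\ -4 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 7 \\ 1 & -5 \end{pmatrix}, \quad
C = \begin{pmatrix} 6 & 1 \\ 7 & -8 \\ -2 & 4 \end{pmatrix}, \quad
D = \begin{pmatrix} 9 & 3 & 0 \\ 0 & 4 & 7 \end{pmatrix}.
\]

(a) Compute $AB$.
(b) Compute $BA$.
(c) Compute $CD$.
(d) Compute $DC$.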

Problem 4.3. Show that the inverse of an $n \times n$ matrix $A$ is unique. That is, show that if

\[ AB = BA = AC = CA = I, \]

then $B = C$.

Problem 4.4. Show that if $A$ and $B$ are invertible $n \times n$ matrices then

\[ (AB)^{-1} = B^{-1}A^{-1}. \]

Problem 4.5. Prove Lemma 4.20.

Problem 4.6. Show that every $n \times n$ matrix $A$ can be written uniquely as the sum of a symmetric matrix $E$ and a skew matrix $W$. Hint: If $A = E + W$ then $A^T = {?}$ We refer to $E$ as the “symmetric part” of $A$ and $W$ as the “skew part.”

Problem 4.7. While we don’t use it in this text, there is a natural extension of the dot product for $n \times n$ matrices:

\[ \langle A, B \rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ij}. \]

Show that if $A$ is symmetric and $B$ is skew then $\langle A, B \rangle = 0$.

Problem 4.8. Let $A$ be any $n \times n$ matrix and let $E$ be its symmetric part. Show that for any $\mathbf{x} \in \mathbb{R}^n$ we have

\[ \mathbf{x}^T A \mathbf{x} = \mathbf{x}^T E \mathbf{x}. \]

Chapter 5

Systems of Linear Equations and Gaussian Elimination

One of the most basic problems in linear algebra is the solution of a system of $m$ linear equations in $n$ unknown variables. In this section we give a quick review of the method of Gaussian elimination for solving these systems.

The following is a generic linear system.

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m.
\end{aligned}
\]

Here, we assume that $a_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$ and $b_i$, $i = 1, \ldots, m$ are known constants. We call the constants $a_{ij}$ the coefficients of the system. The constants $b_i$ are sometimes referred to as the data of the system. The $n$ variables $x_j$, $j = 1, \ldots, n$ are called the unknowns of the system. Any ordered $n$-tuple $(x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ that satisfies each of the $m$ equations in the system simultaneously is a solution of the system.

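We note that the generic system above can be written in terms of matrix multiplication.

\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix},
\]

or

\[ A\mathbf{x} = \mathbf{b}. \]

Here $A$ is the $m \times n$ coefficient matrix, $\mathbf{x} \in \mathbb{R}^n$ is the vector of unknowns, and $\mathbf{b} \in \mathbb{R}^m$ is the data vector.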

It is worth considering the very simple case where $n = m = 1$. Our equation reduces to

\[ ax = b \]

where $a$, $x$, and $b$ are all real numbers. (We think of $a$ and $b$ as given; $x$ is unknown.) The only alternatives for solutions of this equation are as follows.

• If $a \neq 0$ then the equation has the unique solution $x = b/a$.

• If $a = 0$ then there are two possibilities.

– If $b = 0$ then the equation $0 \cdot x = 0$ is satisfied by any $x \in \mathbb{R}$.

– If $b \neq 0$ then there is no solution.

We will see these three alternatives reflected in our subsequent results, but we can get at least some information about an important special case immediately. We call a system of $m$ equations in $n$ unknowns (or the equivalent matrix equation $A\mathbf{x} = \mathbf{b}$) homogeneous if $b_i = 0$, $i = 1, \ldots, m$ (or equivalently, $\mathbf{b} = \mathbf{0}$). We note that every homogeneous system has at least one solution, the trivial solution $x_j = 0$, $j = 1, \ldots, n$ ($\mathbf{x} = \mathbf{0}$).

More generally, a systematic development of the method of Gaussian elimination (which we won’t attempt in this quick review) reveals an important result.

Theorem 5.1. For any linear system of $m$ linear equations in $n$ unknowns, exactly one of the three alternatives holds.

1. The system has a unique solution (x1, . . . , xn).

2. The system has an infinite family of solutions.

3. The system has no solution.

The following examples of the three alternatives are simple enough to solve by inspection or by solving the first equation for one variable and substituting that into the second. The reader should do so and verify the following.

Example 5.2. The system

\[
\begin{aligned}
2x_1 - 3x_2 &= 1, \\
4x_1 + 5x_2 &= 13,
\end{aligned}
\]

has only one solution: $(x_1, x_2) = (2, 1)$.

Example 5.3. The system

\[
\begin{aligned}
x_1 - x_2 &= 5, \\
2x_1 - 2x_2 &= 10,
\end{aligned}
\]

(which is really two copies of the “same” equation) has an infinite collection of solutions of the form $(x_1, x_2) = (5 + s, s)$ where $s$ is any real number.

Example 5.4. The system

\[
\begin{aligned}
x_1 - x_2 &= 1, \\
2x_1 - 2x_2 &= 7,
\end{aligned}
\]

has no solutions.

A systematic development of Gaussian elimination:

• shows that the three alternatives are the only possibilities,

• tells us which of the alternatives fits a given system, and

• allows us to compute any solutions that exist.

As noted above, we will not attempt such a development, but we provide enough detail that readers can convince themselves of the first assertion and do the computations described in the second and third.

Returning to the general problem of Gaussian elimination, we have an abbreviated way of representing the generic matrix equation with a single $m \times (n+1)$ augmented matrix obtained by using the data vector as an additional column of the coefficient matrix.

\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{pmatrix}
\]

The augmented matrix represents the corresponding system of equations in either its $m$ scalar equation or single matrix equation form.

Gaussian elimination involves manipulating an augmented matrix using the following operations.

Definition 5.5. We call the following elementary row operations of a matrix:

1. Multiplying any row by a nonzero constant,

2. Interchanging any two rows,

3. Adding a multiple of any row to another (distinct) row.


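Example 5.6. Multiplying the second row of

\[ \begin{pmatrix} 0 & 3 & -1 \\ 2 & -6 & 10 \\ -3 & 7 & 9 \end{pmatrix} \]

by $\frac{1}{2}$ yields

\[ \begin{pmatrix} 0 & 3 & -1 \\ 1 & -3 & 5 \\ -3 & 7 & 9 \end{pmatrix}. \]

Interchanging the first two rows of this matrix yields

\[ \begin{pmatrix} 1 & -3 & 5 \\ 0 & 3 & -1 \\ -3 & 7 & 9 \end{pmatrix}. \]

Adding three times the first row to the third of this matrix yields

\[ \begin{pmatrix} 1 & -3 & 5 \\ 0 & 3 & -1 \\ 0 & -2 & 24 \end{pmatrix}. \]

Multiplying the third row of this matrix by $-\frac{1}{2}$ yields

\[ \begin{pmatrix} 1 & -3 & 5 \\ 0 & 3 & -1 \\ 0 & 1 & -12 \end{pmatrix}. \]

Interchanging the second and third rows yields

\[ \begin{pmatrix} 1 & -3 & 5 \\ 0 & 1 & -12 \\ 0 & 3 & -1 \end{pmatrix}. \]

Adding $-3$ times the second row of this matrix to the third yields

\[ \begin{pmatrix} 1 & -3 & 5 \\ 0 & 1 & -12 \\ 0 & 0 & 35 \end{pmatrix}. \]

Dividing the final row by 35 yields

\[ \begin{pmatrix} 1 & -3 & 5 \\ 0 & 1 & -12 \\ 0 & 0 & 1 \end{pmatrix}. \]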
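The sequence of operations in Example 5.6 can be replayed mechanically. The sketch below is one possible encoding of the three elementary row operations, assuming zero-based row indices and exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def scale_row(M, i, c):
    """Type 1: multiply row i by the nonzero constant c."""
    if c == 0:
        raise ValueError("the constant must be nonzero")
    return [[c * x for x in row] if r == i else list(row)
            for r, row in enumerate(M)]


def swap_rows(M, i, j):
    """Type 2: interchange rows i and j."""
    N = [list(row) for row in M]
    N[i], N[j] = N[j], N[i]
    return N


def add_multiple(M, c, src, dst):
    """Type 3: add c times row src to the distinct row dst."""
    if src == dst:
        raise ValueError("the two rows must be distinct")
    return [[x + c * y for x, y in zip(row, M[src])] if r == dst else list(row)
            for r, row in enumerate(M)]


# Replay Example 5.6 (rows are numbered from 0 here).
M = [[0, 3, -1], [2, -6, 10], [-3, 7, 9]]
M = scale_row(M, 1, Fraction(1, 2))
M = swap_rows(M, 0, 1)
M = add_multiple(M, 3, 0, 2)
M = scale_row(M, 2, Fraction(-1, 2))
M = swap_rows(M, 1, 2)
M = add_multiple(M, -3, 1, 2)
M = scale_row(M, 2, Fraction(1, 35))
print(M == [[1, -3, 5], [0, 1, -12], [0, 0, 1]])  # True
```

Each function returns a new matrix rather than modifying its argument, so intermediate matrices can be inspected one step at a time.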

Elementary row operations have an important property: they don’t change the solution set of the equivalent systems.

Theorem 5.7. Suppose the matrix $B$ is obtained from the matrix $A$ by a sequence of elementary row operations. Then the linear system represented by the matrix $B$ has exactly the same set of solutions as the system represented by the matrix $A$.

The proof is left to the reader. We note that it is obvious that interchanging equations or multiplying both sides of an equation by a nonzero constant (which are equivalent to the first two operations) doesn’t change the set of solutions. It is less obvious that the third type of operation doesn’t change the set. Is it easier to show that the operation doesn’t destroy solutions or doesn’t create new solutions? Can you “undo” a row operation of the third type by doing another row operation?

Example 5.8. If we interpret the matrices in Example 5.6 as augmented matrices, the first matrix represents the system

\[
\begin{aligned}
3x_2 &= -1, \\
2x_1 - 6x_2 &= 10, \\
-3x_1 + 7x_2 &= 9,
\end{aligned}
\]

while the final matrix represents the system

\[
\begin{aligned}
x_1 - 3x_2 &= 5, \\
x_2 &= -12, \\
0 &= 1.
\end{aligned}
\]

According to our theorem these systems have exactly the same solution set. While it is not that hard to see that the first system has no solutions, the conclusion is immediate for the second system. That is because we have used the elementary row operations to reduce the matrix to a particularly convenient form which we now describe.

Gaussian elimination is the process of using a sequence of elementary row operations to reduce an augmented matrix to a standard form called reduced row echelon form from which it is easy to read the set of solutions. The form has the following properties.

1. Every row is either a row of zeros or has a one as its first nonzero entry (a “leading one”).

2. Any row of zeros lies below all nonzero rows.

3. Any column containing a leading one contains no other nonzero entries.

4. The leading one in any row must lie to the left of any leading one in the rows below it.

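Example 5.9. The matrix

\[
\begin{pmatrix}
1 & 0 & 0 & 3 \\
0 & 1 & 0 & 2 \\
0 & 0 & 1 & 7 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]

is in reduced row echelon form. If we interpret the matrix as an augmented matrix corresponding to a system of five equations in three unknowns it is equivalent to the matrix equation

\[
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} 3 \\ 2 \\ 7 \\ 0 \\ 0 \end{pmatrix},
\]

or the system of five scalar equations

\[
\begin{aligned}
x_1 &= 3, \\
x_2 &= 2, \\
x_3 &= 7, \\
0 &= 0, \\
0 &= 0.
\end{aligned}
\]

Of course, the unique solution is $(x_1, x_2, x_3) = (3, 2, 7)$.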

Example 5.10. Again, the matrix

\[
\begin{pmatrix}
1 & 0 & 2 & 5 \\
0 & 1 & -4 & 6 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]

is in reduced row echelon form. If we again interpret the matrix as an augmented matrix corresponding to a system of five equations in three unknowns it is equivalent to the matrix equation

\[
\begin{pmatrix}
1 & 0 & 2 \\
0 & 1 & -4 \\
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} 5 \\ 6 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\]

or the system of five scalar equations

\[
\begin{aligned}
x_1 + 2x_3 &= 5, \\
x_2 - 4x_3 &= 6, \\
0 &= 0, \\
0 &= 0, \\
0 &= 0.
\end{aligned}
\]

This system has an infinite family of solutions and there are many ways of describing them. The most convenient is to allow the variables corresponding to columns without a leading one to take on an arbitrary value and solve for the variables corresponding to columns with a leading one in terms of these. In this situation we note that the third column has no leading one,¹ so we take $x_3 = s$ where $s$ is any real number and solve for $x_1$ and $x_2$ to get the solution set

\[ (x_1, x_2, x_3) = (5 - 2s, 6 + 4s, s) = (5, 6, 0) + s(-2, 4, 1), \quad s \in \mathbb{R}. \]

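Example 5.11. Finally, the matrix

\[
\begin{pmatrix}
1 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 8 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]

is in reduced row echelon form. If we interpret the matrix as an augmented matrix corresponding to a system of five equations in three unknowns it is equivalent to the matrix equation

\[
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} -1 \\ 0 \\ 8 \\ 1 \\ 0 \end{pmatrix},
\]

or the system of five scalar equations

\[
\begin{aligned}
x_1 &= -1, \\
x_2 &= 0, \\
x_3 &= 8, \\
0 &= 1, \\
0 &= 0.
\end{aligned}
\]

It is clear from the fourth equation that there is no solution to this system.

¹ Neither does the fourth column, but it is the “data column” and does not correspond to an unknown variable.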

Problem 5.1 gives a number of examples of matrices in reduced row echelon form and asks the reader to give the set of solutions. There is an intermediate form called row echelon form. In this form, columns are allowed to have nonzero entries above the leading ones (though still not below). From this form it is easy to determine which of the three solution alternatives holds. Problem 5.2 gives a number of examples of matrices in this form. The reader is asked to determine the alternative by inspection and then determine all solutions where they exist.

Since this is a review, we will not give an elaborate algorithm for using elementary row operations to reduce an arbitrary matrix to reduced row echelon form. (More information on this is given in the references.) We will content ourselves with the following simple examples.

Example 5.12. The system in Example 5.2 can be represented by the augmented matrix

\[ \begin{pmatrix} 2 & -3 & 1 \\ 4 & 5 & 13 \end{pmatrix}. \]

In order to “clear out” the first column, let us add $-2$ times the first row to the second to get

\[ \begin{pmatrix} 2 & -3 & 1 \\ 0 & 11 & 11 \end{pmatrix}. \]

We now divide the second row by 11 to get

\[ \begin{pmatrix} 2 & -3 & 1 \\ 0 & 1 & 1 \end{pmatrix}. \]

Adding 3 times the second row to the first gives

\[ \begin{pmatrix} 2 & 0 & 4 \\ 0 & 1 & 1 \end{pmatrix}. \]

Finally, dividing the first row by 2 puts the matrix in reduced row echelon form

\[ \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \end{pmatrix}. \]

This is equivalent to the system

\[
\begin{aligned}
x_1 &= 2, \\
x_2 &= 1,
\end{aligned}
\]

which describes the unique solution.

Note that we chose the order of our row operations in order to avoid introducing fractions. A computer computation would take a more systematic approach and treat all coefficients as floating point numbers, but our approach makes sense for hand computations.


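Example 5.13. The system

\[
\begin{aligned}
x_1 - x_2 &= 5, \\
2x_1 - 2x_2 &= 10,
\end{aligned}
\]

can be represented by the augmented matrix

\[ \begin{pmatrix} 1 & -1 & 5 \\ 2 & -2 & 10 \end{pmatrix}. \]

Taking $-2$ times the first row of this matrix and adding it to the second in order to clear out the first column yields

\[ \begin{pmatrix} 1 & -1 & 5 \\ 0 & 0 & 0 \end{pmatrix}. \]

This is already in reduced row echelon form and is equivalent to the equation

\[ x_1 - x_2 = 5. \]

Since the second column has no leading one, we let the corresponding variable, $x_2$, take on an arbitrary value $x_2 = s \in \mathbb{R}$ and solve the system for those variables whose column contains a leading one (in this case, $x_1$). Our solutions can be represented as

\[
\begin{aligned}
x_1 &= s + 5, \\
x_2 &= s,
\end{aligned}
\]

for any $s \in \mathbb{R}$.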


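Example 5.14. The system

\[
\begin{aligned}
x_1 - x_2 &= 1, \\
2x_1 - 2x_2 &= 7,
\end{aligned}
\]

can be represented by the augmented matrix

\[ \begin{pmatrix} 1 & -1 & 1 \\ 2 & -2 & 7 \end{pmatrix}. \]

Taking $-2$ times the first row of this matrix and adding it to the second in order to clear out the first column yields

\[ \begin{pmatrix} 1 & -1 & 1 \\ 0 & 0 & 5 \end{pmatrix}. \]

Without even row reducing further, we can see that the second row represents the equation

\[ 0 = 5. \]

Therefore, this system can have no solutions.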
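For comparison with the hand computations above, here is a minimal sketch of a systematic reduction to reduced row echelon form. It is one common way to organize the elimination (often called Gauss-Jordan elimination), not an algorithm given in the text, and it uses exact fractions rather than floating point numbers; the function name `rref` is an illustrative choice.

```python
from fractions import Fraction


def rref(M):
    """Return the reduced row echelon form of M (a list of rows)."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if A[r][col] != 0), None)
        if pivot is None:
            continue  # no leading one in this column
        A[pivot_row], A[pivot] = A[pivot], A[pivot_row]      # interchange rows
        lead = A[pivot_row][col]
        A[pivot_row] = [x / lead for x in A[pivot_row]]      # create the leading one
        for r in range(rows):
            if r != pivot_row and A[r][col] != 0:
                factor = A[r][col]
                # Subtract a multiple of the pivot row to clear the column.
                A[r] = [x - factor * y for x, y in zip(A[r], A[pivot_row])]
        pivot_row += 1
    return A


print(rref([[2, -3, 1], [4, 5, 13]]) == [[1, 0, 2], [0, 1, 1]])     # True
print(rref([[1, -1, 5], [2, -2, 10]]) == [[1, -1, 5], [0, 0, 0]])   # True
print(rref([[1, -1, 1], [2, -2, 7]]) == [[1, -1, 0], [0, 0, 1]])    # True
```

In the third case the routine continues past the point where the hand computation stopped: the row $(0, 0, 1)$ still encodes the contradictory equation $0 = 1$, so the conclusion is unchanged.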

Problems

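Problem 5.1. The following matrices in reduced row echelon form represent augmented matrices of systems of linear equations. Find all solutions of the systems.

(a)
\[
\begin{pmatrix}
1 & 0 & 0 & -2 \\
0 & 1 & 0 & 3 \\
0 & 0 & 1 & 5 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}.
\]

(b)
\[
\begin{pmatrix}
1 & 4 & 0 & 0 & -3 & -2 \\
0 & 0 & 1 & 0 & 2 & 3 \\
0 & 0 & 0 & 1 & 2 & 5 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]

(c)
\[
\begin{pmatrix}
1 & 0 & 0 & -2 & 0 \\
0 & 1 & 0 & 4 & 0 \\
0 & 0 & 1 & 7 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.
\]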

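Problem 5.2. The following matrices in row echelon form represent augmented matrices of systems of linear equations. Determine by inspection which of the three alternatives holds: a unique solution, an infinite family of solutions, or no solution. Find all solutions of the systems that have them.

(a)
\[
\begin{pmatrix}
1 & 2 & 2 & -2 & 2 \\
0 & 1 & 5 & 3 & -2 \\
0 & 0 & 1 & 7 & 6 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.
\]

(b)
\[
\begin{pmatrix}
1 & 4 & 0 & 2 & -3 & -2 \\
0 & 1 & 1 & 4 & 2 & 3 \\
0 & 0 & 0 & 1 & 2 & 5 \\
0 & 0 & 0 & 0 & 1 & 2
\end{pmatrix}.
\]

(c)
\[
\begin{pmatrix}
1 & 1 & 4 & -2 \\
0 & 1 & 3 & 3 \\
0 & 0 & 1 & 5
\end{pmatrix}.
\]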

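Problem 5.3. The following matrices represent augmented matrices of systems of linear equations. Find all solutions of the systems.

(a)
\[
\begin{pmatrix}
3 & 1 & 1 & 2 \\
4 & 0 & 2 & 2 \\
1 & -3 & -2 & 3
\end{pmatrix}.
\]

(b)
\[
\begin{pmatrix}
3 & -6 & 1 & 6 & 0 \\
2 & -4 & -3 & -7 & 0
\end{pmatrix}.
\]

(c)
\[
\begin{pmatrix}
2 & -3 & 2 & -1 \\
1 & 1 & -4 & 2
\end{pmatrix}.
\]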

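Problem 5.4. Find all solutions of the following systems of equations.

(a)
\[
\begin{aligned}
x_1 + x_2 + x_3 &= 0, \\
x_1 - x_2 &= 0, \\
2x_1 + x_3 &= 0.
\end{aligned}
\]

(b)
\[
\begin{aligned}
3u + 2v - w &= 7, \\
u - v + 7w &= 5, \\
2u + 3v - 8w &= 6.
\end{aligned}
\]

(c)
\[
\begin{aligned}
6x - 14y &= 28, \\
7x + 3y &= 6.
\end{aligned}
\]

(d)
\[
\begin{aligned}
c_1 - c_2 &= -6, \\
-c_2 + c_3 &= 5, \\
c_3 - c_4 &= -4, \\
c_1 - c_4 &= -15.
\end{aligned}
\]

(e)
\[
\begin{aligned}
2x + 7y + 3z - 2w &= 8, \\
x + 5y + 3z - 3w &= 2, \\
-2x + 3y - z - 2w &= 4, \\
3x - y + z + 3w &= 2, \\
2x + 2y + 3z + w &= 1.
\end{aligned}
\]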

Chapter 6

Determinants

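In this section we examine the formulas for and properties of determinants. We begin with the determinant of a $2 \times 2$ matrix.

Definition 6.1. The determinant of a $2 \times 2$ matrix $A$ is given by

\[ \det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11}a_{22} - a_{21}a_{12}. \]

We also use the notation

\[ \det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}. \]

Example 6.2.

\[ \det \begin{pmatrix} 3 & 4 \\ 2 & 7 \end{pmatrix} = \begin{vmatrix} 3 & 4 \\ 2 & 7 \end{vmatrix} = 3(7) - 2(4) = 13. \]

Example 6.3. Note that

\[ \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc. \]

Recall that this number figured prominently in the computation of the inverse of a $2 \times 2$ matrix in Example 4.15.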

In order to give a reasonable definition of the determinant of an $n \times n$ matrix, we introduce the concept of a permutation. We consider the three-dimensional case first. There are six (or 3 factorial) possible rearrangements or permutations of the numbers (1, 2, 3). (There are three ways to choose the first number, then two ways to choose the second, then one way to choose the third. This gives us $3! = 3 \times 2 \times 1$ possible distinct arrangements.) We call any pair of numbers in the rearranged ordered triple an inversion if the higher number appears to the left of the lower. For example, the triple (3, 2, 1) has three inversions: (3, 2), (3, 1), and (2, 1). We call a permutation of (1, 2, 3) odd if it has an odd number of inversions and even if it has an even number of inversions.

• (1, 2, 3) has no inversions. It is an even permutation.

• (1, 3, 2) has one inversion: (3, 2). It is an odd permutation.

• (2, 3, 1) has two inversions: (2, 1) and (3, 1). It is even.

• (2, 1, 3) has one inversion: (2, 1). It is odd.

• (3, 1, 2) has two inversions: (3, 1) and (3, 2). It is even.

• (3, 2, 1) has three inversions as noted above. It is odd.

Remark 6.4. Note that we can create any permutation of the array (1, 2, 3) by the process of repeatedly interchanging pairs of integers. Readers should convince themselves that no matter how this is done, an even permutation requires an even number of interchanges while an odd permutation requires an odd number.

Remark 6.5. All of the preceding can be generalized to permutations of the set of numbers $(1, 2, \ldots, n)$. For instance (3, 5, 2, 4, 1) contains the seven inversions (3, 2), (3, 1), (5, 2), (5, 4), (5, 1), (2, 1) and (4, 1). Hence, it is an odd permutation of the integers (1, 2, 3, 4, 5).
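Counting inversions is easy to automate. The following sketch lists every inversion of a permutation and reports its parity as a sign; the function names are illustrative.

```python
from itertools import combinations


def inversions(perm):
    """All pairs (larger, smaller) in which the larger entry appears to the left."""
    return [(perm[a], perm[b])
            for a, b in combinations(range(len(perm)), 2)
            if perm[a] > perm[b]]


def parity_sign(perm):
    """+1 for an even permutation, -1 for an odd one."""
    return -1 if len(inversions(perm)) % 2 else 1


print(inversions((3, 5, 2, 4, 1)))
# [(3, 2), (3, 1), (5, 2), (5, 4), (5, 1), (2, 1), (4, 1)]
print(parity_sign((3, 5, 2, 4, 1)))  # -1
```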

Definition 6.6. We define the $n$-dimensional permutation symbol by

\[
\varepsilon_{i_1 i_2 \ldots i_n} =
\begin{cases}
-1 & \text{if } (i_1, i_2, \ldots, i_n) \text{ is an odd permutation of } (1, 2, \ldots, n), \\
1 & \text{if } (i_1, i_2, \ldots, i_n) \text{ is an even permutation of } (1, 2, \ldots, n), \\
0 & \text{if } (i_1, i_2, \ldots, i_n) \text{ contains any repeated indices.}
\end{cases}
\]

In particular, the three-dimensional permutation symbol is given by

\[
\varepsilon_{ijk} =
\begin{cases}
-1 & \text{if } (i, j, k) \text{ is an odd permutation of } (1, 2, 3), \\
1 & \text{if } (i, j, k) \text{ is an even permutation of } (1, 2, 3), \\
0 & \text{if } (i, j, k) \text{ contains any repeated indices.}
\end{cases}
\]

We now define the $n \times n$ determinant.

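Definition 6.7. For an $n \times n$ matrix we define

\[
\det \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
=
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
=
\sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \cdots \sum_{i_n=1}^{n}
\varepsilon_{i_1 i_2 \ldots i_n} a_{1 i_1} a_{2 i_2} \cdots a_{n i_n}.
\]

For a $3 \times 3$ matrix this has the form

\[
\det \begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
=
\begin{vmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{vmatrix}
=
\sum_{i=1}^{3} \sum_{j=1}^{3} \sum_{k=1}^{3} \varepsilon_{ijk} a_{1i} a_{2j} a_{3k}.
\]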

While this definition has the advantage of being explicit and simple to state, it is not an easy formula to compute. It has, after all, $n^n$ terms in the sum, which can get large very quickly. Even if we note that “only” $n!$ of the terms are nonzero, this is still a large number for even a moderate size of $n$. In order to deal with this, we use our basic definition to derive a family of formulas called expansion by cofactors. This will give us a fairly reasonable method for the computation of determinants, though as we will see, there is no getting around the fact that in general, computing the determinant of a matrix is a labor intensive process.

Let’s examine the formula for the determinant of a $3 \times 3$ matrix. Fortunately, while the sum is over $3^3 = 27$ terms, only $3! = 6$ of them are nonzero.

\[
\begin{aligned}
\det \begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
&= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} \\
&\quad - a_{11}a_{23}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33}.
\end{aligned}
\]
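The definition can be implemented directly by summing over the $n!$ permutations that give nonzero terms. The sketch below assumes zero-based indices and small matrices (the cost grows like $n!$); the function names and the second test matrix are illustrative.

```python
from itertools import permutations
from math import prod


def sign(perm):
    """Permutation symbol: +1 for an even permutation, -1 for an odd one."""
    inv = sum(1 for a in range(len(perm)) for b in range(a + 1, len(perm))
              if perm[a] > perm[b])
    return -1 if inv % 2 else 1


def det_by_permutations(A):
    """Sum of sign(p) * a_{1 p(1)} * ... * a_{n p(n)} over all permutations p."""
    n = len(A)
    return sum(sign(p) * prod(A[row][p[row]] for row in range(n))
               for p in permutations(range(n)))


print(det_by_permutations([[3, 4], [2, 7]]))                    # 13
print(det_by_permutations([[2, 0, 1], [1, 3, -1], [0, 5, 4]]))  # 39
```

The first value agrees with Example 6.2; the second can be checked against the six-term formula above.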

A few observations will lead us to some important results. Note the following:

• Each of the 3! nonzero terms in the sum has 3 factors.

• One factor from each term comes from each row.

• One factor from each term comes from each column.

Now note that for each ordered pair $ij$, the entry $a_{ij}$ appears in two (or 2!) terms of the sum. We can factor $a_{ij}$ out of the sum of these two terms and write those terms in the form

\[ a_{ij} A_{ij} \]

where we call $A_{ij}$ the cofactor of $a_{ij}$. Since each term of the full determinant has one factor from every row (and every column) of the original matrix, we can write the determinant in terms of the factors from any particular row or any particular column. That is, for any $i = 1, 2, 3$ or $j = 1, 2, 3$ we can write

\[
\begin{aligned}
\det A &= a_{i1}A_{i1} + a_{i2}A_{i2} + a_{i3}A_{i3} \\
&= a_{1j}A_{1j} + a_{2j}A_{2j} + a_{3j}A_{3j}.
\end{aligned}
\]

At this point, let us note that the observations above easily generalize to the $n \times n$ case.

• Each of the $n!$ nonzero terms in the determinant of an $n \times n$ matrix has $n$ factors.

• One factor from each term comes from each row.

• One factor from each term comes from each column.

Since each term of the determinant has one factor from every row (and every column) of the original matrix, we can write the determinant in terms of the factors from either the $i$th row or the $j$th column:

\[
\begin{aligned}
\det A &= a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in} \\
&= a_{1j}A_{1j} + a_{2j}A_{2j} + \cdots + a_{nj}A_{nj}.
\end{aligned}
\]

All of this would be pretty useless if there weren’t a convenient formula for the cofactors of a matrix. Fortunately, one can show the following.

Lemma 6.8. For any $i, j = 1, 2, \ldots, n$

\[ A_{ij} = (-1)^{i+j} M_{ij} \]

where $M_{ij}$ is the $ij$th minor of $A$, that is, the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting the $i$th row and $j$th column of the matrix $A$.

The proof of this result in the general $n \times n$ case requires more information about permutations than will be covered in this text. However, the result can be obtained by direct computation in the $3 \times 3$ case. For example, we can rearrange the formula for the $3 \times 3$ determinant above, factoring out all terms from, say, the second row

\[
\begin{aligned}
\det A &= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} \\
&\quad - a_{11}a_{23}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} \\
&= a_{21}(a_{13}a_{32} - a_{12}a_{33}) + a_{22}(a_{11}a_{33} - a_{13}a_{31}) + a_{23}(a_{12}a_{31} - a_{11}a_{32}) \\
&= (-1)^{2+1} a_{21} \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}
+ (-1)^{2+2} a_{22} \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix}
+ (-1)^{2+3} a_{23} \begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix}.
\end{aligned}
\]

We call this technique for computing the determinant expansion by cofactors and we state it now as a theorem.

Theorem 6.9. Let $A$ be an $n \times n$ matrix. Then for any $i = 1, 2, \ldots, n$ and any $j = 1, 2, \ldots, n$ we have

\[
\det A = \sum_{l=1}^{n} (-1)^{i+l} a_{il} M_{il}
= \sum_{k=1}^{n} (-1)^{k+j} a_{kj} M_{kj}.
\]

Here $M_{kl}$ is the $kl$th minor of $A$, that is, the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting the $k$th row and $l$th column of the matrix $A$.

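Example 6.10. Let us compute the determinant of the matrix

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} \]

several different ways. We first use the basic formula

\[
\begin{aligned}
\det A &= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} \\
&\quad - a_{11}a_{23}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} \\
&= 1(5)(9) + 2(6)(7) + 3(4)(8) - 1(6)(8) - 3(5)(7) - 2(4)(9) \\
&= 0.
\end{aligned}
\]

We now expand by cofactors along, say, the third row.

\[
\begin{aligned}
\det A &= 7 \begin{vmatrix} 2 & 3 \\ 5 & 6 \end{vmatrix}
- 8 \begin{vmatrix} 1 & 3 \\ 4 & 6 \end{vmatrix}
+ 9 \begin{vmatrix} 1 & 2 \\ 4 & 5 \end{vmatrix} \\
&= 7(2(6) - 3(5)) - 8(1(6) - 3(4)) + 9(1(5) - 2(4)) \\
&= 0.
\end{aligned}
\]

On the other hand, choosing the second column gives us

\[
\begin{aligned}
\det A &= -2 \begin{vmatrix} 4 & 6 \\ 7 & 9 \end{vmatrix}
+ 5 \begin{vmatrix} 1 & 3 \\ 7 & 9 \end{vmatrix}
- 8 \begin{vmatrix} 1 & 3 \\ 4 & 6 \end{vmatrix} \\
&= -2(4(9) - 6(7)) + 5(1(9) - 3(7)) - 8(1(6) - 3(4)) \\
&= 0.
\end{aligned}
\]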

Of course, all yield the same value for the determinant.
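Expansion by cofactors lends itself to a short recursive routine. The sketch below expands along a chosen row at the top level and along the first row of each minor; indices are zero-based, which does not change the sign $(-1)^{i+j}$. The function names are illustrative.

```python
def minor_matrix(A, i, j):
    """The matrix obtained from A by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det_cofactor(A, i=0):
    """Determinant by cofactor expansion along row i (first row for sub-determinants)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * det_cofactor(minor_matrix(A, i, j))
               for j in range(n))


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print([det_cofactor(A, i) for i in range(3)])  # [0, 0, 0]
```

Every choice of row gives the same value, as Theorem 6.9 asserts.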

Remark 6.11. We note that using expansion by cofactors a $3 \times 3$ determinant can be computed using three $2 \times 2$ determinants. In a similar way, a $4 \times 4$ determinant can be computed using four $3 \times 3$ determinants, a $5 \times 5$ determinant can be computed using five $4 \times 4$ determinants, etc. Thus, the difficulty of computing an $n \times n$ determinant grows like $n!$.

We now state some theorems giving properties of the determinant. We will not give the proofs here (though a few are left to the problems). Readers who wish to engage in a more extensive study of linear algebra are encouraged to do so. Texts such as [7] will help.

Theorem 6.12. Let $A$ be an $n \times n$ matrix.

1. $\det A = \det A^T$.

2. If two rows (or two columns) of $A$ are identical then

\[ \det A = 0. \]

3. If a row (or column) of $A$ has all zero entries then

\[ \det A = 0. \]

4. The determinant of a diagonal matrix is the product of its diagonal elements.

These results follow directly from the definition of the determinant and properties of permutations. They are easy to verify in the $2 \times 2$ and $3 \times 3$ case.

The next theorem has two important functions. It states that the determinant is a linear function of the individual rows of the matrix, and it describes how the determinant is affected by the “elementary row operations” used in Gaussian elimination.

Theorem 6.13. Let $A$, $B$, and $C$ be $n \times n$ matrices.

1. If $A$, $B$, and $C$ are identical except for the $i$th row, and the $i$th row of $C$ is the sum of the $i$th rows of $A$ and $B$ then

\[ \det C = \det A + \det B. \]

2. If $B$ is obtained from $A$ by interchanging two rows (or two columns) then

\[ \det B = -\det A. \]

3. If $B$ is obtained from $A$ by multiplying one row (or one column) of $A$ by the scalar $c$ then

\[ \det B = c \det A. \]

4. If $B$ is obtained from $A$ by adding a scalar multiple of one row of $A$ to a different row then

\[ \det B = \det A. \]
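Parts 2 and 4 suggest a cheaper way to compute a determinant than cofactor expansion: reduce the matrix to upper triangular form while tracking sign changes. The sketch below is one such approach, not a method given in the text; it relies on the fact (see Problem 6.6) that the determinant of a triangular matrix is the product of its diagonal elements, and the function name is hypothetical.

```python
from fractions import Fraction


def det_by_elimination(M):
    """Determinant via row reduction to upper triangular form."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # a zero will sit on the diagonal
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det          # Part 2: a row interchange flips the sign
        det *= A[col][col]      # accumulate the product of diagonal entries
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            # Part 4: adding a multiple of one row to another leaves det unchanged.
            A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return det


print(det_by_elimination([[0, 3, -1], [2, -6, 10], [-3, 7, 9]]))  # -140
print(det_by_elimination([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))      # 0
```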


The following theorem is quite important.

Theorem 6.14. For any $n \times n$ matrices $A$ and $B$ we have the following.

1. $\det(AB) = \det A \det B$.

2. If $A$ is invertible then $\det A \neq 0$ and

\[ \det(A^{-1}) = \frac{1}{\det A}. \]

Unfortunately, the proof of the first part requires some techniques that are not covered in this book. It is covered in most standard texts on linear algebra. The second part follows directly from the first and the identity

\[ 1 = \det I = \det(AA^{-1}) = \det A \det A^{-1}. \]

Definition 6.15. An orthogonal matrix $Q$ is called a rotation matrix if $\det Q = 1$.

Remark 6.16. Since for any orthogonal matrix we have

\[ \det(QQ^T) = (\det Q)^2 = \det I = 1, \]

it follows that $\det Q = \pm 1$. Rotation matrices are the orthogonal matrices for which $\det Q = 1$. The reader should verify that the orthogonal matrices in Example 4.23 are rotations.

While (as we will see in the rest of the book) determinants have many geometric applications, one of the most important applications is to the problem of $n$ linear equations in $n$ unknowns. The following theorem should be familiar to the reader.

Theorem 6.17. Let $A$ be an $n \times n$ matrix. Then the following are equivalent. (That is, if one statement is true, all are true. If one statement is false, all are false.)

1. For every $\mathbf{b} \in \mathbb{R}^n$ the matrix equation

\[ A\mathbf{x} = \mathbf{b} \]

has a unique solution.

2. $\det A \neq 0$.

3. The matrix $A$ is invertible.

4. The matrix $A$ can be row reduced to the identity matrix.

5. The homogeneous system

\[ A\mathbf{x} = \mathbf{0} \]

has only the trivial solution.

Again we direct the reader to standard linear algebra texts for the proof of this theorem.

Problems

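Problem 6.1. Compute the determinants of the following matrices.

\[
A = \begin{pmatrix} 2 & 3 \\ -4 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 7 \\ 1 & -5 \end{pmatrix}, \quad
C = \begin{pmatrix} 2 & 0 \\ 0 & 5 \end{pmatrix}, \quad
D = \begin{pmatrix} 9 & 3 \\ 18 & 6 \end{pmatrix}.
\]

Problem 6.2. Compute the determinants of the following matrices.

\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 5 & -4 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 2 & 0 & 7 \\ 0 & 3 & -5 \\ 0 & 0 & 5 \end{pmatrix},
\]

\[
C = \begin{pmatrix} 6 & 1 & 0 \\ 7 & -8 & 1 \\ -2 & 4 & 1 \end{pmatrix}, \quad
D = \begin{pmatrix} 9 & 3 & 0 \\ 0 & 4 & 7 \\ 1 & 0 & 3 \end{pmatrix}.
\]

Problem 6.3. Compute the determinants of the following matrices.

\[
A = \begin{pmatrix} 2 & 5 & 0 & -1 \\ -2 & 0 & 3 & 2 \\ 0 & -7 & 0 & 6 \\ 1 & 4 & -1 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 3 & 1 & -2 & 0 \\ 0 & 4 & 1 & 2 \\ 0 & 0 & 0 & 6 \\ 0 & 0 & 1 & 1 \end{pmatrix},
\]

\[
C = \begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & 3 & 0 \\ 0 & -6 & 0 & 0 \\ 5 & 0 & 0 & 0 \end{pmatrix}, \quad
D = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 4 & 2 & 2 \\ 0 & 0 & 9 & 6 \\ 1 & 1 & 1 & 4 \end{pmatrix}.
\]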

Problem 6.4. For the following permutations of (1, 2, 3, 4) find all inversions and determine whether the permutation is odd or even.
(a) (3, 1, 2, 4).
(b) (4, 3, 2, 1).
(c) (2, 3, 4, 1).
(d) (1, 3, 4, 2).

Problem 6.5. For the following permutations of (1, 2, 3, 4, 5) find all inversions and determine whether the permutation is odd or even.
(a) (3, 1, 5, 2, 4).
(b) (5, 4, 3, 2, 1).
(c) (2, 3, 4, 1, 5).
(d) (1, 5, 3, 4, 2).

Problem 6.6. We say that a matrix is upper triangular if $a_{ij} = 0$ whenever $j < i$ and lower triangular if $a_{ij} = 0$ whenever $j > i$. Use the definition of the determinant to show that if a matrix is upper triangular, lower triangular, or diagonal then the determinant is the product of the diagonal elements

\[ \det A = a_{11}a_{22} \cdots a_{nn}. \]

Hint: Note that there is only one permutation of the numbers $(1, 2, 3, \ldots, n)$ with no inversions.

Problem 6.7. Suppose $P_1$ is a permutation of the integers $(1, 2, \ldots, n)$. Let $P_2$ be a new permutation obtained by switching two of the integers in $P_1$. Show that the number of inversions in $P_2$ differs from the number in $P_1$ by an odd integer. Thus,

\[ \varepsilon_{P_2} = -\varepsilon_{P_1}. \]

Hint: First show that this is true if the two integers being switched are adjacent. Then show that we can switch any two integers by performing an odd number of switches of adjacent integers.

Problem 6.8. Use Problem 6.7 to prove the following part of Theorem 6.13: if an $n \times n$ matrix $B$ is obtained from $A$ by interchanging two rows (or two columns) then

\[ \det B = -\det A. \]

Chapter 7

The Cross Product and Triple Product in R3

To this point we have defined two types of products involving vectors.

1. The scalar product of a scalar c and a vector v ∈ Rn yields a vector

cv ∈ Rn.

The resulting vector is parallel to the original vector.

2. The dot product or inner product of two vectors u,v ∈ Rn yields a scalar

u · v ∈ R.

The result gives us information about the angle between the two vectors.

In the special case of R3 we define the cross product of two vectors, which yields another vector.

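Definition 7.1. For any vectors $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$ in $\mathbb{R}^3$ we define the cross product to be the vector

\[ \mathbf{u} \times \mathbf{v} = \sum_{i=1}^{3} \sum_{j=1}^{3} \sum_{k=1}^{3} \varepsilon_{ijk} u_i v_j \mathbf{e}_k. \]

Remark 7.2. This equation can be expanded in the forms

\[
\begin{aligned}
\mathbf{u} \times \mathbf{v} &= (u_2v_3 - u_3v_2)\mathbf{e}_1 + (u_3v_1 - u_1v_3)\mathbf{e}_2 + (u_1v_2 - u_2v_1)\mathbf{e}_3 \\
&= (u_2v_3 - u_3v_2)\mathbf{i} + (u_3v_1 - u_1v_3)\mathbf{j} + (u_1v_2 - u_2v_1)\mathbf{k}.
\end{aligned}
\]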

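Remark 7.3. A useful mnemonic for the cross product can be obtained using the notation for determinants

\[
\mathbf{u} \times \mathbf{v} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
u_1 & u_2 & u_3 \\
v_1 & v_2 & v_3
\end{vmatrix}.
\]

Expanding by cofactors on the first row gives us

\[
\mathbf{u} \times \mathbf{v} =
\begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix} \mathbf{i}
- \begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix} \mathbf{j}
+ \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix} \mathbf{k}.
\]

This expands to the formula above.

Example 7.4. If $\mathbf{u} = (-2, 1, 4)$ and $\mathbf{v} = (0, 3, -1)$ then

\[
\mathbf{u} \times \mathbf{v} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
-2 & 1 & 4 \\
0 & 3 & -1
\end{vmatrix}
= -13\mathbf{i} - 2\mathbf{j} - 6\mathbf{k}.
\]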
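The triple-sum definition of the cross product can be evaluated literally. The following sketch assumes vectors are stored as 3-tuples with zero-based indices; the function names are illustrative.

```python
def epsilon(i, j, k):
    """Three-dimensional permutation symbol for indices taken from {0, 1, 2}."""
    if len({i, j, k}) < 3:
        return 0  # a repeated index
    inversion_count = (i > j) + (i > k) + (j > k)
    return -1 if inversion_count % 2 else 1


def cross(u, v):
    """u x v = sum over i, j, k of epsilon_ijk * u_i * v_j * e_k."""
    w = [0, 0, 0]
    for i in range(3):
        for j in range(3):
            for k in range(3):
                w[k] += epsilon(i, j, k) * u[i] * v[j]
    return tuple(w)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


u, v = (-2, 1, 4), (0, 3, -1)
w = cross(u, v)
print(w)                      # (-13, -2, -6)
print(dot(u, w), dot(v, w))   # 0 0
```

The first line reproduces Example 7.4; the second shows that the result is perpendicular to both factors.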


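Theorem 7.5. The cross product has the following properties.

1. For all $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^3$

\[ \mathbf{u} \times \mathbf{v} = -\mathbf{v} \times \mathbf{u}. \]

2. For all $\mathbf{u} \in \mathbb{R}^3$

\[ \mathbf{u} \times \mathbf{u} = \mathbf{0}. \]

3. For all $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $\mathbb{R}^3$ and real numbers $\alpha$ and $\beta$

\[
\begin{aligned}
\mathbf{u} \times (\alpha\mathbf{v} + \beta\mathbf{w}) &= \alpha(\mathbf{u} \times \mathbf{v}) + \beta(\mathbf{u} \times \mathbf{w}), \\
(\alpha\mathbf{u} + \beta\mathbf{v}) \times \mathbf{w} &= \alpha(\mathbf{u} \times \mathbf{w}) + \beta(\mathbf{v} \times \mathbf{w}).
\end{aligned}
\]

4. For any $i = 1, 2, 3$ and $j = 1, 2, 3$

\[ \mathbf{e}_i \times \mathbf{e}_j = \sum_{k=1}^{3} \varepsilon_{ijk} \mathbf{e}_k. \]

This reduces to

\[
\begin{aligned}
\mathbf{e}_1 \times \mathbf{e}_2 &= \mathbf{e}_3, \\
\mathbf{e}_2 \times \mathbf{e}_3 &= \mathbf{e}_1, \\
\mathbf{e}_3 \times \mathbf{e}_1 &= \mathbf{e}_2.
\end{aligned}
\]

5. For all $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^3$

\[ \mathbf{u} \cdot (\mathbf{u} \times \mathbf{v}) = \mathbf{v} \cdot (\mathbf{u} \times \mathbf{v}) = 0. \]

6. For all $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^3$

\[ \|\mathbf{u} \times \mathbf{v}\| = \|\mathbf{u}\| \|\mathbf{v}\| |\sin\theta|, \]

where $\theta$ is the (smallest) angle between the vectors $\mathbf{u}$ and $\mathbf{v}$.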

Proof. Parts 1, 2, and 3 follow directly from the properties of the determinant (see Theorems 6.12 and 6.13) and the mnemonic for the cross product. Part 4 can be easily verified through direct computation.

Part 5 can be obtained through a formula that will be of interest below. For any $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $\mathbb{R}^3$ we note that

\[
\mathbf{w} \cdot (\mathbf{u} \times \mathbf{v}) =
\begin{vmatrix}
w_1 & w_2 & w_3 \\
u_1 & u_2 & u_3 \\
v_1 & v_2 & v_3
\end{vmatrix}.
\]

This follows directly from the determinant mnemonic for the cross product and the definition of the dot product. If we let $\mathbf{w}$ be either $\mathbf{u}$ or $\mathbf{v}$, the determinant has two identical rows, so we get Part 5 from the properties of the determinant.

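Part 6 requires a somewhat longer calculation

\[
\begin{aligned}
\|\mathbf{u} \times \mathbf{v}\|^2
&= (u_2v_3 - u_3v_2)^2 + (u_3v_1 - u_1v_3)^2 + (u_1v_2 - u_2v_1)^2 \\
&= u_2^2v_3^2 - 2u_2v_3u_3v_2 + u_3^2v_2^2 \\
&\quad + u_3^2v_1^2 - 2u_3v_1u_1v_3 + u_1^2v_3^2 \\
&\quad + u_1^2v_2^2 - 2u_1v_2u_2v_1 + u_2^2v_1^2 \\
&= (u_1^2 + u_2^2 + u_3^2)(v_1^2 + v_2^2 + v_3^2) - (u_1v_1 + u_2v_2 + u_3v_3)^2 \\
&= \|\mathbf{u}\|^2\|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2 \\
&= \|\mathbf{u}\|^2\|\mathbf{v}\|^2 - \|\mathbf{u}\|^2\|\mathbf{v}\|^2 \cos^2\theta \\
&= \|\mathbf{u}\|^2\|\mathbf{v}\|^2 \sin^2\theta.
\end{aligned}
\]

Taking the square root of both sides and noting that since $\theta$ is the smallest angle between the two vectors we have $\sin\theta \geq 0$ completes the proof.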
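The algebraic identity at the heart of this calculation, $\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2\|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2$, can be spot-checked with integer vectors, where the arithmetic is exact. This is a sketch with illustrative function names and test vectors.

```python
def cross(u, v):
    """Component formula for the cross product in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def identity_holds(u, v):
    """Check ||u x v||^2 == ||u||^2 ||v||^2 - (u . v)^2 exactly."""
    w = cross(u, v)
    return dot(w, w) == dot(u, u) * dot(v, v) - dot(u, v) ** 2


u, v = (-2, 1, 4), (0, 3, -1)
print(dot(cross(u, v), cross(u, v)))   # 209
print(identity_holds(u, v))            # True
```

Here $\|\mathbf{u}\|^2 = 21$, $\|\mathbf{v}\|^2 = 10$, and $\mathbf{u} \cdot \mathbf{v} = -1$, so the right-hand side is $210 - 1 = 209$.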

Remark 7.6. Note that until Part 6 of Theorem 7.5, we put no geometric interpretation on the cross product. Everything rested on the algebraic properties of vectors as triples of real numbers. Part 6 gives a formula for the magnitude of the cross product. This magnitude has a geometric interpretation: ‖u × v‖ is the area of the parallelogram with sides u and v.

Figure 7.1: The length of the base of the parallelogram is ‖u‖. Its altitude is ‖v‖ sin θ. This gives the area ‖u‖‖v‖ sin θ.

Remark 7.7. What is the direction of the cross product? Part 5 of Theorem 7.5 tells us that the cross product is perpendicular to both u and v. If they are not parallel, then u × v is perpendicular to the (unique) plane containing both vectors. Knowing the magnitude of a vector and knowing that it is perpendicular to a given plane still leaves us with two possibilities (pointing “above” the plane or “below” the plane). Which of the two is it? This depends on how we have defined the three perpendicular coordinate axes in space that correspond to the i = e1, j = e2, and k = e3 vectors (which point in the positive coordinate directions). There are two ways of setting up a Cartesian coordinate system in three dimensions. It is pretty easy to convince yourself that we can choose any two perpendicular directions for the first two positive coordinate axes corresponding to e1 and e2. However, once we have done this there are two ways to direct the third axis (or e3).

Definition 7.8. We call a coordinate system for R3 right-handed if pushing e1 toward e2 with the palm of a typical right hand causes the thumb to point in the direction of e3. If e3 points in the opposite direction, the coordinate system is called left-handed.

It is standard to use a right-handed coordinate system for R3. If this is the case, the direction of u × v is determined by the right-hand rule: If u is pushed toward v with the palm of your right hand, your thumb will point toward u × v.

In the proof of the previous theorem, we computed a quantity that has an interesting physical interpretation.

Definition 7.9. For any u, v, and w in R3 we define the vector triple product of the three vectors to be

\[
\mathbf{w} \cdot (\mathbf{u} \times \mathbf{v}) =
\begin{vmatrix}
w_1 & w_2 & w_3 \\
u_1 & u_2 & u_3 \\
v_1 & v_2 & v_3
\end{vmatrix}.
\]

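Remark 7.10. We can see from the determinant formula that the order of the vectors matters only up to a sign. That is,

\[
\begin{aligned}
\mathbf{w} \cdot (\mathbf{u} \times \mathbf{v})
&= \mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \mathbf{v} \cdot (\mathbf{w} \times \mathbf{u}) \\
&= -\mathbf{u} \cdot (\mathbf{w} \times \mathbf{v}) = -\mathbf{w} \cdot (\mathbf{v} \times \mathbf{u}) = -\mathbf{v} \cdot (\mathbf{u} \times \mathbf{w}).
\end{aligned}
\]

We can compute the magnitude of this quantity

\[
|\mathbf{w} \cdot (\mathbf{u} \times \mathbf{v})| = |\cos\psi|\, \|\mathbf{w}\|\, \|\mathbf{u} \times \mathbf{v}\|
\]

where ψ is the angle between w and u × v. Note the following.

• The quantity |cos ψ| ‖w‖ gives us the altitude of the vector w above the plane determined by u and v.

• The quantity ‖u × v‖ gives us the area of the parallelogram with sides u and v.

• The volume of a parallelepiped (or, more generally, any prism) is the product of the area of the base and the distance between the top and bottom face (the altitude).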

Putting these together gives us the following.

Theorem 7.11. The magnitude of the vector triple product of three vectors is the volume of a parallelepiped with the three vectors as sides.

Figure 7.2: Parallelepiped created by the triple of vectors u, v, and w. The altitude above the base spanned by u and v is |cos ψ| ‖w‖, where ψ is the angle between w and u × v.
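To illustrate Theorem 7.11 and the sign pattern of Remark 7.10, here is a small sketch in plain Python. The helper names and the three sample vectors are our own choices: the base is a 1-by-2 rectangle in the plane z = 0 and w rises to height 3, so the volume should be 6.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def triple(w, u, v):
    """The triple product w . (u x v)."""
    return dot(w, cross(u, v))

u = (1, 0, 0)
v = (0, 2, 0)
w = (1, 1, 3)

print(triple(w, u, v))       # 6
print(triple(u, v, w))       # 6   (cyclic permutation: same sign)
print(triple(w, v, u))       # -6  (swapping two vectors flips the sign)
print(abs(triple(w, u, v)))  # 6 = volume of the parallelepiped
```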

Problems

Problem 7.1. Let u = (−2, 1, 3), v = (4, 0, −1), w = (3, 1, 1).
(a) Compute u × v.
(b) Compute w × u.
(c) Compute w · (u × v).
(d) Compute u · (u × v). (Note you should know the result before doing the computation.)

Problem 7.2. Let u = i + 3j + k, v = 2j + 3k, w = i − k.
(a) Compute u × v.
(b) Compute w × u.
(c) Compute w · (u × v).
(d) Compute u · (v × w). (Note you should know the result before doing the computation.)

Problem 7.3. Find all unit vectors orthogonal to both the vectors given below.
(a) (1, 2, −1) and (3, 3, −4).
(b) (2, 1, 5) and (1, 0, 2).
(c) j and −k.
(d) i − 2k and j + k.

Problem 7.4. Find the areas of the parallelograms with sides given by the pairs of vectors in Problem 7.3.

Problem 7.5. Find the area of the triangle with vertices (1, 0, 1), (2, 2, 2), and (−1, 2, −2).

Problem 7.6. Find the volumes of the parallelepipeds with edges determined by the vectors given below.
(a) (1, 0, 0), (1, 2, −1), and (3, 3, −4).
(b) (1, 1, 3), (2, 1, 5), and (1, 0, 2).
(c) i − k, j, and −k.
(d) e1 + e2, e1 − 2e3, and e2 + e3.

Problem 7.7. Prove the identity

a× (b× c) = (a · c)b− (a · b)c.

This is commonly known as the “bac-cab rule.” (You can write the right side as b(a · c) − c(a · b).)

Problem 7.8. Is the cross product associative? In other words, is it always true that

a× (b× c) = (a× b)× c?

If not, under what conditions on a, b, and c does the equation above hold?

Problem 7.9. Show that the absolute value of

\[
\det \begin{pmatrix} u_1 & u_2 \\ v_1 & v_2 \end{pmatrix}
\]

is the area of the parallelogram with sides given by u = (u1, u2) and v = (v1, v2). Hint: show that the square of the determinant is ‖u‖²‖v‖² sin² θ.

Problem 7.10. Find the area of the parallelogram with vertices (−1, 3), (0, 4),(1, 2), (2, 3).

Problem 7.11. Show that if u× v = 0 then u and v are parallel.


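Problem 7.12. It is possible to define a version of the cross product of n − 1 vectors in Rn. In this problem we explore the product of three vectors in R4. Let

\[
\mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{pmatrix}, \quad
\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix}, \quad
\mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{pmatrix},
\]

be any vectors in R4. We define

\[
\mathbf{u} \wedge \mathbf{v} \wedge \mathbf{w}
= \sum_{i=1}^{4} \sum_{j=1}^{4} \sum_{k=1}^{4} \sum_{l=1}^{4} \varepsilon_{ijkl}\, u_j v_k w_l\, \mathbf{e}_i
= \begin{vmatrix}
\mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 & \mathbf{e}_4 \\
u_1 & u_2 & u_3 & u_4 \\
v_1 & v_2 & v_3 & v_4 \\
w_1 & w_2 & w_3 & w_4
\end{vmatrix}.
\]

(a) Show that for any scalars α and β and any x ∈ R4 we have

(αu + βx) ∧ v ∧ w = α(u ∧ v ∧ w) + β(x ∧ v ∧ w).

(b) Show that

(u ∧ v ∧ w) · u = (u ∧ v ∧ w) · v = (u ∧ v ∧ w) · w = 0.

(c) Show that

u ∧ v ∧ w = −u ∧ w ∧ v = −v ∧ u ∧ w.

The subject of differential geometry uses the rich theory of exterior products, or wedge products. This problem is only a brief glimpse.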


Chapter 8

Lines and Planes

In this section we consider some of the many ways to determine lines and planes in R3 geometrically and describe them mathematically.

Let’s begin with a list of some of the ways a line can be described geometrically. A line can be uniquely determined by

• any point on the line and any vector parallel to the line,

• any two points on the line, or

• two intersecting, nonidentical planes.

On the other hand a plane can be uniquely determined by

• any point in the plane and any vector perpendicular (normal) to the plane,

• any point in the plane and any two vectors parallel to the plane that are not parallel to each other,

• any three distinct points in the plane that do not lie in the same line, or

• any two intersecting lines in the plane that are not coincident.

Of course, these examples are hardly exhaustive. But they suggest a few ways of describing each type of geometric object mathematically. In each case we exhibit two ways of describing the object.

1. We can describe the points on the line or plane as the output of a function. This is known as a parametric representation.

2. We can describe the points on the line or plane as the solution set of an equation or system of equations.

We will examine both types of representation for each of our geometric objects. We start with a parametric representation of a line.

Example 8.1 (Parametric representation of a line). Suppose a line contains the point X0 = (x0, y0, z0) and is parallel to the vector v = (a, b, c). This simply says that if X = (x, y, z) is any point on the line, then the vector

\[
\overrightarrow{X_0 X} = (x - x_0,\; y - y_0,\; z - z_0)
\]

is parallel to v. This can be expressed as the vector equation

\[
(x - x_0,\; y - y_0,\; z - z_0) = t(a, b, c), \quad t \in \mathbb{R}.
\]

We can solve for the points (x, y, z) on our line and express those points as a function of the parameter t ∈ R:

\[
(x, y, z) = \mathbf{l}(t) = (x_0 + ta,\; y_0 + tb,\; z_0 + tc) = X_0 + t\mathbf{v}, \quad t \in \mathbb{R}.
\]

Note that there can be many parametric representations. Any choice of a point on the line and a parallel vector will yield the same full set of points; the functions will be different though their ranges will be the same.

Example 8.2. If we wish to use the parametric form to describe the line containing the points A = (2, −5, 3) and B = (5, 1, 9) we compute a vector connecting them

\[
\mathbf{v} = \overrightarrow{AB} = (5 - 2,\; 1 - (-5),\; 9 - 3) = (3, 6, 6).
\]

Then, letting A play the role of X0 we get the parametric vector equation

\[
(x, y, z) = \mathbf{l}(t) = (2 + 3t,\; -5 + 6t,\; 3 + 6t), \quad t \in \mathbb{R}.
\]

Writing this as three scalar functions rather than a single vector function gives us

x = 2 + 3t,
y = −5 + 6t,
z = 3 + 6t,

for any t ∈ R.
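The construction of a parametric line from two points is mechanical enough to sketch in code. The following plain-Python sketch uses a hypothetical helper `line_through` (our name) that returns the direction vector B − A together with the function l(t) = A + t(B − A).

```python
def line_through(A, B):
    """Return (v, l) where v = B - A and l(t) = A + t*v parameterizes the line."""
    v = tuple(b - a for a, b in zip(A, B))

    def l(t):
        return tuple(a + t * d for a, d in zip(A, v))

    return v, l

A = (2, -5, 3)
B = (5, 1, 9)
v, l = line_through(A, B)

print(v)     # (3, 6, 6)
print(l(0))  # (2, -5, 3), the point A
print(l(1))  # (5, 1, 9), the point B
```

Evaluating at t = 0 and t = 1 is a quick way to confirm that a parameterization built this way really passes through both points.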

We now shift both the type of geometric object and the type of representation and consider a plane as the solution set of an equation.

Example 8.3 (Linear equation for a plane). Suppose a plane contains the point X0 = (x0, y0, z0) and is normal (perpendicular) to the vector v = (a, b, c). This says that if X = (x, y, z) is any point in the plane, then the vector

\[
\overrightarrow{X_0 X} = (x - x_0,\; y - y_0,\; z - z_0)
\]

is orthogonal to v. This can be expressed as the scalar equation

\[
\overrightarrow{X_0 X} \cdot \mathbf{v} = a(x - x_0) + b(y - y_0) + c(z - z_0) = 0.
\]

This can be rearranged in the form

ax + by + cz = d

where d = ax0 + by0 + cz0. Unlike the parametric form of the line where a specific function gives us the points of the line as its output, the points of the plane are solutions of the single linear equation in three unknowns ax + by + cz = d. Any equation of this general form represents a plane.

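Example 8.4. To find the equation of a plane containing the points A = (3, −2, 5), B = (9, −3, 7) and C = (4, 6, 0), we first find two vectors in the plane. For instance, we can take

\[
\mathbf{u} = \overrightarrow{AB} = (6, -1, 2) \quad \text{and} \quad \mathbf{v} = \overrightarrow{BC} = (-5, 9, -7).
\]

The cross product of these vectors is normal to the plane:

\[
\mathbf{n} = \mathbf{u} \times \mathbf{v} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
6 & -1 & 2 \\
-5 & 9 & -7
\end{vmatrix}
= -11\mathbf{i} + 32\mathbf{j} + 49\mathbf{k}.
\]

Using A as our reference point in the plane we get the equation

−11(x − 3) + 32(y + 2) + 49(z − 5) = 0.

As a check, substituting B or C into the left side also gives zero.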
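The recipe of this example (two edge vectors, a cross product, then a dot product with a reference point) can be sketched as follows. This is plain Python; `plane_through` and the other helper names are our own. The function returns the normal n and the constant d of the equation n · X = d, and the final line verifies that all three given points satisfy it.

```python
def sub(P, Q):
    return tuple(p - q for p, q in zip(P, Q))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def plane_through(A, B, C):
    """Return (n, d) so that the plane through A, B, C is n . X = d."""
    n = cross(sub(B, A), sub(C, B))
    return n, dot(n, A)

A, B, C = (3, -2, 5), (9, -3, 7), (4, 6, 0)
n, d = plane_through(A, B, C)
print(n, d)  # (-11, 32, 49) 148
print([dot(n, P) == d for P in (A, B, C)])  # [True, True, True]
```

Checking the three points against the final equation is a cheap safeguard against sign errors in the cross product.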

Example 8.5 (Parametric representation of a plane). We note that the equation for a plane ax + by + cz = d can be solved by Gaussian elimination (though there is nothing much to eliminate). If a ≠ 0 the reduced row echelon form of the augmented matrix would be

\[
\left( 1,\; \frac{b}{a},\; \frac{c}{a},\; \frac{d}{a} \right).
\]

We would assign arbitrary values y = s and z = t to the two variables without leading ones in their columns and solve for x to get the solutions

\[
x = \frac{d}{a} - s\,\frac{b}{a} - t\,\frac{c}{a}, \qquad y = s, \qquad z = t,
\]

for any s ∈ R and t ∈ R. This can be written in vector form as

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \mathbf{p}(s, t) =
\begin{pmatrix} d/a \\ 0 \\ 0 \end{pmatrix}
+ s \begin{pmatrix} -b/a \\ 1 \\ 0 \end{pmatrix}
+ t \begin{pmatrix} -c/a \\ 0 \\ 1 \end{pmatrix}.
\]

Here p is a function from R2 to R3 defined for any values of the parameters s and t.
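A short sketch can confirm that every point p(s, t) produced by this formula satisfies ax + by + cz = d. The code below is plain Python using exact rational arithmetic; the function name `plane_point` and the sample coefficients are hypothetical choices made only for illustration.

```python
from fractions import Fraction

def plane_point(a, b, c, d, s, t):
    """Point p(s, t) on the plane ax + by + cz = d, assuming a != 0."""
    a, b, c, d = (Fraction(k) for k in (a, b, c, d))
    x = d / a - s * b / a - t * c / a
    return (x, Fraction(s), Fraction(t))

# Sample plane 2x - y + 3z = 4 (coefficients chosen arbitrarily).
a, b, c, d = 2, -1, 3, 4
for s, t in [(0, 0), (1, 0), (0, 1), (-2, 5)]:
    x, y, z = plane_point(a, b, c, d, s, t)
    print((s, t), (x, y, z), a * x + b * y + c * z == d)
```

Each printed line ends in `True`, since substituting the formula for x into the equation cancels the s and t terms identically.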

Example 8.6 (System of equations representing a line). How can we best describe the line defined by the intersection of two planes? For example, consider the line defined by the two planes

x + y − z = 4, and
2x + 3y + z = 9.

Of course, we really don’t need to do anything more to describe the line. The points (x, y, z) on the line are simply the solutions of the system of two equations in three unknowns. However, if we wish to express the line in parametric form, we need to solve the system so that the solutions are described as a function. The system can be represented as the augmented matrix

\[
\begin{pmatrix} 1 & 1 & -1 & 4 \\ 2 & 3 & 1 & 9 \end{pmatrix}.
\]

This reduces to the matrix

\[
\begin{pmatrix} 1 & 0 & -4 & 3 \\ 0 & 1 & 3 & 1 \end{pmatrix},
\]

which yields the solutions

x = 4t + 3,
y = −3t + 1,
z = t,

for all t ∈ R. Of course, we can write this as the equation of a line in parametric form

(x, y, z) = l(t) = (3 + 4t, 1 − 3t, t), t ∈ R.

In Problem 8.9 we ask the reader to find the equations of two planes that intersect in a line with a given parametric representation.
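The parametric answer of Example 8.6 can be verified in two independent ways, sketched below in plain Python (helper names are our own). First, points l(t) are substituted into both plane equations. Second, the direction of the line is compared with the cross product of the two normal vectors, which must be parallel to any line lying in both planes.

```python
def cross(u, v):
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def l(t):
    """Parametric form of the intersection line from Example 8.6."""
    return (3 + 4 * t, 1 - 3 * t, t)

def on_both_planes(p):
    x, y, z = p
    return x + y - z == 4 and 2 * x + 3 * y + z == 9

print(all(on_both_planes(l(t)) for t in range(-3, 4)))  # True

# Normals of the two planes; their cross product is the line's direction.
n1 = (1, 1, -1)
n2 = (2, 3, 1)
print(cross(n1, n2))  # (4, -3, 1)
```

Here the cross product happens to equal the direction vector (4, −3, 1) exactly; in general one should only expect a nonzero scalar multiple.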

Problems

Problem 8.1. Find a parametric equation of the line containing the point (2, 7, −4) and parallel to the vector (3, 5, 0).

Problem 8.2. Find a parametric equation of the line containing the points (8, 0, −6) and (7, 9, 3).

Problem 8.3. Find the equation of the plane containing the point (3, 2, 1) and normal to the vector (−1, 4, 0).

Problem 8.4. Find the equation of the plane containing the line

(x, y, z) = l1(t) = (5 + 4t,−5− 3t, 2), t ∈ R,

and perpendicular to the line

(x, y, z) = l2(t) = (2− 3t, 3− 4t, 5 + 7t), t ∈ R.

Problem 8.5. Do the lines

l1(t) = (5 + 3t,−1 + 7t, 4− 2t), t ∈ R,

and

l2(t) = (6 − 2t, 9 + 3t, 4 + 2t), t ∈ R,

intersect? If so, where? If there is a plane containing the two lines, find an equation for it.

Problem 8.6. Show that the line

l(t) = (1 + 2t, t, 2 + 3t), t ∈ R,

lies in the plane

x − 5y + z = 3.

Problem 8.7. Find the equation of the plane containing the points X =(0, 1, 2), Y = (−3, 4, 5) and Z = (2, 0, 1).

Problem 8.8. Find the parametric equation of the line defined by the intersection of the two planes

2x − 4y + 5z = 7,

and

3x + 2z = 9.

Problem 8.9. Find the equations of two planes that intersect in the line containing the point X0 = (1, −1, 3) and parallel to the vector v = (5, 0, −2). Hint: The normals to the planes must be perpendicular to v.

Problem 8.10. Show that the distance between a point X0 = (x0, y0, z0) and the plane Ax + By + Cz = D is

\[
\frac{|A x_0 + B y_0 + C z_0 - D|}{\sqrt{A^2 + B^2 + C^2}}.
\]

You may assume that if X = (x, y, z) is the closest point in the plane to X0 then the vector $\overrightarrow{X_0 X}$ is parallel to the normal vector.

Problem 8.11. Find an equation for the distance between a plane and a line that does not intersect the plane.

Chapter 9

Functions, Limits, and Continuity

In this section we give some basic definitions, theorems, and examples of limits and continuity of functions. As we shall see, when the domain of a function is multidimensional the possible problems with continuity of functions are much more complicated than those of the functions encountered in elementary calculus.

We begin with what for most readers will be a review of the terminology of functions.

Definition 9.1. We call Ω ⊂ Rn the domain of a function f : Ω → Rm if f represents a well-defined rule or strategy that for each x ∈ Ω prescribes a unique value f(x) ∈ Rm. The range of the function is defined to be

R(f) = {y ∈ Rm | there exists x ∈ Ω such that f(x) = y}.

The graph of f is the set G(f) of points in Rn+m of the form

(x1, x2, . . . , xn, f1(x), f2(x), . . . , fm(x)) ∈ Rn+m

where x = (x1, x2, . . . , xn) ∈ Ω.

The following concepts will be important in our discussions of coordinate transformations.

Definition 9.2. Let Ω ⊂ Rn and Υ ⊂ Rm. We say that f : Ω→ Υ is:

• One-to-one or injective if whenever f(x) = f(y) we have x = y. That is, every point in the target set Υ is the image of at most one point in the domain Ω.

• Onto or surjective if for every z ∈ Υ there exists x ∈ Ω such that f(x) = z. That is, every point in the target set Υ is the image of at least one point in the domain Ω.

• Invertible if there exists a function f−1 : Υ → Ω such that f−1(f(x)) = x for every x ∈ Ω. Note that this is equivalent to saying that every point in the target set Υ is the image of exactly one point in the domain Ω. That is, a function is invertible if and only if it is one-to-one and onto.

We introduce the language of open and closed sets in Rn. We begin with the most basic open set, an open ball.

Definition 9.3. The open ball of radius ε about x ∈ Rn is the set

Bε(x) = {y ∈ Rn | ‖y − x‖ < ε}.

We now consider more general sets.

Definition 9.4. Let Ω ⊆ Rn.

• We say that Ω is bounded if there exists M > 0 such that ‖x‖ ≤ M for all x ∈ Ω.

• We say that x is an interior point of Ω if there is some ε > 0 such that Bε(x) ⊂ Ω. We say that Ω is an open set if every x ∈ Ω is an interior point.

• We say that x is a boundary point of Ω if for every ε > 0 there is at least one point in Ω and at least one point in the exterior of Ω in Bε(x). We say that Ω is a closed set if it contains all of its boundary points.

• We call the union of the interior points and boundary points of Ω the closure of Ω, denoted $\overline{\Omega}$.

Example 9.5. Consider the annulus

A = {x ∈ R2 | 1 < ‖x‖ < 2}.

The boundary of A has two pieces: the circle of radius one and the circle of radius two. Every point x ∈ A is an interior point. If we take ε = min{2 − ‖x‖, ‖x‖ − 1}/2 (that is, half the distance between x and the boundary) then the ball of radius ε about x lies inside the annulus. The closure of A is given by

\[
\overline{A} = \{ \mathbf{x} \in \mathbb{R}^2 \mid 1 \le \|\mathbf{x}\| \le 2 \}.
\]
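The choice of ε in Example 9.5 can be probed numerically. The sketch below (plain Python with the standard `math` module; all function names and sample points are our own) computes ε = min{2 − ‖x‖, ‖x‖ − 1}/2 for a few points of the annulus and checks that sample points on the circle of radius ε about x remain inside A. Sampling finitely many points is only an illustration, not a proof.

```python
import math

def norm(p):
    return math.hypot(p[0], p[1])

def in_annulus(p):
    """Membership test for A = {x : 1 < ||x|| < 2}."""
    return 1 < norm(p) < 2

def safe_radius(p):
    """Half the distance from p to the boundary of the annulus."""
    r = norm(p)
    return min(2 - r, r - 1) / 2

def ball_stays_inside(p, samples=720):
    """Check sample points on the circle of radius safe_radius(p) about p."""
    eps = safe_radius(p)
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        q = (p[0] + eps * math.cos(angle), p[1] + eps * math.sin(angle))
        if not in_annulus(q):
            return False
    return True

for p in [(1.5, 0.0), (0.0, -1.01), (1.3, 1.4)]:
    print(p, safe_radius(p), ball_stays_inside(p))
```

The farthest points of the ball from x lie on its bounding circle, which is why checking the circle is a reasonable spot check for the whole ball.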

Example 9.6. The punctured unit disk

D = {x ∈ R2 | 0 < ‖x‖ < 1}

is essentially the unit disk with the origin removed. As above, the boundary of D has two pieces. Obviously, the unit circle is part of the boundary. But the more interesting boundary point is the origin itself. To see that it is a boundary point, note that every ball about the origin contains points in the punctured disk and at least one point not in the punctured disk - the origin itself. As before, every point x ∈ D is an interior point. The closure of the punctured disk D is given by the closed disk

\[
\overline{D} = \{ \mathbf{x} \in \mathbb{R}^2 \mid \|\mathbf{x}\| \le 1 \}.
\]

We now define the limit of a function. Note that the form of the definition is almost identical to the definition of the limit of a real valued function of a (single) real variable.

Definition 9.7. Let Ω ⊆ Rn be the domain of the function f : Ω → Rm. Let x0 be in the closure of Ω. Then we say the limit of f as x approaches x0 is l if for every ε > 0 there exists δ > 0 such that for any x ∈ Ω with

0 < ‖x − x0‖ < δ

we have

‖f(x) − l‖ < ε.

In this case we write

\[
\lim_{\mathbf{x} \to \mathbf{x}_0} \mathbf{f}(\mathbf{x}) = \mathbf{l},
\]

or f(x) → l as x → x0.

Example 9.8. Consider the function

\[
f(x, y) = \frac{x^4 - y^4}{x^2 + y^2}.
\]

Since this quotient is not defined when the denominator is zero, we take the domain of the definition to be, say, the punctured disk described in Example 9.6. While the function is not defined at the origin, we can ask if the limit exists as (x, y) → (0, 0). In this case we note that

\[
f(x_i, y_i) = \frac{x_i^4 - y_i^4}{x_i^2 + y_i^2}
= \frac{(x_i^2 - y_i^2)(x_i^2 + y_i^2)}{x_i^2 + y_i^2}
= x_i^2 - y_i^2 \to 0 - 0 = 0 \quad \text{as } (x_i, y_i) \to (0, 0).
\]

So the limit exists and

\[
\lim_{(x,y) \to (0,0)} \frac{x^4 - y^4}{x^2 + y^2} = 0.
\]

The following theorem (which we state without proof) shows that the limits of various algebraic combinations of functions behave in the obvious way.

Theorem 9.9. Let Ω ⊂ Rm be the domain of functions f : Ω → Rn and g : Ω → Rn. Suppose that for some x0 ∈ Ω we have

\[
\lim_{\mathbf{x} \to \mathbf{x}_0} \mathbf{f}(\mathbf{x}) = \mathbf{l}_f, \qquad
\lim_{\mathbf{x} \to \mathbf{x}_0} \mathbf{g}(\mathbf{x}) = \mathbf{l}_g.
\]

Then,

1. limx→x0 cf(x) = clf for every scalar c ∈ R,

2. limx→x0 v · f(x) = v · lf for every vector v ∈ Rn,

3. limx→x0 Af(x) = Alf for every k × n matrix A,

4. limx→x0(f(x) + g(x)) = lf + lg,

5. limx→x0(f(x) · g(x)) = lf · lg,

6. limx→x0(f(x)× g(x)) = lf × lg if n = 3.

The next theorem allows us to compute the limit of a product of functions when the limit of one function is zero even if we know only that the second function is bounded.

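Theorem 9.10. Let Ω ⊂ Rm be the domain of functions f : Ω → Rn and g : Ω → R. Suppose that for some x0 ∈ Ω we have

\[
\lim_{\mathbf{x} \to \mathbf{x}_0} \mathbf{f}(\mathbf{x}) = \mathbf{0},
\]

and that g is bounded in a neighborhood of x0. That is, there exists M > 0 and γ > 0 such that

|g(x)| ≤ M

for all x ∈ Ω ∩ Bγ(x0). Then

\[
\lim_{\mathbf{x} \to \mathbf{x}_0} g(\mathbf{x})\, \mathbf{f}(\mathbf{x}) = \mathbf{0}.
\]

Proof. Let ε > 0 be given. Since lim x→x0 f(x) = 0 there exists δ1 > 0 such that for every x ∈ Ω with 0 < ‖x − x0‖ < δ1 we have

\[
\|\mathbf{f}(\mathbf{x}) - \mathbf{0}\| = \|\mathbf{f}(\mathbf{x})\| < \frac{\varepsilon}{M}.
\]

We now choose δ to be the smaller of γ and δ1. Then for any x ∈ Ω with 0 < ‖x − x0‖ < δ we have

\[
\|g(\mathbf{x})\, \mathbf{f}(\mathbf{x}) - \mathbf{0}\| = |g(\mathbf{x})|\, \|\mathbf{f}(\mathbf{x})\| < M \frac{\varepsilon}{M} = \varepsilon.
\]

This completes the proof.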

Remark 9.11. Theorem 9.10 was stated for a bounded scalar function g and scalar multiplication with the vector function f. Analogous versions hold for dot products, cross products, and matrix products if the dimension of the range of the functions is changed appropriately.

Remark 9.12. In the calculus of functions of one dimension, limits are not all that complicated. There are basically three generic things that can go wrong so that the limit of a function does not exist.

1. A function could have a “jump discontinuity.” That is, it can have well defined, finite limits from both the right and left, but those limits might be different. An important example of this is the Heaviside function

\[
h(x) = \begin{cases} 0, & \text{if } x < 0, \\ 1, & \text{if } x \ge 0, \end{cases}
\]

which has a jump discontinuity at x = 0. The limits from the left and right exist, but since they are different, the overall limit of h as x → 0 does not.

2. A function might not have a limit because it becomes unbounded. In this case we say it diverges to plus or minus infinity at a point. This is the case with

\[
f(x) = \frac{1}{x^2}
\]

which diverges to infinity as x approaches 0 from either the right or left.

3. A function would not have a limit if it oscillates so wildly that the limit does not exist. This is the case with the function

\[
g(x) = \sin\left( \frac{1}{x} \right)
\]

which has no limit as x → 0.

An important key here is that in one dimension there are really only two paths to a point: from the right or the left. When we move to two dimensions (or higher) there are an infinite number of paths we can take to a point.

Example 9.13. The function

\[
f(x, y) = \frac{x^2}{x^2 + y^2}
\]

has no limit as (x, y) approaches (0, 0). To see this, suppose we choose a sequence of points along the line through the origin y = αx where α is any constant. Along such a line we would have

\[
f(x, \alpha x) = \frac{x^2}{x^2 + \alpha^2 x^2} = \frac{1}{1 + \alpha^2}.
\]

Thus, the function is constant along any line through the origin, but the constant changes depending on the direction we choose. To visualize the graph of the function, think of a spiral staircase, where the height as one goes toward the center post depends on which step you are on.
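The path dependence in Example 9.13 is easy to observe numerically. The following plain-Python sketch (the slopes and step sizes are arbitrary choices of ours) evaluates f along several lines y = αx at points approaching the origin and prints the values next to 1/(1 + α²).

```python
def f(x, y):
    """The function of Example 9.13, defined away from the origin."""
    return x**2 / (x**2 + y**2)

for alpha in (0.0, 1.0, 2.0):
    # Values of f along the line y = alpha * x as x shrinks toward 0.
    values = [f(x, alpha * x) for x in (1e-1, 1e-3, 1e-6)]
    print(alpha, values, 1 / (1 + alpha**2))
```

Up to floating-point rounding, each row is constant in x and matches 1/(1 + α²), while different rows give different constants; no single number can therefore serve as the limit at the origin.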

As it did in one-dimensional calculus, the notion of limit of a function leads directly to the notion of continuity.

Definition 9.14. Let Ω ⊆ Rn. We say that the function f : Ω → Rm is continuous at x0 ∈ Ω if

\[
\lim_{\mathbf{x} \to \mathbf{x}_0} \mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{x}_0).
\]

If f is continuous at every x ∈ Ω we say that f is continuous on Ω and write f ∈ C(Ω).

Figure 9.1: Graph of the function f(x, y) = x²/(x² + y²), which has a “spiral staircase” discontinuity at the origin. Lines going into the origin have different limiting heights.

Example 9.15. Our examples above of limits of functions with multidimensional domains can be interpreted in terms of continuity. For instance, the function

\[
f(x, y) = \begin{cases} \dfrac{x^4 - y^4}{x^2 + y^2}, & (x, y) \ne (0, 0), \\[1ex] 0, & (x, y) = (0, 0) \end{cases}
\]

is continuous. Since the limit of the rational function (x⁴ − y⁴)/(x² + y²) existed as (x, y) → (0, 0), we could simply extend the domain to the origin and define the value of the extended function to be the limit. However, the function

\[
g(x, y) = \begin{cases} \dfrac{x^2}{x^2 + y^2}, & (x, y) \ne (0, 0), \\[1ex] 0, & (x, y) = (0, 0) \end{cases}
\]

is not continuous. In fact, there is no way to extend the function x²/(x² + y²) continuously to the origin since the limit of the function does not exist at that point.

The following theorem (which we state without proof) says that various algebraic combinations of continuous functions are continuous.

Theorem 9.16. Let Ω ⊂ Rn, and let c ∈ R.

1. If f : Ω → Rm is continuous at x0 ∈ Ω then so is the constant multiple cf.

2. If f : Ω → Rm and g : Ω → Rm are continuous at x0 ∈ Ω then so is the sum f + g.

3. If f : Ω → R and g : Ω → R are continuous at x0 ∈ Ω then so is the product fg.

4. If f : Ω → R and g : Ω → R are continuous at x0 ∈ Ω and g(x0) ≠ 0 then the quotient f/g is continuous at x0 ∈ Ω.

5. If f : Ω → Rm and g : Ω → Rm are continuous at x0 ∈ Ω then so is the dot product f · g.

6. If f : Ω → R3 and g : Ω → R3 are continuous at x0 ∈ Ω then so is the cross product f × g.

7. If f : Ω → R and g : Ω → Rm are continuous at x0 ∈ Ω then so is the scalar product fg.

Finally, we define a pretty obvious concept that will be important later on.


Definition 9.17. Let Ω ⊂ Rn be the domain of the function f : Ω → Rm.We say f is bounded on Ω if there exists M ∈ R such that

‖f(x)‖ < M

for all x ∈ Ω.

Problems

Problem 9.1. The following limits exist. Find their values.
(a)

\[
\lim_{(x,y) \to (0,0)} \frac{x^2 - y^2 + 3x + 3y}{x + y}.
\]

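(b)

\[
\lim_{(x,y) \to (0,0)} \frac{x^4 - y^4}{x^2 + y^2}.
\]

(c)

\[
\lim_{(x,y,z) \to (0,0,0)} \frac{xyz}{x^2 + y^2 + z^2}.
\]

(d)

\[
\lim_{(x,y) \to (0,0)} \frac{e^{xy} - 1}{xy}.
\]

(e)

\[
\lim_{(x,y) \to (0,0)} \frac{\sin xy}{x}.
\]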

Problem 9.2. Show that the following limits do not exist by examining the limits along at least two paths.
(a)

\[
\lim_{(x,y) \to (0,0)} \frac{x^2 - y^2}{x^2 + y^2}.
\]

(b)

\[
\lim_{(x,y) \to (0,0)} \frac{xy}{x^2 + y^2}.
\]

(c)

\[
\lim_{(x,y) \to (0,0)} \frac{x - y}{x^2 + y^2}.
\]

(d)

\[
\lim_{(x,y) \to (0,0)} \frac{x + y^2}{\sqrt{x^2 + y^2}}.
\]

(e)

\[
\lim_{(x,y) \to (0,0)} \frac{x^2 y}{x^4 + y^2}.
\]

Problem 9.3. Consider the limit

\[
\lim_{(x,y) \to (0,0)} \frac{x^4 y^4}{(x^4 + y^2)^3}.
\]

What happens as (x, y) → (0, 0) along the line y = αx? What happens along the curve y = x²? What can we say about the general limit?

Problem 9.4. For a ∈ R define

\[
f(x, y) = \begin{cases} \dfrac{\sin(x^2 + y^2)}{x^2 + y^2}, & (x, y) \ne (0, 0), \\[1ex] a, & (x, y) = (0, 0). \end{cases}
\]

Is there any choice of a for which the function is continuous on the whole plane?

Problem 9.5. For a ∈ R define

\[
f(x, y) = \begin{cases} e^{-\frac{1}{(x - y)^2}}, & x \ne y, \\ a, & x = y. \end{cases}
\]

Is there any choice of a for which the function is continuous on the whole plane?

Chapter 10

Functions from R to Rn

In the next three chapters, we are going to examine functions involving multiple variables.

• In Chapter 10 (the current chapter) we examine functions whose domain is R and whose range is Rn.

• In Chapter 11 we examine functions whose domain is Rn and whose range is R.

• In Chapter 12 we examine functions whose domain is Rn and whose range is Rm.

In these “precalculus” chapters we will be mostly interested in terminology and visualization of these functions.

Readers have probably encountered functions from R to Rn before, at least when studying ordinary differential equations. It is typical to think of the independent variable (in the domain) as time, while the dependent variable can be the position in space or other quantities (such as temperature, pressure and volume of a quantity of gas in a cylinder). As we shall see in the remainder of the book, because the domain is simple and one-dimensional, the calculus of such functions is simple as well.¹

If we think of the domain of this type of function as time and the range as space, we think of the “motion” described by the function sweeping out a curve. The following definitions may seem to draw a lot of overly fine distinctions, but these distinctions can be useful in many situations.

¹ In fact, the more advanced subject of “semigroups” considers the calculus of functions whose domain is R but whose range is infinite-dimensional. While this presents some significant technical challenges, the bottom line is that the results of calculus and ordinary differential equations can readily be extended to these functions.

Definition 10.1. A trajectory is a continuous function r from an interval I ⊂ R to Rn. The range of a trajectory (a subset of Rn) is called a curve. If I = [t0, t1] is a closed bounded interval we call r(t0) the initial point and r(t1) the terminal point of the trajectory.

Thus, a curve is just a set of points in space while a trajectory gives us information as to how those points were traversed. In particular, if we think of I as representing an interval of time, the trajectory tells us how fast the curve was traversed and in what direction. It also tells us if any points of the curve were traversed more than once.

Example 10.2. The trajectories

r1(t) = (cos t, sin t), t ∈ [0, 2π],

and

r2(t) = (sin 2t, cos 2t), t ∈ [3π, 4π],

both trace out the unit circle, though they trace the points in opposite directions and have different initial and terminal points.

Figure 10.1: The two trajectories r1(t) = (cos t, sin t), t ∈ [0, 2π] and r2(t) = (sin 2t, cos 2t), t ∈ [3π, 4π] trace the same curve, but in different directions and with different initial and terminal points.

Example 10.3. The trajectory

h(t) = (cos t, sin t, t/6), t ∈ [0, 6π],

generates a helix about the z-axis with initial point (1, 0, 0) and terminal point (1, 0, π).

Example 10.4. The trajectory

l(t) = (3t − 5, 4t + 7, −2t + 6), t ∈ [0, 2]

traverses a line segment with initial point (−5, 7, 6) and terminal point (1, 15, 2).

Figure 10.2: The helix generated by h(t) = (cos t, sin t, t/6), t ∈ [0, 6π], plotted against the x, y, and z axes.


Example 10.5. The trajectory

\[
\mathbf{h}(t) = \begin{cases} (3t - 5,\; 4t + 7,\; -2t + 6), & t \in [0, 1], \\ (4t - 6,\; 11,\; -7t + 11), & t \in (1, 2] \end{cases}
\]

traverses a continuous, “piecewise linear” curve from the initial point (−5, 7, 6) to the point (−2, 11, 4) and then proceeds to the terminal point (2, 11, −3). (See Figure 10.3.)

Definition 10.6. We say that a trajectory is cyclic if its initial and terminal points are the same. A trajectory is simple if it occupies no point twice, except that, perhaps, it could be cyclic.

Example 10.7. The trajectories r1 and r2 given above are both simple and cyclic. The trajectory

r3(t) = (cos t, sin t), t ∈ [0, 4π]

would be cyclic, but not simple since it traces the curve twice.

The concepts of curve and trajectory are very similar, but a curve has only geometric information. A third concept, “path,” contains information about order but no information about speed.

Figure 10.3: Piecewise linear trajectory, plotted against the x, y, and z axes.

Definition 10.8. We say that two trajectories r : I1 → Rn and g : I2 → Rn are path equivalent if there is a continuously differentiable, monotone increasing, onto function φ : I1 → I2 such that

r(t) = g(φ(t)), t ∈ I1.

An “equivalence class” of trajectories (the set of all trajectories equivalent to a given trajectory) is called a path. Any trajectory in the class is called a representative of the path. The path of a simple trajectory is called a simple path. The path of a cyclic trajectory is called a cycle. The path of a simple, cyclic trajectory is called a simple cycle. A curve that is the range of a simple path is called a simple curve.

We will not give a rigorous treatment of equivalence classes in this book. (Such a treatment can be found in [2].) Instead, we simply note that two path equivalent trajectories traverse the same points of a curve in the same order. In particular, they have the same initial and terminal points, and path equivalent simple trajectories are traversed in the same direction. If you analyze the formal definition carefully, you will see that a path is simply a curve with the addition of information about the order in which the points on the curve were traversed. For a simple path this reduces to the direction one takes to get from the initial point to the terminal point. For that reason, a simple path is often referred to as an oriented simple curve.

Example 10.9. The trajectories

r1(t) = (cos t, sin t), t ∈ [0, 2π],

and

r4(t) = (cos 4t, sin 4t), t ∈ [0, π/2],

are equivalent (using the mapping φ(t) = 4t). The path of these trajectories (a simple cycle) is simply the unit circle, traced counterclockwise, beginning and ending with the point (1, 0).
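The equivalence claimed in Example 10.9 amounts to the identity r4(t) = r1(φ(t)) on [0, π/2], with φ mapping [0, π/2] onto [0, 2π]. The sketch below (plain Python with the standard `math` module; the sample grid is an arbitrary choice of ours) spot-checks both facts.

```python
import math

def r1(t):
    return (math.cos(t), math.sin(t))

def r4(t):
    return (math.cos(4 * t), math.sin(4 * t))

def phi(t):
    """Reparameterization from I1 = [0, pi/2] onto I2 = [0, 2*pi]."""
    return 4 * t

# Sample parameter values in I1 = [0, pi/2].
ts = [k * (math.pi / 2) / 8 for k in range(9)]

same_points = all(
    math.isclose(a, b, abs_tol=1e-12)
    for t in ts
    for a, b in zip(r4(t), r1(phi(t)))
)
print(same_points)                 # True
print(phi(ts[0]), phi(ts[-1]))     # endpoints of I2: 0.0 and 2*pi
```

Since φ is increasing, both trajectories visit the points of the circle in the same order, which is the content of path equivalence.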

It is often convenient to go “backwards” along a path.

Definition 10.10. Let r : [t0, t1] → Rn be a trajectory representing the simple path P. The reverse of P is the path equivalent to the trajectory r− : [0, 1] → Rn given by

r−(s) = r((1 − s)t1 + s t0)

for s ∈ [0, 1]. We write −P for the reverse of P.
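The reversal formula can be sketched as a higher-order function. In the plain-Python sketch below, `reverse` is our own name, and the helix of Example 10.3 serves as the test trajectory; the reversed trajectory should start at the helix's terminal point and end at its initial point.

```python
import math

def reverse(r, t0, t1):
    """Return the trajectory s -> r((1 - s)*t1 + s*t0) on [0, 1]."""
    def r_minus(s):
        return r((1 - s) * t1 + s * t0)
    return r_minus

def h(t):
    """Helix from Example 10.3, defined for t in [0, 6*pi]."""
    return (math.cos(t), math.sin(t), t / 6)

h_rev = reverse(h, 0.0, 6 * math.pi)

print(h_rev(0.0))  # approximately (1, 0, pi): the terminal point of h
print(h_rev(1.0))  # (1.0, 0.0, 0.0): the initial point of h
```

The first printed point is only approximately (1, 0, π) because sin(6π) is not exactly zero in floating point.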

Problems

Problem 10.1. Define a function describing a trajectory that traverses a line segment with initial point (1, 3, 4) and terminal point (2, −1, 6).

Problem 10.2. Define a function describing a trajectory that traverses a circle of radius 3 about the origin in the (x, y)-plane. The path should be simple, cyclic, counterclockwise, and the initial point should be (−3, 0).

Problem 10.3. Define a function describing a trajectory that traverses a circle of radius 1 about the point (−2, 4) in the (x, y)-plane. The path should be cyclic, clockwise, and traverse the circle exactly three times. The initial point should be the point (−2, 3).

Problem 10.4. Define a function describing a trajectory that traverses the ellipse

\[
\frac{x^2}{9} + \frac{y^2}{4} = 1.
\]

The path should be simple, cyclic, counterclockwise, and the initial point should be (0, 2).

Problem 10.5. Define a function describing a trajectory that traverses the intersection of the sphere of radius four in R3 and the plane z = 1. The path should be simple and cyclic. You may choose the initial point and the orientation of the trajectory, but should describe these clearly.

Problem 10.6. Define a function describing a continuous, “piecewise smooth” simple cyclic trajectory that first traverses the upper half-circle of radius two about the origin in the (x, y)-plane from (2, 0) to (−2, 0) and then proceeds along the x-axis back to the initial point.

Problem 10.7. Define a function describing a continuous, “piecewise smooth” simple trajectory that first traverses the smallest portion of the circle of radius 1 about the origin in the (y, z)-plane from (0, 0, 1) to (0, 1, 0) and then proceeds on the line segment to (2, 5, −7).

Problem 10.8. Plot the curve

\[ r(t) = \begin{pmatrix} \sin t \\ |\sin t| \end{pmatrix}, \qquad t \in [0, 2\pi]. \]

Is the curve simple? Is the curve closed?

Problem 10.9. Consider a wheel of radius r. Suppose we mark the point where the wheel touches the ground and then roll the wheel to our right with speed v. The center of the wheel travels along the trajectory

\[ f(t) = \begin{pmatrix} vt \\ r \end{pmatrix}. \]

The point that was marked on the wheel moves clockwise relative to the center of the wheel in a trajectory

\[ g(t) = \begin{pmatrix} -r\sin\omega t \\ -r\cos\omega t \end{pmatrix}, \]

where ω is the angular velocity of the wheel. If the wheel maintains contact with the road (no burning rubber) the angular velocity is determined by the velocity of the center v and the radius r. What is the angular velocity? (Hint: How far does the center have to travel for the entire circumference of the wheel to make contact with the road? How long does that take? How many radians did the wheel turn in that time?) Show that regardless of the velocity v the mark travels along the path (relative to the observer)

\[ c(t) = r \begin{pmatrix} t - \sin t \\ 1 - \cos t \end{pmatrix}. \]

This path is called a cycloid. Plot the cycloid.

Chapter 11

Functions from Rn to R

In this section we consider functions where the domain is multidimensional and the range is one-dimensional. These are often called scalar fields and there are a host of examples of such functions:

• The temperature at each point in a room.

• The mass density at each point in a solid body.

• The elevation of a geographic point represented by a point on a two-dimensional map.

We will study the calculus of such functions pretty thoroughly in the chapters below. At this point we simply want to pause and discuss how to visualize these functions. Visualization of data is a complex subject that goes beyond the simple graphing that we will do here. We simply discuss a few basic techniques.

We begin with functions whose domain is R^2. Generically, the graph of such functions is a two-dimensional surface in R^3. In our time, the standard way of representing such an object on a two-dimensional piece of paper is using a “perspective drawing.” Of course, we can use computer programs to draw such surfaces. (For instance, Mathematica’s Plot3D command will do this.) However, there is much to be gained from the process of drawing figures by hand - even if the final result is not as good as the output of a computer.¹

Of course, graphing a surface is often difficult, and one of the best ways to approach the task is to draw a sequence of curves that intersect the graph.

¹ Even before computer generated graphics became common and inexpensive, few things could drain the life out of a room full of mathematicians and allied scientists faster than the prospect of testing their artistic abilities by graphing. Indeed, no one who has seen me teach a course that involved extensive graphing of three-dimensional objects would get the idea that drawing beautiful graphs is an easily acquired skill. However, even if my graphs don’t do the best job at communicating information to others, the process of creating them conveys a great deal of information to me. So I encourage students to continue to struggle with hand drawn graphs - even if the results are somewhat disappointing.

Definition 11.1. A section (or cross section) of a three-dimensional object is the intersection of that object with a plane. A section of the graph of a function f : R^2 → R usually refers to the intersection of the graph with a vertical plane.

Useful sections to graph typically include intersections with the coordinate planes (x = 0 and y = 0) and planes where either x or y is constant. Planes like x = y and x = −y can also be useful.

An alternative to drawing the graph of a function (which is an object in R^{n+1}) is to draw a “contour plot” of the function. This is a collection of graphs of subsets of the domain where the function is constant.

Definition 11.2. Let Ω ⊂ R^n be the domain of a function f : Ω → R. For any c ∈ R, the level set of f at level c is the set

\[ L(f)_c = \{\, x \in \Omega \mid f(x) = c \,\}. \]

Remark 11.3. If n = 2 we generally have “level curves.” If n = 3 we generally have “level surfaces.” (The exceptions are degenerate cases like constant functions.)

Remark 11.4. It is easy to see the advantage of drawing level sets rather than graphs. For instance, in the case n = 2 we have to draw curves in the plane rather than surfaces in three-dimensional space. For n = 3, the graph of a function is an object (hypersurface) in R^4, which is probably impossible to visualize. The level surfaces are surfaces in R^3, which are at least feasible to visualize.

Remark 11.5. The Mathematica commands ContourPlot and ContourPlot3D make plots for n = 2 and n = 3 respectively.

Example 11.6. We consider the function

\[ f(x, y) = 4x^2 + y^2. \]

In Figure 11.1 we have plotted the surface generated by the function and six sections. Note that all of the sections are parabolas in their respective planes. In Figure 11.2 we have graphed the four level curves of the function determined by

\[ 4x^2 + y^2 = 1, \quad 4x^2 + y^2 = 4, \quad 4x^2 + y^2 = 9, \quad 4x^2 + y^2 = 16. \]

These curves are ellipses.
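To see concretely why these level curves are ellipses, the sketch below parameterizes the level set at level c > 0 by (√c/2 · cos t, √c · sin t) and checks that f equals c along it. The parameterization and the helper names are assumptions introduced for this illustration; the text itself only states the level equations.

```python
import math

def f(x, y):
    return 4.0 * x**2 + y**2

def level_curve_point(c, t):
    """Point on the level curve f = c (c > 0): an ellipse with
    semi-axis sqrt(c)/2 along x and sqrt(c) along y."""
    a = math.sqrt(c) / 2.0
    b = math.sqrt(c)
    return (a * math.cos(t), b * math.sin(t))

levels = [1.0, 4.0, 9.0, 16.0]

# Largest deviation of f from its level over 100 sample points per curve.
max_error = max(
    abs(f(*level_curve_point(c, 2.0 * math.pi * k / 100)) - c)
    for c in levels
    for k in range(100)
)
```

The value `max_error` should be at the level of floating-point rounding, which confirms that each sampled point lies on the stated level curve.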

Figure 11.1: Graph of the function f(x, y) = 4x^2 + y^2 with the sections in the planes x = 0, ±1 and y = 0, ±1.

Figure 11.2: Contour plots of the function f(x, y) = 4x^2 + y^2 at levels c = 1, 4, 9, 16.

Example 11.7. We consider the function

\[ g(x, y) = xy. \]

In Figure 11.3 we have plotted the surface generated by the function (which is usually referred to as a “saddle”) and six sections. Note that all of the sections are lines through the origin in their respective planes. In Figure 11.4 we have graphed the five level curves of the function determined by

\[ xy = 0, \quad xy = 1, \quad xy = -1, \quad xy = 2, \quad xy = -2. \]

These curves are a pair of lines at level c = 0 and hyperbolas at the other levels.

Figure 11.3: Graph of the function g(x, y) = xy with the sections in the planes x = 0, ±1 and y = 0, ±1.

Figure 11.4: Contour plots of the function g(x, y) = xy at levels c = 0, ±1, ±2.

Problems

Problem 11.1. Graph the function

\[ f(x, y) = 4x^2 + 9y^2. \]

Include the sections in the planes x = 0, ±1 and y = 0, ±1.

Problem 11.2. Graph the function

\[ f(x, y) = \frac{1}{9}x^2 - 4y^2. \]

Include the sections in the planes x = 0, ±3 and y = 0, ±1.

Problem 11.3. Graph the function

\[ f(x, y) = \sqrt{\frac{x^2}{25} + \frac{y^2}{4}}. \]

Be sure to describe the domain of the function and include any sections that you feel are helpful.

Problem 11.4. Graph four revealing level curves of the function

\[ f(x, y) = 4x^2 - 8x + 9y^2 + 36y + 40. \]

Be sure to label the level of each curve.

Problem 11.5. Graph four revealing level curves of the function

\[ f(x, y) = x^2 + 2x - 25y^2 + 100y - 99. \]

Be sure to label the level of each curve.

Chapter 12

Functions from Rn to Rm

In this chapter we consider functions where both the domain and range are multidimensional. Of course, for a function f : R^n → R^m we can simply think of it as a collection of m scalar fields

\[
\begin{pmatrix}
f_1(x_1, x_2, \dots, x_n) \\
f_2(x_1, x_2, \dots, x_n) \\
\vdots \\
f_m(x_1, x_2, \dots, x_n)
\end{pmatrix}.
\]

Let’s look at some special cases.

12.1 Vector Fields

A vector field is a function where both the domain and range have the same dimension (higher than one).

Example 12.1. There are many important examples from physics of this type of function.

• At every point in space (R^3) there exists a measurable electric field and magnetic field, each of them three-dimensional vectors with a magnitude and direction.

• At every point in space there is a gravitational force per unit mass. Again, this is a three-dimensional vector quantity with a magnitude and direction.

• At every point in a moving fluid there is a velocity vector describing the speed and direction of motion. It is common to consider both two- and three-dimensional flows. The dimension of the velocity vector would correspond to the dimension of the flow.

Remark 12.2. Visualization of vector fields is difficult. Even in R^2, the graph of a vector field would be a subset of R^4 and essentially impossible to visualize directly. Instead of this, we content ourselves with the method of graphing the domain and placing a directed line segment representing the output of the function at a selection of points in the domain.

There are several computer programs that will help with this. For instance, Mathematica uses the commands PlotVectorField and PlotVectorField3D, which are in the PlotField3D package.

Example 12.3. The two-dimensional vector field

f(x, y) = (−y, x)

represents a counterclockwise flow about the origin. Figure 12.1 displays a Mathematica plot of the type described above. When plotting by hand it is often useful to plot arrows along the axes and a few other lines to begin. For instance, on the x-axis (y = 0), the output of the function has a zero x-component. Thus, it points in a vertical direction: up when x is positive, down when x is negative. Similarly, the output field points to the left on the positive y-axis and to the right on the negative y-axis. A similar analysis works for lines like y = x and y = −x.
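The hand-plotting procedure amounts to evaluating the field at a handful of sample points. A minimal Python sketch of that idea follows; the sample points and helper names are chosen only for illustration. It tabulates the arrows and checks that each one is perpendicular to the position vector and turns counterclockwise about the origin.

```python
def f(x, y):
    # The vector field of Example 12.3.
    return (-y, x)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def cross_z(u, v):
    # z-component of the cross product; positive when v is
    # counterclockwise from u.
    return u[0] * v[1] - u[1] * v[0]

points = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0),
          (0.0, -1.0), (1.0, 1.0), (-2.0, 0.5)]

# Each entry pairs a base point with the arrow drawn at that point.
samples = [(p, f(*p)) for p in points]

perpendicular = all(dot(p, v) == 0.0 for p, v in samples)
counterclockwise = all(cross_z(p, v) > 0.0 for p, v in samples)
```

For example, the arrow at (1, 0) is (0, 1), pointing up, and the arrow at (0, 1) is (−1, 0), pointing left, in agreement with the description above.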

Figure 12.1: Plot of the vector field f(x, y) = (−y, x).

The field

\[ g(x, y) = \left( -\frac{x}{\sqrt{x^2 + y^2}},\; -\frac{y}{\sqrt{x^2 + y^2}} \right) \]

represents a flow toward the origin. (See Figure 12.2.) When hand drawing the field it would be helpful to note that the output of the field at the point (x, y) is parallel to the vector (x, y). Since the scalar multiple is negative, the output points toward the origin.

Figure 12.2: Plot of the vector field g(x, y) = (−x/√(x^2 + y^2), −y/√(x^2 + y^2)).

Example 12.4. The three-dimensional vector field

\[ v(x, y, z) = (y, -x, -z) \]

swirls about the z-axis while flowing toward the xy-plane. (See Figure 12.3.) Compare the first two components to the two-dimensional field f above. The third component, −z, always points toward the xy-plane: in the negative z direction when z > 0 and in the positive z direction when z < 0.

Figure 12.3: Plot of the swirling 3-D vector field v(x, y, z) = (y, −x, −z).

12.2 Parameterized Surfaces

We can use functions from R^2 to R^3 to describe two-dimensional parameterized surfaces in R^3.

Example 12.5. We can describe a plane containing the point X_0 = (x_0, y_0, z_0) and parallel to the vectors u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3) using the function

\[
p(s, t) = X_0 + s u + t v =
\begin{pmatrix}
x_0 + s u_1 + t v_1 \\
y_0 + s u_2 + t v_2 \\
z_0 + s u_3 + t v_3
\end{pmatrix}.
\]

Here (s, t) ∈ R^2.

Example 12.6. We can describe a right circular cylinder of radius r centered on the y-axis using the following function h : [0, 2π) × R → R^3

\[
h(\theta, s) =
\begin{pmatrix}
r\cos\theta \\
s \\
r\sin\theta
\end{pmatrix}.
\]

Here r > 0 is fixed. The x and z coordinates sweep out a circle of radius r about the origin in the xz-plane using the parameter θ ∈ [0, 2π). The parameter s ∈ R translates that circle along the y-axis.

Example 12.7. We can describe a sphere of radius ρ > 0 using the function g : [0, 2π) × [0, π] → R^3

\[
g(\theta, \phi) =
\begin{pmatrix}
\rho\cos\theta\sin\phi \\
\rho\sin\theta\sin\phi \\
\rho\cos\phi
\end{pmatrix},
\]

where ρ > 0 is fixed. Here (θ, φ) ∈ [0, 2π) × [0, π] are standard spherical coordinates: φ is the angle between the vector g and the positive z-axis, and θ is the angle between the projection of g onto the xy-plane and the positive x-axis. Note that by letting φ range from 0 to π and θ range from 0 to 2π, we can represent the entire sphere. If we tried to represent the sphere as a graph we could only give half of the sphere with a single function, e.g.

\[ z = f_1(x, y) = \sqrt{\rho^2 - x^2 - y^2}, \qquad x^2 + y^2 \le \rho^2, \]

and

\[ z = f_2(x, y) = -\sqrt{\rho^2 - x^2 - y^2}, \qquad x^2 + y^2 \le \rho^2. \]
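The claim that a single parameterization covers the whole sphere can be checked numerically. The sketch below is only an illustration, with the radius 2 and the sampling grid chosen arbitrarily: it evaluates the sphere parameterization on a grid of (θ, φ) values and confirms that every sample lies at distance ρ from the origin and that both poles are reached.

```python
import math

def sphere_point(theta, phi, rho=2.0):
    """The parameterization g(theta, phi) of Example 12.7."""
    return (
        rho * math.cos(theta) * math.sin(phi),
        rho * math.sin(theta) * math.sin(phi),
        rho * math.cos(phi),
    )

def radius(p):
    return math.sqrt(sum(c * c for c in p))

# theta samples [0, 2 pi), phi samples [0, pi] including both endpoints.
samples = [
    sphere_point(2.0 * math.pi * i / 12, math.pi * j / 6)
    for i in range(12)
    for j in range(7)
]

max_radius_error = max(abs(radius(p) - 2.0) for p in samples)
lowest_z = min(p[2] for p in samples)   # reached at phi = pi
highest_z = max(p[2] for p in samples)  # reached at phi = 0
```

By contrast, the graph of f_1 alone would only ever produce points with z ≥ 0.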

Figure 12.4: Plot of the parameterized spherical surface.

Problems

Problem 12.1. Plot the following vector fields.

(a) f(x, y) = (y, −x/2).

(b) g(x, y) = (x, −y/2).

(c) u(x, y, z) = (0, −y, z).

(d) v(x, y, z) = (−z, −y, x).

Problem 12.2. Represent a right circular cylinder of radius one about the y-axis as a parameterized surface.

Problem 12.3. Represent the ellipsoid described by

\[ x^2 + 4y^2 + 9z^2 = 1 \]

as a parameterized surface. (Hint: Modify the parameterization of the sphere.)

Chapter 13

Curvilinear Coordinates

In this chapter we discuss a particularly important type of function from R^n to R^n: coordinate transformations. These are invertible functions from one region of R^n to another. We now examine some important examples.

13.1 Polar Coordinates in R^2

Many functions in the plane have circular symmetry. While it is usually possible to describe them in Cartesian coordinates, we can often gain a great deal of simplicity and clarity by using a system of polar coordinates that describes each point in the plane by its distance from the origin r and its angle θ with the x-axis. The basic formula for the transformation is

\[ x = r\cos\theta, \qquad y = r\sin\theta, \]

where r ∈ [0, ∞) and θ ∈ (−π, π]. We will use the notation of vector fields for this transformation:

\[
\begin{pmatrix} x \\ y \end{pmatrix}
= p_p(r, \theta)
= \begin{pmatrix} x(r, \theta) \\ y(r, \theta) \end{pmatrix}
= \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}.
\]

Several immediate observations are in order.

• The transformation is singular. That is, it is not one-to-one: all points in the rθ-plane with r = 0 map to a single point (the origin) in the xy-plane.

• The transformation is well defined for any r ∈ R and θ ∈ R. Thus the domain of the transformation was chosen somewhat arbitrarily so that the mapping was one-to-one except at the origin. As long as one keeps the possible problems in mind, it is often useful to extend the domain of the transformation.

• The transformation (with its domain restricted to r ∈ (0, ∞) and θ ∈ (−π, π]) is invertible using the equations

\[ r = \sqrt{x^2 + y^2}, \qquad \tan\theta = \frac{y}{x}. \]

If the pair (x, y) is in the open first or fourth quadrants we can use

\[ \theta = \arctan\left(\frac{y}{x}\right). \]

If we recall that the range of the arctangent function is (−π/2, π/2), we see that in the second quadrant we have

\[ \theta = \arctan\left(\frac{y}{x}\right) + \pi, \]

and in the third quadrant we have

\[ \theta = \arctan\left(\frac{y}{x}\right) - \pi. \]
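The quadrant cases for inverting the polar transformation can be collected into a single function. The sketch below is an illustration under the restriction x ≠ 0, since the arctangent formulas in the text do not cover the y-axis; the names `polar_angle` and `to_polar` are hypothetical. It compares the result with Python's built-in `math.atan2`, which handles every case at once.

```python
import math

def polar_angle(x, y):
    """Angle theta in (-pi, pi] from the quadrant formulas; requires x != 0."""
    if x > 0:
        return math.atan(y / x)            # first or fourth quadrant
    if x < 0 and y >= 0:
        return math.atan(y / x) + math.pi  # second quadrant
    if x < 0 and y < 0:
        return math.atan(y / x) - math.pi  # third quadrant
    raise ValueError("x = 0 is not covered by the arctangent formulas")

def to_polar(x, y):
    return (math.hypot(x, y), polar_angle(x, y))

points = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0),
          (1.0, -1.0), (-2.0, 0.0), (3.0, 0.0)]

# Largest disagreement with the library routine over the sample points.
max_angle_gap = max(abs(polar_angle(x, y) - math.atan2(y, x))
                    for x, y in points)
```

In practice one would simply call `math.atan2(y, x)`; the explicit version is shown only to mirror the case analysis.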

The simplest objects to describe in polar coordinates are circles about the origin (r = C) and rays from the origin (θ = C). Regions bounded by these objects are also easy to describe. For instance, consider the annular region

\[ \frac{1}{2} \le \sqrt{x^2 + y^2} \le 1, \qquad y \ge \frac{x}{\sqrt{3}}, \qquad y \ge -\sqrt{3}\,x. \]

While the Cartesian description of this region is complicated, the polar description is trivial: 1/2 ≤ r ≤ 1, π/6 ≤ θ ≤ 2π/3. (See Figure 13.1.)

Figure 13.1: Polar mapping p_p(r, θ) from a rectangle in the rθ-plane (1/2 ≤ r ≤ 1, π/6 ≤ θ ≤ 2π/3) to a sector of an annulus in the xy-plane.

In Cartesian coordinates, we typically use the standard basis vectors i = (1, 0) and j = (0, 1) when describing vector fields. We now define a polar basis that is often useful for vector fields with circular symmetry.

\[ e_r(\theta) = \cos\theta\, i + \sin\theta\, j, \qquad e_\theta(\theta) = -\sin\theta\, i + \cos\theta\, j. \]

(See Figure 13.2.) Note that this is an orthonormal set and any vector field in R^2 can be described as a linear combination of these vectors. However, vector fields with the right symmetry can be particularly easy to describe in this way. For instance, the polar mapping itself takes the form

\[ p_p(r, \theta) = r\, e_r(\theta). \]

The two-dimensional vector fields of Example 12.3 are equally easy to represent. The swirling flow of Figure 12.1 can be written

\[ f(r, \theta) = f(r\cos\theta, r\sin\theta) = r\, e_\theta(\theta). \]

The flow toward the origin of Figure 12.2 is given by

\[ g(r, \theta) = g(r\cos\theta, r\sin\theta) = -e_r(\theta). \]
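These two polar representations are easy to confirm numerically. The following sketch, with arbitrarily chosen sample values of r and θ, evaluates the Cartesian fields of Example 12.3 at the point (r cos θ, r sin θ) and compares them with r e_θ and −e_r. The helper `max_gap` is a name introduced here for illustration.

```python
import math

def e_r(theta):
    return (math.cos(theta), math.sin(theta))

def e_theta(theta):
    return (-math.sin(theta), math.cos(theta))

def f(x, y):
    # Swirling flow of Figure 12.1.
    return (-y, x)

def g(x, y):
    # Flow toward the origin of Figure 12.2.
    norm = math.hypot(x, y)
    return (-x / norm, -y / norm)

def max_gap(r, theta):
    """Largest componentwise gap between the Cartesian and polar forms."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    er, et = e_r(theta), e_theta(theta)
    gap_f = max(abs(f(x, y)[k] - r * et[k]) for k in range(2))
    gap_g = max(abs(g(x, y)[k] + er[k]) for k in range(2))
    return max(gap_f, gap_g)

worst = max(max_gap(r, th)
            for r in (0.5, 1.0, 2.0)
            for th in (-3.0, -1.0, 0.0, 0.7, 2.5))
```

The value `worst` should be at the level of rounding error, and the same check with r = 0 would fail for g, which is undefined at the origin.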

Figure 13.2: Polar coordinates in R^2, showing the point p_p(r, θ) = r cos θ i + r sin θ j and the basis vectors e_r = cos θ i + sin θ j and e_θ = −sin θ i + cos θ j.

13.2 Cylindrical Coordinates in R3

In three dimensions we often have circular symmetry about a particular line. If we choose our origin and coordinate system so that the line represents the z-axis, we can transform between Cartesian coordinates and “cylindrical” coordinates. In this system, the z-coordinate is retained from the Cartesian system and the x and y coordinates are transformed as in the two-dimensional polar system.

\[ x = r\cos\theta, \qquad y = r\sin\theta, \qquad z = z. \]

In vector form we use the notation

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= p_c(r, \theta, z)
= \begin{pmatrix} x(r, \theta, z) \\ y(r, \theta, z) \\ z(r, \theta, z) \end{pmatrix}
= \begin{pmatrix} r\cos\theta \\ r\sin\theta \\ z \end{pmatrix}.
\]

Figure 13.3: Cylindrical coordinates in R^3, showing the point p_c(r, θ, z) = (r cos θ, r sin θ, z) and the basis vectors e_r(θ) = cos θ i + sin θ j, e_θ(θ) = −sin θ i + cos θ j, and e_z = k.

Of course, certain geometric objects are easy to describe in this system.

• Right circular cylinders about the z-axis are given by the equation r = C. (Here C is a constant.)

• Planes parallel to the xy-plane are given by z = C.

• Planes containing the z-axis are given by θ = C.

• Spheres about the origin are given by z^2 + r^2 = C^2.

• Right circular cones with apex at the origin and axis on the z-axis are described by z = Cr.

• More generally, the equation z = f(r), r > 0 represents a surface of rotation about the z-axis.

As before, we define an orthonormal basis of vectors, convenient for describing vector fields with cylindrical symmetry. These are essentially the same as those for two-dimensional polar coordinates.

\[ e_r(\theta) = \cos\theta\, i + \sin\theta\, j, \qquad e_\theta(\theta) = -\sin\theta\, i + \cos\theta\, j, \qquad e_z = k. \]

Note that, for example, the swirling flow of Figure 12.3 can be written

\[ v(r, \theta, z) = v(r\cos\theta, r\sin\theta, z) = -r\, e_\theta(\theta) - z\, e_z. \]

Again, this is a singular coordinate system since the plane r = 0 maps to the line x = y = 0.

13.3 Spherical Coordinates in R3

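As the name suggests, spherical coordinates in R^3 are designed to take advantage of spherical symmetry. The coordinates are given by the distance ρ from the point (x, y, z) to the origin, the angle θ between the projection of the point onto the xy-plane and the positive x-axis (i.e. the angle θ used in cylindrical coordinates), and the angle φ between the vector (x, y, z) and the positive z-axis. Thus,

\[ x = \rho\cos\theta\sin\phi, \qquad y = \rho\sin\theta\sin\phi, \qquad z = \rho\cos\phi. \]

Here ρ ≥ 0, θ ∈ [0, 2π), and φ ∈ [0, π]. We also have

\[ \rho = \sqrt{x^2 + y^2 + z^2}, \qquad \tan\theta = \frac{y}{x}, \qquad \tan\phi = \frac{\sqrt{x^2 + y^2}}{z}. \]

In vector form we use the notation

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= p_s(\rho, \theta, \phi)
= \begin{pmatrix} x(\rho, \theta, \phi) \\ y(\rho, \theta, \phi) \\ z(\rho, \theta, \phi) \end{pmatrix}
= \begin{pmatrix} \rho\cos\theta\sin\phi \\ \rho\sin\theta\sin\phi \\ \rho\cos\phi \end{pmatrix}.
\]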

(See Figure 13.4.)

Figure 13.4: Spherical coordinates in R^3, showing the point p_s(ρ, θ, φ) = ρ e_ρ(θ, φ) and the basis vectors e_ρ, e_θ, and e_φ.

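Again we define an orthonormal basis of vectors, convenient for describing vector fields with spherical symmetry.

\[
\begin{aligned}
e_\rho(\theta, \phi) &= \cos\theta\sin\phi\, i + \sin\theta\sin\phi\, j + \cos\phi\, k, \\
e_\theta(\theta) &= -\sin\theta\, i + \cos\theta\, j, \\
e_\phi(\theta, \phi) &= \cos\theta\cos\phi\, i + \sin\theta\cos\phi\, j - \sin\phi\, k.
\end{aligned}
\]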
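That the spherical basis is orthonormal can be verified by computing all pairwise dot products. The Python sketch below does this at one arbitrarily chosen pair of angles and also checks the relation p_s(ρ, θ, φ) = ρ e_ρ(θ, φ); the function names and the sample values are assumptions of this illustration.

```python
import math

def e_rho(theta, phi):
    return (math.cos(theta) * math.sin(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(phi))

def e_theta(theta):
    return (-math.sin(theta), math.cos(theta), 0.0)

def e_phi(theta, phi):
    return (math.cos(theta) * math.cos(phi),
            math.sin(theta) * math.cos(phi),
            -math.sin(phi))

def p_s(rho, theta, phi):
    # Spherical coordinate mapping into Cartesian coordinates.
    return (rho * math.cos(theta) * math.sin(phi),
            rho * math.sin(theta) * math.sin(phi),
            rho * math.cos(phi))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

rho, theta, phi = 2.5, 1.1, 0.6
basis = [e_rho(theta, phi), e_theta(theta), e_phi(theta, phi)]

# Gram matrix of the basis; it should be the 3 x 3 identity.
gram = [[dot(u, v) for v in basis] for u in basis]

# Gap between p_s and rho * e_rho.
position_gap = max(abs(p - rho * e)
                   for p, e in zip(p_s(rho, theta, phi), basis[0]))
```

The same computation at other angles gives the same Gram matrix, since the identities behind it are cos² + sin² = 1 in each angle.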

Problems

Problem 13.1. Describe the following sets of points in the xy-plane as sets in the polar coordinate rθ-plane.

(a) S_1 = {(x, y) ∈ R^2 | x^2 + y^2 ≤ R^2}, the disk of radius R about the origin.

(b) S_2 = {(x, y) ∈ R^2 | (x − 1)^2 + y^2 ≤ 1}, the disk of radius one about the point (1, 0).

(c) S_3 = {(x, y) ∈ R^2 | x^2 + (y − 1)^2 ≤ 1}, the disk of radius one about the point (0, 1).

Problem 13.2. Describe the following sets of points in R^3 as sets of cylindrical coordinates (r, θ, z).

(a) S_1 = {(x, y, z) ∈ R^3 | √(x^2 + y^2) ≤ z ≤ 1}, a right circular cone with point at the origin and base on the plane z = 1.

(b) S_2 = {(x, y, z) ∈ R^3 | (x − 1)^2 + y^2 ≤ 1, 0 < z < 3}, a right circular cylinder of height three whose base is the disk of radius one about the point (1, 0) in the xy-plane.

(c) S_3 = {(x, y, z) ∈ R^3 | √3 √(x^2 + y^2) ≤ z ≤ √(4 − x^2 − y^2)}, a volume between a cone and the sphere about the origin of radius 2.

Problem 13.3. Describe the following sets of points in R^3 as sets of spherical coordinates (ρ, θ, φ).

(a) S_1 = {(x, y, z) ∈ R^3 | √(x^2 + y^2) ≤ z ≤ 1}, a right circular cone with point at the origin and base on the plane z = 1.

(b) S_2 = {(x, y, z) ∈ R^3 | (x − 1)^2 + y^2 ≤ 1, 0 < z < 3}, a right circular cylinder of height three whose base is the disk of radius one about the point (1, 0) in the xy-plane.

(c) S_3 = {(x, y, z) ∈ R^3 | √3 √(x^2 + y^2) ≤ z ≤ √(4 − x^2 − y^2)}, a volume between a cone and the sphere about the origin of radius 2.

Part II

Differential Calculus of Several Variables

Chapter 14

Introduction to Differential Calculus

In this chapter we examine differential calculus of functions of several variables. In order to get our bearings, let us once again consider a “syllabus” for the analogous topics in a course in single variable calculus.

1. The derivative of a function of a single variable is defined as the limit of difference quotients

\[ f'(x) = \lim_{y \to x} \frac{f(y) - f(x)}{y - x}. \]

2. The difference quotients are interpreted as the slope of secant lines of the graph of f. The derivative is interpreted as the slope of a tangent line.

3. The tangent line is shown to be the line that “best approximates” the original graph.

4. Various rules for differentiation of functions are developed (e.g. the product rule, quotient rule, and chain rule).

5. Higher order derivatives are defined. The second derivative is interpreted geometrically in terms of concavity of the graph.

6. Various applications, including max-min problems (finding the maximum and minimum values of functions) are discussed. The first and second derivative tests for maxima and minima are developed.

7. In preparation for the definition of the natural logarithm and exponential, theorems about the “invertibility” of real-valued functions are developed.

Again, this may not be the best order for a course teaching someone calculus for the first time. But it is a reasonable way to organize the topics covered. It will serve us well as an outline for our attack on the same problems with more complicated multivariable functions.

As one might suspect, simply defining a derivative in multiple dimensions is perhaps the hardest part of the program. Once again, we will see that if the domain of our functions is one-dimensional, calculus proceeds much as before. However, when we move to a multidimensional domain there are many possible generalizations of “the derivative.” Indeed, we cover several: partial derivatives, the total derivative matrix, the gradient, the divergence, the curl. Each will have applications in particular situations.

Other parts of the program proceed much as before with some important complications. Scalars are replaced with vectors and matrices; basic algebra is replaced with linear algebra.

Chapter 15

Derivatives of Functions from R to R^n

Differential calculus is simple for trajectories, which are simply arrays of n real-valued functions of a single variable – the type of function studied in elementary calculus.

Definition 15.1. We say that a trajectory r : [t_0, t_1] → R^n is differentiable at t ∈ [t_0, t_1] if

\[ r'(t) = \lim_{h \to 0} \frac{1}{h}\big( r(t + h) - r(t) \big) \]

exists. We refer to v(t) = r′(t) as the velocity of r and σ(t) = ‖v(t)‖ as the speed. If r is differentiable at each t ∈ [t_0, t_1] we say r is C^1.

Note that a trajectory is differentiable if and only if each of its n component functions is a differentiable real-valued function of a single variable. As we might expect, if a trajectory is differentiable, so is any equivalent trajectory.

Lemma 15.2. If a trajectory is differentiable, then any path equivalent trajectory is also differentiable.

Proof. Suppose g : I_2 → R^n is a differentiable trajectory and r : I_1 → R^n is path equivalent. Then there is a monotone increasing, onto, differentiable function φ : I_1 → I_2 such that

\[ r(t) = g(\phi(t)). \]

Thus, the differentiability of r(t) is equivalent to the differentiability of the composite function g(φ(t)). However, here we can use the chain rule for real-valued functions¹ of a single variable to compute

\[
r'(t) = \frac{d}{dt}\, g(\phi(t)) =
\begin{pmatrix}
g_1'(\phi(t))\,\phi'(t) \\
g_2'(\phi(t))\,\phi'(t) \\
\vdots \\
g_n'(\phi(t))\,\phi'(t)
\end{pmatrix}
= g'(\phi(t))\,\phi'(t).
\]

It is pretty intuitive that we should define the length of a trajectory to be the integral of the speed along that trajectory. (In the next part of the book we will revisit this formula and put its adoption on a more rigorous basis.)

Definition 15.3. The arclength of a trajectory r : [t_0, t_1] → R^n is given by

\[ L(r) = \int_{t_0}^{t_1} \| r'(t) \| \, dt. \]

Of course, we would expect that length is a purely geometric concept and doesn’t really have anything to do with the speed at which a particular trajectory was traversed. Fortunately, the definition above fits our intuition.

Theorem 15.4. Any two path equivalent trajectories have the same arclength.

Proof. Suppose, as in the previous proof, we have differentiable equivalent trajectories r : I_1 → R^n and g : I_2 → R^n with φ : I_1 → I_2 such that

\[ r(t) = g(\phi(t)). \]

We note that

\[ \| r'(t) \| = \| g'(\phi(t)) \| \, \phi'(t), \]

where we have used the fact that φ′ ≥ 0. Calculating arclength, we get

\[ L(r) = \int_{I_1} \| r'(t) \| \, dt = \int_{I_1} \| g'(\phi(t)) \| \, \phi'(t) \, dt = \int_{I_2} \| g'(s) \| \, ds = L(g). \]

Here we have used the integration by substitution formula² for functions of a single variable.

¹ Recall that the Chain Rule for functions of a single variable says the following. Suppose f : [a, b] → R and g : [c, d] → R satisfy f([a, b]) ⊆ [c, d] so that the composite g(f(·)) : [a, b] → R is well defined. Suppose f and g are both differentiable. Then the composite function g(f(t)) is differentiable on [a, b] and

\[ \frac{d}{dt}\, g(f(t)) = g'(f(t))\, f'(t). \]
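The invariance stated in Theorem 15.4 can also be observed numerically. The sketch below approximates arclength by the length of an inscribed polygon (a different technique from evaluating the speed integral, but one that converges to it for C^1 trajectories) and compares the unit circle g(s) = (cos s, sin s) on [0, 2π] with the reparameterized trajectory r(t) = g(φ(t)). The choice φ(t) = t² on [0, √(2π)] and all function names are assumptions made for this illustration.

```python
import math

def arclength(r, t0, t1, n=20000):
    """Approximate arclength by summing the chords of an inscribed polygon."""
    total = 0.0
    prev = r(t0)
    for k in range(1, n + 1):
        cur = r(t0 + (t1 - t0) * k / n)
        total += math.dist(prev, cur)
        prev = cur
    return total

def g(s):
    return (math.cos(s), math.sin(s))

def phi(t):
    # Monotone increasing map from [0, sqrt(2 pi)] onto [0, 2 pi].
    return t * t

def r(t):
    return g(phi(t))

length_g = arclength(g, 0.0, 2.0 * math.pi)
length_r = arclength(r, 0.0, math.sqrt(2.0 * math.pi))
```

Both values should be close to 2π even though r moves slowly at first and quickly near the end of its interval.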

Example 15.5. The trajectory r : [0, 2π] → R^2 defined by

\[ r(t) = (R\cos t, R\sin t) \]

is a counterclockwise simple cycle around a circle of radius R > 0. As expected, its arclength is

\[ L(r) = \int_0^{2\pi} \sqrt{R^2\sin^2 t + R^2\cos^2 t}\; dt = \int_0^{2\pi} R \, dt = 2\pi R. \]

Example 15.6. An ellipse satisfying

\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \]

can be described by the trajectory g : [0, 2π] → R^2 defined by

\[ g(t) = (a\cos t, b\sin t). \]

Its arclength is given by the integral

\[ L(g) = \int_0^{2\pi} \sqrt{a^2\sin^2 t + b^2\cos^2 t}\; dt. \]

This cannot, in general, be computed in closed form. Values can be obtained in terms of tabulated functions called (appropriately enough) elliptic integrals.
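Although the ellipse integral has no elementary closed form, it is straightforward to approximate. The following sketch applies the composite Simpson rule to the speed √(a² sin² t + b² cos² t); the function name `ellipse_arclength` and the number of subintervals are choices made for this illustration rather than anything prescribed by the text.

```python
import math

def ellipse_arclength(a, b, n=2000):
    """Composite Simpson approximation of the arclength integral over [0, 2 pi]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = 2.0 * math.pi / n

    def speed(t):
        return math.sqrt((a * math.sin(t)) ** 2 + (b * math.cos(t)) ** 2)

    total = speed(0.0) + speed(2.0 * math.pi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * speed(k * h)
    return total * h / 3.0

circle_length = ellipse_arclength(3.0, 3.0)   # should match 2 * pi * 3
ellipse_length = ellipse_arclength(3.0, 2.0)  # lies between 4 * pi and 6 * pi
```

Setting a = b recovers the circle of Example 15.5, which gives a simple sanity check on the routine.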

The following is an important concept for vector fields, especially those that represent the velocity of a fluid.

Definition 15.7. Let Ω ⊂ R^n be the domain of a vector field f : Ω → R^n. We say that a trajectory p : [a, b] → Ω is a flow line or path line of the vector field f if

\[ p'(t) = f(p(t)). \]

Thus, a flow line of a vector field is a trajectory whose velocity is given by that vector field. If we think of the vector field as representing the velocity field of a fluid, the flow line represents the path that a molecule of the fluid would traverse with the flow.

Note that the equation defining the flow lines is really a first-order system of n ordinary differential equations. We can use standard theorems from elementary differential equations (see, e.g. [9]) to show that any smooth vector field has flow lines going through every point.

² Recall that the integration by substitution formula says that

\[ \int_a^b f(u(t))\, u'(t) \, dt = \int_{u(a)}^{u(b)} f(s) \, ds. \]

Theorem 15.8. Let Ω ⊂ R^n be the domain of a C^1 vector field f : Ω → R^n. Then through every x_0 ∈ Ω there exists exactly one flow line p satisfying

\[ p'(t) = f(p(t)), \qquad p(0) = x_0. \]

Example 15.9. Consider the vector field f(x, y) = (2y, −2x). The flow lines p(t) = (p_1(t), p_2(t)) are defined by the system of differential equations

\[ p_1' = 2p_2, \qquad p_2' = -2p_1. \]

While this might be a nice time to review general methods for solving systems of linear ordinary differential equations, we will forgo that pleasure and observe that by differentiating the first equation we get

\[ p_1'' = 2p_2' = -4p_1. \]

This familiar second-order, linear, constant coefficient, ordinary differential equation has solutions

\[ p_1(t) = A\cos 2t + B\sin 2t, \]

where A and B are arbitrary constants. Again from the first equation we get

\[ p_2(t) = -A\sin 2t + B\cos 2t. \]

A bit of computation yields

\[ p_1^2 + p_2^2 = A^2 + B^2, \]

which implies that our flow lines are circles about the origin. Examination of the vector field shows that the flow moves clockwise: at the point (1, 0), for example, f = (0, −2) points downward. (See Figure 15.1.)
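As a cross-check on this computation, the flow line equations can be integrated numerically and compared with the closed-form solution. The sketch below uses the classical fourth-order Runge-Kutta method, which is a swapped-in numerical technique and not a method from the text; the initial point (1, 0.5), the final time 1.3, and the step count are arbitrary choices for illustration.

```python
import math

def f(x, y):
    # The vector field of Example 15.9.
    return (2.0 * y, -2.0 * x)

def rk4_flow(x0, y0, t_end, n=1000):
    """Integrate p' = f(p), p(0) = (x0, y0), up to time t_end with n RK4 steps."""
    h = t_end / n
    x, y = x0, y0
    for _ in range(n):
        k1 = f(x, y)
        k2 = f(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
        k3 = f(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
        k4 = f(x + h * k3[0], y + h * k3[1])
        x += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        y += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return (x, y)

def closed_form(a, b, t):
    # Solution from the example with p(0) = (A, B) = (a, b).
    return (a * math.cos(2.0 * t) + b * math.sin(2.0 * t),
            -a * math.sin(2.0 * t) + b * math.cos(2.0 * t))

numeric = rk4_flow(1.0, 0.5, 1.3)
exact = closed_form(1.0, 0.5, 1.3)
radius_drift = abs(math.hypot(*numeric) - math.hypot(1.0, 0.5))
```

The two results should agree to many digits, and `radius_drift` should be tiny, reflecting the fact that the flow line stays on a circle about the origin.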

Problems

Problem 15.1. Find the velocity vector of the trajectory

\[ r(t) = \begin{pmatrix} \cos t \\ |\cos t| \end{pmatrix}, \qquad t \in [0, \pi], \]

and compute its arclength.

Problem 15.2. Find the velocity vector of the trajectory

\[ h(t) = \begin{pmatrix} 2\cos t \\ 2\sin t \\ t^{3/2} \end{pmatrix}, \qquad t \in [0, 5], \]

and compute its arclength.

Figure 15.1: Flow lines and the vector field f(x, y) = (2y, −2x).

Problem 15.3. Find the length of one arc of the cycloid

\[ c(t) = \begin{pmatrix} r(t - \sin t) \\ r(1 - \cos t) \end{pmatrix}, \]

discussed in Problem 10.9. When is the speed of a cycloid at its maximum? When is the speed of a cycloid at its minimum?

Problem 15.4. Parameterize the line segment connecting the two points

\[ (x_1, y_1, z_1) \quad \text{and} \quad (x_2, y_2, z_2). \]

Compute the arclength of this trajectory and show that it agrees with the formula for the distance between two points.

Problem 15.5. For any C^1 trajectory f whose velocity is never zero we define the unit tangent vector

\[ t(t) = \frac{f'(t)}{\| f'(t) \|}. \]

Show that t′(t) is orthogonal to t(t). (We say t′(t) is “normal” to the trajectory.)

Problem 15.6. Compute and graph the flow lines of the vector field f(x, y) = (x, −y).

Chapter 16

Derivatives of Functions from R^n to R

As we saw in Chapter 9, when we consider functions with a multidimensional domain even the idea of a limit gets complicated. This is because we can move in an infinite number of directions rather than the two directions we are confined to on a line. As we will see, there are a number of useful ways to define some form of “derivative” in this situation. In this chapter we will focus on one of the simplest: the partial derivative.

16.1 Partial Derivatives

When defining a partial derivative, we simply avoid all of the problems that can occur in a multidimensional domain by confining ourselves to lines parallel to the coordinate axes.

Definition 16.1. Let Ω ⊂ R^n be the domain of f : Ω → R. For any x ∈ Ω and any i = 1, 2, . . . , n we note that the function of a single variable f̃ : R → R defined by the composite function

\[ \tilde{f}(t) = f(x + t e_i) \]

is well defined at t = 0. If f̃ is differentiable at t = 0, then we say that the ith partial derivative of f at x exists. We write

\[ \frac{\partial f}{\partial x_i}(x) = \left. \frac{d}{dt} \tilde{f}(t) \right|_{t=0} = \lim_{t \to 0} \frac{f(x + t e_i) - f(x)}{t}. \]

If the partial derivative for each i = 1, 2, . . . , n exists for each x ∈ Ω and each of these derivatives is a continuous function on Ω we say f ∈ C^1(Ω).

Remark 16.2. In practice, one computes partial derivatives of an explicit function by computing the derivative with respect to the ith variable while treating the remaining variables as if they were constants. Since the partial derivative is defined in terms of the derivative of a function of a single variable, all of the rules of differentiation derived in elementary calculus hold. Thus we can use linearity of the derivative, the product rule, quotient rule, chain rule, and rules for taking the derivatives of polynomials, trig functions, and exponentials.

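Example 16.3. Let

\[ f(x, y, z) = e^{7z}\cos(x^2 + 4y^3). \]

Then

\[
\begin{aligned}
\frac{\partial f}{\partial x}(x, y, z) &= -e^{7z}\sin(x^2 + 4y^3)\, 2x, \\
\frac{\partial f}{\partial y}(x, y, z) &= -e^{7z}\sin(x^2 + 4y^3)\, 12y^2, \\
\frac{\partial f}{\partial z}(x, y, z) &= 7e^{7z}\cos(x^2 + 4y^3).
\end{aligned}
\]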
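A quick numerical check of these partial derivatives uses central difference quotients in each coordinate direction, which mirrors the definition of the partial derivative. The sketch assumes the function f(x, y, z) = e^{7z} cos(x² + 4y³), the one that the three listed derivatives correspond to; the test point, the step size, and the helper names are arbitrary choices for illustration.

```python
import math

def f(x, y, z):
    # Assumed function: the one consistent with the listed derivatives.
    return math.exp(7.0 * z) * math.cos(x**2 + 4.0 * y**3)

def partial(func, point, i, h=1e-6):
    """Central-difference approximation of the i-th partial derivative."""
    forward = list(point)
    backward = list(point)
    forward[i] += h
    backward[i] -= h
    return (func(*forward) - func(*backward)) / (2.0 * h)

def exact_gradient(x, y, z):
    # The three formulas from the example.
    e = math.exp(7.0 * z)
    s = math.sin(x**2 + 4.0 * y**3)
    c = math.cos(x**2 + 4.0 * y**3)
    return (-e * s * 2.0 * x, -e * s * 12.0 * y**2, 7.0 * e * c)

point = (0.3, -0.5, 0.1)
approx = tuple(partial(f, point, i) for i in range(3))
exact = exact_gradient(*point)
max_gap = max(abs(a - b) for a, b in zip(approx, exact))
```

Each call to `partial` varies only one coordinate and holds the others fixed, exactly as in the recipe of treating the remaining variables as constants.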

16.2 Higher Order Partial Derivatives

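Definition 16.4. Let Ω ⊂ R^n be the domain of f : Ω → R. For any x ∈ Ω and any i = 1, 2, . . . , n and any j = 1, 2, . . . , n we define the second-order partial derivative of f with respect to i and j to be

\[ \frac{\partial^2 f}{\partial x_j \partial x_i}(x) = \frac{\partial}{\partial x_j}\left( \frac{\partial f}{\partial x_i} \right)(x). \]

Derivatives of still higher order are defined in a similar fashion. For instance

\[ \frac{\partial^3 f}{\partial x_k \partial x_j \partial x_i}(x) = \frac{\partial}{\partial x_k}\left( \frac{\partial}{\partial x_j}\left( \frac{\partial f}{\partial x_i} \right) \right)(x). \]

If all kth-order partial derivatives exist and are continuous functions on all of Ω we say f ∈ C^k(Ω).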

Remark 16.5. There are many alternative notations for partial derivatives. One of the most common uses subscripts

\[ f_x = \frac{\partial f}{\partial x}, \qquad g_{x_2} = \frac{\partial g}{\partial x_2}, \]

and so forth. For higher order iterated partial derivatives, the order in which the variables are displayed is different in these two notations.

\[ f_{xy} = (f_x)_y = \frac{\partial}{\partial y}\left( \frac{\partial f}{\partial x} \right) = \frac{\partial^2 f}{\partial y\, \partial x}. \]

As we will see below, for sufficiently smooth functions the order in which higher order partials are taken does not matter, so this distinction isn’t worth much worry.

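Example 16.6. As in Example 16.3, let

\[ f(x, y, z) = e^{7z}\cos(x^2 + 4y^3). \]

Then

\[
\begin{aligned}
\frac{\partial^2 f}{\partial y\, \partial x}(x, y, z) &= -e^{7z}\cos(x^2 + 4y^3)\, 24xy^2, \\
\frac{\partial^2 f}{\partial z\, \partial y}(x, y, z) &= -7e^{7z}\sin(x^2 + 4y^3)\, 12y^2, \\
\frac{\partial^2 f}{\partial y\, \partial z}(x, y, z) &= -7e^{7z}\sin(x^2 + 4y^3)\, 12y^2.
\end{aligned}
\]

Note that in Example 16.6 we had

\[ \frac{\partial^2 f}{\partial z\, \partial y} = \frac{\partial^2 f}{\partial y\, \partial z}. \]

In fact, if a function is in C^2, this is always true - second partials can be computed in any order with the same result.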
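The equality of the two mixed partials can be observed numerically by differentiating the first partials in the opposite variable. In the sketch below, the exact expressions for ∂f/∂y and ∂f/∂z are taken for the function f(x, y, z) = e^{7z} cos(x² + 4y³) (an assumption consistent with the derivatives listed in the examples), and each is then differentiated once more by a central difference. The test point and helper names are illustrative choices.

```python
import math

def f_y(x, y, z):
    # Exact first partial with respect to y.
    return -math.exp(7.0 * z) * math.sin(x**2 + 4.0 * y**3) * 12.0 * y**2

def f_z(x, y, z):
    # Exact first partial with respect to z.
    return 7.0 * math.exp(7.0 * z) * math.cos(x**2 + 4.0 * y**3)

def central(func, point, i, h=1e-6):
    """Central-difference derivative of func in coordinate i at point."""
    forward = list(point)
    backward = list(point)
    forward[i] += h
    backward[i] -= h
    return (func(*forward) - func(*backward)) / (2.0 * h)

point = (0.3, -0.5, 0.1)
x, y, z = point

f_zy = central(f_y, point, 2)  # differentiate df/dy with respect to z
f_yz = central(f_z, point, 1)  # differentiate df/dz with respect to y

# Closed-form value of both mixed partials from Example 16.6.
formula = -7.0 * math.exp(7.0 * z) * math.sin(x**2 + 4.0 * y**3) * 12.0 * y**2
```

The two numerical values are computed along different routes, yet both should match `formula` up to discretization and rounding error.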

Theorem 16.7. Let Ω ⊂ R^n be the domain of f : Ω → R. If f ∈ C^2(Ω) then for any i = 1, 2, . . . , n and any j = 1, 2, . . . , n we have

\[ \frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}. \]

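Proof. Suppose that $\mathbf{x}$ is an interior point of $\Omega$. From the definition we have

\[
\begin{aligned}
\frac{\partial^2 f}{\partial x_j \partial x_i}(\mathbf{x})
&= \lim_{s \to 0} \frac{1}{s}\left(\frac{\partial f}{\partial x_i}(\mathbf{x} + s\mathbf{e}_j) - \frac{\partial f}{\partial x_i}(\mathbf{x})\right) \\
&= \lim_{s \to 0} \frac{1}{s}\left(\left[\lim_{t \to 0} \frac{1}{t}\bigl(f(\mathbf{x} + s\mathbf{e}_j + t\mathbf{e}_i) - f(\mathbf{x} + s\mathbf{e}_j)\bigr)\right] - \left[\lim_{t \to 0} \frac{1}{t}\bigl(f(\mathbf{x} + t\mathbf{e}_i) - f(\mathbf{x})\bigr)\right]\right) \\
&= \lim_{s \to 0} \lim_{t \to 0} \frac{1}{st}\bigl(g(s, t) - g(0, t)\bigr)
\end{aligned}
\]

where we have defined

\[ g(s, t) = f(\mathbf{x} + s\mathbf{e}_j + t\mathbf{e}_i) - f(\mathbf{x} + s\mathbf{e}_j). \]

Now let s and t be fixed with s and t sufficiently small that $g(z, t)$ is well defined for all $z \in [0, s]$. (Readers should convince themselves that this is always possible if $\mathbf{x}$ is an interior point.) Then the Mean Value Theorem¹ applied to $s \mapsto g(s, t)$ says there is an $\bar{s}$ between 0 and s such that

\[ g(s, t) - g(0, t) = (s - 0)\frac{\partial g}{\partial s}(\bar{s}, t). \]

However, using the definition of the partial derivative and the function g, we can compute

\[
\begin{aligned}
\frac{\partial g}{\partial s}(\bar{s}, t)
&= \frac{\partial f}{\partial x_j}(\mathbf{x} + \bar{s}\mathbf{e}_j + t\mathbf{e}_i) - \frac{\partial f}{\partial x_j}(\mathbf{x} + \bar{s}\mathbf{e}_j) \\
&= h(t, \bar{s}) - h(0, \bar{s})
\end{aligned}
\]

where we have defined

\[ h(t, s) = \frac{\partial f}{\partial x_j}(\mathbf{x} + s\mathbf{e}_j + t\mathbf{e}_i). \]

Again applying the Mean Value Theorem to $t \mapsto h(t, \bar{s})$ we find there is a $\bar{t}$ between 0 and t such that

\[ h(t, \bar{s}) - h(0, \bar{s}) = (t - 0)\frac{\partial h}{\partial t}(\bar{t}, \bar{s}) = t\,\frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x} + \bar{s}\mathbf{e}_j + \bar{t}\mathbf{e}_i). \]

Putting all of this together we get

\[
\begin{aligned}
\frac{\partial^2 f}{\partial x_j \partial x_i}(\mathbf{x})
&= \lim_{s \to 0} \lim_{t \to 0} \frac{1}{st}\bigl(g(s, t) - g(0, t)\bigr) \\
&= \lim_{s \to 0} \lim_{t \to 0} \frac{1}{st}\left(st\,\frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x} + \bar{s}\mathbf{e}_j + \bar{t}\mathbf{e}_i)\right)
= \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x}).
\end{aligned}
\]

To compute the limit we have used the fact that $(s, t) \to (0, 0)$ implies $(\bar{s}, \bar{t}) \to (0, 0)$ and the fact that all second partials of f are continuous.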

The result extends to boundary points in $\Omega$ by continuity. That is, since the partials are equal in a neighborhood of a boundary point and since they are continuous at the boundary point, they must be equal at the boundary point.

¹ Recall that the Mean Value Theorem for real-valued functions of a single real variable says that if $f : [a, b] \to \mathbb{R}$ is continuous on $[a, b]$ and continuously differentiable on $(a, b)$, then there exists $c \in (a, b)$ such that

\[ f(b) - f(a) = f'(c)(b - a). \]

When applying this to a function of two variables with one variable “frozen” we have to use partial rather than ordinary derivatives.

Remark 16.8. An analogous result holds for derivatives of higher order: if the partial derivatives of a function up to order k are continuous on a domain, then any kth partial derivative of the function can be computed by taking the derivatives in any order with the same result.
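A numerical illustration of the equality of mixed partials may help here. The sketch below nests central differences in the two possible orders for the function of Example 16.6, read as $e^{7z}\cos(x^2 + 4y^3)$ (the form that matches the stated second derivatives). The helper `partial`, the step size, and the sample point are assumptions made for illustration.

```python
import math

def f(x, y, z):
    # Function of Example 16.6, read as e^{7z} cos(x^2 + 4y^3)
    return math.exp(7 * z) * math.cos(x**2 + 4 * y**3)

def partial(func, i, h=1e-4):
    """Return a function that approximates the i-th partial derivative of func."""
    def d(*p):
        up = list(p); up[i] += h
        down = list(p); down[i] -= h
        return (func(*up) - func(*down)) / (2 * h)
    return d

p = (0.3, 0.5, 0.1)                       # arbitrary sample point
f_zy = partial(partial(f, 1), 2)(*p)      # differentiate in y first, then in z
f_yz = partial(partial(f, 2), 1)(*p)      # differentiate in z first, then in y

x, y, z = p
exact = -7 * math.exp(7 * z) * math.sin(x**2 + 4 * y**3) * 12 * y**2
print(f_zy, f_yz, exact)
```

The two nested estimates agree with each other and, up to discretization error, with the closed form from Example 16.6.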

16.3 The Chain Rule for Partial Derivatives

One of the most important differentiation rules from the calculus of single variables is the “chain rule,” which describes the differentiation of composite functions. A (not very general) version of the theorem is as follows.

Theorem 16.9. Suppose we have two $C^1$ functions of a single variable $y : \mathbb{R} \to \mathbb{R}$ and $x : \mathbb{R} \to \mathbb{R}$, and suppose we define a new function $z : \mathbb{R} \to \mathbb{R}$ by composition

\[ z(t) = y(x(t)), \quad t \in \mathbb{R}. \]

Then z is $C^1$, and the derivative of z is given by

\[ z'(t) = y'(x(t))\,x'(t). \]

An even more suggestive formula is given by the notation

\[ \frac{dz}{dt} = \frac{dy}{dx}\frac{dx}{dt}. \]

The significance of the name “chain rule” is that there is a “chain” of dependence from t to x to y,

\[ t \mapsto x(t) \mapsto y(x(t)), \]

and that each “link” of the chain contributes a factor to the derivative of the composition. Of course, when we are dealing with functions of several variables we can have many more complicated compositions to deal with. At this time, we look at a series of examples of composite functions and describe a method for computing their partial derivatives via the chain rule. Since there are so many possibilities, we put off any general theorem until the section on composition of mappings below. (As we shall see, the more general form of the chain rule is rather hard to apply in practice. The algorithms we apply here are much easier to use for computation.)

Example 16.10. For our first example, suppose we have a function

\[ e(x, y, z, t) \]

depending on three space variables $(x, y, z)$ and time t. Suppose the function $\mathbf{p}(t) = (x(t), y(t), z(t))$ describes the trajectory of a particle in space. The composite function

\[ \tilde{e}(t) = e(x(t), y(t), z(t), t) \]

tracks the value of the function e along the trajectory of the particle. How do we compute its derivative? This time, there is not a single “chain” connecting t to e; there are four of them. (Note that t appears in four different places in the definition of $\tilde{e}$.) We represent the chains of dependence graphically in Figure 16.1. We construct the derivative of $\tilde{e}$ with respect to t as follows.

Figure 16.1: Chain of dependence diagram for $e(x(t), y(t), z(t), t)$.

• Each chain or path from t to e contributes a term to the total derivative. (A path must always flow in the direction of the arrows of dependence.)

• Each link in an individual chain contributes a factor to its corresponding term, much like the links in the chain for a function of one variable.

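With this algorithm in mind, we compute

\[ \frac{d\tilde{e}}{dt} = \frac{\partial e}{\partial x}\frac{dx}{dt} + \frac{\partial e}{\partial y}\frac{dy}{dt} + \frac{\partial e}{\partial z}\frac{dz}{dt} + \frac{\partial e}{\partial t}. \]

The notation we have used here is both suggestive and easy to read. Note how cumbersome the type of notation we used for a function of a single variable would be:

\[
\begin{aligned}
\frac{d\tilde{e}}{dt}(t) ={}& \frac{\partial e}{\partial x}(x(t), y(t), z(t), t)\,\frac{dx}{dt}(t) + \frac{\partial e}{\partial y}(x(t), y(t), z(t), t)\,\frac{dy}{dt}(t) \\
&+ \frac{\partial e}{\partial z}(x(t), y(t), z(t), t)\,\frac{dz}{dt}(t) + \frac{\partial e}{\partial t}(x(t), y(t), z(t), t).
\end{aligned}
\]

While this notation is somewhat messy, it can clarify things like the specific role that the function $x(t)$ plays in determining $\tilde{e}'$.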
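To make the path-counting rule of Example 16.10 concrete, the sketch below compares the four-term chain rule sum with a direct numerical derivative of the composite. The particular function `e` and trajectory `path` are hypothetical choices made only for illustration; they do not come from the text.

```python
import math

def e(x, y, z, t):
    # Hypothetical function of three space variables and time
    return x * y + z * t**2

def path(t):
    # Hypothetical trajectory p(t) = (x(t), y(t), z(t))
    return (math.cos(t), math.sin(t), t)

def chain_rule_derivative(t):
    """Sum one term per path from t to e: through x, y, z, and the direct link."""
    x, y, z = path(t)
    dx, dy, dz = -math.sin(t), math.cos(t), 1.0      # x'(t), y'(t), z'(t)
    e_x, e_y, e_z, e_t = y, x, t**2, 2 * z * t       # partials of e at (x, y, z, t)
    return e_x * dx + e_y * dy + e_z * dz + e_t

def direct_derivative(t, h=1e-6):
    """Central difference of the composite t -> e(x(t), y(t), z(t), t)."""
    return (e(*path(t + h), t + h) - e(*path(t - h), t - h)) / (2 * h)

t0 = 0.7
print(chain_rule_derivative(t0), direct_derivative(t0))
```

For this choice the composite is $\cos t \sin t + t^3$, whose derivative $\cos 2t + 3t^2$ matches both computations; dropping the direct $\partial e/\partial t$ term would leave the chain rule sum short by $2t^2$.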

Figure 16.2: Chain of dependence diagram for $w(x, y, e(x, y), b(x, y))$.

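Example 16.11. Our second example concerns a function $w(x, y, e, b)$ of four variables composed with two functions of two variables $e(x, y)$ and $b(x, y)$. The chain of dependence diagram for this composition is given in Figure 16.2. There are three paths from x to w yielding the partial derivative

\[ \frac{\partial w}{\partial x} = \frac{\partial w}{\partial x} + \frac{\partial w}{\partial b}\frac{\partial b}{\partial x} + \frac{\partial w}{\partial e}\frac{\partial e}{\partial x}. \]

Similarly, there are three paths from y to w yielding

\[ \frac{\partial w}{\partial y} = \frac{\partial w}{\partial y} + \frac{\partial w}{\partial b}\frac{\partial b}{\partial y} + \frac{\partial w}{\partial e}\frac{\partial e}{\partial y}. \]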

A worrisome ambiguity in our notation should jump out at you. In each of these equations, we have used a symbol in two different ways. For instance, in the last equation we have the following.

• The symbol $\frac{\partial w}{\partial y}$ refers to the derivative of the composite function (which depends on the two variables x and y) when it appears on the left side of the equation.

• The symbol $\frac{\partial w}{\partial y}$ refers to the derivative of the function of four variables $w(x, y, e, b)$ when it appears on the right side of the equation.

In fact, the ambiguity is simply the result of laziness (or, more generously, “efficiency”). If we had strictly followed the rules for functional notation², we would have given the composite function a new name to distinguish it from the original, e.g.

\[ \tilde{w}(x, y) = w(x, y, e(x, y), b(x, y)). \]

However, the abuse of notation above is very common, and it is important for a reader to be able to interpret symbols like this and determine the meaning from the context.

² My thesis advisor, Stuart Antman, had a cute trick to help drive this point home. He would define a function

\[ f(x, y) = x^2 + y^2 \]

and then define

\[ x = r\cos\theta, \qquad y = r\sin\theta. \]

He would then ask, “What is $f(r, \theta)$?” Of course, many students (even when warned that the question was a trap) would answer $f(r, \theta) = r^2$. The correct answer is

\[ f(r, \theta) = r^2 + \theta^2, \]

and in fact

\[ f(\clubsuit, \dagger) = \clubsuit^2 + \dagger^2. \]

The function f is a rule that tells you to take the square of the number in the first slot and add that to the square of the number in the second slot. The equations describing x and y in terms of r and θ are irrelevant.

Figure 16.3: Chain of dependence diagram for $u(x, t, e(x, t), b(x, t, e(x, t)))$.

Example 16.12. For our third example, we will consider a function $u(x, t, e, b)$ composed with $b(x, t, e)$ and $e(x, t)$. The chain of dependence diagram for this composition is given in Figure 16.3. Note that since there are two paths from x to b (one direct, one through e) there are a total of four paths from x to u. This gives us the partial derivative

\[ \frac{\partial u}{\partial x} = \frac{\partial u}{\partial x} + \frac{\partial u}{\partial e}\frac{\partial e}{\partial x} + \frac{\partial u}{\partial b}\frac{\partial b}{\partial x} + \frac{\partial u}{\partial b}\frac{\partial b}{\partial e}\frac{\partial e}{\partial x}. \]

Once again, the notation is ambiguous because we are using the same symbol to represent the original function and the composite. The reader is invited to compute $\frac{\partial u}{\partial t}$.

Example 16.13. We can use the chain rule to transform partial differential equations. For instance, suppose $u(x, y)$ satisfies

\[ x^2 u_{xx} + y^2 u_{yy} + x u_x + y u_y = 0. \]

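If we make the “change of variables”³ $x = e^s$ and $y = e^t$ and define

\[ w(s, t) = u(e^s, e^t) \]

then w satisfies

\[ \frac{\partial^2 w}{\partial s^2} + \frac{\partial^2 w}{\partial t^2} = 0. \]

To see this we calculate

\[ \frac{\partial w}{\partial s} = e^s \frac{\partial u}{\partial x}(e^s, e^t) \]

and

\[ \frac{\partial^2 w}{\partial s^2} = e^s \frac{\partial u}{\partial x}(e^s, e^t) + (e^s)^2 \frac{\partial^2 u}{\partial x^2}(e^s, e^t), \]

which is just $x u_x + x^2 u_{xx}$ evaluated at $(x, y) = (e^s, e^t)$. A similar calculation for $w_{tt}$ completes the derivation.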
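The change of variables in Example 16.13 can be tried on a concrete solution. The sketch below uses $u(x, y) = (\ln x)^2 - (\ln y)^2$, a hypothetical example chosen here (not taken from the text) because it satisfies the original equation for $x, y > 0$. It checks numerically that the residual of the original equation and the sum $w_{ss} + w_{tt}$ both vanish. The finite-difference helpers are illustrative.

```python
import math

def u(x, y):
    # Hypothetical solution chosen for illustration: u = (ln x)^2 - (ln y)^2
    return math.log(x)**2 - math.log(y)**2

def w(s, t):
    # Composite after the change of variables x = e^s, y = e^t
    return u(math.exp(s), math.exp(t))

def d1(func, p, i, h=1e-4):
    """Central-difference first partial derivative in slot i."""
    up = list(p); up[i] += h
    dn = list(p); dn[i] -= h
    return (func(*up) - func(*dn)) / (2 * h)

def d2(func, p, i, h=1e-4):
    """Central-difference second partial derivative in slot i."""
    up = list(p); up[i] += h
    dn = list(p); dn[i] -= h
    return (func(*up) - 2 * func(*p) + func(*dn)) / h**2

x, y = 1.5, 2.0
pde_residual = (x**2 * d2(u, (x, y), 0) + y**2 * d2(u, (x, y), 1)
                + x * d1(u, (x, y), 0) + y * d1(u, (x, y), 1))

s, t = math.log(x), math.log(y)
laplacian_w = d2(w, (s, t), 0) + d2(w, (s, t), 1)
print(pde_residual, laplacian_w)
```

Here $w(s, t) = s^2 - t^2$, so $w_{ss} + w_{tt} = 2 - 2 = 0$, in line with the transformed equation.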

Problems

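Problem 16.1. For $f(x, y) = x^3 y^2 - 5xy + 3x$ calculate the following.

(a) $\dfrac{\partial f}{\partial x}(x, y)$.

(b) $\dfrac{\partial f}{\partial y}(x, y)$.

(c) $\dfrac{\partial f}{\partial x}(3, 5)$.

(d) $\dfrac{\partial f}{\partial y}(s, t)$.

(e) $\dfrac{\partial f}{\partial x}(y, x)$.

Problem 16.2. For $f(x, y) = xy \ln(x^2 + y^2)$ calculate the following.

(a) $\dfrac{\partial f}{\partial x}(x, y)$.

(b) $\dfrac{\partial f}{\partial y}(x, y)$.

(c) $\dfrac{\partial f}{\partial x}(1, 2)$.

(d) $\dfrac{\partial f}{\partial y}(u, v)$.

(e) $\dfrac{\partial f}{\partial x}(y, -y)$.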

³ There is much more on this subject to come in the next chapter.

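Problem 16.3. For $f(x, y, z) = e^{yz} - \sin(xy)$ calculate the following.

(a) $\dfrac{\partial f}{\partial x}(x, y, z)$.

(b) $\dfrac{\partial f}{\partial y}(x, y, z)$.

(c) $\dfrac{\partial f}{\partial z}(x, y, z)$.

(d) $\dfrac{\partial f}{\partial y}(r, s, t)$.

(e) $\dfrac{\partial f}{\partial x}(2, 3, -1)$.

Problem 16.4. For $f(x, y) = x^2 y^5 - 4x^6 y^9$ calculate the following.

(a) $\dfrac{\partial^2 f}{\partial x \partial y}(x, y)$.

(b) $\dfrac{\partial^2 f}{\partial y \partial x}(x, y)$.

(c) $\dfrac{\partial^2 f}{\partial x^2}(x, y)$.

(d) $\dfrac{\partial^2 f}{\partial y^2}(x, y)$.

(e) $\dfrac{\partial^2 f}{\partial x^2}(u, v)$.

Problem 16.5. For $f(x, y, z) = (x^2 + y^2 + z^2)^{\alpha/2}$ where $\alpha \in \mathbb{R}$ calculate the following.

(a) $\dfrac{\partial^2 f}{\partial x \partial y}(x, y, y)$.

(b) $\dfrac{\partial^2 f}{\partial y \partial z}(x, y, y)$.

(c) $\dfrac{\partial^2 f}{\partial x^2}(x, y, z)$.

(d) $\dfrac{\partial^2 f}{\partial y^2}(x, y, z)$.

(e) $\dfrac{\partial^2 f}{\partial z^2}(x, y, z)$.

Problem 16.6. Draw a chain of dependence diagram and compute a formula for

\[ \frac{\partial}{\partial x} g(x, y(x, t), z(x, t)) \]

and

\[ \frac{\partial}{\partial t} g(x, y(x, t), z(x, t)). \]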


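Problem 16.7. Draw a chain of dependence diagram and compute a formula for

\[ \frac{d}{dx} w(x, y(x)) \]

and

\[ \frac{d^2}{dx^2} w(x, y(x)). \]

Problem 16.8. Draw a chain of dependence diagram and compute a formula for

\[ \frac{\partial}{\partial x} h(x, u(x, y), v(y, z)), \qquad \frac{\partial}{\partial y} h(x, u(x, y), v(y, z)), \]

and

\[ \frac{\partial}{\partial z} h(x, u(x, y), v(y, z)). \]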

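Problem 16.9. Suppose $u(x, t) = f(x + ct) + g(x - ct)$. Show that u satisfies

\[ \frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}. \]

Problem 16.10. Show that if $v(x, y) = f(ax + by)$ then v satisfies

\[ b\frac{\partial v}{\partial x} - a\frac{\partial v}{\partial y} = 0. \]

Problem 16.11. Suppose $u(x, t)$ satisfies

\[ \frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2}. \]

Define $w(y, z) = u(y - z, y + z)$. Show that

\[ \frac{\partial^2 w}{\partial y \partial z} = 0. \]

Problem 16.12. We say that a function $y(x)$ is implicitly defined by the equation

\[ f(x, y) = 0 \]

if there exists a function $y(x)$ such that

\[ f(x, y(x)) = 0 \]

for all x in some domain. Show that in this case

\[ \frac{dy}{dx} = -\frac{\partial f / \partial x}{\partial f / \partial y} \]

provided $\frac{\partial f}{\partial y} \neq 0$ and f and y are sufficiently smooth.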

Problem 16.13. Suppose that two functions $y(x)$ and $z(x)$ are implicitly defined by the system of two equations

\[ f(x, y, z) = 0, \qquad g(x, y, z) = 0. \]

Derive formulas for the derivatives of y and z.

Problem 16.14. Suppose that the equation

\[ f(x, y, z) = 0 \]

1. implicitly defines a function $x(y, z)$ such that $f(x(y, z), y, z) = 0$,

2. implicitly defines a function $y(x, z)$ such that $f(x, y(x, z), z) = 0$, and

3. implicitly defines a function $z(x, y)$ such that $f(x, y, z(x, y)) = 0$.

Show that

\[ \frac{\partial z}{\partial y}\frac{\partial y}{\partial x}\frac{\partial x}{\partial z} = -1. \]

(This is a common identity in thermodynamics, though the hypotheses are rarely stated as explicitly as they are here.)

Chapter 17

Derivatives of Functions from $\mathbb{R}^n$ to $\mathbb{R}^m$

17.1 Partial Derivatives

In this chapter we consider the derivatives of vector-valued functions $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$. It is easy to define the partial derivatives of these functions with a slight generalization of the definition for scalar-valued functions combined with our definition of the derivative of a trajectory.

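Definition 17.1. Let $\Omega \subset \mathbb{R}^n$ be the domain of $\mathbf{f} : \Omega \to \mathbb{R}^m$. For any $\mathbf{x} \in \Omega$ and any $i = 1, 2, \dots, n$ we note that the trajectory $\tilde{\mathbf{f}} : \mathbb{R} \to \mathbb{R}^m$ defined by the composite function

\[ \tilde{\mathbf{f}}(t) = \mathbf{f}(\mathbf{x} + t\mathbf{e}_i) \]

is well defined at $t = 0$. If $\tilde{\mathbf{f}}$ is differentiable at $t = 0$, then we say that the ith partial derivative of $\mathbf{f}$ at $\mathbf{x}$ exists. We write

\[ \frac{\partial \mathbf{f}}{\partial x_i}(\mathbf{x}) = \left.\frac{d}{dt}\tilde{\mathbf{f}}(t)\right|_{t=0} = \lim_{t \to 0} \frac{\mathbf{f}(\mathbf{x} + t\mathbf{e}_i) - \mathbf{f}(\mathbf{x})}{t}. \]

If the partial derivative for each $i = 1, 2, \dots, n$ exists for each $\mathbf{x} \in \Omega$ and each of these derivatives is a continuous function on $\Omega$ we say $\mathbf{f} \in C^1(\Omega)$.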

Remark 17.2. We can see from the definition that we compute the partial derivatives of the vector-valued function $\mathbf{f} = (f_1, f_2, \dots, f_m)$ by simply taking the partial derivatives of each of the scalar-valued component functions individually:

\[ \frac{\partial \mathbf{f}}{\partial x_i} = \left(\frac{\partial f_1}{\partial x_i}, \frac{\partial f_2}{\partial x_i}, \dots, \frac{\partial f_m}{\partial x_i}\right). \]

17.2 The Total Derivative Matrix

The following form of the derivative of a vector-valued function is more directly analogous to the derivative of a scalar function.

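Definition 17.3. Let $\Omega \subset \mathbb{R}^n$ be the domain of the vector-valued function $\mathbf{f} : \Omega \to \mathbb{R}^m$. We say that $\mathbf{f}$ is differentiable at $\mathbf{x}_0 \in \Omega$ if there exists an $m \times n$ matrix A such that the linear approximation to $\mathbf{f}$ at $\mathbf{x}_0$ given by

\[ \mathbf{l}_f(\mathbf{x}_0; \mathbf{x}) = A(\mathbf{x} - \mathbf{x}_0) + \mathbf{f}(\mathbf{x}_0) \]

satisfies

\[ \lim_{\mathbf{x} \to \mathbf{x}_0} \frac{\mathbf{f}(\mathbf{x}) - \mathbf{l}_f(\mathbf{x}_0; \mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|} = \mathbf{0}. \]

In this case we call A the total derivative matrix and write

\[ D\mathbf{f}(\mathbf{x}_0) = A. \]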

We now prove a theorem that says the following.

1. The total derivative matrix is unique.

2. The function $\mathbf{l}_f(\mathbf{x}_0; \mathbf{x})$ is the best linear approximation¹ to $\mathbf{f}$ at $\mathbf{x}_0$.

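Theorem 17.4. Suppose $\mathbf{f}$ is differentiable at $\mathbf{x}_0$ and B is any $m \times n$ matrix other than $D\mathbf{f}(\mathbf{x}_0) = A$. Then if we let

\[ \mathbf{l}(\mathbf{x}) = B(\mathbf{x} - \mathbf{x}_0) + \mathbf{f}(\mathbf{x}_0) \]

we have

\[ \lim_{\mathbf{x} \to \mathbf{x}_0} \frac{\mathbf{f}(\mathbf{x}) - \mathbf{l}(\mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|} \neq \mathbf{0}. \]

Proof. We note that

\[ \frac{\mathbf{f}(\mathbf{x}) - \mathbf{l}(\mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|} = \frac{\mathbf{f}(\mathbf{x}) - \mathbf{l}_f(\mathbf{x}_0; \mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|} + \frac{(A - B)(\mathbf{x} - \mathbf{x}_0)}{\|\mathbf{x} - \mathbf{x}_0\|}. \]

Since the first term goes to zero in the limit, if we can show that

\[ \lim_{\mathbf{x} \to \mathbf{x}_0} \frac{(A - B)(\mathbf{x} - \mathbf{x}_0)}{\|\mathbf{x} - \mathbf{x}_0\|} \neq \mathbf{0} \]

we are done. But since $A - B$ is not the zero matrix there is some unit vector $\mathbf{e}$ such that $(A - B)\mathbf{e} \neq \mathbf{0}$. Thus, if we let $\mathbf{x}(t)$ approach $\mathbf{x}_0$ along the line $\mathbf{x}(t) = t\mathbf{e} + \mathbf{x}_0$ we get

\[ \frac{(A - B)(\mathbf{x}(t) - \mathbf{x}_0)}{\|\mathbf{x}(t) - \mathbf{x}_0\|} = \frac{t(A - B)\mathbf{e}}{|t|\,\|\mathbf{e}\|} = \pm(A - B)\mathbf{e} \neq \mathbf{0}. \]

¹ It is more accurate to call functions of the form $A\mathbf{x} + \mathbf{b}$ “affine” and reserve the term “linear” for functions of the form $A\mathbf{x}$. However, it doesn’t cause too much confusion to put up with this common abuse of terminology, and it relieves us of the duty to remind people that a function describing a straight line isn’t, in general, linear.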

We now prove a theorem that tells us how to compute the total derivative matrix. It is simply the “obvious” matrix of first partial derivatives. (At least, it is obvious in the sense that it is the matrix of first partial derivatives having the correct dimensions for the $m \times n$ matrix A.)

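Theorem 17.5. Let $\Omega \subset \mathbb{R}^n$ be the domain of the vector-valued function $\mathbf{f} : \Omega \to \mathbb{R}^m$. If $\mathbf{f}$ is differentiable at $\mathbf{x}_0 \in \Omega$ then the partial derivatives of $\mathbf{f}$ all exist at $\mathbf{x}_0$ and

\[
D\mathbf{f}(\mathbf{x}_0) =
\begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(\mathbf{x}_0) & \frac{\partial f_1}{\partial x_2}(\mathbf{x}_0) & \cdots & \frac{\partial f_1}{\partial x_n}(\mathbf{x}_0) \\
\frac{\partial f_2}{\partial x_1}(\mathbf{x}_0) & \frac{\partial f_2}{\partial x_2}(\mathbf{x}_0) & \cdots & \frac{\partial f_2}{\partial x_n}(\mathbf{x}_0) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(\mathbf{x}_0) & \frac{\partial f_m}{\partial x_2}(\mathbf{x}_0) & \cdots & \frac{\partial f_m}{\partial x_n}(\mathbf{x}_0)
\end{pmatrix}.
\]

Proof. Recall that multiplying an $m \times n$ matrix A by the standard basis vector $\mathbf{e}_i \in \mathbb{R}^n$ yields the ith column of A, $A\mathbf{e}_i$. Using the definition of differentiability with $A = D\mathbf{f}(\mathbf{x}_0)$ and $\mathbf{x} = \mathbf{x}_0 + t\mathbf{e}_i$ we see that

\[
\begin{aligned}
\mathbf{0} &= \lim_{t \downarrow 0} \frac{\mathbf{f}(\mathbf{x}_0 + t\mathbf{e}_i) - tA\mathbf{e}_i - \mathbf{f}(\mathbf{x}_0)}{\|t\mathbf{e}_i\|} \\
&= \lim_{t \downarrow 0} \frac{1}{t}\bigl(\mathbf{f}(\mathbf{x}_0 + t\mathbf{e}_i) - \mathbf{f}(\mathbf{x}_0)\bigr) - A\mathbf{e}_i \\
&= \frac{\partial \mathbf{f}}{\partial x_i}(\mathbf{x}_0) - A\mathbf{e}_i.
\end{aligned}
\]

Since taking the limit through negative values of t yields the same result, this shows that the ith partial derivative of $\mathbf{f}$ exists and is equal to the ith column of A.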

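Example 17.6. Consider the function $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^3$ defined by

\[ \mathbf{f}(x, y) = \begin{pmatrix} x^2 + y^2 \\ \sin xy \\ e^{3x - 5y} \end{pmatrix}. \]

We compute

\[ D\mathbf{f}(x, y) = \begin{pmatrix} 2x & 2y \\ y\cos xy & x\cos xy \\ 3e^{3x - 5y} & -5e^{3x - 5y} \end{pmatrix}. \]

At the point $\mathbf{x}_0 = (1, 0)$ we compute the linear approximation

\[
\mathbf{l}_f((1, 0); (x, y)) =
\begin{pmatrix} 2 & 0 \\ 0 & 1 \\ 3e^3 & -5e^3 \end{pmatrix}
\begin{pmatrix} x - 1 \\ y - 0 \end{pmatrix}
+ \begin{pmatrix} 1 \\ 0 \\ e^3 \end{pmatrix}
= \begin{pmatrix} 2x - 1 \\ y \\ e^3(3x - 5y - 2) \end{pmatrix}.
\]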
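The total derivative matrix of Example 17.6 can be approximated column by column, in the spirit of Theorem 17.5 (the ith column is the ith partial derivative). The sketch below builds such a finite-difference matrix and compares the linear approximation at $(1, 0)$ with the function at a nearby point. The helper names `jacobian` and `linear_approx` are illustrative, not from the text.

```python
import math

def f(x, y):
    # The map of Example 17.6 from R^2 to R^3
    return (x**2 + y**2, math.sin(x * y), math.exp(3 * x - 5 * y))

def jacobian(func, point, h=1e-6):
    """Central-difference estimate of the m x n total derivative matrix."""
    cols = []
    for i in range(len(point)):
        up = list(point); up[i] += h
        dn = list(point); dn[i] -= h
        fu, fd = func(*up), func(*dn)
        cols.append([(a - b) / (2 * h) for a, b in zip(fu, fd)])
    # cols[i] holds the i-th column; transpose so rows index the components of f
    return [list(row) for row in zip(*cols)]

def linear_approx(x, y):
    # Linear approximation at (1, 0) as computed in Example 17.6
    e3 = math.exp(3)
    return (2 * x - 1, y, e3 * (3 * x - 5 * y - 2))

J = jacobian(f, (1.0, 0.0))
J_exact = [[2, 0], [0, 1], [3 * math.exp(3), -5 * math.exp(3)]]

near = (1.01, 0.01)
errors = [abs(a - b) for a, b in zip(f(*near), linear_approx(*near))]
print(J)
print(errors)
```

The approximation error at the nearby point is small compared with the distance from $(1, 0)$, which is what differentiability in the sense of Definition 17.3 requires.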

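Example 17.7. Consider the function $\mathbf{u} : \mathbb{R}^4 \to \mathbb{R}^2$ defined by

\[ \mathbf{u}(x_1, x_2, x_3, x_4) = \begin{pmatrix} x_1^3 + x_3^2 \\ x_2^4 + x_4^7 \end{pmatrix}. \]

We compute

\[ D\mathbf{u}(x_1, x_2, x_3, x_4) = \begin{pmatrix} 3x_1^2 & 0 & 2x_3 & 0 \\ 0 & 4x_2^3 & 0 & 7x_4^6 \end{pmatrix}. \]

At the point $\mathbf{x}_0 = (1, -1, -1, 1)$ we compute the linear approximation

\[
\begin{aligned}
\mathbf{l}_u((1, -1, -1, 1); (x_1, x_2, x_3, x_4))
&= \begin{pmatrix} 3 & 0 & -2 & 0 \\ 0 & -4 & 0 & 7 \end{pmatrix}
\begin{pmatrix} x_1 - 1 \\ x_2 + 1 \\ x_3 + 1 \\ x_4 - 1 \end{pmatrix}
+ \begin{pmatrix} 2 \\ 2 \end{pmatrix} \\
&= \begin{pmatrix} 3x_1 - 2x_3 - 3 \\ -4x_2 + 7x_4 - 9 \end{pmatrix}.
\end{aligned}
\]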

Example 17.8. For functions $f : \mathbb{R}^2 \to \mathbb{R}$ note that the graph of the linear approximation

\[
\begin{aligned}
z &= l_f((x_0, y_0); (x, y)) \\
&= f(x_0, y_0) + \bigl(f_x(x_0, y_0), f_y(x_0, y_0)\bigr)\begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix} \\
&= f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0)
\end{aligned}
\]

is a plane in $\mathbb{R}^3$. We refer to this as the equation of the tangent plane to the graph of $f(x, y)$ at the point $(x_0, y_0)$.²

For example, the tangent plane to the graph of the paraboloid

\[ f(x, y) = 3x^2 + 6y^2 \]

at the point $(1, -1)$ is given by

\[ z = 9 + 6(x - 1) - 12(y + 1). \]

² The reader should verify that the tangent plane to the graph of f contains all tangent lines to sections of the graph. For instance, the tangent line to

\[ x \mapsto f(x, y_0) \]

at $x = x_0$ is $f(x_0, y_0) + f_x(x_0, y_0)(x - x_0)$.

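The following theorem should be a familiar generalization of one from the calculus of a single variable (and in fact, its proof is virtually identical).

Theorem 17.9. Let $\Omega \subset \mathbb{R}^n$ be the domain of the vector-valued function $\mathbf{f} : \Omega \to \mathbb{R}^m$. If $\mathbf{f}$ is differentiable at $\mathbf{x}_0 \in \Omega$ then $\mathbf{f}$ is continuous at $\mathbf{x}_0$.

Proof. Suppose $\mathbf{f}$ is differentiable at $\mathbf{x}_0 \in \Omega$ and let $\mathbf{l}_f(\mathbf{x}_0; \mathbf{x}) = A(\mathbf{x} - \mathbf{x}_0) + \mathbf{f}(\mathbf{x}_0)$ be its linear approximation there. Then

\[
\begin{aligned}
\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0) &= [\mathbf{f}(\mathbf{x}) - \mathbf{l}_f(\mathbf{x}_0; \mathbf{x})] + [\mathbf{l}_f(\mathbf{x}_0; \mathbf{x}) - \mathbf{f}(\mathbf{x}_0)] \\
&= [\mathbf{f}(\mathbf{x}) - \mathbf{l}_f(\mathbf{x}_0; \mathbf{x})] + [A(\mathbf{x} - \mathbf{x}_0)].
\end{aligned}
\]

The first term goes to zero as $\mathbf{x} \to \mathbf{x}_0$ since $\mathbf{f}$ is differentiable. The second goes to zero by Theorem 9.9 since A is a fixed matrix.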

Theorem 17.5 tells us that if a function is differentiable, then its partial derivatives exist. But it is usually easier to check the existence of partial derivatives than it is to prove differentiability. Fortunately, the following theorem (which we state without proof) says that if the partial derivatives exist and are continuous in a neighborhood of a point $\mathbf{x}_0$ then the function is differentiable there.

Theorem 17.10. Let $\Omega \subset \mathbb{R}^n$ be the domain of the vector-valued function $\mathbf{f} : \Omega \to \mathbb{R}^m$. If the partial derivatives of $\mathbf{f}$ exist and are continuous in some open ball $B_\varepsilon(\mathbf{x}_0) \subset \Omega$ of radius $\varepsilon > 0$ around $\mathbf{x}_0 \in \Omega$ then $\mathbf{f}$ is differentiable at $\mathbf{x}_0$.

Example 17.11. Note the precise relationship between Theorem 17.5 and Theorem 17.10.

• If a function is differentiable at $\mathbf{x}_0$ then the partial derivatives exist at $\mathbf{x}_0$.

• If the partial derivatives of a function exist and are continuous in a neighborhood of $\mathbf{x}_0$ then the function is differentiable at $\mathbf{x}_0$.

So is it possible for the partial derivatives to exist (without being continuous) yet have the function not be differentiable? The answer is “yes.” Since the partial derivatives only tell us what is happening along two lines, it is pretty easy to construct a function that is well behaved along those lines and a mess elsewhere. Consider the following version of the “spiral staircase” function

\[
f(x, y) =
\begin{cases}
\dfrac{xy}{x^2 + y^2}, & (x, y) \neq (0, 0) \\
0, & (x, y) = (0, 0).
\end{cases}
\]

This function is the constant 0 along the coordinate axes $x = 0$ and $y = 0$. Thus, its partial derivatives exist at the origin. However, the function is discontinuous at the origin. (Its limit along the line $x = y$ is $\frac{1}{2}$. Its limit along the line $x = -y$ is $-\frac{1}{2}$.) Thus, by Theorem 17.9, the function can’t be differentiable at the origin.
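The behavior described in Example 17.11 is easy to see numerically. The sketch below evaluates the difference quotients along the coordinate axes at the origin (which are identically zero) and the function values along the lines $x = y$ and $x = -y$ (which stay at $\pm\frac{1}{2}$ however close to the origin one gets). The variable names are illustrative.

```python
def f(x, y):
    # "Spiral staircase" function of Example 17.11
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x**2 + y**2)

ts = [10.0**(-k) for k in range(1, 8)]   # t = 0.1, 0.01, ..., 1e-7

# Difference quotients defining the partial derivatives at the origin
fx_quotients = [(f(t, 0.0) - f(0.0, 0.0)) / t for t in ts]
fy_quotients = [(f(0.0, t) - f(0.0, 0.0)) / t for t in ts]

# Values along the two diagonals as the origin is approached
along_diagonal = [f(t, t) for t in ts]
along_antidiagonal = [f(t, -t) for t in ts]

print(fx_quotients[-1], fy_quotients[-1])
print(along_diagonal[-1], along_antidiagonal[-1])
```

Both partial derivatives at the origin are 0, yet the values along the diagonals do not approach $f(0, 0) = 0$, so the function is not continuous there.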

17.3 The Chain Rule for Mappings

In this section we state and prove a rather general version of the chain rule.

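Theorem 17.12. Let $\Omega \subset \mathbb{R}^m$ and $\Upsilon \subset \mathbb{R}^n$ be domains of the functions $\mathbf{f} : \Omega \to \mathbb{R}^n$ and $\mathbf{g} : \Upsilon \to \mathbb{R}^k$ respectively. Suppose that the range of $\mathbf{f}$ is a subset of $\Upsilon$, and let $\tilde{\mathbf{g}} : \Omega \to \mathbb{R}^k$ be the composite function

\[ \tilde{\mathbf{g}}(\mathbf{x}) = \mathbf{g}(\mathbf{f}(\mathbf{x})). \]

If $\mathbf{f}$ is differentiable at $\mathbf{x}_0 \in \Omega$ and $\mathbf{g}$ is differentiable at $\mathbf{y}_0 = \mathbf{f}(\mathbf{x}_0) \in \Upsilon$ then the composite function $\tilde{\mathbf{g}}$ is differentiable at $\mathbf{x}_0$ and

\[ D\tilde{\mathbf{g}}(\mathbf{x}_0) = D\mathbf{g}(\mathbf{f}(\mathbf{x}_0))\,D\mathbf{f}(\mathbf{x}_0). \]

Remark 17.13. Note that all of the dimensions of the matrices work out as we would wish. Since $\tilde{\mathbf{g}}$ maps $\mathbb{R}^m$ to $\mathbb{R}^k$, $D\tilde{\mathbf{g}}(\mathbf{x}_0)$ is a $k \times m$ matrix. Similarly $D\mathbf{g}(\mathbf{f}(\mathbf{x}_0))$ is $k \times n$ and $D\mathbf{f}(\mathbf{x}_0)$ is $n \times m$. When multiplied in the order specified their product is well defined and they yield a matrix of the correct dimensions for the derivative of the composite function.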

Our proof of this theorem will depend on the following lemma.

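Lemma 17.14. If $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^n$ is differentiable at $\mathbf{x}_0$ then

\[ \frac{\|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)\|}{\|\mathbf{x} - \mathbf{x}_0\|} \]

is bounded in a neighborhood of $\mathbf{x}_0$.

Proof. We first note that

\[ \frac{\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)}{\|\mathbf{x} - \mathbf{x}_0\|} = \frac{\mathbf{f}(\mathbf{x}) - D\mathbf{f}(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) - \mathbf{f}(\mathbf{x}_0)}{\|\mathbf{x} - \mathbf{x}_0\|} + \frac{D\mathbf{f}(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)}{\|\mathbf{x} - \mathbf{x}_0\|}. \]

The first term goes to zero by the definition of differentiability. Thus, we need only show that the norm of the second term is bounded. We note that the second term has the form $A\mathbf{e}$ where A is a fixed matrix and $\mathbf{e}$ is a unit vector which may vary. The set of all unit vectors (a sphere in $\mathbb{R}^m$) is closed and bounded. In the chapter on max/min problems below, we show that any continuous real-valued function (such as $f(\mathbf{e}) = \|A\mathbf{e}\|$) must have a finite maximum value on such a set.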

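Proof of Theorem 17.12. We first note that

\[
\begin{aligned}
\frac{\tilde{\mathbf{g}}(\mathbf{x}) - \tilde{\mathbf{g}}(\mathbf{x}_0) - D\mathbf{g}(\mathbf{f}(\mathbf{x}_0))D\mathbf{f}(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)}{\|\mathbf{x} - \mathbf{x}_0\|}
={}& \frac{\tilde{\mathbf{g}}(\mathbf{x}) - \tilde{\mathbf{g}}(\mathbf{x}_0) - D\mathbf{g}(\mathbf{f}(\mathbf{x}_0))(\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0))}{\|\mathbf{x} - \mathbf{x}_0\|} \\
&+ D\mathbf{g}(\mathbf{f}(\mathbf{x}_0))\,\frac{\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0) - D\mathbf{f}(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)}{\|\mathbf{x} - \mathbf{x}_0\|}.
\end{aligned}
\]

Our goal is to show that this goes to zero as $\mathbf{x} \to \mathbf{x}_0$. The second term goes to zero by Theorem 9.9 and the differentiability of $\mathbf{f}$. The first term is more interesting. We proceed “formally” with a flawed argument and then discuss how the flaw can be fixed. We write the first term in the form

\[ \frac{\mathbf{g}(\mathbf{f}(\mathbf{x})) - \mathbf{g}(\mathbf{f}(\mathbf{x}_0)) - D\mathbf{g}(\mathbf{f}(\mathbf{x}_0))(\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0))}{\|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)\|} \cdot \frac{\|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)\|}{\|\mathbf{x} - \mathbf{x}_0\|}. \]

We note that the second factor in this term,

\[ \frac{\|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)\|}{\|\mathbf{x} - \mathbf{x}_0\|}, \]

is bounded by Lemma 17.14 since $\mathbf{f}$ is differentiable at $\mathbf{x}_0$. We then argue that the first factor

\[ \frac{\mathbf{g}(\mathbf{f}(\mathbf{x})) - \mathbf{g}(\mathbf{f}(\mathbf{x}_0)) - D\mathbf{g}(\mathbf{f}(\mathbf{x}_0))(\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0))}{\|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)\|} \]

goes to zero as $\mathbf{x} \to \mathbf{x}_0$ since $\mathbf{f}(\mathbf{x}) \to \mathbf{f}(\mathbf{x}_0)$ by continuity and since $\mathbf{g}$ is differentiable at $\mathbf{f}(\mathbf{x}_0)$. Since one factor is bounded and the other goes to zero, the product goes to zero by Theorem 9.10, and the proof is complete.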

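All right, what is wrong with that argument? Well, the problem is that the first factor is undefined at any point $\mathbf{x} \in \Omega$ at which $\mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{x}_0)$. Thus, we can’t (in general) take its limit as $\mathbf{x} \to \mathbf{x}_0$ since we might not be allowed to consider every point in a ball around $\mathbf{x}_0$. However, we can fix this. We note that since $\mathbf{g}$ is differentiable at $\mathbf{f}(\mathbf{x}_0) \in \Upsilon$, the function

\[
\mathbf{h}(\mathbf{y}) =
\begin{cases}
\dfrac{\mathbf{g}(\mathbf{y}) - \mathbf{g}(\mathbf{f}(\mathbf{x}_0)) - D\mathbf{g}(\mathbf{f}(\mathbf{x}_0))(\mathbf{y} - \mathbf{f}(\mathbf{x}_0))}{\|\mathbf{y} - \mathbf{f}(\mathbf{x}_0)\|}, & \mathbf{y} \neq \mathbf{f}(\mathbf{x}_0) \\
\mathbf{0}, & \mathbf{y} = \mathbf{f}(\mathbf{x}_0)
\end{cases}
\]

is well defined and continuous for all $\mathbf{y} \in \Upsilon$. Thus, the function

\[ \mathbf{x} \mapsto \mathbf{h}(\mathbf{f}(\mathbf{x})) \]

is well defined on all of $\Omega$ since it is the composition of two continuous functions. We can replace the first factor with this since the two quantities are equal at every point at which the first factor is well defined. (What we have really done here is show that the first factor can be “extended” continuously to all of $\Omega$.)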

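Example 17.15. Suppose we define $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^3$ by

\[ \mathbf{f}(x, y) = \begin{pmatrix} xy \\ 2x \\ 3y \end{pmatrix} \]

and $\mathbf{g} : \mathbb{R}^3 \to \mathbb{R}^2$ by

\[ \mathbf{g}(u, v, w) = \begin{pmatrix} u + 3v \\ 2v - vw \end{pmatrix}. \]

Then we can compute

\[ D\mathbf{f}(x, y) = \begin{pmatrix} y & x \\ 2 & 0 \\ 0 & 3 \end{pmatrix} \]

and

\[ D\mathbf{g}(u, v, w) = \begin{pmatrix} 1 & 3 & 0 \\ 0 & 2 - w & -v \end{pmatrix}. \]

Evaluating $D\mathbf{g}$ at $(u, v, w) = \mathbf{f}(x, y)$ yields

\[ D\mathbf{g}(\mathbf{f}(x, y)) = D\mathbf{g}(xy, 2x, 3y) = \begin{pmatrix} 1 & 3 & 0 \\ 0 & 2 - 3y & -2x \end{pmatrix}. \]

Now consider the composition $\tilde{\mathbf{g}} : \mathbb{R}^2 \to \mathbb{R}^2$ defined by

\[ \tilde{\mathbf{g}}(x, y) = \mathbf{g}(\mathbf{f}(x, y)) = \mathbf{g}(xy, 2x, 3y) = \begin{pmatrix} xy + 6x \\ 4x - 6xy \end{pmatrix}. \]

Computing the derivative of the composition directly yields

\[ D\tilde{\mathbf{g}}(x, y) = \begin{pmatrix} y + 6 & x \\ 4 - 6y & -6x \end{pmatrix}. \]

Of course, as the chain rule assures us, this is the same as the matrix product

\[ D\mathbf{g}(\mathbf{f}(x, y))\,D\mathbf{f}(x, y) = \begin{pmatrix} 1 & 3 & 0 \\ 0 & 2 - 3y & -2x \end{pmatrix} \begin{pmatrix} y & x \\ 2 & 0 \\ 0 & 3 \end{pmatrix}. \]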
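The matrix product at the end of Example 17.15 can be multiplied out mechanically. The sketch below encodes the two derivative matrices from the example, multiplies them with a small hand-written `matmul` helper (an illustrative function, not a library call), and compares the result with the derivative of the composite computed directly.

```python
def f(x, y):
    return (x * y, 2 * x, 3 * y)

def Df(x, y):
    # 3 x 2 total derivative matrix of f
    return [[y, x], [2, 0], [0, 3]]

def Dg(u, v, w):
    # 2 x 3 total derivative matrix of g(u, v, w) = (u + 3v, 2v - vw)
    return [[1, 3, 0], [0, 2 - w, -v]]

def matmul(A, B):
    """Plain row-by-column product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def D_composite(x, y):
    # Derivative of (xy + 6x, 4x - 6xy) computed directly
    return [[y + 6, x], [4 - 6 * y, -6 * x]]

x, y = 2, -1                       # arbitrary integer test point
product = matmul(Dg(*f(x, y)), Df(x, y))
print(product, D_composite(x, y))
```

Note that $D\mathbf{g}$ is evaluated at $\mathbf{f}(x, y)$, not at $(x, y)$; evaluating it at the wrong point is the most common slip when applying Theorem 17.12.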

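Example 17.16. In Example 16.10 we stated without proof that if

\[ \tilde{e}(t) = e(x(t), y(t), z(t), t), \]

then

\[ \frac{d\tilde{e}}{dt} = \frac{\partial e}{\partial x}\frac{dx}{dt} + \frac{\partial e}{\partial y}\frac{dy}{dt} + \frac{\partial e}{\partial z}\frac{dz}{dt} + \frac{\partial e}{\partial t}. \]

To see that this can be obtained from the chain rule for mappings above, we let $\mathbf{g} : \mathbb{R} \to \mathbb{R}^4$ be defined by

\[ \mathbf{g}(t) = (x(t), y(t), z(t), t) \]

so that

\[ \tilde{e}(t) = e(\mathbf{g}(t)). \]

Then the chain rule for mappings gives us

\[
\frac{d}{dt}\tilde{e}(t) = De(\mathbf{g}(t))\,D\mathbf{g}(t) =
\begin{pmatrix} \frac{\partial e}{\partial x} & \frac{\partial e}{\partial y} & \frac{\partial e}{\partial z} & \frac{\partial e}{\partial t} \end{pmatrix}
\begin{pmatrix} x' \\ y' \\ z' \\ 1 \end{pmatrix},
\]

which agrees with the result we got from the dependence diagrams.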

Problems

Problem 17.1. Compute $\frac{\partial \mathbf{f}}{\partial x}$ and $\frac{\partial \mathbf{f}}{\partial y}$ for the following functions.

(a) $\mathbf{f}(x, y) = (x^2 + y^4, \cos(x^2 + y^2), e^{x - y})$.
(b) $\mathbf{f}(x, y) = (\sin(e^{2y}), \ln(x^2 + y^4))$.
(c) $\mathbf{f}(x, y) = (3x - 5y, 7x + 4y, x, y)$.

Problem 17.2. Compute the total derivative matrices of the following functions.

(a) $\mathbf{f}(x, y) = (x^2 + y^4, \cos(\pi(x^2 + y^2)), e^{x - y})$.
(b) $\mathbf{f}(x, y, z) = (\sin(2\pi(x^6 + e^{2y})), \ln(x^2 + y^2 + z^2))$.
(c) $\mathbf{f}(x, y, z, u, v) = (3x - 5y - 7u + 8v, 7x + 4y + u - v, x + 5y - 7u + 6v, 2x - y + 3u - 9v)$.

Problem 17.3. Compute the linear approximation $\mathbf{l}_f(\mathbf{x}_0; \mathbf{x})$ of the functions below at the indicated point $\mathbf{x}_0$.

(a) $\mathbf{f}(x, y) = (x^2 + y^4, \cos(\pi(x^2 + y^2)), e^{x - y})$, $(x_0, y_0) = (2, 2)$.
(b) $\mathbf{f}(x, y, z) = (\sin(2\pi(x^6 + e^{2y})), \ln(x^2 + y + z^2))$, $(x_0, y_0, z_0) = (2, -1, 3)$.
(c) $\mathbf{f}(x, y, z, u, v) = (3x - 5y - 7u + 8v, 7x + 4y + u - v, x + 5y - 7u + 6v, 2x - y + 3u - 9v)$, $(x_0, y_0, z_0, u_0, v_0) = (4, -1, 2, 5, 0)$.


Problem 17.4. Let

\[ \mathbf{g}(u, v) = (u - v, e^{uv}, \cos(u) - \sin(v)) \]

and

\[ \mathbf{f}(x, y) = (e^{x - y}, x^2 - y^2). \]

Compute $D\mathbf{g}(\mathbf{f}(x, y))$ two ways. First, calculate the derivative matrix as a product using the chain rule for mappings. Second, compute the composition and take the derivative of this mapping directly. Show these give the same result.

Problem 17.5. Let

\[ \mathbf{g}(u, v, w) = (u - 2v + w, 3u + 2v + 5w, 4u - 7v - w, 6u - 5v - 9w) \]

and

\[ \mathbf{f}(x, y) = (2x - 4y, 3x + 5y, -x + y). \]

Compute $D\mathbf{g}(\mathbf{f}(x, y))$ two ways. First, calculate the derivative matrix as a product using the chain rule for mappings. Second, compute the composition and take the derivative of this mapping directly. Show these give the same result.

Problem 17.6. Use techniques similar to Example 17.16 to show that the results of Example 16.11 obtained by chain of dependence diagrams can also be obtained by the chain rule for mappings.

Problem 17.7. State a chain rule result for compositions of the form

\[ \mathbf{h}(\mathbf{g}(\mathbf{f}(\mathbf{x}))). \]

Use this to show that the results of Example 16.12 can be obtained by the chain rule for mappings.

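Problem 17.8. Suppose a function $f(x, y, z)$ is composed with the spherical coordinate transformation

\[
\begin{aligned}
x &= \rho \cos\theta \sin\phi, \\
y &= \rho \sin\theta \sin\phi, \\
z &= \rho \cos\phi.
\end{aligned}
\]

Compute $\frac{\partial f}{\partial \rho}$, $\frac{\partial f}{\partial \theta}$, $\frac{\partial f}{\partial \phi}$, $\frac{\partial^2 f}{\partial \theta^2}$, $\frac{\partial^2 f}{\partial \rho \partial \theta}$, and $\frac{\partial^2 f}{\partial \rho \partial \phi}$. You may use either the chain rule for mappings or chain of dependence diagrams, whichever you find easier.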

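Problem 17.9. Suppose $\Omega_f$ and $\Omega_g$ are subsets of $\mathbb{R}^n$ and $\mathbf{f} : \Omega_f \to \Omega_g$ and $\mathbf{g} : \Omega_g \to \Omega_f$ are inverses. That is,

\[ \mathbf{f}(\mathbf{g}(\mathbf{x})) = \mathbf{x} \]

for all $\mathbf{x} \in \Omega_g$. Show that the total derivative matrices of the two functions are inverses of each other. That is,

\[ \bigl(D\mathbf{f}(\mathbf{g}(\mathbf{x}))\bigr)^{-1} = D\mathbf{g}(\mathbf{x}). \]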

Chapter 18

Gradient, Divergence, and Curl

In this chapter we discuss three first-order differential operators. That is, three operations that take first-order partial derivatives of certain types of functions and produce new functions. The total derivative is an example of such an operator. It takes a function from $\mathbb{R}^n$ to $\mathbb{R}^m$ and produces a function from $\mathbb{R}^n$ to the $m \times n$ matrices. The three operators we study here are defined on more specialized classes of functions.

18.1 The Gradient

The gradient is a vector-valued operator defined on scalar fields.

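Definition 18.1. Let $\Omega \subset \mathbb{R}^n$ be the domain of a real-valued function $f : \Omega \to \mathbb{R}$. If f is differentiable we define the gradient of f to be the vector field $\nabla f : \Omega \to \mathbb{R}^n$ defined by

\[
\nabla f(\mathbf{x}) =
\begin{pmatrix}
\frac{\partial f}{\partial x_1}(\mathbf{x}) \\
\frac{\partial f}{\partial x_2}(\mathbf{x}) \\
\vdots \\
\frac{\partial f}{\partial x_n}(\mathbf{x})
\end{pmatrix}
= \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(\mathbf{x})\,\mathbf{e}_i.
\]

The notation $\operatorname{grad} f = \nabla f$ is also common.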

Remark 18.2. Since the gradient is a vector it can be written as either a row or a column unless it is used in conjunction with matrix multiplication. In that case it is assumed to be a column or an $n \times 1$ matrix. Note the relationship between the gradient and the total derivative, the $1 \times n$ (row) matrix

\[ Df(\mathbf{x}) = \left(\frac{\partial f}{\partial x_1}(\mathbf{x}), \frac{\partial f}{\partial x_2}(\mathbf{x}), \dots, \frac{\partial f}{\partial x_n}(\mathbf{x})\right). \]

We can think of the gradient as the transpose of the total derivative,

\[ \nabla f = Df^T, \]

and we can replace matrix multiplication by the total derivative with the dot product with the gradient:

\[ Df(\mathbf{x})\mathbf{v} = \nabla f(\mathbf{x}) \cdot \mathbf{v}. \]

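Example 18.3. For $\mathbf{x} = (x, y, z) \neq \mathbf{0}$ we define

\[ f(\mathbf{x}) = \frac{1}{\|\mathbf{x}\|} = \frac{1}{\sqrt{x^2 + y^2 + z^2}} = (x^2 + y^2 + z^2)^{-\frac{1}{2}}. \]

We compute

\[
\nabla f(x, y, z) =
\begin{pmatrix}
-x(x^2 + y^2 + z^2)^{-\frac{3}{2}} \\
-y(x^2 + y^2 + z^2)^{-\frac{3}{2}} \\
-z(x^2 + y^2 + z^2)^{-\frac{3}{2}}
\end{pmatrix}
= \frac{1}{\|\mathbf{x}\|^3}
\begin{pmatrix} -x \\ -y \\ -z \end{pmatrix}.
\]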

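The gradient can be used to define a generalization of the partial derivative called the directional derivative.

Definition 18.4. Let $\Omega \subset \mathbb{R}^n$ be the domain of a real-valued function $f : \Omega \to \mathbb{R}$, and let $\mathbf{v} \in \mathbb{R}^n$ be a unit vector. If f is differentiable we define the directional derivative of f at $\mathbf{x} \in \Omega$ in the direction $\mathbf{v}$ to be

\[ D_{\mathbf{v}} f(\mathbf{x}) = \left.\frac{d}{dt} f(\mathbf{x} + t\mathbf{v})\right|_{t=0} = \lim_{t \to 0} \frac{f(\mathbf{x} + t\mathbf{v}) - f(\mathbf{x})}{t}. \]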

The following theorem gives us an easy way to calculate directional derivatives.

Theorem 18.5. Let $\Omega \subset \mathbb{R}^n$ be the domain of a real-valued function $f : \Omega \to \mathbb{R}$, and let $\mathbf{v} \in \mathbb{R}^n$ be a unit vector. If f is differentiable then

\[ D_{\mathbf{v}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{v}. \]

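Proof. For $\mathbf{x} \in \Omega$ and any unit vector $\mathbf{v} \in \mathbb{R}^n$ define $\mathbf{g} : \mathbb{R} \to \mathbb{R}^n$ by

\[ \mathbf{g}(t) = \mathbf{x} + t\mathbf{v}. \]

Note that $D\mathbf{g} = \mathbf{v}$, $\mathbf{g}(0) = \mathbf{x}$, and that $f(\mathbf{x} + t\mathbf{v}) = f(\mathbf{g}(t))$. Thus, using the chain rule for mappings and the relationship between the total derivative and the gradient, we can compute

\[
\begin{aligned}
D_{\mathbf{v}} f(\mathbf{x}) &= \left.\frac{d}{dt} f(\mathbf{g}(t))\right|_{t=0} \\
&= \left. Df(\mathbf{g}(t))\,D\mathbf{g}(t)\right|_{t=0} \\
&= Df(\mathbf{x})\mathbf{v} \\
&= \nabla f(\mathbf{x}) \cdot \mathbf{v}.
\end{aligned}
\]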

Example 18.6. Note that when $\mathbf{v}$ is one of the standard basis vectors $\mathbf{e}_i$ we get

\[ D_{\mathbf{e}_i} f(\mathbf{x}) = \frac{\partial f}{\partial x_i}(\mathbf{x}). \]

Thus, partial derivatives are special cases of the directional derivative.
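Theorem 18.5 can be checked numerically for the function $f(\mathbf{x}) = 1/\|\mathbf{x}\|$ of Example 18.3. The sketch below compares a finite-difference directional derivative, taken straight from Definition 18.4, with the dot product $\nabla f(\mathbf{x}) \cdot \mathbf{v}$. The point `p`, the unit vector `v`, and the helper `directional` are illustrative choices.

```python
import math

def f(x, y, z):
    # f(x) = 1 / ||x|| from Example 18.3
    return 1.0 / math.sqrt(x**2 + y**2 + z**2)

def grad_f(x, y, z):
    # Gradient from Example 18.3: -x / ||x||^3
    r3 = (x**2 + y**2 + z**2) ** 1.5
    return (-x / r3, -y / r3, -z / r3)

def directional(func, p, v, h=1e-6):
    """Central-difference version of the limit in Definition 18.4."""
    fwd = [pi + h * vi for pi, vi in zip(p, v)]
    bwd = [pi - h * vi for pi, vi in zip(p, v)]
    return (func(*fwd) - func(*bwd)) / (2 * h)

p = (1.0, 2.0, -1.0)
v = (1 / 3, 2 / 3, 2 / 3)            # a unit vector: (1 + 4 + 4) / 9 = 1

numeric = directional(f, p, v)
via_gradient = sum(g * vi for g, vi in zip(grad_f(*p), v))
print(numeric, via_gradient)
```

Taking `v` to be a standard basis vector reduces both computations to an ordinary partial derivative, as in Example 18.6.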

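The following theorem gives us some geometric information about the gradient.

Theorem 18.7. Suppose $f : \Omega \to \mathbb{R}$ is a differentiable function and $\nabla f(\mathbf{x}) \neq \mathbf{0}$. Then the directional derivative is maximized when $\mathbf{v}$ points in the direction of $\nabla f(\mathbf{x})$ and is minimized when $\mathbf{v}$ points in the direction of $-\nabla f(\mathbf{x})$. That is, $\nabla f(\mathbf{x})$ points in the direction of steepest increase of f while $-\nabla f(\mathbf{x})$ points in the direction of steepest decrease.

Proof. Using the fact that $\mathbf{v}$ is a unit vector, we get

\[ D_{\mathbf{v}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{v} = \cos\theta\,\|\nabla f(\mathbf{x})\|\,\|\mathbf{v}\| = \cos\theta\,\|\nabla f(\mathbf{x})\| \]

where θ is the angle between $\nabla f(\mathbf{x})$ and $\mathbf{v}$. Thus $D_{\mathbf{v}} f(\mathbf{x})$ depends on $\mathbf{v}$ only through the angle θ. Thus, $D_{\mathbf{v}} f(\mathbf{x})$ is maximized when the cosine is maximized ($\theta = 0$, $\mathbf{v}$ in the direction of $\nabla f(\mathbf{x})$) and minimized when the cosine is minimized ($\theta = \pi$, $\mathbf{v}$ in the direction of $-\nabla f(\mathbf{x})$).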
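As a rough numerical companion to Theorem 18.7, the sketch below scans unit vectors in the plane and records which direction gives the largest and smallest directional derivative. It uses the paraboloid $f(x, y) = 3x^2 + 6y^2$ at the point $(1, -1)$, where the gradient is $(6, -12)$. The angular grid of 3600 samples is an arbitrary choice made for illustration.

```python
import math

grad = (6.0, -12.0)                  # gradient of 3x^2 + 6y^2 at (1, -1)
norm = math.hypot(*grad)

def directional(theta):
    """Directional derivative in the unit direction (cos theta, sin theta)."""
    return grad[0] * math.cos(theta) + grad[1] * math.sin(theta)

step = 2 * math.pi / 3600
angles = [step * k for k in range(3600)]

best = max(angles, key=directional)      # direction of steepest increase
worst = min(angles, key=directional)     # direction of steepest decrease
grad_angle = math.atan2(grad[1], grad[0]) % (2 * math.pi)

print(best, grad_angle, directional(best), norm)
```

Up to the grid resolution, the maximizing angle is the angle of the gradient, the maximum value is $\|\nabla f\|$, and the minimizing direction is the opposite one.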

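The next theorem describes the relationship between the gradient of a function and the level sets of that function.

Theorem 18.8. Suppose $f : \Omega \to \mathbb{R}$ is differentiable. Then $\nabla f(\mathbf{x}_0)$ is normal to the level surface of f at $\mathbf{x}_0 \in \Omega$. That is, suppose $f(\mathbf{x}_0) = c$, and $\mathbf{g}(t)$ is a curve that lies entirely in the level set $f(\mathbf{x}) = c$. If $\mathbf{g}(t_0) = \mathbf{x}_0$ then $\nabla f(\mathbf{x}_0)$ is orthogonal to the tangent vector $\mathbf{g}'(t_0)$.

Proof. Suppose $f(\mathbf{g}(t)) = c$ and $\mathbf{g}(t_0) = \mathbf{x}_0$. Since the composition is constant, its derivative is zero. Thus, using the chain rule we get

\[
\begin{aligned}
0 &= \left.\frac{d}{dt} f(\mathbf{g}(t))\right|_{t=t_0} \\
&= \left. Df(\mathbf{g}(t))\,D\mathbf{g}(t)\right|_{t=t_0} \\
&= Df(\mathbf{x}_0)\,\mathbf{g}'(t_0) \\
&= \nabla f(\mathbf{x}_0) \cdot \mathbf{g}'(t_0).
\end{aligned}
\]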

Example 18.9. To find the equation for the tangent plane to the sphere

\[ x^2 + y^2 + z^2 = 14 \]

at the point $\mathbf{x}_0 = (x_0, y_0, z_0) = (1, 2, 3)$ we calculate the gradient of $f(x, y, z) = x^2 + y^2 + z^2$,

\[ \nabla f = (2x, 2y, 2z). \]

We evaluate this at the point $(1, 2, 3)$ to get the normal vector $\mathbf{n} = (2, 4, 6)$, and use this to derive the equation for the tangent plane

\[
0 = \mathbf{n} \cdot (\mathbf{x} - \mathbf{x}_0) =
\begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix} \cdot
\begin{pmatrix} x - 1 \\ y - 2 \\ z - 3 \end{pmatrix}
= 2x + 4y + 6z - 28,
\]

or $2x + 4y + 6z = 28$.
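The orthogonality asserted in Theorem 18.8 can be checked on the sphere of Example 18.9. The sketch below picks one particular curve on the sphere through $(1, 2, 3)$, the horizontal circle at height $z = 3$ (an illustrative choice, not from the text), and confirms that its tangent vector at that point is orthogonal to the normal $\mathbf{n} = (2, 4, 6)$ and that the point satisfies the plane equation.

```python
import math

n = (2.0, 4.0, 6.0)                 # gradient of x^2 + y^2 + z^2 at (1, 2, 3)
x0 = (1.0, 2.0, 3.0)

def plane(x, y, z):
    """Left side minus right side of the tangent plane 2x + 4y + 6z = 28."""
    return 2 * x + 4 * y + 6 * z - 28

# A curve lying in the level set: the circle of radius sqrt(5) at height z = 3.
r = math.sqrt(5.0)
t0 = math.atan2(2.0, 1.0)           # parameter value at which the curve hits x0

def curve(t):
    return (r * math.cos(t), r * math.sin(t), 3.0)

tangent = (-r * math.sin(t0), r * math.cos(t0), 0.0)   # derivative of curve at t0
dot = sum(a * b for a, b in zip(n, tangent))

print(curve(t0), tangent, dot)
```

Any other smooth curve on the sphere through the same point would give a tangent vector with the same property; this one was chosen only because its derivative is easy to write down.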

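We can use the gradient to give a version of the Mean Value Theorem for scalar functions on $\mathbb{R}^n$.

Theorem 18.10. Let $\Omega \subset \mathbb{R}^n$ contain the entire line segment connecting $\mathbf{x}_1 \in \Omega$ to $\mathbf{x}_2 \in \Omega$, and suppose $f : \Omega \to \mathbb{R}$ is $C^1$. Then there is a point $\bar{\mathbf{x}} \in \Omega$ on the line segment between $\mathbf{x}_1$ and $\mathbf{x}_2$ such that

\[ f(\mathbf{x}_2) - f(\mathbf{x}_1) = \nabla f(\bar{\mathbf{x}}) \cdot (\mathbf{x}_2 - \mathbf{x}_1). \]

Proof. We define a real-valued function of a single variable by

\[ g(t) = f(t\mathbf{x}_2 + (1 - t)\mathbf{x}_1), \quad t \in [0, 1]. \]

We note that this function is $C^1$ and therefore the Mean Value Theorem for real-valued functions of a single variable says there exists $\bar{t} \in (0, 1)$ such that

\[ g(1) - g(0) = g'(\bar{t})(1 - 0). \]

Note that $g(1) = f(\mathbf{x}_2)$ and $g(0) = f(\mathbf{x}_1)$. The chain rule gives us

\[ g'(t) = \nabla f(t\mathbf{x}_2 + (1 - t)\mathbf{x}_1) \cdot (\mathbf{x}_2 - \mathbf{x}_1). \]

So if we let

\[ \bar{\mathbf{x}} = \bar{t}\mathbf{x}_2 + (1 - \bar{t})\mathbf{x}_1 \]

this gives us the desired result.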

18.2 The Divergence

In the next two sections we describe two first-order differential operators on vector fields. We will not be able to give a rigorous geometric interpretation of them until we have derived the theorems of Part IV. However, we can see how they work in various situations, and we will describe their geometric interpretations and do some calculations that make the interpretations plausible.

We begin our discussion with an operator called the divergence.

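Definition 18.11. Let $\Omega \subset \mathbb{R}^n$ be the domain of a $C^1$ vector field $\mathbf{f} : \Omega \to \mathbb{R}^n$. We define the divergence of $\mathbf{f}$ at $\mathbf{x} \in \Omega$ to be the scalar

\[ \operatorname{div} \mathbf{f}(\mathbf{x}) = \frac{\partial f_1}{\partial x_1}(\mathbf{x}) + \frac{\partial f_2}{\partial x_2}(\mathbf{x}) + \cdots + \frac{\partial f_n}{\partial x_n}(\mathbf{x}) = \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\bigl(\mathbf{f}(\mathbf{x}) \cdot \mathbf{e}_i\bigr). \]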

The divergence measures the tendency of a vector field to “diverge,” “expand,” or “flow away from” a point. The divergence will be positive if the point acts as a “source” of the vector field, negative if it acts as a “sink.” While we will not attempt to justify this rigorously until Part IV, the following examples should at least make the statement plausible.

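Example 18.12. Consider the field

$$\mathbf{g}(x, y) = \left( -\frac{x}{\sqrt{x^2 + y^2}},\; -\frac{y}{\sqrt{x^2 + y^2}} \right)$$

which flows toward the origin as depicted in Figure 12.2. We compute

$$\begin{aligned}
\operatorname{div} \mathbf{g} &= \frac{\partial}{\partial x}\left( -x(x^2 + y^2)^{-1/2} \right) + \frac{\partial}{\partial y}\left( -y(x^2 + y^2)^{-1/2} \right) \\
&= -(x^2 + y^2)^{-1/2} + x^2(x^2 + y^2)^{-3/2} - (x^2 + y^2)^{-1/2} + y^2(x^2 + y^2)^{-3/2} \\
&= -(x^2 + y^2)^{-1/2}.
\end{aligned}$$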

This is negative and becomes unbounded at the origin (as does the vector field).
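As a sanity check on Example 18.12, the divergence can be approximated numerically and compared with the closed form $-(x^2 + y^2)^{-1/2}$. This is only a sketch: the function `divergence_2d`, the step size `h`, and the sample point are assumptions introduced here, and central differences are one of several possible approximations.

```python
import math

def g(x, y):
    """The inward-pointing unit field of Example 18.12."""
    r = math.hypot(x, y)
    return (-x / r, -y / r)

def divergence_2d(field, x, y, h=1e-5):
    """Approximate d(f1)/dx + d(f2)/dy at (x, y) by central differences."""
    d1 = (field(x + h, y)[0] - field(x - h, y)[0]) / (2 * h)
    d2 = (field(x, y + h)[1] - field(x, y - h)[1]) / (2 * h)
    return d1 + d2

x, y = 3.0, 4.0                          # sample point away from the origin
approx = divergence_2d(g, x, y)
exact = -1.0 / math.hypot(x, y)          # -(x^2 + y^2)^(-1/2)
print(approx, exact)                     # both close to -0.2
```

Moving the sample point toward the origin makes both values grow in magnitude, in line with the remark that the divergence becomes unbounded there.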

Example 18.13. The two-dimensional vector field depicted in Figure 12.1

$$\mathbf{f}(x, y) = (-y, x)$$

represents a counterclockwise flow about the origin. We compute

$$\operatorname{div} \mathbf{f} = \frac{\partial}{\partial x}(-y) + \frac{\partial}{\partial y}(x) = 0.$$

The divergence is zero, indicating no tendency of the vector field to expand or contract.


Example 18.14. As we have seen before in Figure 12.3, the three-dimensional vector field

$$\mathbf{v}(x, y, z) = (y, -x, -z)$$

swirls about the z-axis while flowing toward the xy-plane. We compute

$$\operatorname{div} \mathbf{v} = \frac{\partial}{\partial x}(y) + \frac{\partial}{\partial y}(-x) + \frac{\partial}{\partial z}(-z) = -1.$$

Note that the third component (which causes the field to flow toward the xy-plane) contributes a nonzero value to the divergence. The other components (which contribute to the swirling nature of the flow) have zero contribution.

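Remark 18.15. There is another, very nice notation for the divergence using the “del operator.” We use the symbol

$$\nabla = \begin{pmatrix} \frac{\partial}{\partial x_1} \\ \frac{\partial}{\partial x_2} \\ \vdots \\ \frac{\partial}{\partial x_n} \end{pmatrix}$$

to denote a vector of partial derivative operators. We have already used this symbol for the gradient and we can see how the notation mimics scalar multiplication.$^1$

$$\nabla g = \begin{pmatrix} \frac{\partial}{\partial x_1} \\ \frac{\partial}{\partial x_2} \\ \vdots \\ \frac{\partial}{\partial x_n} \end{pmatrix} g = \begin{pmatrix} \frac{\partial g}{\partial x_1} \\ \frac{\partial g}{\partial x_2} \\ \vdots \\ \frac{\partial g}{\partial x_n} \end{pmatrix}.$$

For the divergence, we can use the dot product.

$$\operatorname{div} \mathbf{f}(\mathbf{x}) = \nabla \cdot \mathbf{f} = \begin{pmatrix} \frac{\partial}{\partial x_1} \\ \frac{\partial}{\partial x_2} \\ \vdots \\ \frac{\partial}{\partial x_n} \end{pmatrix} \cdot \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{pmatrix} = \frac{\partial f_1}{\partial x_1}(\mathbf{x}) + \frac{\partial f_2}{\partial x_2}(\mathbf{x}) + \cdots + \frac{\partial f_n}{\partial x_n}(\mathbf{x}).$$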

18.3 The Curl

We now define the curl operator, which is defined only for vector fields on R3.

$^1$ Of course, when we differentiate we are not really multiplying an operator and a function, but the notation makes a useful mnemonic device.


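Definition 18.16. Let $\Omega \subset \mathbb{R}^3$ be the domain of a $C^1$ vector field $\mathbf{v} : \Omega \to \mathbb{R}^3$. At each $\mathbf{x} \in \Omega$ we define the vector-valued operator

$$\operatorname{curl} \mathbf{v} = \sum_{i=1}^{3} \sum_{j=1}^{3} \sum_{k=1}^{3} \varepsilon_{ijk} \frac{\partial v_k}{\partial x_j}\, \mathbf{e}_i.$$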

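Remark 18.17. Comparing our definition of the curl with the definition of the cross product, we see we can use the notation

$$\operatorname{curl} \mathbf{v} = \nabla \times \mathbf{v}.$$

Using the determinant notation and the usual (x, y, z) coordinates and i, j, and k basis vectors we get

$$\begin{aligned}
\operatorname{curl} \mathbf{v} = \nabla \times \mathbf{v} &= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ v_1 & v_2 & v_3 \end{vmatrix} \\
&= \left( \frac{\partial v_3}{\partial y} - \frac{\partial v_2}{\partial z} \right)\mathbf{i} - \left( \frac{\partial v_3}{\partial x} - \frac{\partial v_1}{\partial z} \right)\mathbf{j} + \left( \frac{\partial v_2}{\partial x} - \frac{\partial v_1}{\partial y} \right)\mathbf{k}.
\end{aligned}$$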

Remark 18.18. As we shall see in Part IV, the curl measures the tendency of a vector field to swirl or rotate. (Another, less widely used, notation for the curl is rot v.) The direction of the curl marks the axis of rotation and the magnitude marks the rate of rotation if we think of the vector field as a velocity.

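Example 18.19. Let us compute the curl of the three-dimensional vector field depicted in Figure 12.3:

$$\mathbf{v}(x, y, z) = (y, -x, -z).$$

We get

$$\begin{aligned}
\operatorname{curl} \mathbf{v} &= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ y & -x & -z \end{vmatrix} \\
&= \left( \frac{\partial}{\partial y}(-z) - \frac{\partial}{\partial z}(-x) \right)\mathbf{i} - \left( \frac{\partial}{\partial x}(-z) - \frac{\partial}{\partial z}(y) \right)\mathbf{j} + \left( \frac{\partial}{\partial x}(-x) - \frac{\partial}{\partial y}(y) \right)\mathbf{k} \\
&= -2\mathbf{k}.
\end{aligned}$$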


Unlike Example 18.14, where we computed the divergence of this vector field, the third component of the original vector field v (which causes the field to flow toward the xy-plane) contributes nothing to the curl. The other two components (which contribute to the swirling nature of the flow) are the ones that have a nonzero contribution. Note also that the curl points along the z-axis: the axis about which the field rotates. Furthermore, the direction of rotation is related to the axis of rotation by the right-hand rule.
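The divergence and curl of the field $\mathbf{v}(x, y, z) = (y, -x, -z)$ can also be estimated numerically, straight from the component formulas. The sketch below is an illustration only; the helpers `partial`, `curl`, and `div`, the step size, and the sample point are assumptions made here, not anything defined in the text.

```python
def v(x, y, z):
    """The field of Examples 18.14 and 18.19."""
    return (y, -x, -z)

def partial(field, comp, var, p, h=1e-5):
    """Central-difference estimate of d(field[comp]) / d(x[var]) at point p."""
    hi, lo = list(p), list(p)
    hi[var] += h
    lo[var] -= h
    return (field(*hi)[comp] - field(*lo)[comp]) / (2 * h)

def curl(field, p):
    """Components of curl from the determinant expansion in Remark 18.17."""
    return (
        partial(field, 2, 1, p) - partial(field, 1, 2, p),
        partial(field, 0, 2, p) - partial(field, 2, 0, p),
        partial(field, 1, 0, p) - partial(field, 0, 1, p),
    )

def div(field, p):
    """Sum of the diagonal partial derivatives."""
    return sum(partial(field, i, i, p) for i in range(3))

p = (0.3, -1.2, 2.0)          # arbitrary sample point
print(curl(v, p))             # approximately (0, 0, -2)
print(div(v, p))              # approximately -1
```

Because the field is linear, the values do not depend on the sample point, which matches the constant results $-2\mathbf{k}$ and $-1$ found above.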

Problems

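Problem 18.1. Compute ∇f for each of the following functions.
(a) $f(x, y) = x^2 y^3 - 5yx$.
(b) $f(x, y, z) = \dfrac{x}{y^2 + z^2} + \dfrac{y}{z^2 + x^2} + \dfrac{z}{x^2 + y^2}$.
(c) $f(x, y) = \dfrac{1}{(x^2 + y^2)^{3/2}}$.
(d) $f(x, y, z, u, v) = 5x - 3y + 4z^2 - 4u^3 + 6v^{-1}$.
(e) $f(x, y, z) = x + \ln(y^2 + z^2)$.
(f) $f(u, v, w) = e^u + \cos(vw)$.
(g) $f(\mathbf{x}) = \|\mathbf{x}\|^p$, $\mathbf{x} \in \mathbb{R}^n$, $p \in \mathbb{R}$.
(h) $f(\mathbf{x}) = \ln(\|\mathbf{x}\|)$, $\mathbf{x} \in \mathbb{R}^n$.

Problem 18.2. Compute the directional derivative $D_{\mathbf{v}} f(\mathbf{x})$ for the given f and the direction v parallel to the given vector w.
(a) $f(x, y) = x^3 y^2 - y^3 x^2$, $\mathbf{w} = (3, 4)$.
(b) $f(x, y) = \ln\left(\sqrt{x^2 + y^2}\right)$, $\mathbf{w} = (2, 1)$.
(c) $f(x, y, z) = e^{xy} + \sin(yz)$, $\mathbf{w} = (-4, 3, 0)$.
(d) $f(x, y, z) = e^{-z}(x^2 + y^2 + z^2)$, $\mathbf{w} = (1, -1, 1)$.

Problem 18.3. Find the equation of the tangent plane to the given surface at the given point.
(a) $x^3 + y^2 - z^2 = 20$, at $\mathbf{x}_0 = (2, 4, -2)$.
(b) $x^2 y^2 z^2 = 16$, at $\mathbf{x}_0 = (1, -1, 4)$.
(c) $x^3 + \ln(y^2 + z^2) = 8$, at $\mathbf{x}_0 = (2, -1, 0)$.
(d) $\cos(\pi e^{xyz}) = -1$, at $\mathbf{x}_0 = (0, 2, 4)$.

Problem 18.4. Compute the divergence of the following vector fields.
(a) $\mathbf{f}(x, y, z) = y^2\mathbf{i} + x^2\mathbf{j} + z^2\mathbf{k}$.
(b) $\mathbf{f}(x, y, z) = y\mathbf{i} + x\mathbf{j} + x\mathbf{k}$.
(c) $\mathbf{f}(x, y, z) = ax\mathbf{i} + by\mathbf{j} + cz\mathbf{k}$.
(d) $\mathbf{f}(x, y, z) = \dfrac{x\mathbf{i} + y\mathbf{j}}{x^2 + y^2}$.
(e) $\mathbf{f}(x, y, z) = \dfrac{y\mathbf{i} - x\mathbf{j}}{x^2 + y^2}$.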


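Problem 18.5. Compute the curl of the following vector fields.
(a) $\mathbf{f}(x, y, z) = y^2\mathbf{i} + x^2\mathbf{j} + z^2\mathbf{k}$.
(b) $\mathbf{f}(x, y, z) = y\mathbf{i} + x\mathbf{j} + x\mathbf{k}$.
(c) $\mathbf{f}(x, y, z) = ax\mathbf{i} + by\mathbf{j} + cz\mathbf{k}$.
(d) $\mathbf{f}(x, y, z) = \dfrac{x\mathbf{i} + y\mathbf{j}}{x^2 + y^2}$.
(e) $\mathbf{f}(x, y, z) = \dfrac{y\mathbf{i} - x\mathbf{j}}{x^2 + y^2}$.

Problem 18.6. The velocity field $\mathbf{v}(\mathbf{x})$ of a rigid body rotating with angular velocity ω about the line through the origin parallel to the unit vector $\mathbf{n} = (n_1, n_2, n_3)$ is given by

$$\mathbf{v}(\mathbf{x}) = \omega\,\mathbf{n} \times \mathbf{x}.$$

Here $\mathbf{x} = (x, y, z)$. Calculate $\nabla \cdot \mathbf{v}$ and $\nabla \times \mathbf{v}$.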

Problem 18.7. In this problem you are asked to show that the gradient, divergence, and curl are all invariant under a rigid rotation of coordinates. That is, suppose R is a 3 × 3 rotation matrix

$$RR^T = R^T R = I, \qquad \det R = 1,$$

or in components

$$\sum_{k=1}^{3} r_{ik} r_{jk} = \delta_{ij}, \qquad \sum_{i,j,k=1}^{3} \varepsilon_{ijk}\, r_{1i} r_{2j} r_{3k} = 1.$$

Let $\mathbf{x} = (x_1, x_2, x_3)$ be Cartesian coordinates for $\mathbb{R}^3$ and let $\mathbf{y} = (y_1, y_2, y_3)$ be a rotated coordinate system defined by

$$\mathbf{y} = R\mathbf{x}, \qquad \mathbf{x} = R^T\mathbf{y};$$

or in components

$$y_i = \sum_{j=1}^{3} r_{ij} x_j, \qquad x_j = \sum_{i=1}^{3} r_{ij} y_i.$$

For any scalar field f defined as a function of the x coordinates, we define a transformed field

$$g(\mathbf{y}) = f(R^T\mathbf{y})$$

or

$$g(y_1, y_2, y_3) = f\left( \sum_{i=1}^{3} r_{i1} y_i,\; \sum_{i=1}^{3} r_{i2} y_i,\; \sum_{i=1}^{3} r_{i3} y_i \right).$$

For a vector field f defined as a function of the x coordinates we define the transformed field

$$\mathbf{g}(\mathbf{y}) = R\,\mathbf{f}(R^T\mathbf{y}),$$


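or for $k = 1, 2, 3$,

$$g_k(y_1, y_2, y_3) = \sum_{j=1}^{3} r_{kj}\, f_j\left( \sum_{i=1}^{3} r_{i1} y_i,\; \sum_{i=1}^{3} r_{i2} y_i,\; \sum_{i=1}^{3} r_{i3} y_i \right).$$

(a) Show that

$$\nabla_{\mathbf{y}}\, g = R\,\nabla_{\mathbf{x}} f,$$

or for $k = 1, 2, 3$,

$$\frac{\partial g}{\partial y_k} = \sum_{j=1}^{3} r_{kj} \frac{\partial f}{\partial x_j}.$$

(b) Show that

$$\nabla_{\mathbf{y}} \cdot \mathbf{g} = \nabla_{\mathbf{x}} \cdot \mathbf{f},$$

or

$$\sum_{i=1}^{3} \frac{\partial g_i}{\partial y_i} = \sum_{i=1}^{3} \frac{\partial f_i}{\partial x_i}.$$

(c) Show that

$$\nabla_{\mathbf{y}} \times \mathbf{g} = R\,\nabla_{\mathbf{x}} \times \mathbf{f},$$

or for $i = 1, 2, 3$,

$$\sum_{j,k=1}^{3} \varepsilon_{ijk} \frac{\partial g_j}{\partial y_k} = \sum_{l=1}^{3} r_{il} \sum_{j,k=1}^{3} \varepsilon_{ljk} \frac{\partial f_j}{\partial x_k}.$$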


Chapter 19

Differential Operators in Curvilinear Coordinates

The physical interpretations of the gradient, divergence, curl, and Laplacian operators are intimately tied to their definitions in terms of Cartesian coordinates. For functions defined in terms of curvilinear coordinate systems we need to derive formulas that will allow us to perform the calculations of these operators without converting back and forth between the curvilinear system and a Cartesian system. In this chapter, we do this for the three most common curvilinear systems: polar coordinates in $\mathbb{R}^2$ and cylindrical and spherical coordinates in $\mathbb{R}^3$.

19.1 Differential Operators in Polar Coordinates

In this section we derive the following formulas for polar coordinates.


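Theorem 19.1. Let $f(r, \theta) \in \mathbb{R}$ be a real-valued function and $\mathbf{v}(r, \theta) = v_r(r, \theta)\mathbf{e}_r(\theta) + v_\theta(r, \theta)\mathbf{e}_\theta(\theta) \in \mathbb{R}^2$ be a vector-valued function of polar coordinates for $\mathbb{R}^2$ as defined in Section 13.1. Then the gradient of $f(r, \theta)$ is given by

$$\nabla f = \frac{\partial f}{\partial r}\mathbf{e}_r + \frac{1}{r}\frac{\partial f}{\partial \theta}\mathbf{e}_\theta.$$

The divergence of $\mathbf{v}(r, \theta)$ is given by

$$\nabla \cdot \mathbf{v} = \frac{\partial v_r}{\partial r} + \frac{v_r}{r} + \frac{1}{r}\frac{\partial v_\theta}{\partial \theta}.$$

The Laplacian of $f(r, \theta)$ is given by

$$\Delta f = \frac{\partial^2 f}{\partial r^2} + \frac{1}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2}.$$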

We begin by deriving the gradient of a scalar function f(r, θ). Since the fundamental definition of the gradient is given in Cartesian coordinates, we must convert to a function f(x, y) to compute the gradient. The coordinate transformation between polar and Cartesian coordinates is given by

$$x = r\cos\theta, \qquad y = r\sin\theta.$$

While this transformation is invertible with an appropriately restricted domain, the formulas for the inverse transformation r = r(x, y), θ = θ(x, y) are different in different quadrants. (As we will see, this is typical when deriving formulas for differential operators in curvilinear coordinate systems. While coordinate transformations are one-to-one maps, there is often only one direction in which there is an “easy” formula for the transformation.) To deal with this problem, we proceed formally. Starting with our function f(r, θ) we define

$$f(x, y) = f(r(x, y), \theta(x, y)).$$

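Taking the gradient of our function of Cartesian coordinates and using the chain rule, we get

$$\begin{aligned}
\nabla f &= \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} \\
&= \left( \frac{\partial f}{\partial r}\frac{\partial r}{\partial x} + \frac{\partial f}{\partial \theta}\frac{\partial \theta}{\partial x} \right)\mathbf{i} + \left( \frac{\partial f}{\partial r}\frac{\partial r}{\partial y} + \frac{\partial f}{\partial \theta}\frac{\partial \theta}{\partial y} \right)\mathbf{j}.
\end{aligned}$$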

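We will use the identities

$$\mathbf{e}_r(r, \theta) = \cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j}, \qquad \mathbf{e}_\theta(r, \theta) = -\sin\theta\,\mathbf{i} + \cos\theta\,\mathbf{j},$$

to get

$$\mathbf{i} = \cos\theta\,\mathbf{e}_r(r, \theta) - \sin\theta\,\mathbf{e}_\theta(r, \theta), \qquad \mathbf{j} = \sin\theta\,\mathbf{e}_r(r, \theta) + \cos\theta\,\mathbf{e}_\theta(r, \theta).$$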

We also need to derive formulas for $\frac{\partial r}{\partial x}$, etc. To compute formulas for these we consider the equations

$$x = r(x, y)\cos\theta(x, y), \qquad y = r(x, y)\sin\theta(x, y).$$

Differentiating each of these with respect to x gives us

$$1 = r_x\cos\theta - r\sin\theta\,\theta_x, \qquad 0 = r_x\sin\theta + r\cos\theta\,\theta_x.$$

Solving these gives us formulas for $r_x$ and $\theta_x$. A similar calculation yields formulas for $r_y$ and $\theta_y$.

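Our calculation here could have been cut a bit shorter if we had used the fact that inverse mappings have total derivative matrices that are inverses of each other. (See Problem 17.9.) The total derivative matrix of the transformation from polar to Cartesian coordinates is

$$\begin{pmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\[4pt] \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{pmatrix} = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}.$$

Using Problem 17.9 we get

$$\begin{pmatrix} \frac{\partial r}{\partial x} & \frac{\partial r}{\partial y} \\[4pt] \frac{\partial \theta}{\partial x} & \frac{\partial \theta}{\partial y} \end{pmatrix} = \begin{pmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\[4pt] \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{pmatrix}^{-1} = \begin{pmatrix} \cos\theta & \sin\theta \\[4pt] -\frac{\sin\theta}{r} & \frac{\cos\theta}{r} \end{pmatrix}.$$

We summarize these observations in the following lemma.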


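Lemma 19.2. Let r(x, y) and θ(x, y) be the standard transformation from Cartesian to polar coordinates for $\mathbb{R}^2$. Then

$$\frac{\partial r}{\partial x} = \cos\theta, \qquad \frac{\partial r}{\partial y} = \sin\theta, \qquad \frac{\partial \theta}{\partial x} = -\frac{\sin\theta}{r}, \qquad \frac{\partial \theta}{\partial y} = \frac{\cos\theta}{r}.$$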
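The four partial derivatives in Lemma 19.2 can be spot-checked numerically. The sketch below is an illustration under stated assumptions: it uses `math.atan2` as one convenient single-formula choice for θ(x, y) (the text only notes that the inverse formulas differ by quadrant), and the helper names, step size, and sample point are introduced here.

```python
import math

def r_of(x, y):
    return math.hypot(x, y)

def theta_of(x, y):
    # atan2 handles all four quadrants; an assumption, not the text's formula
    return math.atan2(y, x)

def d_dx(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in x."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def d_dy(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in y."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

x, y = -1.5, 2.0                       # a point in the second quadrant
r, th = r_of(x, y), theta_of(x, y)

numeric = (d_dx(r_of, x, y), d_dy(r_of, x, y),
           d_dx(theta_of, x, y), d_dy(theta_of, x, y))
lemma = (math.cos(th), math.sin(th), -math.sin(th) / r, math.cos(th) / r)

for n_val, l_val in zip(numeric, lemma):
    print(round(n_val, 6), round(l_val, 6))   # the two columns should agree
```

The sample point is kept away from the negative x-axis, where `atan2` jumps by 2π and a finite difference across the jump would be meaningless.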

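Using the calculations above we get

$$\begin{aligned}
\nabla f &= \left( \frac{\partial f}{\partial r}\frac{\partial r}{\partial x} + \frac{\partial f}{\partial \theta}\frac{\partial \theta}{\partial x} \right)\mathbf{i} + \left( \frac{\partial f}{\partial r}\frac{\partial r}{\partial y} + \frac{\partial f}{\partial \theta}\frac{\partial \theta}{\partial y} \right)\mathbf{j} \\
&= \left( \frac{\partial f}{\partial r}\cos\theta - \frac{\partial f}{\partial \theta}\frac{\sin\theta}{r} \right)(\cos\theta\,\mathbf{e}_r - \sin\theta\,\mathbf{e}_\theta) \\
&\quad + \left( \frac{\partial f}{\partial r}\sin\theta + \frac{\partial f}{\partial \theta}\frac{\cos\theta}{r} \right)(\sin\theta\,\mathbf{e}_r + \cos\theta\,\mathbf{e}_\theta) \\
&= \frac{\partial f}{\partial r}\mathbf{e}_r + \frac{1}{r}\frac{\partial f}{\partial \theta}\mathbf{e}_\theta.
\end{aligned}$$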

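We now compute the divergence of a vector field of the form

$$\mathbf{v}(r, \theta) = v_r(r, \theta)\mathbf{e}_r(\theta) + v_\theta(r, \theta)\mathbf{e}_\theta(\theta).$$

We write this as

$$\begin{aligned}
\mathbf{v}(r, \theta) &= v_r(r, \theta)(\cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j}) + v_\theta(r, \theta)(-\sin\theta\,\mathbf{i} + \cos\theta\,\mathbf{j}) \\
&= (v_r\cos\theta - v_\theta\sin\theta)\,\mathbf{i} + (v_r\sin\theta + v_\theta\cos\theta)\,\mathbf{j} \\
&= v_1\mathbf{i} + v_2\mathbf{j}.
\end{aligned}$$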


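We compute

$$\begin{aligned}
\frac{\partial v_1}{\partial x} &= \frac{\partial}{\partial x}(v_r\cos\theta - v_\theta\sin\theta) \\
&= \left( \frac{\partial v_r}{\partial r} r_x + \frac{\partial v_r}{\partial \theta}\theta_x \right)\cos\theta - v_r\sin\theta\,\theta_x - \left( \frac{\partial v_\theta}{\partial r} r_x + \frac{\partial v_\theta}{\partial \theta}\theta_x \right)\sin\theta - v_\theta\cos\theta\,\theta_x \\
&= \left( \frac{\partial v_r}{\partial r}\cos\theta - \frac{\partial v_r}{\partial \theta}\frac{\sin\theta}{r} \right)\cos\theta + v_r\frac{\sin^2\theta}{r} - \left( \frac{\partial v_\theta}{\partial r}\cos\theta - \frac{\partial v_\theta}{\partial \theta}\frac{\sin\theta}{r} \right)\sin\theta + v_\theta\frac{\cos\theta\sin\theta}{r},
\end{aligned}$$

$$\begin{aligned}
\frac{\partial v_2}{\partial y} &= \frac{\partial}{\partial y}(v_r\sin\theta + v_\theta\cos\theta) \\
&= \left( \frac{\partial v_r}{\partial r} r_y + \frac{\partial v_r}{\partial \theta}\theta_y \right)\sin\theta + v_r\cos\theta\,\theta_y + \left( \frac{\partial v_\theta}{\partial r} r_y + \frac{\partial v_\theta}{\partial \theta}\theta_y \right)\cos\theta - v_\theta\sin\theta\,\theta_y \\
&= \left( \frac{\partial v_r}{\partial r}\sin\theta + \frac{\partial v_r}{\partial \theta}\frac{\cos\theta}{r} \right)\sin\theta + v_r\frac{\cos^2\theta}{r} + \left( \frac{\partial v_\theta}{\partial r}\sin\theta + \frac{\partial v_\theta}{\partial \theta}\frac{\cos\theta}{r} \right)\cos\theta - v_\theta\frac{\cos\theta\sin\theta}{r}.
\end{aligned}$$

Combining these and simplifying gives us

$$\nabla \cdot \mathbf{v} = \frac{\partial v_1}{\partial x} + \frac{\partial v_2}{\partial y} = \frac{\partial v_r}{\partial r} + \frac{v_r}{r} + \frac{1}{r}\frac{\partial v_\theta}{\partial \theta}.$$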

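We use this to compute the Laplacian of the scalar field. Since the Laplacian is the divergence of the gradient field we have

$$\begin{aligned}
\Delta f(r, \theta) &= \nabla \cdot \nabla f \\
&= \nabla \cdot \left( \frac{\partial f}{\partial r}\mathbf{e}_r + \frac{1}{r}\frac{\partial f}{\partial \theta}\mathbf{e}_\theta \right) \\
&= \frac{\partial}{\partial r}\frac{\partial f}{\partial r} + \frac{1}{r}\frac{\partial f}{\partial r} + \frac{1}{r}\frac{\partial}{\partial \theta}\left( \frac{1}{r}\frac{\partial f}{\partial \theta} \right) \\
&= \frac{\partial^2 f}{\partial r^2} + \frac{1}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2}.
\end{aligned}$$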
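One way to build confidence in the polar Laplacian formula is to evaluate it on a sample function and compare with the Cartesian Laplacian of the same function. The sketch below is an assumption-laden illustration: the test function $f = r^3\sin\theta = (x^2 + y^2)\,y$, the finite-difference helpers, and the step size are all choices made here. For this function the Cartesian Laplacian works out by hand to $8y = 8r\sin\theta$.

```python
import math

def f_polar(r, th):
    return r**3 * math.sin(th)

def f_cart(x, y):
    return (x * x + y * y) * y          # the same function, since y = r sin(theta)

def second(func, a, b, which, h=1e-3):
    """Central second difference in the first (which=0) or second argument."""
    if which == 0:
        return (func(a + h, b) - 2 * func(a, b) + func(a - h, b)) / h**2
    return (func(a, b + h) - 2 * func(a, b) + func(a, b - h)) / h**2

def first_r(func, r, th, h=1e-3):
    """Central first difference in r."""
    return (func(r + h, th) - func(r - h, th)) / (2 * h)

def laplacian_polar(func, r, th):
    """f_rr + f_r / r + f_theta_theta / r^2."""
    return (second(func, r, th, 0) + first_r(func, r, th) / r
            + second(func, r, th, 1) / r**2)

def laplacian_cart(func, x, y):
    """f_xx + f_yy."""
    return second(func, x, y, 0) + second(func, x, y, 1)

r, th = 2.0, 0.7
x, y = r * math.cos(th), r * math.sin(th)
lap_p = laplacian_polar(f_polar, r, th)
lap_c = laplacian_cart(f_cart, x, y)
exact = 8 * r * math.sin(th)
print(lap_p, lap_c, exact)              # three nearly equal numbers
```

Agreement at one point is of course not a proof, but a sign error or a missing factor of $1/r$ in the formula would show up immediately.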

19.2 Differential Operators in Cylindrical Coordinates

In cylindrical coordinates in R3 we establish the following identities.


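Theorem 19.3. Let $f(r, \theta, z) \in \mathbb{R}$ be a real-valued function and $\mathbf{v}(r, \theta, z) = v_r(r, \theta, z)\mathbf{e}_r(\theta) + v_\theta(r, \theta, z)\mathbf{e}_\theta(\theta) + v_z(r, \theta, z)\mathbf{e}_z \in \mathbb{R}^3$ be a vector-valued function of cylindrical coordinates for $\mathbb{R}^3$ as defined in Section 13.2. Then the gradient of $f(r, \theta, z)$ is given by

$$\nabla f = \frac{\partial f}{\partial r}\mathbf{e}_r + \frac{1}{r}\frac{\partial f}{\partial \theta}\mathbf{e}_\theta + \frac{\partial f}{\partial z}\mathbf{e}_z.$$

The divergence of $\mathbf{v}(r, \theta, z)$ is given by

$$\nabla \cdot \mathbf{v} = \frac{\partial v_r}{\partial r} + \frac{v_r}{r} + \frac{1}{r}\frac{\partial v_\theta}{\partial \theta} + \frac{\partial v_z}{\partial z}.$$

The curl of $\mathbf{v}(r, \theta, z)$ is given by

$$\nabla \times \mathbf{v} = \left( \frac{1}{r}\frac{\partial v_z}{\partial \theta} - \frac{\partial v_\theta}{\partial z} \right)\mathbf{e}_r + \left( \frac{\partial v_r}{\partial z} - \frac{\partial v_z}{\partial r} \right)\mathbf{e}_\theta + \left( \frac{\partial v_\theta}{\partial r} + \frac{1}{r}v_\theta - \frac{1}{r}\frac{\partial v_r}{\partial \theta} \right)\mathbf{e}_z.$$

The Laplacian of $f(r, \theta, z)$ is given by

$$\Delta f = \frac{\partial^2 f}{\partial r^2} + \frac{1}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2} + \frac{\partial^2 f}{\partial z^2}.$$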

The details of the proof are much the same as for polar coordinates in $\mathbb{R}^2$. We won’t go through them here, but we will develop a few tools to make the task easier for the reader.

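To review, the basic transformation is given by

$$x = r\cos\theta, \qquad y = r\sin\theta, \qquad z = z.$$

The total derivative matrix of this transformation is

$$\begin{pmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} & \frac{\partial x}{\partial z} \\[4pt] \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} & \frac{\partial y}{\partial z} \\[4pt] \frac{\partial z}{\partial r} & \frac{\partial z}{\partial \theta} & \frac{\partial z}{\partial z} \end{pmatrix} = \begin{pmatrix} \cos\theta & -r\sin\theta & 0 \\ \sin\theta & r\cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Inverting this matrix and using the results of Problem 17.9 as above gives us the following lemma.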


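Lemma 19.4. Let r(x, y, z), θ(x, y, z) and z(x, y, z) be the standard transformation from Cartesian to cylindrical coordinates for $\mathbb{R}^3$. Then

$$\begin{pmatrix} \frac{\partial r}{\partial x} & \frac{\partial r}{\partial y} & \frac{\partial r}{\partial z} \\[4pt] \frac{\partial \theta}{\partial x} & \frac{\partial \theta}{\partial y} & \frac{\partial \theta}{\partial z} \\[4pt] \frac{\partial z}{\partial x} & \frac{\partial z}{\partial y} & \frac{\partial z}{\partial z} \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\[4pt] -\frac{\sin\theta}{r} & \frac{\cos\theta}{r} & 0 \\[4pt] 0 & 0 & 1 \end{pmatrix}.$$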

The rest of the proof is left to the reader.

19.3 Differential Operators in Spherical Coordinates

For spherical coordinates in R3 we establish the following identities.

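Theorem 19.5. Let $f(\rho, \theta, \varphi) \in \mathbb{R}$ be a real-valued function and $\mathbf{v}(\rho, \theta, \varphi) = v_\rho(\rho, \theta, \varphi)\mathbf{e}_\rho(\theta, \varphi) + v_\theta(\rho, \theta, \varphi)\mathbf{e}_\theta(\theta, \varphi) + v_\varphi(\rho, \theta, \varphi)\mathbf{e}_\varphi(\theta, \varphi) \in \mathbb{R}^3$ be a vector-valued function of spherical coordinates for $\mathbb{R}^3$ as defined in Section 13.3. Then the gradient of $f(\rho, \theta, \varphi)$ is given by

$$\nabla f = \frac{\partial f}{\partial \rho}\mathbf{e}_\rho + \frac{1}{\rho\sin\varphi}\frac{\partial f}{\partial \theta}\mathbf{e}_\theta + \frac{1}{\rho}\frac{\partial f}{\partial \varphi}\mathbf{e}_\varphi.$$

The divergence of $\mathbf{v}(\rho, \theta, \varphi)$ is given by

$$\nabla \cdot \mathbf{v} = \frac{\partial v_\rho}{\partial \rho} + \frac{2v_\rho}{\rho} + \frac{1}{\rho\sin\varphi}\frac{\partial v_\theta}{\partial \theta} + \frac{1}{\rho}\frac{\partial v_\varphi}{\partial \varphi} + \frac{\cot\varphi}{\rho}v_\varphi.$$

The curl of $\mathbf{v}(\rho, \theta, \varphi)$ is given by

$$\begin{aligned}
\nabla \times \mathbf{v} &= \left( \frac{\cot\varphi}{\rho}v_\theta + \frac{1}{\rho}\frac{\partial v_\theta}{\partial \varphi} - \frac{1}{\rho\sin\varphi}\frac{\partial v_\varphi}{\partial \theta} \right)\mathbf{e}_\rho \\
&\quad + \left( \frac{1}{\rho}v_\varphi + \frac{\partial v_\varphi}{\partial \rho} - \frac{1}{\rho}\frac{\partial v_\rho}{\partial \varphi} \right)\mathbf{e}_\theta + \left( \frac{1}{\rho\sin\varphi}\frac{\partial v_\rho}{\partial \theta} - \frac{1}{\rho}v_\theta - \frac{\partial v_\theta}{\partial \rho} \right)\mathbf{e}_\varphi.
\end{aligned}$$

The Laplacian of $f(\rho, \theta, \varphi)$ is given by

$$\Delta f = \frac{\partial^2 f}{\partial \rho^2} + \frac{2}{\rho}\frac{\partial f}{\partial \rho} + \frac{1}{\rho^2\sin^2\varphi}\frac{\partial^2 f}{\partial \theta^2} + \frac{1}{\rho^2}\frac{\partial^2 f}{\partial \varphi^2} + \frac{\cot\varphi}{\rho^2}\frac{\partial f}{\partial \varphi}.$$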


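Again, we will omit the details of the proof for the sake of brevity. (Though admittedly, the details are a good deal more tedious for this system.) But once again we will develop a few important tools to make the task easier for the reader.

To review, the basic transformation is given by

$$x = \rho\cos\theta\sin\varphi, \qquad y = \rho\sin\theta\sin\varphi, \qquad z = \rho\cos\varphi.$$

The total derivative matrix of this transformation is

$$\begin{pmatrix} \frac{\partial x}{\partial \rho} & \frac{\partial x}{\partial \theta} & \frac{\partial x}{\partial \varphi} \\[4pt] \frac{\partial y}{\partial \rho} & \frac{\partial y}{\partial \theta} & \frac{\partial y}{\partial \varphi} \\[4pt] \frac{\partial z}{\partial \rho} & \frac{\partial z}{\partial \theta} & \frac{\partial z}{\partial \varphi} \end{pmatrix} = \begin{pmatrix} \cos\theta\sin\varphi & -\rho\sin\theta\sin\varphi & \rho\cos\theta\cos\varphi \\ \sin\theta\sin\varphi & \rho\cos\theta\sin\varphi & \rho\sin\theta\cos\varphi \\ \cos\varphi & 0 & -\rho\sin\varphi \end{pmatrix}.$$

Inverting this matrix and using the results of Problem 17.9 as above gives us the following lemma.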

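Lemma 19.6. Let ρ(x, y, z), θ(x, y, z) and φ(x, y, z) be the standard transformation from Cartesian to spherical coordinates for $\mathbb{R}^3$. Then

$$\begin{pmatrix} \frac{\partial \rho}{\partial x} & \frac{\partial \rho}{\partial y} & \frac{\partial \rho}{\partial z} \\[4pt] \frac{\partial \theta}{\partial x} & \frac{\partial \theta}{\partial y} & \frac{\partial \theta}{\partial z} \\[4pt] \frac{\partial \varphi}{\partial x} & \frac{\partial \varphi}{\partial y} & \frac{\partial \varphi}{\partial z} \end{pmatrix} = \begin{pmatrix} \cos\theta\sin\varphi & \sin\theta\sin\varphi & \cos\varphi \\[4pt] -\frac{\sin\theta}{\rho\sin\varphi} & \frac{\cos\theta}{\rho\sin\varphi} & 0 \\[4pt] \frac{\cos\theta\cos\varphi}{\rho} & \frac{\sin\theta\cos\varphi}{\rho} & -\frac{\sin\varphi}{\rho} \end{pmatrix}.$$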

The rest of the proof of the identities is left to the reader.
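Since the matrix in Lemma 19.6 is claimed to be the inverse of the total derivative matrix of the spherical transformation, multiplying the two should give the identity. The following sketch checks this at one sample point. It reads the lemma's matrix with every denominator equal to ρ and with the last row holding the partial derivatives of φ; the function names, the sample point, and the hand-rolled `matmul` are assumptions made for illustration.

```python
import math

def jacobian(rho, th, ph):
    """Total derivative matrix of (rho, theta, phi) -> (x, y, z)."""
    st, ct = math.sin(th), math.cos(th)
    sp, cp = math.sin(ph), math.cos(ph)
    return [[ct * sp, -rho * st * sp, rho * ct * cp],
            [st * sp,  rho * ct * sp, rho * st * cp],
            [cp,       0.0,          -rho * sp]]

def inverse_from_lemma(rho, th, ph):
    """Partials of (rho, theta, phi) with respect to (x, y, z)."""
    st, ct = math.sin(th), math.cos(th)
    sp, cp = math.sin(ph), math.cos(ph)
    return [[ct * sp,           st * sp,          cp],
            [-st / (rho * sp),  ct / (rho * sp),  0.0],
            [ct * cp / rho,     st * cp / rho,   -sp / rho]]

def matmul(a, b):
    """Plain 3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

rho, th, ph = 2.5, 1.1, 0.6            # sample point with sin(phi) != 0
product = matmul(inverse_from_lemma(rho, th, ph), jacobian(rho, th, ph))
for row in product:
    print([round(entry, 12) for entry in row])   # rows of the identity matrix
```

The check fails by construction on the z-axis, where $\sin\varphi = 0$ and the transformation is not invertible.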

Problems

Problem 19.1. Calculate the gradient and Laplacian of the following scalar functions of polar coordinates for $\mathbb{R}^2$ at all points in their domains.

(a) $f(r, \theta) = r^\alpha$, for a constant $\alpha \in \mathbb{R}$.

(b) $f(r, \theta) = r^3(\sin^3\theta + \cos^3\theta)$.

(c) $f(r, \theta) = r^2\cos\theta$.


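Problem 19.2. Calculate the divergence of the following vector functions of polar coordinates for $\mathbb{R}^2$ at all points in their domains.

(a) $\mathbf{v}(r, \theta) = r^\alpha\mathbf{e}_\theta(\theta)$, for a constant $\alpha \in \mathbb{R}$.

(b) $\mathbf{v}(r, \theta) = r^\alpha\mathbf{e}_r(\theta)$, for a constant $\alpha \in \mathbb{R}$.

(c) $\mathbf{v}(r, \theta) = \sqrt{r}\,\mathbf{e}_r(\theta) + 2\mathbf{e}_\theta$.

Problem 19.3. Calculate the gradient and Laplacian of the following scalar functions of cylindrical coordinates for $\mathbb{R}^3$ at all points in their domains.

(a) $f(r, \theta, z) = (r^2 + z^2)^{\alpha/2}$, for a constant $\alpha \in \mathbb{R}$.

(b) $f(r, \theta, z) = r^2(\cos^2\theta + \sin^2\theta) + z^2$.

(c) $f(r, \theta, z) = r^2(\cos^2\theta - \sin^2\theta) - z^2$.

Problem 19.4. Calculate the divergence and the curl of the following vector functions of cylindrical coordinates for $\mathbb{R}^3$ at all points in their domains.

(a) $\mathbf{v}(r, \theta, z) = r^\alpha\mathbf{e}_r(\theta) + \sqrt{z}\,\mathbf{e}_z$, for a constant $\alpha \in \mathbb{R}$.

(b) $\mathbf{v}(r, \theta, z) = \frac{1}{r}\mathbf{e}_\theta(\theta)$.

(c) $\mathbf{v}(r, \theta, z) = r\cos\theta\,\mathbf{e}_r(\theta) + r\sin\theta\,\mathbf{e}_\theta(\theta) + \mathbf{e}_z$.

Problem 19.5. Calculate the gradient and Laplacian of the following scalar functions of spherical coordinates for $\mathbb{R}^3$ at all points in their domains.

(a) $f(\rho, \theta, \varphi) = \rho^\alpha$, for a constant $\alpha \in \mathbb{R}$.

(b) $f(\rho, \theta, \varphi) = \rho(\cos\theta\sin\varphi + \sin\theta\sin\varphi + \cos\varphi)$.

(c) $f(\rho, \theta, \varphi) = \rho^2(\cos\theta\sin\varphi - \sin\theta\sin\varphi - \cos\varphi)$.

Problem 19.6. Calculate the divergence and the curl of the following vector functions of spherical coordinates for $\mathbb{R}^3$ at all points in their domains.

(a) $\mathbf{v}(\rho, \theta, \varphi) = \rho^\alpha\mathbf{e}_\rho(\theta, \varphi)$, for a constant $\alpha \in \mathbb{R}$.

(b) $\mathbf{v}(\rho, \theta, \varphi) = \sqrt{\rho}\,\mathbf{e}_\theta(\theta, \varphi)$.

(c) $\mathbf{v}(\rho, \theta, \varphi) = \frac{1}{\rho^\alpha}\mathbf{e}_\varphi(\theta, \varphi)$.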


Chapter 20

Differentiation Rules

Having defined several basic differential operators, we now proceed to derive a collection of “differentiation rules” that describe how to take the derivatives of combinations of functions.

20.1 Linearity

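Like the derivative operator of elementary calculus and the partial derivative operators introduced in Section 16.1, the gradient, divergence, and curl are each linear.

Theorem 20.1. Let $\Omega \subset \mathbb{R}^n$ and $\Upsilon \subset \mathbb{R}^3$ be the domains of the $C^1$ functions $f, g : \Omega \to \mathbb{R}$, $\mathbf{f}, \mathbf{g} : \Omega \to \mathbb{R}^n$ and $\mathbf{u}, \mathbf{v} : \Upsilon \to \mathbb{R}^3$. Let $a, b \in \mathbb{R}$. Then we have

1. $\nabla(af + bg) = a\nabla f + b\nabla g$,

2. $\nabla \cdot (a\mathbf{f} + b\mathbf{g}) = a\nabla \cdot \mathbf{f} + b\nabla \cdot \mathbf{g}$,

3. $\nabla \times (a\mathbf{u} + b\mathbf{v}) = a\nabla \times \mathbf{u} + b\nabla \times \mathbf{v}$.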

The proof of these rules follows from the linearity of the partial derivative operators and is left as an exercise. (See Problem 20.1.)

20.2 Product Rules

Product rules are some of the most useful tools in analysis. There are lots of possible product combinations of scalar, vector, and matrix valued functions that one might want to differentiate. The following are some of the most common.


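Theorem 20.2. Let $\Omega \subset \mathbb{R}^n$ and $\Upsilon \subset \mathbb{R}^3$ be the domains of the $C^1$ functions $f, g : \Omega \to \mathbb{R}$, $\mathbf{f}, \mathbf{g} : \Omega \to \mathbb{R}^n$ and $\mathbf{u}, \mathbf{v} : \Upsilon \to \mathbb{R}^3$. Then we have

1. $\nabla(fg) = f\nabla g + g\nabla f$,

2. $\nabla \cdot (g\mathbf{f}) = g\nabla \cdot \mathbf{f} + \nabla g \cdot \mathbf{f}$,

3. $\nabla \times (g\mathbf{v}) = g\nabla \times \mathbf{v} + \nabla g \times \mathbf{v}$,

4. $\nabla \cdot (\mathbf{u} \times \mathbf{v}) = \mathbf{v} \cdot \nabla \times \mathbf{u} - \mathbf{u} \cdot \nabla \times \mathbf{v}$.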

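Proof. Probably the easiest way to prove these is to express the identities in terms of components and use the product rule for partial derivatives. We will prove the third identity and leave the rest as exercises. (See Problem 20.2.)

$$\begin{aligned}
\nabla \times (g\mathbf{v}) &= \sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{k=1}^{3} \varepsilon_{ijk}\frac{\partial (g\,v_k)}{\partial x_j}\mathbf{e}_i \\
&= \sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{k=1}^{3} \varepsilon_{ijk}\left( g\frac{\partial v_k}{\partial x_j} + \frac{\partial g}{\partial x_j}v_k \right)\mathbf{e}_i \\
&= g\sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{k=1}^{3} \varepsilon_{ijk}\frac{\partial v_k}{\partial x_j}\mathbf{e}_i + \sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{k=1}^{3} \varepsilon_{ijk}\frac{\partial g}{\partial x_j}v_k\mathbf{e}_i \\
&= g\nabla \times \mathbf{v} + \nabla g \times \mathbf{v}.
\end{aligned}$$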

Quotient rules tend to be less important than product rules, but they make sense for scalar fields.

Theorem 20.3. Let $\Omega \subset \mathbb{R}^n$ be the domain of the $C^1$ functions $f, g : \Omega \to \mathbb{R}$. Then we have

$$\nabla\left( \frac{f}{g} \right) = \frac{g\nabla f - f\nabla g}{g^2}$$

at points $\mathbf{x} \in \Omega$ where $g(\mathbf{x}) \neq 0$.

Since the gradient is simply the vector of first partial derivatives of the scalar field, this follows immediately from the quotient rule for partial derivatives.

20.3 Second Derivative Rules

Of course we can combine our first-order differential operators in a variety of ways to form second-order differential operators. By far the most important is the following.


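Definition 20.4. Let $\Omega \subset \mathbb{R}^n$ be the domain of a $C^2$ scalar field $f : \Omega \to \mathbb{R}$. We define the Laplacian of f to be

$$\Delta f = \nabla \cdot \nabla f = \operatorname{div}\operatorname{grad} f = \frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} + \cdots + \frac{\partial^2 f}{\partial x_n^2}.$$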

Remark 20.5. The Laplacian appears in basic equations of electromagnetism, fluid flow, thermodynamics, and elasticity. The Divergence Theorem of Part IV will give us an idea of why it is so ubiquitous.

Remark 20.6. An alternate notation$^1$ for the Laplacian is $\nabla^2 f = \Delta f$.

The following product rules involving the Laplacian are quite useful in applications.

Theorem 20.7. Let $\Omega \subset \mathbb{R}^n$ be the domain of the $C^2$ functions $f, g : \Omega \to \mathbb{R}$. Then we have

1. ∇ · (f∇g) = f∆g +∇f · ∇g,

2. ∆(fg) = f∆g + g∆f + 2(∇f · ∇g).

The proof of these is left to the reader. (See Problem 20.3.)

In $\mathbb{R}^3$ certain combinations of the divergence, curl, and gradient yield important identities.

Theorem 20.8. Let $\Upsilon \subset \mathbb{R}^3$ be the domain of the $C^2$ functions $f : \Upsilon \to \mathbb{R}$ and $\mathbf{v} : \Upsilon \to \mathbb{R}^3$. Then we have

1. ∇ · (∇× v) = div curl v = 0,

2. ∇× (∇f) = curl grad f = 0.

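Proof. The proofs of these depend on the equality of mixed partial derivatives proved in Theorem 16.7. We will prove the first of these and leave the second to the reader. (See Problem 20.4.)

$$\begin{aligned}
\nabla \cdot (\nabla \times \mathbf{v}) &= \frac{\partial}{\partial x}\left( \frac{\partial v_3}{\partial y} - \frac{\partial v_2}{\partial z} \right) + \frac{\partial}{\partial y}\left( \frac{\partial v_1}{\partial z} - \frac{\partial v_3}{\partial x} \right) + \frac{\partial}{\partial z}\left( \frac{\partial v_2}{\partial x} - \frac{\partial v_1}{\partial y} \right) \\
&= \left( \frac{\partial^2 v_1}{\partial y\,\partial z} - \frac{\partial^2 v_1}{\partial z\,\partial y} \right) + \left( \frac{\partial^2 v_2}{\partial z\,\partial x} - \frac{\partial^2 v_2}{\partial x\,\partial z} \right) + \left( \frac{\partial^2 v_3}{\partial x\,\partial y} - \frac{\partial^2 v_3}{\partial y\,\partial x} \right) \\
&= 0.
\end{aligned}$$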
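The identity $\nabla \cdot (\nabla \times \mathbf{v}) = 0$ can be observed numerically on a field whose curl is visibly nonzero. The sketch below is an illustration only: the sample field, the helper names, and the step size are assumptions made here. Central differences in different variables commute, so the computed divergence of the computed curl vanishes up to rounding, which mirrors the role of equal mixed partials in the proof.

```python
import math

def v(x, y, z):
    """A hypothetical smooth sample field with nonzero curl."""
    return (math.sin(y * z), x * x * z, math.exp(x) * y)

def partial(func, var, p, h=1e-3):
    """Central difference of a scalar function in variable index var."""
    hi, lo = list(p), list(p)
    hi[var] += h
    lo[var] -= h
    return (func(*hi) - func(*lo)) / (2 * h)

def component(field, i):
    """The i-th component of a vector field as a scalar function."""
    return lambda x, y, z: field(x, y, z)[i]

def curl(field):
    """Return a new vector field approximating curl(field)."""
    def curl_field(x, y, z):
        p = (x, y, z)
        return (
            partial(component(field, 2), 1, p) - partial(component(field, 1), 2, p),
            partial(component(field, 0), 2, p) - partial(component(field, 2), 0, p),
            partial(component(field, 1), 0, p) - partial(component(field, 0), 1, p),
        )
    return curl_field

def div(field, p):
    return sum(partial(component(field, i), i, p) for i in range(3))

p = (0.4, -0.3, 1.1)
curl_v = curl(v)
print(curl_v(*p))          # a nonzero vector
print(div(curl_v, p))      # approximately 0
```

For this field the first curl component is $e^x - x^2$ by hand, which gives a quick way to confirm that `curl` itself is behaving as intended.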

$^1$ I don’t really like the notation (shouldn’t it be $\|\nabla\|^2 f$?), so I won’t use it in this text. However, it is very common, and the reader should be familiar with it.


Remark 20.9. While it is not a proof, the vector notation certainly makes these identities easy to remember. Note that if c is a scalar and a and b are vectors in $\mathbb{R}^3$ we always have

$$\mathbf{a} \cdot (\mathbf{a} \times \mathbf{b}) = 0,$$

and

$$\mathbf{a} \times (\mathbf{a}c) = 0.$$

If a plays the role of ∇ and b and c play the roles of v and f respectively, these vector identities are the same as the differential identities above.

Problems

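Problem 20.1. Prove Theorem 20.1 by showing that if we have $C^1$ functions $f, g : \mathbb{R}^n \to \mathbb{R}$, $\mathbf{f}, \mathbf{g} : \mathbb{R}^n \to \mathbb{R}^n$ and $\mathbf{u}, \mathbf{v} : \mathbb{R}^3 \to \mathbb{R}^3$ and $a, b \in \mathbb{R}$, then we have

1. $\nabla(af + bg) = a\nabla f + b\nabla g$,

2. $\nabla \cdot (a\mathbf{f} + b\mathbf{g}) = a\nabla \cdot \mathbf{f} + b\nabla \cdot \mathbf{g}$,

3. $\nabla \times (a\mathbf{u} + b\mathbf{v}) = a\nabla \times \mathbf{u} + b\nabla \times \mathbf{v}$.

Problem 20.2. Complete the proof of Theorem 20.2 by showing that for $C^1$ functions $f, g : \mathbb{R}^n \to \mathbb{R}$, $\mathbf{f}, \mathbf{g} : \mathbb{R}^n \to \mathbb{R}^n$ and $\mathbf{u}, \mathbf{v} : \mathbb{R}^3 \to \mathbb{R}^3$ we have the following.
(a) $\nabla(fg) = f\nabla g + g\nabla f$.
(b) $\nabla \cdot (g\mathbf{f}) = g\nabla \cdot \mathbf{f} + \nabla g \cdot \mathbf{f}$.
(c) $\nabla \cdot (\mathbf{u} \times \mathbf{v}) = \mathbf{v} \cdot \nabla \times \mathbf{u} - \mathbf{u} \cdot \nabla \times \mathbf{v}$.

Problem 20.3. Prove Theorem 20.7 by showing that for $C^2$ functions $f, g : \mathbb{R}^n \to \mathbb{R}$ we have

1. $\nabla \cdot (f\nabla g) = f\Delta g + \nabla f \cdot \nabla g$,

2. $\Delta(fg) = f\Delta g + g\Delta f + 2(\nabla f \cdot \nabla g)$.

Problem 20.4. Complete the proof of Theorem 20.8 by showing that

$$\nabla \times (\nabla f) = \operatorname{curl}\operatorname{grad} f = 0.$$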


Chapter 21

Eigenvalues

In this chapter we review the basic definitions and theorems concerning eigenvalues and eigenvectors. This subject has very deep implications and many applications. However, it is hard to see this when the subject is first approached. There is a natural tendency for students to focus on the technical calculations. In the study of vector calculus, we will encounter several applications that will shed some light on the physical interpretation of eigenvalues and eigenvectors.

Definition 21.1. Let A be an n × n matrix. We call a scalar λ an eigenvalue of A if there is a nonzero vector x such that

$$A\mathbf{x} = \lambda\mathbf{x}. \tag{21.1}$$

Any vector x satisfying (21.1) is called an eigenvector corresponding to λ. The pair (λ, x) is called an eigenpair.

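Example 21.2. Consider the matrix

$$A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}.$$

Note that

$$\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} -2 - 1 \\ 1 + 2 \end{pmatrix} = 3\begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

Thus, λ = 3 is an eigenvalue of A with corresponding eigenvector (−1, 1). In addition,

$$\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 - 1 \\ -1 + 2 \end{pmatrix} = (1)\begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

Thus, λ = 1 is a second eigenvalue of A with corresponding eigenvector (1, 1).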
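Checking a proposed eigenpair is just a matrix-vector product followed by a comparison, as in Definition 21.1. A minimal Python sketch of that check follows; the names `matvec` and `is_eigenpair` and the tolerance are hypothetical choices made for this illustration.

```python
def matvec(a, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def is_eigenpair(a, lam, x, tol=1e-12):
    """True if x is nonzero and A x equals lam * x up to the tolerance."""
    ax = matvec(a, x)
    nonzero = any(abs(xi) > tol for xi in x)
    return nonzero and all(abs(axi - lam * xi) <= tol for axi, xi in zip(ax, x))

A = [[2, -1], [-1, 2]]
print(is_eigenpair(A, 3, [-1, 1]))   # True
print(is_eigenpair(A, 1, [1, 1]))    # True
print(is_eigenpair(A, 2, [1, 0]))    # False: A(1, 0) = (2, -1) is not 2(1, 0)
```

Because the comparison uses `abs`, the same helper also accepts complex entries and complex scalars.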


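Example 21.3. Now consider the matrix

$$B = \begin{pmatrix} -1 & 2 & 1 \\ 2 & 2 & 2 \\ 1 & 2 & -1 \end{pmatrix}.$$

Note that

$$\begin{pmatrix} -1 & 2 & 1 \\ 2 & 2 & 2 \\ 1 & 2 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 8 \\ 4 \end{pmatrix} = 4\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}.$$

Thus, λ = 4 is an eigenvalue of B with corresponding eigenvector (1, 2, 1). In addition,

$$\begin{pmatrix} -1 & 2 & 1 \\ 2 & 2 & 2 \\ 1 & 2 & -1 \end{pmatrix}\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \\ -2 \end{pmatrix} = -2\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}.$$

Thus, λ = −2 is also an eigenvalue of B with eigenvector (−1, 0, 1). We can find another eigenvector corresponding to λ = −2 since

$$\begin{pmatrix} -1 & 2 & 1 \\ 2 & 2 & 2 \\ 1 & 2 & -1 \end{pmatrix}\begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 4 \\ -2 \\ 0 \end{pmatrix} = -2\begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}.$$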

Remark 21.4. The definition of an eigenvalue above makes sense even if the scalar $\lambda \in \mathbb{C}$ is a complex number and the vector $\mathbf{x} \in \mathbb{C}^n$ is allowed to be an n-tuple of complex numbers. The subject of linear algebra is “incomplete” without considering complex vectors in the same way that the subject of the algebra of numbers is incomplete without complex numbers. As we see below, even real matrices can have complex eigenvalues, just as real algebraic equations (like $x^2 = -1$) can have complex solutions.

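Example 21.5. Consider the matrix

$$C = \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}.$$

Note that

$$\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} i \\ 1 \end{pmatrix} = \begin{pmatrix} 2i - 1 \\ i + 2 \end{pmatrix} = (2 + i)\begin{pmatrix} i \\ 1 \end{pmatrix}.$$

Thus, the complex number 2 + i is an eigenvalue of C with corresponding eigenvector (i, 1).

In addition,

$$\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} -i \\ 1 \end{pmatrix} = \begin{pmatrix} -2i - 1 \\ -i + 2 \end{pmatrix} = (2 - i)\begin{pmatrix} -i \\ 1 \end{pmatrix}.$$

Thus, the complex number 2 − i is another eigenvalue of C with corresponding eigenvector (−i, 1).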


Eigenvectors corresponding to a given eigenvalue are hardly unique. In fact, we have the following.

Theorem 21.6. If x and y are eigenvectors corresponding to an eigenvalue λ of an n × n matrix A then any linear combination ax + by is also an eigenvector corresponding to λ.

Proof. Suppose Ax = λx and Ay = λy. Then for any scalars a and b

$$\begin{aligned}
A(a\mathbf{x} + b\mathbf{y}) &= A(a\mathbf{x}) + A(b\mathbf{y}) \\
&= aA\mathbf{x} + bA\mathbf{y} \\
&= a\lambda\mathbf{x} + b\lambda\mathbf{y} \\
&= \lambda(a\mathbf{x} + b\mathbf{y}).
\end{aligned}$$

The previous theorem leads to the following.

Definition 21.7. The set of all eigenvectors corresponding to the eigenvalue λ is called the eigenspace corresponding to λ.

Theorem 21.8. The eigenspace corresponding to an eigenvalue λ is a subspace of $\mathbb{R}^n$. The dimension of that subspace is called the geometric multiplicity of λ.

The definitions of subspaces and dimension are given in any standard linear algebra text such as [7], and we won’t go into detail about these concepts here. We note that for any eigenvector x all parallel vectors are also eigenvectors, so the eigenspace would contain at least a (one-dimensional) line through the origin. If there were two nonparallel eigenvectors corresponding to an eigenvalue, the (two-dimensional) plane generated by those two vectors would be in the eigenspace. Note that in Example 21.3, the eigenvalue λ = −2 has geometric multiplicity at least two since there are two nonparallel eigenvectors corresponding to this eigenvalue.

Remark 21.9. Equation (21.1) can be a little cumbersome to deal with since it has a matrix product on one side and a scalar product on the other. We can give it a little more balance if we write

$$A\mathbf{x} = \lambda I\mathbf{x}$$


where I is the identity matrix. This can be rearranged in the form

Ax− λIx = (A− λI)x = 0.

This is, of course, a homogeneous linear system. The following theorem is an immediate consequence of Theorem 6.17.

Theorem 21.10. The scalar λ is an eigenvalue of the matrix A if and only if

$$\det(A - \lambda I) = 0.$$

This result opens some interesting possibilities for us.

Theorem 21.11. Suppose A is an n× n matrix. Then the quantity

p(λ) = det(A− λI),

thought of as a function of the variable λ, is a polynomial of degree n. We call p(λ) the characteristic polynomial of A.

Proof. The entries of the matrix A − λI are either scalars $a_{ij}$ for $i \neq j$ or linear factors in λ of the form $a_{ii} - \lambda$ for the diagonal entries. The determinant is defined to be the sum of n! terms, each of which is (up to sign) the product of n entries of A − λI. Only one of the terms has all diagonal entries. This is of degree n in λ with leading term $(-1)^n\lambda^n$. All other terms have degree strictly less than n.

By Theorem 21.10, the scalar λ is an eigenvalue of A if and only if it is a root of the characteristic polynomial. We can use various results from algebra to give us information about the characteristic polynomial. This information in turn implies certain facts about the eigenvalues of A. For instance, the Fundamental Theorem of Algebra gives us the following.

Theorem 21.12. The characteristic polynomial can be factored in exactly one way into n linear factors

$$p(\lambda) = (-1)^n(\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n)$$

where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the roots of p (and therefore the eigenvalues of A). These roots are possibly complex and are not necessarily distinct.

This leads immediately to the following result.


Corollary 21.13. An n× n matrix has at most n eigenvalues.

Definition 21.14. We say that an eigenvalue $\lambda_i$ of a matrix A has algebraic multiplicity k if the factor $(\lambda - \lambda_i)$ appears exactly k times in the factorization of the characteristic polynomial of A.

Example 21.15. Let us use these techniques to show how we would compute the eigenvalues and eigenvectors of the matrix A in Example 21.2. We first compute and factor the characteristic polynomial.

$$\begin{aligned}
\det(A - \lambda I) &= \det\left( \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} - \lambda\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) \\
&= \det\begin{pmatrix} 2 - \lambda & -1 \\ -1 & 2 - \lambda \end{pmatrix} \\
&= (2 - \lambda)^2 - (-1)^2 \\
&= \lambda^2 - 4\lambda + 3 = (\lambda - 3)(\lambda - 1).
\end{aligned}$$

The roots λ = 3 and λ = 1 are the eigenvalues we identified in Example 21.2 (which we now know are the only eigenvalues of A). Each eigenvalue has algebraic multiplicity one.

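Let us now compute the corresponding eigenvectors. To find the eigenvectors corresponding to λ = 3 we must solve

$$(A - \lambda I)\mathbf{x} = \mathbf{0},$$

or

$$\begin{aligned}
\begin{pmatrix} 0 \\ 0 \end{pmatrix} &= \left( \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} - 3\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \\
&= \begin{pmatrix} 2 - 3 & -1 \\ -1 & 2 - 3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \\
&= \begin{pmatrix} -1 & -1 \\ -1 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.
\end{aligned}$$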

While it is easy to solve this system by inspection, in general we would row reduce the matrix$^1$

$$\begin{pmatrix} -1 & -1 \\ -1 & -1 \end{pmatrix}$$

to get

$$\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$$

corresponding to the equation $x_1 + x_2 = 0$. If we let $x_2 = s$ for any $s \in \mathbb{R}$ we can solve for $x_1$ to get the eigenvectors corresponding to λ = 3

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -s \\ s \end{pmatrix} = s\begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

Setting s = 1 gives us the eigenvector that we checked in Example 21.2. Since the full set of eigenvectors corresponding to λ = 3 is a one-dimensional line, the eigenvalue has geometric multiplicity one.

We leave the computation of the eigenvectors and geometric multiplicity corresponding to λ = 1 to the reader.

$^1$ It doesn’t make sense to use an augmented matrix for homogeneous systems. The final column of zeros never changes during elementary row operations.

Example 21.16. We can do the same for the matrix B of Example 21.3. We compute the characteristic polynomial using the formula for a 3 × 3 determinant.

$$\begin{aligned}
\det(B - \lambda I) &= \begin{vmatrix} -1 - \lambda & 2 & 1 \\ 2 & 2 - \lambda & 2 \\ 1 & 2 & -1 - \lambda \end{vmatrix} \\
&= (-1 - \lambda)(2 - \lambda)(-1 - \lambda) + 2(2)(1) + (1)(2)(2) \\
&\quad - (-1 - \lambda)(2)(2) - (1)(2 - \lambda)(1) - (2)(2)(-1 - \lambda) \\
&= -\lambda^3 + 12\lambda + 16.
\end{aligned}$$

Factoring a cubic polynomial is not a routine task, but since we know two eigenvalues of B already, we know that λ = −2 and λ = 4 are roots. Using synthetic division, we find that

$$-\lambda^3 + 12\lambda + 16 = -(\lambda - 4)(\lambda + 2)^2.$$

Thus, λ = 4 has algebraic multiplicity one while λ = −2 has algebraic multiplicity two.

To compute the eigenvectors corresponding to λ = 4 we row reduce the matrix

$$B - 4I = \begin{pmatrix} -5 & 2 & 1 \\ 2 & -2 & 2 \\ 1 & 2 & -5 \end{pmatrix}$$

to get

$$\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{pmatrix}.$$

This is equivalent to the system

$$x_1 - x_3 = 0, \qquad x_2 - 2x_3 = 0.$$

Setting $x_3 = s$, where s is arbitrary, and solving for the remaining variables yields the solutions (which are eigenvectors)

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} s \\ 2s \\ s \end{pmatrix} = s\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}.$$


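The eigenspace corresponding to λ = 4 is a (one-dimensional) line through the origin parallel to (1, 2, 1). The geometric multiplicity of λ = 4 is therefore one.

To compute the eigenvectors corresponding to λ = −2 we row reduce the matrix

$$B + 2I = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}$$

to get

$$\begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

This is equivalent to the equation

$$x_1 + 2x_2 + x_3 = 0.$$

Here we allow $x_2 = s$ and $x_3 = t$ to be arbitrary and solve for $x_1$. This yields the set of solutions (eigenvectors)

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} -2s - t \\ s \\ t \end{pmatrix} = s\begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}.$$

This set is a (two-dimensional) plane in $\mathbb{R}^3$. The geometric multiplicity of λ = −2 is therefore two.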
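The characteristic polynomial of B and its factorization can be cross-checked by evaluating $\det(B - \lambda I)$ directly at a handful of values of λ. The sketch below uses exact integer arithmetic; the helper names `det3` and `char_poly` and the choice of sample values are assumptions made for this illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly(m, lam):
    """Evaluate det(M - lam * I) for a 3x3 matrix M."""
    shifted = [[m[r][c] - (lam if r == c else 0) for c in range(3)]
               for r in range(3)]
    return det3(shifted)

B = [[-1, 2, 1], [2, 2, 2], [1, 2, -1]]

for lam in (-3, -2, 0, 1, 4, 5):
    direct = char_poly(B, lam)
    expanded = -lam**3 + 12 * lam + 16
    factored = -(lam - 4) * (lam + 2) ** 2
    print(lam, direct, expanded, factored)   # the last three columns agree
```

Two cubic polynomials that agree at more than three points are identical, so agreement at these six sample values confirms both the expansion and the factorization; the rows for λ = −2 and λ = 4 show the value 0.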

Note that in the previous example the geometric multiplicity was the same as the algebraic multiplicity. The following is a more general result which we state without proof.

Theorem 21.17. The geometric multiplicity of any eigenvalue is less than or equal to its algebraic multiplicity.

Example 21.18. Finally we work with the complex eigenvalues and eigenvectors of the matrix C from Example 21.5. We first compute the characteristic polynomial.

$$\det(C - \lambda I) = \det\begin{pmatrix} 2 - \lambda & -1 \\ 1 & 2 - \lambda \end{pmatrix} = (2 - \lambda)^2 + 1 = \lambda^2 - 4\lambda + 5.$$

Here we use the quadratic formula to compute the roots (which are, of course, the eigenvalues of C).

$$\lambda = \frac{-(-4) \pm \sqrt{(-4)^2 - 4(1)(5)}}{2} = 2 \pm i.$$


To compute the eigenvectors corresponding to λ = 2 + i we must row reduce

C − (2 + i)I =(−i −1

1 −i

).

With a little complex arithmetic, this reduces to(1 −i0 0

).

This is equivalent to the equation x1− ix2 = 0. Allowing x2 = s to be arbitraryand solving for x1 we get the solution set(

x1

x2

)=(iss

)= s

(i1

).

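These are the eigenvectors corresponding to $\lambda = 2 + i$.

To compute the eigenvectors corresponding to $\lambda = 2 - i$ we row reduce

$$C - (2-i)I = \begin{pmatrix} i & -1 \\ 1 & i \end{pmatrix}.$$

Again, this reduces to

$$\begin{pmatrix} 1 & i \\ 0 & 0 \end{pmatrix}.$$

This is equivalent to the equation $x_1 + i x_2 = 0$. Allowing $x_2 = s$ to be arbitrary and solving for $x_1$ we get the solutions

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -is \\ s \end{pmatrix} = s \begin{pmatrix} -i \\ 1 \end{pmatrix}.$$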

These are the eigenvectors corresponding to λ = 2− i.
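The eigenpairs found in Example 21.18 can be checked directly with Python's built-in complex numbers. The following is only a minimal sketch: the helper names `matvec` and `is_eigenpair` are our own, and the matrix entries are read off from the characteristic polynomial computed in the example.

```python
# Matrix C from Example 21.18, stored as a list of rows.
C = [[2, -1],
     [1, 2]]

def matvec(M, v):
    """Multiply a 2x2 matrix (list of rows) by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def is_eigenpair(M, lam, v, tol=1e-12):
    """Return True if M v agrees with lam * v up to the tolerance tol."""
    Mv = matvec(M, v)
    return all(abs(Mv[k] - lam * v[k]) < tol for k in range(2))

# Eigenpairs from the example: 2 + i with (i, 1), and 2 - i with (-i, 1).
pairs = [(2 + 1j, [1j, 1]), (2 - 1j, [-1j, 1])]
checks = [is_eigenpair(C, lam, v) for lam, v in pairs]
print(checks)
```

Both checks should succeed, while a vector such as (1, 1) fails the same test for either eigenvalue.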

Fortunately, symmetric matrices will be important in our applications, and for this class of matrices we have a good deal of information about the eigenvalues and eigenvectors.

Theorem 21.19. Suppose A is a real symmetric matrix.

1. All eigenvalues of A are real.

2. Eigenvectors corresponding to distinct eigenvalues are orthogonal.

3. The algebraic multiplicity of each eigenvalue is equal to its geometric multiplicity.


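Proof. The proof of the first result requires us to introduce complex inner products. Accordingly, we omit the proofs of parts one and three, which can be found in many standard texts on linear algebra. However, the proof of the second part requires no new notation. Suppose

$$A\mathbf{v} = \lambda_1 \mathbf{v}, \qquad A\mathbf{u} = \lambda_2 \mathbf{u},$$

but $\lambda_1 \neq \lambda_2$. Then

$$\begin{aligned}
\lambda_1 (\mathbf{v} \cdot \mathbf{u}) &= (\lambda_1 \mathbf{v}) \cdot \mathbf{u} \\
&= (A\mathbf{v}) \cdot \mathbf{u} \\
&= \mathbf{v} \cdot (A^T \mathbf{u}) \\
&= \mathbf{v} \cdot (A\mathbf{u}) \\
&= \mathbf{v} \cdot (\lambda_2 \mathbf{u}) \\
&= \lambda_2\, \mathbf{v} \cdot \mathbf{u}.
\end{aligned}$$

This can be written $(\lambda_1 - \lambda_2)(\mathbf{v} \cdot \mathbf{u}) = 0$. Since $\lambda_1 \neq \lambda_2$ this implies $\mathbf{v} \cdot \mathbf{u} = 0$.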

Example 21.20. Example 21.3 showed a symmetric 3 × 3 matrix

$$B = \begin{pmatrix} -1 & 2 & 1 \\ 2 & 2 & 2 \\ 1 & 2 & -1 \end{pmatrix}$$

with two real eigenvalues: λ = −2 and λ = 4. Note that indeed the algebraic and geometric multiplicities are the same for each of the eigenvalues. Note also that the eigenvector $\mathbf{x}_1 = (1, 2, 1)$ corresponding to λ = 4 is orthogonal to both of the basis eigenvectors $\mathbf{x}_2 = (-2, 1, 0)$ and $\mathbf{x}_3 = (-1, 0, 1)$ corresponding to λ = −2.
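The claims in Example 21.20 are easy to confirm with integer arithmetic. The sketch below uses small helper functions of our own (`matvec`, `dot`, `scale`); nothing in it comes from a library.

```python
# Symmetric matrix B from Example 21.20.
B = [[-1, 2, 1],
     [2, 2, 2],
     [1, 2, -1]]

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def scale(c, v):
    """Scalar multiple c * v."""
    return [c * a for a in v]

x1 = [1, 2, 1]    # eigenvector for lambda = 4
x2 = [-2, 1, 0]   # eigenvector for lambda = -2
x3 = [-1, 0, 1]   # eigenvector for lambda = -2

# Check B x = lambda x for each pair.
eigen_ok = (matvec(B, x1) == scale(4, x1)
            and matvec(B, x2) == scale(-2, x2)
            and matvec(B, x3) == scale(-2, x3))

# Dot products x1.x2, x1.x3, x2.x3.
dots = (dot(x1, x2), dot(x1, x3), dot(x2, x3))
print(eigen_ok, dots)
```

The first two dot products vanish, as Theorem 21.19 requires for distinct eigenvalues. The third does not: the two basis vectors chosen for the λ = −2 eigenspace belong to the same eigenvalue, so the theorem says nothing about them.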

Example 21.21. It is pretty easy to see that a real diagonal matrix (which is, of course, symmetric) has eigenvalues given by the diagonal elements themselves, $\lambda_i = a_{ii}$, with corresponding eigenvectors parallel to the standard basis vector $\mathbf{e}_i$. So, for instance, the matrix

$$\begin{pmatrix} -2 & 0 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 5 \end{pmatrix}$$

has eigenvalues λ = −2 and λ = 5. The eigenspace corresponding to λ = −2 contains all linear combinations of $\mathbf{e}_1$ and $\mathbf{e}_2$. Any vector in this set is orthogonal to any vector in the eigenspace corresponding to λ = 5, which contains all linear combinations of $\mathbf{e}_3$ and $\mathbf{e}_4$.

The following definition will be important in the max-min problems below.


Definition 21.22. We say that an n × n matrix A is positive definite if

$$A\mathbf{x} \cdot \mathbf{x} > 0$$

for every nonzero $\mathbf{x} \in \mathbb{R}^n$ and positive semidefinite if

$$A\mathbf{x} \cdot \mathbf{x} \geq 0$$

for every nonzero $\mathbf{x} \in \mathbb{R}^n$. Negative definite and negative semidefinite matrices satisfy the opposite inequalities.

Positive definite symmetric matrices are characterized by their eigenvalues.

Theorem 21.23. A symmetric matrix is positive definite (semidefinite) if and only if all of its eigenvalues are positive (nonnegative).

A symmetric matrix is negative definite (semidefinite) if and only if all of its eigenvalues are negative (nonpositive).
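As a quick illustration of the theorem, consider a small matrix of our own choosing (it does not appear elsewhere in the text). Completing the square shows positive definiteness directly from the definition:

$$A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad A\mathbf{x} \cdot \mathbf{x} = 2x_1^2 + 2x_1 x_2 + 2x_2^2 = x_1^2 + x_2^2 + (x_1 + x_2)^2 > 0 \quad \text{for } \mathbf{x} \neq \mathbf{0}.$$

The eigenvalue test gives the same conclusion:

$$\det(A - \lambda I) = (2 - \lambda)^2 - 1 = (\lambda - 1)(\lambda - 3),$$

so the eigenvalues are 1 and 3, both positive.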

Again, we omit the proof of this theorem.

The following result can be helpful.

Theorem 21.24. Every positive definite matrix and every negative definite matrix is invertible.

Proof. Suppose for the sake of contradiction that some definite matrix A is not invertible. Then Theorem 6.17 implies that there is a nonzero vector $\mathbf{x}$ such that $A\mathbf{x} = \mathbf{0}$. But this implies $A\mathbf{x} \cdot \mathbf{x} = 0$, a contradiction of the assumption that A is definite.

Problems

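Problem 21.1. For the following matrices, find all eigenvalues and eigenvectors. State the algebraic and geometric multiplicity of each eigenvalue.

(a)
$$\begin{pmatrix} -1 & 2 \\ 5 & 2 \end{pmatrix}.$$

(b)
$$\begin{pmatrix} 3 & 1 \\ -1 & 5 \end{pmatrix}.$$

(c)
$$\begin{pmatrix} 4 & -2 \\ 1 & 2 \end{pmatrix}.$$

(d)
$$\begin{pmatrix} 10 & 0 & 4 \\ 0 & 2 & 0 \\ 4 & 0 & 4 \end{pmatrix}.$$

(e)
$$\begin{pmatrix} 2 & 2 & 1 \\ 2 & 5 & 2 \\ 1 & 2 & 2 \end{pmatrix}.$$

(f)
$$\begin{pmatrix} 2 & -1 & 0 \\ -1 & 3 & -1 \\ 0 & -1 & 2 \end{pmatrix}.$$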

Problem 21.2. Calculate the eigenvalues of the generic symmetric 2 × 2 matrix

$$A = \begin{pmatrix} a & c \\ c & b \end{pmatrix}.$$

Show that A is positive definite if and only if det A > 0 and a > 0, and negative definite if and only if det A > 0 and a < 0.


Chapter 22

Quadratic Approximation and Taylor's Theorem

One of the more important results in the calculus of a single variable is Taylor's Theorem, which tells us how to use differential calculus to approximate an arbitrary smooth function by a polynomial of any degree. In the second section of this chapter we discuss a generalization of this theorem to scalar functions on $\mathbb{R}^n$. However, for higher order polynomials we have to introduce some new notation that is a bit specialized. In the first section we restrict our discussion to the approximation of functions from $\mathbb{R}^n$ to $\mathbb{R}$ by quadratic functions. We can do this with a relatively standard system of notation and the results are crucial to deriving the first and second derivative tests for max-min problems in the next chapter.

22.1 Quadratic Approximation of Real-Valued Functions

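A general quadratic real-valued function on $\mathbb{R}^n$ can be written in the form

$$q(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T A \mathbf{x} + \mathbf{b} \cdot \mathbf{x} + c = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j + \sum_{i=1}^{n} b_i x_i + c$$

where A is a symmetric n × n matrix, $\mathbf{b} \in \mathbb{R}^n$ is a vector, and $c \in \mathbb{R}$ is a scalar. (Problem 4.8 shows why we can assume that A is symmetric without loss of generality.)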

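What do functions of this form look like? Well, that is pretty easy to answer if A is diagonal,

$$A = \begin{pmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n \end{pmatrix},$$

so that we have

$$q(\mathbf{x}) = \sum_{i=1}^{n} \left( \frac{1}{2}\lambda_i x_i^2 + b_i x_i \right) + c.$$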

Thus, we can deduce the following information about the function and its level sets.

1. Along coordinate axes corresponding to $\lambda_i > 0$, the function is a parabola which is concave up.

2. Along coordinate axes corresponding to $\lambda_i < 0$, the function is a parabola which is concave down.

3. Along coordinate axes corresponding to $\lambda_i = 0$, the function is linear.

4. If all of the $\lambda_i$ are strictly positive (or all are strictly negative) then the graph of the function is a paraboloid in $\mathbb{R}^{n+1}$ and its level sets are ellipsoids in $\mathbb{R}^n$.

5. If some of the $\lambda_i$ are strictly positive and some are strictly negative then the graph of the function is a saddle in $\mathbb{R}^{n+1}$ and its level sets are hyperboloids in $\mathbb{R}^n$.

While this might seem like a very special result for diagonal matrices, in fact it is true for a general symmetric matrix. In this case the $\lambda_i$ are the eigenvalues of the matrix. This is due to a theorem (which we don't prove here) which says that any symmetric matrix can be "diagonalized." That is, the coordinate axes can be rotated in such a way that in the new coordinate system, the matrix is diagonal. We state this basic result as a theorem.


Theorem 22.1. Let $q : \mathbb{R}^n \to \mathbb{R}$ be a quadratic function given by

$$q(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T A \mathbf{x} + \mathbf{b} \cdot \mathbf{x} + c,$$

where A is a symmetric n × n matrix, $\mathbf{b} \in \mathbb{R}^n$ and $c \in \mathbb{R}$. Then the following holds.

1. If all eigenvalues of A are strictly positive then

(a) The level sets of q are ellipsoids.

(b) q is bounded below and has a global minimum at $\mathbf{x}_0 = -A^{-1}\mathbf{b}$.

2. If all eigenvalues of A are strictly negative then

(a) The level sets of q are ellipsoids.

(b) q is bounded above and has a global maximum at $\mathbf{x}_0 = -A^{-1}\mathbf{b}$.

3. If at least one eigenvalue of A is strictly positive and at least one is strictly negative then

(a) q is unbounded both below and above.

(b) q has no local minimum or maximum points; all critical points of q are saddles.
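Part 1(b) of the theorem can be tried out numerically on a small case. The sketch below assumes a 2 × 2 example of our own (A with eigenvalues 1 and 3, and an arbitrary b); the helpers `solve2` and `q` are hypothetical names, and Cramer's rule is used only because the system is 2 × 2.

```python
def solve2(A, r):
    """Solve the 2x2 linear system A x = r by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(r[0] * A[1][1] - A[0][1] * r[1]) / det,
            (A[0][0] * r[1] - r[0] * A[1][0]) / det]

def q(A, b, c, x):
    """Evaluate q(x) = (1/2) x^T A x + b . x + c in two variables."""
    quad = sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad + b[0] * x[0] + b[1] * x[1] + c

A = [[2.0, 1.0], [1.0, 2.0]]   # symmetric, eigenvalues 1 and 3
b = [1.0, -1.0]
c = 0.0

# Candidate minimizer x0 = -A^{-1} b, obtained by solving A x0 = -b.
x0 = solve2(A, [-b[0], -b[1]])
q0 = q(A, b, c, x0)

# Compare q at x0 with q at a few nearby points.
nearby = [q(A, b, c, [x0[0] + dx, x0[1] + dy])
          for dx in (-0.1, 0.0, 0.1) for dy in (-0.1, 0.0, 0.1)
          if (dx, dy) != (0.0, 0.0)]
print(x0, q0, min(nearby))
```

Every sampled neighbor gives a strictly larger value than q at x0, as the theorem predicts when all eigenvalues are positive.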

We now consider the general problem of how to approximate an arbitrary function $f : \mathbb{R}^n \to \mathbb{R}$ by a quadratic polynomial. Note that we already have two facts at our disposal.

1. We already know the best linear approximation for the function f.

$$l_f(\mathbf{x}_0, \mathbf{x}) = Df(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) + f(\mathbf{x}_0) = \nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0) + f(\mathbf{x}_0).$$

2. We know from elementary calculus the quadratic Taylor polynomial approximating a function of one variable $f : \mathbb{R} \to \mathbb{R}$.

$$q_f(x) = \frac{1}{2} f''(x_0)(x - x_0)^2 + f'(x_0)(x - x_0) + f(x_0).$$

This quadratic polynomial has the same functional value, first derivative, and second derivative at the point $x_0$ as the original function f. That is

$$q_f(x_0) = f(x_0), \qquad \frac{d}{dx} q_f(x_0) = f'(x_0), \qquad \frac{d^2}{dx^2} q_f(x_0) = f''(x_0).$$

Taylor's theorem says more: the Taylor polynomial is the "best fitting" quadratic function in the sense given below.


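Clearly, to extend this version of Taylor's theorem to functions of n variables we need a generalization of the second derivative. We introduce the following.

Definition 22.2. Let $\Omega \subset \mathbb{R}^n$ be the domain of a $C^2$ function $f : \Omega \to \mathbb{R}$. We define the Hessian matrix of f to be the function

$$H(\mathbf{x}) = \begin{pmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}(\mathbf{x}).$$

Note that since f is $C^2$ the Hessian matrix¹ is always symmetric.

We use the Hessian matrix to state the following generalization of Taylor's theorem on quadratic approximation.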

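Theorem 22.3. Let $\Omega \subset \mathbb{R}^n$ be the domain of a $C^2$ function $f : \Omega \to \mathbb{R}$. For $\mathbf{x}_0 \in \Omega$ we define the second-order Taylor polynomial of f at $\mathbf{x}_0$ to be

$$\begin{aligned}
q_f(\mathbf{x}_0, \mathbf{x}) &= \frac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T H(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) + \nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0) + f(\mathbf{x}_0) \\
&= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x}_0)(x_i - x_{0,i})(x_j - x_{0,j}) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(\mathbf{x}_0)(x_i - x_{0,i}) + f(\mathbf{x}_0).
\end{aligned}$$

Then

$$\lim_{\mathbf{x} \to \mathbf{x}_0} \frac{f(\mathbf{x}) - q_f(\mathbf{x}_0, \mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|^2} = 0.$$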

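While we won't prove this theorem, we note that once again we have an "interpolation result." That is, the partial derivatives of order two or lower of the function and its quadratic approximation agree at the point $\mathbf{x}_0$.

$$q_f(\mathbf{x}_0, \mathbf{x})\big|_{\mathbf{x} = \mathbf{x}_0} = f(\mathbf{x}_0),$$

$$\frac{\partial}{\partial x_i} q_f(\mathbf{x}_0, \mathbf{x})\bigg|_{\mathbf{x} = \mathbf{x}_0} = \frac{\partial f}{\partial x_i}(\mathbf{x}_0),$$

$$\frac{\partial^2}{\partial x_j \partial x_i} q_f(\mathbf{x}_0, \mathbf{x})\bigg|_{\mathbf{x} = \mathbf{x}_0} = \frac{\partial^2 f}{\partial x_j \partial x_i}(\mathbf{x}_0).$$

¹ There is a bit of confusing standard terminology here. In later chapters, we will have to distinguish between the n × n Hessian matrix of f and the determinant of that matrix, which is called simply "the Hessian" of f.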


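Example 22.4. Consider the function

$$f(x, y) = x^2 y^3.$$

We compute its quadratic approximation at the point $(x_0, y_0) = (1, -1)$ as follows. Its gradient and Hessian are

$$\nabla f(x, y) = \begin{pmatrix} 2xy^3 \\ 3x^2y^2 \end{pmatrix}, \qquad H(x, y) = \begin{pmatrix} 2y^3 & 6xy^2 \\ 6xy^2 & 6x^2y \end{pmatrix}.$$

Evaluating at $(x_0, y_0) = (1, -1)$ we get

$$\nabla f(1, -1) = \begin{pmatrix} -2 \\ 3 \end{pmatrix}, \qquad H(1, -1) = \begin{pmatrix} -2 & 6 \\ 6 & -6 \end{pmatrix}.$$

Using this with $f(1, -1) = -1$ gives us the following quadratic function

$$\begin{aligned}
q_f((1, -1); (x, y)) &= \frac{1}{2}\,(x - 1,\; y + 1)\begin{pmatrix} -2 & 6 \\ 6 & -6 \end{pmatrix}\begin{pmatrix} x - 1 \\ y + 1 \end{pmatrix} + \begin{pmatrix} -2 \\ 3 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y + 1 \end{pmatrix} - 1 \\
&= -(x - 1)^2 + 6(x - 1)(y + 1) - 3(y + 1)^2 - 2(x - 1) + 3(y + 1) - 1.
\end{aligned}$$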
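The limit in Theorem 22.3 can be observed numerically for this example. The following sketch (with function names `f`, `qf`, and `ratio` chosen here for illustration) evaluates the error of the quadratic approximation along the diagonal direction (h, h) from the base point and divides by the squared distance.

```python
def f(x, y):
    """The function f(x, y) = x^2 y^3 from the example."""
    return x**2 * y**3

def qf(x, y):
    """Second-order Taylor polynomial of f at (1, -1), in expanded form."""
    u, v = x - 1.0, y + 1.0
    return -u**2 + 6*u*v - 3*v**2 - 2*u + 3*v - 1.0

def ratio(h):
    """Error |f - qf| at (1 + h, -1 + h) divided by ||x - x0||^2 = 2 h^2."""
    x, y = 1.0 + h, -1.0 + h
    return abs(f(x, y) - qf(x, y)) / (2.0 * h * h)

for h in (0.1, 0.01, 0.001):
    print(h, ratio(h))
```

The printed ratios shrink roughly in proportion to h, which is consistent with the error being of third order in the distance to the base point.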

22.2 Taylor’s Theorem

In this section we give a general version of Taylor's theorem. To do so we introduce a notation called "multi-indices." This notation is very convenient in avoiding excessively cumbersome notations in partial differential equations, and it comes in handy in this general exposition of Taylor's theorem on $\mathbb{R}^n$.


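Definition 22.5. A multi-index is a vector

$$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$$

whose components are non-negative integers. The notation

$$\alpha \geq \beta$$

indicates that $\alpha_i \geq \beta_i$ for each i. For any multi-index α, we define

$$|\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_n,$$

$$\alpha! = \alpha_1!\,\alpha_2! \cdots \alpha_n!.$$

For any vector $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$, we set

$$\mathbf{x}^\alpha = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}.$$

For a smooth function $f : \mathbb{R}^n \to \mathbb{R}$ we define

$$D^\alpha f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}.$$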

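Example 22.6. For example, if α = (1, 2), then

$$|\alpha| = 1 + 2 = 3,$$

$$\alpha! = 1!\,2! = 2,$$

if $\mathbf{x} = (x, y)$ then

$$\mathbf{x}^\alpha = x y^2,$$

and finally,

$$D^\alpha u = \frac{\partial^3 u}{\partial x\, \partial y^2}.$$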

Example 22.7. General polynomials on $\mathbb{R}^n$ have a very compact notation using multi-indices. A polynomial of degree k can be written

$$p(\mathbf{x}) = \sum_{|\alpha| \leq k} a_\alpha \mathbf{x}^\alpha,$$

where the $a_\alpha$ are constants. If we spell this out in detail for a quadratic polynomial in $\mathbb{R}^2$ we get

$$\begin{aligned}
p(x, y) &= a_{(0,0)} x^0 y^0 + a_{(1,0)} x^1 y^0 + a_{(0,1)} x^0 y^1 + a_{(2,0)} x^2 y^0 + a_{(1,1)} x^1 y^1 + a_{(0,2)} x^0 y^2 \\
&= a + bx + cy + dx^2 + exy + fy^2.
\end{aligned}$$
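The bookkeeping behind multi-indices is mechanical, so it can be sketched in a few lines of code. The helper names below (`multi_indices`, `alpha_factorial`, `power`) are our own; the sketch simply enumerates every α with |α| ≤ k and evaluates α! and x^α as in Definition 22.5.

```python
from itertools import product
from math import factorial

def multi_indices(n, k):
    """All multi-indices alpha with n components and |alpha| <= k."""
    return [alpha for alpha in product(range(k + 1), repeat=n)
            if sum(alpha) <= k]

def alpha_factorial(alpha):
    """alpha! = alpha_1! * alpha_2! * ... * alpha_n!."""
    result = 1
    for a in alpha:
        result *= factorial(a)
    return result

def power(x, alpha):
    """x^alpha = x_1^alpha_1 * x_2^alpha_2 * ... * x_n^alpha_n."""
    result = 1
    for xi, a in zip(x, alpha):
        result *= xi ** a
    return result

# The six multi-indices of a quadratic polynomial in two variables.
indices = multi_indices(2, 2)
print(indices)

# The quantities from Example 22.6 with alpha = (1, 2).
alpha = (1, 2)
print(sum(alpha), alpha_factorial(alpha), power((3, 2), alpha))
```

For n = 2 and k = 2 the enumeration produces six multi-indices, matching the six coefficients a through f in the expanded quadratic above.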


The multi-index notation makes the Taylor polynomial very easy to write.

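Definition 22.8. Let $\Omega \subset \mathbb{R}^n$ be the domain of a $C^k$ function $f : \Omega \to \mathbb{R}$. Let $\mathbf{x}_0$ be an interior point of Ω. Then the Taylor polynomial of degree k of f at $\mathbf{x}_0$ is

$$p_f^k(\mathbf{x}_0, \mathbf{x}) = \sum_{|\alpha| \leq k} \frac{D^\alpha f(\mathbf{x}_0)}{\alpha!} (\mathbf{x} - \mathbf{x}_0)^\alpha.$$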

We now state without proof the basic results for Taylor polynomials. These are essentially the same as those for functions with one-dimensional domains. The first states that the Taylor polynomial of order k interpolates the derivatives of f up to order k.

Theorem 22.9. The partial derivatives up to order k of the Taylor polynomial of degree k of f at $\mathbf{x}_0$ agree with those of f at $\mathbf{x}_0$. That is, if |α| ≤ k then

$$D^\alpha p_f^k(\mathbf{x}_0, \mathbf{x}_0) = D^\alpha f(\mathbf{x}_0).$$

The second is our basic convergence result.

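Theorem 22.10. Let $\Omega \subset \mathbb{R}^n$ be the domain of a $C^{k+1}$ function $f : \Omega \to \mathbb{R}$. Let $\mathbf{x}_0$ be an interior point of Ω. Then

$$\lim_{\mathbf{x} \to \mathbf{x}_0} \frac{\left| p_f^k(\mathbf{x}_0, \mathbf{x}) - f(\mathbf{x}) \right|}{\|\mathbf{x} - \mathbf{x}_0\|^k} = 0.$$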

Problems

Problem 22.1. For the following functions, compute their second-order Taylor polynomial at the given $\mathbf{x}_0$.

(a) $f(x, y) = x^2y^3 + 5xy^2 + 7x^2y + xy$, $(x_0, y_0) = (1, -2)$.

(b) $f(x, y) = e^{x^2 + 4y^2}$, $(x_0, y_0) = (0, 0)$.

(c) $f(x, y, z) = x^2y^2z^3 + 4x^2yz - 6xyz + 2x + 3y - 5z$, $(x_0, y_0, z_0) = (2, 1, -1)$.

(d) $f(x, y, z) = \cos(x + 3y - z)$, $(x_0, y_0, z_0) = (0, 0, 0)$.

(e) $f(w, x, y, z) = wxyz$, $(w_0, x_0, y_0, z_0) = (1, 2, -1, -2)$.

(f) $f(w, x, y, z) = \sin(x - y^2) + \ln(w^2 + z^2)$, $(w_0, x_0, y_0, z_0) = (1, 0, 0, 1)$.


Chapter 23

Max-Min Problems

In this chapter we consider one of the more important topics in analysis: optimization. The goal is to find the point or points in a domain where a real-valued function is maximized or minimized. While the basic theoretical results on this subject are pretty straightforward, the technical difficulties in many applications are formidable. Indeed, many large universities have several courses devoted to applied optimization in disciplines such as engineering and operations research. In this treatment we content ourselves with the theoretical basics, though some technical difficulties are touched on in the examples and problems.

We begin with some basic definitions.


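Definition 23.1. Let $\Omega \subset \mathbb{R}^n$ be the domain of a function $f : \Omega \to \mathbb{R}$.

• We say that $\mathbf{x}_0 \in \Omega$ is a global minimizer and $f(\mathbf{x}_0)$ is the global minimum (or global min) of f if

$$f(\mathbf{x}_0) \leq f(\mathbf{x}) \quad \text{for all } \mathbf{x} \in \Omega.$$

• We say that $\mathbf{x}_0 \in \Omega$ is a strict global minimizer and $f(\mathbf{x}_0)$ is the strict global minimum of f if

$$f(\mathbf{x}_0) < f(\mathbf{x}) \quad \text{for all } \mathbf{x} \in \Omega \text{ such that } \mathbf{x} \neq \mathbf{x}_0.$$

• We say that $\mathbf{x}_0 \in \Omega$ is a local minimizer and $f(\mathbf{x}_0)$ is a local minimum (or local min) of f if there exists ε > 0 such that

$$f(\mathbf{x}_0) \leq f(\mathbf{x}) \quad \text{for all } \mathbf{x} \in \Omega \text{ such that } \|\mathbf{x} - \mathbf{x}_0\| < \epsilon.$$

• We say that $\mathbf{x}_0 \in \Omega$ is a strict local minimizer and $f(\mathbf{x}_0)$ is a strict local minimum of f if there exists ε > 0 such that

$$f(\mathbf{x}_0) < f(\mathbf{x}) \quad \text{for all } \mathbf{x} \in \Omega \text{ such that } 0 < \|\mathbf{x} - \mathbf{x}_0\| < \epsilon.$$

• Global and local maximizers and maxima are defined with the reverse inequality, e.g. $\mathbf{x}_0$ is a global maximizer and $f(\mathbf{x}_0)$ is a global maximum of f if

$$f(\mathbf{x}_0) \geq f(\mathbf{x}) \quad \text{for all } \mathbf{x} \in \Omega.$$

• Minimizers and maximizers are called extremizers. Minima and maxima are called extrema.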

Remark 23.2. Note that we have used different language for the extreme values of the function (minimum, maximum, min, max) and the points in the domain at which the extreme values are achieved (minimizer, maximizer). This distinction is sometimes ignored in the literature, so readers should be careful to determine the difference from context.

Remark 23.3. Every global minimizer is also a local minimizer, but a function can have local minimizers that are not global minimizers.

Our first goal is to state a general existence theorem.

Theorem 23.4. Let $\Omega \subset \mathbb{R}^n$ be the domain of a function $f : \Omega \to \mathbb{R}$. If Ω is closed and bounded and f is continuous on Ω then Ω contains both a global minimizer and a global maximizer of f.


We will not prove this theorem since the proof depends on concepts that are not covered in this text¹. However, it is worth noting that each of the hypotheses is crucial.

• The continuous function f(x) = x has no maximizer or minimizer on the closed, unbounded set $\mathbb{R}$.

• The continuous function f(x) = tan x has no maximizer or minimizer on the open, bounded set $(-\frac{\pi}{2}, \frac{\pi}{2})$.

• The discontinuous function

$$f(x) = \begin{cases} 0, & x = \pm 1, \\ x, & -1 < x < 1, \end{cases}$$

has no minimizer or maximizer on the closed, bounded set [−1, 1].

Figure 23.1: A discontinuous function that does not achieve its maximum or minimum on a closed bounded set. (Axes: x and f(x).)

As was noted earlier, Theorem 23.4 can be used to prove Lemma 17.14. We can also use it for the following.

Lemma 23.5. Suppose A is a positive definite n × n matrix. Then there exists K > 0 such that

$$\mathbf{x}^T A \mathbf{x} \geq K \|\mathbf{x}\|^2$$

for all $\mathbf{x} \in \mathbb{R}^n$.

Proof. Note that for any $\mathbf{x} \neq \mathbf{0}$ we can write

$$\mathbf{x}^T A \mathbf{x} = \mathbf{e}^T A \mathbf{e}\, \|\mathbf{x}\|^2$$

where we have defined

$$\mathbf{e} = \frac{\mathbf{x}}{\|\mathbf{x}\|}.$$

If we now consider the continuous function

$$f(\mathbf{e}) = \mathbf{e}^T A \mathbf{e}$$

defined over the closed, bounded sphere of unit vectors

$$S = \{ \mathbf{e} \in \mathbb{R}^n \mid \|\mathbf{e}\| = 1 \}$$

then Theorem 23.4 says that there exists a global minimizer $\mathbf{e}_0 \in S$. We define

$$K = f(\mathbf{e}_0) = \mathbf{e}_0^T A \mathbf{e}_0$$

and note that K must be strictly positive since A is positive definite. Since

$$\mathbf{e}^T A \mathbf{e} \geq K$$

for all $\mathbf{e} \in S$ we have

$$\mathbf{x}^T A \mathbf{x} = \mathbf{e}^T A \mathbf{e}\, \|\mathbf{x}\|^2 \geq K \|\mathbf{x}\|^2$$

for all $\mathbf{x} \neq \mathbf{0}$. Since equality holds at $\mathbf{x} = \mathbf{0}$ this completes the proof.

¹ The proof depends on the "completeness" of the real numbers (using the concepts of "supremum" and "infimum") to establish the existence of a "minimizing sequence" of points in Ω. It then uses the "compactness" of the closed, bounded set Ω to establish the convergence of a minimizing sequence to $\mathbf{x}_0 \in \Omega$. Finally it uses the continuity of f to show that $\mathbf{x}_0$ is a (global) minimizer. The concepts of completeness, compactness, and continuity are covered in many books on "Advanced Calculus." (See, e.g. Abbott [1].)
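The constant K in the lemma can be estimated in two dimensions by sampling the unit circle. This is only a rough numerical sketch under our own assumptions: the matrix A below is an arbitrary positive definite example, and sampling can only produce an estimate that is at least as large as the true minimum over the circle.

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]   # positive definite (eigenvalues 1 and 3)

def quad_form(A, x):
    """Evaluate the quadratic form x^T A x for a 2-vector x."""
    return sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

def estimate_K(A, samples=3600):
    """Approximate the minimum of e^T A e over unit vectors e in R^2."""
    angles = (2.0 * math.pi * k / samples for k in range(samples))
    return min(quad_form(A, (math.cos(t), math.sin(t))) for t in angles)

K = estimate_K(A)
print(K)

# Check x^T A x >= K ||x||^2 (up to rounding) at a few arbitrary points.
points = [(1.0, 0.0), (1.0, -1.0), (3.0, 4.0), (-2.0, 5.0)]
holds = [quad_form(A, p) >= K * (p[0]**2 + p[1]**2) - 1e-9 for p in points]
print(holds)
```

For this particular A the quadratic form on the unit circle is 2 + sin 2t, so the estimate lands very close to 1, which is the smaller eigenvalue of A.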

Theorem 23.4 tells us that if our domain is closed and bounded, then the maximizer and minimizer of any continuous function are there to be found. Unfortunately, it doesn't give any idea how we should look for them. It turns out that the methods for seeking extremizers can be very different depending on whether we look at interior points or boundary points of the domain. The theory for finding interior extremizers is very clear and easy to apply. We will cover it thoroughly in this text in the sections on the first and second derivative tests. The problem of finding extremizers on the boundary of a set can depend heavily on the smoothness of the boundary (e.g. if the boundary has corners). Furthermore, there are a variety of specialized methods for finding boundary extremizers if the boundary has specific shapes (e.g. if it is composed of planes or lines). While we will consider a few simple exercises of this form, we will leave most of the specialized techniques to courses and texts that focus on optimization. We will cover (briefly and incompletely) a technique called "Lagrange Multipliers" that is useful on smooth boundaries.

23.1 First Derivative Test

We begin our examination of the theory of interior extremizers with the first derivative test. This is an obvious extension of the test for functions of a single variable, and in fact our proof will depend on the single variable test.


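Theorem 23.6. Let $\Omega \subset \mathbb{R}^n$ be the domain of a $C^1$ function $f : \Omega \to \mathbb{R}$. If $\mathbf{x}_0 \in \Omega$ is an interior point of Ω and a local minimizer or maximizer of f then

$$\nabla f(\mathbf{x}_0) = \mathbf{0}.$$

Proof. Suppose for the sake of contradiction that $\mathbf{x}_0$ is a local minimizer of f and $\nabla f(\mathbf{x}_0) \neq \mathbf{0}$. We then define the function of a single variable

$$\tilde{f}(t) = f(\mathbf{x}_0 + t \nabla f(\mathbf{x}_0)).$$

Since $\mathbf{x}_0$ is an interior point of Ω, this function is well defined on some open interval around t = 0. (If $\mathbf{x}_0$ is an interior point of Ω and $\mathbf{v}$ is any vector, then $\mathbf{x}_0 + t\mathbf{v} \in \Omega$ for t sufficiently small.) Furthermore, t = 0 is a local minimizer of $\tilde{f}(t)$ since, for t sufficiently small,

$$\tilde{f}(0) = f(\mathbf{x}_0) \leq f(\mathbf{x}_0 + t \nabla f(\mathbf{x}_0)) = \tilde{f}(t)$$

because $\mathbf{x}_0$ is a local minimizer of f. Thus, by the first derivative test for functions of a single variable and the chain rule,

$$0 = \frac{d\tilde{f}}{dt}(0) = \nabla f(\mathbf{x}_0 + t \nabla f(\mathbf{x}_0)) \cdot \nabla f(\mathbf{x}_0)\Big|_{t=0} = \|\nabla f(\mathbf{x}_0)\|^2.$$

This contradicts the assumption that $\nabla f(\mathbf{x}_0) \neq \mathbf{0}$. A similar proof works for local maximizers of f.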

Example 23.7. The origin is a strict global minimizer of the smooth function

$$f(x, y) = x^2 + y^2$$

since

$$0 = f(0, 0) < x^2 + y^2 = f(x, y)$$

for all $(x, y) \neq (0, 0)$. As the theorem states, the gradient

$$\nabla f(x, y) = (2x, 2y)$$

is (0, 0) at the origin. See Figure 23.2.

Points satisfying the first derivative test are important enough that we give them a name.

Definition 23.8. Let $\Omega \subset \mathbb{R}^n$ be the domain of a $C^1$ function $f : \Omega \to \mathbb{R}$. We say that an interior point $\mathbf{x}_0 \in \Omega$ is a critical point of f if

$$\nabla f(\mathbf{x}_0) = \mathbf{0}.$$

Figure 23.2: The parabolic function $f(x, y) = x^2 + y^2$ is minimized at the origin. (Plots of $x^2 + y^2$ with axes x and y ranging over [−1, 1].)

Remark 23.9. The first derivative test requirement $\nabla f(\mathbf{x}_0) = \mathbf{0}$ is a necessary condition for $\mathbf{x}_0$ to be an interior local (or global) minimizer or maximizer. However, as we recall from one-dimensional calculus, $\mathbf{x}_0$ can satisfy the first derivative test and not be a local minimizer. Thus, the first derivative test is not a sufficient condition to ensure that $\mathbf{x}_0$ is a local minimizer.

Example 23.10. Note that the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by

$$f(x, y) = x^2 - y^2$$

has the gradient

$$\nabla f(x, y) = \begin{pmatrix} 2x \\ -2y \end{pmatrix}.$$

Thus it has a critical point at $(x_0, y_0) = (0, 0)$. However, this point cannot be a local minimizer or maximizer since f increases along the x-axis and decreases along the y-axis as we move away from the origin. See Figure 23.3. The characteristic shape of this graph leads to the following definition.

Definition 23.11. A critical point which is neither a local maximizer nor a local minimizer is called a saddle point.

The second derivative test will provide us with an additional necessary condition for an extremum and, in some cases, a sufficient condition.

23.2 Second Derivative Test

We begin with the second derivative “necessary condition.”

Figure 23.3: The saddle $f(x, y) = x^2 - y^2$ with a critical point at the origin which is not a minimizer or maximizer. (Plots of $x^2 - y^2$ with axes x and y ranging over [−1, 1].)

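Theorem 23.12. Let $\Omega \subset \mathbb{R}^n$ be the domain of a $C^2$ function $f : \Omega \to \mathbb{R}$. If $\mathbf{x}_0 \in \Omega$ is an interior point of Ω and a local minimizer (maximizer) of f then the Hessian matrix $H(\mathbf{x}_0)$ is positive semidefinite (negative semidefinite).

Proof. Suppose for the sake of contradiction that $\mathbf{x}_0$ is an interior minimizer of f and $H(\mathbf{x}_0)$ is not positive semidefinite. Since this is so, there is a unit vector $\mathbf{e} \in \mathbb{R}^n$ such that

$$\mathbf{e}^T H(\mathbf{x}_0) \mathbf{e} < 0.$$

Since $\mathbf{x}_0$ is an interior local minimizer the first derivative test tells us that $\nabla f(\mathbf{x}_0) = \mathbf{0}$. Using this, we compute the second-order Taylor polynomial of f at $\mathbf{x}_0$

$$q_f(\mathbf{x}_0, \mathbf{x}) = f(\mathbf{x}_0) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T H(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0).$$

By Taylor's theorem this has the property that

$$\frac{r(\mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|^2} = \frac{f(\mathbf{x}) - q_f(\mathbf{x}_0, \mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|^2}$$

goes to zero as $\|\mathbf{x} - \mathbf{x}_0\| \to 0$.

We now look along the line $\mathbf{x} = \mathbf{x}_0 + t\mathbf{e}$ (where $\|\mathbf{x} - \mathbf{x}_0\| = |t|$) and compute (with a little manipulation)

$$f(\mathbf{x}_0 + t\mathbf{e}) - f(\mathbf{x}_0) = t^2 \left( \frac{1}{2}\mathbf{e}^T H(\mathbf{x}_0) \mathbf{e} + \frac{r(\mathbf{x}_0 + t\mathbf{e})}{t^2} \right).$$

Since $\mathbf{x}_0$ is a local minimizer, this should be nonnegative for all t sufficiently small. However, since $\mathbf{e}^T H(\mathbf{x}_0) \mathbf{e}$ is strictly negative (and constant) and

$$\frac{r(\mathbf{x}_0 + t\mathbf{e})}{\|t\mathbf{e}\|^2} = \frac{r(\mathbf{x}_0 + t\mathbf{e})}{t^2}$$

goes to zero as t → 0, the sum must be strictly negative for all t sufficiently small, a contradiction.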

Our eigenvalue condition on positive (and negative) definite matrices (Theorem 21.23) immediately leads us to the following.

Corollary 23.13. If at a critical point $\mathbf{x}_0$ the Hessian matrix $H(\mathbf{x}_0)$ has a strictly positive eigenvalue then $\mathbf{x}_0$ cannot be a local maximizer. If it has a strictly negative eigenvalue then $\mathbf{x}_0$ cannot be a local minimizer. If it has both strictly positive and strictly negative eigenvalues then $\mathbf{x}_0$ must be a saddle.

Example 23.14. Let

$$f(x, y) = xy.$$

Then

$$\nabla f(x, y) = \begin{pmatrix} y \\ x \end{pmatrix}.$$

So the origin is the only critical point. To use the second derivative test we compute

$$H(x, y) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

We then compute its eigenvalues

$$\det(H - \lambda I) = \det\begin{pmatrix} -\lambda & 1 \\ 1 & -\lambda \end{pmatrix} = \lambda^2 - 1 = (\lambda - 1)(\lambda + 1).$$

So the Hessian matrix has one positive eigenvalue λ = 1 and one negative eigenvalue λ = −1. Since it cannot be either a positive semidefinite or a negative semidefinite matrix, the origin (the only critical point) cannot be a local minimizer or maximizer and must be a saddle. This can be seen in Figure 23.4.

Figure 23.4: The graph of the saddle function $f(x, y) = xy$. (Plots of $xy$ with axes x and y ranging over [−1, 1].)


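As we said, the conditions above are necessary conditions for a local minimizer or maximizer. A stronger condition on the second derivative gives us a sufficient condition.

Theorem 23.15. Let $\Omega \subset \mathbb{R}^n$ be the domain of a $C^2$ function $f : \Omega \to \mathbb{R}$. Suppose an interior point $\mathbf{x}_0 \in \Omega$ is a critical point of f. If in addition the Hessian matrix $H(\mathbf{x}_0)$ is positive definite (negative definite) then $\mathbf{x}_0$ is a strict local minimizer (maximizer) of f.

Proof. As in the proof of Theorem 23.12, we can use the fact that $\mathbf{x}_0$ is a critical point to write

$$f(\mathbf{x}) - f(\mathbf{x}_0) = \frac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T H(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) + r(\mathbf{x})$$

where

$$r(\mathbf{x}) = f(\mathbf{x}) - q_f(\mathbf{x}_0, \mathbf{x}).$$

Our goal is to show that $f(\mathbf{x}) - f(\mathbf{x}_0)$ is strictly positive for $\mathbf{x} \neq \mathbf{x}_0$ sufficiently close to $\mathbf{x}_0$. To see this we apply Lemma 23.5 to the positive definite matrix $\frac{1}{2}H(\mathbf{x}_0)$ to show that there exists K > 0 such that

$$f(\mathbf{x}) - f(\mathbf{x}_0) \geq K\|\mathbf{x} - \mathbf{x}_0\|^2 + r(\mathbf{x}) = \|\mathbf{x} - \mathbf{x}_0\|^2 \left( K + \frac{r(\mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|^2} \right).$$

Since Taylor's theorem tells us that

$$\frac{r(\mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|^2} \to 0$$

as $\|\mathbf{x} - \mathbf{x}_0\| \to 0$ and since K > 0 we see that this quantity must be strictly positive for $\|\mathbf{x} - \mathbf{x}_0\|$ sufficiently small and nonzero.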


Example 23.16. Consider the function

$$f(x, y) = \frac{1}{4}\left( (1 - x^2)^2 + (1 - y^2)^2 \right)$$

graphed in Figure 23.5. We begin by finding its critical points, setting

$$\nabla f(x, y) = \left( -x(1 - x^2),\; -y(1 - y^2) \right) = (0, 0).$$

Solving these two equations gives us nine critical points

$$(0, 0),\ (0, 1),\ (0, -1),\ (1, 0),\ (1, 1),\ (1, -1),\ (-1, 0),\ (-1, 1),\ (-1, -1).$$

We can determine which are maxima, minima, and saddles by computing the Hessian matrix

$$H(x, y) = \begin{pmatrix} -1 + 3x^2 & 0 \\ 0 & -1 + 3y^2 \end{pmatrix}.$$

Since this is a diagonal matrix, its eigenvalues are simply the diagonal elements. Thus, the second derivative test tells us the following.

Figure 23.5: The graph of the function $f(x, y) = \frac{1}{4}\left((1 - x^2)^2 + (1 - y^2)^2\right)$. (Axes: x, y, and f.)

• The point (0, 0) is a strict local maximizer since

$$H(0, 0) = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$$

has a negative eigenvalue λ = −1 of multiplicity 2 and hence is negative definite.

• The four points (1, 1), (1, −1), (−1, 1), (−1, −1) are strict local minimizers since the Hessian matrix at each point

$$H(1, 1) = H(-1, 1) = H(1, -1) = H(-1, -1) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$

has a positive eigenvalue λ = 2 of multiplicity 2 and hence is positive definite. (In fact, these points are global minimizers since the function, which is a sum of squares, vanishes there and is strictly positive at all other points. However, we cannot get this fact directly from the first and second derivative tests.)

• The four points (1, 0), (−1, 0), (0, 1), (0, −1) are saddles since the Hessian matrix at each point has one positive eigenvalue λ = 2 and one negative eigenvalue λ = −1 and hence is indefinite.

$$H(0, 1) = H(0, -1) = \begin{pmatrix} -1 & 0 \\ 0 & 2 \end{pmatrix}, \qquad H(1, 0) = H(-1, 0) = \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix}.$$
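The classification above can be automated. The sketch below is our own illustration: it uses the fact that differentiating the gradient components $-x + x^3$ and $-y + y^3$ gives a diagonal Hessian with entries $-1 + 3x^2$ and $-1 + 3y^2$, and the function names `hessian_diagonal` and `classify` are hypothetical.

```python
from itertools import product

def hessian_diagonal(x, y):
    """Diagonal entries (the eigenvalues) of the Hessian of
    f(x, y) = ((1 - x^2)^2 + (1 - y^2)^2) / 4."""
    return (-1 + 3 * x * x, -1 + 3 * y * y)

def classify(eigenvalues):
    """Apply the second derivative test to a list of Hessian eigenvalues."""
    if all(lam > 0 for lam in eigenvalues):
        return "strict local minimizer"
    if all(lam < 0 for lam in eigenvalues):
        return "strict local maximizer"
    if any(lam > 0 for lam in eigenvalues) and any(lam < 0 for lam in eigenvalues):
        return "saddle"
    return "inconclusive"

# The nine critical points: each coordinate is -1, 0, or 1.
critical_points = list(product((-1, 0, 1), repeat=2))
result = {p: classify(hessian_diagonal(*p)) for p in critical_points}

for point, kind in sorted(result.items()):
    print(point, kind)
```

The output lists one strict local maximizer at the origin, four strict local minimizers at the corners, and four saddles at the remaining points.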

The contour plot of the function (graphed in Figure 23.6) is quite characteristic of the various types of critical points. The extremizers are surrounded by closed level curves while the saddles lie on "degenerate" level sets (which look like two distinct curves crossing).

Figure 23.6: A contour plot of the function $f(x, y) = \frac{1}{4}\left((1 - x^2)^2 + (1 - y^2)^2\right)$. (Axes: x and y, each ranging over [−1, 1].)

Example 23.17. When the Hessian matrix at a critical point is semidefinite (all of the eigenvalues are of one sign except for some that are zero) the second derivative test gives us limited information. The critical point might be a maximizer or a minimizer (depending on the sign of the other eigenvalues) but it might also be a saddle. For example, the functions

$$f(x, y) = x^2 + y^4,$$

$$g(x, y) = x^2 - y^4,$$

both have a critical point at the origin. At that point the Hessian of both is the positive semidefinite matrix

$$H(0, 0) = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}.$$

However, it is easy to see that f has a local minimum at the origin while g has a saddle.
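A few function evaluations near the origin make the difference between the two functions concrete. This is a small sketch with a step size h and sample points that we picked arbitrarily.

```python
def f(x, y):
    """f(x, y) = x^2 + y^4, which has a local minimum at the origin."""
    return x**2 + y**4

def g(x, y):
    """g(x, y) = x^2 - y^4, which has a saddle at the origin."""
    return x**2 - y**4

h = 0.01
samples = [(h, 0.0), (0.0, h), (-h, 0.0), (0.0, -h), (h, h)]

f_values = [f(x, y) for x, y in samples]
g_values = [g(x, y) for x, y in samples]
print(f_values)
print(g_values)
```

Since both functions vanish at the origin, the sign of each value shows whether the function rose or fell. Every value of f is positive, while g is positive along the x-axis and negative along the y-axis.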

23.3 Lagrange Multipliers

As we indicated above, many important max/min problems involve optimizing a function not in the interior of some nice domain, but over some bizarrely shaped surface, or even the intersection of several surfaces. One of the many techniques for addressing such a problem is called the method of Lagrange multipliers. In this short section we will give a brief glimpse of this technique, stating a rather limited (but still quite useful) theorem, giving some of the basic ideas of a proof of the theorem, and giving some hints of how these results might be extended. For a more complete treatment see, for example, [8].

The technical statement of the method of Lagrange multipliers can seem intimidating, but in its simplest form can be quite intuitive. The basic problem involves finding the maximum or minimum of a function $f(\mathbf{x})$ called the objective function over a set defined as the level set of a constraint function $g(\mathbf{x}) = c$. Since the constraint set is typically an (n − 1)-dimensional "hypersurface" in $\mathbb{R}^n$ we are not looking for an interior extremum, so we can no longer use the first and second derivative tests of the previous section. Let's look at a simple example and see if we can't guess a condition that a minimizer or maximizer would satisfy. Figure 23.7 displays the graph of an objective function f(x, y). We are trying to maximize and minimize f, not over the plane but over a constraint curve of the form g(x, y) = c. In Figure 23.8 the constraint curve (in bold) is superimposed on a contour plot of the level curves of f. If you compare the graph of f to the level curves, it's not hard to pick out approximately where the maximum and minimum values of f occur along the constraint curve. (When I'm teaching a class on this subject, I usually select some "volunteer" to come pick the points from the picture.) Moreover, if we "blow up" our graph, most people trying to guess the point at which f reaches its maximum choose (consciously or not) according to the following principle.

Figure 23.7: Graph of the objective function. (Axes: x, y, and f.)

• If f is at a local maximum or minimum along the constraint curve g = c then the level curve of f must be tangent to the constraint curve.

In fact, it's not hard to imagine writing a rigorous proof based on this conjecture, at least in two dimensions². However, it would be nice to have an analytic condition (something that would be easier to calculate than the geometric condition above). Fortunately, our knowledge of the relationship between the gradient and level surfaces suggests an answer.

² If the two curves are not tangent, then they intersect "transversely" and split the constraint curve into two branches lying on opposite sides of the level curve of f. Thus f must be lower on one branch of the constraint curve and higher on the other. (All of this assumes that f and g are smooth and have nonzero gradients at the point in question.)

Figure 23.8: Level curves of the objective function with the constraint set (a level curve of the constraint function) superimposed.

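• If f is at a local maximum or minimum along the constraint curve g = c then ∇f must be parallel to ∇g at that point.

This suggests the following theorem, which we state in $\mathbb{R}^n$.

Theorem 23.18. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$ are $C^1$ and $\mathbf{x}_0 \in \mathbb{R}^n$ is a local minimizer of f over the constraint set $g(\mathbf{x}) = c$. That is, $g(\mathbf{x}_0) = c$, and there exists ε > 0 such that for all $\mathbf{x} \in \mathbb{R}^n$ such that $g(\mathbf{x}) = c$ and $\|\mathbf{x} - \mathbf{x}_0\| < \epsilon$ we have

$$f(\mathbf{x}_0) \leq f(\mathbf{x}).$$

Suppose also that $\nabla g(\mathbf{x}_0) \neq \mathbf{0}$. Then there exists $\lambda \in \mathbb{R}$ such that

$$\nabla f(\mathbf{x}_0) = \lambda \nabla g(\mathbf{x}_0).$$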

Example 23.19. Consider the problem of finding the extrema of the objective function

$$f(x, y) = xy$$

over the constraint set

$$g(x, y) = \frac{x^2}{4} + \frac{y^2}{9} = 1.$$

Since the constraint set (an ellipse) is closed and bounded and the objective function is continuous, we know that its maximum and minimum values are attained on the ellipse. By the theorem above the extremizers must satisfy

$$\nabla f(x, y) = \begin{pmatrix} y \\ x \end{pmatrix} = \lambda \nabla g(x, y) = \lambda \begin{pmatrix} \dfrac{2x}{4} \\[2mm] \dfrac{2y}{9} \end{pmatrix}.$$

Figure 23.9: Hyperbolic level curves of the objective function $f(x, y) = xy$ with the elliptical constraint set given by $g(x, y) = x^2/4 + y^2/9 = 1$ superimposed. (Axes: x from −2 to 2 and y from −3 to 3.)

This gives us two equations in three unknowns

y = λx/2,x = 2λy/9.

A third equation is supplied by the constraint equations g(x, y) = 1.Solving the first equation above for λ and substituting into the second gives

us x = 4y2

9x or x2 = 49y

2. This implies

x = ±23y.

Plugging this into the constraint equation gives us

2y2

9= 1,

ory = ± 3√

2.

This gives us four critical points

\[ (x, y) = \left(\tfrac{2}{\sqrt{2}}, \tfrac{3}{\sqrt{2}}\right),\ \left(-\tfrac{2}{\sqrt{2}}, \tfrac{3}{\sqrt{2}}\right),\ \left(\tfrac{2}{\sqrt{2}}, -\tfrac{3}{\sqrt{2}}\right),\ \left(-\tfrac{2}{\sqrt{2}}, -\tfrac{3}{\sqrt{2}}\right). \]

Since the minimizer and maximizer must exist and must be among these critical points, we simply need to evaluate the objective function at each of the points and see where it is largest and smallest. We see that the maximum must occur at two points since f(2/√2, 3/√2) = f(−2/√2, −3/√2) = 3. Similarly, the minimum must occur at two points since f(−2/√2, 3/√2) = f(2/√2, −3/√2) = −3.


Example 23.20. Consider the problem of trying to minimize the linear objective function

f(x, y, z) = 2x − 6y + 4z

over the sphere of radius √14 in R3,

g(x, y, z) = x^2 + y^2 + z^2 = 14.

Our Lagrange multiplier critical points are defined by the equations

\[ \nabla f(x, y, z) = \begin{pmatrix} 2 \\ -6 \\ 4 \end{pmatrix} = \lambda \nabla g(x, y, z) = \lambda \begin{pmatrix} 2x \\ 2y \\ 2z \end{pmatrix}. \]

Solving the first of these three equations for λ yields

\[ \lambda = \frac{1}{x}. \]

Substituting this into the other two equations yields

\[ y = -3x, \qquad z = 2x. \]

Substituting these into the constraint equation gives us

\[ 14x^2 = 14 \]

or x = ±1. This gives us the two critical points (x, y, z) = (1, −3, 2) and (x, y, z) = (−1, 3, −2). Evaluating the objective function at these two points reveals that its minimum on the constraint set is f(−1, 3, −2) = −28. (The other critical point yields the maximum value.)
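The arithmetic in this example is easy to confirm by direct evaluation. The sketch below is only a check of the two critical points; the names `grad_g` and `results` are illustrative and do not come from the text.

```python
def f(x, y, z):
    # Linear objective function from the example.
    return 2 * x - 6 * y + 4 * z

def grad_g(x, y, z):
    # Gradient of the constraint function g(x, y, z) = x^2 + y^2 + z^2.
    return (2 * x, 2 * y, 2 * z)

grad_f = (2, -6, 4)
results = []
for x in (1, -1):
    point = (x, -3 * x, 2 * x)  # y = -3x and z = 2x
    lam = 1 / x                 # lambda = 1/x from the first equation
    on_sphere = sum(c * c for c in point) == 14
    parallel = all(a == lam * b for a, b in zip(grad_f, grad_g(*point)))
    results.append((point, f(*point), on_sphere, parallel))

for point, value, on_sphere, parallel in results:
    print(point, "f =", value, "on constraint:", on_sphere, "gradients parallel:", parallel)
```

Both points satisfy the constraint and the multiplier equation, and the objective takes the values 28 and −28 at them.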

Problems

Problem 23.1. Find all critical points of the functions below. Apply the second derivative test to the critical points. State whether the second derivative test tells us that the critical point is a local maximizer, local minimizer, or a saddle. Be clear to distinguish whether the second derivative test tells you that the critical point in question must be an extremizer or might be an extremizer.
(a) f(x, y) = 4x^2 + 4xy − 4x + 6y^2 + 4y + 7.
(b) f(x, y) = x^3 + 3x^2 y − 3xy^2 + y^3 − 3.
(c) f(x, y) = x^2 e^{−x^2} + y^2 e^{−y^2}.
(d) f(x, y) = x ln x + y ln y.
(e) f(x, y, z) = sin(x^2 + y^2 + z^2).
(f) f(x, y, z) = e^{x^3 + y^3 + z^3}.
(g) f(x, y, z) = e^{x^2 + y^2 − z^2}.


Problem 23.2. Suppose (x1, y1), (x2, y2), . . . , (xn, yn) are a collection of n points in the plane. For any m, b ∈ R we define the sum of the square errors between the points and the line y = mx + b to be

\[ E(m, b) = \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2. \]

Find a formula in terms of x1, . . . , xn and y1, . . . , yn for the m and b that minimize this error. The line y = mx + b is the least squares fit to the given data points.

Problem 23.3. Use the ideas developed in Problem 23.2 to find a formula for constants a, b, and c that determine the “best fitting” parabola y = ax^2 + bx + c to the collection of data points (x1, y1), (x2, y2), . . . , (xn, yn).

Problem 23.4. Find the length L and width W of the rectangle of area A with minimum perimeter. Solve the problem in two ways: first, by solving the constraint equation for one of the two variables and substituting this into the objective function; second, by Lagrange multipliers.

Problem 23.5. Find the volume of the largest parallelepiped that can be inscribed in the ellipsoid

\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1. \]

Problem 23.6. Find the extreme points of the functions f below, subject to the specified constraint. Graph the constraint curve and level sets of the objective function in each case.
(a) f(x, y) = 4x − 2y, subject to x^2 + y^2 = 4.
(b) f(x, y) = xy, subject to 3x + 2y = 6.
(c) f(x, y) = y, subject to x^2 + 2y^2 = 3.
(d) f(x, y) = x^2 + y^2, subject to x = 4.
(e) f(x, y) = y − x, subject to x = cos y.
(f) f(x, y) = x^2 + y^2, subject to xy^2 = 1.

Problem 23.7. Use Lagrange multipliers to find the (minimum) distance from the point (1, −2, 1) to the plane x − 2y + 3z = 2 and find the point on the plane where that minimum distance is attained. Show that the vector from that point to the point (1, −2, 1) is perpendicular to the plane. Hint: Minimizing the square of the distance might be easier.


Chapter 24

Nonlinear Systems of Equations

Solving nonlinear problems is hard, no matter if they involve algebraic equations, differential equations or something more exotic. In this chapter we examine some problems in nonlinear algebraic equations. We do this not because these problems are terribly important (though they are). We do it because these problems will give us concrete and rigorous examples of the following approach to nonlinear problems that is useful in a number of different contexts.

1. We find a specific solution to our nonlinear problem.

2. We compute the linear approximation of the nonlinear problem at that solution.

3. We determine whether the “linearized” problem has unique solutions.

4. If the linearized problem has unique solutions we show that the nonlinear problem has a unique solution for problems “close” to the solution found in the first step.

24.1 The Inverse Function Theorem

How do we solve a system of n nonlinear equations in n unknowns of the form

\[ f(x) = \begin{pmatrix} f_1(x_1, \dots, x_n) \\ f_2(x_1, \dots, x_n) \\ \vdots \\ f_n(x_1, \dots, x_n) \end{pmatrix} = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{pmatrix} = p? \]

Of course, there is no truly general answer. That is unfortunate, since this is exactly the form of a coordinate transformation. We would like to be able to determine if such a transformation is invertible (both one-to-one and onto). As we will see, we won’t be able to give a general answer to the problem, but we will get something reasonably close.

Before approaching the nonlinear problem, let’s consider the linear case. When can we solve a system of the form

f(x) = Ax = p

where A is an n × n matrix? Fortunately, this is just the situation of Theorem 6.17, which tells us that this problem has a unique solution for every p ∈ Rn (that is, the function f is invertible) exactly when the n × n matrix A is invertible.¹

How does this help us in solving the general nonlinear problem? Well, we know that we can define a linear approximation of any nonlinear function. Since we know a lot about solving linear problems let us see what happens when we replace f with its linear approximation. Suppose f(x0) = p0. That is, we have a solution x0 for a particular p0. The linear approximation of f at x0 is defined to be

lf(x0; x) = f(x0) + Df(x0)(x − x0) = p0 + Df(x0)(x − x0).

So the approximate linear problem to f(x) = p is given by

lf(x0; x) = p,

which reduces to

Df(x0)(x − x0) = p − p0.

This has a unique solution for every p if and only if the n × n matrix Df(x0) is invertible. In this case

x = x0 + Df(x0)^{-1}(p − p0).

Of course, one of the most common tests for invertibility of the matrix Df(x0) is to see if its determinant is nonzero. This determinant is important enough to give it a special name.

Definition 24.1. Let Ω ⊂ Rn and suppose f : Ω → Rn is differentiable at x0 ∈ Ω. We define the Jacobian of f at x0 to be

Jf(x0) = det Df(x0).

We also use the notation

\[ \frac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)}(x_0) = Jf(x_0) = \begin{vmatrix} \frac{\partial f_1}{\partial x_1}(x_0) & \cdots & \frac{\partial f_1}{\partial x_n}(x_0) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1}(x_0) & \cdots & \frac{\partial f_n}{\partial x_n}(x_0) \end{vmatrix}. \]

¹ As we would hope, the two definitions of the word “invertible” are compatible.

The inverse function theorem says that this condition for the existence of a solution of the approximate linear problem is sufficient to guarantee that the original nonlinear problem has a unique solution for x and p close to x0 and p0.

Theorem 24.2 (Inverse function theorem). Let Ω ⊂ Rn be the domain of a C1 function

f : Ω → Rn.

Suppose that

f(x0) = p0,

and suppose that the n × n matrix Df(x0) is invertible, that is,

\[ \frac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)}(x_0) \neq 0. \]

Then there is a ball Vε1(x0) around x0 and a ball Vε2(p0) around p0 such that for every p ∈ Vε2(p0) the equation

f(x) = p

has a unique solution x ∈ Vε1(x0). Furthermore, the inverse function

x : Vε2(p0) → Vε1(x0)

satisfying

f(x(p)) = p

for all p ∈ Vε2(p0) is C1, and

Dx(p) = Df(x(p))^{-1},

so

\[ \frac{\partial(x_1, \dots, x_n)}{\partial(p_1, \dots, p_n)}(p) = \frac{1}{\dfrac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)}(x(p))}. \]

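Example 24.3. Consider the system of equations

\[ f(x, y) = \begin{pmatrix} x^2 - y^2 \\ xy \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix}. \]

Note that the total derivative matrix is

\[ Df(x, y) = \begin{pmatrix} 2x & -2y \\ y & x \end{pmatrix}. \]

The Jacobian is

\[ \frac{\partial(f_1, f_2)}{\partial(x, y)}(x, y) = 2(x^2 + y^2). \]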

Thus, at every point except the origin the matrix is invertible. The inverse function theorem says that other than at the origin, if f(x0, y0) = (u0, v0) then for (u, v) sufficiently close to (u0, v0) there exists a unique (x, y) close to (x0, y0) such that

f(x, y) = (u, v).

Note that this does not preclude the possibility that there may be more than one solution “far away” from the original solution. In fact, we have

f(x, y) = f(−x,−y)

so there is always another solution on the other side of the origin.
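To see the local solvability in this example numerically, one can repeat the linearized solve x = x0 + Df(x0)^{-1}(p − p0) until it settles down (Newton's iteration). The following is a minimal sketch under stated assumptions: the starting solution (x0, y0) = (2, 1), the nearby target (u, v) = (3.1, 1.9), and the helper name `solve_near` are all chosen here for illustration and do not come from the text.

```python
def f(x, y):
    # The map from the example.
    return (x * x - y * y, x * y)

def solve_near(x0, y0, u, v, steps=20):
    """Repeatedly solve the linearized problem Df(x, y) (dx, dy) = (u, v) - f(x, y)."""
    x, y = x0, y0
    for _ in range(steps):
        fu, fv = f(x, y)
        ru, rv = u - fu, v - fv
        # Df(x, y) = [[2x, -2y], [y, x]] has determinant 2(x^2 + y^2).
        det = 2.0 * (x * x + y * y)
        # The inverse of [[a, b], [c, d]] is (1/det) [[d, -b], [-c, a]].
        dx = (x * ru + 2.0 * y * rv) / det
        dy = (-y * ru + 2.0 * x * rv) / det
        x, y = x + dx, y + dy
    return x, y

# Known solution: f(2, 1) = (3, 2). Perturb the right-hand side slightly.
x, y = solve_near(2.0, 1.0, 3.1, 1.9)
print("solution near (2, 1):", (x, y), "f =", f(x, y))
print("mirror solution:", (-x, -y), "f =", f(-x, -y))
```

The iteration finds a solution close to (2, 1), and its mirror image through the origin solves the same system, as the identity f(x, y) = f(−x, −y) predicts.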

24.2 The Implicit Function Theorem

The implicit function theorem concerns the problem of “solving” algebraic systems where there are more unknowns than equations, say n equations in n + k unknowns. The equations would have the form

f(u, v) = 0

where f ∈ Rn, u ∈ Rk and v ∈ Rn. We usually refer to such systems as “underdetermined.” We don’t expect to be able to solve for all of the unknowns uniquely. The best we can hope for is to solve for n of the unknowns in terms of the remaining k unknowns. If we can do this, we say that our n equations define an “implicit” function v = g(u) from k unknowns u to the remaining n unknowns v.

As we did in the previous section, let us examine the linear case, both to get some ideas on reasonable conditions for the existence of a solution and to introduce some new notation. A general linear problem of n equations in n + k unknowns can be written in the form

Au + Bv = b

where A is an n × k matrix, u ∈ Rk, B is an n × n matrix, v ∈ Rn, and b ∈ Rn. We think of A, B, and b as constant and u and v as unknown. Once again, the issue of solvability can be addressed directly by Theorem 6.17. These equations can be solved uniquely by an implicit function v = g(u) if and only if B is invertible. In this case we can define

v = g(u) = −B^{-1}Au + B^{-1}b,

and a simple computation shows that

Au + Bg(u) = b.
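The “simple computation” can also be carried out numerically. Here is a minimal sketch for n = 2 and k = 1 that builds g(u) = B^{-1}(b − Au), which is the same as −B^{-1}Au + B^{-1}b, and checks that Au + Bg(u) − b vanishes. The particular matrices and the helpers `inv2`, `matvec`, and `residual` are assumptions made for illustration.

```python
A = [[1.0], [2.0]]            # n x k matrix with n = 2, k = 1
B = [[2.0, 1.0], [1.0, 3.0]]  # n x n matrix with determinant 5, so B is invertible
b = [4.0, -1.0]

def inv2(M):
    # Inverse of a 2 x 2 matrix via the adjugate formula.
    (p, q), (r, s) = M
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[s / det, -q / det], [-r / det, p / det]]

def matvec(M, x):
    # Matrix-vector product.
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

B_inv = inv2(B)

def g(u):
    # v = g(u) = B^{-1}(b - A u) = -B^{-1} A u + B^{-1} b
    Au = matvec(A, u)
    return matvec(B_inv, [bi - ai for bi, ai in zip(b, Au)])

def residual(u):
    # A u + B g(u) - b, which should be the zero vector.
    v = g(u)
    return [p + q - c for p, q, c in zip(matvec(A, u), matvec(B, v), b)]

for u in ([0.0], [1.5], [-2.0]):
    print("u =", u, "g(u) =", g(u), "residual =", residual(u))
```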


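As above, we wish to apply our conditions for the solution of the linear problem to the linear approximation to the general nonlinear problem. To do this we will introduce some new notation. Given an n × k matrix A and an n × n matrix B we can define an n × (n + k) partitioned matrix or block matrix by

\[ C = \begin{pmatrix} A & B \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1k} & b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nk} & b_{n1} & \cdots & b_{nn} \end{pmatrix}, \]

and given u ∈ Rk and v ∈ Rn we can define a partitioned vector in Rn+k by

\[ x = \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} u_1 \\ \vdots \\ u_k \\ v_1 \\ \vdots \\ v_n \end{pmatrix}. \]

Then the system of equations above can be written in the form

\[ Cx = \begin{pmatrix} A & B \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = Au + Bv = b. \]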

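When the independent variable of a nonlinear function is written as a partitioned vector it is natural to write the total derivative matrix of the function as a partitioned matrix. For instance, suppose that as above we write a function f : Rn+k → Rn as

f(u, v)

where f ∈ Rn, u ∈ Rk and v ∈ Rn. We define Duf(u0, v0) to be the n × k matrix

\[ D_u f(u_0, v_0) = \begin{pmatrix} \frac{\partial f_1}{\partial u_1}(u_0, v_0) & \cdots & \frac{\partial f_1}{\partial u_k}(u_0, v_0) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial u_1}(u_0, v_0) & \cdots & \frac{\partial f_n}{\partial u_k}(u_0, v_0) \end{pmatrix} \]

and Dvf(u0, v0) to be the n × n matrix

\[ D_v f(u_0, v_0) = \begin{pmatrix} \frac{\partial f_1}{\partial v_1}(u_0, v_0) & \cdots & \frac{\partial f_1}{\partial v_n}(u_0, v_0) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial v_1}(u_0, v_0) & \cdots & \frac{\partial f_n}{\partial v_n}(u_0, v_0) \end{pmatrix}. \]

Then the n × (n + k) total derivative matrix Df(u0, v0) can be written as the partitioned matrix

\[ Df(u_0, v_0) = \begin{pmatrix} D_u f(u_0, v_0) & D_v f(u_0, v_0) \end{pmatrix}, \]

and the linear approximation of f can be written

lf((u0, v0); (u, v)) = Duf(u0, v0)(u − u0) + Dvf(u0, v0)(v − v0) + f(u0, v0).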

Looking ahead a bit, we define the Jacobian determinant

\[ \frac{\partial(f_1, \dots, f_n)}{\partial(v_1, \dots, v_n)}(u_0, v_0) = \begin{vmatrix} \frac{\partial f_1}{\partial v_1}(u_0, v_0) & \cdots & \frac{\partial f_1}{\partial v_n}(u_0, v_0) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial v_1}(u_0, v_0) & \cdots & \frac{\partial f_n}{\partial v_n}(u_0, v_0) \end{vmatrix}. \]

We now return to the original nonlinear problem f(u, v) = 0. Suppose we know one solution f(u0, v0) = 0. Then the approximate linear problem at that solution is

lf((u0, v0); (u, v)) = Duf(u0, v0)(u − u0) + Dvf(u0, v0)(v − v0) = 0.

Comparing to the general linear problem above, we see that if Dvf(u0, v0) is invertible, that is if

\[ \frac{\partial(f_1, \dots, f_n)}{\partial(v_1, \dots, v_n)}(u_0, v_0) \neq 0, \]

then we can define

g(u) = v0 − Dvf(u0, v0)^{-1} Duf(u0, v0)(u − u0).

Simply plugging in to the approximate linear problem, we see that for any u ∈ Rk this satisfies

lf((u0, v0); (u, g(u))) = 0.

As in the case of the inverse function theorem, the implicit function theorem says that we can go further. If this condition for solvability of the linearized problem is satisfied, then the original nonlinear problem can be solved “close” to the initial solution.

Theorem 24.4 (Implicit function theorem). Let Ω ⊂ Rn+k be the domain of a C1 function f : Ω → Rn. Suppose that there is a u0 ∈ Rk and v0 ∈ Rn such that the interior point (u0, v0) ∈ Ω satisfies

f(u0, v0) = 0.

If in addition, the n × n matrix

\[ D_v f(u_0, v_0) = \begin{pmatrix} \frac{\partial f_1}{\partial v_1}(u_0, v_0) & \cdots & \frac{\partial f_1}{\partial v_n}(u_0, v_0) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial v_1}(u_0, v_0) & \cdots & \frac{\partial f_n}{\partial v_n}(u_0, v_0) \end{pmatrix} \]

is invertible, i.e.

\[ \frac{\partial(f_1, \dots, f_n)}{\partial(v_1, \dots, v_n)}(u_0, v_0) \neq 0, \]

then there is a ball Vε(u0) ⊂ Rk about u0 and a continuous function g : Vε(u0) → Rn such that

g(u0) = v0,

and for every u ∈ Vε(u0)

f(u, g(u)) = 0.
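A small scalar illustration (n = 1, k = 1) may help make the theorem concrete. The sketch below takes f(u, v) = u^2 + v^2 − 1 with the solution (u0, v0) = (0.6, 0.8), where Dvf = 2v0 = 1.6 ≠ 0. It computes g(u) by Newton's iteration in v and compares it with the linearized formula g(u) ≈ v0 − Dvf^{-1} Duf (u − u0). The choice of f, the base point, and the names `g` and `g_linear` are assumptions of this example; the theorem itself does not prescribe a way to compute g.

```python
import math

def f(u, v):
    # Example constraint: the unit circle u^2 + v^2 = 1.
    return u * u + v * v - 1.0

def dfdu(u, v):
    return 2.0 * u

def dfdv(u, v):
    return 2.0 * v

u0, v0 = 0.6, 0.8  # known solution, with dfdv(u0, v0) = 1.6 != 0

def g(u, steps=30):
    # Solve f(u, v) = 0 for v by Newton's iteration, starting from v0.
    v = v0
    for _ in range(steps):
        v -= f(u, v) / dfdv(u, v)
    return v

def g_linear(u):
    # Solution of the linearized problem at (u0, v0).
    return v0 - dfdu(u0, v0) / dfdv(u0, v0) * (u - u0)

for u in (0.5, 0.6, 0.61, 0.7):
    print(f"u = {u:.2f}  g(u) = {g(u):.6f}  linearized = {g_linear(u):.6f}  "
          f"exact = {math.sqrt(1.0 - u * u):.6f}")
```

Near u0 the implicit function agrees with the upper branch sqrt(1 − u^2) of the circle, and the linearized solution matches it to first order.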

Problems

Problem 24.1. Let f : R2 → R2 be given by

\[ f(x_1, x_2) = \begin{pmatrix} x_1^2 + 2x_2^2 \\ 2x_1^2 + x_2^2 \end{pmatrix}. \]

(a) Calculate the Jacobian

\[ \frac{\partial(f_1, f_2)}{\partial(x_1, x_2)}. \]

(b) At what points (x1, x2) ∈ R2 does the inverse function theorem guarantee that f is locally invertible?
(c) Calculate

\[ \frac{\partial(x_1, x_2)}{\partial(f_1, f_2)}(-1, 2). \]

(d) Calculate the inverse function in the open second quadrant, x1 < 0, x2 > 0.

Problem 24.2. Let f : R2 → R2 be given by

\[ f(x_1, x_2) = \begin{pmatrix} 3x_1^2 - 2x_2^2 \\ x_1^2 + x_2^2 \end{pmatrix}. \]

(a) Calculate the Jacobian

\[ \frac{\partial(f_1, f_2)}{\partial(x_1, x_2)}. \]

(b) At what points (x1, x2) ∈ R2 does the inverse function theorem guarantee that f is locally invertible?
(c) Calculate

\[ \frac{\partial(x_1, x_2)}{\partial(f_1, f_2)}(-2, -1). \]

(d) Calculate the inverse function in the open fourth quadrant, x1 < 0, x2 > 0.

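Problem 24.3. Calculate

\[ \frac{\partial(x, y, z)}{\partial(r, \theta, z)} \quad \text{and} \quad \frac{\partial(r, \theta, z)}{\partial(x, y, z)} \]

for the cylindrical coordinate transformation

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \\ z \end{pmatrix}. \]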

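Problem 24.4. Calculate

\[ \frac{\partial(x, y, z)}{\partial(\rho, \theta, \phi)} \quad \text{and} \quad \frac{\partial(\rho, \theta, \phi)}{\partial(x, y, z)} \]

for the spherical coordinate transformation of Section 13.3.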

Problem 24.5. Show that the equations

\[ f(u_1, u_2, v_1, v_2, v_3) = \begin{pmatrix} u_1 + 3u_2 - v_1 + v_2 \\ -u_1 + 2u_2 + 2v_1 + 2v_2 \\ 3u_1 - u_2 - 3v_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \]

define an implicit function g : R2 → R3 where f(u, g(u)) = 0. Find g explicitly.


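Problem 24.6. Show that the equations

\[ f(u_1, u_2, u_3, v_1, v_2) = \begin{pmatrix} 5u_1 + 3u_2 - 2u_3 - 3v_1 + 2v_2 \\ 4u_1 - 2u_2 + u_3 + v_1 - 2v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]

define an implicit function g : R3 → R2 where f(u, g(u)) = 0. Find g explicitly.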

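Problem 24.7. Show that the equations

\[ f(u_1, u_2, v_1, v_2) = \begin{pmatrix} u_1^2 u_2 + u_1 u_2^2 + v_1^2 - v_2^2 \\ e^{u_1 + u_2} - v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]

define an implicit function g : R2 → R2 where f(u, g(u)) = 0 in a neighborhood of the point (u0, v0) = (0, 0, 1, 1).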


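Problem 24.8. Show that the equations

\[ f(u_1, u_2, v) = \begin{pmatrix} e^{u_1} \cos v - u_2 \\ e^{u_1} \sin v - u_2 + 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]

define an implicit function g : R2 → R where f(u, g(u)) = 0 in a neighborhood of the point (u0, v0) = (0, 1, 0).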

Problem 24.9. An (n − 1)-dimensional surface in Rn is usually described in one of three ways:

1. As a level set of a function: φ(x1, . . . , xn) = λ.

2. As a graph: xi = ψ(x1, . . . , xi−1, xi+1, . . . , xn).

3. By a parametrization, x = g(y), where g : Rn−1 → Rn.

Give conditions under which a surface can be described (at least locally) using any one of these methods. That is, under what conditions can a level set be written as a graph, a graph as a parameterization, a parameterization as a level set, etc.?

Part III

Integral Calculus of Several Variables

Chapter 25

Introduction to Integral Calculus

In this part of the book we consider integrals of vector and scalar functions defined on Rn. In doing so, we have to confront some real conceptual problems dealing with the wide variety of domains of integration possible as subsets of Rn. This issue never comes up in the calculus of a single variable. In most applications, the only “sensible” subsets of the real line over which we might wish to integrate are simple intervals. However, in Rn there are many useful types of subsets over which to integrate. We will concentrate on three:

1. n-dimensional volumes,

2. 1-dimensional curves,

3. (n− 1)-dimensional “surfaces.”

The last item might cause you to pause a bit. You should have a good idea of what a two-dimensional surface in R3 looks like. But what is a 4-dimensional “surface” in R5? More generally, what do we mean by the “dimension” of a region? How do we define its “area?”¹ Unfortunately, a complete answer to this question is beyond the scope of this book. It involves (at least) study of a subject called “measure theory” that is usually taught in more advanced analysis courses. In order to give the reader the ability to do basic integral calculations with a pretty good understanding of their theoretical basis this text contains the following elements:

1. Quick sketches of some rigorous definitions of concepts from measure theory,

2. Practical formulas for computation of various integrals,

3. Plausibility arguments (a polite way of saying “bad proofs”) connecting our practical formulas with traditional notions of length, area, and volume,

4. References to texts in measure theory that give complete, rigorous proofs of the connection between our practical formulas and the fundamental definitions of the concepts involved.

¹ Heck, what is the right word for its size? Area? Volume?

We begin our study of integral calculus by reviewing the basic results from the calculus of a single variable. Consider a real-valued function defined on a bounded interval f : [a, b] → R. We would like to define the definite integral

\[ \int_a^b f(x)\,dx \]

to be the area between the graph of f and the x-axis.

In order to do this, we are forced to ask ourselves what we really know about the concept of area. If we go back far enough, all definitions of area can be derived from the definition of the area of a rectangle. In elementary integral calculus we define the area under a curve by approximating the area by a collection of rectangles called a Riemann sum. This is usually done in elementary calculus texts by creating a uniform partition of the interval [a, b] by defining

\[ x_i = a + \frac{i(b - a)}{N}, \qquad i = 0, 1, 2, \dots, N, \]

for N ∈ N. This divides the interval [a, b] into N subintervals [xi−1, xi]. From each of these subintervals we choose a sample point ci ∈ [xi−1, xi]. Using these we define the Riemann sum

\[ \sum_{i=1}^{N} f(c_i)(x_i - x_{i-1}) = \sum_{i=1}^{N} f(c_i)\,\Delta x_i. \]

This is simply the sum of the areas of N rectangles with height f(ci) and width ∆xi = (xi − xi−1). (We use the convention that area below the x-axis is negative.)
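The Riemann sum just described is straightforward to compute. The following minimal sketch evaluates it on a uniform partition with three different choices of sample point. The function name `riemann_sum`, the `sample` argument, and the test function f(x) = x^2 on [0, 1] are assumptions made for illustration.

```python
def riemann_sum(f, a, b, N, sample="mid"):
    """Riemann sum of f on [a, b] over a uniform partition into N subintervals."""
    dx = (b - a) / N
    total = 0.0
    for i in range(1, N + 1):
        x_left = a + (i - 1) * dx
        x_right = a + i * dx
        # Choose the sample point c_i in [x_{i-1}, x_i].
        if sample == "left":
            c = x_left
        elif sample == "right":
            c = x_right
        else:
            c = 0.5 * (x_left + x_right)
        total += f(c) * (x_right - x_left)
    return total

def square(x):
    return x * x

sums = {}
for N in (10, 100, 1000):
    sums[N] = {s: riemann_sum(square, 0.0, 1.0, N, s) for s in ("left", "mid", "right")}
    print(N, sums[N])
```

For this increasing function the left-endpoint sums underestimate the area 1/3 and the right-endpoint sums overestimate it, and all three choices draw together as N grows.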

If the limit

\[ \lim_{N \to \infty} \sum_{i=1}^{N} f(c_i)(x_i - x_{i-1}) \]

exists and is independent of the choice of sample points, we say that the function f is Riemann integrable² and write

\[ \int_a^b f(x)\,dx = \lim_{N \to \infty} \sum_{i=1}^{N} f(c_i)(x_i - x_{i-1}). \]

² More advanced texts (see, e.g., [1, Chapter 7]) differ in several ways from elementary presentations of the integral. For instance, the integral is defined using arbitrary rather than uniform partitions. And the concepts of greatest lower bound and least upper bound are used in place of sequential limits. These changes allow for a much more rigorous presentation, but are somewhat less intuitive.

Figure 25.1: Approximating the area under a curve with a Riemann sum of the areas of rectangles.

The obvious question then arises: which functions are Riemann integrable? Fortunately, one can show that every continuous function is Riemann integrable. This result can be extended to functions with simple discontinuities. (These results are often stated without proof in elementary texts since a rigorous proof usually uses a concept called “uniform continuity” which is seldom covered in elementary courses.)

Once we know that the definite integral or the area under a curve is well defined for a large class of functions we are left with the problem of trying to calculate it. The fundamental theorem of calculus provides us with a relatively easy way of performing this task. While we won’t be discussing vector calculus analogs of the fundamental theorem until Part IV, we will be using the one-dimensional version to calculate integrals in Rn, so we review it here.³

Theorem 25.1. Suppose f : [a, b] → R is Riemann integrable. If F : [a, b] → R satisfies F′(x) = f(x) for all x ∈ [a, b] then

\[ \int_a^b f(x)\,dx = F(b) - F(a). \]
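As a quick illustration of the theorem, the sketch below compares midpoint Riemann sums of f(x) = cos x on [0, π/2] with F(b) − F(a) for the antiderivative F(x) = sin x. The example function and the helper name `midpoint_sum` are assumptions chosen here; any integrable f with a known antiderivative would serve.

```python
import math

def midpoint_sum(f, a, b, N):
    # Riemann sum on a uniform partition with midpoints as sample points.
    dx = (b - a) / N
    return sum(f(a + (i - 0.5) * dx) for i in range(1, N + 1)) * dx

a, b = 0.0, math.pi / 2.0
exact = math.sin(b) - math.sin(a)  # F(b) - F(a) with F = sin, since F' = cos

approximations = {N: midpoint_sum(math.cos, a, b, N) for N in (10, 100, 1000)}
for N, value in approximations.items():
    print(f"N = {N:5d}  Riemann sum = {value:.10f}  error = {abs(value - exact):.2e}")
```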

Thus, we can calculate an integral over an interval by finding (guessing really) an “anti-derivative” of the function we are trying to integrate and evaluating it at the boundary points of the interval. We will use this technique repeatedly in Part III, and we will generalize the theorem in Part IV.

³ Actually, this is only half of the fundamental theorem. The other half says that if f is continuous on [a, b] then g(x) = ∫_a^x f(s) ds is differentiable on [a, b] and g′(x) = f(x).

Chapter 26

Riemann Volume in Rn

In this chapter, we define the n-dimensional Riemann volume of a set in Rn. This is a specific example of a measure, a type of function on a set designed to represent the size of a set. More advanced courses on the theory of integration consider more sophisticated measures that can evaluate the size of rather strange sets. We consider one such measure in Chapter 29.

Let Ω ⊂ Rn be a bounded region. For a given N ∈ N we create a uniform grid over all of Rn. We define

\[ x_{k, i_k} = \frac{i_k}{N}, \qquad i_k = 0, \pm 1, \pm 2, \dots, \qquad k = 1, 2, \dots, n. \]

This grid of order N divides Rn into rectangles (specifically cubes). For indices ik = 0, ±1, ±2, . . . , k = 1, 2, . . . , n we label the rectangles

\[ R_{i_1, i_2, \dots, i_n} = \{ (x_1, x_2, \dots, x_n) \in R^n \mid x_{k, i_k - 1} \le x_k \le x_{k, i_k},\ k = 1, 2, \dots, n \}. \]

The volume of each n-dimensional rectangle is

\[ \Delta V_N = \Delta x_{1, i_1} \Delta x_{2, i_2} \cdots \Delta x_{n, i_n} = (x_{1, i_1} - x_{1, i_1 - 1})(x_{2, i_2} - x_{2, i_2 - 1}) \cdots (x_{n, i_n} - x_{n, i_n - 1}) = \frac{1}{N^n}. \]

We now define the following subsets of the collection of grid rectangles.

1. We say that Ri1,i2,...,in is an inner rectangle of Ω if it lies completely in Ω. That is,

Ri1,i2,...,in ⊂ Ω.

We use CI(Ω) to denote the union of all the inner rectangles of Ω. We let KI,N(Ω) be the number of inner rectangles in the grid of order N. This number must be finite since Ω is bounded.

2. We say that Ri1,i2,...,in is an outer rectangle of Ω if there is at least one point in Ω inside of Ri1,i2,...,in. That is,

Ri1,i2,...,in ∩ Ω ≠ ∅.

We use CO(Ω) to denote the union of all the outer rectangles of Ω. We let KO,N(Ω) be the number of outer rectangles in the grid of order N. Again, this number must be finite since Ω is bounded.

Note the following.

• Every inner rectangle is also an outer rectangle. Furthermore,

CI(Ω) ⊆ Ω ⊆ CO(Ω),

and

KI,N(Ω) ≤ KO,N(Ω).

• The volume of CI(Ω) is simply the sum of the volumes of all the rectangles that are included in the set. Since we have a uniform grid, this has a simple formula

\[ \sum_{R_{i_1, i_2, \dots, i_n} \subset C_I(\Omega)} \Delta V_N = \Delta V_N K_{I,N}(\Omega). \]

• Similarly, the volume of CO(Ω) is given by

\[ \sum_{R_{i_1, i_2, \dots, i_n} \subset C_O(\Omega)} \Delta V_N = \Delta V_N K_{O,N}(\Omega). \]

We can now define the volume of Ω.

Definition 26.1. If

\[ V(\Omega) = \lim_{N \to \infty} \Delta V_N K_{I,N}(\Omega) = \lim_{N \to \infty} \Delta V_N K_{O,N}(\Omega) \]

(that is, if both limits exist and are equal) then we say V(Ω) is the n-dimensional Riemann volume of Ω ⊂ Rn.

Example 26.2. Let Ω be the triangle

Ω = {(x, y) ∈ R2 | 0 ≤ y ≤ x, 0 ≤ x ≤ 1}.

We divide the plane into a uniform grid. The inner rectangles all lie below the diagonal line. We can count them using the identity

\[ 1 + 2 + 3 + \cdots + (n - 2) + (n - 1) = \frac{n(n - 1)}{2}. \]

We get

\[ K_{I,N}(\Omega) = \frac{N(N - 1)}{2}. \]

Figure 26.1: Inner and outer rectangles of a triangle.

Counting the outer rectangles (don’t forget the diagonal row of rectangles that touch the triangle at one corner) gives us

\[ K_{O,N}(\Omega) = 4(N + 1) + \frac{N(N - 1)}{2}. \]

We now calculate the 2-dimensional Riemann volume (area)

\[ V(\Omega) = \lim_{N \to \infty} \Delta V_N K_{I,N}(\Omega) = \lim_{N \to \infty} \frac{N(N - 1)}{2N^2} = \lim_{N \to \infty} \Delta V_N K_{O,N}(\Omega) = \lim_{N \to \infty} \frac{8(N + 1) + N(N - 1)}{2N^2} = \frac{1}{2}. \]

This is, of course, the same result as the traditional formula for the area of the triangle.
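The two counts in this example can be confirmed by brute force. The sketch below works in units of 1/N so that all tests use exact integer arithmetic: the rectangle with indices (i, j) becomes [i − 1, i] × [j − 1, j] and the triangle becomes 0 ≤ y ≤ x ≤ N. The function name `count_rectangles` and the containment and intersection tests are assumptions of this sketch, derived from the definitions of inner and outer rectangles for closed grid rectangles.

```python
def count_rectangles(N):
    """Count inner and outer grid rectangles of the triangle 0 <= y <= x <= 1."""
    inner = outer = 0
    # Indices slightly beyond the triangle are enough to catch every outer rectangle.
    for i in range(-1, N + 3):
        for j in range(-1, N + 3):
            x_lo, x_hi, y_lo, y_hi = i - 1, i, j - 1, j
            # Inner: the rectangle lies in the triangle. The binding corner is
            # (x_lo, y_hi), which must satisfy y <= x.
            if x_lo >= 0 and x_hi <= N and y_lo >= 0 and y_hi <= x_lo:
                inner += 1
            # Outer: after clipping to 0 <= x <= N and y >= 0, some point must
            # satisfy y <= x; the best candidate is (largest x, smallest y).
            cx_lo, cx_hi = max(x_lo, 0), min(x_hi, N)
            cy_lo = max(y_lo, 0)
            if cx_lo <= cx_hi and cy_lo <= y_hi and cy_lo <= cx_hi:
                outer += 1
    return inner, outer

for N in (1, 2, 5, 10, 100):
    k_inner, k_outer = count_rectangles(N)
    print(N, k_inner, k_outer, k_inner / N**2, k_outer / N**2)
```

The counts agree with the formulas for KI,N(Ω) and KO,N(Ω), and both scaled counts approach 1/2 as N grows.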

Example 26.3. Consider the line

L = {(x, 0, 0) ∈ R3 | x ∈ (0, 1)}.

To see that this one-dimensional object in R3 has zero 3-dimensional Riemann volume note that there are no inner rectangles in any grid of order N. (So the volume of CI is zero.) The line is surrounded by four rows of outer rectangles so that

CO(L) = {(x, y, z) ∈ R3 | x ∈ [−1/N, 1 + 1/N], y, z ∈ [−1/N, 1/N]}.

The volume of CO(L) is 4(1 + 2/N)/N^2. This goes to zero in the limit, so the 3-dimensional Riemann volume of L is zero.

It is pretty easy to see (if not prove) that all “lower-dimensional” objects in Rn will have zero n-dimensional Riemann volume. Thus, we will need another tool if we are to distinguish between the size of curves, surfaces, and other such objects.

Remark 26.4. In this exposition we have used a uniform grid on Rn. This makes our notation slightly easier to read and makes the grid easy to visualize. However, it isn’t the most general way of setting up an appropriate grid. In addition, once the more advanced machinery necessary for rigorous proofs of our theorems is set up, the insistence on a uniform grid can make the proofs somewhat harder. All of the definitions above can be adapted to rectangular grids with variable side lengths as long as the length of the longest side goes to zero.

Problems

Problem 26.1. Use the definition of Riemann volume to calculate the area of the open unit square S = {(x, y) ∈ R2 | 0 < x < 1, 0 < y < 1}. That is, find explicit formulas for the volume of the inner and outer rectangles in a grid of order N and show that these volumes have a common limit.

Problem 26.2. Use the definition of Riemann volume to calculate the volume of the closed unit cube S = {(x, y, z) ∈ R3 | 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ z ≤ 1}. That is, find explicit formulas for the volume of the inner and outer rectangles in a grid of order N and show that these volumes have a common limit.

Problem 26.3. Let a > 0 and b > 0 be given. Use the definition of Riemann volume to calculate the area of the open rectangle S = {(x, y) ∈ R2 | 0 < x < a, 0 < y < b}. That is, find explicit formulas for the volume of the inner and outer rectangles in a grid of order N and show that these volumes have a common limit.

Problem 26.4. Consider the set Ω = {(x, y) ∈ R2 | 0 < x < 1, 0 < y < 1, x, y ∈ Q} of all points in the unit square with rational coordinates. Find explicit formulas for the volume of the inner and outer rectangles in a grid of order N. Show that these volumes do not have a common limit. Conclude that Ω does not have a well-defined Riemann area.

Chapter 27

Integrals Over Volumes in Rn

In this chapter we will define the integral of a real-valued function over regions with nonzero n-dimensional volume in Rn.

27.1 Basic Definitions and Properties

We begin with the basic definition of the Riemann integral. Let Ω ⊂ Rn be a bounded region with well defined, strictly positive Riemann volume. Let f : Ω → R be a real-valued function. As in the previous section we define a uniform grid of order N on Rn, and we let CI(Ω) be the collection of inner rectangles contained in Ω. In each rectangle Ri1,i2,...,in ⊂ CI(Ω) we choose a sample point

ci1,i2,...,in ∈ Ri1,i2,...,in.

We can now define a Riemann sum over the inner rectangles

\[ I(\Omega, f, N, c) = \sum_{C_I(\Omega)} f(c_{i_1, i_2, \dots, i_n})\,\Delta V_N. \]

Here the sum is over the finite collection of rectangles in CI(Ω). Our notation emphasizes that this sum depends on the domain Ω, the function f, the grid of order N, and set of sample points c.

Definition 27.1. If the limit

\[ \lim_{N \to \infty} I(\Omega, f, N, c) \]

exists and is independent of the choice of sample points, we say that the function f is Riemann integrable on Ω. We write

\[ \int_{\Omega} f\,dV = \lim_{N \to \infty} I(\Omega, f, N, c). \]

Remark 27.2. It is important to note that our definition of the integral is based on the fundamental notion of the volume of rectangular boxes. Thus, it is crucial that we have used a Cartesian coordinate system to describe the domain and range of the function.

Remark 27.3. There are many notations for integrals:

• We will use dV for the differential element unless we wish to emphasize the dependence of the integral on the Cartesian coordinate system, in which case we will use dV(x1, x2, . . . , xn).

• In the case of integrals over sets in R2 we will use the symbol dA rather than dV.

• A variety of other symbols are used for the differential element, such as

dV ∼ dVn ∼ dx ∼ dx1 dx2 . . . dxn.

Some dispense with it altogether, and in the present context it doesn’t really add any information that is not given by the integral sign and the specification of the domain and the function. However, as we collect a variety of types of integrals over different domains, a little redundant information can be helpful.

• While we have defined the integral over an n-dimensional volume in Rn using a single integral symbol regardless of the dimension of the domain, it is very common to use two integral signs for Riemann integrals over regions Ω ⊂ R2

\[ \iint_{\Omega} f\,dA \sim \int_{\Omega} f\,dA \]

and three integral signs for Riemann integrals over regions Ω ⊂ R3

\[ \iiint_{\Omega} f\,dV \sim \int_{\Omega} f\,dV. \]

While such reminders can be helpful, this can clearly become cumbersome in higher dimensions. Furthermore, the dimension of the integral is usually clear from the nature of the domain. We will use both notations in this text, choosing the one that seems to make the exposition clearer. (Of course, when we define iterated integrals below, multiple integral signs will become a necessity.)

As was the case for functions of a single real variable, one can show that a large class of functions (continuous functions) are Riemann integrable.

Theorem 27.4. Let Ω ⊂ Rn have a well defined positive Riemann volume. Suppose f : Ω → R is continuous. Then f is Riemann integrable on Ω.

We will not prove this here. The proof is given in many advanced calculus texts. For example, see [1, 5].

27.2 Basic Properties of the Integral

The Riemann sum definition of the integral allows us to deduce many basic properties. We will skip most of the proofs for the sake of brevity, but they are not all that difficult and we will note some of the basic ideas.

The first theorem involves the basic property of linearity.

Theorem 27.5. Let Ω ⊂ Rn have a well defined positive Riemann volume, and suppose f : Ω → R and g : Ω → R are Riemann integrable. Then if α and β are any constants, αf + βg is Riemann integrable and

\[ \int_{\Omega} (\alpha f + \beta g)\,dV = \alpha \int_{\Omega} f\,dV + \beta \int_{\Omega} g\,dV. \]

The proof of this follows directly from the definition of a Riemann sum. Each sum has a finite number of terms and each of the terms of the sum is linear in the function f. Thus, we can simply use the distributive law to decompose the sum and then take the limit of both sides.

The next theorem involves splitting the domain of integration up into smaller subsets.

Theorem 27.6. Suppose Ω1 and Ω2 are disjoint sets in Rn with well defined positive Riemann volume. Let

Ω = Ω1 ∪ Ω2,

and suppose f : Ω → R is Riemann integrable. Then f is Riemann integrable over each of the subsets Ω1 and Ω2 and

\[ \int_{\Omega} f\,dV = \int_{\Omega_1} f\,dV + \int_{\Omega_2} f\,dV. \]

Similarly, if f is Riemann integrable over each of the subsets Ω1 and Ω2 then f is Riemann integrable over the union Ω and the equation above holds.

While most people consider this to be “obvious,” the proof is a bit more delicate since the inner rectangles of Ω don’t split neatly into the inner rectangles of Ω1 and Ω2. We will leave it to more advanced texts. Note, however, that it implies that functions with discontinuities at the boundary of Riemann volumes are Riemann integrable if they are integrable over the relevant subsets.

The next theorem involves inequalities between integrals. It states the not too surprising result that the (generalized) volume under the graph of a big function is bigger than the volume under the graph of a small function.

Theorem 27.7. Let Ω ⊂ Rn have a well defined positive Riemann volume V(Ω), and suppose f : Ω → R and g : Ω → R are Riemann integrable. If

f(x) ≤ g(x)

at every x ∈ Ω then

\[ \int_{\Omega} f\,dV \le \int_{\Omega} g\,dV. \]

In particular, if m1 and m2 are constants such that

m1 ≤ f(x) ≤ m2

at every x ∈ Ω then

\[ m_1 V(\Omega) \le \int_{\Omega} f\,dV \le m_2 V(\Omega). \]

The proof of this follows directly from the formula for the Riemann sum.

The following result follows immediately from the previous theorem.

Corollary 27.8. Let Ω ⊂ Rn have a well defined positive Riemann volume,and suppose f : Ω→ R is Riemann integrable. Then∣∣∣∣∫

Ω

f dV

∣∣∣∣ ≤ ∫Ω

|f | dV.

Proof. Note if one wants to prove an inequality involving absolute values of theform

|a| ≤ b

one effectively needs to prove two inequalities.

−b ≤ a ≤ b.

In our case this is easy, since by the basic properties of the absolute value wehave

−|f | ≤ f ≤ |f |.

Thus, by the previous theorem

−∫

Ω

|f | dV ≤∫

Ω

f dV ≤∫

Ω

|f | dV.

This gives us our result by the observation above.

Our final result is an integral version of the mean value theorem. Itsay that a continuous function must attain it average value somewhere in thedomain of integration.

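Theorem 27.9. Let $\Omega \subset \mathbb{R}^n$ have a well defined positive Riemann volume $V(\Omega)$. Suppose $f : \Omega \to \mathbb{R}$ is continuous. Then there is a point $\mathbf{x}_0 \in \Omega$ such that

$$\int_{\Omega} f \, dV = f(\mathbf{x}_0) V(\Omega).$$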

27.3 Integrals Over Rectangular Regions

While Riemann sums (or more sophisticated methods of estimating integrals) are standard tools for computer calculations, they are not easy to use for hand calculations. Furthermore, they provide us only an estimate for the integral, not its exact value. The next two sections will give us a method of exact calculation of the integral using the one-dimensional version of the fundamental theorem of calculus. We begin with the simplest situation where the domain of the function is a rectangular region

$$R = \{ \mathbf{x} \in \mathbb{R}^n \mid a_i \le x_i \le b_i \}$$

and $f : R \to \mathbb{R}$ is a real-valued function.

As with the one-dimensional fundamental theorem, the method here is based on finding antiderivatives. However, in this case we have to find anti-partial derivatives. We say that $F : R \to \mathbb{R}$ is an antiderivative of $f$ with respect to $x_i$ if

$$\frac{\partial F}{\partial x_i} = f.$$

For example, if $f(x, y, z) = x^2 y^3 z$ then an antiderivative with respect to $x$ is $\frac{1}{3} x^3 y^3 z$ while an antiderivative with respect to $z$ is $\frac{1}{2} x^2 y^3 z^2$, and so on. If we think of the functions $x_i \mapsto f$ with all other variables fixed as functions of one variable, then the elementary fundamental theorem of calculus gives us

$$\int_{a_i}^{b_i} f(x_1, x_2, \dots, x_i, \dots, x_n) \, dx_i = F(x_1, x_2, \dots, b_i, \dots, x_n) - F(x_1, x_2, \dots, a_i, \dots, x_n).$$

We refer to this calculation as the integral of $f$ with respect to the single variable $x_i$ from $a_i$ to $b_i$.

We can use this technique of integrating a function of several variables with respect to a single variable to calculate an integral over an n-dimensional rectangle. Our next theorem says two things.

1. The integral of any Riemann integrable function over an n-dimensional rectangle can be calculated by an iterated integral in which we integrate with respect to each of the n variables, one at a time.

2. These n integrals can be performed in any order that is convenient. Every order yields the same result.

It’s worth remarking that the second part of the theorem had better be true if the first part is to be of any use. It would be pretty disquieting if the calculation of an integral depended on how we numbered the axes of our Cartesian coordinate system.

Theorem 27.10 (Fubini). If $f : R \to \mathbb{R}$ is Riemann integrable then

$$\int_{R} f(\mathbf{x}) \, dV = \int_{a_n}^{b_n} \left( \cdots \int_{a_2}^{b_2} \left( \int_{a_1}^{b_1} f(x_1, \dots, x_{n-1}, x_n) \, dx_1 \right) dx_2 \cdots \right) dx_n.$$

Furthermore, the integrations with respect to the n coordinates can be done in any order with the same result.

We won’t prove this theorem. However, it is pretty easy to see the general idea of the proof. Essentially, we can group the factors in each term of our Riemann sum so that they are arranged like the appropriate iterated integrals. For instance, for a two-dimensional example we can write the Riemann sum in the following two ways:

$$\sum_{i_1} \left( \sum_{i_2} f(\mathbf{c}_{i_1, i_2}) \, \Delta x_2 \right) \Delta x_1 = \sum_{i_2} \left( \sum_{i_1} f(\mathbf{c}_{i_1, i_2}) \, \Delta x_1 \right) \Delta x_2.$$

Of course, the trick is to prove rigorously that in the limit as the grid becomes infinitely fine this becomes

$$\int_{a_1}^{b_1} \left( \int_{a_2}^{b_2} f(x_1, x_2) \, dx_2 \right) dx_1 = \int_{a_2}^{b_2} \left( \int_{a_1}^{b_1} f(x_1, x_2) \, dx_1 \right) dx_2.$$

Though the proof is not easy, the basic approach is clear.

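Example 27.11. Let $R = \{ (x, y) \in \mathbb{R}^2 \mid 0 \le x \le 2, \ 1 \le y \le 3 \}$. We first do an iterated integral with $x$ followed by $y$.

$$\begin{aligned}
\int_{R} 6x^2 y \, dA &= \int_{1}^{3} \left( \int_{0}^{2} 6x^2 y \, dx \right) dy \\
&= \int_{1}^{3} 2x^3 y \,\Big|_{x=0}^{x=2} \, dy \\
&= \int_{1}^{3} 16y \, dy = 64.
\end{aligned}$$

Reversing the order of integration gives us the same outcome

$$\begin{aligned}
\int_{R} 6x^2 y \, dA &= \int_{0}^{2} \left( \int_{1}^{3} 6x^2 y \, dy \right) dx \\
&= \int_{0}^{2} 3x^2 y^2 \,\Big|_{y=1}^{y=3} \, dx \\
&= \int_{0}^{2} \left( 27x^2 - 3x^2 \right) dx \\
&= 8x^3 \,\Big|_{0}^{2} \\
&= 64.
\end{aligned}$$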
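The agreement of the two orders in Example 27.11 can also be checked numerically. The following is a minimal sketch in Python, not part of the original text: the helper `midpoint_1d` and the grid size `n` are illustrative choices, and the midpoint rule only approximates each one-dimensional integral, so both results should be close to 64 rather than exactly equal to it.

```python
def midpoint_1d(g, a, b, n=400):
    """Approximate the integral of g over [a, b] with an n-cell midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def f(x, y):
    # Integrand of Example 27.11.
    return 6 * x**2 * y


# Integrate with respect to x first (inner integral), then y (outer integral).
x_then_y = midpoint_1d(lambda y: midpoint_1d(lambda x: f(x, y), 0.0, 2.0), 1.0, 3.0)

# Integrate with respect to y first (inner integral), then x (outer integral).
y_then_x = midpoint_1d(lambda x: midpoint_1d(lambda y: f(x, y), 1.0, 3.0), 0.0, 2.0)

print(x_then_y, y_then_x)  # both should be close to the exact value 64
```

Nesting one-dimensional rules in this way mirrors the iterated integrals of Fubini's theorem: the inner call treats the outer variable as a fixed parameter.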

For integration of functions of a single variable, by far the most common domain of integration is an interval - the same type of domain used in the basic definition of the integral. Unfortunately, for functions of several variables, we often wish to integrate over nonrectangular volumes. This causes significant problems in calculating these integrals. In this section we give the reader the tools with which to do the job (though we only describe a few simple applications).

27.4 Integrals Over General Regions in R2

We begin with $\mathbb{R}^2$. We describe two types of regions over which the calculation of the integral is relatively easy.

Definition 27.12. Suppose $\Omega \subset \mathbb{R}^2$ has a well defined positive Riemann volume.

1. If there exist constants $a < b$ and functions $y_1 : [a, b] \to \mathbb{R}$ and $y_2 : [a, b] \to \mathbb{R}$ such that

$$\Omega = \{ (x, y) \in \mathbb{R}^2 \mid a < x < b, \ y_1(x) < y < y_2(x) \}$$

we say that $\Omega$ is simple in the y-direction, or y-simple.

2. If there exist constants $c < d$ and functions $x_1 : [c, d] \to \mathbb{R}$ and $x_2 : [c, d] \to \mathbb{R}$ such that

$$\Omega = \{ (x, y) \in \mathbb{R}^2 \mid c < y < d, \ x_1(y) < x < x_2(y) \}$$

we say that $\Omega$ is simple in the x-direction, or x-simple.

While this is the most useful form of the definition, it can be summarized as follows.

• A region is y-simple if

1. The region lies between two vertical lines,

2. Every vertical line between those two lines touches the boundary at either one or two points.

• A region is x-simple if

1. The region lies between two horizontal lines,

2. Every horizontal line between those two lines touches the boundary at either one or two points.

In Figure 27.1 we graph the y-simple domain

$$\Omega_1 = \{ (x, y) \mid -1 < x < 2, \ x^2 < y < x + 2 \}.$$

Note that this is also an x-simple domain. However, it is much easier to describe as a y-simple domain since the function bounding the domain on the left would have to be “defined piecewise,” using different formulas for different values of $y$. That is

$$\Omega_1 = \{ (x, y) \mid 0 < y < 4, \ f(y) < x < \sqrt{y} \},$$

where

$$f(y) = \begin{cases} -\sqrt{y}, & 0 < y < 1 \\ y - 2, & 1 \le y < 4. \end{cases}$$

There is nothing wrong with this, but it can make calculation more difficult.

Figure 27.1: The y-simple region $\Omega_1 = \{ (x, y) \mid -1 < x < 2, \ x^2 < y < x + 2 \}$.

Figure 27.2 displays the graph of the x-simple region

$$\Omega_2 = \{ (x, y) \mid -1 < y < 1, \ 2y^2 - 1 < x < y^2 \}.$$

Note that this is not a y-simple region since vertical lines can cross the boundary at up to four places.

Figure 27.2: The x-simple region $\Omega_2 = \{ (x, y) \mid -1 < y < 1, \ 2y^2 - 1 < x < y^2 \}$.

Our basic theorem is a version of Fubini’s theorem given above for rectangular regions.

Theorem 27.13. Suppose $\Omega \subset \mathbb{R}^2$ has a well defined positive Riemann volume and $f : \Omega \to \mathbb{R}$ is Riemann integrable.

1. If $\Omega$ is y-simple then

$$\iint_{\Omega} f \, dA = \int_{a}^{b} \left( \int_{y_1(x)}^{y_2(x)} f(x, y) \, dy \right) dx.$$

2. If $\Omega$ is x-simple then

$$\iint_{\Omega} f \, dA = \int_{c}^{d} \left( \int_{x_1(y)}^{x_2(y)} f(x, y) \, dx \right) dy.$$

The comments on the idea of the proof given for our first version of Fubini’s theorem apply here as well.

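Example 27.14. We use the y-simple region $\Omega_1$ described above to calculate

$$\begin{aligned}
\iint_{\Omega_1} 2(x + y) \, dA &= \int_{-1}^{2} \int_{x^2}^{x+2} 2(x + y) \, dy \, dx \\
&= \int_{-1}^{2} \left( 2xy + y^2 \right) \Big|_{y=x^2}^{y=x+2} \, dx \\
&= \int_{-1}^{2} \left( -x^4 - 2x^3 + 3x^2 + 8x + 4 \right) dx \\
&= \frac{189}{10}.
\end{aligned}$$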

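Example 27.15. Similarly, we can use the x-simple region $\Omega_2$ described above to calculate

$$\begin{aligned}
\iint_{\Omega_2} 2xy^2 \, dA &= \int_{-1}^{1} \int_{2y^2 - 1}^{y^2} 2xy^2 \, dx \, dy \\
&= \int_{-1}^{1} x^2 y^2 \,\Big|_{x=2y^2-1}^{x=y^2} \, dy \\
&= \int_{-1}^{1} \left( -3y^6 + 4y^4 - y^2 \right) dy \\
&= \frac{8}{105}.
\end{aligned}$$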
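The two forms of Theorem 27.13 translate directly into nested one-dimensional integrations with variable inner limits. The sketch below is an illustration only: the function names `integrate_y_simple` and `integrate_x_simple` and the midpoint rule are our own assumptions, not something defined in the text. It approximates the integrals of Examples 27.14 and 27.15, so the printed values should be close to 189/10 and 8/105.

```python
def midpoint_1d(g, a, b, n=400):
    """Approximate the integral of g over [a, b] with an n-cell midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def integrate_y_simple(f, a, b, y1, y2, n=400):
    """Iterated integral over {a < x < b, y1(x) < y < y2(x)}: y inside, x outside."""
    return midpoint_1d(
        lambda x: midpoint_1d(lambda y: f(x, y), y1(x), y2(x), n), a, b, n
    )


def integrate_x_simple(f, c, d, x1, x2, n=400):
    """Iterated integral over {c < y < d, x1(y) < x < x2(y)}: x inside, y outside."""
    return midpoint_1d(
        lambda y: midpoint_1d(lambda x: f(x, y), x1(y), x2(y), n), c, d, n
    )


# Example 27.14: region between y = x^2 and y = x + 2 for -1 < x < 2.
ex_27_14 = integrate_y_simple(
    lambda x, y: 2 * (x + y), -1.0, 2.0, lambda x: x**2, lambda x: x + 2
)

# Example 27.15: region between x = 2y^2 - 1 and x = y^2 for -1 < y < 1.
ex_27_15 = integrate_x_simple(
    lambda x, y: 2 * x * y**2, -1.0, 1.0, lambda y: 2 * y**2 - 1, lambda y: y**2
)

print(ex_27_14, 189 / 10)
print(ex_27_15, 8 / 105)
```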

27.5 Change of Order of Integration in R2

Of course, there are lots of situations where a region is simple in both directions. In that case we can compute an iterated integral in either order and get the same answer.

Figure 27.3: Triangular region of integration $\Omega$, bounded by the lines $x = 0$, $y = 3$, and $y = 3x$ (equivalently $x = y/3$).

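For instance, suppose $\Omega$ is the triangle

$$\Omega = \{ (x, y) \mid 0 < x < 1, \ 3x < y < 3 \}.$$

Of course we can also describe $\Omega$ as an x-simple region

$$\Omega = \{ (x, y) \mid 0 < y < 3, \ 0 < x < y/3 \}.$$

Let’s integrate the function $f(x, y) = 12xy^2$ using the two possible iterated integrals. We start by integrating $y$ before $x$.

$$\begin{aligned}
\iint_{\Omega} 12xy^2 \, dV &= \int_{0}^{1} \int_{3x}^{3} 12xy^2 \, dy \, dx \\
&= \int_{0}^{1} 4xy^3 \,\Big|_{y=3x}^{y=3} \, dx \\
&= \int_{0}^{1} 4x(27 - 27x^3) \, dx \\
&= \left( 54x^2 - \frac{108}{5} x^5 \right) \Big|_{0}^{1} \\
&= \frac{162}{5}.
\end{aligned}$$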

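As expected, doing the integration in the other order gives the same result.

$$\begin{aligned}
\iint_{\Omega} 12xy^2 \, dV &= \int_{0}^{3} \int_{0}^{y/3} 12xy^2 \, dx \, dy \\
&= \int_{0}^{3} 6x^2 y^2 \,\Big|_{x=0}^{x=y/3} \, dy \\
&= \int_{0}^{3} \frac{2}{3} y^4 \, dy \\
&= \frac{2}{15} y^5 \,\Big|_{0}^{3} \\
&= \frac{162}{5}.
\end{aligned}$$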

As you might expect, sometimes there are advantages to choosing one order of integration over the other. For instance, suppose we wish to integrate the function $g(x, y) = 54x \cos(y^3)$ over the triangle $\Omega$ given above. One of the iterated integrals

$$\int_{0}^{1} \int_{3x}^{3} 54x \cos(y^3) \, dy \, dx \tag{27.1}$$

cannot be integrated in closed form. However, the other order of integration is tractable.

$$\begin{aligned}
\int_{0}^{1} \int_{3x}^{3} 54x \cos(y^3) \, dy \, dx &= \int_{0}^{3} \int_{0}^{y/3} 54x \cos(y^3) \, dx \, dy \\
&= \int_{0}^{3} 27x^2 \cos(y^3) \,\Big|_{x=0}^{x=y/3} \, dy \\
&= \int_{0}^{3} 3y^2 \cos(y^3) \, dy \\
&= \sin(y^3) \,\Big|_{0}^{3} \\
&= \sin(27).
\end{aligned}$$
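Although (27.1) has no closed-form inner antiderivative, nothing prevents us from estimating it numerically, and the estimate should agree with the value obtained from the reversed order. The following sketch (an illustration under our own assumptions: a midpoint rule with a hypothetical grid size `n`) approximates both iterated integrals over the triangle and compares them with $\sin(27)$.

```python
import math


def midpoint_1d(g, a, b, n=500):
    """Approximate the integral of g over [a, b] with an n-cell midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def g(x, y):
    return 54 * x * math.cos(y**3)


# Order of (27.1): y from 3x to 3 (inner), x from 0 to 1 (outer).
dy_dx = midpoint_1d(lambda x: midpoint_1d(lambda y: g(x, y), 3 * x, 3.0), 0.0, 1.0)

# Reversed order: x from 0 to y/3 (inner), y from 0 to 3 (outer).
dx_dy = midpoint_1d(lambda y: midpoint_1d(lambda x: g(x, y), 0.0, y / 3), 0.0, 3.0)

exact = math.sin(27)
print(dy_dx, dx_dy, exact)  # all three values should be close to one another
```

The integrand oscillates quickly near $y = 3$, so a fairly fine grid is used; a coarser grid would still show agreement between the two orders, only with a larger error.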

Remark 27.16. At the end of this chapter there are several problems in which you will be asked to do iterated integrals like (27.1) where you must change the order of integration to do the computation. My best advice to you is always draw a picture of the region of integration. It is always worth the time no matter how obvious you think the change in the limits.

Of course, not all regions in the plane are simple. For such regions, our strategy is to express the domain of integration as the union of a collection of simple regions as illustrated in Figure 27.4. Examples of this are left to the problems.

Unfortunately, one can easily construct examples of domains that cannot be broken up into a finite collection of simple domains. Figure 27.5 displays a pair of exponentially decaying spiral curves. The region between them cannot be broken up into a finite collection of simple domains since the curves cross both axes infinitely often. Of course, this is rarely a problem in practice.

Figure 27.4: A non-simple region broken up into four y-simple regions.

Figure 27.5: The area between the two curves is meant to suggest an infinite spiraling domain that cannot be written as the union of a finite collection of simple domains.

27.6 Integrals over Regions in R3

In $\mathbb{R}^2$ we describe a region as simple if it lies between the graphs of two functions of one variable defined on a common interval. In $\mathbb{R}^3$ we describe a region as simple if (1) it lies between the graphs of two functions of two variables with a common domain in a plane and (2) the common domain is a simple region in the plane. Since there are three possible coordinate planes and two possible directions for the planar domain to be simple, there would be six combinations of coordinates for which we could describe a version of Fubini’s theorem for a simple region in $\mathbb{R}^3$. We will give one version and leave the rest to the reader.

Theorem 27.17. Suppose that a region $\Omega \subset \mathbb{R}^3$ can be described in the following way. There are constants

$$a < b,$$

and continuous functions $y_1 : [a, b] \to \mathbb{R}$ and $y_2 : [a, b] \to \mathbb{R}$ with

$$y_1(x) \le y_2(x)$$

for all $x \in [a, b]$. These define a domain

$$\Omega' = \{ (x, y) \in \mathbb{R}^2 \mid a \le x \le b, \ y_1(x) \le y \le y_2(x) \}.$$

On the domain $\Omega'$ there are two continuous functions $z_1 : \Omega' \to \mathbb{R}$ and $z_2 : \Omega' \to \mathbb{R}$ with

$$z_1(x, y) \le z_2(x, y)$$

for all $(x, y) \in \Omega'$. Finally, we can describe

$$\Omega = \{ (x, y, z) \in \mathbb{R}^3 \mid a \le x \le b, \ y_1(x) \le y \le y_2(x), \ z_1(x, y) \le z \le z_2(x, y) \}.$$

Then if $f : \Omega \to \mathbb{R}$ is Riemann integrable on $\Omega$ we have

$$\iiint_{\Omega} f \, dV = \int_{a}^{b} \int_{y_1(x)}^{y_2(x)} \int_{z_1(x, y)}^{z_2(x, y)} f(x, y, z) \, dz \, dy \, dx.$$

Remark 27.18. Again, there is nothing special about the order of the coordinates. The same result is obtained for any order of the coordinates as long as the domain can be described in the way indicated.

Remark 27.19. We can think of the common domain $\Omega'$ as the “shadow” of the volume $\Omega$ in the xy-plane caused by a light shining down the z-axis. The important thing is that $\Omega$ have a well defined “top” and “bottom” perpendicular to this axis.

Remark 27.20. Note that as we integrate each successive variable, the variable is eliminated from the calculation. Once we integrate with respect to $z$, the remaining calculation depends only on $x$ and $y$. Once we integrate with respect to $y$ the remaining calculation depends only on $x$.

Example 27.21. Let us consider the three-dimensional region inside the cylinder $x^2 + y^2 = 1$, below the plane $z = 4 + y$ and above the plane $z = 2 + x$. Suppose we wish to integrate the function $f(x, y, z) = 1 - x$ over this region. Since we are inside the cylinder, it is easy to identify the “shadow” domain - the unit disk in the xy-plane. We can describe our domain as

$$\Omega = \{ (x, y, z) \mid -1 \le x \le 1, \ -\sqrt{1 - x^2} \le y \le \sqrt{1 - x^2}, \ 2 + x \le z \le 4 + y \}.$$

Our integral becomes (with a little help from an integral table)

$$\begin{aligned}
\iiint_{\Omega} (1 - x) \, dV &= \int_{-1}^{1} \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} \int_{2+x}^{4+y} (1 - x) \, dz \, dy \, dx \\
&= \int_{-1}^{1} \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} (2 + y - x)(1 - x) \, dy \, dx \\
&= \int_{-1}^{1} \left[ (x^2 - 3x + 2) y + (1 - x) \frac{y^2}{2} \right]_{y=-\sqrt{1-x^2}}^{y=\sqrt{1-x^2}} dx \\
&= \int_{-1}^{1} 2(x^2 - 3x + 2) \sqrt{1 - x^2} \, dx \\
&= \frac{1}{4} \left[ \sqrt{1 - x^2} \, (8 + 7x - 8x^2 + 2x^3) + 9 \arcsin(x) \right]_{-1}^{1} \\
&= \frac{9\pi}{4}.
\end{aligned}$$

Figure 27.6: The region inside the cylinder $x^2 + y^2 = 1$ above the plane $z = 2 + x$ and below the plane $z = 4 + y$. Its “shadow” is the unit disk in the xy-plane.
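The iterated integral of Theorem 27.17 can be sketched in code as three nested one-dimensional integrations, each with limits depending on the outer variables. The example below is only an illustration: the name `triple_integral`, the midpoint rule, and the grid size are our own assumptions. Applied to the region and integrand of Example 27.21, the result should be close to $9\pi/4$; the square-root limits make the convergence in $x$ somewhat slow, so only a couple of digits should be expected.

```python
import math


def midpoint_1d(g, a, b, n):
    """Approximate the integral of g over [a, b] with an n-cell midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def triple_integral(f, a, b, y1, y2, z1, z2, n=60):
    """Iterated integral in the order dz dy dx over the region
    a <= x <= b, y1(x) <= y <= y2(x), z1(x, y) <= z <= z2(x, y)."""

    def over_z(x, y):
        return midpoint_1d(lambda z: f(x, y, z), z1(x, y), z2(x, y), n)

    def over_y(x):
        return midpoint_1d(lambda y: over_z(x, y), y1(x), y2(x), n)

    return midpoint_1d(over_y, a, b, n)


approx = triple_integral(
    lambda x, y, z: 1 - x,                 # integrand f(x, y, z) = 1 - x
    -1.0, 1.0,                             # limits for x
    lambda x: -math.sqrt(1 - x * x),       # lower limit for y
    lambda x: math.sqrt(1 - x * x),        # upper limit for y
    lambda x, y: 2 + x,                    # lower limit for z
    lambda x, y: 4 + y,                    # upper limit for z
)
exact = 9 * math.pi / 4
print(approx, exact)
```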

Example 27.22. Suppose we wish to find the volume of the region in the first octant bounded by the planes $z = y$, $x = y$, and $y = 1$. (See Figure 27.7.) We can think of the “shadow” domain as the region

$$\Omega' = \{ (x, y) \mid 0 \le x \le 1, \ x \le y \le 1 \}.$$

Over this domain in the xy-plane, the three-dimensional region is bounded above by the plane $z = y$ and below by $z = 0$. Thus,

$$\Omega = \{ (x, y, z) \mid 0 \le x \le 1, \ x \le y \le 1, \ 0 \le z \le y \}.$$

We can set up our volume integral as

$$\begin{aligned}
\iiint_{\Omega} 1 \, dV &= \int_{0}^{1} \int_{x}^{1} \int_{0}^{y} dz \, dy \, dx \\
&= \int_{0}^{1} \int_{x}^{1} y \, dy \, dx \\
&= \int_{0}^{1} \frac{1}{2} (1 - x^2) \, dx = \frac{1}{3}.
\end{aligned}$$

Figure 27.7: The region in the first octant bounded above by the plane $z = y$ and by the planes $y = 1$ and $x = y$. Not shown in this figure are the sides $x = 0$ and $z = 0$.

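Example 27.23. As a simple example of using a different order of coordinates consider the problem of trying to find the volume of the ball

$$(x - 2)^2 + y^2 + z^2 \le 1.$$

Of course, we could describe this in the same way as above, but instead let’s look at the shadow in the yz-plane and describe the region as

$$\Omega = \left\{ (x, y, z) \ \middle| \ \begin{array}{l} -1 \le y \le 1, \quad -\sqrt{1 - y^2} \le z \le \sqrt{1 - y^2}, \\ 2 - \sqrt{1 - y^2 - z^2} \le x \le 2 + \sqrt{1 - y^2 - z^2} \end{array} \right\}.$$

Our volume integral becomes

$$\begin{aligned}
\iiint_{\Omega} 1 \, dV &= \int_{-1}^{1} \int_{-\sqrt{1-y^2}}^{\sqrt{1-y^2}} \int_{2-\sqrt{1-y^2-z^2}}^{2+\sqrt{1-y^2-z^2}} dx \, dz \, dy \\
&= \int_{-1}^{1} \int_{-\sqrt{1-y^2}}^{\sqrt{1-y^2}} 2 \sqrt{1 - y^2 - z^2} \, dz \, dy \\
&= 2 \int_{-1}^{1} \left[ \frac{z}{2} \sqrt{1 - y^2 - z^2} + \frac{1 - y^2}{2} \arcsin\left( \frac{z}{\sqrt{1 - y^2}} \right) \right]_{z=-\sqrt{1-y^2}}^{z=\sqrt{1-y^2}} dy \\
&= \pi \int_{-1}^{1} (1 - y^2) \, dy = \frac{4}{3} \pi.
\end{aligned}$$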

Figure 27.8: The sphere $(x - 2)^2 + y^2 + z^2 = 1$ and its “shadow” in the yz-plane. The “shadow” is the common domain of the functions describing the hemispheres: $x = 2 + \sqrt{1 - y^2 - z^2}$ and $x = 2 - \sqrt{1 - y^2 - z^2}$ respectively.

Problems

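Problem 27.1. For the following, sketch the region of integration and evaluate the integral. Reverse the order of integration if necessary.

(a) $$\int_{1}^{4} \int_{0}^{\sqrt{x}} \frac{3}{2} e^{y/\sqrt{x}} \, dy \, dx.$$

(b) $$\int_{0}^{1} \int_{1}^{e^{2x}} y \, dy \, dx.$$

(c) $$\int_{0}^{4} \int_{y/2}^{2} e^{x^2} \, dx \, dy.$$

(d) $$\int_{-2}^{0} \int_{v}^{-v} 2 \, dp \, dv.$$

(e) $$\int_{0}^{1} \int_{y}^{1} x^2 e^{xy} \, dx \, dy.$$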

Problem 27.2. Let $R = \{ (x, y, z) \in \mathbb{R}^3 \mid 0 \le x \le 1, \ 0 \le y \le 1, \ 0 \le z \le 1 \}$. Calculate

$$\int_{R} z^6 y^2 x^3 e^{x^4} \, dV.$$

Problem 27.3. Use a triple integral to find the volume of the tetrahedron cut from the first octant by the plane $2x + 5y + 10z = 10$.

Problem 27.4. Use a triple integral to find the volume bounded by the plane $z = 2x$ and the paraboloid $z = x^2 + y^2$.

Problem 27.5. Find the volume of the region in the first octant bounded by the coordinate planes, the plane $z = 1 - x$ and the surface $y = \cos(\pi x / 2)$.

Chapter 28

The Change of Variables Formula

One of the most important integration formulas in elementary calculus is the change of variables or “u-substitution” formula.

Theorem 28.1. Suppose $f : \mathbb{R} \to \mathbb{R}$ is continuous and $g : \mathbb{R} \to \mathbb{R}$ is differentiable. Then

$$\int_{a}^{b} f(g(x)) g'(x) \, dx = \int_{g(a)}^{g(b)} f(u) \, du.$$

We make the change of variables by making the formal substitution

$$u = g(x), \qquad du = g'(x) \, dx.$$

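For example, we can simplify the integral

$$\int_{1}^{2} \sin(\pi x^3) x^2 \, dx$$

by making the change

$$u = \pi x^3, \qquad du = 3\pi x^2 \, dx,$$

so that the integral becomes

$$\int_{1}^{2} \sin(\pi x^3) x^2 \, dx = \frac{1}{3\pi} \int_{\pi}^{8\pi} \sin u \, du = -\frac{1}{3\pi} \cos u \,\Big|_{\pi}^{8\pi} = -\frac{2}{3\pi}.$$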
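As a sanity check on the substitution, both sides of the formula can be estimated numerically. The sketch below is illustrative only; the helper `midpoint_1d` and its grid size are assumptions of ours. It approximates the original integral in $x$ and the transformed integral in $u$, and both should be close to $-2/(3\pi)$.

```python
import math


def midpoint_1d(g, a, b, n=2000):
    """Approximate the integral of g over [a, b] with an n-cell midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


# Left side: the integral in the original variable x.
lhs = midpoint_1d(lambda x: math.sin(math.pi * x**3) * x**2, 1.0, 2.0)

# Right side: after u = pi x^3, du = 3 pi x^2 dx, the limits become pi and 8 pi.
rhs = midpoint_1d(math.sin, math.pi, 8 * math.pi) / (3 * math.pi)

exact = -2 / (3 * math.pi)
print(lhs, rhs, exact)
```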

In one dimension, it is pretty easy to think of the proof of this theorem without worrying about the geometry. (The proof is obtained by applying the fundamental theorem of calculus to the chain rule.) In higher dimensions, the geometry is more crucial. The geometric key to the one-dimensional version of the formula is the “fudge factor” $g'(x)$ that relates the length of the grid $dx$ on the x-axis to the grid $du$ on the u-axis.

What is the correct analog for this fudge factor in higher dimensions? For example, suppose we have an invertible transformation

$$\mathbf{x}(\mathbf{u}) = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x(u, v) \\ y(u, v) \end{pmatrix}$$

that maps a region $\Omega_{(u,v)}$ in the uv-plane into a region $\Omega_{(x,y)}$ in the xy-plane. (See Figure 28.1.) Can we derive a formula analogous to the change of variables formula in one dimension? That is, a formula of the form

$$\iint_{\Omega_{(x,y)}} f(x, y) \, dA(x, y) = \iint_{\Omega_{(u,v)}} f(x(u, v), y(u, v)) \ \text{(Fudge Factor)} \ dA(u, v).$$

Figure 28.1: Transformation from the uv-plane to the xy-plane. This is a typical situation in changing variables in multiple dimensions. The reason for the change is that the domain in the xy-plane is complicated. We have a much simpler rectangular domain in the uv-plane. This is never a consideration in one dimension, where domains are almost always intervals.

In order to make a guess at the fudge factor consider the linear transformation

$$\mathbf{x} = \mathbf{x}(\mathbf{u}) = A\mathbf{u} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}$$

where

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

is a nonsingular matrix. Let us consider what the transformation does to the domain

$$\Omega_{(u,v)} = \{ (u, v) \mid 0 \le u \le 1, \ 0 \le v \le 1 \}.$$

It is pretty easy to see the following.

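• The vertices of $\Omega_{(u,v)}$ get mapped as follows.

$$\begin{pmatrix} 0 \\ 0 \end{pmatrix} \mapsto \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 \\ 0 \end{pmatrix} \mapsto \begin{pmatrix} a \\ c \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} b \\ d \end{pmatrix}, \qquad \begin{pmatrix} 1 \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} a \\ c \end{pmatrix} + \begin{pmatrix} b \\ d \end{pmatrix}.$$

• The sides of $\Omega_{(u,v)}$ transform into lines connecting the respective vertices. For example, a point on the line segment $(1, t)$, $t \in [0, 1]$, transforms to

$$\begin{pmatrix} a \\ c \end{pmatrix} + t \begin{pmatrix} b \\ d \end{pmatrix}, \qquad t \in [0, 1].$$

• The interior of $\Omega_{(u,v)}$ transforms into the interior of the parallelogram formed by the vectors $(a, c)$ and $(b, d)$.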

Problem 7.9 asks you to show that the area of a parallelogram formed by the vectors $(a, c)$ and $(b, d)$ is given by the absolute value of the determinant of the matrix $A$ with those vectors as columns. Thus, the square region $\Omega_{(u,v)}$ of area one is mapped to a parallelogram $\Omega_{(x,y)}$ of area $|\det A|$. In fact, one can prove something much more general.

Theorem 28.2. Suppose

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

is a nonsingular matrix and

$$\mathbf{x}_0 = \begin{pmatrix} e \\ f \end{pmatrix},$$

so that

$$\mathbf{x} = \mathbf{x}(\mathbf{u}) = A\mathbf{u} + \mathbf{x}_0$$

is an invertible transformation from $\mathbb{R}^2$ to $\mathbb{R}^2$. Then if $\Omega_{(u,v)} \subset \mathbb{R}^2$ is any region in the uv-plane and $\Omega_{(x,y)} = \mathbf{x}(\Omega_{(u,v)})$ is its image in the xy-plane under the transformation we have

$$A(\Omega_{(x,y)}) = |\det A| \, A(\Omega_{(u,v)}) = \left| \frac{\partial(x, y)}{\partial(u, v)} \right| A(\Omega_{(u,v)}).$$

Here $A(\Omega_{(x,y)})$ is the area of $\Omega_{(x,y)}$, etc.

The details above give the basic idea of a proof. We place a uniform grid on $\Omega_{(u,v)}$. The square cells of the grid get mapped to similar parallelograms as above, and the ratio between the areas of the transformed parallelograms and the original squares is the absolute value of the determinant. This can be factored out of the sum of the areas of the interior squares and the interior parallelograms. In the limit we get the desired relationship.
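The area scaling in Theorem 28.2 is easy to test on a single grid cell. In the sketch below we map the unit square through an affine transformation and compute the area of the image parallelogram from its vertices with the shoelace formula. The particular matrix `A` and shift `x0` are arbitrary values chosen for illustration, and the function names are our own.

```python
def affine_map(A, x0, u):
    """Apply x = A u + x0 to a point u = (u, v) in the plane."""
    (a, b), (c, d) = A
    return (a * u[0] + b * u[1] + x0[0], c * u[0] + d * u[1] + x0[1])


def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in order around it."""
    n = len(vertices)
    s = 0.0
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


A = ((2.0, 1.0), (-1.0, 3.0))   # hypothetical nonsingular matrix
x0 = (5.0, -4.0)                # hypothetical translation

unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image = [affine_map(A, x0, u) for u in unit_square]

det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
area = shoelace_area(image)
print(area, abs(det_A))  # the image of the unit square has area |det A|
```

The translation `x0` has no effect on the area, which is why only the matrix `A` appears in the scaling factor.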

Remark 28.3. Note that the matrix $A$ is the total derivative matrix of the linear transformation $\mathbf{x}$ at every point. Thus the determinant of $A$ is the Jacobian of the transformation.

Remark 28.4. In fact, the same theorem as above is true for nonsingular linear (affine) transformations from $\mathbb{R}^n$ to $\mathbb{R}^n$ for any $n$. While we really haven’t studied the tools necessary to prove this in general, it should be fairly obvious in $\mathbb{R}^3$ from the relationship between the determinant of a $3 \times 3$ matrix and the scalar triple product.

Of course, the next step in deriving a general change of variables formula is to go from linear transformations to a general nonlinear transformation $\mathbf{x}(\mathbf{u})$. Not to give away the punch line, but here is our basic theorem.

Theorem 28.5. Suppose $\Omega_{\mathbf{u}} \subset \mathbb{R}^n$ and $\Omega_{\mathbf{x}} \subset \mathbb{R}^n$ and $\mathbf{x} : \Omega_{\mathbf{u}} \to \Omega_{\mathbf{x}}$ is a smooth, invertible transformation. Then if $f : \Omega_{\mathbf{x}} \to \mathbb{R}$ is integrable, the composite function $\mathbf{u} \mapsto f(\mathbf{x}(\mathbf{u}))$ is integrable over $\Omega_{\mathbf{u}}$ and

$$\iint_{\Omega_{\mathbf{x}}} f(\mathbf{x}) \, dV(\mathbf{x}) = \iint_{\Omega_{\mathbf{u}}} f(\mathbf{x}(\mathbf{u})) \left| \frac{\partial(x_1, \dots, x_n)}{\partial(u_1, \dots, u_n)} \right| dV(\mathbf{u}).$$

The proof of this follows many of the basic ideas from the previous proof. Again, we break up the domain $\Omega_{\mathbf{u}}$ into a regular cubic grid in u-space in order to approximate the integral of the composite $f(\mathbf{x}(\mathbf{u}))$ over $\Omega_{\mathbf{u}}$. However, instead of using a regular cubic grid for the domain $\Omega_{\mathbf{x}}$ we use the curves formed by the transformed coordinate lines in u-space. Thus, small cubes in u-space are transformed into small curved regions in x-space. While we can’t compute the ratio of the volumes of the corresponding regions exactly, we can use the fact that the nonlinear transformation can be approximated by an affine transformation. As above, the affine transformation would transform the cube in u-space to a (generalized) parallelogram in x-space. Here the ratio of the volumes is known: the absolute value of the determinant of the total derivative matrix defining the best affine approximation. That is, the ratio of the volumes (and hence the fudge factor we have been seeking) is the absolute value of the Jacobian of the transformation.

Example 28.6. Suppose $\Omega_{(x,y)}$ is the parallelogram bounded by the lines

$$y = \frac{x}{2}, \qquad y = \frac{x}{2} + 3, \qquad y = 2x, \qquad y = 2x - 6.$$

(See Figure 28.2.) If we wished to compute

$$\int_{\Omega_{(x,y)}} (3y - 6x) \, dA$$

directly it would be possible, but difficult. We would have to split the parallelogram into simple regions and do more than one double integral.

Figure 28.2: Domain in the xy-plane with $\frac{x}{2} \le y \le \frac{x}{2} + 3$ and $\frac{y}{2} \le x \le \frac{y}{2} + 3$.

It will be much easier to create a transformation that will represent the domain as the image of a rectangle. There are many transformations that will do this. For instance, suppose we let

$$u = y - 2x, \tag{28.1}$$

$$v = 2y - x. \tag{28.2}$$

Then the sides of our domain transform as follows.

$$\begin{aligned}
y = \frac{x}{2} \quad &\sim \quad v = 0, \\
y = \frac{x}{2} + 3 \quad &\sim \quad v = 6, \\
y = 2x \quad &\sim \quad u = 0, \\
y = 2x - 6 \quad &\sim \quad u = -6.
\end{aligned}$$

Thus the equivalent domain in the uv-plane is the square

$$\Omega_{(u,v)} = \{ (u, v) \mid -6 \le u \le 0, \ 0 \le v \le 6 \}.$$

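In order to use the change of variables formula we need to compute the Jacobian $\frac{\partial(x, y)}{\partial(u, v)}$. While there is more than one way to compute this, let’s invert our transformation to give $x$ and $y$ as functions of $u$ and $v$. A little linear algebra on equations (28.1) and (28.2) gives us

$$x = -\frac{2}{3} u + \frac{1}{3} v, \qquad y = -\frac{1}{3} u + \frac{2}{3} v.$$

This gives us the Jacobian

$$\frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} -\frac{2}{3} & \frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} \end{vmatrix} = -\frac{1}{3}.$$

If we note that $3y - 6x = 3u$ we get

$$\begin{aligned}
\int_{\Omega_{(x,y)}} (3y - 6x) \, dA(x, y) &= \int_{\Omega_{(u,v)}} 3u \left| \frac{\partial(x, y)}{\partial(u, v)} \right| dA(u, v) \\
&= \int_{0}^{6} \int_{-6}^{0} 3u \left| -\frac{1}{3} \right| du \, dv = -108.
\end{aligned}$$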
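The steps of this example (invert the map, compute the Jacobian, integrate over the square in the uv-plane) can be sketched in a few lines of code. This is an illustration under our own naming assumptions (`x_of`, `y_of`, `transformed_integrand`), using a midpoint rule over $-6 \le u \le 0$, $0 \le v \le 6$. Because the transformed integrand is linear, the midpoint rule reproduces the value $-108$ up to rounding.

```python
def midpoint_1d(g, a, b, n=100):
    """Approximate the integral of g over [a, b] with an n-cell midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def x_of(u, v):
    # Inverse of u = y - 2x, v = 2y - x, solved for x.
    return -2.0 * u / 3.0 + v / 3.0


def y_of(u, v):
    # Inverse of u = y - 2x, v = 2y - x, solved for y.
    return -u / 3.0 + 2.0 * v / 3.0


# Jacobian d(x, y)/d(u, v) as a 2x2 determinant of the constant partials.
jacobian = (-2.0 / 3.0) * (2.0 / 3.0) - (1.0 / 3.0) * (-1.0 / 3.0)


def transformed_integrand(u, v):
    x, y = x_of(u, v), y_of(u, v)
    return (3.0 * y - 6.0 * x) * abs(jacobian)


total = midpoint_1d(
    lambda v: midpoint_1d(lambda u: transformed_integrand(u, v), -6.0, 0.0),
    0.0,
    6.0,
)
print(jacobian, total)  # approximately -1/3 and -108
```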

Example 28.7. Consider the integral

$$\iint_{D_{(x,y)}} (y^4 - x^4) e^{xy} \, dA(x, y)$$

where $D_{(x,y)}$ is the region in the xy-plane bounded by the hyperbolic curves $xy = 1$, $xy = 2$, $x^2 - y^2 = 1$, and $x^2 - y^2 = 2$. (See Figure 28.3.)

Here both the integrand and the domain are problematic, but we concentrate on the domain first. It’s pretty easy to see that we can define a one-to-one, onto (and hence invertible) map from $D_{(x,y)}$ to the square

$$D_{(u,v)} = \{ (u, v) \mid 1 \le u \le 2, \ 1 \le v \le 2 \}$$

using

$$u = xy, \qquad v = x^2 - y^2.$$

While an inverse map $(x(u, v), y(u, v))$ exists, it is not necessary for us to find it explicitly. Instead we note that

$$\left| \frac{\partial(x, y)}{\partial(u, v)} \right| = \frac{1}{\left| \dfrac{\partial(u, v)}{\partial(x, y)} \right|} = \frac{1}{\left| \ \begin{vmatrix} y & x \\ 2x & -2y \end{vmatrix} \ \right|} = \frac{1}{2(x^2 + y^2)}.$$

Figure 28.3: The region bounded by the hyperbolic curves $xy = 1$, $xy = 2$, $x^2 - y^2 = 1$ and $x^2 - y^2 = 2$.

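Thus we have

$$\begin{aligned}
\iint_{D_{(x,y)}} (y^4 - x^4) e^{xy} \, dA(x, y) &= \iint_{D_{(u,v)}} (y^4 - x^4) e^{xy} \left| \frac{\partial(x, y)}{\partial(u, v)} \right| dA(u, v) \\
&= \frac{1}{2} \iint_{D_{(u,v)}} \frac{y^4 - x^4}{x^2 + y^2} e^{xy} \, dA(u, v) \\
&= -\frac{1}{2} \int_{1}^{2} \int_{1}^{2} v e^{u} \, du \, dv = -\frac{3}{4} (e^2 - e).
\end{aligned}$$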

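Example 28.8. Consider the integral

$$\iint_{D_{(x,y)}} \sin(x^2 + y^2) \, dV(x, y)$$

where $D_{(x,y)} = \{ (x, y) \mid x^2 + y^2 \le 1 \}$ is the unit disk. Given the circular symmetry of the domain and the integrand it seems sensible to convert to polar coordinates. We use the transformation

$$\begin{pmatrix} x \\ y \end{pmatrix} = \mathbf{p}_p(r, \theta) = \begin{pmatrix} x(r, \theta) \\ y(r, \theta) \end{pmatrix} = \begin{pmatrix} r \cos\theta \\ r \sin\theta \end{pmatrix}.$$

Under this transformation, the disk (in the xy-plane) is the image of the rectangle

$$\Omega_{(r,\theta)} = \{ (r, \theta) \mid 0 \le r \le 1, \ 0 \le \theta < 2\pi \}.$$

The Jacobian of this transformation is given by

$$\frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\[1ex] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \cos\theta & -r \sin\theta \\ \sin\theta & r \cos\theta \end{vmatrix} = r(\cos^2\theta + \sin^2\theta) = r.$$

Using the relation $x^2 + y^2 = r^2$, the integral transforms as follows.

$$\begin{aligned}
\iint_{D_{(x,y)}} \sin(x^2 + y^2) \, dV(x, y) &= \iint_{\Omega_{(r,\theta)}} \sin(r^2) \left| \frac{\partial(x, y)}{\partial(r, \theta)} \right| dV(r, \theta) \\
&= \iint_{\Omega_{(r,\theta)}} \sin(r^2) \, r \, dV(r, \theta) \\
&= \int_{0}^{2\pi} \int_{0}^{1} \sin(r^2) \, r \, dr \, d\theta \\
&= \int_{0}^{2\pi} -\frac{1}{2} \cos(r^2) \,\Big|_{0}^{1} \, d\theta \\
&= \pi (1 - \cos 1).
\end{aligned}$$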
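To see the factor $r$ doing its job, we can compare a crude Riemann sum over the disk in Cartesian coordinates with a midpoint approximation of the polar integral. The sketch below is illustrative only; the function names and grid sizes are our own assumptions. The Cartesian sum simply skips grid cells whose midpoints fall outside the disk, so it converges slowly, while the polar version works on a rectangle and is far more accurate for the same effort.

```python
import math


def midpoint_1d(g, a, b, n=400):
    """Approximate the integral of g over [a, b] with an n-cell midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def cartesian_disk_sum(f, n=400):
    """Riemann sum of f over the unit disk using an n-by-n grid on [-1, 1]^2.
    Only cells whose midpoint lies inside the disk contribute."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y <= 1.0:
                total += f(x, y)
    return total * h * h


# Cartesian side: integrand sin(x^2 + y^2) over the disk.
cartesian = cartesian_disk_sum(lambda x, y: math.sin(x * x + y * y))

# Polar side: integrand sin(r^2) times the Jacobian r over the rectangle.
polar = midpoint_1d(
    lambda theta: midpoint_1d(lambda r: math.sin(r**2) * r, 0.0, 1.0),
    0.0,
    2 * math.pi,
)

exact = math.pi * (1 - math.cos(1))
print(cartesian, polar, exact)
```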

Example 28.9. Suppose we wish to find the volume of the region $\Omega_{(x,y,z)}$ above the cone

$$z = \sqrt{x^2 + y^2}$$

and below the paraboloid

$$z = 2 - x^2 - y^2.$$

(See Figure 28.4.) These two surfaces intersect at the circle $x^2 + y^2 = 1$ when $z = 1$. The common domain $\Omega'$ of the functions describing the surfaces is the unit disk in the xy-plane. This can be described as a y-simple two-dimensional domain

$$\Omega' = \{ (x, y) \mid -1 \le x \le 1, \ -\sqrt{1 - x^2} \le y \le \sqrt{1 - x^2} \}.$$

We can describe

$$\Omega_{(x,y,z)} = \{ (x, y, z) \mid (x, y) \in \Omega', \ \sqrt{x^2 + y^2} \le z \le 2 - x^2 - y^2 \}.$$

So we have

$$\begin{aligned}
V(\Omega_{(x,y,z)}) &= \iiint_{\Omega_{(x,y,z)}} 1 \, dV(x, y, z) \\
&= \int_{-1}^{1} \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} \int_{\sqrt{x^2+y^2}}^{2-x^2-y^2} dz \, dy \, dx.
\end{aligned}$$

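This is a rather nasty integral to compute in Cartesian coordinates. However, in cylindrical coordinates it is rather easy. Recall that the cylindrical coordinate transformation is given by

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \mathbf{p}_c(r, \theta, z) = \begin{pmatrix} x(r, \theta, z) \\ y(r, \theta, z) \\ z(r, \theta, z) \end{pmatrix} = \begin{pmatrix} r \cos\theta \\ r \sin\theta \\ z \end{pmatrix}.$$

Under this transformation the cone is given by

$$z = r$$

while the paraboloid is

$$z = 2 - r^2.$$

Thus, the domain can be described by

$$\Omega_{(r,\theta,z)} = \{ (r, \theta, z) \mid 0 \le \theta \le 2\pi, \ 0 \le r \le 1, \ r \le z \le 2 - r^2 \}.$$

We compute the Jacobian of the transformation

$$\frac{\partial(x, y, z)}{\partial(r, \theta, z)} = \begin{vmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} & \dfrac{\partial x}{\partial z} \\[1ex] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} & \dfrac{\partial y}{\partial z} \\[1ex] \dfrac{\partial z}{\partial r} & \dfrac{\partial z}{\partial \theta} & \dfrac{\partial z}{\partial z} \end{vmatrix} = \begin{vmatrix} \cos\theta & -r \sin\theta & 0 \\ \sin\theta & r \cos\theta & 0 \\ 0 & 0 & 1 \end{vmatrix} = r(\cos^2\theta + \sin^2\theta) = r.$$

The volume can be computed using the change of variables formula

$$\begin{aligned}
V(\Omega_{(x,y,z)}) &= \iiint_{\Omega_{(x,y,z)}} 1 \, dV(x, y, z) \\
&= \iiint_{\Omega_{(r,\theta,z)}} r \, dV(r, \theta, z) \\
&= \int_{0}^{2\pi} \int_{0}^{1} \int_{r}^{2-r^2} r \, dz \, dr \, d\theta \\
&= \int_{0}^{2\pi} \int_{0}^{1} (2 - r^2 - r) \, r \, dr \, d\theta \\
&= \int_{0}^{2\pi} \left( r^2 - \frac{1}{4} r^4 - \frac{1}{3} r^3 \right) \Big|_{0}^{1} \, d\theta \\
&= \frac{5\pi}{6}.
\end{aligned}$$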

Figure 28.4: The region between the cone $z = \sqrt{x^2 + y^2}$ and the paraboloid $z = 2 - x^2 - y^2$. The common domain of the two functions is indicated in the xy-plane.

Example 28.10. Consider the three-dimensional region $\Omega_{(x,y,z)}$ bounded by the spheres of radius one and two and the cones

$$z = \sqrt{3} \sqrt{x^2 + y^2}$$

and

$$z = \frac{1}{\sqrt{3}} \sqrt{x^2 + y^2}.$$

In Figure 28.5 we show this region and its cross section in the xz-plane. While computing the volume of this region as an integral would be a mess to even describe in Cartesian coordinates, it is rather easy in spherical coordinates. Recall that the spherical coordinate transformation is given by

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \mathbf{p}_s(\rho, \theta, \phi) = \begin{pmatrix} x(\rho, \theta, \phi) \\ y(\rho, \theta, \phi) \\ z(\rho, \theta, \phi) \end{pmatrix} = \begin{pmatrix} \rho \cos\theta \sin\phi \\ \rho \sin\theta \sin\phi \\ \rho \cos\phi \end{pmatrix}.$$

Of course, in this system the spheres of radius one and two are described by the equations $\rho = 1$ and $\rho = 2$ respectively. The cones are described by the equations $\phi = \pi/6$ and $\phi = \pi/3$ respectively. This can be seen by symmetry or we can determine this analytically as follows. We transform the equation $z = \sqrt{3} \sqrt{x^2 + y^2}$ into spherical coordinates to get

$$\rho \cos\phi = \sqrt{3} \sqrt{\rho^2 \cos^2\theta \sin^2\phi + \rho^2 \sin^2\theta \sin^2\phi}.$$

Using the fact that $\rho > 0$ and $\sin\phi > 0$, this can be reduced to

$$\cos\phi = \sqrt{3} \sin\phi$$

or

$$\tan\phi = \frac{1}{\sqrt{3}},$$

which gives us $\phi = \pi/6$. The cone $\phi = \pi/3$ can be determined in a similar way. Thus we have

$$\Omega_{(\rho,\theta,\phi)} = \left\{ (\rho, \theta, \phi) \ \middle| \ 1 \le \rho \le 2, \ 0 \le \theta < 2\pi, \ \frac{\pi}{6} \le \phi \le \frac{\pi}{3} \right\}.$$

Figure 28.5: Region between the spheres $\rho = 1$ and $\rho = 2$ and the cones $\phi = \pi/6$ and $\phi = \pi/3$. Both a perspective plot and the cross section of the region in the xz-plane are displayed.

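We now compute the Jacobian of the transformation

$$\begin{aligned}
\frac{\partial(x, y, z)}{\partial(\rho, \theta, \phi)} &= \begin{vmatrix} \dfrac{\partial x}{\partial \rho} & \dfrac{\partial x}{\partial \theta} & \dfrac{\partial x}{\partial \phi} \\[1ex] \dfrac{\partial y}{\partial \rho} & \dfrac{\partial y}{\partial \theta} & \dfrac{\partial y}{\partial \phi} \\[1ex] \dfrac{\partial z}{\partial \rho} & \dfrac{\partial z}{\partial \theta} & \dfrac{\partial z}{\partial \phi} \end{vmatrix} \\
&= \begin{vmatrix} \cos\theta \sin\phi & -\rho \sin\theta \sin\phi & \rho \cos\theta \cos\phi \\ \sin\theta \sin\phi & \rho \cos\theta \sin\phi & \rho \sin\theta \cos\phi \\ \cos\phi & 0 & -\rho \sin\phi \end{vmatrix} \\
&= \cos\phi \begin{vmatrix} -\rho \sin\theta \sin\phi & \rho \cos\theta \cos\phi \\ \rho \cos\theta \sin\phi & \rho \sin\theta \cos\phi \end{vmatrix} - \rho \sin\phi \begin{vmatrix} \cos\theta \sin\phi & -\rho \sin\theta \sin\phi \\ \sin\theta \sin\phi & \rho \cos\theta \sin\phi \end{vmatrix} \\
&= \cos\phi \left( -\rho^2 \sin^2\theta \sin\phi \cos\phi - \rho^2 \cos^2\theta \sin\phi \cos\phi \right) - \rho \sin\phi \left( \rho \cos^2\theta \sin^2\phi + \rho \sin^2\theta \sin^2\phi \right) \\
&= -\rho^2 \sin\phi.
\end{aligned}$$

We use this (after taking its absolute value) in the change of variables formula to compute the volume

$$\begin{aligned}
V(\Omega_{(x,y,z)}) &= \iiint_{\Omega_{(x,y,z)}} dV(x, y, z) \\
&= \iiint_{\Omega_{(\rho,\theta,\phi)}} \rho^2 \sin\phi \, dV(\rho, \theta, \phi) \\
&= \int_{0}^{2\pi} \int_{\pi/6}^{\pi/3} \int_{1}^{2} \rho^2 \sin\phi \, d\rho \, d\phi \, d\theta \\
&= 2\pi \left( \cos\frac{\pi}{6} - \cos\frac{\pi}{3} \right) \left( \frac{2^3}{3} - \frac{1^3}{3} \right) = \frac{7\pi}{3} (\sqrt{3} - 1).
\end{aligned}$$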
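Both the Jacobian and the volume in this example lend themselves to a quick numerical check. The sketch below is an illustration with names of our own choosing: it estimates the determinant of the matrix of partial derivatives of the spherical map by central differences at an arbitrarily chosen sample point, and compares it with $-\rho^2 \sin\phi$. It then evaluates the volume integral, whose integrand $\rho^2 \sin\phi$ separates into a product of one-dimensional integrals, and compares it with $2\pi(\cos\frac{\pi}{6} - \cos\frac{\pi}{3})(\frac{8}{3} - \frac{1}{3}) = \frac{7\pi}{3}(\sqrt{3} - 1)$.

```python
import math


def spherical(rho, theta, phi):
    """Spherical coordinate map with phi measured from the positive z-axis."""
    return (
        rho * math.cos(theta) * math.sin(phi),
        rho * math.sin(theta) * math.sin(phi),
        rho * math.cos(phi),
    )


def numeric_jacobian_det(p, q, h=1e-6):
    """Determinant of the 3x3 matrix [d x_i / d q_k], by central differences."""
    m = [[0.0] * 3 for _ in range(3)]
    for k in range(3):
        plus, minus = list(q), list(q)
        plus[k] += h
        minus[k] -= h
        fp, fm = p(*plus), p(*minus)
        for i in range(3):
            m[i][k] = (fp[i] - fm[i]) / (2 * h)
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def midpoint_1d(g, a, b, n=400):
    """Approximate the integral of g over [a, b] with an n-cell midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


# Jacobian check at a sample point (values chosen arbitrarily for illustration).
rho, theta, phi = 1.5, 0.7, 1.0
det = numeric_jacobian_det(spherical, (rho, theta, phi))
expected_det = -rho**2 * math.sin(phi)

# Volume: the theta integral contributes 2 pi; the rest is a product.
volume = (
    2 * math.pi
    * midpoint_1d(math.sin, math.pi / 6, math.pi / 3)
    * midpoint_1d(lambda r: r**2, 1.0, 2.0)
)
exact_volume = 7 * math.pi * (math.sqrt(3) - 1) / 3
print(det, expected_det)
print(volume, exact_volume)
```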

Problems

Problem 28.1. Consider the system

$$u = x - y, \qquad v = 2x + y.$$

(a) Solve the system for $x$ and $y$ in terms of $u$ and $v$. Compute the Jacobian

$$\frac{\partial(x, y)}{\partial(u, v)}.$$

(b) Using this transformation, find the region $\Omega_{(u,v)}$ in the uv-plane corresponding to the triangular region $\Omega_{(x,y)}$ with vertices $(0, 0)$, $(1, 1)$, and $(1, -2)$ in the xy-plane. Sketch the region in the uv-plane.

(c) Use the calculations above to write the integral

$$\int_{0}^{1} \int_{-2x}^{x} 3x \, dy \, dx$$

as an integral in the uv-plane.

(d) Compute both integrals and show they are the same.

(e) Use the same transformation to evaluate the integral

$$\iint_{\Omega_{(x,y)}} (2x^2 - xy - y^2) \, dx \, dy$$

where $\Omega_{(x,y)}$ is the region in the first quadrant bounded by the lines $y = -2x + 4$, $y = -2x + 7$, $y = x - 2$, and $y = x + 1$.

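Problem 28.2. Consider the change of variables

$$(x, y) = \mathbf{x}(u, v) = (4u, \ 2u + 3v).$$

Let $\Omega_{(x,y)} = \{ (x, y) \mid 0 \le x \le 1, \ 1 \le y \le 2 \}$.

(a) Find $\Omega_{(u,v)}$ such that $\mathbf{x}(\Omega_{(u,v)}) = \Omega_{(x,y)}$.

(b) Use the change of variables formula to calculate

$$\iint_{\Omega_{(x,y)}} xy \, dx \, dy$$

as an integral over $\Omega_{(u,v)}$.

Problem 28.3. Consider the change of variables

$$(x, y) = \mathbf{x}(u, v) = (u, \ v(1 + u)).$$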

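Let $\Omega_{(x,y)} = \{ (x, y) \mid 0 \le x \le 1, \ 1 \le y \le 2 \}$.

(a) Find $\Omega_{(u,v)}$ such that $\mathbf{x}(\Omega_{(u,v)}) = \Omega_{(x,y)}$.

(b) Use the change of variables formula to calculate

$$\iint_{\Omega_{(x,y)}} (x - y) \, dx \, dy$$

as an integral over $\Omega_{(u,v)}$.

Problem 28.4. Consider the change of variables

$$(x, y) = \mathbf{x}(u, v) = (u^2 - v^2, \ uv).$$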

Let $\Omega_{(u,v)} = \{ (u, v) \mid u^2 + v^2 \le 1, \ 0 \le u \}$.

(a) Find $\Omega_{(x,y)} = \mathbf{x}(\Omega_{(u,v)})$.

(b) Evaluate

$$\iint_{\Omega_{(x,y)}} dx \, dy.$$

Problem 28.5. Convert the following double integrals in Cartesian coordinates to integrals in polar coordinates and evaluate the polar integral.

(a) $$\int_{-1}^{1} \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} dy \, dx.$$

(b) $$\int_{-1}^{1} \int_{-\sqrt{1-y^2}}^{\sqrt{1-y^2}} (x^2 + y^2) \, dx \, dy.$$

(c) $$\int_{0}^{1} \int_{0}^{\sqrt{1-x^2}} e^{-(x^2 + y^2)} \, dy \, dx.$$

Problem 28.6. Convert the integral below to an equivalent integral in cylindrical coordinates and evaluate the integral.

$$\int_{-1}^{1} \int_{0}^{\sqrt{1-y^2}} \int_{0}^{x} (x^2 + y^2) \, dz \, dx \, dy.$$

Problem 28.7. Let $\Omega \subset \mathbb{R}^3$ be the region bounded below by the cone $z = \sqrt{x^2 + y^2}$ and above by the plane $z = 5$. Set up (but do not evaluate) the integral for the volume of this region as an integral in spherical coordinates with the order of integration

$$d\rho \, d\phi \, d\theta.$$

Problem 28.8. Find the volume of the portion of the ball of radius 3 in $\mathbb{R}^3$ above the plane $z = 1$.

Problem 28.9. Find the volume of the right circular cylinder in $\mathbb{R}^3$ whose base is the circle $r = 2 \sin\theta$ in the xy-plane and whose top is in the plane $z = 4 - y$.

Chapter 29

Hausdorff Dimension and Measure

In this chapter we consider a topic from measure theory. This is an advanced subject usually covered in a graduate course,¹ but I think the basic ideas are accessible to undergraduate readers even if the details are difficult. My goal is to describe the Hausdorff measure of a set. To my mind this is the best way to provide a fundamental definition of integrals over curved lower-dimensional sets such as paths and surfaces. In later chapters we study practical formulas for computing such integrals, but I’ve never found the classical derivations of these formulas to be terribly convincing. On the other hand, the Hausdorff measure provides a fairly intuitive way of describing the dimension and size of a set, and it is general enough to be applied to very strange sets such as “fractals.” Furthermore, there is a clear, rigorous connection between the Hausdorff measure and the practical formulas for arclength and surface area given below. The proof of the connection is beyond the scope of this book, but detailed references are provided for those who wish to pursue this subject.

We will be using several new ideas without giving formal definitions. The following is a collection of informal definitions.

• A countable collection of objects is one that can be indexed by the natural numbers. This can be done for the rational numbers, so they are countable. The irrational numbers are not countable.

• The ideas of the supremum and infimum (abbreviated by sup and inf) of a set of numbers are related to the maximum and minimum of a set. For example, the open interval (0, 1) has no maximum or minimum element, but its infimum is zero and its supremum is one. A reader hoping to get a rough idea of what is going on in the discussion below without learning the exact definitions would do well to think of the sup as the max and the inf as the min.

¹ It is interesting how seldom one has to look to advanced material to provide a rigorous foundation for basic concepts in mathematics. Usually we can trace our ideas back to very elementary concepts. Unfortunately, this is a case where the fundamental concepts (What is the dimension of a set? How do we measure the size of the three-dimensional boundary of a set in $\mathbb{R}^4$?) are genuinely difficult and the more advanced theory is the best way I’ve found to deal with them.

• The diameter of a set in $\mathbb{R}^n$ is the supremum of the distance between all pairs of points in the set. As discussed in the last item, thinking of this as the maximum distance between points in the set gives the right idea.

• Let $F$ be a subset of $\mathbb{R}^n$. We say that a collection $\mathcal{A}$ of subsets of $\mathbb{R}^n$ is a cover of $F$ if
\[ F \subseteq \bigcup_{A \in \mathcal{A}} A. \]
If the collection $\mathcal{A}$ is countable, we call it a countable cover.

• We call a cover $\mathcal{A}$ of $F$ an $\varepsilon$-cover if every $A \in \mathcal{A}$ satisfies
\[ \operatorname{diam} A \le \varepsilon. \]

• The gamma function is defined to be
\[ \Gamma(z) = \int_0^\infty e^{-x} x^{z-1}\,dx \]
for $z > 0$. The gamma function is an extension of the factorial function. That is, for a positive integer $n$,
\[ \Gamma(n) = (n-1)!. \]

• We define the function
\[ \beta(s) = \frac{\pi^{s/2}}{2^s\,\Gamma\!\left(\frac{s}{2} + 1\right)}. \]
One can show that the n-dimensional Riemann volume of a ball $B(d, n)$ of diameter $d$ in $\mathbb{R}^n$ is given by
\[ V(B(d, n)) = \beta(n)\,d^n. \]
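As a quick sanity check on the normalizing constant, the following sketch evaluates β(s) with Python's standard `math.gamma` and compares β(n)dⁿ with the familiar length, area, and volume of balls of diameter d for n = 1, 2, 3. The function names `beta` and `ball_volume` are illustrative choices, not notation from the text.

```python
import math

def beta(s):
    """Constant beta(s) = pi^(s/2) / (2^s * Gamma(s/2 + 1)) from the text."""
    return math.pi ** (s / 2) / (2 ** s * math.gamma(s / 2 + 1))

def ball_volume(d, n):
    """n-dimensional volume of a ball of diameter d, computed as beta(n) * d^n."""
    return beta(n) * d ** n

d = 2.0  # diameter 2, i.e. radius 1
print(ball_volume(d, 1), d)                 # interval: length d
print(ball_volume(d, 2), math.pi)           # disk: pi * (d/2)^2
print(ball_volume(d, 3), 4 * math.pi / 3)   # ball: (4/3) * pi * (d/2)^3
print(math.gamma(5), math.factorial(4))     # Gamma(n) = (n-1)! with n = 5
```

Each printed pair should agree up to floating-point rounding.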

We now define for $s \ge 0$ and $\varepsilon > 0$,
\[ \mathcal{H}^s_\varepsilon(F) = \inf \sum_{A \in \mathcal{A}} \beta(s)(\operatorname{diam} A)^s, \]
where the infimum is taken over all possible $\varepsilon$-covers $\mathcal{A}$ of the set $F$. This is a rather curious way to measure the size of the set $F$. We first cover it by a collection of small but arbitrarily-shaped sets $A$. We then use $\beta(s)(\operatorname{diam} A)^s$ to measure the size of each covering set. For $s = 1, 2, 3$ this is the size of an s-dimensional ball whose diameter is the diameter of the set $A$. So in some sense we are replacing the arbitrary set with a ball of dimension $s$. Note that the formula makes sense even if $s$ is not an integer. We will be interested only in integer dimensions, but sets with fractional dimension are possible and are of great interest elsewhere.

Since there are fewer possible $\varepsilon$-covers as $\varepsilon$ gets smaller (an $\varepsilon/2$-cover is also an $\varepsilon$-cover, but not vice versa), the infimum (and hence $\mathcal{H}^s_\varepsilon(F)$) can only increase or stay the same as $\varepsilon$ decreases. With this in mind we define the following.

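Definition 29.1. Let $F \subset \mathbb{R}^n$ and let $s \ge 0$. Then
\[ \mathcal{H}^s(F) = \lim_{\varepsilon \to 0} \mathcal{H}^s_\varepsilon(F) = \sup_{\varepsilon > 0} \mathcal{H}^s_\varepsilon(F) \]
is the s-dimensional Hausdorff measure of the set $F$.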

Note the following.

• We would expect the one-dimensional measure (length) of a two-dimensional set to be infinite.

• We would expect the two-dimensional measure (area) of a one-dimensional set to be zero.

In fact, one can prove something much more precise.

Theorem 29.2. For any set $F \subseteq \mathbb{R}^n$ there is a unique critical $s_0 \in [0, n]$ such that
\[ \mathcal{H}^s(F) = \infty \text{ for all } s < s_0; \qquad \mathcal{H}^s(F) = 0 \text{ for all } s > s_0. \]

The number $s_0$ is called the Hausdorff dimension of the set $F$.
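The threshold behaviour described in Theorem 29.2 can be glimpsed in a simple experiment. The sketch below covers the unit interval [0, 1] by N subintervals of diameter 1/N (an ε-cover with ε = 1/N) and evaluates the sum of β(s)(diam A)^s over that cover. Because the quantity in the definition is an infimum over all ε-covers, this sum is only an upper bound for it and is not a computation of the Hausdorff measure itself. The helper names are illustrative.

```python
import math

def beta(s):
    """Constant beta(s) = pi^(s/2) / (2^s * Gamma(s/2 + 1))."""
    return math.pi ** (s / 2) / (2 ** s * math.gamma(s / 2 + 1))

def covering_sum(s, n_pieces):
    """Sum of beta(s) * diam^s over n_pieces equal subintervals covering [0, 1]."""
    diam = 1.0 / n_pieces
    return n_pieces * beta(s) * diam ** s

for s in (0.5, 1.0, 1.5):
    sums = [covering_sum(s, n) for n in (10, 100, 1000)]
    print(s, sums)
```

For s = 1 the sum stays at the length of the interval. For s > 1 it shrinks toward zero as the cover is refined, which does show that the s-dimensional measure is zero. For s < 1 this particular family of covers grows without bound, which is consistent with (but does not by itself prove) an infinite measure.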

In a more complete and rigorous exposition on Hausdorff measure such as [4, 6] one would proceed to establish the following properties of the Hausdorff measure and dimension.

1. The Hausdorff dimension provides a rigorous definition that conforms to our intuitive notion of the dimension of a set.² In particular, one can show the following.

• Regions in $\mathbb{R}^n$ with positive, finite, n-dimensional Riemann volume have dimension $n$.

• Smooth curves in $\mathbb{R}^3$ have Hausdorff dimension one.

• Smooth surfaces have Hausdorff dimension two.

• More generally, the images of smooth, nondegenerate³ mappings from $\mathbb{R}^m$ to $\mathbb{R}^n$ with $m < n$ have Hausdorff dimension $m$.

2. The Hausdorff measure conforms to our traditional notions of length, area, and volume in situations where these are well defined. Particular cases of this include the following.

• The n-dimensional Hausdorff measure of a region in $\mathbb{R}^n$ with well-defined Riemann volume is equal to its Riemann volume.

• The one-dimensional Hausdorff measure of a line segment in $\mathbb{R}^n$ is its length.

• The two-dimensional Hausdorff measure of a portion of a plane is its area.

In light of these results, we take the Hausdorff dimension and measure to be our fundamental notions of the dimension and size of sets in $\mathbb{R}^n$. We will consider formulas for quantities like arclength and surface area to be justified if they can be rigorously shown to agree with the Hausdorff measure.

² In fact, the Hausdorff dimension can go a good deal beyond our intuition and identify the non-integer dimension of “fractals” such as the famous Mandelbrot set.

Problems

Problem 29.1. Use an induction proof to show that
\[ \Gamma(n) = (n-1)!. \]
Start by showing directly that
\[ \Gamma(1) = 1 = 0!. \]
Then show that
\[ \Gamma(n) = (n-1)\,\Gamma(n-1). \]
Hint: Integrate by parts.

³ We say that such a mapping is nondegenerate if its total derivative matrix has maximum possible rank ($m$) at each point. This ensures that the tangent vectors to the $m$ “coordinate curves” of the “surface” defined by the mapping are linearly independent.

Chapter 30

Integrals over Curves

In this chapter we study integrals over curves in $\mathbb{R}^n$. We develop three types of integrals.

1. An integral formula describing the length of a curve. (This was already discussed briefly in Chapter 15.)

2. Integrals of scalar functions over curves or (more generally) along paths. These can be used (for example) to calculate the mass of a one-dimensional rod by integrating its density.

3. Integrals of the component of a vector field tangent to a path. This can be used to calculate the work done on a particle moving along a path through a force field.

30.1 The Length of a Curve

Recall that in Chapter 15 we gave an intuitive definition of the length of a trajectory as the integral of the speed (length of the velocity vector) over the domain of the trajectory.

Definition 30.1. The arclength of a differentiable trajectory $f : [t_0, t_1] \to \mathbb{R}^n$ is given by
\[ L(f) = \int_{t_0}^{t_1} \|f'(t)\|\,dt. \]

In fact, one can show that this definition yields exactly the Hausdorff measure of the curve traversed by a trajectory.

Theorem 30.2. Let $f : [t_0, t_1] \to \mathbb{R}^n$ be any simple trajectory and let $C$ be the (simple) curve traced by $f$. Then the arclength of $f$ is exactly the one-dimensional Hausdorff measure of $C$. That is,
\[ L(f) = \mathcal{H}^1(C). \]

This result requires measure theory to prove. See [6, p. 100]. However, if we accept the one-dimensional Hausdorff measure as the fundamental definition of the “length” of a curved set, it is good to see that it agrees with a more intuitive definition of length.

Since the Hausdorff measure of the curve $C$ depends only on the set and not on the trajectory that sweeps out the set, the previous theorem implies the following.

Theorem 30.3. Let $f : [t_0, t_1] \to \mathbb{R}^n$ be any simple trajectory and let $C$ be the (simple) curve traced by $f$. Any trajectory that is path equivalent to $f$ or in the equivalence class of the reverse of the path represented by $f$ has the same arclength.

While this follows from the equivalence of arclength and Hausdorff measure, it can also be proved using the arclength formula directly, and such a proof works for paths that are not simple. In Theorem 15.4 we proved this result for equivalent paths. We leave the proof of the result for reverse paths to the reader in Problem 30.7.

Remark 30.4. Note that by this theorem it makes sense to talk about the length of a path or the length of a simple curve rather than just the length of a trajectory. Thus we will write $L(\mathcal{P})$ for a path $\mathcal{P}$ and $L(C)$ for a curve $C$. Note that with this notation we can write (for example)
\[ L(\mathcal{P}) = L(-\mathcal{P}) \]
for the identity you are asked to prove directly in Problem 30.7.

Example 30.5. Let us compute the arclength of the curve
\[ r(t) = (t^2/2, \cos(t), \sin(t)), \quad t \in [0, 3\pi]. \]
(See Figure 30.1.) We compute $r'(t) = (t, -\sin(t), \cos(t))$ and $\|r'(t)\| = \sqrt{t^2 + 1}$. With the help of a table of integrals (or a bit of trig substitution) we get
\[ L(r) = \int_0^{3\pi} \sqrt{t^2 + 1}\,dt = \frac{3\pi}{2}\sqrt{1 + 9\pi^2} + \frac{1}{2}\operatorname{arcsinh}(3\pi). \]
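The closed form can be checked numerically. The sketch below integrates the speed with a hand-written composite Simpson rule (the helper `simpson` is an illustrative utility, not something defined in the text) and compares the result with the formula above.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def speed(t):
    # r'(t) = (t, -sin t, cos t), so the speed is the length of this vector
    return math.sqrt(t ** 2 + math.sin(t) ** 2 + math.cos(t) ** 2)

numeric = simpson(speed, 0.0, 3 * math.pi)
closed_form = 1.5 * math.pi * math.sqrt(1 + 9 * math.pi ** 2) + 0.5 * math.asinh(3 * math.pi)
print(numeric, closed_form)
```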

Figure 30.1: The trajectory $r(t) = (t^2/2, \cos(t), \sin(t))$, $t \in [0, 3\pi]$.

30.2 Integrals of Scalar Fields Along Curves

We now define the integral of a scalar field over a path. We will give only a “practical” formula for this integral. It could be defined in a more fundamental way using Hausdorff measure, but we will not do this.

Definition 30.6. Let $r : [t_0, t_1] \to \mathbb{R}^n$ be any differentiable trajectory and let $\mathcal{P}$ be the path represented by $r$. Let the curve $C$ be the range of $r$ and let $f : C \to \mathbb{R}$ be a scalar field defined (at least) on $C$. Then the scalar path integral of $f$ over $\mathcal{P}$ is
\[ \int_{\mathcal{P}} f\,dr = \int_{t_0}^{t_1} f(r(t))\,\|r'(t)\|\,dt. \]

If the notation above is to make sense, we must show that the integral over a path doesn’t depend on which trajectory we use to represent the path.

Theorem 30.7. Suppose $r_1 : [t_0, t_1] \to \mathbb{R}^n$ and $r_2 : [t_2, t_3] \to \mathbb{R}^n$ are equivalent differentiable trajectories. Let the curve $C$ be the range of these trajectories and let $f : C \to \mathbb{R}$ be a scalar field defined (at least) on $C$. Then
\[ \int_{t_0}^{t_1} f(r_1(t))\,\|r_1'(t)\|\,dt = \int_{t_2}^{t_3} f(r_2(s))\,\|r_2'(s)\|\,ds. \]

Proof. Since $r_1$ and $r_2$ are equivalent, there exists a monotone increasing function $\phi : [t_0, t_1] \to [t_2, t_3]$ such that $\phi(t_0) = t_2$ and $\phi(t_1) = t_3$, and
\[ r_1(t) = r_2(\phi(t)) \]
for $t \in [t_0, t_1]$. Note that this also gives us
\[ r_1'(t) = r_2'(\phi(t))\,\phi'(t), \]
and
\[ \|r_1'(t)\| = \|r_2'(\phi(t))\|\,\phi'(t) \]
since $\phi' \ge 0$. Thus, we have (using the change of variables formula for scalar functions with $s = \phi(t)$)
\[ \int_{t_0}^{t_1} f(r_1(t))\,\|r_1'(t)\|\,dt = \int_{t_0}^{t_1} f(r_2(\phi(t)))\,\|r_2'(\phi(t))\|\,\phi'(t)\,dt = \int_{t_2}^{t_3} f(r_2(s))\,\|r_2'(s)\|\,ds. \]

Remark 30.8. Since we have not given a fundamental definition of the integral of a scalar function over a physical curve in $\mathbb{R}^n$, we will appeal to the arclength (where there is such a connection) to make an argument that our formula makes sense. To define an integral we want to break the domain of integration into little bits and multiply the height of the scalar being integrated by the length of the little bits of the domain. In a rigorous definition, the length of those bits would be defined by the Hausdorff measure. Fortunately, Theorem 30.2 suggests that $\|r'\|$ is the right “fudge factor” to relate the length of little bits of the curve in space to little bits of the interval that is the domain of the trajectory tracing the curve.

Remark 30.9. It is worth noticing a few things about the formula
\[ \int_{\mathcal{P}} f\,dr = \int_{t_0}^{t_1} f(r(t))\,\|r'(t)\|\,dt. \]
The left side contains only very general geometric information. It refers to a path in space and a scalar function defined on the points of that path. No coordinate system is referred to. No method of computation is suggested. The right side is all about computation. The formula is defined using a specific trajectory on a specific interval. The computation is an elementary calculus integral. If we want to do any computation, we have to use the form of the right side by picking a trajectory and writing our integral as an elementary integration over an interval.
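The right-hand side translates directly into a numerical routine. The sketch below is one possible implementation, assuming the trajectory can be evaluated slightly outside its interval so that the velocity can be approximated by central differences; the names `simpson` and `scalar_path_integral`, the sample field, and the sample trajectories are all illustrative. It integrates the same scalar field over a quarter circle using two equivalent trajectories, related by the monotone change of parameter φ(t) = (π/2)t², and the two values should agree as Theorem 30.7 predicts.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def scalar_path_integral(f, r, t0, t1, h=1e-6):
    """Approximate the integral of f(r(t)) * ||r'(t)|| over [t0, t1]."""
    def integrand(t):
        ahead, behind = r(t + h), r(t - h)
        # central-difference approximation of the speed ||r'(t)||
        speed = math.sqrt(sum(((p - q) / (2 * h)) ** 2 for p, q in zip(ahead, behind)))
        return f(*r(t)) * speed
    return simpson(integrand, t0, t1)

def f(x, y):
    return x + y ** 2          # sample scalar field

def r2(s):
    return (math.cos(s), math.sin(s))      # quarter circle, s in [0, pi/2]

def r1(t):
    return r2(0.5 * math.pi * t ** 2)      # same path, t in [0, 1]

I2 = scalar_path_integral(f, r2, 0.0, 0.5 * math.pi)
I1 = scalar_path_integral(f, r1, 0.0, 1.0)
print(I1, I2, 1 + math.pi / 4)  # exact value of this integral is 1 + pi/4
```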

As with the arclength, the order in which the points in a path are traversed does not affect the scalar path integral.

Theorem 30.10. Let $r : [t_0, t_1] \to \mathbb{R}^n$ be any differentiable trajectory and let $\mathcal{P}$ be the path represented by $r$. Let the curve $C$ be the range of $r$ and let $f : C \to \mathbb{R}$ be a scalar field defined (at least) on $C$. Then
\[ \int_{\mathcal{P}} f\,dr = \int_{-\mathcal{P}} f\,dr. \]

Thus, for a simple curve $C$, any simple path traversing the curve will have the same scalar path integral. The proof of this is left to the reader in Problem 30.10.

Example 30.11. Let $r$ be the trajectory defined in Example 30.5 and displayed in Figure 30.1. Let $\mathcal{P}$ be the path represented by $r$. The scalar function $f(x, y, z) = \sqrt{x}$ is defined for all points of this path. We compute the scalar path integral of $f$ as follows.
\[ \int_{\mathcal{P}} f\,dr = \int_0^{3\pi} f(r(t))\,\|r'(t)\|\,dt = \int_0^{3\pi} \sqrt{\frac{t^2}{2}}\,\sqrt{t^2 + 1}\,dt = \frac{1}{3\sqrt{2}}\left((1 + 9\pi^2)^{3/2} - 1\right). \]
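A numerical check of this example, written as a minimal sketch with an illustrative Simpson helper, uses the exact speed $\sqrt{t^2+1}$ from Example 30.5:

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def integrand(t):
    x = t ** 2 / 2                      # first component of r(t)
    speed = math.sqrt(t ** 2 + 1)       # ||r'(t)||
    return math.sqrt(x) * speed         # f(r(t)) * ||r'(t)|| with f = sqrt(x)

numeric = simpson(integrand, 0.0, 3 * math.pi)
closed_form = ((1 + 9 * math.pi ** 2) ** 1.5 - 1) / (3 * math.sqrt(2))
print(numeric, closed_form)
```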

30.3 Integrals of Vector Fields Along Paths

We now define a type of integral of a vector field over a path called a “line integral.” This is a fundamental tool in mechanics, fluid dynamics, thermodynamics and electromagnetism.

Once again, we give only a “practical” formula for this integral.

Definition 30.12. Let $r : [t_0, t_1] \to \mathbb{R}^n$ be any differentiable trajectory and let $\mathcal{P}$ be the path represented by $r$. Let the curve $C$ be the range of $r$ and let $v : C \to \mathbb{R}^n$ be a vector field defined (at least) on $C$. Then the line integral of $v$ over $\mathcal{P}$ is
\[ \int_{\mathcal{P}} v \cdot dr = \int_{t_0}^{t_1} v(r(t)) \cdot r'(t)\,dt. \]

Once again, if the notation above is to make sense, we must show that the integral over a path doesn’t depend on which trajectory we use to represent the path.

Theorem 30.13. Suppose $r_1 : [t_0, t_1] \to \mathbb{R}^n$ and $r_2 : [t_2, t_3] \to \mathbb{R}^n$ are equivalent differentiable trajectories. Let the curve $C$ be the range of these trajectories and let $v : C \to \mathbb{R}^n$ be a vector field defined (at least) on $C$. Then
\[ \int_{t_0}^{t_1} v(r_1(t)) \cdot r_1'(t)\,dt = \int_{t_2}^{t_3} v(r_2(s)) \cdot r_2'(s)\,ds. \]

This proof is left to the reader in Problem 30.8.

Remark 30.14. We can compare this integral to the scalar path integral in the following way. Suppose we define
\[ u(t) = \frac{r'(t)}{\|r'(t)\|}. \]
Of course this is just a normalization of $r'$, and as such it gives a unit tangent vector to the trajectory. Now if we write
\[ \int_{t_0}^{t_1} v(r(t)) \cdot r'(t)\,dt = \int_{t_0}^{t_1} (v(r(t)) \cdot u(t))\,\|r'(t)\|\,dt \]
we see that the line integral is just the scalar path integral of the component of the vector field $v$ that is tangent to the trajectory. In very rough terms, the line integral measures the tendency of a vector field to flow parallel to a path.

Unlike the arclength and the scalar path integral, the order in which the points in a path are traversed does affect the line integral. (Of course, we would expect this from the previous remark.)

Theorem 30.15. Let $r : [t_0, t_1] \to \mathbb{R}^n$ be any differentiable trajectory and let $\mathcal{P}$ be the path represented by $r$. Let the curve $C$ be the range of $r$ and let $v : C \to \mathbb{R}^n$ be a vector field defined (at least) on $C$. Then
\[ \int_{-\mathcal{P}} v \cdot dr = -\int_{\mathcal{P}} v \cdot dr. \]

Once again, this proof is left to the reader. (See Problem 30.9.)

Example 30.16. Let $r(t) = (t + 1, t^2, t^3 - t)$ for $t \in [0, 1]$. We wish to compute the line integral of the vector field $v(x, y, z) = (yz, z + x, xy)$ along this trajectory. We note that
\[ v(r(t)) \cdot r'(t) = \begin{pmatrix} t^2(t^3 - t) \\ t^3 + 1 \\ (t + 1)t^2 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 2t \\ 3t^2 - 1 \end{pmatrix} = 4t^5 + 5t^4 - 2t^3 - t^2 + 2t. \]
Thus we get
\[ \int_r v \cdot dr = \int_0^1 v(r(t)) \cdot r'(t)\,dt = \int_0^1 (4t^5 + 5t^4 - 2t^3 - t^2 + 2t)\,dt = \frac{11}{6}. \]
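The computation can be reproduced numerically. The sketch below takes the field components as they appear in the displayed product, namely (yz, z + x, xy), and uses an illustrative Simpson helper. It also integrates along the reversed trajectory s ↦ r(1 − s) to illustrate the sign change stated in Theorem 30.15.

```python
def simpson(g, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def r(t):
    return (t + 1, t ** 2, t ** 3 - t)

def r_prime(t):
    return (1.0, 2 * t, 3 * t ** 2 - 1)

def v(x, y, z):
    return (y * z, z + x, x * y)

def integrand(t):
    # v(r(t)) . r'(t)
    return sum(a * b for a, b in zip(v(*r(t)), r_prime(t)))

def integrand_reversed(s):
    # reversed trajectory r(1 - s) has velocity -r'(1 - s)
    return -integrand(1 - s)

forward = simpson(integrand, 0.0, 1.0)
backward = simpson(integrand_reversed, 0.0, 1.0)
print(forward, backward, 11 / 6)
```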

Remark 30.17 (On notation). There are many different notations for integrals over curves. The most common are different choices for the “standard” differential elements. For instance, where this text uses $dr$ in scalar path integrals other texts use $ds$ or $d\sigma$ or $dt$. The reader is advised to be on the lookout for such variations, but most of these are pretty easy to decipher.

A few less intuitive bits of notation are worth more discussion. The first is simple. It is common to use the symbol $\oint$ to indicate a line integral over a closed path. Of course, the notation is redundant if the path has been specified, but sometimes redundancy can be a virtue (and a good thing). The second common notation I’ll discuss here has my nomination for the worst bit of notation in mathematics. It is widely used, and it is not going away. But it is very misleading within the context of standard calculus, suggesting all sorts of absurd computations for line integrals. I’ll describe it in three dimensions. Suppose we wish to integrate a vector field
\[ v(x, y, z) = (v_1(x, y, z), v_2(x, y, z), v_3(x, y, z)) \]
over the path $\mathcal{P}$. Then it is common to write
\[ \int_{\mathcal{P}} v \cdot dr = \int_{\mathcal{P}} v_1(x, y, z)\,dx + v_2(x, y, z)\,dy + v_3(x, y, z)\,dz. \]
For example, in Problem 30.5 you are asked to evaluate the line integral
\[ \int_{\mathcal{P}} 2xyz\,dx + x^2 z\,dy + x^2 y\,dz \]
where $\mathcal{P}$ is the line segment connecting $(2, -1, 1)$ and $(1, 0, -1)$. Of course, the notation comes from the formula
\[ \int_{\mathcal{P}} v \cdot dr = \int_{t_0}^{t_1} \left( v_1\,x'(t) + v_2\,y'(t) + v_3\,z'(t) \right) dt \]
where $r(t) = (x(t), y(t), z(t))$ is a trajectory representing $\mathcal{P}$. The notation makes perfect sense in the context of differential geometry in the language of differential forms. Also, it is easy to see the practical advantages of the notation. It is compact and simple to write and doesn’t require any special fonts for vectors. However, since it looks exactly like our notation for partial integration, it can lead anyone who is not steeped in the language of differential geometry to make terribly wrong computations. To make correct computations one must specify a trajectory representing the path, compose that trajectory with the vector field, and dot the vector field with the tangent to the trajectory.

Example 30.18. Suppose we wish to compute
\[ \int_{\mathcal{P}} x \sin y\,dx + x y^2\,dy \]
where $\mathcal{P}$ is the path in the xy-plane connecting $(0, 0)$ to $(1, 1)$ along the parabola $y = x^2$. To do this we give a trajectory representing the path
\[ x(t) = t, \qquad y(t) = t^2, \]
for $t \in [0, 1]$. We then compute
\[ \int_{\mathcal{P}} x \sin y\,dx + x y^2\,dy = \int_0^1 \left( x(t) \sin y(t)\,x'(t) + x(t)\,y(t)^2\,y'(t) \right) dt = \int_0^1 \left( t \sin(t^2) + t (t^2)^2 (2t) \right) dt = \frac{1 - \cos(1)}{2} + \frac{2}{7}. \]
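The recipe for decoding the “P dx + Q dy” notation (pick a trajectory, compose, multiply each component by the matching derivative) can be written as a small routine. This is a sketch only; `line_integral_2d` and `simpson` are illustrative names, and the derivatives of the trajectory are supplied by hand.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def line_integral_2d(P, Q, x, y, dx, dy, t0, t1):
    """Integral of P dx + Q dy along the trajectory (x(t), y(t)), t in [t0, t1]."""
    def integrand(t):
        return P(x(t), y(t)) * dx(t) + Q(x(t), y(t)) * dy(t)
    return simpson(integrand, t0, t1)

value = line_integral_2d(
    P=lambda x, y: x * math.sin(y),
    Q=lambda x, y: x * y ** 2,
    x=lambda t: t,          # trajectory along the parabola y = x^2
    y=lambda t: t ** 2,
    dx=lambda t: 1.0,
    dy=lambda t: 2 * t,
    t0=0.0,
    t1=1.0,
)
closed_form = (1 - math.cos(1)) / 2 + 2 / 7
print(value, closed_form)
```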

Problems

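Problem 30.1. Evaluate the scalar path integral
\[ \int_{\mathcal{P}} f\,dr \]
where $f(x, y, z) = x + y + z$ and the path $\mathcal{P}$ is defined by the trajectory
\[ r(t) = (t, \sin t, \cos t), \quad t \in [0, 4\pi]. \]

Problem 30.2. Evaluate the scalar path integral
\[ \int_{\mathcal{P}} f\,dr \]
where $f(x, y, z) = e^{\sqrt{z}}$ and $\mathcal{P}$ is defined by the trajectory
\[ r(t) = (1, 2, t^2), \quad t \in [0, 1]. \]

Problem 30.3. Evaluate the scalar path integral
\[ \int_{\mathcal{P}} f\,dr \]
where $f(x, y, z) = z e^{y}$ and $\mathcal{P}$ is defined by the trajectory $r(t) = (t^2, 0, t)$, $t \in [0, 1]$.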

Problem 30.4. Evaluate the line integral
\[ \int_{\mathcal{P}} f \cdot dr \]
where $f(x, y) = (x, y)$ and $\mathcal{P}$ is defined by the trajectory $r(t) = (\cos^3 t, \sin^3 t)$, $t \in [0, 2\pi]$.

Problem 30.5. Evaluate the line integral
\[ \int_{\mathcal{P}} 2xyz\,dx + x^2 z\,dy + x^2 y\,dz \]
where $\mathcal{P}$ is the line segment from $(2, -1, 1)$ to $(1, 0, -1)$.

Problem 30.6. Evaluate the line integral
\[ \int_{\mathcal{P}} y\,dx + (3y^3 - x)\,dy \]
where $\mathcal{P}$ is defined by the trajectory $c(t) = t\,i + t^6\,j$, $t \in [0, 1]$.

Problem 30.7. Let $r : [t_0, t_1] \to \mathbb{R}^n$ be a differentiable trajectory. Let $r^- : [0, 1] \to \mathbb{R}^n$ be a reverse of the trajectory given by
\[ r^-(s) = r((1 - s)\,t_1 + s\,t_0), \]
so that $r^-(0) = r(t_1)$ and $r^-(1) = r(t_0)$. Show that
\[ L(r) = L(r^-). \]
Explain how to combine this result with Theorem 15.4 to complete the proof of Theorem 30.3.

Problem 30.8. Prove Theorem 30.13.

Problem 30.9. Prove Theorem 30.15.

Problem 30.10. Prove Theorem 30.10.

Chapter 31

Integrals Over Surfaces

In this chapter we discuss integrals over (n−1)-dimensional surfaces in $\mathbb{R}^n$. Since the overwhelming body of applications of these integrals are to two-dimensional surfaces in $\mathbb{R}^3$ we concentrate on this case and leave the general case to a short section at the end. As with line integrals, there are three types of integrals we wish to compute.

1. An integral formula describing the area of a surface.

2. Integrals of scalar functions over surfaces. These can be used (for example) to calculate the total charge on a two-dimensional plate by integrating its surface charge density.

3. Integrals of the component of a vector field flowing through a surface. This can be used to evaluate quantities such as the fluid flow through a pipe.

Before this discussion, we need to do some further description of surfaces and their boundaries.

31.1 Regular Regions and Boundary Orientation

We need to make some assumptions that guarantee that the surfaces we will be working with won’t be too strange. Our assumptions really are not that limiting in practice. We begin with some more language describing regions in $\mathbb{R}^2$.

Definition 31.1. We say that $\Omega' \subset \mathbb{R}^2$ is a regular region if it has the following properties.

1. $\Omega'$ is a bounded, open set with positive Riemann volume.

2. The boundary of $\Omega'$ is the same as the boundary of its closure.

3. The boundary consists of the union of a finite collection of piecewise smooth, simple, closed curves. At most two boundary curves can intersect at a single point, and each pair of boundary curves can have at most a finite collection of intersection points.

Each portion of the boundary is said to have a positive orientation if it is oriented so that the interior of the region $\Omega'$ is to the left of the curve and the exterior is to the right as the path defined by the orientation is traversed.

Remark 31.2. The assumption that the boundary of the region is also the boundary of its closure rules out regions such as the punctured disk or a region with a “slit” cut in its interior. The puncture and the slit would not be in the boundary of the closure.

Example 31.3. For a simple, convex region in the plane like a disk or a rectangle, its boundary has positive orientation if it is traversed counter-clockwise.

Example 31.4. Figure 31.1 shows a fairly complicated polygonal domain with a hole in the center
\[ \Omega = \{ (x, y) \mid 1 \le \max\{|x|, |y|\} \le 3 \}. \]
The boundary has two pieces. For a positive orientation, the outer portion of the boundary must be traversed counter-clockwise. The inner portion must be traversed clockwise.

31.2 Parameterized Regular Surfaces and Normals

We now take our regular regions in $\mathbb{R}^2$ and map them to smooth surfaces in $\mathbb{R}^3$.

Figure 31.1: A regular domain $\Omega \subset \mathbb{R}^2$ with oriented boundary composed of two piecewise smooth simple closed curves. The curves are composed of eight smooth segments (labeled C1 through C8).

Definition 31.5. Let $\Omega' \subset \mathbb{R}^2$ be a regular region, and suppose $s : \Omega' \to \mathbb{R}^3$ is a $C^1$, one-to-one function with
\[ s(u, v) = \begin{pmatrix} x(u, v) \\ y(u, v) \\ z(u, v) \end{pmatrix}. \]
Let $S \subset \mathbb{R}^3$ be the range of $s$. Let
\[ n(u, v) = s_u \times s_v = \begin{vmatrix} i & j & k \\ \frac{\partial x}{\partial u} & \frac{\partial y}{\partial u} & \frac{\partial z}{\partial u} \\ \frac{\partial x}{\partial v} & \frac{\partial y}{\partial v} & \frac{\partial z}{\partial v} \end{vmatrix}. \]
Then if $n(u, v) \ne 0$ at every $(u, v) \in \Omega'$ we call $s$ a regular parametrization of the two-dimensional regular surface $S$. We call $n$ the normal to $S$ induced by the parametrization $s$.

Remark 31.6. The boundary of a surface defined by a regular parametrization is generically a curve in $\mathbb{R}^3$. (Though we will see below that it can degenerate to a point.) Such a boundary curve has a natural orientation induced by a positive orientation of $\partial\Omega' \subset \mathbb{R}^2$.

Remark 31.7. While this definition seems to be fairly general, there are all kinds of important and useful surfaces that it does not fit. Surfaces like cones and cubes have corners where the normal is not continuous. Smooth surfaces can have singular parameterizations (e.g. polar coordinates of a sphere). We discuss some of these cases in Section 31.3.

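Remark 31.8. We note some convenient notation for the vector function $n$:
\[ n(u, v) = \begin{vmatrix} y_u & z_u \\ y_v & z_v \end{vmatrix} i - \begin{vmatrix} x_u & z_u \\ x_v & z_v \end{vmatrix} j + \begin{vmatrix} x_u & y_u \\ x_v & y_v \end{vmatrix} k \]
\[ = \frac{\partial(y, z)}{\partial(u, v)}\,i - \frac{\partial(x, z)}{\partial(u, v)}\,j + \frac{\partial(x, y)}{\partial(u, v)}\,k \]
\[ = (y_u z_v - z_u y_v,\; z_u x_v - x_u z_v,\; x_u y_v - y_u x_v). \]
Here we note that
\[ \frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} x_u & x_v \\ y_u & y_v \end{vmatrix} = \begin{vmatrix} x_u & y_u \\ x_v & y_v \end{vmatrix}. \]
Note also that we can write
\[ \|n\| = \sqrt{ \left| \frac{\partial(y, z)}{\partial(u, v)} \right|^2 + \left| \frac{\partial(x, z)}{\partial(u, v)} \right|^2 + \left| \frac{\partial(x, y)}{\partial(u, v)} \right|^2 }. \]
Observe that the direction of $n$ depends on the order of the pair $(u, v)$, but $\|n\|$ does not.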

Remark 31.9. We need to justify use of the term “normal” for the vector $n$ above. Suppose we define a curve in the surface $S$ by defining a trajectory
\[ t \mapsto (u(t), v(t)) \in \Omega' \]
so that the composite function
\[ t \mapsto s(u(t), v(t)) \]
lies entirely in $S$. Then the vector
\[ \frac{d}{dt} s(u(t), v(t)) = s_u u' + s_v v' \]
is tangent to the surface $S$. In fact, one can show that any tangent vector can be obtained in this way. A quick calculation (left to the reader in Problem 31.1) shows that
\[ s_u(u, v) \cdot n(u, v) = s_v(u, v) \cdot n(u, v) = 0. \]
Combined with the previous equation, this shows that $n$ is perpendicular to any tangent vector to $S$ and is therefore normal to $S$.

Remark 31.10. We have defined the normal to be the cross product of the tangent vectors to the surface $s_u$ and $s_v$. The length of the normal vector is therefore the area of the parallelogram formed by the vectors $s_u$ and $s_v$.
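Both remarks can be checked numerically for a particular surface. The sketch below uses a hypothetical sample parametrization s(u, v) = (u, v, u² − v²), which does not come from the text, approximates $s_u$ and $s_v$ by central differences, and forms their cross product. It then checks that the result is perpendicular to both tangent vectors and that its length squared equals the squared parallelogram area $\|s_u\|^2\|s_v\|^2 - (s_u \cdot s_v)^2$.

```python
def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def induced_normal(s, u, v, h=1e-6):
    """Return (s_u, s_v, s_u x s_v), with partials by central differences."""
    s_u = tuple((p - q) / (2 * h) for p, q in zip(s(u + h, v), s(u - h, v)))
    s_v = tuple((p - q) / (2 * h) for p, q in zip(s(u, v + h), s(u, v - h)))
    return s_u, s_v, cross(s_u, s_v)

def saddle(u, v):
    # hypothetical sample surface, chosen only for illustration
    return (u, v, u ** 2 - v ** 2)

s_u, s_v, n = induced_normal(saddle, 0.3, -0.7)
print(n)                          # close to (-2u, 2v, 1) at (u, v) = (0.3, -0.7)
print(dot(n, s_u), dot(n, s_v))   # both close to zero
area_sq = dot(s_u, s_u) * dot(s_v, s_v) - dot(s_u, s_v) ** 2
print(dot(n, n), area_sq)         # equal up to rounding
```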

Example 31.11. In Example 12.6 we described a right circular cylinder of radius $r$ centered on the y-axis using the function $h : [0, 2\pi) \times \mathbb{R} \to \mathbb{R}^3$
\[ h(\theta, s) = \begin{pmatrix} r\cos\theta \\ s \\ r\sin\theta \end{pmatrix}, \]
where $r > 0$ was fixed. To fit the definition of a regular surface the domain of the mapping must be open and bounded. So we choose an $L > 0$ and define $h : (0, 2\pi) \times (-L, L) \to \mathbb{R}^3$ as above. Note that the cylinder is of finite length¹ and has a “slit” along the line $x = r$, $z = 0$.

The normal vector induced by this parametrization is computed as follows.

\[ n(\theta, s) = \begin{vmatrix} i & j & k \\ \frac{\partial x}{\partial\theta} & \frac{\partial y}{\partial\theta} & \frac{\partial z}{\partial\theta} \\ \frac{\partial x}{\partial s} & \frac{\partial y}{\partial s} & \frac{\partial z}{\partial s} \end{vmatrix} = \begin{vmatrix} i & j & k \\ -r\sin\theta & 0 & r\cos\theta \\ 0 & 1 & 0 \end{vmatrix} = -r(\cos\theta\,i + \sin\theta\,k). \]

This normal vector points inward, toward the axis of the cylinder. Taking the parameters in the opposite order $(s, \theta)$ would reverse the sign and give the outward normal $r(\cos\theta\,i + \sin\theta\,k)$.

The boundary of the cylinder has an orientation that is induced by a positive orientation of the domain in the θs-plane. Like the direction of the normal, this orientation depends on the (arbitrary) choice of the order $(\theta, s)$. The boundary segments of the domain $s = \pm L$ map to circles of radius $r$ about the y-axis that are boundary curves of the cylinder. The segment $s = L$ in the domain is oriented right to left (decreasing $\theta$) while the segment $s = -L$ is oriented left to right (increasing $\theta$). This induces the opposite orientation on the two circles in $\mathbb{R}^3$ that they map to. The two segments of the boundary $\theta = 0$ and $\theta = 2\pi$ map to the same line segment in $\mathbb{R}^3$: the segment $(r, s, 0)$, $s \in (-L, L)$. However, since the boundary segment $\theta = 2\pi$ is oriented “up” (increasing $s$) while the segment $\theta = 0$ is oriented “down” (decreasing $s$), the two image curves have opposite orientations. This will be important in the next section.

¹ Most of the theorems we explore here are easiest to state for bounded domains. We leave it to the reader to determine if they apply to particular unbounded domains on a case-by-case basis.

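Example 31.12. In Example 12.7 we parameterized a sphere of radius $\rho > 0$. Again, we modify the domain slightly and use the function $g : (0, 2\pi) \times (0, \pi) \to \mathbb{R}^3$ given by
\[ g(\theta, \phi) = \begin{pmatrix} \rho\cos\theta\sin\phi \\ \rho\sin\theta\sin\phi \\ \rho\cos\phi \end{pmatrix}, \]
to parameterize the sphere of (fixed) radius $\rho > 0$ about the origin with the “prime meridian” and the north and south poles deleted.

The normal vector induced by this parameterization can be computed as follows.
\[ n(\theta, \phi) = \begin{vmatrix} i & j & k \\ \frac{\partial x}{\partial\theta} & \frac{\partial y}{\partial\theta} & \frac{\partial z}{\partial\theta} \\ \frac{\partial x}{\partial\phi} & \frac{\partial y}{\partial\phi} & \frac{\partial z}{\partial\phi} \end{vmatrix} = \begin{vmatrix} i & j & k \\ -\rho\sin\theta\sin\phi & \rho\cos\theta\sin\phi & 0 \\ \rho\cos\theta\cos\phi & \rho\sin\theta\cos\phi & -\rho\sin\phi \end{vmatrix} \]
\[ = -\rho^2\sin\phi\,(\cos\theta\sin\phi\,i + \sin\theta\sin\phi\,j + \cos\phi\,k) = -\rho^2\sin\phi\,e_\rho(\theta, \phi). \]
Note that the normal induced by the parametrization points to the interior of the sphere.

The boundary segments of the domain $\phi = 0$ and $\phi = \pi$ each map to a single point, the north and south poles respectively. The other two segments, $\theta = 0$ and $\theta = 2\pi$, both map onto the “prime meridian.” As with the cylinder, the orientations of these portions of the boundary of the sphere run in opposite directions.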
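The determinant in this example can be verified at a sample point. The sketch below takes the two rows of partial derivatives exactly as they appear in the determinant, forms their cross product, and compares it with $-\rho^2 \sin\phi\, e_\rho$. The function names and the sample values of ρ, θ, φ are illustrative.

```python
import math

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sphere_normal(rho, theta, phi):
    """n = g_theta x g_phi, using the rows of the determinant in the example."""
    g_theta = (-rho * math.sin(theta) * math.sin(phi),
               rho * math.cos(theta) * math.sin(phi),
               0.0)
    g_phi = (rho * math.cos(theta) * math.cos(phi),
             rho * math.sin(theta) * math.cos(phi),
             -rho * math.sin(phi))
    return cross(g_theta, g_phi)

rho, theta, phi = 2.0, 0.7, 1.1
n = sphere_normal(rho, theta, phi)
e_rho = (math.cos(theta) * math.sin(phi),
         math.sin(theta) * math.sin(phi),
         math.cos(phi))
expected = tuple(-rho ** 2 * math.sin(phi) * c for c in e_rho)
print(n)
print(expected)
# A negative dot product with the outward radial direction means n points inward.
print(sum(a * b for a, b in zip(n, e_rho)))
```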

Example 31.13. Of course, there is more than one way to represent a surface. Let us consider a surface represented by the graph of a function
\[ z = f(x, y), \quad (x, y) \in D \subset \mathbb{R}^2. \]
Note that we can represent this as a parameterized surface by defining $g : D \to \mathbb{R}^3$ by
\[ g(x, y) = \begin{pmatrix} x(x, y) \\ y(x, y) \\ z(x, y) \end{pmatrix} = \begin{pmatrix} x \\ y \\ f(x, y) \end{pmatrix}. \]
The normal induced by this parametrization is
\[ n(x, y) = \begin{vmatrix} i & j & k \\ \frac{\partial x}{\partial x} & \frac{\partial y}{\partial x} & \frac{\partial z}{\partial x} \\ \frac{\partial x}{\partial y} & \frac{\partial y}{\partial y} & \frac{\partial z}{\partial y} \end{vmatrix} = \begin{vmatrix} i & j & k \\ 1 & 0 & f_x(x, y) \\ 0 & 1 & f_y(x, y) \end{vmatrix} = -f_x(x, y)\,i - f_y(x, y)\,j + k. \]
Note that this normal points “above” the graph in the positive z direction.

As a specific example consider describing the upper hemisphere of a sphere of radius $\rho > 0$ as the graph of the function
\[ z = f(x, y) = \sqrt{\rho^2 - x^2 - y^2}, \]
for $(x, y) \in D = \{ (x, y) \mid x^2 + y^2 \le \rho^2 \}$. For this graph we get the normal
\[ n(x, y) = \left( \frac{x}{\sqrt{\rho^2 - x^2 - y^2}},\; \frac{y}{\sqrt{\rho^2 - x^2 - y^2}},\; 1 \right). \]
In Problem 31.5 the reader is asked to compare the normals induced on the upper hemisphere by the two parameterizations given above.

31.3 Oriented Surfaces with Corners

All regular surfaces are smooth, open sets with boundary in $\mathbb{R}^3$. In many applications we need to consider surfaces that are closed and have no boundary (e.g., the boundary surface of a three-dimensional volume). We also will routinely encounter surfaces that have corners and therefore are not smooth. The following definition describes a class of surfaces that is rich enough to allow us to consider a large collection of reasonable examples encountered in practice. More general definitions are possible. (See, e.g. [3, Section 8.5] for a definition of an orientable manifold.)

Definition 31.14. We say that $S \subset \mathbb{R}^3$ is an oriented surface if it is the union of a finite collection of nonintersecting parameterized regular surfaces (called surface patches) and their boundaries (which are allowed to intersect). The boundaries have the following properties.

• At most a finite collection of points can be the intersection of more than two pieces of the boundaries.

• If two portions of the boundary intersect in a curve, then the individual portions must have the opposite orientation.

Remark 31.15. The most curious part of this definition is the “edge condition” that when two surface patches meet in a curve the intersecting boundary curves must have opposite orientation. It is pretty easy to see that this is the correct thing to do if we are to keep the normals of the patches “aligned.” Suppose we have a single region $\Omega' \subset \mathbb{R}^2$ and we arbitrarily split it into two subregions. (See Figure 31.2.) Note that if the boundaries of the two subregions are positively oriented, the orientation runs in opposite directions along the newly introduced piece of the boundary.

Figure 31.2: A regular domain with positively oriented boundary is split into two subdomains. The orientation of the two boundaries is opposite on the segment where they intersect.

Remark 31.16. Roughly speaking, the definition of an oriented surface ensures that our surfaces will have two distinct sides. Essentially all of the surfaces that we encounter in daily life have this property. We commonly refer to the two sides as “inside” and “outside” in the case of a closed surface like a sphere or “top” and “bottom” or “left” and “right” in the case of a surface with boundary like a cone or a hemisphere. The only exception that most people can think of is a Möbius strip. (See Figure 31.3.) You can form such a surface by taking a long, narrow piece of paper. Of course, you can join the two ends together in the “normal” way to form a hoop, that is, a portion of a cylinder. The hoop has two sides: an inside and an outside. An ant walking on the inside of the surface would have to jump over the edge to get to the outside. However, if you gave one of the ends of the strip of paper a half-twist before attaching them, you would have a Möbius strip. The strip has only one side in the sense that an ant could get from any point to any other by simply walking along the strip. It would never have to jump over the edge.

Figure 31.3: A Möbius strip.

Note that we can give the boundary of one side of the original two-dimensional strip (think of it as the domain of the surface map) a positive orientation. Try it. Take a piece of paper and draw arrows around the edge of one side. The two ends are oriented in the opposite direction when the strip is lying flat. They also have the opposite orientation when we attach the ends in a hoop. However, when we give the strip a half twist and attach the ends to form the Möbius strip, they have the same orientation, violating the requirements of an oriented surface.

Thus, any oriented surface has two unit normal vectors $\pm n$ that vary continuously on any of the surface patches. There may, of course, be discontinuities in the unit normals at the boundaries of the patches, but the cancellation property (opposite orientations on shared boundary curves) ensures that the normal always points to the same “side” of the surface.

Also note that our previous examples fit the definition nicely.

Example 31.17. The closure of the portion of a cylinder of radius r about the y-axis described in Example 31.11 fits this definition since the overlapping boundary segments on the line segment along x = r, z = 0 have the opposite orientation.

Example 31.18. The polar coordinate parameterization of the sphere described in Example 31.12 also fits the definition since the boundary segments of the domain corresponding to φ = 0, π degenerate to points and the overlapping segments corresponding to θ = 0, 2π have the opposite orientation as they run along the prime meridian.

Example 31.19. Consider the surface of a three-dimensional cone

\[ C = \{(x, y, z) \mid \sqrt{x^2 + y^2} < z < 2\}. \]

We need two patches to construct the surface. We define s1 : (0, 2) × (0, 2π) → R^3 by

\[ \mathbf{s}_1(r, \theta) = (r\cos\theta,\ r\sin\theta,\ 2), \]

and s2 : (0, 2) × (0, 2π) → R^3 by

\[ \mathbf{s}_2(s, \omega) = (s\sin\omega,\ s\cos\omega,\ s). \]

The first patch parameterizes the flat “top” while the second parameterizes the cone. Note that we have chosen the parameterizations so that along the circle of radius two where the two patches intersect, the induced boundaries of the two patches are oriented in opposite directions, as required. The reader should verify that all other portions of the boundary of the two domains behave as required.

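Computing the two normals will show us that they have the “same orientation” as well. We compute

\[ \mathbf{n}_1(r, \theta) = \frac{\partial \mathbf{s}_1}{\partial r} \times \frac{\partial \mathbf{s}_1}{\partial \theta} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \cos\theta & \sin\theta & 0 \\ -r\sin\theta & r\cos\theta & 0 \end{vmatrix} = r\,\mathbf{k}, \]

\[ \mathbf{n}_2(s, \omega) = \frac{\partial \mathbf{s}_2}{\partial s} \times \frac{\partial \mathbf{s}_2}{\partial \omega} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \sin\omega & \cos\omega & 1 \\ s\cos\omega & -s\sin\omega & 0 \end{vmatrix} = s(\sin\omega\,\mathbf{i} + \cos\omega\,\mathbf{j} - \mathbf{k}). \]

Note that on both of the patches, the normal points to the exterior of the cone C.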

Remark 31.20. Surfaces in R^3 and their boundary curves are geometric sets that don’t really depend on the parameterizations that we use to describe them. In many ways it would be preferable to give “invariant” definitions of an oriented surface that do not depend on a parameterization. These definitions can be a bit vague, as is the one that follows.

Definition 31.21. Let S ⊂ R^3 be an oriented surface with unit normal n. Suppose its boundary ∂S is composed of a finite number of simple closed curves. Then we say that a boundary curve is oriented in the direction of n if it obeys the following right-hand rule: if you place your typical right hand with thumb pointing in the direction of n and fingers in the direction of the boundary path then the palm should face the surface. Equivalently, if you (appropriately sized) were to walk in the direction of the boundary path standing “up” in the direction of n then the surface would be to your left.

This definition is simply an invariant way of saying that the orientation of the boundary of a surface induced by a positive orientation of the boundary of the domain of a parameterization “agrees” with the direction of the normal to the surface induced by the parameterization.

31.4 Surface Area

We now proceed in a manner parallel to our development of integrals over curves. There are a couple of important points of comparison.

• Computations of integrals over curves always reduce to integrals over an interval on the real line. Integrals over two-dimensional surfaces in R^3 always reduce to integrations over regions in the plane or the sum of such integrals.

• For integrals over curves the key “fudge factor” relating small bits of the trajectory to small bits on the real line was the length of the tangent vector ‖r′‖. For integrals over surfaces the key fudge factor relating small bits of the curved surface to small bits of the plane is the length of the normal vector induced by the parametrization ‖n‖. Recall that the length of the normal vector is the area of the parallelogram formed by the tangent vectors to the coordinate curves on the surface.

We begin with the definition of surface area.

Definition 31.22. Let Ω′ ⊂ R^2 be a regular region and suppose s : Ω′ → R^3 is a regular parametrization of the surface S. We then define the surface area of S to be

\[ A(S) = \iint_{\Omega'} \|\mathbf{n}(u, v)\| \, dA(u, v). \]

More generally, the surface area of an oriented surface is defined to be the sum of the areas of its surface patches.

As before, our justification for this definition is that it yields exactly the two-dimensional Hausdorff measure of S - our fundamental notion of surface area.

Theorem 31.23. Let Ω′ ⊂ R^2 be a regular region and suppose s : Ω′ → R^3 is a regular parametrization of the surface S. Then

\[ A(S) = \mathcal{H}^2(S). \]

Once again, we refer to more advanced texts for the proof of this theorem, e.g. [6, p. 101].

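Example 31.24. Let’s begin by computing the surface area of S^+_ρ, the upper half of a sphere of radius ρ > 0. We will first use the parametrization

\[ \mathbf{g}(\theta, \phi) = \begin{pmatrix} \rho\cos\theta\sin\phi \\ \rho\sin\theta\sin\phi \\ \rho\cos\phi \end{pmatrix}, \]

with 0 ≤ θ < 2π and 0 < φ ≤ π/2. We have computed the normal induced by this parametrization in Example 31.12:

\[ \mathbf{n}(\theta, \phi) = -\rho^2 \sin\phi\,(\cos\theta\sin\phi\,\mathbf{i} + \sin\theta\sin\phi\,\mathbf{j} + \cos\phi\,\mathbf{k}). \]

Its norm is simply

\[ \|\mathbf{n}(\theta, \phi)\| = \rho^2 \sin\phi. \]

Thus, the area of the hemisphere is

\[ A(S^+_\rho) = \rho^2 \int_0^{2\pi} \int_0^{\pi/2} \sin\phi \, d\phi \, d\theta = 2\pi\rho^2. \]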

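We can compute the same surface area using the parametrization given in Example 31.13. There, we used the graph

\[ z = f(x, y) = \sqrt{\rho^2 - x^2 - y^2}, \]

for (x, y) ∈ D_ρ = {(x, y) | x^2 + y^2 < ρ^2}, to describe the hemisphere. We computed the normal

\[ \mathbf{n}(x, y) = \left( \frac{x}{\sqrt{\rho^2 - x^2 - y^2}},\ \frac{y}{\sqrt{\rho^2 - x^2 - y^2}},\ 1 \right), \]

whose norm is given by

\[ \|\mathbf{n}(x, y)\| = \frac{\rho}{\sqrt{\rho^2 - x^2 - y^2}}. \]

Our surface area is given by the integral

\[ A(S^+_\rho) = \iint_{D_\rho} \frac{\rho}{\sqrt{\rho^2 - x^2 - y^2}} \, dA(x, y). \]

We use the change of variable theorem to change this to polar coordinates to get

\[
\begin{aligned}
A(S^+_\rho) &= \int_0^{\rho} \int_0^{2\pi} \frac{\rho}{\sqrt{\rho^2 - r^2}} \, r \, d\theta \, dr \\
&= 2\pi\rho \int_0^{\rho} \frac{r}{\sqrt{\rho^2 - r^2}} \, dr \\
&= -\pi\rho \int_{\rho^2}^{0} \frac{1}{\sqrt{u}} \, du \\
&= -\pi\rho \, 2\sqrt{u} \, \Big|_{\rho^2}^{0} \\
&= 2\pi\rho^2.
\end{aligned}
\]

Here we have made the single variable substitution u = ρ^2 − r^2, du = −2r dr.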
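The surface area formula can also be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the spherical parametrization g(θ, φ) = ρ(cos θ sin φ, sin θ sin φ, cos φ), forms the induced normal as the cross product of the two tangent vectors, and sums its length with a midpoint rule. The function names and the grid size m are illustrative choices.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def induced_normal_norm(theta, phi, rho):
    """Length of g_theta x g_phi for the spherical parametrization of radius rho."""
    g_theta = (-rho * math.sin(theta) * math.sin(phi),
               rho * math.cos(theta) * math.sin(phi),
               0.0)
    g_phi = (rho * math.cos(theta) * math.cos(phi),
             rho * math.sin(theta) * math.cos(phi),
             -rho * math.sin(phi))
    n = cross(g_theta, g_phi)
    return math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)

def hemisphere_area(rho, m=200):
    """Midpoint-rule approximation of the area integral over (0, 2*pi) x (0, pi/2)."""
    h_theta = 2 * math.pi / m
    h_phi = (math.pi / 2) / m
    total = 0.0
    for i in range(m):
        theta = (i + 0.5) * h_theta
        for j in range(m):
            phi = (j + 0.5) * h_phi
            total += induced_normal_norm(theta, phi, rho)
    return total * h_theta * h_phi

rho = 1.5
# The two printed values should nearly agree.
print(hemisphere_area(rho), 2 * math.pi * rho ** 2)
```

Only the length of the induced normal enters, so the inward direction of this particular normal plays no role in the area.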

31.5 Scalar Surface Integrals

We now define the integral of a scalar field over a surface. Once again we give only a “practical” formula for this integral.

Definition 31.25. Let Ω′ ⊂ R^2 be a regular region and suppose s : Ω′ → R^3 is a regular parametrization of the surface S. Let f : S → R be a scalar field defined (at least) on S. Then, the scalar surface integral of f over S is

\[ \int_S f \, dS = \iint_{\Omega'} f(\mathbf{s}(u, v)) \, \|\mathbf{n}(u, v)\| \, dA(u, v), \]

where n(u, v) is the normal induced by the parametrization s. Again, the integral of f over an oriented surface is the sum of its integrals over the surface patches.

If the notation above is to make sense, we must show that the integral over a surface doesn’t depend on the parametrization we use to represent the surface. We will state, but not prove, such a theorem.

Theorem 31.26. Suppose s1 : Ω′1 → R^3 and s2 : Ω′2 → R^3 are regular parameterizations of the surface S. Let f : S → R be a scalar field defined (at least) on S. Then

\[ \iint_{\Omega'_1} f(\mathbf{s}_1(u_1, v_1)) \, \|\mathbf{n}_1(u_1, v_1)\| \, dA(u_1, v_1) = \iint_{\Omega'_2} f(\mathbf{s}_2(u_2, v_2)) \, \|\mathbf{n}_2(u_2, v_2)\| \, dA(u_2, v_2), \]

where n1 is the normal induced by the parametrization s1 and n2 is the normal induced by the parametrization s2.

Remark 31.27. Since we have not given a fundamental definition of the integral of a scalar function over a physical surface in R^3, we appeal to the surface area (where there is such a connection) to make the argument that our formula makes sense (just as we did in the case of integrals over curves).

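Example 31.28. Suppose we wish to integrate the function

\[ f(x, y, z) = z^3 (x^4 + 2x^2 y^2 + y^4) \]

over S^+_ρ, the upper half of a sphere of radius ρ. We again use the parametrization

\[ \mathbf{g}(\theta, \phi) = \begin{pmatrix} x(\theta, \phi) \\ y(\theta, \phi) \\ z(\phi) \end{pmatrix} = \begin{pmatrix} \rho\cos\theta\sin\phi \\ \rho\sin\theta\sin\phi \\ \rho\cos\phi \end{pmatrix}, \]

and, since x^4 + 2x^2 y^2 + y^4 = (x^2 + y^2)^2, we compute

\[ f(\mathbf{g}(\theta, \phi)) = \rho^3 \cos^3\phi \, (\rho^2 \sin^2\phi)^2 = \rho^7 \cos^3\phi \, \sin^4\phi. \]

Using this and ‖n(θ, φ)‖ = ρ^2 sin φ we get

\[ \int_{S^+_\rho} f \, dS = \int_0^{2\pi} \int_0^{\pi/2} \rho^9 \cos^3\phi \, \sin^5\phi \, d\phi \, d\theta = 2\pi\rho^9 \cdot \frac{1}{24} = \frac{\pi \rho^9}{12}. \]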
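As a sanity check on this kind of computation, the scalar surface integral can be evaluated directly from Definition 31.25. The sketch below is an illustration only: it assumes the spherical parametrization of the upper hemisphere with ‖n(θ, φ)‖ = ρ^2 sin φ and uses a midpoint rule with an arbitrarily chosen grid size. For f(x, y, z) = z^3 (x^2 + y^2)^2 the exact value works out to πρ^9/12, because the φ-integral of cos^3 φ sin^5 φ over (0, π/2) equals 1/24.

```python
import math

def f(x, y, z):
    """The scalar field z^3 (x^4 + 2 x^2 y^2 + y^4) = z^3 (x^2 + y^2)^2."""
    return z ** 3 * (x ** 4 + 2 * x ** 2 * y ** 2 + y ** 4)

def scalar_surface_integral(field, rho, m=300):
    """Midpoint-rule approximation of the integral of field over the upper hemisphere."""
    h_theta = 2 * math.pi / m
    h_phi = (math.pi / 2) / m
    total = 0.0
    for i in range(m):
        theta = (i + 0.5) * h_theta
        for j in range(m):
            phi = (j + 0.5) * h_phi
            x = rho * math.cos(theta) * math.sin(phi)
            y = rho * math.sin(theta) * math.sin(phi)
            z = rho * math.cos(phi)
            # Fudge factor: length of the induced normal, rho^2 sin(phi).
            total += field(x, y, z) * rho ** 2 * math.sin(phi)
    return total * h_theta * h_phi

rho = 1.0
# The two printed values should nearly agree.
print(scalar_surface_integral(f, rho), math.pi * rho ** 9 / 12)
```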

31.6 Surface Flux Integrals

In this section we define the integral of the “flux” of a vector field through a surface. That is, the integral of the component of the vector field normal to the surface. This integral is used heavily in computations of fluid flow and electromagnetism.

In the computation of the surface area and the integral of a scalar function the direction of the normal vector did not matter. We used only the length of the normal induced by the parametrization as our fudge factor. For a flux integral, the direction of the normal (and its relation to the direction of the vector field) is crucial. Thus, these integrals will be defined only on orientable surfaces.

Definition 31.29. Let S be an orientable surface and let ±n be the unit normals on the two sides of the surface. Let s : Ω′ → R^3 be any regular parametrization of S. Let n(u, v) be the normal induced by the parametrization. Let v : S → R^3 be a vector field defined (at least) on S. Then, the surface flux integral of v over S in the direction of n is

\[ \int_S \mathbf{v} \cdot \mathbf{n} \, dS = \operatorname{sgn}(\mathbf{n}(u, v) \cdot \mathbf{n}) \iint_{\Omega'} \mathbf{v}(\mathbf{s}(u, v)) \cdot \mathbf{n}(u, v) \, dA(u, v). \]

Here sgn(n(u, v) · n) is the sign of the dot product of the induced normal n(u, v) with the unit normal n. That is, sgn(n(u, v) · n) = +1 if n(u, v) · n > 0 and sgn(n(u, v) · n) = −1 if n(u, v) · n < 0. Note that if S has multiple patches, the integral must be computed by taking the sum over the patches.

Once again, if the notation above is to make sense, we must show that the integral over a surface doesn’t depend on which parametrization we use to represent the surface. We state such a result without proof.

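Theorem 31.30. Suppose S is an orientable surface with unit normal n, and s1 : Ω′1 → R^3 and s2 : Ω′2 → R^3 are regular parameterizations of S which induce normals n1 and n2 respectively. Let v : S → R^3 be a vector field defined (at least) on S. Then

\[ \operatorname{sgn}(\mathbf{n}_1 \cdot \mathbf{n}) \iint_{\Omega'_1} \mathbf{v}(\mathbf{s}_1(u_1, v_1)) \cdot \mathbf{n}_1(u_1, v_1) \, dA(u_1, v_1) = \operatorname{sgn}(\mathbf{n}_2 \cdot \mathbf{n}) \iint_{\Omega'_2} \mathbf{v}(\mathbf{s}_2(u_2, v_2)) \cdot \mathbf{n}_2(u_2, v_2) \, dA(u_2, v_2). \]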

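Remark 31.31. To compare this integral to the scalar surface integral note that

\[ \mathbf{n} = \operatorname{sgn}(\mathbf{n}(u, v) \cdot \mathbf{n}) \, \frac{\mathbf{n}(u, v)}{\|\mathbf{n}(u, v)\|}, \]

where n (without arguments) is the unit normal and n(u, v) is the normal induced by the parametrization. That is, the unit normal in the direction we are looking for can be obtained by normalizing the normal induced by the parameterization and attaching the appropriate sign. After doing this we get

\[
\begin{aligned}
\int_S \mathbf{v} \cdot \mathbf{n} \, dS &= \operatorname{sgn}(\mathbf{n}(u, v) \cdot \mathbf{n}) \iint_{\Omega'} \mathbf{v}(\mathbf{s}(u, v)) \cdot \mathbf{n}(u, v) \, dA(u, v) \\
&= \iint_{\Omega'} \mathbf{v}(\mathbf{s}(u, v)) \cdot \left( \operatorname{sgn}(\mathbf{n}(u, v) \cdot \mathbf{n}) \, \frac{\mathbf{n}(u, v)}{\|\mathbf{n}(u, v)\|} \right) \|\mathbf{n}(u, v)\| \, dA(u, v) \\
&= \iint_{\Omega'} \big( \mathbf{v}(\mathbf{s}(u, v)) \cdot \mathbf{n} \big) \, \|\mathbf{n}(u, v)\| \, dA(u, v).
\end{aligned}
\]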

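Example 31.32. Let S be the portion of the cylindrical surface of radius one about the z-axis between the planes z = 0 and z = 4. We wish to compute the flux of the vector field

\[ \mathbf{v}(x, y, z) = (x - y)\,\mathbf{i} + x\,\mathbf{j} + 2z\,\mathbf{k} \]

through this surface in the direction of the outward unit normal. We parameterize the surface using

\[ \mathbf{h}(\theta, s) = \begin{pmatrix} \cos\theta \\ \sin\theta \\ s \end{pmatrix}, \]

for (θ, s) ∈ D = {(θ, s) | θ ∈ (0, 2π), s ∈ (0, 4)}. The normal vector induced by this parametrization is computed as follows.

\[ \mathbf{n}(\theta, s) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta} & \frac{\partial z}{\partial \theta} \\ \frac{\partial x}{\partial s} & \frac{\partial y}{\partial s} & \frac{\partial z}{\partial s} \end{vmatrix} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{vmatrix} = \cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j}. \]

The normal induced by the parametrization points outward - the desired direction. Thus we compute

\[
\begin{aligned}
\int_S \mathbf{v} \cdot \mathbf{n} \, dS &= \iint_D \mathbf{v}(\mathbf{h}(\theta, s)) \cdot \mathbf{n}(\theta, s) \, dA(\theta, s) \\
&= \int_0^4 \int_0^{2\pi} (\cos\theta - \sin\theta,\ \cos\theta,\ 2s) \cdot (\cos\theta,\ \sin\theta,\ 0) \, d\theta \, ds \\
&= \int_0^4 \int_0^{2\pi} \cos^2\theta \, d\theta \, ds = 4\pi.
\end{aligned}
\]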
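The flux in this example can be reproduced numerically. The sketch below is illustrative rather than part of the original text: it hard-codes the induced normal (cos θ, sin θ, 0) found above, evaluates the field on the cylinder, and applies a midpoint rule over (0, 2π) × (0, 4). The names and the grid size m are assumptions of the sketch.

```python
import math

def v(x, y, z):
    """The vector field (x - y) i + x j + 2z k."""
    return (x - y, x, 2 * z)

def cylinder_flux(m=200):
    """Midpoint-rule approximation of the outward flux of v through the cylinder."""
    h_theta = 2 * math.pi / m
    h_s = 4.0 / m
    total = 0.0
    for i in range(m):
        theta = (i + 0.5) * h_theta
        for j in range(m):
            s = (j + 0.5) * h_s
            point = (math.cos(theta), math.sin(theta), s)
            # Induced normal h_theta x h_s; it already points outward, so the sign is +1.
            normal = (math.cos(theta), math.sin(theta), 0.0)
            field = v(*point)
            total += sum(a * b for a, b in zip(field, normal))
    return total * h_theta * h_s

# The two printed values should nearly agree.
print(cylinder_flux(), 4 * math.pi)
```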

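Example 31.33. Let us compute the flux of the vector field

\[ \mathbf{u}(x, y, z) = y\,\mathbf{i} + x\,\mathbf{j} + z\,\mathbf{k} \]

through the closed unit sphere S in the direction of the unit outward normal n. We can use the parametrization and induced normal defined in Example 31.12,

\[ \mathbf{g}(\theta, \phi) = \begin{pmatrix} \cos\theta\sin\phi \\ \sin\theta\sin\phi \\ \cos\phi \end{pmatrix}, \]

\[ \mathbf{n}(\theta, \phi) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ -\sin\theta\sin\phi & \cos\theta\sin\phi & 0 \\ \cos\theta\cos\phi & \sin\theta\cos\phi & -\sin\phi \end{vmatrix} = -\sin\phi\,(\cos\theta\sin\phi\,\mathbf{i} + \sin\theta\sin\phi\,\mathbf{j} + \cos\phi\,\mathbf{k}). \]

Here (θ, φ) ∈ D = {(θ, φ) | θ ∈ (0, 2π), φ ∈ (0, π)}. We note that the normal induced by the parametrization points inward, so sgn(n(θ, φ) · n) = −1. In preparation for the integral we compute

\[ \mathbf{u}(\mathbf{g}(\theta, \phi)) \cdot \mathbf{n}(\theta, \phi) = -\sin\phi \begin{pmatrix} \sin\theta\sin\phi \\ \cos\theta\sin\phi \\ \cos\phi \end{pmatrix} \cdot \begin{pmatrix} \cos\theta\sin\phi \\ \sin\theta\sin\phi \\ \cos\phi \end{pmatrix} = -(2\sin\theta\cos\theta\sin^3\phi + \sin\phi\cos^2\phi). \]

This gives us

\[
\begin{aligned}
\int_S \mathbf{u} \cdot \mathbf{n} \, dS &= (-1) \iint_D \mathbf{u}(\mathbf{g}(\theta, \phi)) \cdot \mathbf{n}(\theta, \phi) \, dA(\theta, \phi) \\
&= \int_0^{2\pi} \int_0^{\pi} (2\sin\theta\cos\theta\sin^3\phi + \sin\phi\cos^2\phi) \, d\phi \, d\theta = \frac{4\pi}{3}.
\end{aligned}
\]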
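The role of the sign factor can be illustrated numerically. In the sketch below (an illustration under stated assumptions, not the author's method) the induced normal is computed as the cross product of the tangent vectors, and the sign is obtained by comparing it with the outward unit normal, which on the unit sphere is just the position vector. A midpoint rule then approximates the outward flux of u = y i + x j + z k.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def g(theta, phi):
    """Spherical parametrization of the unit sphere."""
    return (math.cos(theta) * math.sin(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(phi))

def induced_normal(theta, phi):
    """g_theta x g_phi, the normal induced by the parametrization."""
    g_theta = (-math.sin(theta) * math.sin(phi), math.cos(theta) * math.sin(phi), 0.0)
    g_phi = (math.cos(theta) * math.cos(phi), math.sin(theta) * math.cos(phi), -math.sin(phi))
    return cross(g_theta, g_phi)

def u(x, y, z):
    return (y, x, z)

def outward_flux(m=200):
    """Midpoint-rule approximation of the flux of u in the outward direction."""
    h_theta = 2 * math.pi / m
    h_phi = math.pi / m
    total = 0.0
    for i in range(m):
        theta = (i + 0.5) * h_theta
        for j in range(m):
            phi = (j + 0.5) * h_phi
            p = g(theta, phi)
            n = induced_normal(theta, phi)
            # On the unit sphere the outward unit normal equals the position vector p.
            sign = 1.0 if dot(n, p) > 0 else -1.0
            total += sign * dot(u(*p), n)
    return total * h_theta * h_phi

# The two printed values should nearly agree.
print(outward_flux(), 4 * math.pi / 3)
```

Here the computed sign is −1 at every sample point, matching the observation that this parametrization induces an inward normal.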

31.7 Generalized (n− 1)-Dimensional Surfaces

While two-dimensional surfaces in R^3 are by far the most important case of an (n−1)-dimensional surface in R^n, there are occasions when more general results are needed. We will give some basic definitions and theorems here.

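Definition 31.34. Let Ω′ ⊂ R^{n−1} be a bounded, open set with positive Riemann volume such that the boundary of Ω′ is the boundary of its closure. Suppose s : Ω′ → R^n is a C^1, one-to-one function with

\[ \mathbf{s}(u_1, u_2, \dots, u_{n-1}) = \begin{pmatrix} x_1(u_1, u_2, \dots, u_{n-1}) \\ x_2(u_1, u_2, \dots, u_{n-1}) \\ \vdots \\ x_n(u_1, u_2, \dots, u_{n-1}) \end{pmatrix}. \]

Let S ⊂ R^n be the range of s. Let

\[ \mathbf{n}(u_1, u_2, \dots, u_{n-1}) = \begin{vmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \cdots & \mathbf{e}_n \\ \frac{\partial x_1}{\partial u_1} & \frac{\partial x_2}{\partial u_1} & \cdots & \frac{\partial x_n}{\partial u_1} \\ \frac{\partial x_1}{\partial u_2} & \frac{\partial x_2}{\partial u_2} & \cdots & \frac{\partial x_n}{\partial u_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial x_1}{\partial u_{n-1}} & \frac{\partial x_2}{\partial u_{n-1}} & \cdots & \frac{\partial x_n}{\partial u_{n-1}} \end{vmatrix}. \]

Then if n(u1, u2, . . . , un−1) ≠ 0 at every (u1, u2, . . . , un−1) ∈ Ω′ we call s a regular parametrization of the (n − 1)-dimensional surface S. We call n the normal to S induced by the parametrization s.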
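The formal determinant in this definition can be evaluated by cofactor expansion along its first row: the coefficient of e_j is (−1)^(1+j) times the determinant of the tangent rows with column j removed. The following sketch (the helper names `det` and `generalized_normal` are hypothetical, and the recursive determinant is only sensible for small n) computes the normal this way and checks that it is orthogonal to every tangent vector.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def generalized_normal(tangents):
    """Normal in R^n from n-1 tangent vectors (rows), via cofactors of the first row."""
    n = len(tangents) + 1
    normal = []
    for j in range(n):
        minor = [list(row[:j]) + list(row[j + 1:]) for row in tangents]
        normal.append((-1) ** j * det(minor))
    return normal

# In R^3 this reduces to the ordinary cross product: i x j = k.
print(generalized_normal([[1, 0, 0], [0, 1, 0]]))

# In R^4 the result is orthogonal to each of the three tangent vectors.
tangents = [[1, 2, 0, 1], [0, 1, 3, 1], [2, 0, 1, 1]]
normal = generalized_normal(tangents)
print([sum(a * b for a, b in zip(normal, t)) for t in tangents])
```

Orthogonality holds because the dot product of the normal with a tangent row is a determinant with a repeated row.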

The definition of “surface area” follows directly from the definition of two-dimensional surface area.

Definition 31.35. Let Ω′ ⊂ R^{n−1} be an open set with positive Riemann volume and suppose s : Ω′ → R^n is a regular parametrization of the surface S. We then define the (n − 1)-dimensional surface area of S to be

\[ A_{n-1}(S) = \int_{\Omega'} \|\mathbf{n}(u_1, u_2, \dots, u_{n-1})\| \, dV(u_1, u_2, \dots, u_{n-1}), \]

where n(u1, u2, . . . , un−1) is the normal induced by the parametrization s.

As with the case of two-dimensional surfaces, one can show that this definition yields exactly the (n − 1)-dimensional Hausdorff measure of S.

Theorem 31.36. Let Ω′ ⊂ R^{n−1} be an open set with positive Riemann volume and suppose s : Ω′ → R^n is a regular parametrization of the surface S. Then

\[ A_{n-1}(S) = \mathcal{H}^{n-1}(S). \]

Once again, we refer to more advanced texts for the proof of this theorem, e.g. [6, p. 101].

As with surface area, we define analogs for the integrals of scalar functions and the flux of vector fields.

Definition 31.37. Let Ω′ ⊂ R^{n−1} be an open set with positive Riemann volume and suppose s : Ω′ → R^n is a regular parametrization of the surface S. Let f : S → R be a scalar field defined (at least) on S. Then, the scalar surface integral of f over S is

\[ \int_S f \, dS = \int_{\Omega'} f(\mathbf{s}(u_1, \dots, u_{n-1})) \, \|\mathbf{n}(u_1, \dots, u_{n-1})\| \, dV, \]

where n(u1, . . . , un−1) is the normal induced by the parametrization s.

Definition 31.38. Let S be an orientable (n − 1)-dimensional surface in R^n and let ±n be the unit normals on the two sides of the surface. Let s : Ω′ → R^n be any regular parametrization of S. Let n(u1, u2, . . . , un−1) be the normal induced by the parametrization. Let v : S → R^n be a vector field defined (at least) on S. Then, the surface flux integral of v over S in the direction of n is

\[ \int_S \mathbf{v} \cdot \mathbf{n} \, dS = \operatorname{sgn}(\mathbf{n}(u_1, \dots, u_{n-1}) \cdot \mathbf{n}) \int_{\Omega'} \mathbf{v}(\mathbf{s}(u_1, \dots, u_{n-1})) \cdot \mathbf{n}(u_1, \dots, u_{n-1}) \, dV. \]

Remark 31.39. We have used the term “orientable (n−1)-dimensional surface in R^n” without giving it a formal definition. Such a definition would have to be rather technical, but its purpose is clear - to ensure a “two-sided” surface. We will content ourselves with the assertion that any regular region with positive n-dimensional Riemann volume has an oriented surface as its boundary, and the concepts of interior and exterior unit normal consistently make sense on all patches of the boundary.

Problems

Problem 31.1. Let s(u, v) be a regular parametrization of a surface. Let n(u, v) be the normal induced by that parametrization. Show that

\[ \mathbf{s}_u(u, v) \cdot \mathbf{n}(u, v) = \mathbf{s}_v(u, v) \cdot \mathbf{n}(u, v) = 0. \]

Problem 31.2. Find the surface area of the cone

\[ z = 3\sqrt{x^2 + y^2}, \qquad x^2 + y^2 \le 4, \]

by first expressing the surface parametrically and then using the surface area formula. You may use any parametric representation you wish.

Problem 31.3. Write a formula for the surface area of the ellipsoid

\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{b^2} = 1 \]

by first expressing the surface parametrically and then using the surface area formula. Do not evaluate the integral, but be sure to express it as an iterated integral of a clearly defined function with well defined limits of integration.

Problem 31.4. Let S be the upper (z > 0) half of the unit sphere in R^3. Compute the following.

(a) ∫_S f dS, where f(x, y, z) = x^2.

(b) ∫_S v · n dS, where v(x, y, z) = x i + y j + z k and where n is the unit outward normal to the sphere.

(c) ∫_S (∇ × v) · n dS, where v(x, y, z) = −y i + x j + z^2 k and where n is the unit outward normal to the sphere.

Problem 31.5. In Examples 31.12 and 31.13 we developed two parameterizations of the upper hemisphere that induced the two normals

\[ \mathbf{n}_1(\theta, \phi) = -\rho^2 \sin\phi\,(\cos\theta\sin\phi\,\mathbf{i} + \sin\theta\sin\phi\,\mathbf{j} + \cos\phi\,\mathbf{k}), \]

\[ \mathbf{n}_2(x, y) = \left( \frac{x}{\sqrt{\rho^2 - x^2 - y^2}},\ \frac{y}{\sqrt{\rho^2 - x^2 - y^2}},\ 1 \right). \]

Explain the differences between the two. What is their relationship to the two unit normals to the surface?

Part IV

The Fundamental Theorems of Vector Calculus

Chapter 32

Introduction to the Fundamental Theorem of Calculus

In elementary calculus of one dimension, the fundamental theorem of calculus can be expressed in the following form.

Theorem 32.1. Suppose f : [a, b] → R is C^1. Then

\[ \int_a^b f'(x) \, dx = f(x) \Big|_a^b = f(b) - f(a). \]

Thus, the integral of the derivative of a function over an interval is equal to the original function evaluated over the boundary (endpoints) of the interval.

In this chapter we will discuss several generalizations of this theorem. All will have the form

\[ \int_A \text{The derivative of a function} = \int_{\partial A} \text{The original function}, \]

where A is some sort of set and ∂A is its boundary. The theorems of the next chapters are all intimately related.

• The type of sets will vary from curves to surfaces to three-dimensional regions to areas in the plane.

• The type of functions will vary between scalar and vector fields.

• The type of derivative will vary between partial derivatives, the gradient, the curl, and the divergence.

However, all of the theorems have the same basic form and are founded on the same basic ideas as the original fundamental theorem.

Our three “main” theorems fall in a clear progression. Let us look at simplified statements of these theorems in R^3. In the following φ : R^3 → R is a scalar field and v : R^3 → R^3 is a vector field.

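1. The fundamental theorem of gradients.

\[ \int_P \nabla\phi \cdot d\mathbf{r} = \phi(\mathbf{b}) - \phi(\mathbf{a}). \]

Here P is a path in the domain of φ with initial point a and terminal point b.

2. Stokes’ theorem.

\[ \iint_S (\nabla \times \mathbf{v}) \cdot \mathbf{n} \, dS = \int_{\partial S} \mathbf{v} \cdot d\mathbf{r}. \]

Here S ⊂ R^3 is a surface with unit normal n and ∂S is its boundary.

3. The divergence theorem.

\[ \iiint_{\Omega} \nabla \cdot \mathbf{v} \, dV = \iint_{\partial\Omega} \mathbf{v} \cdot \mathbf{n} \, dS. \]

Here Ω ⊂ R^3 is a region with positive Riemann volume and ∂Ω is its boundary with unit outward normal n.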

The hierarchy of results can be seen in the increasing dimension of the domains and their boundaries as seen in the following table.

Theorem      Domain        Boundary      Derivative
Gradient     1-D path      0-D points    grad
Stokes       2-D surface   1-D path      curl
Divergence   3-D volume    2-D surface   div

This hierarchy can be presented in a very elegant fashion in the language of differential forms. Differential forms have the additional advantage that they make it easy to extend these results to higher dimensions. Because of these advantages, some authors choose to present the fundamental theorems initially in the language of differential geometry. (See, e.g. [3, Chapter 9].) We will content ourselves with the language of traditional calculus.

Chapter 33

Green’s Theorem in the Plane

Before dealing with the hierarchy of fundamental theorems described above we deal with a special case of the fundamental theorem in the plane. We will see that this theorem, named after the British mathematician George Green, is a special case of both Stokes’ theorem and the divergence theorem. However, it is quite useful in its own right and it is relatively easy to prove, so we will use it to start our discussion.

Theorem 33.1 (Green’s theorem in the plane). Let D ⊂ R^2 be a region in the plane that is the union of a finite number of simple regions. Suppose P : D → R and Q : D → R are C^1. Then

\[ \iint_D \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA = \int_{\partial D} P \, dx + Q \, dy. \]

The notation is traditional. While I have expressed my distaste for this notation for line integrals, this is one situation where it is almost universally used in the literature.

More general hypotheses on D are possible, but ours suffices in most situations and can usually be easily extended when necessary.

Proof. We begin by proving the theorem for simple regions D ⊂ R^2. Since D is simple, it is both x-simple and y-simple. Thus there exist constants a, b, c, and d and functions y1, y2 : [a, b] → R, x1, x2 : [c, d] → R such that

\[
\begin{aligned}
D &= \{(x, y) \in \mathbb{R}^2 \mid a < x < b,\ y_1(x) < y < y_2(x)\} \\
&= \{(x, y) \in \mathbb{R}^2 \mid c < y < d,\ x_1(y) < x < x_2(y)\}.
\end{aligned}
\]

We begin by calculating

\[ -\iint_D \frac{\partial P}{\partial y} \, dA = -\int_a^b \int_{y_1(x)}^{y_2(x)} \frac{\partial P}{\partial y}(x, y) \, dy \, dx = \int_a^b \big[ P(x, y_1(x)) - P(x, y_2(x)) \big] \, dx. \]

Here we have used Fubini’s theorem and the fact that D is y-simple. Now, in general, a y-simple region has a positively oriented boundary consisting of four parts. (See Figure 33.1.)

Figure 33.1: A positively oriented boundary around a y-simple region.

1. The bottom curve C1 is oriented left to right.

\[ [a, b] \ni t \mapsto \mathbf{r}_1(t) = (t, y_1(t)). \]

2. The line segment C2 on the right is oriented “up.”

\[ [y_1(b), y_2(b)] \ni t \mapsto \mathbf{r}_2(t) = (b, t). \]

3. The top curve C3 is oriented right to left.

\[ [a, b] \ni t \mapsto \mathbf{r}_3(t) = (a + b - t,\ y_2(a + b - t)). \]

4. The line segment C4 on the left is oriented “down.”

\[ [y_1(a), y_2(a)] \ni t \mapsto \mathbf{r}_4(t) = (a,\ y_2(a) + y_1(a) - t). \]

Of course, the line segments C2 and C4 could degenerate to single points. Let us compute the line integral

\[ \int_{\partial D} P \, dx \]

around this curve. Translating into the notation that I prefer, this means calculating the line integral of the vector field

\[ \mathbf{v}(x, y) = (P(x, y), 0) \]

over the four portions of the boundary. The calculations go as follows.

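1. Over C1 we have

\[ \mathbf{r}_1'(t) = (1, y_1'(t)), \]

so v · r′1 = (P, 0) · (1, y′1) = P(t, y1(t)) and

\[ \int_{C_1} P \, dx = \int_{C_1} \mathbf{v} \cdot d\mathbf{r}_1 = \int_a^b P(t, y_1(t)) \, dt. \]

2. Over C2 we have

\[ \mathbf{r}_2'(t) = (0, 1), \]

so v · r′2 = (P, 0) · (0, 1) = 0 and

\[ \int_{C_2} P \, dx = \int_{C_2} \mathbf{v} \cdot d\mathbf{r}_2 = 0. \]

3. Over C3 we have

\[ \mathbf{r}_3'(t) = (-1,\ -y_2'(a + b - t)), \]

so v · r′3 = (P, 0) · (−1, −y′2) = −P(a + b − t, y2(a + b − t)) and

\[
\begin{aligned}
\int_{C_3} P \, dx &= \int_{C_3} \mathbf{v} \cdot d\mathbf{r}_3 \\
&= \int_a^b -P(a + b - t,\ y_2(a + b - t)) \, dt \\
&= -\int_a^b P(x, y_2(x)) \, dx.
\end{aligned}
\]

Here we have made the change of variable x = a + b − t.

4. Over C4 we have

\[ \mathbf{r}_4'(t) = (0, -1), \]

so v · r′4 = (P, 0) · (0, −1) = 0 and

\[ \int_{C_4} P \, dx = \int_{C_4} \mathbf{v} \cdot d\mathbf{r}_4 = 0. \]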

Putting these together, we see that for any y-simple region we have

\[ -\iint_D \frac{\partial P}{\partial y} \, dA = \int_{\partial D} P \, dx. \]

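We now use the fact that D is x-simple and compute

\[ \iint_D \frac{\partial Q}{\partial x} \, dA = \int_c^d \int_{x_1(y)}^{x_2(y)} \frac{\partial Q}{\partial x}(x, y) \, dx \, dy = \int_c^d \big[ Q(x_2(y), y) - Q(x_1(y), y) \big] \, dy. \]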

In a similar way to the calculation above, we will show that this is the same as the line integral

\[ \int_{\partial D} Q \, dy. \]

This time we use the fact that the positively oriented boundary of an x-simple region can be split into four portions (see Figure 33.2), and calculate

\[ \int_{\partial D} Q \, dy = \int_{\partial D} \mathbf{v} \cdot d\mathbf{r}, \]

where v = (0, Q), as follows.

Figure 33.2: A positively oriented boundary around an x-simple region.

1. The “right” curve C1 is oriented “up” and we parameterize it as

\[ [c, d] \ni t \mapsto \mathbf{r}_1(t) = (x_2(t), t). \]

We calculate v · r′1 = (0, Q) · (x′2, 1) = Q so that

\[ \int_{C_1} Q \, dy = \int_{C_1} \mathbf{v} \cdot d\mathbf{r} = \int_c^d Q(x_2(t), t) \, dt. \]

2. The top curve C2 is a line segment oriented right to left and we parameterize it as

\[ [x_1(d), x_2(d)] \ni t \mapsto \mathbf{r}_2(t) = (x_2(d) + x_1(d) - t,\ d). \]

We calculate v · r′2 = (0, Q) · (−1, 0) = 0 so that

\[ \int_{C_2} Q \, dy = 0. \]

3. The “left” curve C3 is oriented “down” and we parameterize it as

\[ [c, d] \ni t \mapsto \mathbf{r}_3(t) = (x_1(c + d - t),\ c + d - t). \]

We calculate v · r′3 = (0, Q) · (−x′1, −1) = −Q so that

\[
\begin{aligned}
\int_{C_3} Q \, dy &= \int_{C_3} \mathbf{v} \cdot d\mathbf{r} = -\int_c^d Q(x_1(c + d - t),\ c + d - t) \, dt \\
&= -\int_c^d Q(x_1(y), y) \, dy,
\end{aligned}
\]

where we have made the substitution y = c + d − t.

4. The bottom curve C4 is a line segment oriented left to right and we parameterize it as

\[ [x_1(c), x_2(c)] \ni t \mapsto \mathbf{r}_4(t) = (t, c). \]

We calculate v · r′4 = (0, Q) · (1, 0) = 0 so that

\[ \int_{C_4} Q \, dy = 0. \]

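Putting these together, we see that for any x-simple region we have

\[ \iint_D \frac{\partial Q}{\partial x} \, dA = \int_{\partial D} Q \, dy. \]

In a simple region both of the relations above hold and we have

\[ \iint_D \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA = \int_{\partial D} P \, dx + Q \, dy. \]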

We now claim that the theorem holds for any region that can be divided into simple regions. Clearly dividing D into subregions does not change the left side of the equation, but what about the line integral on the right? Dividing a region into subregions introduces new portions of the boundary. Fortunately, as we noted in Figure 31.2, the “new” sections of the boundary will always have opposite orientations as portions of the boundaries of the two neighboring subdomains that they divide. As such, their contributions to the total line integral will always cancel. (Recall that the line integral of any vector field along the reverse of a path is the negative of the line integral along the forward path.) The only contribution to the total line integral will be the portions of the boundary of the original region in their original orientation.

Let us look at a few cases to see that the two calculations indeed yield the same result.

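Example 33.2. Consider the functions P(x, y) = −x^2 y and Q(x, y) = y^2 x on the disk of radius R,

\[ D_R = \{(x, y) \mid x^2 + y^2 < R^2\}. \]

We first use polar coordinates to compute the double integral

\[
\begin{aligned}
\iint_{D_R} \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA &= \iint_{D_R} (y^2 + x^2) \, dA \\
&= \int_0^{2\pi} \int_0^R (r^2) \, r \, dr \, d\theta = \frac{\pi}{2} R^4.
\end{aligned}
\]

We now parameterize the boundary with a positive orientation using r(t) = (R cos t, R sin t) for t ∈ [0, 2π]. We calculate the line integral

\[
\begin{aligned}
\int_{\partial D_R} P \, dx + Q \, dy &= \int_0^{2\pi} \big[ -(R\cos t)^2 (R\sin t)(-R\sin t) + (R\sin t)^2 (R\cos t)(R\cos t) \big] \, dt \\
&= R^4 \int_0^{2\pi} 2 \sin^2 t \cos^2 t \, dt = \frac{\pi}{2} R^4.
\end{aligned}
\]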

Example 33.3. Consider the polygonal domain with a hole in the center

\[ \Omega = \{(x, y) \mid 1 \le \max\{|x|, |y|\} \le 3\}, \]

described in Figure 31.1. We want to demonstrate Green’s theorem on this domain using P(x, y) = x y^2 and Q(x, y) = x + x^2 y. In this case,

\[ \iint_{\Omega} \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA = \iint_{\Omega} 1 \, dA = 36 - 4 = 32. \]

This is simply the area of Ω - the area of a square of side six minus the area of a square of side two.

To get the boundary integrals we parameterize each of the eight segments and compute the line integral.

1. Using r1(t) = (3, t), t ∈ [−3, 3] we get dx = 0, dy = dt and

\[ \int_{C_1} P \, dx + Q \, dy = \int_{-3}^{3} (3 + 9t) \, dt = 18. \]

2. Using r2(t) = (−t, 3), t ∈ [−3, 3] we get dx = −dt, dy = 0 and

\[ \int_{C_2} P \, dx + Q \, dy = \int_{-3}^{3} -t(9)(-dt) = 0. \]

3. Using r3(t) = (−3, −t), t ∈ [−3, 3] we get dx = 0, dy = −dt and

\[ \int_{C_3} P \, dx + Q \, dy = \int_{-3}^{3} (-3 - 9t)(-dt) = 18. \]

4. Using r4(t) = (t, −3), t ∈ [−3, 3] we get dx = dt, dy = 0 and

\[ \int_{C_4} P \, dx + Q \, dy = \int_{-3}^{3} t(9) \, dt = 0. \]

5. Using r5(t) = (1, −t), t ∈ [−1, 1] we get dx = 0, dy = −dt and

\[ \int_{C_5} P \, dx + Q \, dy = \int_{-1}^{1} (1 - t)(-dt) = -2. \]

6. Using r6(t) = (−t, −1), t ∈ [−1, 1] we get dx = −dt, dy = 0 and

\[ \int_{C_6} P \, dx + Q \, dy = \int_{-1}^{1} -t(-dt) = 0. \]

7. Using r7(t) = (−1, t), t ∈ [−1, 1] we get dx = 0, dy = dt and

\[ \int_{C_7} P \, dx + Q \, dy = \int_{-1}^{1} (-1 + t) \, dt = -2. \]

8. Using r8(t) = (t, 1), t ∈ [−1, 1] we get dx = dt, dy = 0 and

\[ \int_{C_8} P \, dx + Q \, dy = \int_{-1}^{1} t \, dt = 0. \]

Summing these gives us

\[ \int_{\partial\Omega} P \, dx + Q \, dy = 18 + 18 - 2 - 2 = 32, \]

as expected.
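The eight segment integrals can be tabulated programmatically. The sketch below is an illustration: the `segments` list simply transcribes the eight parameterizations from the example, each with its constant velocity vector, and `segment_integral` is a hypothetical helper that applies a midpoint rule.

```python
def P(x, y):
    return x * y ** 2

def Q(x, y):
    return x + x ** 2 * y

# Each entry: (position r(t), constant velocity r'(t), t_min, t_max).
segments = [
    (lambda t: (3.0, t),   (0.0, 1.0),  -3.0, 3.0),   # C1: outer right side, up
    (lambda t: (-t, 3.0),  (-1.0, 0.0), -3.0, 3.0),   # C2: outer top, right to left
    (lambda t: (-3.0, -t), (0.0, -1.0), -3.0, 3.0),   # C3: outer left side, down
    (lambda t: (t, -3.0),  (1.0, 0.0),  -3.0, 3.0),   # C4: outer bottom, left to right
    (lambda t: (1.0, -t),  (0.0, -1.0), -1.0, 1.0),   # C5: inner right side, down
    (lambda t: (-t, -1.0), (-1.0, 0.0), -1.0, 1.0),   # C6: inner bottom, right to left
    (lambda t: (-1.0, t),  (0.0, 1.0),  -1.0, 1.0),   # C7: inner left side, up
    (lambda t: (t, 1.0),   (1.0, 0.0),  -1.0, 1.0),   # C8: inner top, left to right
]

def segment_integral(r, velocity, t0, t1, m=100):
    """Midpoint-rule approximation of the integral of P dx + Q dy along one segment."""
    h = (t1 - t0) / m
    total = 0.0
    for k in range(m):
        t = t0 + (k + 0.5) * h
        x, y = r(t)
        total += P(x, y) * velocity[0] + Q(x, y) * velocity[1]
    return total * h

pieces = [segment_integral(*seg) for seg in segments]
print(pieces)       # one value per segment
print(sum(pieces))  # should be close to the area of the domain, 32
```

The outer square is traversed counterclockwise and the inner square clockwise, which is what the positive orientation of the boundary of a region with a hole requires.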

Problems

In the following problems verify Green’s theorem in the plane for the given functions P and Q over the given region D ⊂ R^2. That is, compute both sides of this version of the fundamental theorem and show that they yield the same result.

Problem 33.1. P(x, y) = 3y, Q(x, y) = 5x, D = {(x, y) | 9 ≤ x^2 + y^2 ≤ 16}.

Problem 33.2. P(x, y) = x^2, Q(x, y) = y^2, D = {(x, y) | 1 ≤ x^2 + y^2 ≤ 4}.

Problem 33.3. P(x, y) = 2x, Q(x, y) = 3y, D = {(x, y) | y > 0, y < x < 1}.

Problem 33.4. P(x, y) = 2y, Q(x, y) = x^2, D = {(x, y) | −1 ≤ x ≤ 1, −1 ≤ y ≤ 1}.

Problem 33.5. P(x, y) = x + 2y, Q(x, y) = 3x + y, D = {(x, y) | 0 < y < 1, y^2 < x < y}.

In the following problems compute the value of the line integral of the vector field v over the given oriented, closed curve C. Use any method you wish.

Problem 33.6. v(x, y) = y^2 i + x j. C is the boundary of the rectangle x ∈ [−2, 3], y ∈ [1, 5] oriented in a counterclockwise direction.

Problem 33.7. v(x, y) = (e^y − x^2 y) i + (x e^y + x y^2) j. C is the circle of radius two about the origin oriented in a positive direction.

Problem 33.8. v(x, y) = (x^2 y + (1/3) y^3) i + (x y^2 + 3) j. C is the triangle with vertices (0, 1), (1, 0), and (−1, 0), traversed in a clockwise direction.

Problem 33.9. v(x, y) = (x^2 y + y^3 + tan x) i + (x y^2 + x^2 y + x^3 + sec y^2) j. C is the boundary of the square x ∈ [−1, 1], y ∈ [−1, 1] oriented in a counterclockwise direction.

Chapter 34

Fundamental Theorem of Gradients

The first theorem in our “hierarchy” of fundamental theorems involves line integrals of the gradient of a scalar function over a path. Its proof follows directly from the chain rule and the fundamental theorem from elementary calculus.

Theorem 34.1 (Fundamental theorem of gradients). Let Ω ⊂ R^n, and let φ : Ω → R be C^1. Suppose P is a path contained completely in Ω with initial point a and terminal point b. Then

\[ \int_P \nabla\phi \cdot d\mathbf{r} = \phi(\mathbf{b}) - \phi(\mathbf{a}). \]

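Proof. Let r : [t0, t1] → Ω be any trajectory describing P. Note that this implies r(t0) = a and r(t1) = b. Then we simply use the chain rule and the elementary form of the fundamental theorem of calculus in one dimension to compute

\[
\begin{aligned}
\int_P \nabla\phi \cdot d\mathbf{r} &= \int_{t_0}^{t_1} \nabla\phi(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt \\
&= \int_{t_0}^{t_1} \frac{d}{dt} \phi(\mathbf{r}(t)) \, dt \\
&= \phi(\mathbf{r}(t_1)) - \phi(\mathbf{r}(t_0)) \\
&= \phi(\mathbf{b}) - \phi(\mathbf{a}).
\end{aligned}
\]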

Of course, since the line integral of a gradient over a path depends only on the initial and terminal points of a path, it has the following property.

Corollary 34.2. Let Ω ⊂ R^n. Let

\[ \mathbf{v} = \nabla\phi \]

where φ : Ω → R is C^1. Then line integrals of v are independent of path. That is, if P1 and P2 are any two paths in Ω with the same initial and terminal points then

\[ \int_{P_1} \mathbf{v} \cdot d\mathbf{r} = \int_{P_2} \mathbf{v} \cdot d\mathbf{r}. \]

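Example 34.3. Let φ(x, y) = x^2 y and let v(x, y) = ∇φ(x, y) = (2xy, x^2). The two trajectories r1(t) = (t^3, t), t ∈ [0, 1] and r2(s) = (s, s^2), s ∈ [0, 1] both have initial point (0, 0) and terminal point (1, 1). Computing the line integrals directly gives us

\[
\begin{aligned}
\int_{\mathbf{r}_1} \mathbf{v} \cdot d\mathbf{r} &= \int_0^1 \mathbf{v}(\mathbf{r}_1(t)) \cdot \mathbf{r}_1'(t) \, dt \\
&= \int_0^1 (2 t^3 t,\ (t^3)^2) \cdot (3t^2,\ 1) \, dt \\
&= \int_0^1 7 t^6 \, dt = 1,
\end{aligned}
\]

and

\[
\begin{aligned}
\int_{\mathbf{r}_2} \mathbf{v} \cdot d\mathbf{r} &= \int_0^1 \mathbf{v}(\mathbf{r}_2(s)) \cdot \mathbf{r}_2'(s) \, ds \\
&= \int_0^1 (2 s\, s^2,\ s^2) \cdot (1,\ 2s) \, ds \\
&= \int_0^1 4 s^3 \, ds = 1.
\end{aligned}
\]

Of course, these agree exactly with the value φ(1, 1) − φ(0, 0) = 1.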
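Path independence in this example is easy to observe numerically. The sketch below is illustrative: `line_integral` is a hypothetical helper that applies a midpoint rule to v(r(t)) · r′(t), and the two trajectories and their derivatives are entered by hand.

```python
def v(x, y):
    """Gradient field of phi(x, y) = x^2 * y."""
    return (2 * x * y, x ** 2)

def line_integral(r, r_prime, t0, t1, m=1000):
    """Midpoint-rule approximation of the integral of v(r(t)) . r'(t) over [t0, t1]."""
    h = (t1 - t0) / m
    total = 0.0
    for k in range(m):
        t = t0 + (k + 0.5) * h
        x, y = r(t)
        dx, dy = r_prime(t)
        vx, vy = v(x, y)
        total += vx * dx + vy * dy
    return total * h

# Trajectory r1(t) = (t^3, t) and trajectory r2(s) = (s, s^2), both from (0, 0) to (1, 1).
path1 = line_integral(lambda t: (t ** 3, t), lambda t: (3 * t ** 2, 1.0), 0.0, 1.0)
path2 = line_integral(lambda s: (s, s ** 2), lambda s: (1.0, 2 * s), 0.0, 1.0)

# Both values should be close to phi(1, 1) - phi(0, 0) = 1.
print(path1, path2)
```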

We will see further consequences of the fundamental theorem of gradients in Chapter 38 on conservative vector fields.

Problems

In the following problems verify the fundamental theorem of gradients for the scalar field φ and the oriented curve C ⊂ R^3. That is, compute both sides of this version of the fundamental theorem and show that they yield the same result.

Problem 34.1. φ(x, y, z) = x^2 y z^3. C is the line segment connecting (1, 2, 1) to (−2, 4, 0).

Problem 34.2. φ(x, y, z) = (x^2 + y^2 + z^2)^(−1/2). C is the portion of the circle of radius two about the z-axis in the plane z = 1 with y ≥ 0 connecting the points (−2, 0, 1) and (2, 0, 1).

Problem 34.3. φ(x, y, z) = √(x^2 + y^2 + z^2). C is the path of the trajectory

\[ \mathbf{r}(t) = t\cos(\pi t)\,\mathbf{i} + t\sin(\pi t)\,\mathbf{j} + t\,\mathbf{k}, \qquad t \in [0, 4]. \]

Problem 34.4. Show that line integrals of the vector field v(x, y) = (y, −x) are not in general independent of path by describing two parameterized curves from (1, 0) to (−1, 0) for which the line integrals of v along the curves are different. This shows it is impossible to find a scalar function φ for which v = ∇φ. Is there another way to show this? That is, can you think of a condition that v must satisfy if v = ∇φ?

Chapter 35

Stokes’ Theorem

For our next version of the fundamental theorem we move up one dimension. The fundamental theorem of gradients dealt with one-dimensional sets (oriented curves) with zero-dimensional boundary (the initial and terminal points). Stokes’ theorem will concern a two-dimensional set (an oriented surface) with a one-dimensional boundary (a finite collection of oriented, simple, closed curves). The theorem is as follows.

Theorem 35.1 (Stokes’ theorem). Let S ⊂ R^3 be an oriented surface with unit normal n. Suppose its boundary ∂S is composed of a finite number of simple closed curves all oriented in the direction of n. Let v : S → R^3 be a C^1 vector field defined (at least) on S. Then,

\[ \iint_S (\nabla \times \mathbf{v}) \cdot \mathbf{n} \, dS = \int_{\partial S} \mathbf{v} \cdot d\mathbf{r}. \]

Remark 35.2. We will not prove this theorem. (See, e.g. [5, p. 555] for a proof based on Green’s theorem in the plane.) In Remark 35.5 below we show that Stokes’ theorem is a generalization of Green’s theorem in the plane. In Remark 36.3 in the next chapter we note that the divergence theorem gives good evidence that something like Stokes’ theorem must be true.

Remark 35.3. Stokes’ Theorem is named after Sir George Gabriel Stokes (1819-1903), though it is believed to have been first proved by William Thomson (Lord Kelvin). The theorem was named after Stokes because he asked for a proof of the result on Cambridge prize examinations. It is indeed unfortunate for students that putting very hard problems on exams can lead to mathematical immortality.

Example 35.4. Let us consider the portion $K$ of the sphere of radius one about the origin between the planes $z = 0$ and $z = 1/\sqrt{2}$. See Figure 35.1. Suppose we wish to calculate the flux of the curl of the vector field

\[ \mathbf{v}(x, y, z) = (y, -x, -z) \]

through $K$ in the direction of the outward normal to the sphere.

[Figure 35.1: Portion of the unit sphere with spherical coordinate $\phi \in [\pi/4, \pi/2]$. Axes: x, y, z.]

We can compute this directly. We first note that $\nabla \times \mathbf{v} = -2\mathbf{k}$. Our natural inclination is to use spherical coordinates to parameterize the unit sphere. That is, we use the mapping $\mathbf{g} : (0, 2\pi) \times (\pi/4, \pi/2) \to \mathbb{R}^3$ given by

\[ \mathbf{g}(\theta, \phi) = \begin{pmatrix} x(\theta, \phi) \\ y(\theta, \phi) \\ z(\phi) \end{pmatrix} = \begin{pmatrix} \cos\theta \sin\phi \\ \sin\theta \sin\phi \\ \cos\phi \end{pmatrix}. \]

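However, the normal induced by this parameterization is

\[ \mathbf{n}(\theta, \phi) = -\sin\phi \, (\cos\theta \sin\phi \, \mathbf{i} + \sin\theta \sin\phi \, \mathbf{j} + \cos\phi \, \mathbf{k}), \]

and this points inward. Thus we get

\[ \begin{aligned} \iint_K (\nabla \times \mathbf{v}) \cdot \mathbf{n} \, dS &= (-1) \iint_D (-2\mathbf{k}) \cdot \mathbf{n}(\theta, \phi) \, dA(\theta, \phi) \\ &= -2 \int_0^{2\pi} \int_{\pi/4}^{\pi/2} \sin\phi \cos\phi \, d\phi \, d\theta = -\pi. \end{aligned} \]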

Of course, Stokes’ theorem implies that we will get the same result by computing the line integral of $\mathbf{v}$ over the boundary. Since the boundary is composed of two curves, we will have to compute two integrals in the correct orientation to get the result.

The two boundary curves will be parameterized as follows.


1. The first boundary curve we consider is the circle of radius one in the xy-plane ($z = 0$). We call this the bottom circle. If the orientation of the curve is to agree with the outward normal of the surface, we must traverse the curve counter-clockwise (looking down on the xy-plane). We choose the trajectory

\[ \mathbf{r}_1(t) = (\cos t, \sin t, 0), \quad t \in [0, 2\pi]. \]

2. The other boundary curve is the circle of radius $1/\sqrt{2}$ in the plane $z = 1/\sqrt{2}$. We refer to this as the top circle. If the orientation of the curve is to agree with the outward normal of the surface, we must traverse the curve clockwise (looking down on the plane). We choose the trajectory

\[ \mathbf{r}_2(s) = \left( \tfrac{1}{\sqrt{2}} \cos s, \; -\tfrac{1}{\sqrt{2}} \sin s, \; \tfrac{1}{\sqrt{2}} \right), \quad s \in [0, 2\pi]. \]

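Integrating over the bottom circle gives us

\[ \begin{aligned} \int_{\mathbf{r}_1} \mathbf{v} \cdot d\mathbf{r} &= \int_0^{2\pi} \mathbf{v}(\mathbf{r}_1(t)) \cdot \mathbf{r}_1'(t) \, dt \\ &= \int_0^{2\pi} (\sin t, -\cos t, 0) \cdot (-\sin t, \cos t, 0) \, dt \\ &= \int_0^{2\pi} \left( -\sin^2 t - \cos^2 t \right) dt = -2\pi. \end{aligned} \]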

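Integrating over the top curve gives

\[ \begin{aligned} \int_{\mathbf{r}_2} \mathbf{v} \cdot d\mathbf{r} &= \int_0^{2\pi} \mathbf{v}(\mathbf{r}_2(s)) \cdot \mathbf{r}_2'(s) \, ds \\ &= \int_0^{2\pi} \tfrac{1}{2} (-\sin s, -\cos s, -1) \cdot (-\sin s, -\cos s, 0) \, ds \\ &= \int_0^{2\pi} \tfrac{1}{2} \left( \sin^2 s + \cos^2 s \right) ds = \pi. \end{aligned} \]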

This gives a total of $-2\pi + \pi = -\pi$ for the line integral of $\mathbf{v}$ over the entire boundary, the same as the flux of the curl computed above.
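The two sides of Stokes’ theorem in Example 35.4 can also be compared numerically. The following is a minimal sketch, not part of the original notes: the helper names (`midpoint`, `dot`) and the choice of the midpoint rule are assumptions made for illustration. It uses the outward unit normal directly, so no sign correction for the inward induced normal is needed.

```python
import math

def midpoint(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def v(x, y, z):
    """The vector field of Example 35.4."""
    return (y, -x, -z)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Surface side. curl v = (0, 0, -2). On the unit sphere the outward unit
# normal at g(theta, phi) is g(theta, phi) itself, and dS = sin(phi) dphi dtheta,
# so the integrand is -2 cos(phi) sin(phi), which does not depend on theta.
flux = 2 * math.pi * midpoint(
    lambda phi: -2.0 * math.cos(phi) * math.sin(phi), math.pi / 4, math.pi / 2
)

# Boundary side: the two circles, with the orientations chosen in the text.
c = 1 / math.sqrt(2)
bottom = midpoint(
    lambda t: dot(v(math.cos(t), math.sin(t), 0.0),
                  (-math.sin(t), math.cos(t), 0.0)),
    0.0, 2 * math.pi,
)
top = midpoint(
    lambda s: dot(v(c * math.cos(s), -c * math.sin(s), c),
                  (-c * math.sin(s), -c * math.cos(s), 0.0)),
    0.0, 2 * math.pi,
)
circulation = bottom + top

print(flux, circulation, -math.pi)
```

Both `flux` and `circulation` should agree with $-\pi$ up to quadrature error. Reversing the direction of either boundary circle changes `circulation`, which is a quick way to see why the orientation matters.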

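Remark 35.5. Let $\Omega \subset \mathbb{R}^2$ be a regular region and let

\[ S = \{ (x, y, 0) \mid (x, y) \in \Omega \} \subset \mathbb{R}^3. \]

Note that the normal induced by the implied (trivial) parameterization is the unit vector $\mathbf{n} = \mathbf{k}$.

Let $P$ and $Q$ be smooth functions defined on $\Omega$. Then we can define the vector field

\[ \mathbf{v}(x, y, z) = (P(x, y), Q(x, y), 0) \]

for $(x, y, z) \in S$. The curl of this vector field is

\[ \nabla \times \mathbf{v} = \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) \mathbf{k}. \]

Thus,

\[ \iint_S (\nabla \times \mathbf{v}) \cdot \mathbf{n} \, dS = \iint_\Omega \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA. \]

Furthermore,

\[ \int_{\partial S} \mathbf{v} \cdot d\mathbf{r} = \int_{\partial \Omega} P \, dx + Q \, dy. \]

Thus, Green’s theorem in the plane is a special case of Stokes’ theorem.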

Problems

In the following problems verify Stokes’ theorem for the vector field $\mathbf{v}$ and the oriented surface $S \subset \mathbb{R}^3$. That is, compute both sides of this version of the fundamental theorem and show that they yield the same result.

Problem 35.1. $\mathbf{v}(x, y, z) = y^2 \mathbf{i} + x \mathbf{j}$. $S$ is the portion of the cone $z = 1 - \sqrt{x^2 + y^2}$ between $z = 0$ and $z = 1$. The surface is oriented so that the $\mathbf{k}$ component of its normal is positive.

Problem 35.2. $\mathbf{v}(x, y, z) = x \mathbf{i} + z \mathbf{j} - y \mathbf{k}$. $S$ is the half of the sphere of radius with $y > 0$. The surface is oriented so that the $y$ component of its normal is negative.

Problem 35.3. $\mathbf{v}(x, y, z) = 2y \mathbf{i} + 2x \mathbf{j} + 2z \mathbf{k}$. $S$ is the portion of the cylinder $x^2 + y^2 = 9$ between $z = 0$ and $z = x + 9$. The surface is oriented so that the normal points to the exterior of the cylinder.

Problem 35.4. $\mathbf{v}(x, y, z) = 2x \mathbf{i} + 2y \mathbf{j} + z^3 \mathbf{k}$. $S$ is the portion of the paraboloid $z = x^2 + y^2$ inside the sphere $x^2 + y^2 + z^2 = 20$. The surface is oriented so that the $\mathbf{k}$ component of its normal is positive.

Problem 35.5. $\mathbf{v}(x, y, z) = y \mathbf{i} + z \mathbf{j} + x \mathbf{k}$. $S$ is the portion of the paraboloid $y = 9 - x^2 - z^2$ with $y > 0$. The surface is oriented so that the $\mathbf{j}$ component of its normal is positive.

In the following problems compute

\[ \iint_S (\nabla \times \mathbf{v}) \cdot \mathbf{n} \, dS, \]

the value of the surface flux integral of the curl of the vector field $\mathbf{v}$ through the given oriented surface $S$. Use any method you wish.

Problem 35.6. $\mathbf{v}(x, y, z) = (x + z) \mathbf{i} + (y - x) \mathbf{j} + (x + y + z) \mathbf{k}$. $S$ is the triangle with vertices $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$ oriented so that the normal has a positive $\mathbf{k}$ component.

Problem 35.7. $\mathbf{v}(x, y, z) = z \mathbf{i} + x \mathbf{j} + y \mathbf{k}$. $S$ is the hemisphere $x^2 + y^2 + z^2 = 9$, $z > 0$, oriented so that the normal points to the interior of the sphere.

Problem 35.8. $\mathbf{v}(x, y, z) = z \mathbf{i} + y \mathbf{j} + x \mathbf{k}$. $S$ is the portion of the cone $z = \sqrt{x^2 + y^2}$ between $z = 0$ and $z = 4$. The surface is oriented so that the $\mathbf{k}$ component of its normal is negative.

Chapter 36

The Divergence Theorem

Our final version of the fundamental theorem is known as the divergence theorem.¹ Like the fundamental theorem of gradients (and unlike Stokes’ theorem) this can be stated in $\mathbb{R}^n$.

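Theorem 36.1 (Divergence theorem). Let $\Omega \subset \mathbb{R}^n$ be a regular region whose boundary $\partial \Omega$ is an orientable $(n - 1)$-dimensional surface with unit outward normal $\mathbf{n}$. Suppose further that $\Omega$ is the union of a finite collection of simple regions. Let $\mathbf{v} : \Omega \to \mathbb{R}^n$ be a $C^1$ vector field defined at least on $\Omega$ and its boundary. Then

\[ \int_\Omega \nabla \cdot \mathbf{v} \, dV = \int_{\partial \Omega} \mathbf{v} \cdot \mathbf{n} \, dS. \]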

Proof. We begin by proving the result for a simple region in $\mathbb{R}^3$. The generalization to a simple region in $\mathbb{R}^n$ is very easy, really a matter of notation more than anything else. We will discuss the extension of the proof to regions that are not simple at the end.

Since $\Omega \subset \mathbb{R}^3$ is simple, there is a domain $\Omega' \subset \mathbb{R}^2$ and functions $x_1, x_2 : \Omega' \to \mathbb{R}$ such that

\[ \Omega = \{ (x, y, z) \in \mathbb{R}^3 \mid (y, z) \in \Omega', \; x_1(y, z) < x < x_2(y, z) \}. \]

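¹ This is an important result and was discovered independently by many mathematicians before it became well known. Various incarnations of the theorem are known as Gauss’s theorem, Green’s theorem, and Ostrogradsky’s theorem.

The boundary of $\Omega$ is composed of three pieces. The “left” surface is

\[ S_1 = \{ (x_1(y, z), y, z) \in \mathbb{R}^3 \mid (y, z) \in \Omega' \}. \]

The normal induced by this parameterization is

\[ \mathbf{n}_1(y, z) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial x_1}{\partial y} & 1 & 0 \\ \frac{\partial x_1}{\partial z} & 0 & 1 \end{vmatrix} = \mathbf{i} - \frac{\partial x_1}{\partial y} \mathbf{j} - \frac{\partial x_1}{\partial z} \mathbf{k}. \]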

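This normal points inward towards $\Omega$. The “right” surface is

\[ S_2 = \{ (x_2(y, z), y, z) \in \mathbb{R}^3 \mid (y, z) \in \Omega' \}. \]

The normal induced by this parameterization is

\[ \mathbf{n}_2(y, z) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial x_2}{\partial y} & 1 & 0 \\ \frac{\partial x_2}{\partial z} & 0 & 1 \end{vmatrix} = \mathbf{i} - \frac{\partial x_2}{\partial y} \mathbf{j} - \frac{\partial x_2}{\partial z} \mathbf{k}. \]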

This normal points outward from $\Omega$. The third portion of the boundary is a portion of the cylinder

\[ S_3 = \{ (x, y, z) \in \mathbb{R}^3 \mid (y, z) \in \partial \Omega', \; x_1(y, z) < x < x_2(y, z) \}. \]

We won’t compute the normal of the surface explicitly other than to note that it is the normal to $\partial \Omega'$ and hence is parallel to the yz-plane.

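We now show that

\[ \iiint_\Omega \frac{\partial v_1}{\partial x} \, dV = \iint_{\partial \Omega} v_1 \mathbf{i} \cdot \mathbf{n} \, dS, \]

where

\[ \mathbf{v}(x, y, z) = v_1(x, y, z) \mathbf{i} + v_2(x, y, z) \mathbf{j} + v_3(x, y, z) \mathbf{k}. \]

The left side of this can be computed using Fubini’s theorem.

\[ \begin{aligned} \iiint_\Omega \frac{\partial v_1}{\partial x} \, dV &= \iint_{\Omega'} \int_{x_1(y, z)}^{x_2(y, z)} \frac{\partial v_1}{\partial x} \, dx \, dA(y, z) \\ &= \iint_{\Omega'} \left[ v_1(x_2(y, z), y, z) - v_1(x_1(y, z), y, z) \right] dA(y, z). \end{aligned} \]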

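The surface flux integral on the right side can be broken up into three pieces.

1. For the integral over $S_1$ we note that since we are computing the surface flux in the direction of the outward normal and $\mathbf{n}_1$ points inward we have

\[ \begin{aligned} \iint_{S_1} v_1 \mathbf{i} \cdot \mathbf{n} \, dS &= -\iint_{\Omega'} v_1(x_1(y, z), y, z) \, \mathbf{i} \cdot \mathbf{n}_1(y, z) \, dA(y, z) \\ &= -\iint_{\Omega'} v_1(x_1(y, z), y, z) \, dA(y, z). \end{aligned} \]

2. Similarly, for the integral over $S_2$ we note that $\mathbf{n}_2$ points outward. Thus, we have

\[ \begin{aligned} \iint_{S_2} v_1 \mathbf{i} \cdot \mathbf{n} \, dS &= \iint_{\Omega'} v_1(x_2(y, z), y, z) \, \mathbf{i} \cdot \mathbf{n}_2(y, z) \, dA(y, z) \\ &= \iint_{\Omega'} v_1(x_2(y, z), y, z) \, dA(y, z). \end{aligned} \]

3. On the surface $S_3$, $v_1 \mathbf{i} \cdot \mathbf{n} = 0$ since $\mathbf{n}$ is parallel to the yz-plane. Thus

\[ \iint_{S_3} v_1 \mathbf{i} \cdot \mathbf{n} \, dS = 0. \]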

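Putting these together gives us

\[ \iint_{\partial \Omega} v_1 \mathbf{i} \cdot \mathbf{n} \, dS = \iint_{\Omega'} \left[ v_1(x_2(y, z), y, z) - v_1(x_1(y, z), y, z) \right] dA(y, z). \]

This completes the proof of the claim. A very similar proof shows that

\[ \iiint_\Omega \frac{\partial v_2}{\partial y} \, dV = \iint_{\partial \Omega} v_2 \mathbf{j} \cdot \mathbf{n} \, dS, \]

and

\[ \iiint_\Omega \frac{\partial v_3}{\partial z} \, dV = \iint_{\partial \Omega} v_3 \mathbf{k} \cdot \mathbf{n} \, dS. \]

Putting these together yields the main result

\[ \int_\Omega \nabla \cdot \mathbf{v} \, dV = \iiint_\Omega \left( \frac{\partial v_1}{\partial x} + \frac{\partial v_2}{\partial y} + \frac{\partial v_3}{\partial z} \right) dV = \int_{\partial \Omega} \mathbf{v} \cdot \mathbf{n} \, dS. \]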

For domains that are not simple we have assumed that they can be divided into a finite collection of simple regions. Similarly to the proof of Green’s theorem in the plane, the surface flux integrals through any “new” surfaces introduced by dividing the region into simple subregions will cancel. This is true since any new surface will have a subregion on both sides of the surface. (This is pretty easy to visualize in $\mathbb{R}^3$, not so easy in $\mathbb{R}^n$.) Since the outward flux from the region on one side will be the negative of the outward flux from the other side, the net contribution from the new surface will be zero.

Example 36.2. In Example 31.33 we computed the flux of the vector field

\[ \mathbf{u}(x, y, z) = y \mathbf{i} + x \mathbf{j} + z \mathbf{k} \]

through the closed unit sphere $S$ in the direction of the unit outward normal to be

\[ \iint_S \mathbf{u} \cdot \mathbf{n} \, dS = \frac{4\pi}{3}. \]

According to the divergence theorem this should be equal to

\[ \iiint_B \nabla \cdot \mathbf{u} \, dV \]

where $B = \{ (x, y, z) \mid x^2 + y^2 + z^2 \le 1 \}$ is the unit ball. But since $\nabla \cdot \mathbf{u} = 1$, the integral above is just the volume of the unit ball in $\mathbb{R}^3$, which is $\frac{4\pi}{3}$, the value we computed for the flux.
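The agreement in Example 36.2 can be checked numerically as well. The sketch below is an illustration only, assuming spherical coordinates and a midpoint rule; the function names and grid sizes are choices made here, not taken from the notes. The surface integral uses the fact that the outward unit normal on the unit sphere equals the position vector, and the volume integral uses $\nabla \cdot \mathbf{u} = 1$ as computed in the text.

```python
import math

def u(x, y, z):
    """The vector field of Example 36.2."""
    return (y, x, z)

def sphere_flux(n_theta=200, n_phi=200):
    """Midpoint-rule flux of u through the unit sphere (outward normal)."""
    d_theta = 2 * math.pi / n_theta
    d_phi = math.pi / n_phi
    total = 0.0
    for a in range(n_theta):
        theta = (a + 0.5) * d_theta
        for b in range(n_phi):
            phi = (b + 0.5) * d_phi
            # Point on the sphere; it is also the outward unit normal there.
            n = (math.cos(theta) * math.sin(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(phi))
            ux, uy, uz = u(*n)
            u_dot_n = ux * n[0] + uy * n[1] + uz * n[2]
            total += u_dot_n * math.sin(phi) * d_theta * d_phi
    return total

def ball_divergence_integral(n_r=60, n_phi=60):
    """Midpoint-rule integral of div u = 1 over the unit ball."""
    d_r = 1.0 / n_r
    d_phi = math.pi / n_phi
    total = 0.0
    for a in range(n_r):
        r = (a + 0.5) * d_r
        for b in range(n_phi):
            phi = (b + 0.5) * d_phi
            # Volume element r^2 sin(phi) dr dphi dtheta with div u = 1.
            total += 1.0 * r * r * math.sin(phi) * d_r * d_phi
    # Nothing depends on theta, so that integral contributes a factor 2*pi.
    return 2 * math.pi * total

flux = sphere_flux()
volume_integral = ball_divergence_integral()
print(flux, volume_integral, 4 * math.pi / 3)
```

Both numbers should be close to $4\pi/3$; the small discrepancy shrinks as the grids are refined.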

Remark 36.3. The divergence theorem can be used to give a bit of insight into Stokes’ theorem. Let $\mathbf{v}$ be a smooth vector field on $\mathbb{R}^3$. Let $\Omega \subset \mathbb{R}^3$ be a simple region and let $S = \partial \Omega$ be the closed surface bounding it. In Theorem 20.8 we showed that for any smooth vector field $\nabla \cdot (\nabla \times \mathbf{v}) = \operatorname{div} \operatorname{curl} \mathbf{v} = 0$. Thus, by the divergence theorem we have

\[ \iint_S (\nabla \times \mathbf{v}) \cdot \mathbf{n} \, dS = \int_\Omega \nabla \cdot (\nabla \times \mathbf{v}) \, dV = 0. \]

Now, let $C \subset \mathbb{R}^3$ be an oriented, simple, closed curve and let $S_1$ and $S_2$ be two nonintersecting surfaces that each have $C$ as their boundary. Let $\mathbf{n}_1$ and $\mathbf{n}_2$ be the respective unit normals of the two surfaces oriented so that both normals agree with the orientation of $C$. Let $\Omega$ be the volume bounded by the two surfaces. Since the orientations of two surface patches with the same boundary must be opposite we see that one of the two normals must point inward and one must point outward from $\Omega$. From the computation above, we have

\[ \iint_{S_1} (\nabla \times \mathbf{v}) \cdot \mathbf{n}_1 \, dS - \iint_{S_2} (\nabla \times \mathbf{v}) \cdot \mathbf{n}_2 \, dS = 0. \]

Thus, we have shown the quantity

\[ \iint_S (\nabla \times \mathbf{v}) \cdot \mathbf{n} \, dS \]

depends only on the vector field $\mathbf{v}$ and the oriented curve $C$ bounding $S$, not on any other aspects of the shape of $S$. Of course, Stokes’ theorem tells us a good deal more. Namely, that the quantity is equal to

\[ \int_C \mathbf{v} \cdot d\mathbf{r} \]

where $C$ is oriented in the direction of the normal $\mathbf{n}$. But at least the divergence theorem gives us a hint that something very much like this must be true.

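Remark 36.4. We will see that Green’s theorem in the plane is a special case of the divergence theorem as well as Stokes’ theorem. Let $\Omega \subset \mathbb{R}^2$ be a regular region and let $P$ and $Q$ be smooth functions defined on $\Omega$. Then we define the vector field

\[ \mathbf{v}(x, y) = (Q(x, y), -P(x, y)) \]

for $(x, y) \in \Omega$. The divergence of this two-dimensional vector field is, of course,

\[ \nabla \cdot \mathbf{v} = \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}. \]

Thus,

\[ \iint_\Omega \nabla \cdot \mathbf{v} \, dA = \iint_\Omega \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA. \]

To compute the flux of the vector field through the boundary of $\Omega$ we parameterize the boundary using the function $\mathbf{r}(t) = (x(t), y(t))$ defined on the interval $t \in [a, b]$. (The boundary may in general be composed of multiple segments.) The normal induced by this parameterization is

\[ \mathbf{n} = \begin{vmatrix} \mathbf{i} & \mathbf{j} \\ x'(t) & y'(t) \end{vmatrix} = y' \mathbf{i} - x' \mathbf{j}. \]

We compute

\[ \begin{aligned} \int_{\partial \Omega} \mathbf{v} \cdot \mathbf{n} \, ds &= \int_a^b (Q, -P) \cdot (y', -x') \, dt \\ &= \int_a^b \left( P \, x'(t) + Q \, y'(t) \right) dt \\ &= \int_{\partial \Omega} P \, dx + Q \, dy. \end{aligned} \]

So Green’s theorem arises as a special case.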
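The correspondence in Remark 36.4 can be illustrated on a concrete case. The sketch below is an assumption-laden example, not part of the notes: it takes $\Omega$ to be the unit disk and picks $P = -y$, $Q = x$ (so that $\partial Q/\partial x - \partial P/\partial y = 2$, computed by hand), then compares the area integral, the flux of $(Q, -P)$ through the boundary, and the circulation $\int P\,dx + Q\,dy$.

```python
import math

def midpoint(f, a, b, n=400):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Illustrative choice of P and Q (an assumption made for this sketch).
def P(x, y):
    return -y

def Q(x, y):
    return x

def integrand(x, y):
    """dQ/dx - dP/dy for the P, Q above, differentiated by hand."""
    return 2.0

# Area integral over the unit disk in polar coordinates (dA = r dr dtheta).
area_integral = midpoint(
    lambda r: midpoint(
        lambda th: integrand(r * math.cos(th), r * math.sin(th)) * r,
        0.0, 2 * math.pi, 64),
    0.0, 1.0, 200,
)

# Boundary r(t) = (cos t, sin t); induced normal n = (y', -x').
def boundary(t):
    x, y = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t), math.cos(t)
    return x, y, dx, dy

def flux_integrand(t):
    x, y, dx, dy = boundary(t)
    return Q(x, y) * dy + (-P(x, y)) * (-dx)   # (Q, -P) . (y', -x')

def circulation_integrand(t):
    x, y, dx, dy = boundary(t)
    return P(x, y) * dx + Q(x, y) * dy         # P dx + Q dy

flux = midpoint(flux_integrand, 0.0, 2 * math.pi)
circulation = midpoint(circulation_integrand, 0.0, 2 * math.pi)
print(area_integral, flux, circulation, 2 * math.pi)
```

All three quantities should equal $2\pi$, twice the area of the disk, which is the familiar area formula that Green’s theorem yields for this choice of $P$ and $Q$.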

Problems

In the following problems verify the divergence theorem for the vector field $\mathbf{v}$ and the region $\Omega \subset \mathbb{R}^3$. That is, compute both sides of this version of the fundamental theorem and show that they yield the same result.

Problem 36.1. $\mathbf{v}(x, y, z) = x^3 \mathbf{i} + z^3 \mathbf{k}$. $\Omega$ is the ball of radius one about the origin.

Problem 36.2. $\mathbf{v}(x, y, z) = x \mathbf{i} + y \mathbf{j} + z \mathbf{k}$. $\Omega$ is the tetrahedron with corners $(0, 0, 0)$, $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$.

Problem 36.3. $\mathbf{v}(x, y, z) = z \mathbf{i} + z \mathbf{j}$. $\Omega = \{ (x, y, z) \mid \sqrt{x^2 + y^2} < z < 4 \}$.

Problem 36.4. $\mathbf{v}(x, y, z) = x^2 \mathbf{i} + y \mathbf{j} + z \mathbf{k}$. $\Omega = \{ (x, y, z) \mid 1 + x^2 + y^2 < z < 5 \}$.

Problem 36.5. $\mathbf{v}(x, y, z) = x \mathbf{i} + y^2 \mathbf{j} + z^2 \mathbf{k}$. $\Omega$ is the solid bounded by the cylinder $x^2 + y^2 = 1$ and the planes $z = -1$ and $z = 2$.

In the following problems compute

\[ \iint_{\partial \Omega} \mathbf{v} \cdot \mathbf{n} \, dS, \]

the value of the surface flux integral of the vector field $\mathbf{v}$ through the boundary of a region $\Omega$ in the direction of the exterior unit normal. Use any method you wish, but explain how you got your result.

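Problem 36.6. $\mathbf{v} = (3x + z^3 + y^2) \mathbf{i} + 17 \mathbf{j} + (z + \cos(y)) \mathbf{k}$. $\Omega = \{ (x, y, z) \mid 0 < x < 1, \; -1 < y < 2, \; 5 < z < 7 \}$.

Problem 36.7. $\mathbf{v} = z^3 \mathbf{i} + 3x \mathbf{j} + e^y \mathbf{k}$. $\Omega$ is the ellipsoid $x^2 + 4y^2 + 9z^2 = 1$.

Problem 36.8. $\mathbf{v}(x, y, z) = (x^2 + \cos y) \mathbf{i} + (y - e^z) \mathbf{j} + 7 \mathbf{k}$. $\Omega$ is the tetrahedron with corners $(0, 0, 0)$, $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$.

Problem 36.9. Let $\Omega \subset \mathbb{R}^3$ be a regular region with smooth boundary. Suppose that $u : \mathbb{R}^3 \to \mathbb{R}$ is smooth. We define the normal derivative of $u$ at $(x, y, z) \in \partial \Omega$ to be

\[ \frac{\partial u}{\partial \eta}(x, y, z) = \nabla u(x, y, z) \cdot \mathbf{n}(x, y, z) \]

where $\mathbf{n}$ is the unit outward normal to $\partial \Omega$. That is, $\frac{\partial u}{\partial \eta}$ is the directional derivative of $u$ in the direction of the outward normal. Suppose

\[ \Delta u = 0 \quad \text{in } \Omega. \]

Show that

\[ \iint_{\partial \Omega} \frac{\partial u}{\partial \eta} \, dS = 0. \]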

Chapter 37

Integration by Parts

Integration by parts is arguably the most important integration technique in elementary calculus. One of the reasons it is important is that it is derived from one of the most important differentiation rules: the product rule.¹ Let’s review its statement and proof.

Theorem 37.1 (Integration by parts). Let $u : [a, b] \to \mathbb{R}$ and $v : [a, b] \to \mathbb{R}$ be $C^1$. Then

\[ \int_a^b u(x) v'(x) \, dx = -\int_a^b v(x) u'(x) \, dx + u(x) v(x) \Big|_a^b. \]

This is often written in indefinite integral form as

\[ \int u \, dv = -\int v \, du + uv. \]

Proof. The product rule states that

\[ \frac{d}{dx} \left[ u(x) v(x) \right] = u(x) v'(x) + v(x) u'(x). \]

We can integrate both sides and use the fundamental theorem of calculus to get

\[ \int_a^b \left( u(x) v'(x) + v(x) u'(x) \right) dx = \int_a^b \frac{d}{dx} \left[ u(x) v(x) \right] dx = u(x) v(x) \Big|_a^b. \]

We rearrange this equation to get the result.

¹ Note that the other candidate for “most important elementary integration technique” is “integration by substitution” which is derived from the chain rule. The pairing of integration by parts with the product rule and integration by substitution with the chain rule is important to developing a “big picture” understanding of elementary calculus.

So the derivation of the integration by parts formula involved only two ingredients other than simple algebra:

1. the product rule and

2. the fundamental theorem of calculus.

We can use exactly the same process to derive a whole variety of integration by parts formulas for vector calculus. We simply need to match the correct type of product rule with the appropriate version of the fundamental theorem of calculus.

• A product rule for the gradient can be paired with the fundamental theorem of gradients.

• A product rule for the curl can be paired with Stokes’ theorem.

• A product rule for the divergence can be paired with the divergence theorem.

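We will give one example of this and leave other instances to the problems. Recall that Theorem 20.2 says that if $\Omega \subset \mathbb{R}^n$ is the domain of the $C^1$ functions $g : \Omega \to \mathbb{R}$ and $\mathbf{f} : \Omega \to \mathbb{R}^n$ then we have

\[ \nabla \cdot (g \mathbf{f}) = g \nabla \cdot \mathbf{f} + \nabla g \cdot \mathbf{f}. \]

We pair this product rule for the divergence with the divergence theorem. We integrate both sides of this over $\Omega$ and get

\[ \int_\Omega \left( g \nabla \cdot \mathbf{f} + \nabla g \cdot \mathbf{f} \right) dV = \int_\Omega \nabla \cdot (g \mathbf{f}) \, dV = \int_{\partial \Omega} g \mathbf{f} \cdot \mathbf{n} \, dS \]

where $\mathbf{n}$ is the unit outward normal to $\partial \Omega$. This can be rearranged to give the following theorem.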

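Theorem 37.2. Let $\Omega \subset \mathbb{R}^n$ be a regular region and let $\mathbf{n}$ be the unit outward normal to $\partial \Omega$. Suppose $g : \Omega \to \mathbb{R}$ and $\mathbf{f} : \Omega \to \mathbb{R}^n$ are $C^1$ functions. Then

\[ \int_\Omega g \nabla \cdot \mathbf{f} \, dV = -\int_\Omega \nabla g \cdot \mathbf{f} \, dV + \int_{\partial \Omega} g \mathbf{f} \cdot \mathbf{n} \, dS. \]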

Problems


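Problem 37.1. Let $f$ and $g$ be $C^1$ functions on $\mathbb{R}^n$. Let $P$ be a path in $\mathbb{R}^n$ with initial point $\mathbf{a}$ and terminal point $\mathbf{b}$. Find an integration by parts formula for

\[ \int_P f \nabla g \cdot d\mathbf{r}. \]

Problem 37.2. Let $\Omega \subset \mathbb{R}^3$ be a regular region, and let $\mathbf{n}$ be the unit outward normal to its boundary $\partial \Omega$. Let $f : \Omega \to \mathbb{R}$ be $C^1$ and $g : \Omega \to \mathbb{R}$ be $C^2$.

(a) Show that

\[ \nabla \cdot (f \nabla g) = \nabla f \cdot \nabla g + f \Delta g. \]

(b) Show that

\[ \int_\Omega f \Delta g \, dV = -\int_\Omega \nabla f \cdot \nabla g \, dV + \int_{\partial \Omega} f \nabla g \cdot \mathbf{n} \, dS. \]

Problem 37.3. Let $S \subset \mathbb{R}^3$ be an oriented surface with normal $\mathbf{n}$. Suppose the boundary of $S$ is a simple, closed curve $C$ oriented in the direction of $\mathbf{n}$. Suppose $f : S \to \mathbb{R}$ and $g : S \to \mathbb{R}$ are smooth functions. Show that

\[ \int_C (f \nabla g) \cdot d\mathbf{r} = \int_S (\nabla f \times \nabla g) \cdot \mathbf{n} \, dS. \]

Hint: What is the product rule for $\nabla \times (f \mathbf{v})$?

Problem 37.4. As in Problem 36.9 let $\Omega \subset \mathbb{R}^3$ be a regular region with smooth boundary. Suppose that $u : \mathbb{R}^3 \to \mathbb{R}$ is smooth and

\[ \Delta u = 0 \quad \text{in } \Omega. \]

Show that

\[ \iint_{\partial \Omega} \frac{\partial (u^2)}{\partial \eta} \, dS \ge 0. \]

Show that if $u$ is not a constant, then the inequality is strict.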

Chapter 38

Conservative vector fields

We call a vector field $\mathbf{v} : \mathbb{R}^3 \to \mathbb{R}^3$ conservative if it can be written as the gradient of a scalar field. That is, if there exists a function $\phi : \mathbb{R}^3 \to \mathbb{R}$, called the potential of the vector field $\mathbf{v}$, such that

\[ \mathbf{v} = \nabla \phi. \]

Perhaps the most important example of a conservative vector field is the gravitational force. If we take a Cartesian coordinate system with origin at the center of the earth, the force on an object of mass $m$ at a point $(x, y, z)$ above the surface of the earth is given by the vector field

\[ \mathbf{f}(x, y, z) = -\frac{GmM}{(x^2 + y^2 + z^2)^{3/2}} \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]

Here $G$ is a constant depending on the units of mass and length called the gravitational constant, and $M$ is the mass of the earth. This gravitational force field is the negative of the gradient of the gravitational potential

\[ V(x, y, z) = -\frac{GmM}{\sqrt{x^2 + y^2 + z^2}}, \]

so that $\mathbf{f} = -\nabla V = \nabla \phi$ with $\phi = -V$.
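The relation between the force field and the potential can be checked by finite differences. In the sketch below (an illustration only: the units with $GmM = 1$, the sample point, and the helper names are assumptions), differentiating $V$ as written gives $\partial V / \partial x = GmM \, x / (x^2 + y^2 + z^2)^{3/2}$, so the computation confirms $\mathbf{f} = -\nabla V$; the function $\phi = -V$ is then a potential for $\mathbf{f}$ in the sense of the definition $\mathbf{v} = \nabla \phi$.

```python
import math

GmM = 1.0  # assumption: units chosen so that G * m * M = 1

def V(x, y, z):
    """Gravitational potential as written in the text."""
    return -GmM / math.sqrt(x * x + y * y + z * z)

def force(x, y, z):
    """Gravitational force field as written in the text."""
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-GmM * x / r3, -GmM * y / r3, -GmM * z / r3)

def numerical_gradient(func, p, h=1e-5):
    """Central-difference approximation to the gradient of func at p."""
    grad = []
    for k in range(3):
        fwd = list(p)
        bwd = list(p)
        fwd[k] += h
        bwd[k] -= h
        grad.append((func(*fwd) - func(*bwd)) / (2 * h))
    return tuple(grad)

point = (1.0, 2.0, 2.0)                 # a sample point at distance 3
grad_V = numerical_gradient(V, point)
f_at_point = force(*point)

# f + grad V should vanish, that is, f = -grad V.
residual = max(abs(fk + gk) for fk, gk in zip(f_at_point, grad_V))
print(f_at_point, grad_V, residual)
```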

Due in no small part to the importance of this example, mathematicians have studied conservative fields extensively. The following theorem shows that we have several characterizations of conservative fields.

Theorem 38.1. Let $\Omega \subset \mathbb{R}^3$ be an open region such that the following hold.

• $\Omega$ is path connected. That is, every two points in $\Omega$ can be connected by a simple path contained in $\Omega$.

• For every simple, oriented, closed curve $C \subset \Omega$ one can construct an oriented surface $S$ with $S \subset \Omega$ and $\partial S = C$.

Suppose $\mathbf{v} : \Omega \to \mathbb{R}^3$ is $C^1$. Then the following conditions are equivalent. (If one is true then all are true. If one is false then all are false.)

1. For any simple, oriented, closed curve $C$

\[ \int_C \mathbf{v} \cdot d\mathbf{r} = 0. \]

2. Line integrals of $\mathbf{v}$ are independent of path. That is, if $P_1$ and $P_2$ are any two paths in $\Omega$ with the same initial and terminal points then

\[ \int_{P_1} \mathbf{v} \cdot d\mathbf{r} = \int_{P_2} \mathbf{v} \cdot d\mathbf{r}. \]

3. There exists a scalar function $\phi : \Omega \to \mathbb{R}$ such that

\[ \mathbf{v} = \nabla \phi. \]

4. At each point in $\Omega$

\[ \nabla \times \mathbf{v} = \mathbf{0}. \]

Proof. This proof proceeds in a simple cycle of implications. Other ways to show the chain of implications are indeed possible.

• 1 implies 2.

We prove this only for a special case. Let $P_1$ and $P_2$ be any two paths in $\Omega$ with the same initial and terminal points. We form a closed path $P$ by joining $P_1$ to the reverse of $P_2$. We then use Theorem 30.15 to get

\[ \int_P \mathbf{v} \cdot d\mathbf{r} = \int_{P_1} \mathbf{v} \cdot d\mathbf{r} + \int_{-P_2} \mathbf{v} \cdot d\mathbf{r} = \int_{P_1} \mathbf{v} \cdot d\mathbf{r} - \int_{P_2} \mathbf{v} \cdot d\mathbf{r}. \]

Now, in the special case where the closed path $P$ is simple, Item 1 implies this integral is zero, and we are done. Of course, there is no reason to suppose that $P_1$ and $P_2$ intersect only at their initial and terminal points. In the general case, one has to prove that the integral over $P$ can be written as the sum of integrals over simple closed curves and integrals over overlapping curves that cancel. This can be done, but we won’t attempt it here. Try to draw a few cases to see how complicated this can get. Even the case where there is a finite number of intersections can be challenging.

• 2 implies 3.

Here we have to show the existence of a function $\phi$. Of course, the best way to show that a function exists is to write a formula for it. We assume without loss of generality that the origin lies in $\Omega$. Then for any $(x, y, z) \in \Omega$ we define

\[ \phi(x, y, z) = \int_P \mathbf{v} \cdot d\mathbf{r} \]

where $P$ is any path with initial point $(0, 0, 0)$ and terminal point $(x, y, z)$. Since by hypothesis line integrals of $\mathbf{v}$ are path independent, this definition of $\phi$ is unambiguous. We now need to show that $\nabla \phi = \mathbf{v}$. We will show that

\[ \frac{\partial \phi}{\partial x}(x, y, z) = v_1(x, y, z) = \mathbf{v}(x, y, z) \cdot \mathbf{i}. \]

We leave the proof for the other two partial derivatives to the reader. To do this we construct a path from $(0, 0, 0)$ to $(x, y, z)$ composed of the line segments $P_1$ from $(0, 0, 0)$ to $(0, y, z)$ and $P_2$ from $(0, y, z)$ to $(x, y, z)$. These can be parameterized as follows.

\[ \begin{aligned} \mathbf{r}_1(t) &= (0, ty, tz), & t &\in [0, 1], \\ \mathbf{r}_2(t) &= (t, y, z), & t &\in [0, x]. \end{aligned} \]

Using this path we calculate

\[ \begin{aligned} \phi(x, y, z) &= \int_{P_1} \mathbf{v} \cdot d\mathbf{r} + \int_{P_2} \mathbf{v} \cdot d\mathbf{r} \\ &= \int_0^1 \mathbf{v}(0, ty, tz) \cdot \mathbf{r}_1'(t) \, dt + \int_0^x \mathbf{v}(t, y, z) \cdot \mathbf{r}_2'(t) \, dt \\ &= \int_0^1 \left( y \, v_2(0, ty, tz) + z \, v_3(0, ty, tz) \right) dt + \int_0^x v_1(t, y, z) \, dt. \end{aligned} \]

Since the first integral does not depend on $x$, the elementary fundamental theorem of calculus in one dimension gives us

\[ \frac{\partial \phi}{\partial x}(x, y, z) = v_1(x, y, z). \]

Now, one might object that in general the path we have specified might not lie in $\Omega$. However, since $\Omega$ is open there is always some $\delta > 0$ so that the line segment

\[ \mathbf{r}(t) = (t, y, z), \quad t \in [x - \delta, x] \]

lies completely in $\Omega$. If we take $P_1$ to be any path connecting the origin to any point $(x_1, y, z)$ on that segment and $P_2$ to be the line segment connecting this point to $(x, y, z)$, the proof follows as before.

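• 3 implies 4.

This was proven in Theorem 20.8 and Problem 20.4.

• 4 implies 1.

By hypothesis, there exists an oriented surface $S$ with $S \subset \Omega$ and $\partial S = C$. Using Stokes’ theorem we have

\[ \int_C \mathbf{v} \cdot d\mathbf{r} = \int_S (\nabla \times \mathbf{v}) \cdot \mathbf{n} \, dS = 0 \]

since $\nabla \times \mathbf{v} = \mathbf{0}$.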
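The construction used in the step “2 implies 3” can be carried out numerically. The following sketch is an illustration under stated assumptions: it uses the field $\mathbf{v} = (y^2 - e^z)\mathbf{i} + 2xy\mathbf{j} + (3z^2 - xe^z)\mathbf{k}$ studied in Example 38.3, whose domain is all of $\mathbb{R}^3$ so the two-segment path never leaves $\Omega$; the midpoint rule, the sample point, and the function names are choices made here. It builds $\phi$ from the path integral and compares it with the closed form $xy^2 - xe^z + z^3$ found in that example.

```python
import math

def v(x, y, z):
    """The conservative field of Example 38.3."""
    return (y * y - math.exp(z), 2 * x * y, 3 * z * z - x * math.exp(z))

def midpoint(f, a, b, n=2000):
    """Composite midpoint rule; also valid for b < a (signed integral)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def phi_from_path(x, y, z):
    """phi(x, y, z) as the line integral along P1 followed by P2."""
    # P1: r1(t) = (0, t*y, t*z), t in [0, 1], with r1'(t) = (0, y, z).
    first = midpoint(
        lambda t: y * v(0.0, t * y, t * z)[1] + z * v(0.0, t * y, t * z)[2],
        0.0, 1.0)
    # P2: r2(t) = (t, y, z), t in [0, x], with r2'(t) = (1, 0, 0).
    second = midpoint(lambda t: v(t, y, z)[0], 0.0, x)
    return first + second

def phi_closed_form(x, y, z):
    return x * y * y - x * math.exp(z) + z ** 3

p = (0.5, -1.2, 0.7)
value = phi_from_path(*p)
exact = phi_closed_form(*p)

# The x-partial of the constructed phi should reproduce v1, as in the proof.
h = 1e-4
dphi_dx = (phi_from_path(p[0] + h, p[1], p[2])
           - phi_from_path(p[0] - h, p[1], p[2])) / (2 * h)
print(value, exact, dphi_dx, v(*p)[0])
```

The constructed value agrees with the closed form with constant $C = 0$, which reflects the normalization $\phi(0, 0, 0) = 0$ built into the path-integral definition.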

Remark 38.2. The hypothesis that for any simple closed curve in $\Omega$ we can construct an oriented surface that has the curve as its boundary is not simply technical. There are very simple domains for which this is not true. For instance, if our domain is a torus, any path that goes around the central “hole” can’t be spanned by a surface that stays in the domain. Since toroidal domains are important in many applications of electromagnetism, it is important to note that our theorem does not hold in such regions.

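Example 38.3. Consider the vector field

\[ \mathbf{v}(x, y, z) = (y^2 - e^z) \mathbf{i} + 2xy \, \mathbf{j} + (3z^2 - x e^z) \mathbf{k}. \]

We wish to find out if it is conservative and, if so, find a potential for it. We test it by taking its curl.

\[ \begin{aligned} \nabla \times \mathbf{v} &= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial_x & \partial_y & \partial_z \\ (y^2 - e^z) & 2xy & (3z^2 - x e^z) \end{vmatrix} \\ &= \left( \frac{\partial}{\partial y}(3z^2 - x e^z) - \frac{\partial}{\partial z}(2xy) \right) \mathbf{i} - \left( \frac{\partial}{\partial x}(3z^2 - x e^z) - \frac{\partial}{\partial z}(y^2 - e^z) \right) \mathbf{j} \\ &\quad + \left( \frac{\partial}{\partial x}(2xy) - \frac{\partial}{\partial y}(y^2 - e^z) \right) \mathbf{k} \\ &= \mathbf{0}. \end{aligned} \]

Thus, $\mathbf{v}$ is a conservative field. To find a potential $\phi$ for $\mathbf{v}$ we must solve the partial differential equations

\[ \frac{\partial \phi}{\partial x} = y^2 - e^z, \qquad \frac{\partial \phi}{\partial y} = 2xy, \qquad \frac{\partial \phi}{\partial z} = 3z^2 - x e^z. \]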

Solving the first of these we get

\[ \phi(x, y, z) = xy^2 - x e^z + \psi(y, z), \]

where $\psi$ is an unknown function of only the variables $y$ and $z$. Putting this into the second equation gives us

\[ \frac{\partial \phi}{\partial y} = 2xy + \frac{\partial \psi}{\partial y} = 2xy, \]

or

\[ \frac{\partial \psi}{\partial y} = 0. \]

So, $\psi = f(z)$, a function of $z$ alone, and $\phi = xy^2 - x e^z + f(z)$. Putting this into the final equation gives us

\[ \frac{\partial \phi}{\partial z} = -x e^z + f'(z) = 3z^2 - x e^z, \]

or

\[ f'(z) = 3z^2. \]

This gives us $f(z) = z^3 + C$ where $C$ is an arbitrary constant. Thus, the set of all possible potentials for $\mathbf{v}$ is given by

\[ \phi(x, y, z) = xy^2 - x e^z + z^3 + C. \]
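A quick way to guard against slips in a computation like this is to differentiate the candidate potential numerically and compare with $\mathbf{v}$. The sketch below does that with central differences; the sample points, step size, and helper names are assumptions chosen for illustration.

```python
import math

def v(x, y, z):
    """The vector field of Example 38.3."""
    return (y * y - math.exp(z), 2 * x * y, 3 * z * z - x * math.exp(z))

def phi(x, y, z, C=0.0):
    """The potential found in Example 38.3 (C is the arbitrary constant)."""
    return x * y * y - x * math.exp(z) + z ** 3 + C

def numerical_gradient(func, p, h=1e-5):
    """Central-difference approximation to the gradient of func at p."""
    grad = []
    for k in range(3):
        fwd = list(p)
        bwd = list(p)
        fwd[k] += h
        bwd[k] -= h
        grad.append((func(*fwd) - func(*bwd)) / (2 * h))
    return tuple(grad)

sample_points = [(0.3, -1.1, 0.4), (1.5, 2.0, -0.7), (-2.0, 0.5, 1.2)]

# Largest componentwise gap between grad(phi) and v over the sample points.
max_error = max(
    abs(g - w)
    for p in sample_points
    for g, w in zip(numerical_gradient(phi, p), v(*p))
)
print(max_error)
```

The gap should be at the level of finite-difference error. The constant $C$ drops out of every difference quotient, which is the numerical face of the statement that potentials are determined only up to an additive constant.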

This can be very useful in computing line integrals. For instance, consider the trajectory

\[ \mathbf{r}(t) = \sqrt{2} \sin(\pi t) \, \mathbf{i} + \sqrt{2} \cos(\pi t) \, \mathbf{j} + \ln(1 + t) \, \mathbf{k}, \quad t \in [0, 5/4]. \]

Suppose we wish to compute

\[ \int_P \mathbf{v} \cdot d\mathbf{r} \]

where $P$ is the path defined by the trajectory. A direct calculation of the line integral would be tedious. However, if we note that the initial and terminal points of the trajectory are $(0, \sqrt{2}, 0)$ and $(-1, -1, \ln(9/4))$ we get

\[ \int_P \mathbf{v} \cdot d\mathbf{r} = \phi(-1, -1, \ln(9/4)) - \phi(0, \sqrt{2}, 0) = 5/4 + \left( \ln(9/4) \right)^3. \]
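The “tedious” direct calculation is easy to delegate to a computer, which gives an independent check of the shortcut. The sketch below is illustrative: the midpoint rule, the number of subintervals, and the helper names are assumptions. It integrates $\mathbf{v}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)$ over $[0, 5/4]$ and compares the result with the difference of potential values.

```python
import math

def v(x, y, z):
    """The vector field of Example 38.3."""
    return (y * y - math.exp(z), 2 * x * y, 3 * z * z - x * math.exp(z))

def phi(x, y, z):
    """A potential for v (arbitrary constant taken to be zero)."""
    return x * y * y - x * math.exp(z) + z ** 3

SQRT2 = math.sqrt(2)

def r(t):
    """The trajectory from the text."""
    return (SQRT2 * math.sin(math.pi * t),
            SQRT2 * math.cos(math.pi * t),
            math.log(1 + t))

def r_prime(t):
    """Its derivative, computed by hand."""
    return (SQRT2 * math.pi * math.cos(math.pi * t),
            -SQRT2 * math.pi * math.sin(math.pi * t),
            1 / (1 + t))

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def integrand(t):
    field = v(*r(t))
    velocity = r_prime(t)
    return sum(fk * vk for fk, vk in zip(field, velocity))

direct = midpoint(integrand, 0.0, 1.25)
via_potential = phi(*r(1.25)) - phi(*r(0.0))
closed_form = 5 / 4 + math.log(9 / 4) ** 3
print(direct, via_potential, closed_form)
```

All three numbers should agree up to quadrature and rounding error, even though the direct route needs thousands of function evaluations and the potential needs two.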

Problems

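In the following problems determine whether the vector field $\mathbf{v}$ is conservative. If so, find a scalar potential $\phi$ such that $\mathbf{v} = \nabla \phi$.

Problem 38.1. $\mathbf{v}(x, y, z) = \frac{1}{4} x^2 y^2 z^4 \mathbf{i} + \frac{1}{6} x^3 y z^4 \mathbf{j} + \frac{1}{3} x^3 y^2 z^3 \mathbf{k}$.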

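Problem 38.2. $\mathbf{v}(x, y, z) = x^2 y^2 z \, \mathbf{i} + x y^2 z^2 \mathbf{j} + x^2 y z^2 \mathbf{k}$.

Problem 38.3. $\mathbf{v}(x, y, z) = x^2 \mathbf{i} + e^y \cos z \, \mathbf{j} - e^y \sin z \, \mathbf{k}$.

Problem 38.4. $\mathbf{v}(x, y) = \cos x \sin y \, \mathbf{i} + \sin y \cos x \, \mathbf{j}$.

Problem 38.5. $\mathbf{v}(x, y) = -y^2 \sin(xy) \, \mathbf{i} + (\cos(xy) - xy \sin(xy)) \, \mathbf{j}$.

In the following problems compute the line integral $\int_C \mathbf{v} \cdot d\mathbf{r}$ of the vector field $\mathbf{v}$ over the oriented curve $C$. Use any method you wish. Justify your answer.

Problem 38.6. $\mathbf{v}(x, y, z) = x^3 \mathbf{i} + y^3 \mathbf{j} + z^3 \mathbf{k}$. $C$ is generated by the trajectory $\mathbf{r}(t) = (2 \cos \pi t, 3 \sin \pi t, t^3)$, $t \in [0, 3]$.

Problem 38.7. $\mathbf{v}(x, y, z) = y \mathbf{i} + x \mathbf{j} + z^7 \mathbf{k}$. $C$ is the line segment from $(3, -2, 1)$ to $(5, 6, -1)$.

Problem 38.8. $\mathbf{v}(x, y, z) = \sin x \, \mathbf{i} + y3y \, \mathbf{j} + e^z \mathbf{k}$. $C$ is the circle of radius one in the yz-plane about the x-axis oriented counter-clockwise with initial and terminal points at $(0, 0, 1)$.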

Bibliography

[1] Stephen Abbott. Understanding Analysis. Springer, New York, 2001.

[2] Jimmy T. Arnold. Introduction to Mathematical Proofs.

[3] R. Creighton Buck. Advanced Calculus, Third Edition. McGraw-Hill, New York, 1978.

[4] Gerald A. Edgar. Measure, Topology and Fractal Geometry. Springer, New York, 1990.

[5] Patrick M. Fitzpatrick. Advanced Calculus, Second Edition. Thomson Brooks/Cole, Belmont, CA, 2006.

[6] Lawrence C. Evans and Ronald F. Gariepy. Measure Theory and Fine Properties of Functions. CRC Press, Boca Raton, 1992.

[7] Stephen H. Friedberg, Arnold J. Insel, and Lawrence E. Spence. Linear Algebra, 4th Edition. Prentice Hall, 2002.

[8] Phillip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. Academic Press, 1982.

[9] Werner Kohler and Lee Johnson. Elementary Differential Equations. Addison-Wesley Co., Inc., Boston, 2003.

[10] Angus E. Taylor. Advanced Calculus. Blaisdell Publishing Co., Waltham, Massachusetts, 1955.

