differentiation in several variables

8: DIFFERENTIATION IN SEVERAL VARIABLES

STEVEN HEILMAN

Contents

1. Review
2. Introduction
3. Differentiation in multiple variables
4. Partial and Directional Derivatives
5. The Chain Rule in Several Variables
6. Iterated Derivatives and Clairaut's Theorem
7. Appendix: Notation

1. Review

Definition 1.1 (Derivative on the real line). Let $E$ be a subset of $\mathbb{R}$, let $x_0$ be a limit point of $E$, and let $f : E \to \mathbb{R}$. If the limit

\[ \lim_{x \to x_0;\, x \in E \setminus \{x_0\}} \frac{f(x) - f(x_0)}{x - x_0} \]

exists and converges to a real number $L \in \mathbb{R}$, then we write $f'(x_0) = L$ and we say that $f$ is differentiable at $x_0$. If this limit does not exist, then we say that $f$ is not differentiable at $x_0$.

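Lemma 1.2. Let $E$ be a subset of $\mathbb{R}$, let $f : E \to \mathbb{R}$, let $x_0 \in E$, and let $L \in \mathbb{R}$. Then the following two statements are equivalent.

• $f$ is differentiable at $x_0$ and $f'(x_0) = L$.
• We have $\lim_{x \to x_0;\, x \in E \setminus \{x_0\}} \frac{|f(x) - (f(x_0) + L(x - x_0))|}{|x - x_0|} = 0$.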

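Definition 1.3. Let $n$ be a positive integer. Let $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. We define the $\ell_2$ norm $\|x\|$ of $x$ by

\[ \|x\| = \|(x_1, \ldots, x_n)\| := \Big( \sum_{i=1}^{n} x_i^2 \Big)^{1/2}. \]

Let $y = (y_1, \ldots, y_n) \in \mathbb{R}^n$. We define the standard inner product $\langle \cdot, \cdot \rangle$ on $\mathbb{R}^n$ by

\[ \langle x, y \rangle := \sum_{i=1}^{n} x_i y_i. \]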

Date: December 22, 2014.

So, $\|x\| = \sqrt{\langle x, x \rangle}$. We also denote the standard basis vectors by $e_1, \ldots, e_n$, so that $e_1 = (1, 0, \ldots, 0)$, $e_2 = (0, 1, 0, \ldots, 0)$, and $e_n = (0, \ldots, 0, 1)$.

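Definition 1.4. Let $n, m$ be positive integers. A linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a function $L : \mathbb{R}^n \to \mathbb{R}^m$ which satisfies the following properties.

• For all $x, y \in \mathbb{R}^n$, we have $L(x + y) = L(x) + L(y)$.
• For all $x \in \mathbb{R}^n$ and for all $\alpha \in \mathbb{R}$, we have $L(\alpha x) = \alpha L(x)$.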

Remark 1.5. Given a linear transformation $L : \mathbb{R}^n \to \mathbb{R}^m$, there exists an $m \times n$ matrix $A$ (that is, a matrix $A$ with $m$ rows and $n$ columns) such that

\[ L(x) = Ax, \qquad \forall\, x \in \mathbb{R}^n. \]

Conversely, given an $m \times n$ matrix $A$, the function $L : \mathbb{R}^n \to \mathbb{R}^m$ defined by $L(x) := Ax$ for all $x \in \mathbb{R}^n$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. So, on Euclidean spaces, the notions of matrices and linear transformations are interchangeable.
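The correspondence in Remark 1.5 can be illustrated numerically. The sketch below is only an illustration, not part of the theory: the helper names `matrix_of` and `apply` are our own, and the sample map $L(x_1, x_2) = (2x_1, 4x_2)$ is chosen because it reappears in Example 3.2. The matrix is recovered by placing $L(e_j)$ in the $j$-th column.

```python
def matrix_of(L, n):
    """Return the m x n matrix A (as a list of rows) whose j-th column is L(e_j)."""
    columns = []
    for j in range(n):
        e_j = [1.0 if k == j else 0.0 for k in range(n)]  # j-th standard basis vector
        columns.append(L(e_j))
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]


def L(x):
    # Sample linear map L(x1, x2) = (2 x1, 4 x2), assumed for illustration.
    return [2.0 * x[0], 4.0 * x[1]]


A = matrix_of(L, 2)
print(A)                     # [[2.0, 0.0], [0.0, 4.0]]
print(apply(A, [3.0, 4.0]))  # [6.0, 16.0]
```

Here `apply(A, x)` and `L(x)` agree for every input, which is the content of the remark for this particular map.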

2. Introduction

Our final topic in this course is differentiation in several variables. The theory somewhat resembles the theory of differentiation in one variable, but there are many key differences. The first obstacle is simply to define the derivative in the higher dimensional setting, so we begin with this task.

3. Differentiation in multiple variables

Let $n, m$ be positive integers. Let $f : \mathbb{R}^n \to \mathbb{R}^m$. In order to define the derivative of $f$, we cannot simply copy and paste Definition 1.1, since we would need to let $x \in \mathbb{R}^n$ and then divide by the vector $x - x_0$, which is meaningless unless $n = 1$. We instead use the equivalent definition within Lemma 1.2. In this case, we can successfully define differentiation by replacing the absolute values by the appropriate norm, and by replacing $L$ by a linear map.

Definition 3.1 (Derivatives in multiple variables). Let $E$ be a subset of $\mathbb{R}^n$, let $f : E \to \mathbb{R}^m$ be a function, let $x_0 \in E$, and let $L : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. We say that $f$ is differentiable at $x_0$ with derivative $L$ if and only if we have

\[ \lim_{x \to x_0;\, x \in E \setminus \{x_0\}} \frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} = 0. \]

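Example 3.2. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $f(x_1, x_2) = (x_1^2, x_2^2)$. Define the linear transformation $L : \mathbb{R}^2 \to \mathbb{R}^2$ by $L(x_1, x_2) := (2x_1, 4x_2)$. We will show that $L$ is the derivative of $f$ at the point $x_0 = (1, 2)$. We want to show that

\[ \lim_{x \to (1,2);\, x \neq (1,2)} \frac{\|f(x) - (f(1,2) + L(x - (1,2)))\|}{\|x - (1,2)\|} = 0. \]

Now, note that

\[
\begin{aligned}
f(x) - (f(1,2) + L(x - (1,2))) &= (x_1^2, x_2^2) - ((1, 4) + (2x_1, 4x_2) - (2, 8)) \\
&= (x_1^2, x_2^2) - (2x_1 - 1, 4x_2 - 4) \\
&= ((x_1 - 1)^2, (x_2 - 2)^2).
\end{aligned}
\]

So, using the triangle inequality,

\[ \|f(x) - (f(1,2) + L(x - (1,2)))\| \leq \|((x_1 - 1)^2, 0)\| + \|(0, (x_2 - 2)^2)\| = (x_1 - 1)^2 + (x_2 - 2)^2. \]

In conclusion, since $\|x - (1,2)\| = \sqrt{(x_1 - 1)^2 + (x_2 - 2)^2}$,

\[
0 \leq \lim_{x \to (1,2);\, x \neq (1,2)} \frac{\|f(x) - (f(1,2) + L(x - (1,2)))\|}{\|x - (1,2)\|}
\leq \lim_{x \to (1,2);\, x \neq (1,2)} \frac{(x_1 - 1)^2 + (x_2 - 2)^2}{\sqrt{(x_1 - 1)^2 + (x_2 - 2)^2}}
= \lim_{x \to (1,2);\, x \neq (1,2)} \sqrt{(x_1 - 1)^2 + (x_2 - 2)^2} = 0.
\]

So, we have proven our desired statement.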
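The limit in Example 3.2 can also be observed numerically. The following sketch is a sanity check under our own choice of sample points $x = (1 + t, 2 + t)$; the function name `error_ratio` is ours. It evaluates the quotient from Definition 3.1 and shows it shrinking as $x \to (1, 2)$.

```python
import math


def f(x1, x2):
    return (x1 ** 2, x2 ** 2)


def L(h1, h2):
    return (2.0 * h1, 4.0 * h2)


def error_ratio(x1, x2):
    """||f(x) - (f(x0) + L(x - x0))|| / ||x - x0|| at x0 = (1, 2)."""
    h1, h2 = x1 - 1.0, x2 - 2.0
    fx, f0, Lh = f(x1, x2), f(1.0, 2.0), L(h1, h2)
    r1 = fx[0] - (f0[0] + Lh[0])
    r2 = fx[1] - (f0[1] + Lh[1])
    return math.hypot(r1, r2) / math.hypot(h1, h2)


# Approach x0 along the diagonal direction (1, 1); the ratio should tend to 0.
for k in range(1, 6):
    t = 10.0 ** (-k)
    print(t, error_ratio(1.0 + t, 2.0 + t))
```

Along this particular path the residual is $(t^2, t^2)$, so the ratio equals $t$ up to rounding, in agreement with the bound $\sqrt{(x_1 - 1)^2 + (x_2 - 2)^2}$ obtained above.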

The following lemma shows that a function can have at most one derivative at an interior point of $E$.

Lemma 3.3. Let $E$ be a subset of $\mathbb{R}^n$, let $f : E \to \mathbb{R}^m$ be a function, and let $x_0$ be an interior point of $E$. Let $L_a : \mathbb{R}^n \to \mathbb{R}^m$ and let $L_b : \mathbb{R}^n \to \mathbb{R}^m$ be linear transformations. Suppose $f$ is differentiable at $x_0$ with derivative $L_a$, and $f$ is differentiable at $x_0$ with derivative $L_b$. Then $L_a = L_b$.

Exercise 3.4. Prove Lemma 3.3. (Hint: argue by contradiction. Assume that $L_a \neq L_b$. Then there exists a nonzero vector $v \in \mathbb{R}^n$ such that $L_a v \neq L_b v$. Then, apply the definition of the derivative, and try to specialize to the case where $x = x_0 + tv$ for some scalar $t$, in order to obtain a contradiction.)

Using Lemma 3.3, we can now talk about the derivative of $f$ at interior points $x_0$, and we will label this derivative as $f'(x_0)$. That is, if $x_0$ is an interior point of $E$, then $f'(x_0)$ is the unique linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ such that

\[ \lim_{x \to x_0;\, x \in E \setminus \{x_0\}} \frac{\|f(x) - (f(x_0) + f'(x_0)(x - x_0))\|}{\|x - x_0\|} = 0. \]

Informally, we therefore have Newton's approximation:

\[ f(x) \approx f(x_0) + f'(x_0)(x - x_0). \]

Remark 3.5. We sometimes refer to $f'(x_0)$ as the total derivative of $f$, to distinguish $f'(x_0)$ from the related directional and partial derivatives.

4. Partial and Directional Derivatives

We now relate the total derivative to the partial and directional derivatives. Let $n, m$ be positive integers.

Definition 4.1. Let $E$ be a subset of $\mathbb{R}^n$, let $f : E \to \mathbb{R}^m$ be a function, let $x_0$ be an interior point of $E$, let $v \in \mathbb{R}^n$, and let $t$ be a real number. If the limit

\[ \lim_{t \to 0;\, t \neq 0,\, x_0 + tv \in E} \frac{f(x_0 + tv) - f(x_0)}{t} \]

exists, we say that $f$ is differentiable in the direction $v$ at $x_0$, and we denote this limit by $D_v f(x_0)$:

\[ D_v f(x_0) := \lim_{t \to 0;\, t \neq 0,\, x_0 + tv \in E} \frac{f(x_0 + tv) - f(x_0)}{t}. \]

Equivalently, we have

\[ D_v f(x_0) := \frac{d}{dt} f(x_0 + tv)\Big|_{t=0}. \]

Note that in this definition we are dividing by the scalar $t$, so this division is okay, and $D_v f(x_0) \in \mathbb{R}^m$.

Example 4.2. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $f(x_1, x_2) = (x_1^2, x_2^2)$. Let $x_0 := (1, 2)$ and let $v := (3, 4)$. We then compute

\[ \frac{((1 + 3t)^2, (2 + 4t)^2) - (1, 4)}{t} = \frac{(1 + 6t + 9t^2, 4 + 16t + 16t^2) - (1, 4)}{t} = (6 + 9t, 16 + 16t). \]

Therefore,

\[ D_v f(x_0) = \lim_{t \to 0;\, t \neq 0} (6 + 9t, 16 + 16t) = (6, 16). \]
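The difference quotient in Example 4.2 is easy to tabulate. The sketch below (with the helper name `difference_quotient` chosen by us) evaluates $(f(x_0 + tv) - f(x_0))/t$ for a few small values of $t$; the printed vectors should approach $(6, 16)$.

```python
def f(x):
    return [x[0] ** 2, x[1] ** 2]


def difference_quotient(f, x0, v, t):
    """(f(x0 + t v) - f(x0)) / t, computed componentwise."""
    shifted = [a + t * b for a, b in zip(x0, v)]
    return [(p - q) / t for p, q in zip(f(shifted), f(x0))]


x0, v = [1.0, 2.0], [3.0, 4.0]
for t in (1e-1, 1e-3, 1e-5):
    # By the computation in Example 4.2, this is (6 + 9t, 16 + 16t) up to rounding.
    print(t, difference_quotient(f, x0, v, t))
```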

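If $v$ is a standard basis vector $e_j$, then we write $\frac{\partial f}{\partial x_j}(x_0)$ or $\frac{\partial}{\partial x_j} f(x_0)$ for $D_{e_j} f(x_0)$. We refer to $\frac{\partial f}{\partial x_j}(x_0)$ as the partial derivative of $f$ with respect to $x_j$. So,

\[ \frac{\partial f}{\partial x_j}(x_0) := \lim_{t \to 0;\, t \neq 0,\, x_0 + te_j \in E} \frac{f(x_0 + te_j) - f(x_0)}{t} = \frac{d}{dt} f(x_0 + te_j)\Big|_{t=0}. \]

Note that if $f : E \to \mathbb{R}^m$, then $\frac{\partial f}{\partial x_j}(x_0) \in \mathbb{R}^m$. And if we write $f$ in its components as $f = (f_1, \ldots, f_m)$, then

\[ \frac{\partial f}{\partial x_j}(x_0) = \Big( \frac{\partial f_1}{\partial x_j}(x_0), \ldots, \frac{\partial f_m}{\partial x_j}(x_0) \Big). \]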

The total derivative and directional derivative are related in the following way.

Lemma 4.3. Let $E$ be a subset of $\mathbb{R}^n$, let $f : E \to \mathbb{R}^m$ be a function, let $x_0$ be an interior point of $E$, and let $v \in \mathbb{R}^n$. If $f$ is differentiable at $x_0$, then $f$ is also differentiable in the direction $v$ at $x_0$, and

\[ D_v f(x_0) = f'(x_0) v. \]

Exercise 4.4. Prove Lemma 4.3.

From Lemma 4.3, total differentiability implies directional differentiability. Unfortunately, the converse is false.

Exercise 4.5. Define $f : \mathbb{R}^2 \to \mathbb{R}$ by $f(x, y) := x^3/(x^2 + y^2)$ when $(x, y) \neq (0, 0)$, and $f(0, 0) := 0$. Show that for any $v \in \mathbb{R}^2$, $f$ is differentiable at $(0, 0)$ in the direction $v$. However, show that $f$ is not differentiable at $(0, 0)$.

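Remark 4.6. From Lemma 4.3, if $E \subseteq \mathbb{R}^n$ and if $f : E \to \mathbb{R}^m$ is differentiable at an interior point $x_0 \in E$, then all partial derivatives $\frac{\partial f}{\partial x_j}$ exist at $x_0$, for all $j \in \{1, \ldots, n\}$, and

\[ \frac{\partial f}{\partial x_j}(x_0) = f'(x_0) e_j, \qquad \forall\, j \in \{1, \ldots, n\}. \]

Also, given $v = (v_1, \ldots, v_n) = \sum_{j=1}^{n} v_j e_j \in \mathbb{R}^n$, we have

\[ D_v f(x_0) = f'(x_0) \sum_{j=1}^{n} v_j e_j = \sum_{j=1}^{n} v_j f'(x_0) e_j = \sum_{j=1}^{n} v_j \frac{\partial f}{\partial x_j}(x_0). \qquad (*) \]

From Exercise 4.5, partial differentiability does not imply differentiability. However, if the partial derivatives of a function are continuous, then partial differentiability does imply differentiability. We will use equation $(*)$ to prove this assertion.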
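As a numerical illustration of equation $(*)$, the sketch below estimates the partial derivatives of the map $f(x_1, x_2) = (x_1^2, x_2^2)$ from Example 4.2 by central differences and combines them with the weights $v_j$. The helper names `partial` and `directional_from_partials` and the step size `h` are our own assumptions; the result should be close to $D_v f(x_0) = (6, 16)$ for $x_0 = (1, 2)$ and $v = (3, 4)$.

```python
def f(x):
    # The map from Example 4.2.
    return [x[0] ** 2, x[1] ** 2]


def partial(f, x0, j, h=1e-6):
    """Central-difference estimate of the partial derivative of f in x_j at x0."""
    up = list(x0)
    up[j] += h
    down = list(x0)
    down[j] -= h
    return [(p - q) / (2.0 * h) for p, q in zip(f(up), f(down))]


def directional_from_partials(f, x0, v):
    """Right-hand side of (*): the sum over j of v_j times the j-th partial derivative."""
    total = [0.0] * len(f(x0))
    for j, vj in enumerate(v):
        pj = partial(f, x0, j)
        total = [s + vj * p for s, p in zip(total, pj)]
    return total


print(directional_from_partials(f, [1.0, 2.0], [3.0, 4.0]))  # close to [6.0, 16.0]
```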

Theorem 4.7. Let $E$ be a subset of $\mathbb{R}^n$, let $f : E \to \mathbb{R}^m$ be a function, let $F$ be a subset of $E$, and let $x_0$ be an interior point of $F$. If the partial derivatives $\frac{\partial f}{\partial x_j}$ exist on $F$ and are continuous at $x_0$ for all $j \in \{1, \ldots, n\}$, then $f$ is differentiable at $x_0$. Moreover, $f'(x_0) : \mathbb{R}^n \to \mathbb{R}^m$ is defined by

\[ f'(x_0)(v_1, \ldots, v_n) = \sum_{j=1}^{n} v_j \frac{\partial f}{\partial x_j}(x_0). \]

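Proof. Define a linear transformation $L : \mathbb{R}^n \to \mathbb{R}^m$ by

\[ L(v_1, \ldots, v_n) := \sum_{j=1}^{n} v_j \frac{\partial f}{\partial x_j}(x_0). \]

We need to show that

\[ \lim_{x \to x_0;\, x \in E \setminus \{x_0\}} \frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} = 0. \]

Let $\varepsilon > 0$. We will find $\delta > 0$ such that, if $x$ satisfies $0 < \|x - x_0\| < \delta$, then

\[ \frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} < \varepsilon. \]

That is, we will show, if $x$ satisfies $0 < \|x - x_0\| < \delta$, then

\[ \|f(x) - (f(x_0) + L(x - x_0))\| < \varepsilon \|x - x_0\|. \]

Since $x_0$ is an interior point of $F$, there exists $r > 0$ such that $B(x_0, r) \subseteq F$. Since the partial derivative $\frac{\partial f}{\partial x_j}$ is continuous at $x_0$ for each $j \in \{1, \ldots, n\}$, there exists $0 < \delta_j < r$ such that $\big\| \frac{\partial f}{\partial x_j}(x) - \frac{\partial f}{\partial x_j}(x_0) \big\| < \varepsilon/(nm)$, for every $x \in B(x_0, \delta_j)$, for every $j \in \{1, \ldots, n\}$. Define $\delta := \min_{j=1,\ldots,n} \delta_j$. Then $\big\| \frac{\partial f}{\partial x_j}(x) - \frac{\partial f}{\partial x_j}(x_0) \big\| < \varepsilon/(nm)$, for every $x \in B(x_0, \delta)$, for every $j \in \{1, \ldots, n\}$.

Let $x \in B(x_0, \delta)$ with $x \neq x_0$, and write $x = x_0 + v_1 e_1 + \cdots + v_n e_n$ for some scalars $v_1, \ldots, v_n$. Note that

\[ \|x - x_0\| = \sqrt{v_1^2 + \cdots + v_n^2}. \]

In particular, we have $|v_j| \leq \|x - x_0\|$ for all $j \in \{1, \ldots, n\}$. Recall that we need to show

\[ \Big\| f(x_0 + v_1 e_1 + \cdots + v_n e_n) - f(x_0) - \sum_{j=1}^{n} v_j \frac{\partial f}{\partial x_j}(x_0) \Big\| < \varepsilon \|x - x_0\|. \]

Write $f$ in its components as $f = (f_1, \ldots, f_m)$, so that $f_i : E \to \mathbb{R}$ for all $i \in \{1, \ldots, m\}$. Applying the Mean Value Theorem in the first variable, there exists a real number $t_i$ between $0$ and $v_1$ such that

\[ f_i(x_0 + v_1 e_1) - f_i(x_0) = \frac{\partial f_i}{\partial x_1}(x_0 + t_i e_1) v_1. \]

Note that, for all $i \in \{1, \ldots, m\}$, since $x_0 + t_i e_1 \in B(x_0, \delta)$, we have

\[ \Big| \frac{\partial f_i}{\partial x_1}(x_0 + t_i e_1) - \frac{\partial f_i}{\partial x_1}(x_0) \Big| \leq \Big\| \frac{\partial f}{\partial x_1}(x_0 + t_i e_1) - \frac{\partial f}{\partial x_1}(x_0) \Big\| < \varepsilon/(nm). \]

Therefore,

\[ \Big| f_i(x_0 + v_1 e_1) - f_i(x_0) - \frac{\partial f_i}{\partial x_1}(x_0) v_1 \Big| \leq |v_1| \varepsilon/(nm). \]

Summing this inequality over $i \in \{1, \ldots, m\}$ and using $\|(y_1, \ldots, y_m)\| \leq |y_1| + \cdots + |y_m|$, we have

\[ \Big\| f(x_0 + v_1 e_1) - f(x_0) - \frac{\partial f}{\partial x_1}(x_0) v_1 \Big\| \leq |v_1| \varepsilon/n \leq \|x - x_0\| \varepsilon/n. \]

In the last inequality, we used $|v_1| \leq \|x - x_0\|$.

Using a similar argument, we conclude that

\[ \Big\| f(x_0 + v_1 e_1 + v_2 e_2) - f(x_0 + v_1 e_1) - \frac{\partial f}{\partial x_2}(x_0) v_2 \Big\| \leq \|x - x_0\| \varepsilon/n. \]

And so on, until we get

\[ \Big\| f(x_0 + v_1 e_1 + \cdots + v_n e_n) - f(x_0 + v_1 e_1 + \cdots + v_{n-1} e_{n-1}) - \frac{\partial f}{\partial x_n}(x_0) v_n \Big\| \leq \|x - x_0\| \varepsilon/n. \]

Summing these $n$ inequalities and using the triangle inequality $\|x + y\| \leq \|x\| + \|y\|$, we get a telescoping sum which finally gives

\[ \Big\| f(x_0 + v_1 e_1 + \cdots + v_n e_n) - f(x_0) - \sum_{j=1}^{n} v_j \frac{\partial f}{\partial x_j}(x_0) \Big\| < \varepsilon \|x - x_0\|. \]

(The final inequality is strict because $x \neq x_0$: some $v_j$ is nonzero, and for that $j$ the corresponding estimate is strict, since the partial derivatives differ from their values at $x_0$ by strictly less than $\varepsilon/(nm)$.)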

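From Theorem 4.7 and Lemma 4.3, if the partial derivatives of a function $f : E \to \mathbb{R}^m$ exist and are continuous on a set $F$, then all directional derivatives of $f$ exist at every interior point $x_0$ of $F$, and

\[ D_{(v_1, \ldots, v_n)} f(x_0) = \sum_{j=1}^{n} v_j \frac{\partial f}{\partial x_j}(x_0). \]

In particular, if $f : E \to \mathbb{R}$ is a real-valued function, and if we define the gradient $\nabla f(x_0)$ of $f$ at $x_0$ to be the $n$-dimensional row vector

\[ \nabla f(x_0) := \Big( \frac{\partial f}{\partial x_1}(x_0), \ldots, \frac{\partial f}{\partial x_n}(x_0) \Big), \]

then we have the formula

\[ D_v f(x_0) = \langle \nabla f(x_0), v \rangle. \]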

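More generally, if $f : E \to \mathbb{R}^m$ is a function with $f = (f_1, \ldots, f_m)$, and $x_0$ is in the interior of the region where the partial derivatives of $f$ exist and are continuous, then Theorem 4.7 says

\[ f'(x_0)(v_1, \ldots, v_n) = \sum_{j=1}^{n} v_j \frac{\partial f}{\partial x_j}(x_0) = \Big( \sum_{j=1}^{n} v_j \frac{\partial f_i}{\partial x_j}(x_0) \Big)_{i=1}^{m}. \]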

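So, if we define the matrix

\[
Df(x_0) = \Big( \frac{\partial f_i}{\partial x_j}(x_0) \Big)_{1 \leq i \leq m,\ 1 \leq j \leq n}
= \begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(x_0) & \frac{\partial f_1}{\partial x_2}(x_0) & \cdots & \frac{\partial f_1}{\partial x_n}(x_0) \\
\frac{\partial f_2}{\partial x_1}(x_0) & \frac{\partial f_2}{\partial x_2}(x_0) & \cdots & \frac{\partial f_2}{\partial x_n}(x_0) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(x_0) & \frac{\partial f_m}{\partial x_2}(x_0) & \cdots & \frac{\partial f_m}{\partial x_n}(x_0)
\end{pmatrix},
\]

then we have

\[ D_v f(x_0) = f'(x_0) v = Df(x_0) v. \]

The matrix $Df(x_0)$ is sometimes called the derivative or the differential of $f$ at $x_0$. We still wish to distinguish the matrix $Df(x_0)$ from the linear transformation $f'(x_0)$, since the latter is defined in a way which does not depend on the chosen basis of Euclidean space.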
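To make the matrix $Df(x_0)$ concrete, here is a small numerical sketch. The map $f(x_1, x_2) = (x_1 x_2,\ x_1 + x_2^2,\ \sin x_1)$, the helper names `jacobian` and `matvec`, and the step size `h` are all our own choices for illustration. Each column of the matrix is estimated by a central difference in one coordinate, and the product $Df(x_0) v$ then gives the directional derivative.

```python
import math


def f(x):
    # Sample map from R^2 to R^3, assumed only for illustration.
    return [x[0] * x[1], x[0] + x[1] ** 2, math.sin(x[0])]


def jacobian(f, x0, h=1e-6):
    """Central-difference estimate of Df(x0); entry (i, j) approximates d f_i / d x_j."""
    m, n = len(f(x0)), len(x0)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        up = list(x0)
        up[j] += h
        down = list(x0)
        down[j] -= h
        f_up, f_down = f(up), f(down)
        for i in range(m):
            J[i][j] = (f_up[i] - f_down[i]) / (2.0 * h)
    return J


def matvec(A, v):
    """Matrix-vector product Av."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


x0, v = [1.0, 2.0], [3.0, 4.0]
J = jacobian(f, x0)
print(J)             # close to [[2, 1], [1, 4], [cos(1), 0]]
print(matvec(J, v))  # close to [10, 19, 3*cos(1)]
```

Computing the partial derivatives by hand at $x_0 = (1, 2)$ gives the rows $(2, 1)$, $(1, 4)$ and $(\cos 1, 0)$, which is what the estimate should reproduce up to discretization error.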

5. The Chain Rule in Several Variables

Let $n, m, p$ be positive integers. Recall that if $f : X \to Y$ and $g : Y \to Z$ are functions, then the composition $g \circ f : X \to Z$ is defined by $g \circ f(x) := g(f(x))$, for all $x \in X$.

Theorem 5.1 (The Chain Rule in Multiple Variables). Let $E$ be a subset of $\mathbb{R}^n$, let $F$ be a subset of $\mathbb{R}^m$, let $f : E \to F$ be a function, and let $g : F \to \mathbb{R}^p$. Let $x_0$ be a point in the interior of $E$. Assume that $f$ is differentiable at $x_0$ and that $f(x_0)$ is in the interior of $F$. Assume also that $g$ is differentiable at $f(x_0)$. Then $g \circ f : E \to \mathbb{R}^p$ is also differentiable at $x_0$, and

\[ (g \circ f)'(x_0) = g'(f(x_0)) f'(x_0). \]

Remark 5.2. We can intuitively think of the chain rule as follows. From Newton's approximation, we have

\[ f(x) - f(x_0) \approx f'(x_0)(x - x_0). \]

Also, using Newton's approximation again,

\[ g(f(x)) - g(f(x_0)) \approx g'(f(x_0))(f(x) - f(x_0)). \]

So, combining these two approximations, we have

\[ g(f(x)) - g(f(x_0)) \approx g'(f(x_0)) f'(x_0)(x - x_0). \]

That is, $(g \circ f)'(x_0) = g'(f(x_0)) f'(x_0)$. The rigorous version of this proof irons out the details inherent in Newton's approximation.

Exercise 5.3.

• Let $L : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Show that there exists a real number $M > 0$ such that $\|Lx\| \leq M \|x\|$, for all $x \in \mathbb{R}^n$. (Hint: first, using Remark 1.5, write $L$ in terms of a matrix $A$. Then, set $M$ to be equal to the sum of the absolute values of the entries of $A$. Use the triangle inequality a lot. There are many different ways to do this exercise, some of which use a different value of $M$. For example, you could try using the Cauchy-Schwarz inequality.) In particular, conclude that any linear transformation $L : \mathbb{R}^n \to \mathbb{R}^m$ is continuous.
• Let $E$ be a subset of $\mathbb{R}^n$. Assume that $f : E \to \mathbb{R}^m$ is differentiable at an interior point $x_0$ of $E$. Then $f$ is also continuous at $x_0$.
• Prove Theorem 5.1. (Hint: it may be helpful to review the proof of the single variable chain rule. It is probably easiest to use the sequence definition of a limit.)
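Before attempting the first part of Exercise 5.3, it can be reassuring to test the suggested constant numerically. The sketch below is a sanity check and not a proof: the $2 \times 3$ matrix `A`, the sampling range, and the random seed are arbitrary choices of ours. It takes $M$ to be the sum of the absolute values of the entries of $A$ and compares $\|Ax\| / \|x\|$ with $M$ on random samples.

```python
import math
import random


def matvec(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def norm(x):
    """Euclidean (l_2) norm."""
    return math.sqrt(sum(c * c for c in x))


# Sample 2 x 3 matrix, so L(x) = Ax maps R^3 to R^2 (chosen only for illustration).
A = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]
M = sum(abs(a) for row in A for a in row)  # 1 + 2 + 0.5 + 3 + 0 + 1 = 7.5

random.seed(0)
worst = 0.0
for _ in range(1000):
    x = [random.uniform(-10.0, 10.0) for _ in range(3)]
    worst = max(worst, norm(matvec(A, x)) / norm(x))

print(worst <= M)  # True: every sampled ratio ||Ax|| / ||x|| stays below M
```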

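Example 5.4. Suppose $f : \mathbb{R}^n \to \mathbb{R}^m$ is a differentiable function, and $x_j : \mathbb{R} \to \mathbb{R}$ are differentiable functions for all $j \in \{1, \ldots, n\}$. Then

\[ \frac{d}{dt} f(x_1(t), \ldots, x_n(t)) = \sum_{j=1}^{n} x_j'(t) \frac{\partial f}{\partial x_j}(x_1(t), \ldots, x_n(t)). \]

This follows from the chain rule.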
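The formula in Example 5.4 can be checked numerically on a concrete case. In the sketch below, the real-valued function $f(x_1, x_2) = x_1^2 x_2$ and the curve $(x_1(t), x_2(t)) = (\cos t, \sin t)$ are our own choices, and the names `lhs` and `rhs` are ours. The left side is estimated by a central difference in $t$, while the right side uses partial derivatives computed by hand.

```python
import math


def f(x1, x2):
    # Sample real-valued f (so m = 1), assumed only for illustration.
    return x1 ** 2 * x2


def curve(t):
    # Sample curve (x1(t), x2(t)) = (cos t, sin t).
    return math.cos(t), math.sin(t)


def lhs(t, h=1e-6):
    """Central-difference estimate of d/dt f(x1(t), x2(t))."""
    return (f(*curve(t + h)) - f(*curve(t - h))) / (2.0 * h)


def rhs(t):
    """Sum over j of x_j'(t) times the j-th partial derivative of f along the curve."""
    x1, x2 = curve(t)
    dx1, dx2 = -math.sin(t), math.cos(t)     # x1'(t) and x2'(t)
    df_dx1, df_dx2 = 2.0 * x1 * x2, x1 ** 2  # partial derivatives of f, computed by hand
    return dx1 * df_dx1 + dx2 * df_dx2


for t in (0.0, 0.7, 2.0):
    print(t, lhs(t), rhs(t))  # the two columns should agree to several digits
```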

6. Iterated Derivatives and Clairaut's Theorem

We now investigate what happens when we differentiate a function twice, in two different directions.

Definition 6.1. Let $E$ be a subset of $\mathbb{R}^n$, and let $f : E \to \mathbb{R}^m$ be a function. We say that $f$ is continuously differentiable if and only if the partial derivatives $\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}$ exist and are continuous on $E$. We say that $f$ is twice continuously differentiable if and only if it is continuously differentiable, and the partial derivatives $\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}$ are themselves continuously differentiable.

Continuously differentiable functions are sometimes called $C^1$ functions. Twice continuously differentiable functions are sometimes called $C^2$ functions. One can also define $C^3$ functions, $C^4$ functions, etc., but we will not do so here.

Let $f : \mathbb{R}^2 \to \mathbb{R}$. As you may have learned, it is often true that

\[ \frac{\partial}{\partial x_1} \frac{\partial}{\partial x_2} f = \frac{\partial}{\partial x_2} \frac{\partial}{\partial x_1} f. \]

Unfortunately, this equality does not always hold.

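Exercise 6.2. Define $f : \mathbb{R}^2 \to \mathbb{R}$ by $f(x, y) := (x^3 y)/(x^2 + y^2)$ when $(x, y) \neq (0, 0)$, and $f(0, 0) := 0$. Show that $f$ is continuously differentiable, and the double derivatives $\frac{\partial}{\partial x_1} \frac{\partial}{\partial x_2} f$ and $\frac{\partial}{\partial x_2} \frac{\partial}{\partial x_1} f$ exist, but these derivatives are not equal at $(0, 0)$.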
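Numerical evidence for the phenomenon in Exercise 6.2 can be gathered with nested difference quotients. This is a rough sketch and not a solution of the exercise: the helper names `d_dx` and `d_dy` and the step sizes `h` and `k` are our own assumptions, with the inner step `k` taken much smaller than the outer step `h` so that the inner quotient approximates a first partial derivative before the outer quotient is formed.

```python
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x ** 3 * y / (x ** 2 + y ** 2)


def d_dx(g, x, y, step):
    """Central-difference estimate of the partial derivative of g in the first variable."""
    return (g(x + step, y) - g(x - step, y)) / (2.0 * step)


def d_dy(g, x, y, step):
    """Central-difference estimate of the partial derivative of g in the second variable."""
    return (g(x, y + step) - g(x, y - step)) / (2.0 * step)


h, k = 1e-3, 1e-6  # outer and inner step sizes (assumed values, k much smaller than h)

# Estimate of d/dx (df/dy) at the origin.
dx_dy = d_dx(lambda x, y: d_dy(f, x, y, k), 0.0, 0.0, h)
# Estimate of d/dy (df/dx) at the origin.
dy_dx = d_dy(lambda x, y: d_dx(f, x, y, k), 0.0, 0.0, h)

print(dx_dy, dy_dx)  # the two estimates are visibly different
```

The two printed numbers differ by roughly $1$, which is consistent with the claim that the mixed partial derivatives disagree at the origin; the exercise asks for a proof of this.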

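Thankfully, if $f$ is twice continuously differentiable, then the order of differentiation does not matter.

Theorem 6.3 (Clairaut's Theorem). Let $E$ be an open subset of $\mathbb{R}^n$, and let $f : E \to \mathbb{R}^m$ be a twice continuously differentiable function. Then, for all $1 \leq i, j \leq n$ and for all interior points $x_0$ of $E$, we have

\[ \frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j} f(x_0) = \frac{\partial}{\partial x_j} \frac{\partial}{\partial x_i} f(x_0). \]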

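Proof. The claim is certainly true for $i = j$, so assume that $i \neq j$. By working with one component of $f$ at a time, we may assume that $m = 1$, so that the Mean Value Theorem applies. By replacing $f$ by $x \mapsto f(x + x_0)$ as necessary, we may assume that $x_0 = 0$.

Define $a := \frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j} f(x_0)$ and define $a' := \frac{\partial}{\partial x_j} \frac{\partial}{\partial x_i} f(x_0)$. We need to show that $a = a'$.

Let $\varepsilon > 0$. Since $f$ is twice continuously differentiable, there exists $\delta > 0$ such that, for all $x$ with $\|x\| < 2\delta$, we have

\[ \Big| \frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j} f(x) - a \Big| < \varepsilon, \qquad \Big| \frac{\partial}{\partial x_j} \frac{\partial}{\partial x_i} f(x) - a' \Big| < \varepsilon. \]

Define

\[ M := f(\delta e_i + \delta e_j) - f(\delta e_i) - f(\delta e_j) + f(0). \]

Applying the Fundamental Theorem of Calculus to the $e_i$ variable, we have

\[ f(\delta e_i + \delta e_j) - f(\delta e_j) = \int_0^{\delta} \frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j)\, dx_i. \]

And

\[ f(\delta e_i) - f(0) = \int_0^{\delta} \frac{\partial f}{\partial x_i}(x_i e_i)\, dx_i. \]

Therefore,

\[ M = \int_0^{\delta} \Big( \frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) - \frac{\partial f}{\partial x_i}(x_i e_i) \Big)\, dx_i. \]

For each $x_i \in (0, \delta)$, there exists $x_j \in [0, \delta]$ such that, by the Mean Value Theorem, we have

\[ \frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) - \frac{\partial f}{\partial x_i}(x_i e_i) = \delta\, \frac{\partial}{\partial x_j} \frac{\partial f}{\partial x_i}(x_i e_i + x_j e_j). \]

By our choice of $\delta$ (noting that $\|x_i e_i + x_j e_j\| < 2\delta$), we therefore have

\[ \Big| \frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) - \frac{\partial f}{\partial x_i}(x_i e_i) - \delta a' \Big| < \varepsilon \delta. \]

So, integrating this inequality over $x_i \in [0, \delta]$, we get

\[ |M - \delta^2 a'| < \varepsilon \delta^2. \]

We can run this same argument with the roles of $i$ and $j$ reversed (noting that $M$ is symmetric in $i, j$) to get

\[ |M - \delta^2 a| < \varepsilon \delta^2. \]

So, from the triangle inequality, $\delta^2 |a - a'| \leq |M - \delta^2 a| + |M - \delta^2 a'| < 2 \varepsilon \delta^2$, and we conclude that

\[ |a - a'| < 2\varepsilon. \]

Since this inequality holds for all $\varepsilon > 0$, we conclude that $a = a'$, as desired.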
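The quantity $M$ in the proof has a concrete numerical meaning: for a twice continuously differentiable function, $M/\delta^2$ approximates both mixed partial derivatives at the origin. The sketch below illustrates this for a sample smooth function of our own choosing, $f(x, y) = \sin(x) e^{2y} + xy$, whose mixed partial derivatives at $(0, 0)$ both equal $3$; the helper name `second_difference` is also ours.

```python
import math


def f(x, y):
    # Sample smooth (hence twice continuously differentiable) function on R^2.
    return math.sin(x) * math.exp(2.0 * y) + x * y


def second_difference(f, delta):
    """M / delta^2, where M = f(d e_1 + d e_2) - f(d e_1) - f(d e_2) + f(0)."""
    M = f(delta, delta) - f(delta, 0.0) - f(0.0, delta) + f(0.0, 0.0)
    return M / delta ** 2


# Both mixed second partial derivatives of this f at the origin equal
# 2*cos(0)*exp(0) + 1 = 3, so the quotient should approach 3 as delta shrinks.
for delta in (1e-1, 1e-2, 1e-3):
    print(delta, second_difference(f, delta))
```

Since $M$ is symmetric in the two variables, the same quotient approximates the derivative taken in either order, which is the mechanism behind the proof.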

7. Appendix: Notation

Let $A, B$ be sets in a space $X$. Let $m, n$ be nonnegative integers.

$\mathbb{Z} := \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$, the integers
$\mathbb{N} := \{0, 1, 2, 3, 4, 5, \ldots\}$, the natural numbers
$\mathbb{Z}_+ := \{1, 2, 3, 4, \ldots\}$, the positive integers
$\mathbb{Q} := \{m/n : m, n \in \mathbb{Z},\ n \neq 0\}$, the rationals
$\mathbb{R}$ denotes the set of real numbers
$\mathbb{R}^* = \mathbb{R} \cup \{-\infty\} \cup \{+\infty\}$ denotes the set of extended real numbers
$\mathbb{C} := \{x + y\sqrt{-1} : x, y \in \mathbb{R}\}$, the complex numbers
$\emptyset$ denotes the empty set, the set consisting of zero elements
$\in$ means "is an element of." For example, $2 \in \mathbb{Z}$ is read as "2 is an element of $\mathbb{Z}$."
$\forall$ means "for all"
$\exists$ means "there exists"
$\mathbb{R}^n := \{(x_1, \ldots, x_n) : x_i \in \mathbb{R},\ \forall\, i \in \{1, \ldots, n\}\}$
$A \subseteq B$ means $\forall\, a \in A$, we have $a \in B$, so $A$ is contained in $B$
$A \setminus B := \{x \in A : x \notin B\}$
$A^c := X \setminus A$, the complement of $A$
$A \cap B$ denotes the intersection of $A$ and $B$
$A \cup B$ denotes the union of $A$ and $B$

Let $(X, d)$ be a metric space, let $x_0 \in X$, let $r > 0$ be a real number, and let $E$ be a subset of $X$. Let $(x_1, \ldots, x_n)$ be an element of $\mathbb{R}^n$, and let $p \geq 1$ be a real number.

$B_{(X,d)}(x_0, r) = B(x_0, r) := \{x \in X : d(x, x_0) < r\}$.
$\overline{E}$ denotes the closure of $E$
$\mathrm{int}(E)$ denotes the interior of $E$
$\partial E$ denotes the boundary of $E$
$\|(x_1, \ldots, x_n)\|_{\ell_p} := \big( \sum_{i=1}^{n} |x_i|^p \big)^{1/p}$
$\|(x_1, \ldots, x_n)\|_{\ell_\infty} := \max_{i=1,\ldots,n} |x_i|$

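Let $f, g : (X, d_X) \to (Y, d_Y)$ be maps between metric spaces. Let $V \subseteq X$, and let $W \subseteq Y$.

$f(V) := \{f(v) \in Y : v \in V\}$.
$f^{-1}(W) := \{x \in X : f(x) \in W\}$.
$d_\infty(f, g) := \sup_{x \in X} d_Y(f(x), g(x))$.
$B(X; Y)$ denotes the set of functions $f : X \to Y$ that are bounded.
$C(X; Y) := \{f \in B(X; Y) : f \text{ is continuous}\}$.

Let $f, g : \mathbb{R} \to \mathbb{C}$ be $\mathbb{Z}$-periodic functions.

$\|f\|_\infty := \sup_{x \in [0,1]} |f(x)|$.
$\langle f, g \rangle := \int_0^1 f(x) \overline{g(x)}\, dx$.
$\|f\|_2 := \sqrt{\langle f, f \rangle} = \big( \int_0^1 |f(x)|^2\, dx \big)^{1/2}$.
$d_{L^2}(f, g) := \|f - g\|_2 = \big( \int_0^1 |f(x) - g(x)|^2\, dx \big)^{1/2}$.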

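Let $n, m$ be positive integers, let $(e_1, \ldots, e_n)$ denote the standard basis of $\mathbb{R}^n$, let $E$ be a subset of $\mathbb{R}^n$, let $f : E \to \mathbb{R}^m$ be a function, let $x_0 \in E$ be an interior point of $E$, let $v \in \mathbb{R}^n$, and let $j \in \{1, \ldots, n\}$.

$f'(x_0)$ denotes the total derivative of $f$.
$D_v f(x_0)$ denotes the derivative of $f$ in the direction $v$.
$\frac{\partial f}{\partial x_j}(x_0) = \frac{\partial}{\partial x_j} f(x_0) = D_{e_j} f(x_0)$.

Let $E$ be a subset of $\mathbb{R}^n$, let $f : E \to \mathbb{R}$ be a function, and let $x_0$ be an interior point of $E$.

$\nabla f(x_0) = \big( \frac{\partial f}{\partial x_1}(x_0), \ldots, \frac{\partial f}{\partial x_n}(x_0) \big)$.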

7.1. Set Theory. Let $X, Y$ be sets, and let $f : X \to Y$ be a function. The function $f : X \to Y$ is said to be injective (or one-to-one) if and only if: for every $x, x' \in X$, if $f(x) = f(x')$, then $x = x'$.

The function $f : X \to Y$ is said to be surjective (or onto) if and only if: for every $y \in Y$, there exists $x \in X$ such that $f(x) = y$.

The function $f : X \to Y$ is said to be bijective (or a one-to-one correspondence) if and only if: for every $y \in Y$, there exists exactly one $x \in X$ such that $f(x) = y$. A function $f : X \to Y$ is bijective if and only if it is both injective and surjective.

Two sets $X, Y$ are said to have the same cardinality if and only if there exists a bijection from $X$ onto $Y$.

UCLA Department of Mathematics, Los Angeles, CA 90095-1555
E-mail address: [email protected]
